【MIT6.S081】xv6进程调度分析

发表于 2024-11-16 分类于操作系统标签 MIT6-S081 , OS , xv6 Disqus：本文字数： 1.1k 阅读时长 ≈ 4 分钟

在做thread lab的时候，阅读xv6的源码后对于进程调度的实现有了大致的了解，但是其中锁的获取与释放顺序让我困惑了好久：在yield函数中，不是先获取了进程p的锁吗，那么之后在调度器中又获取p的锁，那不是会死锁吗？在调度器内使用swtch发生进程切换后，又会跳转到哪里？

而在我观摩大佬的一些博客和视频后，发现我之前的想法有很大的问题，归根结底是没有弄明白xv6何时发生了切换，切换后应该从哪里开始运行。这篇笔记就是对于分析xv6进程调度的总结。

在使用allocproc创建进程时，会为创建的proc设置ra的值，初始会设置为forkret函数的地址。

static struct proc*
allocproc(void)
{
  ...
  memset(&p->context, 0, sizeof(p->context));
  p->context.ra = (uint64)forkret;
  p->context.sp = p->kstack + PGSIZE;

  return p;
}

在调度器scheduler开始运行后，遍历进程表，先对遍历到的当前进程p上锁，如果这个进程是可运行的（RUNNABLE），则执行swtch函数切换上下文，保存当前cpu的上下文，同时加载p的上下文到寄存器中。

执行完swtch后会跳转到ra寄存器所保存的地址，也就是这个进程初始设置的forkret的位置，这个函数中会释放在scheduler中对进程p上的锁。如果这个函数是第一次执行，那么会初始化文件系统。之后执行usertrapret函数。

// A fork child's very first scheduling by scheduler()
// will swtch to forkret.
void
forkret(void)
{
  static int first = 1;

  // Still holding p->lock from scheduler.
  release(&myproc()->lock);

  if (first) {
    // File system initialization must be run in the context of a
    // regular process (e.g., because it calls sleep), and thus cannot
    // be run from main().
    fsinit(ROOTDEV);

    first = 0;
    // ensure other cores see first=0.
    __sync_synchronize();
  }

  usertrapret();
}

usertrapret中，将会关闭中断，最后这会让当前进程从内核态返回到用户态（回到用户空间），切换用户页表并恢复用户态寄存器。

//
// return to user space
//
void
usertrapret(void)
{
  ...
  // we're about to switch the destination of traps from
  // kerneltrap() to usertrap(), so turn off interrupts until
  // we're back in user space, where usertrap() is correct.
  intr_off();

  ...

  // jump to userret in trampoline.S at the top of memory, which 
  // switches to the user page table, restores user registers,
  // and switches to user mode with sret.
  uint64 trampoline_userret = TRAMPOLINE + (userret - trampoline);
  ((void (*)(uint64))trampoline_userret)(satp);
}

在之后如果该进程会触发中断（例如时间片轮转调度中执行时间到了后触发时钟中断）或者使用了系统调用，将会执行usertrap函数（也即处理陷入），在其中会执行yield函数来让该进程让出cpu，并进行进程切换。

//
// handle an interrupt, exception, or system call from user space.
// called from trampoline.S
//
void
usertrap(void)
{
  ...
  
  if(r_scause() == 8){
    // system call
    ...
  } else if((which_dev = devintr()) != 0){
    // ok
  } else {
    printf("usertrap(): unexpected scause %p pid=%d\n", r_scause(), p->pid);
    printf("            sepc=%p stval=%p\n", r_sepc(), r_stval());
    setkilled(p);
  }

  if(killed(p))
    exit(-1);

  // give up the CPU if this is a timer interrupt.
  if(which_dev == 2)
    yield();

  usertrapret();
}

yield函数会先对当前进程p上锁，然后执行shed函数，shed函数会先进行一些检查，然后调用swtch函数进行上下文切换，其保存p的上下文，并加载之前cpu的上下文。这样，ra寄存器中的值就变成了scheduler中的swtch所在地址的后一条指令的地址。在schduler中，重置当前cpu所运行的进程后，释放进程p的锁，之后开启新一轮调度（遍历进程表找到下一个可运行的进程）。

void
sched(void)
{
  int intena;
  struct proc *p = myproc();

  if(!holding(&p->lock))
    panic("sched p->lock");
  if(mycpu()->noff != 1)
    panic("sched locks");
  if(p->state == RUNNING)
    panic("sched running");
  if(intr_get())
    panic("sched interruptible");

  intena = mycpu()->intena;
  swtch(&p->context, &mycpu()->context);
  mycpu()->intena = intena;
}

// Give up the CPU for one scheduling round.
void
yield(void)
{
  struct proc *p = myproc();
  acquire(&p->lock);
  p->state = RUNNABLE;
  sched();
  release(&p->lock);
}

这里yield中使用了acquire和release函数来进行锁的操作，但是这个锁的获取与释放时机并不是如同代码中的这样是一个“顺序”的过程。

当执行顺序为从调度器选择可调度进程（RUNNABLE）后进行进程切换：

当执行顺序为进程触发中断等陷入时，放弃cpu资源，需要执行进程调度时：

可以看到，在yield函数中，进程p的锁的使用顺序并不是acquire() 获取锁-> 执行shed()函数 -> release()释放锁这样线性的，在其中会发生进程切换因此锁的“获取、释放”逻辑并不发生在一个函数中。