Linux Kernel Analysis: Understanding Process Switching in Depth

Tags: task mm x8 next switch switching Linux prev kernel

Process switching means changing the process context. The function that performs the context switch is context_switch(), defined in kernel/sched/core.c. The code is as follows:

/*
 * context_switch - switch to the new MM and the new thread's register state.
 */
static __always_inline struct rq *
context_switch(struct rq *rq, struct task_struct *prev,
           struct task_struct *next, struct rq_flags *rf)
{
    prepare_task_switch(rq, prev, next);

    /*
     * For paravirt, this is coupled with an exit in switch_to to
     * combine the page table reload and the switch backend into
     * one hypercall.
     */
    arch_start_context_switch(prev);

    /*
     * kernel -> kernel   lazy + transfer active
     *   user -> kernel   lazy + mmgrab() active
     *
     * kernel ->   user   switch + mmdrop() active
     *   user ->   user   switch
     */
    if (!next->mm) {                                // to kernel
        enter_lazy_tlb(prev->active_mm, next);

        next->active_mm = prev->active_mm;
        if (prev->mm)                           // from user
            mmgrab(prev->active_mm);
        else
            prev->active_mm = NULL;
    } else {                                        // to user
        membarrier_switch_mm(rq, prev->active_mm, next->mm);
        /*
         * sys_membarrier() requires an smp_mb() between setting
         * rq->curr / membarrier_switch_mm() and returning to userspace.
         *
         * The below provides this either through switch_mm(), or in
         * case 'prev->active_mm == next->mm' through
         * finish_task_switch()'s mmdrop().
         */
        switch_mm_irqs_off(prev->active_mm, next->mm, next);

        if (!prev->mm) {                        // from kernel
            /* will mmdrop() in finish_task_switch(). */
            rq->prev_mm = prev->active_mm;
            prev->active_mm = NULL;
        }
    }

    rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);

    prepare_lock_switch(rq, next, rf);

    /* Here we just switch the register state and the stack. */
    switch_to(prev, next, prev);
    barrier();

    return finish_task_switch(prev);
}

The key functions it calls are analyzed below.

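1. prepare_task_switch(rq, prev, next)

Parameters:

- rq: pointer to the runqueue that the scheduler is currently managing.
- prev: pointer to the previous task, i.e. the task that is running now and is about to be switched out.
- next: pointer to the next task, i.e. the task that is about to run.

This function does the bookkeeping that has to happen before the switch so that both tasks stay in a consistent state. In kernels of this generation it updates scheduling statistics, tells perf and any registered preempt notifiers that prev is being scheduled out, marks next as running on this CPU, and calls the architecture hook prepare_arch_switch().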

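2. arch_start_context_switch(prev)

arch_start_context_switch(prev) is an architecture hook that mainly matters under paravirtualization (paravirt). As the comment in the code says, it is paired with an exit in switch_to so that the page table reload and the switch backend can be combined into a single hypercall. On configurations without paravirt it typically does nothing.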

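3. Address space switch

The next block handles the address spaces (mm) of the outgoing and incoming tasks, so that page tables and mm reference counts stay correct across the switch.

If next has no address space of its own (next->mm is NULL, i.e. it is a kernel thread), no page table switch is needed. enter_lazy_tlb() tells the architecture that this CPU is entering lazy TLB mode, and next borrows prev's address space through next->active_mm = prev->active_mm. If prev is a user task, the borrowed mm gets an extra reference via mmgrab(). If prev is itself a kernel thread, the reference it held is handed over to next and prev->active_mm is set to NULL.

If next has an address space (a user task), membarrier_switch_mm() first updates the runqueue's membarrier state to match next->mm, and switch_mm_irqs_off() then performs the actual address space switch (page tables). The _irqs_off suffix means it is called with interrupts already disabled. If prev is a kernel thread, the mm it borrowed is recorded in rq->prev_mm and prev->active_mm is set to NULL; finish_task_switch() later drops that reference with mmdrop().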
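The reference counting in the four cases above can be followed in a small user-space model. This is only a sketch under stated assumptions: the struct and function names ending in _model, and the reduced struct mm, struct task and struct rq, are hypothetical stand-ins and not kernel code; the real page table switch, TLB handling and membarrier update are left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; only the fields used here. */
struct mm   { int mm_count; };                 /* models mm_struct.mm_count */
struct task { struct mm *mm, *active_mm; };    /* mm == NULL: kernel thread */
struct rq   { struct mm *prev_mm; };

static void mmgrab(struct mm *mm) { mm->mm_count++; }

static void mmdrop(struct mm *mm)
{
    assert(mm->mm_count > 0);                  /* never drop below zero */
    mm->mm_count--;
}

/* Mirrors only the mm bookkeeping of context_switch(); no page tables. */
static void switch_mm_model(struct rq *rq, struct task *prev, struct task *next)
{
    if (!next->mm) {                           /* to kernel: borrow prev's mm */
        next->active_mm = prev->active_mm;
        if (prev->mm)                          /* from user: extra reference */
            mmgrab(prev->active_mm);
        else                                   /* from kernel: hand it over */
            prev->active_mm = NULL;
    } else if (!prev->mm) {                    /* kernel -> user */
        rq->prev_mm = prev->active_mm;         /* dropped after the switch */
        prev->active_mm = NULL;
    }                                          /* user -> user: nothing to count */
}

/* Models the deferred mmdrop() performed by finish_task_switch(). */
static void finish_switch_model(struct rq *rq)
{
    struct mm *mm = rq->prev_mm;

    rq->prev_mm = NULL;
    if (mm)
        mmdrop(mm);
}
```

Running user -> kernel -> kernel -> user through this model shows the count of the borrowed mm rising by one on the first switch, staying constant while the reference is passed between kernel threads, and falling back once the last kernel thread is switched out.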

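4. rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP)

This statement clears the RQCF_ACT_SKIP and RQCF_REQ_SKIP bits in the runqueue's clock_update_flags, i.e. the flags that ask for a runqueue clock update to be skipped. All other bits are left unchanged.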
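The bit manipulation can be tried in isolation. In this sketch the helper clear_skip_flags() is hypothetical, and the numeric flag values are an assumption meant to mirror kernel/sched/sched.h in kernels of this era; only the masking idiom itself is the point.

```c
#include <assert.h>

/* Assumed values, mirroring kernel/sched/sched.h of this era. */
#define RQCF_REQ_SKIP 0x01u
#define RQCF_ACT_SKIP 0x02u
#define RQCF_UPDATED  0x04u

/* Hypothetical helper: clear both SKIP bits, keep every other bit. */
static unsigned int clear_skip_flags(unsigned int flags)
{
    return flags & ~(RQCF_ACT_SKIP | RQCF_REQ_SKIP);
}
```

With all three bits set, only RQCF_UPDATED survives the mask, which is why the statement is safe to run regardless of which skip flags happened to be set.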

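5. prepare_lock_switch(rq, next, rf);

This function prepares the runqueue lock for the switch. The lock was taken before context_switch() was called and stays held across switch_to; it is released by next in finish_task_switch(). Because the task that releases the lock is not the task that acquired it, prepare_lock_switch() updates the lock-debugging state ahead of time (lockdep, and the recorded lock owner when spinlock debugging is enabled) so that the handoff is not reported as an error. It does not acquire the lock itself.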

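6. switch_to(prev, next, prev);

This is the core of the switch: it switches the register state and the kernel stack. On x86 it goes on to call __switch_to_asm; the assembly behind switch_to differs from one architecture to another.

x86_64 architecture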

/*
 * %rdi: prev task
 * %rsi: next task
 */
ENTRY(__switch_to_asm)
    UNWIND_HINT_FUNC
    /*
     * Save callee-saved registers
     * This must match the order in inactive_task_frame
     */
    pushq    %rbp
    pushq    %rbx
    pushq    %r12
    pushq    %r13
    pushq    %r14
    pushq    %r15

    /* switch stack */
    movq    %rsp, TASK_threadsp(%rdi)
    movq    TASK_threadsp(%rsi), %rsp

#ifdef CONFIG_STACKPROTECTOR
    movq    TASK_stack_canary(%rsi), %rbx
    movq    %rbx, PER_CPU_VAR(fixed_percpu_data) + stack_canary_offset
#endif

#ifdef CONFIG_RETPOLINE
    /*
     * When switching from a shallower to a deeper call stack
     * the RSB may either underflow or use entries populated
     * with userspace addresses. On CPUs where those concerns
     * exist, overwrite the RSB with entries which capture
     * speculative execution to prevent attack.
     */
    FILL_RETURN_BUFFER %r12, RSB_CLEAR_LOOPS, X86_FEATURE_RSB_CTXSW
#endif

    /* restore callee-saved registers */
    popq    %r15
    popq    %r14
    popq    %r13
    popq    %r12
    popq    %rbx
    popq    %rbp

    jmp    __switch_to
END(__switch_to_asm)

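The code switches the kernel stack pointer (RSP) and ends with jmp __switch_to, but it contains no thread.ip and no label 1 to resume at, unlike older versions of switch_to. __switch_to_asm is called from C code, i.e. with a call instruction, while the assembly ends with jmp __switch_to, and __switch_to is a C function that ends with return, i.e. a ret instruction. Taken together, __switch_to_asm and __switch_to form a matched call/ret pair. The call pushes RIP onto the kernel stack of prev before the switch; the ret pops into RIP the value at the top of next's kernel stack after the switch, which is the return address next pushed when it was itself switched out.

ARM64 architecture

The generic code on ARM64 is exactly the same as on 64-bit x86, so the focus here is on the ARM64 implementation of switch_to.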
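The call/ret pairing across two stacks can be modelled in plain C. The sketch below is a toy under stated assumptions: struct toy_task, struct toy_cpu and toy_switch() are hypothetical names, return addresses are plain numbers, and the pushes and pops of the callee-saved registers are omitted so that only the return address path remains visible.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_WORDS 16

/* Toy task: a private "kernel stack" and the saved stack pointer (thread.sp). */
struct toy_task {
    unsigned long stack[STACK_WORDS];
    size_t sp;                       /* index of the top entry; grows down */
};

/* Toy CPU: the current task and the live stack pointer. */
struct toy_cpu {
    struct toy_task *cur;
    size_t sp;
};

static void push(struct toy_cpu *cpu, unsigned long v)
{
    assert(cpu->sp > 0);             /* stack overflow guard */
    cpu->cur->stack[--cpu->sp] = v;
}

static unsigned long pop(struct toy_cpu *cpu)
{
    assert(cpu->sp < STACK_WORDS);   /* stack underflow guard */
    return cpu->cur->stack[cpu->sp++];
}

/*
 * Models "call __switch_to_asm ... jmp __switch_to ... ret".
 * Returns the address that execution continues at after the switch.
 */
static unsigned long toy_switch(struct toy_cpu *cpu, struct toy_task *next,
                                unsigned long ret_addr)
{
    push(cpu, ret_addr);             /* call: return address onto prev's stack */
    cpu->cur->sp = cpu->sp;          /* movq %rsp, TASK_threadsp(%rdi) */
    cpu->cur = next;
    cpu->sp = next->sp;              /* movq TASK_threadsp(%rsi), %rsp */
    return pop(cpu);                 /* ret: pops from next's stack */
}
```

Each call to toy_switch() returns the address that the incoming task pushed the last time it was switched out, not the address the outgoing task just pushed, which is the whole effect of pairing a call on one stack with a ret on another.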

/*
 * Thread switching.
 */
__notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
                struct task_struct *next)
{
    struct task_struct *last;

    fpsimd_thread_switch(next);
    tls_thread_switch(next);
    hw_breakpoint_thread_switch(next);
    contextidr_thread_switch(next);
    entry_task_switch(next);
    uao_thread_switch(next);
    ptrauth_thread_switch(next);
    ssbs_thread_switch(next);

    /*
     * Complete any pending TLB or cache maintenance on this CPU in case
     * the thread migrates to a different CPU.
     * This full barrier is also required by the membarrier system
     * call.
     */
    dsb(ish);

    /* the actual thread switch */
    last = cpu_switch_to(prev, next);

    return last;
}

ENTRY(cpu_switch_to)
  mov  x10, #THREAD_CPU_CONTEXT  // x10 = offset of thread.cpu_context; added to a task_struct address it gives that task's cpu_context
  add  x8, x0, x10               // x8 = x0 + offset: address of the old task's cpu_context
  mov  x9, sp                    // copy sp into x9 so it can be saved below

  // Save x19~x28; each stp post-increments x8 by 16, ready for the next pair
  stp  x19, x20, [x8], #16
  stp  x21, x22, [x8], #16
  stp  x23, x24, [x8], #16
  stp  x25, x26, [x8], #16
  stp  x27, x28, [x8], #16

  stp  x29, x9, [x8], #16        // save x29 (frame pointer) and x9 (stack pointer)
  str  lr, [x8]                  // save LR, which holds the return address of cpu_switch_to

  add  x8, x1, x10               // x8 = x1 + offset: address of the new task's cpu_context

  // Restore x19~x28
  ldp  x19, x20, [x8], #16
  ldp  x21, x22, [x8], #16
  ldp  x23, x24, [x8], #16
  ldp  x25, x26, [x8], #16
  ldp  x27, x28, [x8], #16

  ldp  x29, x9, [x8], #16        // restore x29 (frame pointer) and x9 (stack pointer)
  ldr  lr, [x8]                  // restore LR, so cpu_switch_to returns to where the new task was last switched out
  mov  sp, x9                    // restore sp from x9
  msr  sp_el0, x1                // store the new task's task_struct address in sp_el0
  ret
ENDPROC(cpu_switch_to)
NOKPROBE(cpu_switch_to)

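ARM64 works much like x86_64, except that the resume address does not travel through RIP and the kernel stack. cpu_switch_to saves LR into the old task's thread.cpu_context and loads LR from the new task's, and the final ret branches to that restored LR.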
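The save and restore done by cpu_switch_to can be mirrored in C to make the data flow explicit. This is a sketch, not kernel code: struct toy_cpu_context follows the field order used by arm64's struct cpu_context (x19 to x28, then fp, sp, pc) but folds x19 to x28 into an array, and struct toy_regs and toy_cpu_switch_to() are hypothetical names for a simulated register file and the switch routine.

```c
#include <assert.h>
#include <string.h>

/* Per-task save area: x19..x28, then fp (x29), sp, and pc (the saved LR). */
struct toy_cpu_context {
    unsigned long x[10];
    unsigned long fp;
    unsigned long sp;
    unsigned long pc;
};

/* Simulated register file: only the registers cpu_switch_to touches. */
struct toy_regs {
    unsigned long x[10];             /* x19..x28 */
    unsigned long fp, sp, lr;
};

/* Returns the address that "ret" would branch to after the switch. */
static unsigned long toy_cpu_switch_to(struct toy_regs *regs,
                                       struct toy_cpu_context *prev,
                                       const struct toy_cpu_context *next)
{
    /* Save the old task's callee-saved state (the stp/str sequence). */
    memcpy(prev->x, regs->x, sizeof(prev->x));
    prev->fp = regs->fp;
    prev->sp = regs->sp;
    prev->pc = regs->lr;

    /* Load the new task's state (the ldp/ldr sequence and mov sp, x9). */
    memcpy(regs->x, next->x, sizeof(regs->x));
    regs->fp = next->fp;
    regs->sp = next->sp;
    regs->lr = next->pc;

    return regs->lr;                 /* ret branches to the restored LR */
}
```

The model makes the difference from x86_64 visible: the resume address is a field in the per-task context (pc) rather than a word on the task's kernel stack.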

From： https://www.cnblogs.com/hunter-chen/p/17357693.html