pick_next_task_fair是CFS调度类中选择next任务的主要路径,其主要功能是从当前CPU的就绪队列cfs_rq中选出一个可运行的任务作为"next任务",并将前一个任务prev重新放到就绪队列。
下面是这段代码框架流程解读。
1 判断rq->cfs.nr_running>0?如果不满足说明没有可运行任务则goto idle
2 从最顶层的就绪队列rq->cfs逐层向下遍历,(1)检查每一层的cfs_rq是否throttle,如果throttle则判断是否还有可运行任务,若无则goto idle; (2) pick_next_entity(cfs_rq, curr)选择对应的调度实体,直到叶子节点task(se->my_q为空)
3 经过第2)就选好了下一个要运行的task,接下来就要将prev task从CPU上”put”卸载下来;同时将next task从就绪队列set到curr。
这里要考虑几个因素:(1) prev task从CPU卸载下来时,它的上层级se,如parent是否也要卸载?(2) next task从就绪队列set为curr时,它的上层se,如parent是否也要同样set? (3) 如果prev task和next task在同一个cgroup控制组,即它们都parent是同一个se,情况如何呢?
3.1 从prev和next task的叶子节点开始向上层级遍历(用pse和se分别代表prev和next的调度实体),直到prev和next处于同一层级,即 pse->cfs_rq == se->cfs_rq
3.2 在每次遍历过程中判断pse和se的层级深度,如果(pse->depth >= se->depth)则先put_prev_entity(pse)然后prev往parent上升一级pse = pse->parent;
相反,如果(se->depth >= pse->depth)则先set_next_entity(se)然后se往parent上升一级se = se->parent;
3.3 最终,se和pse处于同一个层级,同为sibling
下面是代码和注释,代码基于开源linux-5.10:
struct task_struct * pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf) { struct cfs_rq *cfs_rq = &rq->cfs; //此rq的顶层cfs_rq队列 struct sched_entity *se; struct task_struct *p; int new_tasks; again: if (!sched_fair_runnable(rq)) //rq->cfs.nr_running>0; 思考:rq->cfs.nr_running和rq->nr_running孰大孰小? goto idle; #ifdef CONFIG_FAIR_GROUP_SCHED if (!prev || prev->sched_class != &fair_sched_class) goto simple; /* * Because of the set_next_buddy() in dequeue_task_fair() it is rather * likely that a next task is from the same cgroup as the current. * * Therefore attempt to avoid putting and setting the entire cgroup * hierarchy, only change the part that actually changes. */ //由于set_next_buddy() in dequeue_task_fair() 很可能将next和prev在同一个cgroup,因此尽量只针对修改过的部分,而非调整整个整个hierarchy do { //从最顶层的cfs_rq开始遍历,直到层级的叶子节点se,即真正的task(判断叶子节点方法:se->my_q为NULL) struct sched_entity *curr = cfs_rq->curr; /* * Since we got here without doing put_prev_entity() we also * have to consider cfs_rq->curr. If it is still a runnable * entity, update_curr() will update its vruntime, otherwise * forget we've ever seen it. */ if (curr) { if (curr->on_rq) update_curr(cfs_rq); else curr = NULL; /* * This call to check_cfs_rq_runtime() will do the * throttle and dequeue its entity in the parent(s). * Therefore the nr_running test will indeed * be correct. */ if (unlikely(check_cfs_rq_runtime(cfs_rq))) { //【0】 cfs_rq = &rq->cfs; if (!cfs_rq->nr_running) goto idle; goto simple; } } se = pick_next_entity(cfs_rq, curr); //【1】从cfs_rq选一个next se,具体算法:todo cfs_rq = group_cfs_rq(se); //se->my_q } while (cfs_rq); p = task_of(se); /* * Since we haven't yet done put_prev_entity and if the selected task * is a different task than we started out with, try and touch the * least amount of cfs_rqs. */ if (prev != p) { struct sched_entity *pse = &prev->se; //从prev和se的叶子节点不断向上level调整层级,直到二者的se->cfs_rq一致,即同一个level为止 while (!(cfs_rq = is_same_group(se, pse))) { //循环对prev和next se做put或set,直到它们都在同一个level int se_depth = se->depth; //如何判断它们都层级深浅呢?通过se->depth来判定的。 int pse_depth = pse->depth; if (se_depth <= pse_depth) { //如果se<=prev的层级,则put prev,且prev往parent生一个level,最终目的是要和se保持同一个level put_prev_entity(cfs_rq_of(pse), pse); pse = parent_entity(pse); } if (se_depth >= pse_depth) { set_next_entity(cfs_rq_of(se), se); se = parent_entity(se); } } //到此二者的层级一致,此层级下面的层级都已经更新 // 这样se->parent的状态和se的状态保持一致,例如se->statistics.wait_start都会清0 //另外,如果se和pse在同一个group,会发生什么?pse是等待状态,但是pse->parent和se->parent是同一个 put_prev_entity(cfs_rq, pse); //【2】 set_next_entity(cfs_rq, se); //【3】 } goto done; simple: #endif if (prev) put_prev_task(rq, prev); do { se = pick_next_entity(cfs_rq, NULL); set_next_entity(cfs_rq, se); cfs_rq = group_cfs_rq(se); } while (cfs_rq); p = task_of(se); done: __maybe_unused; #ifdef CONFIG_SMP /* * Move the next running task to the front of * the list, so our cfs_tasks list becomes MRU * one. */ list_move(&p->se.group_node, &rq->cfs_tasks); #endif if (hrtick_enabled(rq)) hrtick_start_fair(rq, p); update_misfit_status(p, rq); return p; idle: if (!rf) return NULL; new_tasks = newidle_balance(rq, rf); /* * Because newidle_balance() releases (and re-acquires) rq->lock, it is * possible for any higher priority task to appear. In that case we * must re-start the pick_next_entity() loop. */ if (new_tasks < 0) return RETRY_TASK; if (new_tasks > 0) goto again; /* * rq is about to be idle, check if we need to update the * lost_idle_time of clock_pelt */ update_idle_rq_clock_pelt(rq); return NULL; }
标签:task,rq,fair,next,Linux,prev,cfs,se From: https://www.cnblogs.com/liuhailong0112/p/17981594