首页 > 系统相关 >Understanding the linux kernel Chapter3 Processes

Understanding the linux kernel Chapter3 Processes

时间:2024-01-24 18:16:07浏览次数:35  
标签:kernel Processes TASK thread do process linux define

Process Descriptor

How Processes Are Organized

the process in state:

  • TASK_RUNNING
    organized in runqueue list group

  • TASK_STROPPED\EXIT_ZOMBIE\EXIT_DEAD

    There is no need to group processes in any of these three states, because stopped, zombie, and dead processes are accessed only via PID or via linked lists of the child processes for a particular parent

  • TASK_INTERRUPTIBLE\TASK_UNINTERRUPTIBLE
    managed in wait queues

    By the way, it is rather uncommon that a wait queue includes both exclusive and nonexclusive processes.

the macro represent task state has expanded to more subdivided flags(base on v6.7), defined in <linux/include/sched.h>

new flags
/*
 * Task state bitmask. NOTE! These bits are also
 * encoded in fs/proc/array.c: get_task_state().
 *
 * We have two separate sets of flags: task->__state
 * is about runnability, while task->exit_state are
 * about the task exiting. Confusing, but this way
 * modifying one set can't modify the other one by
 * mistake.
 */

/* Used in tsk->__state: */
#define TASK_RUNNING			0x00000000
#define TASK_INTERRUPTIBLE		0x00000001
#define TASK_UNINTERRUPTIBLE		0x00000002
#define __TASK_STOPPED			0x00000004
#define __TASK_TRACED			0x00000008
/* Used in tsk->exit_state: */
#define EXIT_DEAD			0x00000010
#define EXIT_ZOMBIE			0x00000020
#define EXIT_TRACE			(EXIT_ZOMBIE | EXIT_DEAD)
/* Used in tsk->__state again: */
#define TASK_PARKED			0x00000040
#define TASK_DEAD			0x00000080
#define TASK_WAKEKILL			0x00000100
#define TASK_WAKING			0x00000200
#define TASK_NOLOAD			0x00000400
#define TASK_NEW			0x00000800
#define TASK_RTLOCK_WAIT		0x00001000
#define TASK_FREEZABLE			0x00002000
#define __TASK_FREEZABLE_UNSAFE	       (0x00004000 * IS_ENABLED(CONFIG_LOCKDEP))
#define TASK_FROZEN			0x00008000
#define TASK_STATE_MAX			0x00010000

//see more flag in linux kernel source
... ...

Process Resource Limits

stored in the current->signal->rlim field

struct rlimit
struct rlimit {
unsigned long rlim_cur;//currently max value of the corresponding resource
unsigned long rlim_max;//max value of the rlim_cur
};

Whenever a user logs into the system, the kernel creates a process owned by the superuser, which can invoke setrlimit( ) to decrease the rlim_max and rlim_
cur fields for a resource. The same process later executes a login shell and becomes owned by the user. Each new process created by the user inherits the content of the rlim array from its parent, and therefore the user cannot override the limits enforced by the administrator.

Process Switch

Hardware Context

The set of data that must be loaded into the registers before the process resumes its execution on the CPU is called the hardware context.
The contents of all registers used by a process in User Mode have already been saved on the Kernel Mode stack before performing process switching.

Task State Segment

was ued by hardware context switches to store hardware context. the linux set up this segment for two main reaseon:

  1. When an 80×86 CPU switches from User Mode to Kernel Mode, it fetches the address of the Kernel Mode stack from the TSS.
  2. When a User Mode process attempts to access an I/O port by means of an in or out instruction, the CPU may need to access an I/O Permission Bitmap stored in the TSS to verify whether the process is allowed to address the port.

The thread field

each process descriptor includes a field called thread of type thread_struct, in which the kernel saves the hardware context.

Performing the Process Switch

The switch_to macro

Switching the Kernel Mode stack and the hardware context.


#define switch_to(prev, next, last)					\
do {									\
	((last) = __switch_to((prev), (next)));			\ //see below "the function __switch_to()"
} while (0)

The last parameter of the switch_to macro is an output parameter that specifies a memory location in which the macro writes the descriptor address of process C(who switced to A), thus last always reference the process who was the last running process before rerrunning prev.

The __switch_to() function


The _ _switch_to( ) function does the bulk of the process switch started by the switch_to( ) macro.

_ _switch_to(struct task_struct *prev_p,
struct task_struct *next_p)
_ _attribute_ _(regparm(3));

the detail excuting of the switch_to() see Page 110

If next_p was never suspended before because it is being executed for the first time, the function finds the starting address of the ret_from_fork( ) function

Saving and Loading the FPU,MMX,and XMM Registers

coprocessor:

A coprocessor is a computer processor used to supplement the functions of the primary processor (the CPU). Operations performed by the coprocessor may be floating-point arithmetic, graphics, signal processing, string processing, cryptography or I/O interfacing with peripheral devices.

FPU: the arithmetic floating-point unit

ESCAPE instruction(use FPU,MMX and XMM registers)

floating-point arithmetic functions are performed with ESCAPE instructions.These instructions act on the set of floating-point registers included in the CPU.
so, if the process use them, the contents of them should be saved when process switched.

MMX instructions

a set of assembly language instructions are used to speed up the excuting of multimedia applications, introduced in later Pentium models, using the float-point registers of the FPU.

there are some support provided by the hardware to maintain those registers
TS (Task_Switching) flag( just like the lazy mechanism of COW, iomap)

When a context switch occurs from A to B, the kernel sets the TS flag and saves the floating-point registers into the TSS of process A. If the new process B does not use the mathematical coprocessor, the kernel won’t need to restore the contents of the floating-point registers. But as soon as B tries to execute an ESCAPE or MMX instruction, the CPU raises a “Device not available” exception, and the corresponding handler loads the floating-point registers with the values saved in the TSS of process B.

Saving and Loading the FPU registers

save:
save contents of register to thread.i387 and set flag TS_USEDFPU and The TS flag of cr0.
load:
raise a "Device not available" exception because the TS flag is set, and the handler reload the contents of FPU register from thread.i387

Using the FPU,MMX,and SSE/SSE2 units in Kernel Mode

flow

  1. Before using the coprocessor, the kernel must invoke kernel_fpu_begin(), which essentially calls save_init_fpu() to save the contents of the registers if the User Mode process used the FPU (TS_USEDFPU flag), and then resets the TS flag of the cr0 register.
  2. After using the coprocessor, the kernel must invoke kernel_fpu_end(), which sets the TS flag of the cr0 register.
  3. Later, when the User Mode process executes a coprocessor instruction, the math_state_restore() function will restore the contents of the registers, just as in process switch handling.

Creating Process

main call chain: clone ---> do_fork(kernel_clone in newst kernl(v6.7)) ---> copy_process()
the return address of the forked thread(the thread.esp filed stored in the thread kernel stack) point to ret_from_fork()

clone \ sys_clone

int clone(int (*fn)(void *), void *child_stack,
         int flags, void *arg, ...
       /* pid_t *ptid, void *newtls, pid_t *ctid */ );

flags: The low byte specifies the signal number to be sent to the parent process when the child terminates; the SIGCHLD signal is generally
selected. The remaining three bytes encode a group of clone flags.

The sys_clone() service routine that implements the clone() system call does not have the fn and arg parameters. In fact, the wrapper function saves the pointer fn into the child’s stack position corresponding to the return address of the wrapper function itself;

fork

The traditional fork( ) system call is implemented by Linux as a clone( ) system call whose flags parameter specifies both a SIGCHLD signal and all the clone flags cleared, and whose child_stack parameter is the current parent stack pointer. Therefore, the parent and child temporarily share the same User Mode stack. But thanks to the Copy On Write mechanism, they usually get separate copies of the User Mode stack as soon as one tries to change the stack.

/* Fork a new task - this creates a new program thread.
 * This is called indirectly via a small wrapper
 */
asmlinkage int sys_fork(struct pt_regs *regs)
{
	return do_fork(SIGCHLD, regs->ARM_sp, regs, 0, NULL, NULL);//see blow "the do_fork() function"
}

vfork

The vfork( ) system call creates a process that shares the memory address space of its parent. To prevent the parent from overwriting data needed by the child, the parent’s execution is blocked until the child exits or executes a new program.

asmlinkage int sys_vfork(struct pt_regs *regs)
{
	return do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, regs->ARM_sp, regs, 0, NULL, NULL);
}

the do_fork() function

long do_fork(unsigned long clone_flags, unsigned long stack_start, struct pt_regs *regs, unsigned long stack_size, int __user *parent_tidptr, 
int __user *child_tidptr);

//in latest version of kernel(v6.7), the do_fork has changed to kernel_clone defined in /kernel/fork.c
pid_t kernel_clone(struct kernel_clone_args *args)

Kernel Threads

kernel_thread()
create kernel threads through do_fork(flags|CLONE_VM|CLONE_UNTRACED, 0, pregs, 0, NULL, NULL)
CLONE_VM: don't need access user address
CLONE_UNTRACED: don't allowed ptrace kernel processes

Destroying Processes

Process Termination

syscall: _exit() exit_group() --> do_exit()

The do_group_exit() function

The do_group_exit() function kills all processes belonging to the thread group of current. It receives as a parameter the process termination code, which is either a value specified in the exit_group() system call (normal termination) or an error code supplied by the kernel (abnormal termination).

send SIGKILL to all task in the current thread group, and call do_exit().

The do_exit() function
the way the thread exit:
in summary, if the thread is traced, it send signal to its parent and set state to EXIT_ZOMBIE, treated like a process(thread group leader). And if the process is a thread and not traced, it released remaining resources and set state to EXIT_DEAD.

Process Removal

release_task() is used to reclaime the remaining resources of the zombie child-process, which could be invoked in two ways:one is cause by parent who cares about the exiting code of ch-pro and calls wait(), the other is the parent don't care about the exit code and do_exit would call release_task() in which case resources are reclaimed by the scheduler(e.g. untraced threads).

标签:kernel,Processes,TASK,thread,do,process,linux,define
From: https://www.cnblogs.com/syp2023/p/17959398

相关文章

  • linux系统相关信息查询
    查看主机内核版本:cat/etc/redhat-release查看系统发行版本:uname-r查看主机cpu:lscpu查看主机内存:free-m......
  • 5款好用的Linux系统推荐!
    Linux是一款开源的操作系统,这意味着它的源代码可以被任何人查看、修改和使用,而且具有高度的可定制性和灵活性,可以根据不同的需求进行定制和优化。本文为大家推荐5款好用的Linux系统,快来看看吧。好用的Linux系统:Debian、LinuxMint、Manjaro、Ubuntu、Solus。1、LinuxM......
  • 嵌入式linux:开发板挂载nfs
    一、搭建nfs环境主机系统:ubuntu主机共享目录:/home/wm/mnt开发板共享目录:/mnt1、开发板与主机互ping测试,确保网络互通主机:通过ifconfig获取ip地址为192.168.3.83开发板:通过ifconfig获取ip地址为192.168.3.185主机ping开发板成功#以下命令在主机上运行ping192.......
  • 极速初始化配置 Manjaro Linux with WSL2 的开发环境
    配置ManjaroLinux用户组安装完成后,我们需要配置ManjaroLinux系统才能开始使用。在命令行执行passwd以设置root的密码。根据最小权限原则(opensinanewtab),日常使用中,不能所有命令中都使用root权限进行操作,因此我们需要创建一个新的用户以供日常使用。参照ArchW......
  • 重设红帽Linux root密码:简单实用指南
    在使用红帽Linux系统时,我们可能会遇到忘记root密码的情况红帽linux系统root密码,这时候该怎么办呢?本文将详细介绍如何重置红帽Linux系统的root密码。1.进入单用户模式首先,我们需要进入单用户模式。开机时红帽linux系统root密码,在grub菜单界面选择需要启动的内核版本,然后按下“e”键......
  • Linux查看文件的命令有哪些?
    在Linux系统中,查看文件的命令常用的有五个,分别是:find命令、locate命令、whereis命令、which命令及type命令,接下来通过这篇文章为大家详细介绍一下这五个命令。Linux查看文件的五种命令1、findfind是最常见和最强大的查找命令,你可以用它找到任何你想找的文件。f......
  • linux系统查看ssh登陆记录的方法
     1、下面的提供的命令可以实时检测/var/log/auth.log文件中的SSH登录记录,并只显示包含“ssh”的行sudotail-f/var/log/auth.log|grepssh2、如果你是CentOS、RedHat或Fedora等基于RHEL的发行版,则可以使用以下命令:下面的命令与上面的命令一样,都可以实时的检测/v......
  • Linux修复kernel时钟异常的问题
    发现与ntp服务器同步后,间隔10秒再同步,系统时间与ntp服务器已经相差0.6秒,因此怀疑系统时钟被修改了使用adjtimex调整sudoaptinstalladjtimex对比当前系统时钟sudoadjtimex--compareWARNING:CMOStimeis53.38minbehindsystemclock---current-----suggeste......
  • linux进程间通讯
    进程间通讯->共享内存structshmid_ds{ structipc_permshm_perm; /*operationperms*/ intshm_segsz; /*sizeofsegment(bytes)*/ __kernel_time_tshm_atime; /*lastattachtime*/ __kernel_time_t......
  • Linux7的启动原理和服务控制
    1、CentOS6的启动过程和原理BIOS/UEFI阶段--->GRUB阶段--->内核引导阶段--->Init阶段--->运行级别切换阶段--->服务启动阶段--->图形界面或命令行登录阶段--->用户登录BIOS/UEFI阶段:在计算机开机时,BIOS(或UEFI)会执行自检和硬件初始化,然后从启动设备(通常是硬盘)的MBR(主引导记......