Overview of ELF Loading and Execution on Linux

Date: 2023-06-19 10:03:58
Tags: load, ELF, elf, mm, Linux, interpreter, retval, out


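ELF is the most widely used executable format on Linux. To understand how the Linux kernel maps an ELF file precisely into its designated memory regions, I spent last weekend reading through the sys_execve part of the kernel. A summary follows: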

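1. The ELF format
ELF specifies where the text, data, bss, and other segments are placed in the process's virtual address space, and records which shared libraries the process needs (by name; the dynamic linker resolves their actual locations).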


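2. Rough execution flow of sys_execve
  1) Open the ELF binary and read in the ELF header.
  2) Discard the mm-related state inherited from the parent process.
  3) Following the ELF program headers, map the interpreter, the text segment, the data segment, and so on into memory (which also shows that Linux does not support compressed binaries),
     set up the stack, and update the mm fields.
  4) "Forge" this process's kernel stack so the process is ready to return to user mode. The saved ip on the kernel stack points at the interpreter's entry point.
  5) The sys_execve system call returns to user mode and the interpreter starts running (the interpreter is usually ld-linux.so.2 or similar).

 What does the interpreter do once it is running in user mode?

  6) The interpreter loads the shared libraries the process needs and performs all the relocation and mapping work.
  7) The interpreter jumps to the program's entry point (usually _start), which eventually calls main.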

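A few questions deserve a closer look:
  1> What does the kernel stack look like when sys_execve is called? How do user-mode arguments get into the kernel?
    Only once this is understood is it clear how the kernel can return to the interpreter's entry point.
     A: See the material on Linux system calls. Linux handles system call arguments in one uniform way, which is well worth learning from; I will write up the design separately.
  2> Where do the interpreter's arguments come from? How does the interpreter get to main?
     A: Seen as an ordinary C function call, this is puzzling. Seen from the assembly level, where the call stack can be adjusted and "forged" dynamically and deliberately, switching between functions and passing arguments becomes straightforward.
     The kernel builds the argument stack the interpreter needs, and the interpreter builds the argument stack main needs. The user stack is set up in setup_arg_pages.
  3> How does the kernel guarantee that each segment is mapped at the expected address?
     By passing the MAP_FIXED flag to mmap.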

 

 

 

 

Appended notes:

 

/* Replace the current task's mm with the mm passed in. Called by
 * int flush_old_exec(struct linux_binprm * bprm).
 * The reference to the old mm is dropped.
 */
static int exec_mmap(struct mm_struct *mm)
{
	struct task_struct *tsk;
	struct mm_struct * old_mm, *active_mm;
	/* Notify parent that we're no longer interested in the old VM */
	tsk = current;
	old_mm = current->mm;
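	/* Detach the task from its old mm (this notifies waiters such as a vfork parent; the mm itself is released later by mmput()) */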
	mm_release(tsk, old_mm);	
	if (old_mm) {  /* cannot continue if the old mm is in the middle of a core dump */
		/*
		 * Make sure that if there is a core dump in progress
		 * for the old mm, we get out and die instead of going
		 * through with the exec.  We must hold mmap_sem around
		 * checking core_state and changing tsk->mm.
		 */
		down_read(&old_mm->mmap_sem);
		if (unlikely(old_mm->core_state)) {
			up_read(&old_mm->mmap_sem);
			return -EINTR;
		}
	}
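	/* Done with the old mm; install the new one */
	task_lock(tsk);
	/* active_mm matters for kernel threads, which have no mm of their own and borrow one */
	active_mm = tsk->active_mm;
	/* Install the new mm */
	tsk->mm = mm;	
	tsk->active_mm = mm;
	/* From here on the CPU runs on the new address space */
	activate_mm(active_mm, mm);	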
	task_unlock(tsk);
	/* Sets up the mmap layout in mm (it fills in a few function pointers used to pick mapping addresses) */
	arch_pick_mmap_layout(mm);
	if (old_mm) {
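		/* If old_mm still exists at this point,
		 * it is because active_mm is holding on to it
		 */
		up_read(&old_mm->mmap_sem);
		BUG_ON(active_mm != old_mm);
		/* If other tasks still use the old mm, hand its ownership to one of them */
		mm_update_next_owner(old_mm);
		/* Drop our own reference to the old mm */
		mmput(old_mm);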
		return 0;
	}
	/* No old mm of our own (the kernel-thread case): drop the reference on the borrowed active_mm */
	mmdrop(active_mm);
	return 0;
}
/* Map the ELF file into the virtual memory of the current process. */
/* Background
Complete Reference on ELF format:
	http://www.muppetlabs.com/~breadbox/software/ELF.txt
1. To follow the code below, it helps to know the layout of the ELF header:
typedef struct elf32_hdr{
  unsigned char	e_ident[EI_NIDENT];	/* Magic Number */
  Elf32_Half	e_type;	/* ET_EXEC or ET_DYN: executable image or shared object */
  Elf32_Half	e_machine; /* target CPU type */
  Elf32_Word	e_version; /* object file version */
  Elf32_Addr	e_entry;  /* Entry point, usually the start of _start() */
  Elf32_Off	e_phoff;  /* file offset of the Program Header array */
  Elf32_Off	e_shoff;  /* file offset of the Section Header array,
			     which delimits the "code section", "data section", etc. */
  Elf32_Word	e_flags;
  Elf32_Half	e_ehsize;  /* size of the ELF header itself */
  Elf32_Half	e_phentsize;  /* size of one Program Header entry */
  Elf32_Half	e_phnum;  /* number of Program Header entries */
  Elf32_Half	e_shentsize;  /* size of one Section Header entry */
  Elf32_Half	e_shnum;  /* number of Section Header entries */
  Elf32_Half	e_shstrndx;
} Elf32_Ehdr;
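2. What does each program header contain?
typedef struct elf32_phdr{
  Elf32_Word	p_type;	/* segment type; in particular, PT_LOAD marks a loadable segment */
  Elf32_Off	p_offset;  /* offset of the segment from byte 0 of the file */
  Elf32_Addr	p_vaddr;  /* starting virtual address of the segment once loaded */
  Elf32_Addr	p_paddr;  /* ignored on operating systems that use paging */
  Elf32_Word	p_filesz;  /* size in bytes of the segment in the file. A segment may occupy
			    memory without being present in the file, in which case this is 0 */
  Elf32_Word	p_memsz;  /* size in bytes of the segment in memory. A segment may exist only
			    in the file and not be loaded, in which case this is 0 */
  Elf32_Word	p_flags;
  Elf32_Word	p_align;  /* alignment */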
} Elf32_Phdr;
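3. What does each section header contain?
   The section table describes the ELF file from the linker's point of view. Seen this way, the file is divided into many sections,
   each holding data for a different purpose; the same data may also be covered by the program headers described above.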
typedef struct elf64_shdr {
  Elf64_Word sh_name;		/* Section name, index in string tbl */
  Elf64_Word sh_type;		/* Type of section */
  Elf64_Xword sh_flags;		/* Miscellaneous section attributes */
  Elf64_Addr sh_addr;		/* Section virtual addr at execution */
  Elf64_Off sh_offset;		/* Section file offset */
  Elf64_Xword sh_size;		/* Size of section in bytes */
  Elf64_Word sh_link;		/* Index of another section */
  Elf64_Word sh_info;		/* Additional section information */
  Elf64_Xword sh_addralign;	/* Section alignment */
  Elf64_Xword sh_entsize;	/* Entry size if section holds table */
} Elf64_Shdr;
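4. How do program headers differ from section headers?
The linker and the loader view an ELF file in completely different ways.
The linker sees a collection of logical sections described by the section header table (it ignores the program header table).
The loader sees a collection of segments described by the program header table (it ignores the section header table).
Illustration: http://img.ddvip.com/2009_09_10/1252583354_ddvip_9407.jpeg
Segments divide the image from the loading point of view; sections divide it from the linking point of view.
Taking Wine as an example,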
Section to Segment mapping:
  Segment Sections...
   00   
   01     .interp
   02     .interp .note.ABI-tag .hash .dynsym .dynstr .gnu.version .gnu.version_r
          .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame
   03     .data .dynamic .ctors .dtors .jcr .got .bss
   04     .dynamic
   05     .note.ABI-tag


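5. How is each segment guaranteed to be mapped at the expected virtual address?
The flags argument of mmap accepts MAP_FIXED. With this flag set, the mapping is placed at exactly the requested address; if that is not possible, mmap returns an error.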

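6. Taken as a whole, load_elf_binary does the following:
  1) reads the data of each ELF segment and maps it into memory
  2) loads the interpreter and maps it into memory (including choosing its load address)
  3) sets ip, sp, etc. in the regs structure, so that the process is ready to start
Open question: how does the interpreter hand control to main()?
 My own analysis:
Once load_elf_binary has the entry address of ld-linux.so.2 (call it eax), the effect is as if it executed
	push eax
	ret
and control enters ld-linux.so.2, which then loads the required libraries. (Strictly speaking, no such instructions run in the kernel: start_thread() writes the entry address and the user stack pointer into regs, and the jump happens when sys_execve returns to user mode.)
Q1. How does it know which libraries to load? Where do its arguments come from?
Q2. After loading is complete, how does it get to main and start the main program?
A1. Through the stack! The two instructions above amount to a jump, so no new call frame is set up:
    the interpreter starts on the user stack that load_elf_binary prepared, and finds its arguments there.
A2. By unwinding the interpreter's own stack frames and then jumping to the program's entry point, which leads to main.

The code below is taken from the GNU ELF interpreter and shows how ld.so completes the hand-off.
Code in  http://ftp.gnu.org > 
         gnu > glibc > glibc-2.5.tar.bz2 > glibc-2.5 > sysdeps > i386 > dl-machine.h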
/* Initial entry point code for the dynamic linker.
   The C function `_dl_start' is the real entry point;
   its return value is the user program's entry point.  */
#define RTLD_START asm ("\n\
        .text\n\
        .align 16\n\
0:      movl (%esp), %ebx\n\
        ret\n\
        .align 16\n\
.globl _start\n\
.globl _dl_start_user\n\
_start:\n\
        # Note that _dl_start gets the parameter in %eax.\n\
        movl %esp, %eax\n\
        call _dl_start\n\
_dl_start_user:\n\
        # Save the user entry point address in %edi.\n\
        movl %eax, %edi\n\
        # Point %ebx at the GOT.\n\
        call 0b\n\
        addl $_GLOBAL_OFFSET_TABLE_, %ebx\n\
        # See if we were run as a command with the executable file\n\
        # name as an extra leading argument.\n\
        movl _dl_skip_args@GOTOFF(%ebx), %eax\n\
        # Pop the original argument count.\n\
        popl %edx\n\
        # Adjust the stack pointer to skip _dl_skip_args words.\n\
        leal (%esp,%eax,4), %esp\n\
        # Subtract _dl_skip_args from argc.\n\
        subl %eax, %edx\n\
        # Push argc back on the stack.\n\
        push %edx\n\
        # The special initializer gets called with the stack just\n\
        # as the application's entry point will see it; it can\n\
        # switch stacks if it moves these contents over.\n\
" RTLD_START_SPECIAL_INIT "\n\
        # Load the parameters again.\n\
        # (eax, edx, ecx, *--esp) = (_dl_loaded, argc, argv, envp)\n\
        movl _rtld_local@GOTOFF(%ebx), %eax\n\
        leal 8(%esp,%edx,4), %esi\n\
        leal 4(%esp), %ecx\n\
        movl %esp, %ebp\n\
        # Make sure _dl_init is run with 16 byte aligned stack.\n\
        andl $-16, %esp\n\
        pushl %eax\n\
        pushl %eax\n\
        pushl %ebp\n\
        pushl %esi\n\
        # Clear %ebp, so that even constructors have terminated backchain.\n\
        xorl %ebp, %ebp\n\
        # Call the function to run the initializers.\n\
        call _dl_init_internal@PLT\n\
        # Pass our finalizer function to the user in %edx, as per ELF ABI.\n\
        leal _dl_fini@GOTOFF(%ebx), %edx\n\
        # Restore %esp _start expects.\n\
        movl (%esp), %esp\n\
        # Jump to the user's entry point.\n\
        jmp *%edi\n\
        .previous\n\
");
  /* Call the OS-dependent function to set up life so we can do things like
     file access.  It will call `dl_main' (below) to do all the real work
     of the dynamic linker, and then unwind our frame and run the user
     entry point on the same stack we entered on.  */
  Code in rtld.c ....

*/

static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
{
	struct file *interpreter = NULL; /* to shut gcc up */
 	unsigned long load_addr = 0, load_bias = 0;
	int load_addr_set = 0;
	char * elf_interpreter = NULL;
	unsigned long error;
	struct elf_phdr *elf_ppnt, *elf_phdata;
	unsigned long elf_bss, elf_brk;
	int elf_exec_fileno;
	int retval, i;
	unsigned int size;
	unsigned long elf_entry;
	unsigned long interp_load_addr = 0;
	unsigned long start_code, end_code, start_data, end_data;
	unsigned long reloc_func_desc = 0;
	int executable_stack = EXSTACK_DEFAULT;
	unsigned long def_flags = 0;
	struct {
		struct elfhdr elf_ex;
		struct elfhdr interp_elf_ex;
	} *loc;
	loc = kmalloc(sizeof(*loc), GFP_KERNEL);
	if (!loc) {
		retval = -ENOMEM;
		goto out_ret;
	}
	
	/* Get the exec-header */
	loc->elf_ex = *((struct elfhdr *)bprm->buf);
	retval = -ENOEXEC;
	/* First of all, some simple consistency checks */
	if (memcmp(loc->elf_ex.e_ident, ELFMAG, SELFMAG) != 0)
		goto out;
	if (loc->elf_ex.e_type != ET_EXEC && loc->elf_ex.e_type != ET_DYN)
		goto out;
	if (!elf_check_arch(&loc->elf_ex))
		goto out;
	/* The filesystem holding the ELF file must support mmap */
	if (!bprm->file->f_op||!bprm->file->f_op->mmap)
		goto out;
	/* Now read in all of the header information */
	if (loc->elf_ex.e_phentsize != sizeof(struct elf_phdr))
		goto out;
	if (loc->elf_ex.e_phnum < 1 ||
	 	loc->elf_ex.e_phnum > 65536U / sizeof(struct elf_phdr))
		goto out;
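	/* Note: the ELF loader (unlike the linker) uses only the Program Headers.
	 * Allocate space for the Program Header table below.
	 * The program headers say how each segment is to be loaded into memory.
	 */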
	size = loc->elf_ex.e_phnum * sizeof(struct elf_phdr);
	retval = -ENOMEM;
	elf_phdata = kmalloc(size, GFP_KERNEL);
	if (!elf_phdata)
		goto out;
	/* Read the Program Header table of the ELF file into the buffer */
	retval = kernel_read(bprm->file, loc->elf_ex.e_phoff,
			     (char *)elf_phdata, size);
	if (retval != size) {
		if (retval >= 0)
			retval = -EIO;
		goto out_free_ph;
	}
	/* The operations on the ELF file below apparently need an fd (?) */
	retval = get_unused_fd();
	if (retval < 0)
		goto out_free_ph;
	get_file(bprm->file);
	fd_install(elf_exec_fileno = retval, bprm->file);
	elf_ppnt = elf_phdata;
	elf_bss = 0;
	elf_brk = 0;
	start_code = ~0UL;
	end_code = 0;
	start_data = 0;
	end_data = 0;
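	/* The code below walks the Program Header array three times:
	 *  pass 1 handles the PT_INTERP segment
	 *  pass 2 handles the PT_GNU_STACK segment
	 *  pass 3 handles the PT_LOAD segments
	 *  NOTE: PT_DYNAMIC is not handled here; it is left for the interpreter to process and relocate.
	 *  Each pass is annotated below.
	 */

	/*
	 *  Pass 1: the PT_INTERP segment
	 */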
	
	for (i = 0; i < loc->elf_ex.e_phnum; i++) {
		if (elf_ppnt->p_type == PT_INTERP) {
			/* This is the program interpreter used for
			 * shared libraries - for now assume that this
			 * is an a.out format binary
			 */
			retval = -ENOEXEC;
			if (elf_ppnt->p_filesz > PATH_MAX || 
			    elf_ppnt->p_filesz < 2)
				goto out_free_file;
			retval = -ENOMEM;
			elf_interpreter = kmalloc(elf_ppnt->p_filesz,
						  GFP_KERNEL);
			if (!elf_interpreter)
				goto out_free_file;
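			/* The PT_INTERP segment holds the path of the dynamic linker.
			 * The ELF specification requires this entry to come before
			 * any loadable segment, and the kernel handles it first.
			 * Its content looks like:
			 * /lib64/ld-linux-x86-64.so.2
			 */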
			retval = kernel_read(bprm->file, elf_ppnt->p_offset,
					     elf_interpreter,
					     elf_ppnt->p_filesz);
			if (retval != elf_ppnt->p_filesz) {
				if (retval >= 0)
					retval = -EIO;
				goto out_free_interp;
			}
			/* make sure path is NULL terminated */
			retval = -ENOEXEC;
			if (elf_interpreter[elf_ppnt->p_filesz - 1] != '\0')
				goto out_free_interp;
			/*
			 * The early SET_PERSONALITY here is so that the lookup
			 * for the interpreter happens in the namespace of the 
			 * to-be-execed image.  SET_PERSONALITY can select an
			 * alternate root.
			 *
			 * However, SET_PERSONALITY is NOT allowed to switch
			 * this task into the new images's memory mapping
			 * policy - that is, TASK_SIZE must still evaluate to
			 * that which is appropriate to the execing application.
			 * This is because exit_mmap() needs to have TASK_SIZE
			 * evaluate to the size of the old image.
			 *
			 * So if (say) a 64-bit application is execing a 32-bit
			 * application it is the architecture's responsibility
			 * to defer changing the value of TASK_SIZE until the
			 * switch really is going to happen - do this in
			 * flush_thread().	- akpm
			 */
			SET_PERSONALITY(loc->elf_ex);
			/* Open the interpreter file; returns a struct file pointer */
			interpreter = open_exec(elf_interpreter);
			retval = PTR_ERR(interpreter);
			if (IS_ERR(interpreter))
				goto out_free_interp;
			/*
			 * If the binary is not readable then enforce
			 * mm->dumpable = 0 regardless of the interpreter's
			 * permissions.
			 */
			if (file_permission(interpreter, MAY_READ) < 0)
				bprm->interp_flags |= BINPRM_FLAGS_ENFORCE_NONDUMP;
			/* Read the start of the interpreter file, which contains its ELF header */
			retval = kernel_read(interpreter, 0, bprm->buf,
					     BINPRM_BUF_SIZE);
			if (retval != BINPRM_BUF_SIZE) {
				if (retval >= 0)
					retval = -EIO;
				goto out_free_dentry;
			}
			/* Get the exec headers */
			loc->interp_elf_ex = *((struct elfhdr *)bprm->buf);
			break;
		}
		elf_ppnt++;
	}
	/*
	 *  Pass 2: the PT_GNU_STACK segment
	 */	
	elf_ppnt = elf_phdata;
	for (i = 0; i < loc->elf_ex.e_phnum; i++, elf_ppnt++)
		if (elf_ppnt->p_type == PT_GNU_STACK) {
			/* As the code shows, this segment only carries a flag;
			 * it has no segment data of its own
			 */
			if (elf_ppnt->p_flags & PF_X)
				executable_stack = EXSTACK_ENABLE_X;
			else
				executable_stack = EXSTACK_DISABLE_X;
			break;
		}
	/* Check the interpreter's ELF magic and that its target architecture is valid */
	/* Some simple consistency checks for the interpreter */
	if (elf_interpreter) {
		retval = -ELIBBAD;
		/* Not an ELF interpreter */
		if (memcmp(loc->interp_elf_ex.e_ident, ELFMAG, SELFMAG) != 0)
			goto out_free_dentry;
		/* Verify the interpreter has a valid arch */
		if (!elf_check_arch(&loc->interp_elf_ex))
			goto out_free_dentry;
	} else {
		/* Executables without an interpreter also need a personality  */
		SET_PERSONALITY(loc->elf_ex);
	}
	/* Flush all traces of the currently running executable */
	retval = flush_old_exec(bprm);
	if (retval)
		goto out_free_dentry;
	/* OK, This is the point of no return */
	current->flags &= ~PF_FORKNOEXEC;
	current->mm->def_flags = def_flags;
	/* Do this immediately, since STACK_TOP as used in setup_arg_pages
	   may depend on the personality.  */
	SET_PERSONALITY(loc->elf_ex);
	if (elf_read_implies_exec(loc->elf_ex, executable_stack))
		current->personality |= READ_IMPLIES_EXEC;
	if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space)
		current->flags |= PF_RANDOMIZE;
	arch_pick_mmap_layout(current->mm);
	/* Do this so that we can load the interpreter, if need be.  We will
	   change some of these later */
	current->mm->free_area_cache = current->mm->mmap_base;
	current->mm->cached_hole_size = 0;
	retval = setup_arg_pages(bprm, randomize_stack_top(STACK_TOP),
				 executable_stack);
	if (retval < 0) {
		send_sig(SIGKILL, current, 0);
		goto out_free_dentry;
	}
	
	current->mm->start_stack = bprm->p;

	/*
	 *  Pass 3: the PT_LOAD segments
	 */	
	 
	/* Now we do a little grungy work by mmaping the ELF image into
	   the correct location in memory. */
	for(i = 0, elf_ppnt = elf_phdata;
	    i < loc->elf_ex.e_phnum; i++, elf_ppnt++) {
		int elf_prot = 0, elf_flags;
		unsigned long k, vaddr;
		if (elf_ppnt->p_type != PT_LOAD)
			continue;
		if (unlikely (elf_brk > elf_bss)) {
			unsigned long nbyte;
	            
			/* There was a PT_LOAD segment with p_memsz > p_filesz
			   before this one. Map anonymous pages, if needed,
			   and clear the area.  */
			retval = set_brk (elf_bss + load_bias,
					  elf_brk + load_bias);
			if (retval) {
				send_sig(SIGKILL, current, 0);
				goto out_free_dentry;
			}
			nbyte = ELF_PAGEOFFSET(elf_bss);
			if (nbyte) {
				nbyte = ELF_MIN_ALIGN - nbyte;
				if (nbyte > elf_brk - elf_bss)
					nbyte = elf_brk - elf_bss;
				if (clear_user((void __user *)elf_bss +
							load_bias, nbyte)) {
					/*
					 * This bss-zeroing can fail if the ELF
					 * file specifies odd protections. So
					 * we don't check the return value
					 */
				}
			}
		}
		if (elf_ppnt->p_flags & PF_R)
			elf_prot |= PROT_READ;
		if (elf_ppnt->p_flags & PF_W)
			elf_prot |= PROT_WRITE;
		if (elf_ppnt->p_flags & PF_X)
			elf_prot |= PROT_EXEC;
		elf_flags = MAP_PRIVATE | MAP_DENYWRITE | MAP_EXECUTABLE;
		vaddr = elf_ppnt->p_vaddr;
		if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) {
			/* Not position-independent (or the load address has already
			 * been chosen): must map at the exact address, hence MAP_FIXED
			 */
			elf_flags |= MAP_FIXED;
		} else if (loc->elf_ex.e_type == ET_DYN) {
			/* Try and get dynamic programs out of the way of the
			 * default mmap base, as well as whatever program they
			 * might try to exec.  This is because the brk will
			 * follow the loader, and is not movable.  */
#ifdef CONFIG_X86
			load_bias = 0;
#else
			load_bias = ELF_PAGESTART(ELF_ET_DYN_BASE - vaddr);
#endif
		}
		
		/* The key step:
		 * map this segment's file contents at load_bias + vaddr
		 */
		error = elf_map(bprm->file, load_bias + vaddr, elf_ppnt,
				elf_prot, elf_flags, 0);
		if (BAD_ADDR(error)) {
			send_sig(SIGKILL, current, 0);
			retval = IS_ERR((void *)error) ?
				PTR_ERR((void*)error) : -EINVAL;
			goto out_free_dentry;
		}
		/* Runs only for the first PT_LOAD segment */
		if (!load_addr_set) {
			load_addr_set = 1;
			load_addr = (elf_ppnt->p_vaddr - elf_ppnt->p_offset);
			if (loc->elf_ex.e_type == ET_DYN) {
				load_bias += error -
				             ELF_PAGESTART(load_bias + vaddr);
				load_addr += load_bias;
				reloc_func_desc = load_bias;
			}
		}
		k = elf_ppnt->p_vaddr;
		if (k < start_code)
			start_code = k;
		if (start_data < k)
			start_data = k;
		/*
		 * Check to see if the section's size will overflow the
		 * allowed task size. Note that p_filesz must always be
		 * <= p_memsz so it is only necessary to check p_memsz.
		 */
		if (BAD_ADDR(k) || elf_ppnt->p_filesz > elf_ppnt->p_memsz ||
		    elf_ppnt->p_memsz > TASK_SIZE ||
		    TASK_SIZE - elf_ppnt->p_memsz < k) {
			/* set_brk can never work. Avoid overflows. */
			send_sig(SIGKILL, current, 0);
			retval = -EINVAL;
			goto out_free_dentry;
		}
		k = elf_ppnt->p_vaddr + elf_ppnt->p_filesz;
		if (k > elf_bss)
			elf_bss = k;
		if ((elf_ppnt->p_flags & PF_X) && end_code < k)
			end_code = k;
		if (end_data < k)
			end_data = k;
		k = elf_ppnt->p_vaddr + elf_ppnt->p_memsz;
		if (k > elf_brk)
			elf_brk = k;
	} /* end of for PT_LOAD */
	/* The net result of all the PT_LOAD work is the values below,
	 * plus the memory segments that are now mapped
	 */
	loc->elf_ex.e_entry += load_bias;
	elf_bss += load_bias;
	elf_brk += load_bias;
	start_code += load_bias;
	end_code += load_bias;
	start_data += load_bias;
	end_data += load_bias;
	/* Calling set_brk effectively mmaps the pages that we need
	 * for the bss and break sections.  We must do this before
	 * mapping in the interpreter, to make sure it doesn't wind
	 * up getting placed where the bss needs to go.
	 */
	retval = set_brk(elf_bss, elf_brk);
	if (retval) {
		send_sig(SIGKILL, current, 0);
		goto out_free_dentry;
	}
	if (likely(elf_bss != elf_brk) && unlikely(padzero(elf_bss))) {
		send_sig(SIGSEGV, current, 0);
		retval = -EFAULT; /* Nobody gets to see this, but.. */
		goto out_free_dentry;
	}
	/* Map the interpreter into memory and record its entry address */
	if (elf_interpreter) {
		unsigned long uninitialized_var(interp_map_addr);
		elf_entry = load_elf_interp(&loc->interp_elf_ex,
					    interpreter,
					    &interp_map_addr,
					    load_bias);
		if (!IS_ERR((void *)elf_entry)) {
			/*
			 * load_elf_interp() returns relocation
			 * adjustment
			 */
			interp_load_addr = elf_entry;
			elf_entry += loc->interp_elf_ex.e_entry;
		}
		if (BAD_ADDR(elf_entry)) {
			force_sig(SIGSEGV, current);
			retval = IS_ERR((void *)elf_entry) ?
					(int)elf_entry : -EINVAL;
			goto out_free_dentry;
		}
		reloc_func_desc = interp_load_addr;
		allow_write_access(interpreter);
		fput(interpreter);
		kfree(elf_interpreter);
	} else {
		elf_entry = loc->elf_ex.e_entry;
		if (BAD_ADDR(elf_entry)) {
			force_sig(SIGSEGV, current);
			retval = -EINVAL;
			goto out_free_dentry;
		}
	}
	kfree(elf_phdata);
	sys_close(elf_exec_fileno);
	set_binfmt(&elf_format);
#ifdef ARCH_HAS_SETUP_ADDITIONAL_PAGES
	retval = arch_setup_additional_pages(bprm, executable_stack);
	if (retval < 0) {
		send_sig(SIGKILL, current, 0);
		goto out;
	}
#endif /* ARCH_HAS_SETUP_ADDITIONAL_PAGES */
	compute_creds(bprm);
	current->flags &= ~PF_FORKNOEXEC;
	/* This function does a lot and deserves a close read!
	 * bprm->p is modified here.
	 */
	retval = create_elf_tables(bprm, &loc->elf_ex,
			  load_addr, interp_load_addr);
	if (retval < 0) {
		send_sig(SIGKILL, current, 0);
		goto out;
	}
	/* N.B. passed_fileno might not be initialized? */
	current->mm->end_code = end_code;
	current->mm->start_code = start_code;
	current->mm->start_data = start_data;
	current->mm->end_data = end_data;
	current->mm->start_stack = bprm->p;
#ifdef arch_randomize_brk
	if ((current->flags & PF_RANDOMIZE) && (randomize_va_space > 1))
		current->mm->brk = current->mm->start_brk =
			arch_randomize_brk(current->mm);
#endif
	if (current->personality & MMAP_PAGE_ZERO) {
		/* Why this, you ask???  Well SVr4 maps page 0 as read-only,
		   and some applications "depend" upon this behavior.
		   Since we do not have the power to recompile these, we
		   emulate the SVr4 behavior. Sigh. */
		down_write(&current->mm->mmap_sem);
		error = do_mmap(NULL, 0, PAGE_SIZE, PROT_READ | PROT_EXEC,
				MAP_FIXED | MAP_PRIVATE, 0);
		up_write(&current->mm->mmap_sem);
	}
#ifdef ELF_PLAT_INIT
	/*
	 * The ABI may specify that certain registers be set up in special
	 * ways (on i386 %edx is the address of a DT_FINI function, for
	 * example.  In addition, it may also specify (eg, PowerPC64 ELF)
	 * that the e_entry field is the address of the function descriptor
	 * for the startup routine, rather than the address of the startup
	 * routine itself.  This macro performs whatever initialization to
	 * the regs structure is required as well as any relocations to the
	 * function descriptor entries when executing dynamically links apps.
	 */
	ELF_PLAT_INIT(regs, reloc_func_desc);
#endif
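	/* start_thread is a misnomer; prepare_user_thread() would fit better.
	 * It stores elf_entry and the user stack pointer into regs,
	 * in preparation for the start that follows. The user-mode program
	 * actually starts when sys_execve() returns to user mode!
	 * http://lkml.indiana.edu/hypermail/linux/kernel/0105.2/0910.html
	 */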
	start_thread(regs, elf_entry, bprm->p);
	retval = 0;
out:
	kfree(loc);
out_ret:
	return retval;
	/* error cleanup */
out_free_dentry:
	allow_write_access(interpreter);
	if (interpreter)
		fput(interpreter);
out_free_interp:
	kfree(elf_interpreter);
out_free_file:
	sys_close(elf_exec_fileno);
out_free_ph:
	kfree(elf_phdata);
	goto out;
}
/*
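How exactly are ip and sp switched? There is a trick involved!
sys_execve->do_execve->search_binary_handler->load_binary->load_elf_binary->(code above)
First, get the following questions straight:
1. In a system call, how are the user arguments and the user stack managed? Where are they saved?
First, here is what the stack looks like on entry to the kernel: ( in the famous 8K space )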
struct pt_regs {
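	unsigned long bx;     /* pushed by SAVE_ALL after kernel entry   */              low address
	unsigned long cx;     /* pushed by SAVE_ALL after kernel entry   */
	unsigned long dx;     /* pushed by SAVE_ALL after kernel entry   */
	unsigned long si;     /* pushed by SAVE_ALL after kernel entry   */
	unsigned long di;     /* pushed by SAVE_ALL after kernel entry   */                ^
	unsigned long bp;     /* pushed by SAVE_ALL after kernel entry   */                ^
	unsigned long ax;     /* pushed by SAVE_ALL after kernel entry   */                ^
	unsigned long ds;     /* pushed by SAVE_ALL after kernel entry   */                ^
	unsigned long es;     /* pushed by SAVE_ALL after kernel entry   */                ^
	unsigned long fs;     /* pushed by SAVE_ALL after kernel entry   */                ^
	/* int  gs; */
	unsigned long orig_ax;/* pushed by "push eax" after kernel entry */
	unsigned long ip;     /* pushed automatically by the CPU on the trap */
	unsigned long cs;     /* pushed automatically by the CPU on the trap */
	unsigned long flags;  /* pushed automatically by the CPU on the trap */
	unsigned long sp;     /* pushed automatically by the CPU on the trap */ 
	unsigned long ss;     /* pushed automatically by the CPU on the trap */           high address
};
NOTE: the lower a field appears in this listing, the earlier it was pushed onto the stack.
Below is the 2.6 kernel code that runs once execution is on the kernel stack.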
# system call handler stub
ENTRY(system_call)
	RING0_INT_FRAME			# can't unwind into user space anyway
	pushl %eax			# save orig_eax
	CFI_ADJUST_CFA_OFFSET 4		# cld instruction
	SAVE_ALL
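	GET_THREAD_INFO(%ebp)		# at this point esp points at the top of the saved pt_regs (its low-address end)
syscall_call:
	call *sys_call_table(,%eax,4)   # the call target is the entry at sys_call_table+eax*4, where eax is the syscall number
					# and the entry is the handler's address. The call pushes the return ip, so esp points at it;
					# inside the handler, pt_regs can then be reached relative to esp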
	movl %eax,PT_EAX(%esp)		# store the return value
syscall_exit:
	LOCKDEP_SYS_EXIT
	DISABLE_INTERRUPTS(CLBR_ANY)	# make sure we don't miss an interrupt
					# setting need_resched or sigpending
					# between sampling and the iret
	......

On a 64-bit machine, sys_execve disassembles as follows:
(objdump -d /lib/modules/2.6.18.8/source/arch/x86_64/kernel/process.o)
0000000000000000 <sys_execve>:
   0:   48 83 ec 28             sub    $0x28,%rsp	# make room for local variables
   4:   48 89 5c 24 08          mov    %rbx,0x8(%rsp)   # save some registers in the frame
   9:   48 89 6c 24 10          mov    %rbp,0x10(%rsp)
   e:   48 89 d5                mov    %rdx,%rbp         # Why rdx? no idea.
  11:   4c 89 64 24 18          mov    %r12,0x18(%rsp)
  16:   4c 89 6c 24 20          mov    %r13,0x20(%rsp)
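Enough of this; it gets messy! The one thing to remember is that pt_regs has what you need.
As for how a Linux user process passes arguments to the system call handler,
Linux passes them in general-purpose registers such as ebx, ecx, and edx.
An obvious advantage of passing arguments in registers is that
when the handler saves the registers on entry,
the argument registers land on the kernel stack automatically,
so they need no special handling.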

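2. How does this work together with execve?
With the help of pt_regs, ip and esp can be set. For a system call like execve, replacing the saved ip and esp
is what makes the switch to the new program possible.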

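3. How do user-mode arguments get onto the kernel stack?
For example, when write is called in user mode:
write:
	pushl %ebx 
	movl 8(%esp), %ebx    ; Linux's _syscall3 macro expands to this sequence,
	movl 12(%esp), %ecx   ; which is what implements register argument passing;
	movl 16(%esp), %edx   ; clearly, this does not depend on the compiler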
	movl $4, %eax 
	int $0x80 
	....
Read more from this perfect online book-store:
	http://my.safaribooksonline.com/0-596-00002-2/ch08-10-fm2xml
	Chapter 8. System Calls > Anticipating Linux 2.4 - Pg. 241
*/

/*
 * sys_execve() executes a new program.
 */
asmlinkage int sys_execve(struct pt_regs regs)
{
	int error;
	char * filename;
	filename = getname((char __user *) regs.bx);
	error = PTR_ERR(filename);
	if (IS_ERR(filename))
		goto out;
	error = do_execve(filename,
			(char __user * __user *) regs.cx,
			(char __user * __user *) regs.dx,
			&regs);
	if (error == 0) {
		/* Make sure we don't return using sysenter.. */
		set_thread_flag(TIF_IRET);
	}
	putname(filename);
out:
	return error;
}

From: https://blog.51cto.com/maray/6510890
