Angr-Learn-0x2
Note
This post is essentially a loose translation of the official documentation, plus some of my own understanding.
Loading a Binary
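In Angr-Learn-0x1 we briefly mentioned CLE ("CLE Loads Everything"). Its main job is to take a binary, together with the libraries it depends on, and present them to angr in a form that is easy to work with.
The Loader
The following code briefly shows how to interact with the loader, CLE: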
>>> import angr, monkeyhex
>>> proj = angr.Project('examples/fauxware/fauxware')
>>> proj.loader
<Loaded fauxware, maps [0x400000:0x5008000]>
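Loaded Binaries
The CLE loader (cle.Loader) represents the whole collection of loaded binary objects, loaded and mapped into a single memory space. Each binary object is loaded by a loader backend that can handle its file type. For example, cle.ELF is used to load ELF files.
We can get the full list of loaded objects with loader.all_objects: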
# All loaded objects
>>> proj.loader.all_objects
[<ELF Object fauxware, maps [0x400000:0x60105f]>,
<ELF Object libc-2.23.so, maps [0x1000000:0x13c999f]>,
<ELF Object ld-2.23.so, maps [0x2000000:0x2227167]>,
<ELFTLSObject Object cle##tls, maps [0x3000000:0x3015010]>,
<ExternObject Object cle##externs, maps [0x4000000:0x4008000]>,
<KernelObject Object cle##kernel, maps [0x5000000:0x5008000]>]
# This is the "main" object, the one that you directly specified when loading the project
>>> proj.loader.main_object
<ELF Object fauxware, maps [0x400000:0x60105f]>
# This is a dictionary mapping from shared object name to object
>>> proj.loader.shared_objects
{ 'fauxware': <ELF Object fauxware, maps [0x400000:0x60105f]>,
'libc.so.6': <ELF Object libc-2.23.so, maps [0x1000000:0x13c999f]>,
'ld-linux-x86-64.so.2': <ELF Object ld-2.23.so, maps [0x2000000:0x2227167]> }
# Here's all the objects that were loaded from ELF files
# If this were a windows program we'd use all_pe_objects!
>>> proj.loader.all_elf_objects
[<ELF Object fauxware, maps [0x400000:0x60105f]>,
<ELF Object libc-2.23.so, maps [0x1000000:0x13c999f]>,
<ELF Object ld-2.23.so, maps [0x2000000:0x2227167]>]
# Here's the "externs object", which we use to provide addresses for unresolved imports and angr internals
>>> proj.loader.extern_object
<ExternObject Object cle##externs, maps [0x4000000:0x4008000]>
# This object is used to provide addresses for emulated syscalls
>>> proj.loader.kernel_object
<KernelObject Object cle##kernel, maps [0x5000000:0x5008000]>
# Finally, you can get a reference to an object given an address in it
>>> proj.loader.find_object_containing(0x400000)
<ELF Object fauxware, maps [0x400000:0x60105f]>
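To tie these accessors together, here is a minimal sketch of a helper that summarizes every loaded object. It relies only on all_objects, min_addr, max_addr, and find_object_containing, all of which appear in the session above. The function names describe_objects and owner_of are my own and are not part of CLE.

```python
def describe_objects(loader):
    """Return one summary line per loaded object, ordered by base address."""
    lines = []
    for obj in sorted(loader.all_objects, key=lambda o: o.min_addr):
        # max_addr is inclusive in the output shown above, hence the +1
        size = obj.max_addr - obj.min_addr + 1
        lines.append(
            f"{type(obj).__name__:<14} {obj.min_addr:#x}-{obj.max_addr:#x} ({size:#x} bytes)"
        )
    return lines


def owner_of(loader, addr):
    """Return whatever find_object_containing reports for addr."""
    return loader.find_object_containing(addr)
```

With the project from above, describe_objects(proj.loader) should give one line for each entry in all_objects, and owner_of(proj.loader, 0x400000) should return the fauxware object.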
We can query such an object directly to extract its metadata:
>>> obj = proj.loader.main_object
# The entry point of the object
>>> obj.entry
0x400580
>>> obj.min_addr, obj.max_addr
(0x400000, 0x60105f)
# Retrieve this ELF's segments and sections
>>> obj.segments
<Regions: [<ELFSegment memsize=0xa74, filesize=0xa74, vaddr=0x400000, flags=0x5, offset=0x0>,
<ELFSegment memsize=0x238, filesize=0x228, vaddr=0x600e28, flags=0x6, offset=0xe28>]>
>>> obj.sections
<Regions: [<Unnamed | offset 0x0, vaddr 0x0, size 0x0>,
<.interp | offset 0x238, vaddr 0x400238, size 0x1c>,
<.note.ABI-tag | offset 0x254, vaddr 0x400254, size 0x20>,
...etc
# You can get an individual segment or section by an address it contains:
>>> obj.find_segment_containing(obj.entry)
<ELFSegment memsize=0xa74, filesize=0xa74, vaddr=0x400000, flags=0x5, offset=0x0>
>>> obj.find_section_containing(obj.entry)
<.text | offset 0x580, vaddr 0x400580, size 0x338>
# Get the address of the PLT stub for a symbol
>>> addr = obj.plt['strcmp']
>>> addr
0x400550
>>> obj.reverse_plt[addr]
'strcmp'
# Show the prelinked base of the object and the location it was actually mapped into memory by CLE
>>> obj.linked_base
0x400000
>>> obj.mapped_base
0x400000
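The segment fields printed above (vaddr, filesize, offset) are enough to translate a mapped address back into an offset in the file on disk. The sketch below assumes those fields are readable as attributes of the same names on the segment object, and that find_segment_containing returns None when nothing matches. file_offset_of and rebase_delta are hypothetical helper names of mine.

```python
def file_offset_of(obj, addr):
    """Translate a mapped address into an offset within the file on disk.

    Returns None when no segment contains addr, or when addr lies in the
    part of a segment that has no file backing (memsize > filesize).
    """
    seg = obj.find_segment_containing(addr)
    if seg is None:
        return None
    delta = addr - seg.vaddr      # distance from the start of the segment
    if delta >= seg.filesize:     # e.g. the zero-filled tail of a data segment
        return None
    return seg.offset + delta


def rebase_delta(obj):
    """How far the loader moved the object away from its prelinked base."""
    return obj.mapped_base - obj.linked_base
```

For fauxware, the entry point 0x400580 sits in the first segment (vaddr 0x400000, offset 0x0), so the computed file offset is 0x580, which agrees with the offset shown for .text. Since linked_base and mapped_base are both 0x400000, the rebase delta is zero.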
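Symbols and Relocations
CLE also lets us work with symbols. A symbol is a fundamental concept in executable formats. My own simplified view is that symbols are function names, variable names, and the like, and that a symbol effectively maps a name to an address.
The easiest way to get a symbol from CLE is loader.find_symbol: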
>>> strcmp = proj.loader.find_symbol('strcmp')
>>> strcmp
<Symbol "strcmp" in libc.so.6 at 0x1089cd0>
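A symbol's most useful attributes are its name, its owner, and its address, but the "address" of a symbol can be ambiguous. The Symbol object has three ways of reporting its address:
- .rebased_addr is its address in the global address space. This is what is shown in the printed output.
- .linked_addr is its address relative to the prelinked base of the binary. This is the address reported in, for example, readelf(1).
- .relative_addr is its address relative to the object base. This is known in the literature (particularly the Windows literature) as an RVA (relative virtual address).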
>>> strcmp.name
'strcmp'
>>> strcmp.owner
<ELF Object libc-2.23.so, maps [0x1000000:0x13c999f]>
>>> strcmp.rebased_addr
0x1089cd0
>>> strcmp.linked_addr
0x89cd0
>>> strcmp.relative_addr
0x89cd0
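The relationship between the three addresses is plain arithmetic, which the following sketch spells out. It does not call CLE at all. symbol_addresses is a hypothetical helper, and the linked base of 0x0 for libc is my inference from linked_addr being equal to relative_addr in the session above.

```python
def symbol_addresses(relative_addr, linked_base, mapped_base):
    """Derive the linked and rebased addresses from an RVA and the two bases."""
    return {
        "relative_addr": relative_addr,
        "linked_addr": linked_base + relative_addr,
        "rebased_addr": mapped_base + relative_addr,
    }


# strcmp in libc, using the numbers from the session above
addrs = symbol_addresses(0x89cd0, linked_base=0x0, mapped_base=0x1000000)
print({name: hex(value) for name, value in addrs.items()})
# prints {'relative_addr': '0x89cd0', 'linked_addr': '0x89cd0', 'rebased_addr': '0x1089cd0'}
```

The three values only differ once an object is mapped somewhere other than its prelinked base, which is why linked_addr and relative_addr coincide here while rebased_addr does not.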
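We should also be aware of the concept of dynamic linking. In the example above, libc provides the symbol as an export, and the main binary depends on it. So if we ask the main binary itself for the symbol, we are told that it is an import symbol. An import symbol has no meaningful address of its own, but it does carry a reference to the symbol that was used to resolve it, available as .resolvedby.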
>>> strcmp.is_export
True
>>> strcmp.is_import
False
# On Loader, the method is find_symbol because it performs a search operation to find the symbol.
# On an individual object, the method is get_symbol because there can only be one symbol with a given name.
>>> main_strcmp = proj.loader.main_object.get_symbol('strcmp')
>>> main_strcmp
<Symbol "strcmp" in fauxware (import)>
>>> main_strcmp.is_export
False
>>> main_strcmp.is_import
True
>>> main_strcmp.resolvedby
<Symbol "strcmp" in libc.so.6 at 0x1089cd0>
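Following an import to the export that satisfies it is a common enough step to wrap in a function. This is a minimal sketch using only get_symbol, is_import, resolvedby, owner, and rebased_addr from the session above. resolve_import is a hypothetical name, and I am assuming that get_symbol returns None for an unknown name and that resolvedby is None when an import was left unresolved.

```python
def resolve_import(obj, name):
    """Follow a symbol in obj to the definition that actually provides it.

    Returns (owner, rebased_addr) of the providing symbol, or None if the
    name is unknown to obj or the import was left unresolved.
    """
    sym = obj.get_symbol(name)
    if sym is None:
        return None
    if sym.is_import:
        sym = sym.resolvedby      # assumed to be None for unresolved imports
        if sym is None:
            return None
    return sym.owner, sym.rebased_addr
```

Called as resolve_import(proj.loader.main_object, 'strcmp'), this should hand back the libc object together with 0x1089cd0, matching the resolvedby output above.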
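How the links between imports and exports are recorded in memory is handled by another concept, relocations. An object's imports attribute maps each imported symbol name to its Relocation. A relocation's corresponding import symbol can be accessed as .symbol. The address the relocation will write to is accessible through any of the address identifiers available on Symbol, and .owner gives a reference to the object requesting the relocation.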
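# Relocations are not pretty-printed, so the addresses below are Python object addresses, unrelated to our program
>>> proj.loader.shared_objects['libc.so.6'].imports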
{'__libc_enable_secure': <cle.backends.elf.relocation.amd64.R_X86_64_GLOB_DAT at 0x7ff5c5fce780>,
'__tls_get_addr': <cle.backends.elf.relocation.amd64.R_X86_64_JUMP_SLOT at 0x7ff5c6018358>,
'_dl_argv': <cle.backends.elf.relocation.amd64.R_X86_64_GLOB_DAT at 0x7ff5c5fd2e48>,
'_dl_find_dso_for_object': <cle.backends.elf.relocation.amd64.R_X86_64_JUMP_SLOT at 0x7ff5c6018588>,
'_dl_starting_up': <cle.backends.elf.relocation.amd64.R_X86_64_GLOB_DAT at 0x7ff5c5fd2550>,
'_rtld_global': <cle.backends.elf.relocation.amd64.R_X86_64_GLOB_DAT at 0x7ff5c5fce4e0>,
'_rtld_global_ro': <cle.backends.elf.relocation.amd64.R_X86_64_GLOB_DAT at 0x7ff5c5fcea20>}
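Because the default repr of a relocation is not very informative, a small helper that turns imports into readable rows can be handy. The sketch below uses only the attributes described above (.symbol on the relocation, an address identifier on the relocation, and .resolvedby on the symbol). import_table is a hypothetical name, and treating a missing symbol or an unresolved import as None is my assumption.

```python
def import_table(obj):
    """Summarize obj.imports as (name, write address, resolved address) rows."""
    rows = []
    for name, reloc in sorted(obj.imports.items()):
        sym = reloc.symbol
        target = sym.resolvedby if sym is not None else None
        rows.append((
            name,
            reloc.rebased_addr,                                   # where the relocation writes
            target.rebased_addr if target is not None else None,  # what it points at, if resolved
        ))
    return rows
```

Each row can then be printed with hex formatting, for example by looping over import_table(proj.loader.shared_objects['libc.so.6']).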
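Loading Options
See the CLE API documentation for the full details. The most common and important options are listed below. They apply per binary and are passed as dictionaries through the main_opts and lib_opts keyword arguments:
- backend: which backend to use, as either a class or a name
- base_addr: a base address to use
- entry_point: an entry point to use
- arch: the name of an architecture to use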
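Backends
CLE's backends differ in the binary file format they handle, as listed below. In most cases CLE detects the right backend automatically.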
backend name | description | requires arch? |
---|---|---|
elf | Static loader for ELF files based on PyELFTools | no |
pe | Static loader for PE files based on PEFile | no |
mach-o | Static loader for Mach-O files. Does not support dynamic linking or rebasing. | no |
cgc | Static loader for Cyber Grand Challenge binaries | no |
backedcgc | Static loader for CGC binaries that allows specifying memory and register backers | no |
elfcore | Static loader for ELF core dumps | no |
blob | Loads the file into memory as a flat image | yes |
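To show how the per-binary options and the table fit together, here is a minimal sketch that builds an options dictionary and rejects a blob backend with no architecture. The option keys come from the list above. NEEDS_ARCH and make_main_opts are hypothetical names of mine, and the arch and base address values are only illustrative.

```python
# Backends that cannot infer the architecture from the file, per the table above
NEEDS_ARCH = {"blob"}


def make_main_opts(backend, arch=None, base_addr=None, entry_point=None):
    """Build a per-binary options dict, rejecting combinations that cannot work."""
    if backend in NEEDS_ARCH and arch is None:
        raise ValueError(f"backend {backend!r} requires an explicit arch")
    opts = {"backend": backend}
    # Only include the options that were actually supplied
    for key, value in (("arch", arch), ("base_addr", base_addr), ("entry_point", entry_point)):
        if value is not None:
            opts[key] = value
    return opts


opts = make_main_opts("blob", arch="i386", base_addr=0x8000000)
print(opts)
```

The resulting dictionary is meant to be handed to angr as angr.Project(path, main_opts=opts), where path is whatever file you want loaded as a flat image. Checking the arch requirement up front gives a clearer error than letting the load fail later.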