首页 > 其他分享 >coredump快速定位

coredump快速定位

时间:2022-09-24 22:33:37浏览次数:88  
标签:定位 code kernel read coredump segfault error address 快速

https://utcc.utoronto.ca/~cks/space/blog/linux/KernelSegfaultMessageMeaning

What the Linux kernel's messages about segfaulting programs mean on 64-bit x86

February 8, 2018

For quite a while the Linux kernel has had an option to log a kernel message about every faulting user program, and it probably defaults to on in your Linux distribution. I've seen these messages fly by for years, but for reasons beyond the scope of this entry I've recently wanted to understand what they mean in some moderate amount of detail.

I'll start with a straightforward and typical example, one that I see every time I build and test Go (as this is a test case that is supposed to crash):

testp[19288]: segfault at 0 ip 0000000000401271 sp 00007fff2ce4d210 error 4 in testp[400000+98000]

The meaning of this is:

  • 'testp[19288]' is the faulting program and its PID
  • 'segfault at 0' tells us the memory address (in hex) that caused the segfault when the program tried to access it. Here the address is 0, so we have a null dereference of some sort.
  • 'ip 0000000000401271' is the value of the instruction pointer at the time of the fault. This should be the instruction that attempted to do the invalid memory access. In 64-bit x86, this will be register %rip (useful for inspecting things in GDB and elsewhere).
  • 'sp 00007fff2ce4d210' is the value of the stack pointer. In 64-bit x86, this will be %rsp.
  • 'error 4' is the page fault error code bits from traps.h in hex, as usual, and will almost always be at least 4 (which means 'user-mode access'). A value of 4 means it was a read of an unmapped area, such as address 0, while a value of 6 (4+2) means it was a write of an unmapped area.
  • 'in testp[400000+98000]' tells us the specific virtual memory area that the instruction pointer is in, specifying which file it is (here it's the executable), the starting address that VMA is mapped at (0x400000), and the size of the mapping (0x98000).

With a faulting address of 0 and an error code of 4, we know this particular segfault is a read of a null pointer.

Here's two more error messages:

bash[12235]: segfault at 1054808 ip 000000000041d989 sp 00007ffec1f1cbd8 error 6 in bash[400000+f4000]

'Error 6' means a write to an unmapped user address, here 0x1054808.

bash[11909]: segfault at 0 ip 00007f83c03db746 sp 00007ffccbeda010 error 4 in libc-2.23.so[7f83c0350000+1c0000]

Error 4 and address 0 is a null pointer read but this time it's in some libc function, not in bash's own code, since it's reported as 'in libc-2.23.so[...]'. Since I looked at the core dump, I can tell you that this was in strlen().

On 64-bit x86 Linux, you'll get a somewhat different message if the problem is actually with the instruction being executed, not the address it's referencing. For example:

bash[2848] trap invalid opcode ip:48db90 sp:7ffddc8879e8 error:0 in bash[400000+f4000]

There are a number of such trap types set up in traps.c. Two notable additional ones are 'divide error', which you get if you do an integer division by zero, and 'general protection', which you can get for certain extremely wild pointers (one case I know of is when your 64-bit x86 address is not in 'canonical form'). Although these fields are formatted slightly differently, most of them mean the same thing as in segfaults. The exception is 'error:0', which is not a page fault error code. I don't understand the relevant kernel code enough to know what it means, but if I'm reading between the lines correctly in entry_64.txt, then it's either 0 (the usual case) or an error code from the CPU. Here is one possible list of exceptions that get error codes.

Sometimes these messages can be a little bit unusual and surprising. Here is a silly sample program and the error it produces when run. The code:

#include <stdio.h>
int main(int argc, char **argv) {
   int (*p)();
   p = 0x0;
   return printf("%d\n", (*p)());
}

If compiled (without optimization is best) and run, this generates the kernel message:

a.out[3714]: segfault at 0 ip           (null) sp 00007ffe872aa418 error 14 in a.out[400000+1000]

The '(null)' bit turns out to be expected; it's what the general kernel printf() function generates when asked to print something as a pointer and it's null (as seen here). In our case the instruction pointer is 0 (null) because we've made a subroutine call through a null pointer and thus we're trying to execute code at address 0. I don't know why the 'in ...' portion says that we're in the executable (although in this case the call actually was there).

The error code of 14 is in hex, which means that as bits it's 010100. This is a user mode read of an unmapped area (our usual '4' case), but it's an instruction fetch, not a normal data read or write. Any error 14s are a sign of some form of mangled function call or a return to a mangled address because the stack has been mashed.

(These bits turn out to come straight from the CPU's page fault IDT.)

For 64-bit x86 Linux kernels (and possibly for 32-bit x86 ones as well), the code you want to look at is show_signal_msg in fault.c, which prints the general 'segfault at ..' message, do_trap and do_general_protection in traps.c, which print the 'trap ...' messages, and print_vma_addr in memory.c, which prints the 'in ...' portion for all of these messages.

Sidebar: The various error code bits as numbers

+1 protection fault in a mapped area (eg writing to a read-only mapping)
+2 write (instead of a read)
+4 user mode access (instead of kernel mode access)
+8 use of reserved bits in the page table entry detected (the kernel will panic if this happens)
+16 (+0x10) fault was an instruction fetch, not data read or write
+32 (+0x20) 'protection keys block access' (don't ask me)

Hex 0x14 is 0x10 + 4; (hex) 6 is 4 + 2. Error code 7 (0x7) is 4 + 2 + 1, a user-mode write to a read-only mapping, and is what you get if you attempt to write to a string constant in C:

char *ex = "example";
int main(int argc, char **argv) {
   *ex = 'E';
}

Compile and run this and you will get:

a.out[8832]: segfault at 400540 ip 0000000000400499 sp 00007ffce6831490 error 7 in a.out[400000+1000]

It appears that the program code always gets loaded at 0x400000 for ordinary programs, although I believe that shared libraries can have their location randomized.

PS: Per a comment in the kernel source, all accesses to addresses above the end of user space will be labeled as 'protection fault in a mapped area' whether or not there are actual page table entries there. The kernel does this so you can't work out where its memory pages are by looking at the error code.

(I believe that user space normally ends around 0x07fffffffffff, per mm.txt, although see the comments about TASK_SIZE_MAX in processor.h and also page_64_types.h.)

标签:定位,code,kernel,read,coredump,segfault,error,address,快速
From: https://www.cnblogs.com/maojun1998/p/16726859.html

相关文章

  • scala 快速上手
    Scala特性基于JVM:可以与Java混合编程,且可相互调包。类型推测:不需要显式定义数据类型,\(var\)表示变量,\(val\)表示常量。并发和分布式(Actor,类似Java中的多线程T......
  • Servlet快速入门
     创建Servlet:创建web项目,导入Servlet依赖坐标<dependency><groupld>javax.servlet</groupld><artifactld>javax.servlet-api</artifactld><version>3.1.0</version>......
  • Maven快速配置和入门
    概念Maven其实就是一个管理项目、构建项目的工具。它有标准化的项目结构、构建流程、依赖管理。功能Maven提供了一套标准的项目结构Maven提供了一套标准的构建流程Ma......
  • LOJ #162. 快速幂 2
    题意要求一个\(O(\sqrtP)-O(1)\)的快速幂。幂可以用扩展欧拉定理规约到\([1,P-1]\)中。分析分块。定个阈值\(B=\sqrtP\)+1。\(a^t=a^{t\bmodB}\cdot......
  • 快速排序
    简介快速排序(Quicksort)是对冒泡排序的一种改进。基本思想是:通过一趟排序将要排序的数据分割成独立的两部分,其中一部分的所有数据都比另外一部分的所有数据都要小,然后再......
  • 前后端开发模式、API接口、接口测试工具postman、restful规范、序列化和反序列化、dja
    目录前后端开发模式一、两种模式1.传统开发模式:前后端混合开发1.1.缺点:2.前后端分离开发模式2.1.特点3.补充老刘的相关博客:二、API接口1.作用2.说明三、接口测试工具postm......
  • API接口、接口测试工具postman、restful规范、序列化与反序列化、djangorestframework
    API接口通过网络,规定了前台后台信息交流规则的url链接,也就是前后台信息交互的媒介API接口的样子url:长得像返回数据的url链接https://api.map.baidu.com/place/v2/s......
  • 易优cms忘记后台登陆密码?解决方法 Eyoucms快速入门
    忘记后台管理密码怎么搞? 方法一:下载官方易优修改重置后台密码小工具 https://www.eyoucms.com/uploads/soft/200319/1-2003191Q000.zip下载附件后解压,将setpwd.php......
  • 零基础使用 WebGL — 快速入门
    WebGL作为一个非常底层的API,学习和使用起来非常困难,因为WebGL需要大量的背景知识。网上教程一般都是介绍API就开始渲染,介绍不多,容易让人迷惑,也容易被劝退。即使你学会......
  • 快速稳定盈利的四个步骤
    控制自己不会有大幅度的回撤是稳定盈利的第一步。如何去控制回撤呢?第一是你要记录你对所有模式的实操情况。然后记录下来。比如你用半路战法的时候最大的一次日回撤是多少......