
Linux Basics: Investigating an Unexpected Crash with vmcore on BClinux 8.2

Posted: 2024-05-06 17:57:25
Tags: crash x86 downtime 35 000 vmcore usr Linux 64

 

I. No dump files generated under /var/crash

1. Reference configuration:

https://cloud.tencent.cn/developer/article/2367955

 

2. Adjust the configuration on BClinux 8.2

3. Trigger a crash manually

i. Reference (parameter details):

https://blog.csdn.net/tombaby_come/article/details/134038949

echo 1 > /proc/sys/kernel/sysrq

echo c > /proc/sysrq-trigger

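Note: the commands above crash the kernel immediately. kdump then boots the capture kernel, dumps the memory contents into a directory under /var/crash, and reboots the host.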
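The two commands above can be wrapped in a small guard so that a test crash is only triggered when kdump can actually capture it; otherwise the host simply reboots and nothing is written to /var/crash. This is a minimal sketch, not part of the original procedure: trigger_test_crash is a hypothetical name, and the three paths are parameters only so the logic can be tried against ordinary files. Run as root with no arguments, it really does crash the machine.

```shell
#!/usr/bin/env bash
# Sketch: trigger a test crash only if a capture kernel is loaded.
# WARNING: with the default paths this panics the machine immediately.
# The path parameters exist only so the logic can be tried safely.
trigger_test_crash() {
    local loaded="${1:-/sys/kernel/kexec_crash_loaded}"
    local sysrq="${2:-/proc/sys/kernel/sysrq}"
    local trigger="${3:-/proc/sysrq-trigger}"

    # The kernel reports 1 here when a capture kernel is loaded;
    # without one, no vmcore would be written.
    if [ "$(cat "$loaded" 2>/dev/null)" != "1" ]; then
        echo "refusing: no capture kernel loaded, no vmcore would be saved" >&2
        return 1
    fi
    sync                 # flush dirty data to disk before the crash
    echo 1 > "$sysrq"    # enable all SysRq functions (this gates the keyboard;
                         # root may always write to /proc/sysrq-trigger)
    echo c > "$trigger"  # 'c' = crash, handled by sysrq_handle_crash
}
```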

 

4. Check kdump

i. Reference (how kdump works):

https://zhuanlan.zhihu.com/p/684699511
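A minimal sketch of the two things kdump needs before a crash can be captured: a crashkernel= reservation on the kernel command line, and a loaded capture kernel, which the kernel reports as 1 in /sys/kernel/kexec_crash_loaded. The function name kdump_ready is hypothetical, and this is not a full health check (for example, it does not validate the dump target configured in /etc/kdump.conf).

```shell
#!/usr/bin/env bash
# Sketch: quick kdump readiness check.
# Paths are parameters (defaults = live system) so the logic can be tested.
kdump_ready() {
    local cmdline="${1:-/proc/cmdline}"
    local loaded="${2:-/sys/kernel/kexec_crash_loaded}"
    local ck

    # 1) Memory must be reserved for the capture kernel via crashkernel=.
    ck=$(grep -o 'crashkernel=[^ ]*' "$cmdline" | head -n 1)
    if [ -z "$ck" ]; then
        echo "not ready: no crashkernel= on the kernel command line"
        return 1
    fi
    # 2) kdump.service must have loaded the capture kernel (file reads 1).
    if [ "$(cat "$loaded" 2>/dev/null)" != "1" ]; then
        echo "not ready: $ck set, but no capture kernel loaded (check kdump.service)"
        return 1
    fi
    echo "ready: $ck"
}
```

On the host itself, run kdump_ready with no arguments.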

 

II. Check that the vmlinux used by crash matches the kernel

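1. Make sure /boot/vmlinuz-4.19.0-240.23.35.el8_2.bclinux.x86_64 and /usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux belong to the same kernel build. Their md5 checksums will never be identical: vmlinuz is the compressed boot image, while vmlinux is the uncompressed ELF image with debug symbols (normally installed by the kernel-debuginfo package). What has to match is the kernel release, and it must also be the kernel that produced the vmcore.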
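A sketch of a release-based check (the function names are hypothetical). It reads the release from the "Linux version <release> ..." banner compiled into vmlinux and compares it with an expected release, which defaults to uname -r, the kernel booted from /boot/vmlinuz-$(uname -r). If the host has since booted a different kernel, pass the RELEASE reported for the dump as the second argument.

```shell
#!/usr/bin/env bash
# Sketch: print the release from the "Linux version <release> ..." banner
# compiled into a vmlinux image (-a: search the binary file as text).
vmlinux_release() {
    grep -a -m1 -o 'Linux version [^ ]*' "$1" | head -n 1 | awk '{print $3}'
}

# Sketch: compare that release with the expected one, defaulting to the
# running kernel (the one booted from /boot/vmlinuz-$(uname -r)).
check_vmlinux_release() {
    local vmlinux="$1" want="${2:-$(uname -r)}" got
    got=$(vmlinux_release "$vmlinux")
    if [ "$got" = "$want" ]; then
        echo "OK: $vmlinux is $want"
    else
        echo "MISMATCH: $vmlinux is '${got:-unknown}', expected $want" >&2
        return 1
    fi
}
```

For example: check_vmlinux_release /usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux 4.19.0-240.23.35.el8_2.bclinux.x86_64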

 

2. Location of the host kernel's vmlinux (with debug symbols)

/usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux

 

3. Location of the vmcore from the crash

/var/crash/127.0.0.1-2024-05-06-03\:24\:36/vmcore
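kdump writes each dump into its own directory under /var/crash, named after the host address and the crash time. A minimal sketch (latest_vmcore is a hypothetical name) that picks the newest vmcore by file modification time rather than by parsing the directory name:

```shell
#!/usr/bin/env bash
# Sketch: print the newest vmcore under a dump root (default /var/crash).
# Only files named exactly "vmcore" are matched, so a partially written
# vmcore-incomplete is skipped.
latest_vmcore() {
    local root="${1:-/var/crash}"
    find "$root" -mindepth 2 -maxdepth 2 -type f -name vmcore -printf '%T@ %p\n' \
        | LC_ALL=C sort -n | tail -n 1 | cut -d' ' -f2-
}
```

For example: crash /usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux "$(latest_vmcore)". Colons are not special to the shell, so the backslashes shown in the path above are optional.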

 

 

 

III. Analyzing the vmcore

1. Open the vmcore with crash

 

[root@NewOSBC8 127.0.0.1-2024-05-06-03:24:36]# crash vmcore /usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux

crash 7.2.7-3.el8.1
Copyright (C) 2002-2020  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

WARNING: kernel relocated [178MB]: patching 97096 gdb minimal_symbol values

      KERNEL: /usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 2
        DATE: Mon May  6 03:24:31 2024
      UPTIME: 00:12:44
LOAD AVERAGE: 0.00, 0.02, 0.03
       TASKS: 346
    NODENAME: NewOSBC8.2
     RELEASE: 4.19.0-240.23.35.el8_2.bclinux.x86_64
     VERSION: #1 SMP Wed Sep 27 10:49:35 EDT 2023
     MACHINE: x86_64  (1796 Mhz)
      MEMORY: 2 GB
       PANIC: "sysrq: SysRq : Trigger a crash"
         PID: 2289
     COMMAND: "bash"
        TASK: ffff8d1122bf0000  [THREAD_INFO: ffff8d1122bf0000]
         CPU: 0
       STATE: TASK_RUNNING (SYSRQ)

crash> bt
PID: 2289   TASK: ffff8d1122bf0000  CPU: 0   COMMAND: "bash"
 #0 [ffffa2ab80cefbe8] machine_kexec at ffffffff8c25fabe
 #1 [ffffa2ab80cefc40] __crash_kexec at ffffffff8c3658ba
 #2 [ffffa2ab80cefd00] crash_kexec at ffffffff8c36678d
 #3 [ffffa2ab80cefd18] oops_end at ffffffff8c2259fd
 #4 [ffffa2ab80cefd38] no_context at ffffffff8c26fd4e
 #5 [ffffa2ab80cefd90] do_page_fault at ffffffff8c270872
 #6 [ffffa2ab80cefdc0] page_fault at ffffffff8cc0122e
    [exception RIP: sysrq_handle_crash+18]
    RIP: ffffffff8c74eb12  RSP: ffffa2ab80cefe78  RFLAGS: 00010246
    RAX: ffffffff8c74eb00  RBX: 0000000000000063  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffff8d1131017108  RDI: 0000000000000063
    RBP: 0000000000000004   R8: 00000000000005ce   R9: 000000000000002d
    R10: 0000000000000000  R11: ffffa2ab80cefd30  R12: 0000000000000000
    R13: 0000000000000000  R14: ffffffff8d53c3e0  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffffa2ab80cefe78] __handle_sysrq.cold.10 at ffffffff8c74f6f8
 #8 [ffffa2ab80cefea8] write_sysrq_trigger at ffffffff8c74f5bb
 #9 [ffffa2ab80cefeb8] proc_reg_write at ffffffff8c55de29
#10 [ffffa2ab80cefed0] vfs_write at ffffffff8c4e0db5
#11 [ffffa2ab80ceff00] ksys_write at ffffffff8c4e102f
#12 [ffffa2ab80ceff38] do_syscall_64 at ffffffff8c2041ab
#13 [ffffa2ab80ceff50] entry_SYSCALL_64_after_hwframe at ffffffff8cc000ad
    RIP: 00007f515c78ab28  RSP: 00007ffc1172a678  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000000000000002  RCX: 00007f515c78ab28
    RDX: 0000000000000002  RSI: 000055b65d8c05c0  RDI: 0000000000000001
    RBP: 000055b65d8c05c0   R8: 000000000000000a   R9: 00007f515c81bc80
    R10: 000000000000000a  R11: 0000000000000246  R12: 00007f515ca5b6c0
    R13: 0000000000000002  R14: 00007f515ca56880  R15: 0000000000000002
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
crash> dis -l sysrq_handle_crash+18
/usr/src/debug/kernel-4.19.0-240.23.35.el8/linux-4.19.0-240.23.35.el8_2.bclinux.x86_64/drivers/tty/sysrq.c: 159
0xffffffff8c74eb12 <sysrq_handle_crash+18>:     movb   $0x1,0x0
crash> dis -l 0xffffffff8c74eb12
/usr/src/debug/kernel-4.19.0-240.23.35.el8/linux-4.19.0-240.23.35.el8_2.bclinux.x86_64/drivers/tty/sysrq.c: 159
0xffffffff8c74eb12 <sysrq_handle_crash+18>:     movb   $0x1,0x0
crash> kmem -i
                 PAGES        TOTAL      PERCENTAGE
    TOTAL MEM   458790       1.8 GB         ----
         FREE   194411     759.4 MB   42% of TOTAL MEM
         USED   264379         1 GB   57% of TOTAL MEM
       SHARED    50717     198.1 MB   11% of TOTAL MEM
      BUFFERS      530       2.1 MB    0% of TOTAL MEM
       CACHED   103545     404.5 MB   22% of TOTAL MEM
         SLAB    31239       122 MB    6% of TOTAL MEM

   TOTAL HUGE        0            0         ----
    HUGE FREE        0            0    0% of TOTAL HUGE

   TOTAL SWAP   532479         2 GB         ----
    SWAP USED        0            0    0% of TOTAL SWAP
    SWAP FREE   532479         2 GB  100% of TOTAL SWAP

 COMMIT LIMIT   761874       2.9 GB         ----
    COMMITTED   511634         2 GB   67% of TOTAL LIMIT
crash> sys
      KERNEL: /usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 2
        DATE: Mon May  6 03:24:31 2024
      UPTIME: 00:12:44
LOAD AVERAGE: 0.00, 0.02, 0.03
       TASKS: 346
    NODENAME: NewOSBC8.2
     RELEASE: 4.19.0-240.23.35.el8_2.bclinux.x86_64
     VERSION: #1 SMP Wed Sep 27 10:49:35 EDT 2023
     MACHINE: x86_64  (1796 Mhz)
      MEMORY: 2 GB
       PANIC: "sysrq: SysRq : Trigger a crash"
crash> p cpu_info:1
per_cpu(cpu_info, 1) = $1 = {
  x86 = 23 '\027',
  x86_vendor = 2 '\002',
  x86_model = 104 'h',
  x86_stepping = 1 '\001',
  x86_tlbsize = 3072,
  x86_virt_bits = 48 '0',
  x86_phys_bits = 45 '-',
  x86_coreid_bits = 0 '\000',
  cu_id = 255 '\377',
  extended_cpuid_level = 2147483680,
  cpuid_level = 16,
  x86_capability = {126614527, 802421759, 0, 129319184, 4277678595, 0, 4195321, 376123396, 557056, 563872169, 15, 0, 0, 17584641, 4, 0, 4194308, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 229696, 0},
  x86_vendor_id = "AuthenticAMD\000\000\000",
  x86_model_id = "AMD Ryzen 7 5700U with Radeon Graphics\000        \000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
  x86_cache_size = 512,
  x86_cache_alignment = 64,
  x86_cache_max_rmid = -1,
  x86_cache_occ_scale = -1,
  x86_power = 256,
  loops_per_jiffy = 1796624,
  x86_max_cores = 1,
  apicid = 2,
  initial_apicid = 2,
  x86_clflush_size = 64,
  booted_cores = 1,
  phys_proc_id = 2,
  logical_proc_id = 1,
  cpu_core_id = 0,
  cpu_index = 1,
  microcode = 0,
  x86_cache_bits = 45 '-',
  initialized = 1,
  cpuinfo_x86_extended_size_rh = 0,
  _rh = {
    cpu_die_id = 0,
    logical_die_id = 1,
    vmx_capability = {0, 0, 0}
  }
}
crash>  ps 1489
   PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
   1489   1382   0  ffff8d110eb20000  IN  11.9 3106588 249348  llvmpipe-1
crash>
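The session above singles out PID 1489 (llvmpipe-1), which held 11.9% of memory. To rank all tasks by memory use instead, the ps output can be saved inside crash with output redirection (ps > ps.txt) and sorted on the %MEM column. A sketch with a hypothetical function name; it assumes the column layout shown above (PID PPID CPU TASK ST %MEM VSZ RSS COMM):

```shell
#!/usr/bin/env bash
# Sketch: list the tasks with the highest %MEM from saved crash `ps` output.
# crash marks tasks that were running on a CPU with a leading '>', which is
# replaced by a space so the columns line up with the other rows.
top_mem_tasks() {
    local n="${1:-5}" file="${2:--}"
    sed 's/^>/ /' "$file" \
        | awk '$1 ~ /^[0-9]+$/' \
        | LC_ALL=C sort -k6,6nr \
        | head -n "$n"
}
```

For example, top_mem_tasks 10 ps.txt prints the ten tasks with the highest %MEM.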

 

Command: crash vmcore /usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux

Crash time: DATE: Mon May  6 03:24:31 2024

Panic reason: PANIC: "sysrq: SysRq : Trigger a crash"
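The first-pass commands used in this article can also be run non-interactively, with the output kept next to the dump. A sketch, assuming crash reads commands from standard input when it is not attached to a terminal (crash also offers -i FILE to run a command file). crash_triage and the default output file name are hypothetical; log is added to capture the kernel message buffer.

```shell
#!/usr/bin/env bash
# Sketch: run a fixed set of crash commands and save everything to a file.
# -s suppresses crash's startup banner; commands arrive on stdin.
crash_triage() {
    local vmlinux="$1" vmcore="$2" out="${3:-crash-triage.txt}"
    crash -s "$vmlinux" "$vmcore" > "$out" 2>&1 <<'EOF'
sys
bt
log
kmem -i
ps
exit
EOF
}
```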

 

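2. Inspect the exception registers and the faulting function (RIP)

i. Work out which task was running and how it reached sysrq_handle_crash. In this dump, bt shows bash (PID 2289) writing to /proc/sysrq-trigger: the call chain runs from ksys_write and vfs_write through write_sysrq_trigger and __handle_sysrq into sysrq_handle_crash, where the page fault occurred.

ii. Reference: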

https://blog.csdn.net/weixin_43564241/article/details/130692946
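To go from a saved backtrace to the dis -l argument, the function and offset can be extracted from the [exception RIP: ...] line. A sketch with a hypothetical function name, assuming the output was saved inside crash with bt > bt.txt:

```shell
#!/usr/bin/env bash
# Sketch: print "function+offset" from the first [exception RIP: ...] line
# of saved crash `bt` output (reads stdin when no file is given).
exception_rip() {
    sed -n 's/.*\[exception RIP: \([^]]*\)].*/\1/p' "${1:--}" | head -n 1
}
```

For this dump, exception_rip bt.txt prints sysrq_handle_crash+18, the argument passed to dis -l in the session above.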

 

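3. Inspect the code at the faulting address

i. Pass the function and offset from "[exception RIP: sysrq_handle_crash+18]" to dis -l. It resolves to drivers/tty/sysrq.c, line 159, where the instruction movb $0x1,0x0 stores 1 at address 0. That is the deliberate NULL-pointer write SysRq 'c' uses to crash the kernel, which is why bt shows page_fault, no_context and oops_end before crash_kexec.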

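4. Check memory usage at the time of the crash

kmem -i shows 57% of memory in use, no swap in use, and 67% of the commit limit committed, so memory pressure does not appear to be a factor in this crash.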
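To compare these figures across several dumps, one row's percentage can be read from saved kmem -i output (for example kmem -i > kmem.txt inside crash). A minimal sketch with a hypothetical function name:

```shell
#!/usr/bin/env bash
# Sketch: print one row's PERCENTAGE value from saved crash `kmem -i` output.
# Works for single-word row labels (FREE, USED, SHARED, BUFFERS, CACHED,
# SLAB, COMMITTED); multi-word labels such as "SWAP USED" are not handled.
kmem_pct() {
    local row="$1" file="${2:--}"
    awk -v row="$row" '$1 == row {
        for (i = 2; i <= NF; i++) if ($i ~ /%$/) { print $i; exit }
    }' "$file"
}
```

With the output above saved as kmem.txt, kmem_pct USED kmem.txt prints 57%.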

 

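5. Triggered from user space

i. The dump was triggered manually: bash (PID 2289) wrote c to /proc/sysrq-trigger, and kdump saved the memory contents to /var/crash.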
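When many dumps have to be sorted, the deciding PANIC line can be checked automatically. A sketch with a hypothetical function name that recognises only the SysRq case seen in this article, reading saved sys output (sys > sys.txt inside crash); every other panic string is passed through for manual analysis:

```shell
#!/usr/bin/env bash
# Sketch: classify a dump from the PANIC line of saved crash `sys` output.
panic_cause() {
    local panic
    panic=$(sed -n 's/^ *PANIC: "\(.*\)"$/\1/p' "${1:--}" | head -n 1)
    case "$panic" in
        *"SysRq : Trigger a crash"*) echo "manual: SysRq crash trigger ($panic)" ;;
        "") echo "unknown: no PANIC line found" ;;
        *) echo "needs analysis: $panic" ;;
    esac
}
```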

 

From: https://www.cnblogs.com/gkhost/p/18175533
