Exadata X6-2: RS-7445 [Serv CELLSRV hang detected] [It will be restarted]

Date: 2023-03-28 13:34:08

1. An on-site colleague found that one storage node (cell) of the X6-2 had raised a 7445 error.

# cellcli -e list alerthistory

2023-03-27T23:01:44+08:00 critical "RS-7445 [Serv CELLSRV hang detected] [It will be restarted] [] [] [] [] [] [] [] [] [] []"

2. Check the alert log on that storage node:

2023-03-27T23:01:44.912828+08:00
[RS] Monitoring process /opt/oracle/cell/cellsrv/bin/cellrsomt (pid: 16281) returned with error: 123
[RS] Service CELLSRV will be restarted.
Errors in file /opt/oracle/cell/log/diag/asm/cell/dm01celadm12/trace/rstrc_16269_omt.trc (incident=1):
RS-7445 [Serv CELLSRV hang detected] [It will be restarted] [] [] [] [] [] [] [] [] [] []
Incident details in: /opt/oracle/cell/log/diag/asm/cell/dm01celadm12/incident/incdir_1/rstrc_16269_omt_i1.trc

2023-03-27T23:01:45.172217+08:00
State dump signal delivered to CELLSRV<16314> by pid - 16269, uid - 0
State dump signal delivered to CELLSRV<16314> by RS.
2023-03-27T23:01:45.947036+08:00
Read Error on Cell Disk FD_00_dm01celadm12 (/dev/nvme3n1) at device offset 924221440 bytes with size 16384 bytes membuf 0x6001d80be000, bioreq 0x600003dbf5d0 (errno: Input/output error [5])
Read Error on Cell Disk FD_00_dm01celadm12 (/dev/nvme3n1) at device offset 5584060416 bytes with size 131072 bytes membuf 0x601324800000, bioreq 0x6000042926c8 (errno: Input/output error [5])
Write Error on Cell Disk FD_00_dm01celadm12 (/dev/nvme3n1) at device offset 19931332608 bytes with size 512 bytes membuf 0x6001cbb51400, bioreq 0x600004647cf8 (errno: Input/output error [5])
Read Error on Cell Disk FD_00_dm01celadm12 (/dev/nvme3n1) at device offset 924221440 bytes with size 16384 bytes membuf 0x6001d6ea2000, bioreq 0x6002cde5cb48 (errno: Input/output error [5])
Read Error on Cell Disk FD_00_dm01celadm12 (/dev/nvme3n1) at device offset 924221440 bytes with size 16384 bytes membuf 0x6001d91de000, bioreq 0x600004526538 (errno: Input/output error [5])
Read Error on Cell Disk FD_00_dm01celadm12 (/dev/nvme3n1) at device offset 4483727360 bytes with size 16384 bytes membuf 0x6001d885a000, bioreq 0x600003e77518 (errno: Input/output error [5])
Read Error on Cell Disk FD_00_dm01celadm12 (/dev/nvme3n1) at device offset 33554432 bytes with size 512 bytes membuf 0x6001cbbece00, bioreq 0x6002cbe1c108 (errno: Input/output error [5])
Read Error on Cell Disk FD_00_dm01celadm12 (/dev/nvme3n1) at device offset 33554432 bytes with size 512 bytes membuf 0x6001cbad3400, bioreq 0x6002cc8b5ab8 (errno: Input/output error [5])
Read Error on Cell Disk FD_00_dm01celadm12 (/dev/nvme3n1) at device offset 5584060416 bytes with size 131072 bytes membuf 0x601379500000, bioreq 0x6002d0c7f578 (errno: Input/output error [5])
Read Error on Cell Disk FD_00_dm01celadm12 (/dev/nvme3n1) at device offset 4483727360 bytes with size 16384 bytes membuf 0x6001d7bc6000, bioreq 0x600003da6d60 (errno: Input/output error [5])
Max number of IO Error messages for FD_00_dm01celadm12 have been logged, further IO error messages for this device are temporary disabled
Mon Mar 27 23:01:45 2023 961 msec State dump completed for CELLSRV<16314>
2023-03-27T23:02:13.900399+08:00
[RS] Stopped Service CELLSRV
2023-03-27T23:02:13.911836+08:00
[RS] Started monitoring process /opt/oracle/cell/cellsrv/bin/cellrsomt with pid 12591
[RS] Previously detected 1 hang(s) for service CELLSRV. Using heartbeat timeout of 8 seconds.
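
To put a number on the interruption, the two timestamps in the excerpt above can be subtracted. This is a minimal sketch: the variable and function names are illustrative, and it only measures the gap between the hang being reported and RS logging "Stopped Service CELLSRV", not the time until the cell was fully serving I/O again.

```python
from datetime import datetime

# Timestamps copied verbatim from the alert log excerpt above.
hang_detected = datetime.fromisoformat("2023-03-27T23:01:44.912828+08:00")
service_stopped = datetime.fromisoformat("2023-03-27T23:02:13.900399+08:00")


def elapsed_seconds(start, end):
    """Return the number of seconds between two timezone-aware timestamps."""
    return (end - start).total_seconds()


gap = elapsed_seconds(hang_detected, service_stopped)
print(f"Hang detected -> CELLSRV stopped: {gap:.3f} s")
```

On this log the gap is roughly 29 seconds.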

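The log shows that when RS-7445 was raised, the flash disk /dev/nvme3n1 (cell disk FD_00_dm01celadm12) was returning I/O errors: mostly failed reads, plus one failed write.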
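
When the alert log is long, the I/O error lines can be tallied per device instead of being read by eye. The following is a rough sketch, not an Oracle tool: the regular expression is modeled only on the "Read Error"/"Write Error" lines shown above, and `summarize_io_errors` and `sample` are hypothetical names introduced for illustration.

```python
import re
from collections import Counter

# Pattern modeled on the I/O error lines in the alert log excerpt above;
# other Exadata software versions may format these lines differently.
IO_ERROR = re.compile(
    r"(?P<op>Read|Write) Error on Cell Disk (?P<celldisk>\S+) "
    r"\((?P<device>[^)]+)\) at device offset (?P<offset>\d+) bytes "
    r"with size (?P<size>\d+) bytes"
)


def summarize_io_errors(log_text):
    """Count I/O errors per (cell disk, device, operation) and per device offset."""
    by_device = Counter()
    by_offset = Counter()
    for m in IO_ERROR.finditer(log_text):
        by_device[(m["celldisk"], m["device"], m["op"])] += 1
        by_offset[int(m["offset"])] += 1
    return by_device, by_offset


# Three lines copied from the log above, used as sample input.
sample = """\
Read Error on Cell Disk FD_00_dm01celadm12 (/dev/nvme3n1) at device offset 924221440 bytes with size 16384 bytes membuf 0x6001d80be000, bioreq 0x600003dbf5d0 (errno: Input/output error [5])
Write Error on Cell Disk FD_00_dm01celadm12 (/dev/nvme3n1) at device offset 19931332608 bytes with size 512 bytes membuf 0x6001cbb51400, bioreq 0x600004647cf8 (errno: Input/output error [5])
Read Error on Cell Disk FD_00_dm01celadm12 (/dev/nvme3n1) at device offset 33554432 bytes with size 512 bytes membuf 0x6001cbbece00, bioreq 0x6002cbe1c108 (errno: Input/output error [5])
"""

by_device, by_offset = summarize_io_errors(sample)
for (celldisk, device, op), count in sorted(by_device.items()):
    print(f"{celldisk} {device} {op}: {count}")
```

In practice the whole alert log would be read from disk and passed in as `log_text`. If every error points at one device, as here, that is consistent with a single failing flash disk rather than a wider problem.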

3. A search of My Oracle Support (MOS) turns up two documents: "Exadata: Cell Service crash with RS-7445 [SERV CELLSRV HANG DETECTED] during a flash disk failure (Doc ID 2486713.1)" and "Exadata: Database performance issues or outages after a flash disk failure, Cell Service may crash with RS-7445 [Serv CELLSRV hang detected] (Doc ID 2584475.1)".

In short, I/O failures on the flash disk caused the CELLSRV service to hang.

4. The storage server software will need to be upgraded later to resolve the CELLSRV hang.

From: https://www.cnblogs.com/missyou-shiyh/p/17264816.html
