首页 > 数据库 >存储宕机导致Oracle异常故障处理---惜分飞

存储宕机导致Oracle异常故障处理---惜分飞

时间:2024-07-28 22:41:52浏览次数:13  
标签:2024 recovery 宕机 分飞 xff --- xff1 DATA ORA

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:由于存储宕机导致Oracle异常故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

存储突然掉线,导致数据库crash,报大量ORA-00206 ORA-00202 ORA-15081以及Linux-x86_64 Error: 5: Input/output error之类的错误

Sun Jul 21 20:00:11 2024 Thread 1 advanced to log sequence 1594398 (LGWR switch)   Current log# 5 seq# 1594398 mem# 0: +DATA/xff/onlinelog/group_5.412.906718739 Sun Jul 21 20:53:17 2024 WARNING: Write Failed. group:2 disk:0 AU:506916 offset:49152 size:16384 Sun Jul 21 20:53:17 2024 WARNING: Read Failed. group:2 disk:2 AU:506931 offset:49152 size:16384 WARNING: failed to read mirror side 1 of virtual extent 4 logical extent 0 of file 415 in group [2.34109396] from disk ORACLE_DATA_0002  allocation unit 506931 reason error; if possible, will try another mirror side Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_ckpt_42142.trc: ORA-15080: 与磁盘的同步 I/O 操作失败 ORA-27061: 异步 I/O 等待失败 Linux-x86_64 Error: 5: Input/output error Additional information: -1 Additional information: 16384 WARNING: failed to write mirror side 1 of virtual extent 0 logical extent 0 of file 415 in group 2 on disk 0 allocation unit 506916 Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_ckpt_42142.trc: ORA-00206: 写入控制文件时出错 (块 3, # 块 1) ORA-00202: 控制文件: ''+DATA/xff/controlfile/current.415.906718737'' ORA-15081: 无法将 I/O 操作提交到磁盘 ORA-15081: 无法将 I/O 操作提交到磁盘 Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_ckpt_42142.trc: ORA-00221: 写入控制文件时出错 ORA-00206: 写入控制文件时出错 (块 3, # 块 1) ORA-00202: 控制文件: ''+DATA/xff/controlfile/current.415.906718737'' ORA-15081: 无法将 I/O 操作提交到磁盘 ORA-15081: 无法将 I/O 操作提交到磁盘 CKPT (ospid: 42142): terminating the instance due to error 221 Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_lmon_42087.trc: ORA-00202: 控制文件: ''+DATA/xff/controlfile/current.415.906718737'' ORA-15081: 无法将 I/O 操作提交到磁盘 ORA-27072: 文件 I/O 错误 Linux-x86_64 Error: 5: Input/output error Additional information: 4 Additional information: 1038194784 Additional information: -1 Sun Jul 21 20:53:19 2024 ORA-1092 : opitsk aborting process Sun Jul 21 20:53:24 2024 ORA-1092 : opitsk aborting process Sun Jul 21 20:53:24 2024 License high water mark = 59 Sun Jul 21 20:53:28 2024 Instance terminated by CKPT, pid = 42142 USER (ospid: 64660): terminating the instance Instance terminated by USER, pid = 64660

存储恢复之后启动数据库报ORA-600 2131错误

Mon Jul 22 09:10:04 2024 ALTER DATABASE   MOUNT This instance was first to mount Mon Jul 22 09:10:04 2024 Sweep [inc][490008]: completed Sweep [inc2][490008]: completed NOTE: Loaded library: System SUCCESS: diskgroup ORACLE_DATA was mounted NOTE: dependency between database rac and diskgroup resource ora.ORACLE_DATA.dg is established Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_ora_14301.trc  (incident=492409): ORA-00600: ??????, ??: [2131], [33], [32], [], [], [], [], [], [], [], [], [] Incident details in: /users/oracle/app/db/diag/rdbms/xff/xff1/incident/incdir_492409/xff1_ora_14301_i492409.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. ORA-600 signalled during: ALTER DATABASE   MOUNT...

客户尝试重建ctl进行恢复,结果由于分析不正确,导致在重建ctl的时候,遗漏了3个数据文件,并且在屏蔽一致性的情况下,强制resetlogs操作,结果数据库没有被正常打开,而是报ORA-600 2662错误

alter database open resetlogs RESETLOGS is being done without consistancy checks. This may result in a corrupted database. The database should be recreated. RESETLOGS after incomplete recovery UNTIL CHANGE 9965567206652 Clearing online redo logfile 1 +DATA/xff/onlinelog/group_1.414.906718739 Clearing online log 1 of thread 1 sequence number 0 Clearing online redo logfile 1 complete Clearing online redo logfile 2 +DATA/xff/onlinelog/group_2.413.906718739 Clearing online log 2 of thread 1 sequence number 0 Clearing online redo logfile 2 complete Clearing online redo logfile 5 +DATA/xff/onlinelog/group_5.412.906718739 Clearing online log 5 of thread 1 sequence number 0 Clearing online redo logfile 5 complete Expanded controlfile section 2 from 1 to 63 records The number of logical blocks in section 2 remains the same Expanded controlfile section 1 from 4 to 66 records Requested to grow by 62 records; added 32 blocks of records Expanded controlfile section 30 from 1 to 63 records The number of logical blocks in section 30 remains the same Expanded controlfile section 29 from 1 to 63 records The number of logical blocks in section 29 remains the same Control file has been expanded to support 63 threads Mon Jul 22 23:04:07 2024 Redo thread 2 enabled by open resetlogs or standby activation Online log +DATA/xff/onlinelog/group_1.414.906718739: Thread 1 Group 1 was previously cleared Online log +DATA/xff/onlinelog/group_2.413.906718739: Thread 1 Group 2 was previously cleared Online log +DATA/xff/onlinelog/group_3.501.1175036643: Thread 2 Group 3 was previously cleared Online log +DATA/xff/onlinelog/group_4.502.1175036645: Thread 2 Group 4 was previously cleared Online log +DATA/xff/onlinelog/group_5.412.906718739: Thread 1 Group 5 was previously cleared Mon Jul 22 23:04:08 2024 Setting recovery target incarnation to 2 Initializing SCN for created control file Database SCN compatibility initialized to 3 Warning - High Database SCN: Current SCN value is 9965567206655, threshold SCN value is 0 If you have not previously reported this warning on this database, please notify Oracle Support so that additional diagnosis can be performed. Mon Jul 22 23:04:09 2024 Assigning activation ID 2763017873 (0xa4b04e91) Thread 1 opened at log sequence 1   Current log# 1 seq# 1 mem# 0: +DATA/xff/onlinelog/group_1.414.906718739 Successful open of redo thread 1 MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set Mon Jul 22 23:04:10 2024 SMON: enabling cache recovery Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_ora_64210.trc  (incident=624374): ORA-00600: 内部错误代码, 参数: [2662], [2320], [1243079939], [2320], [1243211805], [12583040], [], [], [], [], [], [] Incident details in: /users/oracle/app/db/diag/rdbms/xff/xff1/incident/incdir_624374/xff1_ora_64210_i624374.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_ora_64210.trc: ORA-00600: 内部错误代码, 参数: [2662], [2320], [1243079939], [2320], [1243211805], [12583040], [], [], [], [], [], [] Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_ora_64210.trc: ORA-00600: 内部错误代码, 参数: [2662], [2320], [1243079939], [2320], [1243211805], [12583040], [], [], [], [], [], [] Error 600 happened during db open, shutting down database USER (ospid: 64210): terminating the instance due to error 600 Instance terminated by USER, pid = 64210 ORA-1092 signalled during: alter database open resetlogs...

操作到这里,后续问题就比较麻烦了,因为在asm磁盘组中数据文件重建ctl的时候遗漏3个并且还被resetlogs操作过,导致这三个文件的resetlogs scn和其他数据文件不一致,对于这个问题,解决办法通过Oracle Recovery Tools工具或者bbed修改相关resetlogs scn,然后重建ctl

SQL> @rectl.sql   Control file created.   SQL> RECOVER DATABASE; Media recovery complete

然后解决之前数据库启动报ORA-600 2662问题,通过修改数据库scn进行解决,可以使用Patch_SCN工具进行快速解决,然后open数据库成功

SQL> ALTER DATABASE OPEN;    Database altered.

但是查看alert日志数据库报大量ORA-600 4194、ORA-01595Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0xC21D511] [PC:0x97F4EFA, kgegpa()+40]之类错误

Wed Jul 24 15:24:21 2024 alter database open Beginning crash recovery of 1 threads  parallel recovery started with 32 processes Started redo scan Completed redo scan  read 0 KB redo, 0 data blocks need recovery ………… Database Characterset is ZHS16GBK No Resource Manager plan active Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_smon_40279.trc  (incident=777938): ORA-00600: 内部错误代码, 参数: [4194], [], [], [], [], [], [], [], [], [], [], [] Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. replication_dependency_tracking turned off (no async multimaster replication found) Starting background process QMNC Wed Jul 24 15:24:40 2024 QMNC started with pid=79, OS id=54632 Block recovery from logseq 2, block 74 to scn 9965587206835 Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0   Mem# 0: +DATA/xff/onlinelog/redo02 LOGSTDBY: Validating controlfile with logical metadata Wed Jul 24 15:24:40 2024 Block recovery stopped at EOT rba 2.82.16 Block recovery completed at rba 2.82.16, scn 2320.1263080114 Block recovery from logseq 2, block 74 to scn 9965587206833 Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0   Mem# 0: +DATA/xff/onlinelog/redo02 Block recovery completed at rba 2.82.16, scn 2320.1263080114 Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_smon_40279.trc: ORA-01595: 释放区 (4) 回退段 (20) 时出错 ORA-00600: 内部错误代码, 参数: [4194], [], [], [], [], [], [], [], [], [], [], [] LOGSTDBY: Validation complete Wed Jul 24 15:24:41 2024 Sweep [inc][777938]: completed Sweep [inc2][777938]: completed Wed Jul 24 15:24:41 2024 Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_q001_54657.trc  (incident=778362): ORA-00600: 内部错误代码, 参数: [4194], [], [], [], [], [], [], [], [], [], [], [] Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Starting background process SMCO Wed Jul 24 15:24:42 2024 SMCO started with pid=83, OS id=54691 Block recovery from logseq 2, block 74 to scn 9965587206835 Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0   Mem# 0: +DATA/xff/onlinelog/redo02 Block recovery completed at rba 2.82.16, scn 2320.1263080118 Block recovery from logseq 2, block 74 to scn 9965587206838 Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0   Mem# 0: +DATA/xff/onlinelog/redo02 Block recovery completed at rba 2.83.16, scn 2320.1263080119 Error 600 in kwqmnpartition(), aborting txn Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_q001_54657.trc  (incident=778363): ORA-25319: 队列表重新分区已中止 Completed: alter database open Block recovery from logseq 2, block 74 to scn 9965587206835 Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0   Mem# 0: +DATA/rac/onlinelog/redo02 Block recovery completed at rba 2.82.16, scn 2320.1263080118 Block recovery from logseq 2, block 74 to scn 9965587207538 Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0   Mem# 0: +DATA/rac/onlinelog/redo02 Block recovery completed at rba 2.1097.16, scn 2320.1263080819 Errors in file /users/oracle/app/db/diag/rdbms/rac/rac1/trace/rac1_cjq0_55657.trc  (incident=778427): ORA-00600: 内部错误代码, 参数: [600], [ORA-00600: 内部错误代码, 参数: [4194], [], [], [], [], [], [], [], [], [], [], []], [], [], [], [], [], [], [], [], [], [] Incident details in: /users/oracle/app/db/diag/rdbms/xff/xff1/incident/incdir_778427/xff1_cjq0_55657_i778427.trc Exception [type:SIGSEGV, Address not mapped to object][ADDR:0xC21D511][PC:0x97F4EFA, kgegpa()+40][flags: 0x0, count: 1] Exception [type:SIGSEGV, Address not mapped to object][ADDR:0xC21D511][PC:0x97F396E, kgebse()+776][flags: 0x2, count: 2] Exception [type:SIGSEGV, Address not mapped to object][ADDR:0xC21D511][PC:0x97F396E, kgebse()+776][flags: 0x2, count: 2]

从报错分析是由于undo异常导致,处理异常undo回滚段之后,数据库open正常,安排逻辑迁移数据,完成本次恢复

标签:2024,recovery,宕机,分飞,xff,---,xff1,DATA,ORA
From: https://www.cnblogs.com/xifenfei/p/18329019

相关文章

  • [学习笔记] 阶 & 原根 - 数论
    较为冷门(?)的数论知识,但在解决一些特殊问题上有着重要的作用。整数的阶根据欧拉定理有正整数\(n\)和一个与\(n\)互素的整数\(a\),那么有$a^{\phi(n)}\equiv1\pmod{n}$。因此至少存在一个整数满足这个方程。并且由良序原理可得一定存在一个最小正整数满足这个方程。、......
  • 攻防世界-简单的图片
    攻防世界-简单的图片1首先拿到图片是一张非常朴实的图片用010打开发现没什么特别之处因为是png图片猜测是lsb隐写,这里一开始我用的zsteg发现提取不全用Stegsolve打开看看,尝试了半天发现什么特殊信息,最后发现位平面顺序即bitplaneorder这一选项要选择BGR,得到下图发现有......
  • Nacos-微服务注册中⼼(Nacos简介 Nacos配置管理)
    目录一、 微服务的注册中⼼1.注册中⼼的主要作⽤ 2.常⻅的注册中⼼二、Nacos简介 nacos实战⼊⻔1.搭建nacos环境2.将订单微服务注册到nacos2.1 在pom.xml中添加nacos的依赖2.2在主类上添加@EnableDiscoveryClient注解2.3在application.yml中添加nacos服务......
  • OI 回忆录 2 - 我的学长们
    cxr说起学长,这就不得不先提到cxr了。他自己本身就非常强,高一进入集训队,创造qzez历史,于是在高二的时候非常照顾初三的我们日常训练,包括帮助调题、提供模拟赛等等方方面面。每次来到机房的时候cxr总是坐在固定的那个位置,cxr就是我们的老大哥,我们的主心骨。cxr在机房的时......
  • win平台利用winsw将php-cgi作为系统服务,支持服务的正常启动/停止/重启
    首先,需要有winsw,在GitHub搜索winsw,点击release跳转到下载页面选择版本进行下载或点击此链接 Releases·winsw/winsw(github.com)其次,将winsw复制到php目录,重命名为phpcgi-service.exe并增加配置文件phpcgi-service.xml和stop-cgi.bat,其中 phpcgi-service.xml的......
  • java 集合框架-map(键值对集合)
    一、Map接口 (键值对集合)1.实现类(1).线程不安全HashMap1.特点:        ①无序          ②查找效率高:根据key,查找value2.数据结构:数组(哈希表)+链表(链地址法解决哈希表冲突)+红黑树(自平衡二叉树,提高查找效率)      ①数组(哈希表):Ha......
  • CVE-2015-5254
    目录漏洞描述漏洞利用流程如下复现过程漏洞利用思路总结漏洞描述ApacheActiveMQ是由美国Pachitea(Apache)软件基金会开发的开源消息中间件,支持Java消息服务、集群、Spring框架等。影响版本:ApacheActiveMQ5.13.0之前5.x版本,该程序导致的漏洞并不限制可以在......
  • JCR一区级 | Matlab实现SSA-Transformer-LSTM多变量回归预测
    JCR一区级|Matlab实现SSA-Transformer-LSTM多变量回归预测目录JCR一区级|Matlab实现SSA-Transformer-LSTM多变量回归预测效果一览基本介绍程序设计参考资料效果一览基本介绍1.【JCR一区级】Matlab实现SSA-Transformer-LSTM多变量回归预测,麻雀搜索算法(S......
  • vue2 - 详细实现“视频切片/分段加载“播放大视频,解决视频过大加载播放缓慢问题,vue处
    效果图在vue2、nuxt2项目开发中,详解vue视频分片加载,所谓“边播放边加载”,利用axios分段请求后端服务器每次只拿一小段视频慢慢缓存播放,让非常大的视频(例如电影,很长的视频播放太慢)流畅播放,vue2实现将video视频进行切片网络请求加载提升视频加载速度,详细解决视频分段下载......
  • 实验7-1-12 组个最小数
    给定数字0-9各若干个。你可以以任意顺序排列这些数字,但必须全部使用。目标是使得最后得到的数尽可能小(注意0不能做首位)。例如:给定两个0,两个1,三个5,一个8,我们得到的最小的数就是10015558。现给定数字,请编写程序输出能够组成的最小的数。输入格式:输入在一行中给出10个非负整数,......