联系:手机/微信(+86 17813235971) QQ(107644445)
作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]
一套近200T的,6个节点的RAC,由于存储管线链路不稳定,导致服务器经常性掉盘,引起asm 磁盘组频繁dismount/mount,数据库集群节点不停的重启,修复好链路问题之后,数据库启动报ORA-01113,ORA-01110
通过Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check)脚本检测,发现有10个数据文件异常,无法正常恢复
该库比较大,有近200T,因此恢复需要各位谨慎(无法做现场备份,另外客户要求2天时间必须恢复好)
由于数据库是非归档模式,该库无法通过应用归档日志来实现对这些文件进行恢复,对于这种情况,直接使用dbms_diskgroup把数据文件头拷贝到文件系统中,类似操作
SQL> @dbms_diskgroup_get_block.sql +DATA/xifenfei.dbf 1 1 /tmp/xff/xifenfei.dbf.header
Parameter 1:
ASM_file_name (required)
Parameter 2:
block_to_extract (required)
Parameter 3
number_of_blocks_to_extract (required)
Parameter 4:
FileSystem_File_Name (required)
old 14: v_AsmFilename := '&ASM_File_Name' ;
new 14: v_AsmFilename := '+DATA/xifenfei.dbf' ;
old 15: v_offstart := '&block_to_extract' ;
new 15: v_offstart := '1' ;
old 16: v_numblks := '&number_of_blocks_to_extract' ;
new 16: v_numblks := '1' ;
old 17: v_FsFilename := '&FileSystem_File_Name' ;
new 17: v_FsFilename := '/tmp/xff/xifenfei.dbf.header' ;
File: +DATA/xifenfei.dbf
Type: 2 Data File
Size ( in logical blocks): 3978880
Logical Block Size : 16384
Physical Block Size : 512
PL/SQL procedure successfully completed.
|
然后通过bbed修改相关scn
BBED> set filename 'xifenfei.dbf.header'
FILENAME xifenfei.dbf.header
BBED> set blocksize 16384
BLOCKSIZE 16384
BBED> map
File: xifenfei.dbf.header (0)
Block: 1 Dba:0x00000000
------------------------------------------------------------
Data File Header
struct kcvfh, 860 bytes @0
ub4 tailchk @16380
BBED> p kcvfh.kcvfhckp.kcvcpscn
struct kcvcpscn, 8 bytes @484
ub4 kscnbas @484 0xa8061324
ub2 kscnwrp @488 0x0081
BBED> assign file 295 block 1 kcvfh.kcvfhckp.kcvcpscn = file 1 block 1 kcvfh.kcvfhckp.kcvcpscn;
struct kcvcpscn, 8 bytes @484
ub4 kscnbas @484 0xa8133e2b
ub2 kscnwrp @488 0x0081
|
然后把修改的数据文件头写回到asm中
SQL> @dbms_diskgroup_cp_block_to_asm.sql /tmp/xff/xifenfei .dbf.header +DATA /xifenfei .dbf 1 1
Parameter 1:
v_FsFileName (required)
Parameter 2:
v_AsmFileName (required)
Parameter 3
v_offstart (required)
Parameter 4
v_numblks (required)
old 16: v_FsFileName := '&v_FsFileName' ;
new 16: v_FsFileName := '/tmp/xff/xifenfei.dbf.header' ;
old 17: v_AsmFileName := '&v_AsmFileName' ;
new 17: v_AsmFileName := '+DATA/xifenfei.dbf' ;
old 18: v_offstart := '&v_offstart' ;
new 18: v_offstart := '1' ;
old 19: v_numblks := '&v_numblks' ;
new 19: v_numblks := '1' ;
File: +DATA /xifenfei .dbf
Type: 2 Data File
Size ( in logical blocks): 3978880
Logical Block Size: 16384
PL /SQL procedure successfully completed.
|
查询文件头是否修改成功
[oracle@xff1 xff]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Sat Aug 10 16:45:02 2024
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to :
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL> set numw 16
SQL> select CHECKPOINT_CHANGE# from v$datafile_header where file# in (1,295);
CHECKPOINT_CHANGE#
------------------
556870614571
556870614571
SQL> recover datafile 295;
Media recovery complete.
|
通过上述操作,确认bbed修改文件头成功,后续类似方法对其他9个文件进行修改,并打开数据库
SQL> recover database ;
Media recovery complete.
SQL> alter database open ;
Database altered.
|
alert日志提示
Sat Aug 10 16:46:11 2024
ALTER DATABASE RECOVER datafile 295
Media Recovery Start
Serial Media Recovery started
WARNING! Recovering data file 295 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command .
Media Recovery Complete (xff1)
Completed: ALTER DATABASE RECOVER datafile 295
Sat Aug 10 16:46:39 2024
ALTER DATABASE RECOVER database
Media Recovery Start
started logmerger process
Sat Aug 10 16:46:51 2024
WARNING! Recovering data file 1139 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command .
WARNING! Recovering data file 1140 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command .
WARNING! Recovering data file 1601 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command .
WARNING! Recovering data file 1803 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command .
WARNING! Recovering data file 1827 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command .
WARNING! Recovering data file 1931 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command .
WARNING! Recovering data file 2185 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command .
WARNING! Recovering data file 2473 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command .
WARNING! Recovering data file 2616 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command .
Sat Aug 10 16:46:54 2024
Parallel Media Recovery started with 64 slaves
Media Recovery Complete (xff1)
Completed: ALTER DATABASE RECOVER database
Sat Aug 10 17:19:58 2024
alter database open
This instance was first to open
Sat Aug 10 17:19:58 2024
SUCCESS: diskgroup DATA was mounted
Sat Aug 10 17:19:58 2024
NOTE: dependency between database xff and diskgroup resource ora.DATA.dg is established
Sat Aug 10 17:20:10 2024
Picked broadcast on commit scheme to generate SCNs
Sat Aug 10 17:20:10 2024
SUCCESS: diskgroup REDO was mounted
Sat Aug 10 17:20:10 2024
NOTE: dependency between database xff and diskgroup resource ora.REDO.dg is established
Thread 1 opened at log sequence 124958
Current log # 14 seq# 124958 mem# 0: +REDO/xff/log2.ora
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sat Aug 10 17:20:14 2024
SMON: enabling cache recovery
Instance recovery: looking for dead threads
Instance recovery: lock domain invalid but no dead threads
[33770] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:261099864 end:261100854 diff :990 (9 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Sat Aug 10 17:20:16 2024
minact-scn: Inst 1 is now the master inc #:2 mmon proc-id:33650 status:0x7
minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000
Starting background process GTX0
Sat Aug 10 17:20:16 2024
GTX0 started with pid=45, OS id =34119
Starting background process RCBG
Sat Aug 10 17:20:16 2024
RCBG started with pid=46, OS id =34121
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Sat Aug 10 17:20:16 2024
QMNC started with pid=47, OS id =34134
Starting background process SMCO
Completed: alter database open
|
检查数据字典一致性
SQL> @hcheck.sql
HCheck Version 07MAY18 on 10-AUG-2024 18:24:49
----------------------------------------------
Catalog Version 11.2.0.3.0 (1102000300)
db_name: XFF
Catalog Fixed
Procedure Name Version Vs Release Timestamp
Result
------------------------------ ... ---------- -- ---------- --------------
------
.- LobNotInObj ... 1102000300 <= * All Rel* 08/10 18:24:49 PASS
.- MissingOIDOnObjCol ... 1102000300 <= * All Rel* 08/10 18:24:49 PASS
.- SourceNotInObj ... 1102000300 <= * All Rel* 08/10 18:24:49 PASS
.- OversizedFiles ... 1102000300 <= * All Rel* 08/10 18:24:50 PASS
.- PoorDefaultStorage ... 1102000300 <= * All Rel* 08/10 18:24:50 PASS
.- PoorStorage ... 1102000300 <= * All Rel* 08/10 18:24:50 PASS
.- TabPartCountMismatch ... 1102000300 <= * All Rel* 08/10 18:24:50 PASS
.- OrphanedTabComPart ... 1102000300 <= * All Rel* 08/10 18:24:50 PASS
.- MissingSum$ ... 1102000300 <= * All Rel* 08/10 18:24:50 PASS
.- MissingDir$ ... 1102000300 <= * All Rel* 08/10 18:24:50 PASS
.- DuplicateDataobj ... 1102000300 <= * All Rel* 08/10 18:24:50 PASS
.- ObjSynMissing ... 1102000300 <= * All Rel* 08/10 18:24:51 PASS
.- ObjSeqMissing ... 1102000300 <= * All Rel* 08/10 18:24:51 PASS
.- OrphanedUndo ... 1102000300 <= * All Rel* 08/10 18:24:51 PASS
.- OrphanedIndex ... 1102000300 <= * All Rel* 08/10 18:24:51 PASS
.- OrphanedIndexPartition ... 1102000300 <= * All Rel* 08/10 18:24:51 PASS
.- OrphanedIndexSubPartition ... 1102000300 <= * All Rel* 08/10 18:24:52 PASS
.- OrphanedTable ... 1102000300 <= * All Rel* 08/10 18:24:52 PASS
.- OrphanedTablePartition ... 1102000300 <= * All Rel* 08/10 18:24:52 PASS
.- OrphanedTableSubPartition ... 1102000300 <= * All Rel* 08/10 18:24:52 PASS
.- MissingPartCol ... 1102000300 <= * All Rel* 08/10 18:24:52 PASS
.- OrphanedSeg$ ... 1102000300 <= * All Rel* 08/10 18:24:52 PASS
.- OrphanedIndPartObj# ... 1102000300 <= * All Rel* 08/10 18:24:52 PASS
.- DuplicateBlockUse ... 1102000300 <= * All Rel* 08/10 18:24:52 PASS
.- FetUet ... 1102000300 <= * All Rel* 08/10 18:24:52 PASS
.- Uet0Check ... 1102000300 <= * All Rel* 08/10 18:24:52 PASS
.- SeglessUET ... 1102000300 <= * All Rel* 08/10 18:24:52 PASS
.- BadInd$ ... 1102000300 <= * All Rel* 08/10 18:24:52 PASS
.- BadTab$ ... 1102000300 <= * All Rel* 08/10 18:24:53 PASS
.- BadIcolDepCnt ... 1102000300 <= * All Rel* 08/10 18:24:53 PASS
.- ObjIndDobj ... 1102000300 <= * All Rel* 08/10 18:24:53 PASS
.- TrgAfterUpgrade ... 1102000300 <= * All Rel* 08/10 18:24:53 PASS
.- ObjType0 ... 1102000300 <= * All Rel* 08/10 18:24:53 PASS
.- BadOwner ... 1102000300 <= * All Rel* 08/10 18:24:53 PASS
.- StmtAuditOnCommit ... 1102000300 <= * All Rel* 08/10 18:24:53 PASS
.- BadPublicObjects ... 1102000300 <= * All Rel* 08/10 18:24:53 PASS
.- BadSegFreelist ... 1102000300 <= * All Rel* 08/10 18:24:53 PASS
.- BadDepends ... 1102000300 <= * All Rel* 08/10 18:24:53 PASS
.- CheckDual ... 1102000300 <= * All Rel* 08/10 18:24:53 PASS
.- ObjectNames ... 1102000300 <= * All Rel* 08/10 18:24:53 PASS
.- BadCboHiLo ... 1102000300 <= * All Rel* 08/10 18:24:54 PASS
.- ChkIotTs ... 1102000300 <= * All Rel* 08/10 18:24:54 PASS
.- NoSegmentIndex ... 1102000300 <= * All Rel* 08/10 18:24:54 PASS
.- BadNextObject ... 1102000300 <= * All Rel* 08/10 18:24:54 PASS
.- DroppedROTS ... 1102000300 <= * All Rel* 08/10 18:24:54 PASS
.- FilBlkZero ... 1102000300 <= * All Rel* 08/10 18:24:54 PASS
.- DbmsSchemaCopy ... 1102000300 <= * All Rel* 08/10 18:24:54 PASS
.- OrphanedObjError ... 1102000300 > 1102000000 08/10 18:24:54 PASS
.- ObjNotLob ... 1102000300 <= * All Rel* 08/10 18:24:54 PASS
.- MaxControlfSeq ... 1102000300 <= * All Rel* 08/10 18:24:55 PASS
.- SegNotInDeferredStg ... 1102000300 > 1102000000 08/10 18:25:18 PASS
.- SystemNotRfile1 ... 1102000300 > 902000000 08/10 18:25:18 PASS
.- DictOwnNonDefaultSYSTEM ... 1102000300 <= * All Rel* 08/10 18:25:18 PASS
.- OrphanTrigger ... 1102000300 <= * All Rel* 08/10 18:25:18 PASS
.- ObjNotTrigger ... 1102000300 <= * All Rel* 08/10 18:25:18 PASS
---------------------------------------
10-AUG-2024 18:25:18 Elapsed: 29 secs
---------------------------------------
Found 0 potential problem(s) and 0 warning(s)
PL/SQL procedure successfully completed.
Statement processed.
Complete output is in trace file:
/u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_71148_HCHECK.trc
|
运气不错,数据字典本身没有损坏,业务直接运行,一切正常(主要原因是在光纤链路不稳定的情况下,客户已经没有往库中写入数据)
- ORA-00333 ORA-01595 恢复
- ORA-00316 ORA-00312故障处理
- 断电引起文件scn异常数据库恢复
- 硬件故障导致ORA-600 2662错误处理
- 记录一次ORA-600 3004 恢复过程和处理思路
- ORA-600 ktbsdp2 处理
- resetlogs失败故障恢复-ORA-01555
- Oracle断电故障处理
- Oracle Recovery Tools快速恢复ORA-19909
- ORA-00742 ORA-00312 恢复
- 分布式存储故障导致数据库无法启动故障处理
- 存储故障后oracle报—ORA-01122/ORA-01207故障处理