Test scenario 1 (simulating an NH site failure, with the observer at NH) [Recommended]
-- The master observer is kept at the DR site: prdb19
DGMGRL> show observer

Configuration - dg_config

  Primary:            orcl2dg
  Active Target:      orcl0dg

Observer "prdb19" - Master

  Host Name:                    prdb19
  Last Ping to Primary:         1 second ago
  Last Ping to Target:          2 seconds ago

Observer "prdg19" - Backup

  Host Name:                    prdg19
  Last Ping to Primary:         2 seconds ago
  Last Ping to Target:          3 seconds ago

DGMGRL>

-- Primary database configuration: FastStartFailoverTarget: orcl0dg; this instance is located on: prdg19
DGMGRL> show database verbose orcl2dg

Database - orcl2dg

  Role:               PRIMARY
  Intended State:     TRANSPORT-ON
  Instance(s):
    orcl2dg

  Properties:
    DGConnectIdentifier             = 'prdg19/orcl2dg'
    ObserverConnectIdentifier       = ''
    FastStartFailoverTarget         = 'orcl0dg'    -- set to the local DB

-- The full configuration is as follows:
DGMGRL> SHOW CONFIGURATION lag verbose

Configuration - dg_config

  Protection Mode: MaxPerformance
  Members:
  orcl2dg - Primary database
    orcl0dg - (*) Physical standby database
              Transport Lag:      0 seconds (computed 1 second ago)
              Apply Lag:          0 seconds (computed 1 second ago)
    orcl    - Physical standby database
              Transport Lag:      0 seconds (computed 1 second ago)
              Apply Lag:          0 seconds (computed 1 second ago)

  (*) Fast-Start Failover target

  Properties:
    FastStartFailoverThreshold      = '45'
    OperationTimeout                = '40'
    TraceLevel                      = 'SUPPORT'
    FastStartFailoverLagLimit       = '30'
    CommunicationTimeout            = '180'
    ObserverReconnect               = '5'
    FastStartFailoverAutoReinstate  = 'TRUE'
    FastStartFailoverPmyShutdown    = 'TRUE'
    BystandersFollowRoleChange      = 'ALL'
    ObserverOverride                = 'TRUE'
    ExternalDestination1            = ''
    ExternalDestination2            = ''
    PrimaryLostWriteAction          = 'CONTINUE'
    ConfigurationWideServiceName    = 'orcl_CFG'

Fast-Start Failover: Enabled in Potential Data Loss Mode
  Lag Limit:          30 seconds
  Threshold:          45 seconds
  Active Target:      orcl0dg
  Potential Targets:  "orcl0dg"    -- only the local DB
    orcl0dg    valid
  Observers:      (*) prdb19
                      prdg19
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: 5 seconds
  Observer Override:  TRUE

Configuration Status:
SUCCESS

DGMGRL>
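Site failure simulation: the virtual machine is powered off, so the standby database and the observer lose contact at the same time.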
-- The primary did not shut down; it keeps running normally
DGMGRL> show observer

Configuration - dg_config

  Primary:            orcl2dg
  Active Target:      orcl0dg

Observer "prdg19" - Master

  Host Name:                    prdg19
  Last Ping to Primary:         1 second ago
  Last Ping to Target:          3 seconds ago

Observer "prdb19" - Backup

  Host Name:                    prdb19
  Last Ping to Primary:         131 seconds ago
  Last Ping to Target:          100 seconds ago

DGMGRL>
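When a test like this is repeated, checking the `show observer` output by eye gets tedious. The following is a minimal Python sketch that parses that output, reports which observer is currently the master, and lists observers whose last ping to the primary is older than a limit. The function names `parse_show_observer` and `stale_observers` are hypothetical helpers, and using the configured FastStartFailoverThreshold of 45 seconds as the staleness limit is an assumption made for illustration, not something the broker itself reports.

```python
import re

# Sample text: the `show observer` output captured after the site failure.
SAMPLE = """\
Configuration - dg_config

  Primary:            orcl2dg
  Active Target:      orcl0dg

Observer "prdg19" - Master

  Host Name:                    prdg19
  Last Ping to Primary:         1 second ago
  Last Ping to Target:          3 seconds ago

Observer "prdb19" - Backup

  Host Name:                    prdb19
  Last Ping to Primary:         131 seconds ago
  Last Ping to Target:          100 seconds ago
"""


def parse_show_observer(text):
    """Return {observer_name: {"role", "ping_primary", "ping_target"}}."""
    observers = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.match(r'Observer "([^"]+)" - (\w+)', line)
        if header:
            current = {"role": header.group(2)}
            observers[header.group(1)] = current
            continue
        ping = re.match(r"Last Ping to (Primary|Target):\s+(\d+) seconds? ago", line)
        if ping and current is not None:
            current["ping_" + ping.group(1).lower()] = int(ping.group(2))
    return observers


def stale_observers(observers, limit_seconds):
    """Names of observers whose last ping to the primary exceeds the limit."""
    return sorted(
        name
        for name, info in observers.items()
        if info.get("ping_primary", 0) > limit_seconds
    )


observers = parse_show_observer(SAMPLE)
master = next(name for name, info in observers.items() if info["role"] == "Master")
print("master observer:", master)
# 45 is the FastStartFailoverThreshold from the configuration shown earlier;
# treating it as a staleness limit is an assumption for this sketch.
print("stale observers:", stale_observers(observers, 45))
```

On the sample above, the sketch identifies prdg19 as the master and flags prdb19, which matches what the output shows after the power-off.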
Test scenario 2 (simulating a local site failure, with the observer at the local site) (the primary shuts down) [High risk]
DGMGRL> SET MASTEROBSERVER to prdb19;
Succeeded.
DGMGRL> show observer

Configuration - dg_config

  Primary:            orcl2dg
  Active Target:      orcl0dg

Observer "prdb19" - Master

  Host Name:                    prdb19    -- the local observer is the master
  Last Ping to Primary:         0 seconds ago
  Last Ping to Target:          0 seconds ago

Observer "prdg19" - Backup

  Host Name:                    prdg19
  Last Ping to Primary:         1 second ago
  Last Ping to Target:          0 seconds ago

DGMGRL>
DGMGRL> edit database orcl2dg set property FastStartFailoverTarget='orcl';
Error: ORA-16654: fast-start failover is enabled

Failed.
DGMGRL> disable fast_start failover
Disabled.
DGMGRL> edit database orcl2dg set property FastStartFailoverTarget='orcl';    -- local DB
Property "faststartfailovertarget" updated
DGMGRL> enable fast_start failover
Enabled in Potential Data Loss Mode.
DGMGRL>
DGMGRL> SHOW CONFIGURATION lag verbose

Configuration - dg_config

  Protection Mode: MaxPerformance
  Members:
  orcl2dg - Primary database
    orcl    - (*) Physical standby database
              Transport Lag:      0 seconds (computed 0 seconds ago)
              Apply Lag:          0 seconds (computed 0 seconds ago)
    orcl0dg - Physical standby database
              Transport Lag:      0 seconds (computed 1 second ago)
              Apply Lag:          0 seconds (computed 1 second ago)

  (*) Fast-Start Failover target

  Properties:
    FastStartFailoverThreshold      = '45'
    OperationTimeout                = '40'
    TraceLevel                      = 'SUPPORT'
    FastStartFailoverLagLimit       = '30'
    CommunicationTimeout            = '180'
    ObserverReconnect               = '5'
    FastStartFailoverAutoReinstate  = 'TRUE'
    FastStartFailoverPmyShutdown    = 'TRUE'
    BystandersFollowRoleChange      = 'ALL'
    ObserverOverride                = 'TRUE'
    ExternalDestination1            = ''
    ExternalDestination2            = ''
    PrimaryLostWriteAction          = 'CONTINUE'
    ConfigurationWideServiceName    = 'orcl_CFG'

Fast-Start Failover: Enabled in Potential Data Loss Mode
  Lag Limit:          30 seconds
  Threshold:          45 seconds
  Active Target:      orcl
  Potential Targets:  "orcl"
    orcl       valid
  Observers:      (*) prdb19
                      prdg19
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: 5 seconds
  Observer Override:  TRUE

Configuration Status:
SUCCESS

DGMGRL>
Simulation method: see the figure on this page. The primary shut down:

2024-10-17T13:17:24.515534+08:00
Primary has heard from neither observer nor target standby within FastStartFailoverThreshold seconds.
It is likely an automatic failover has already occurred. Primary is shutting down.
2024-10-17T13:17:24.549536+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl2dg/orcl2dg/trace/orcl2dg_lgwr_4385.trc:
ORA-16830: primary isolated from fast-start failover partners longer than FastStartFailoverThreshold seconds: shutting down
LGWR (ospid: ): terminating the instance due to ORA error
2024-10-17T13:17:24.584808+08:00
System state dump requested by (instance=1, osid=4385 (LGWR)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/orcl2dg/orcl2dg/trace/orcl2dg_diag_4361.trc
2024-10-17T13:17:25.200753+08:00
Dumping diagnostic data in directory=[cdmp_20241017131724], requested by (instance=1, osid=4385 (LGWR)), summary=[abnormal instance termination].
2024-10-17T13:17:26.368261+08:00
Instance terminated by LGWR, pid = 4385
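To confirm after each run whether the primary shut itself down for this reason, the alert log can be scanned for ORA-16830. The sketch below is a minimal Python example, assuming the alert log uses the layout shown above (an ISO 8601 timestamp on its own line, followed by message lines). The function name `find_fsfo_shutdowns` is a hypothetical helper, and the sample lines are copied from the excerpt above.

```python
import re
from datetime import datetime

# A timestamp line as it appears in the alert log excerpt above.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[+-]\d{2}:\d{2}$")

SAMPLE_LINES = [
    "2024-10-17T13:17:24.515534+08:00",
    "Primary has heard from neither observer nor target standby within FastStartFailoverThreshold seconds.",
    "It is likely an automatic failover has already occurred. Primary is shutting down.",
    "2024-10-17T13:17:24.549536+08:00",
    "Errors in file /u01/app/oracle/diag/rdbms/orcl2dg/orcl2dg/trace/orcl2dg_lgwr_4385.trc:",
    "ORA-16830: primary isolated from fast-start failover partners longer than "
    "FastStartFailoverThreshold seconds: shutting down",
    "2024-10-17T13:17:26.368261+08:00",
    "Instance terminated by LGWR, pid = 4385",
]


def find_fsfo_shutdowns(lines):
    """Return (timestamp, message) pairs for every ORA-16830 line.

    Each hit is paired with the most recent timestamp line seen before it,
    or None if no timestamp line preceded it.
    """
    last_timestamp = None
    hits = []
    for raw in lines:
        line = raw.strip()
        if TIMESTAMP.match(line):
            last_timestamp = datetime.fromisoformat(line)
        elif line.startswith("ORA-16830"):
            hits.append((last_timestamp, line))
    return hits


for when, message in find_fsfo_shutdowns(SAMPLE_LINES):
    print(when.isoformat(), message)
```

For a real log, the same function can be fed an open file object instead of the sample list, since it only iterates over lines.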
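Test scenario 3: FastStartFailoverTarget set to multiple names, with the third site listed last and the observer at the third site. The goal was achieved, but today's test behaved differently from yesterday's, so repeated testing is recommended.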
DGMGRL> SET MASTEROBSERVER to prdb19;
Succeeded.
DGMGRL> show observer

Configuration - dg_config

  Primary:            orcl2dg
  Active Target:      orcl0dg

Observer "prdb19" - Master

  Host Name:                    prdb19
  Last Ping to Primary:         0 seconds ago
  Last Ping to Target:          0 seconds ago

Observer "prdg19" - Backup

  Host Name:                    prdg19
  Last Ping to Primary:         1 second ago
  Last Ping to Target:          0 seconds ago

DGMGRL>
DGMGRL> disable fast_start failover
Disabled.
DGMGRL> edit database orcl2dg set property FastStartFailoverTarget='orcl0dg,orcl';
Property "faststartfailovertarget" updated
DGMGRL> enable fast_start failover
Enabled in Potential Data Loss Mode.
DGMGRL>
DGMGRL> SHOW CONFIGURATION lag verbose

Configuration - dg_config

  Protection Mode: MaxPerformance
  Members:
  orcl2dg - Primary database
    orcl0dg - (*) Physical standby database
              Transport Lag:      0 seconds (computed 1 second ago)
              Apply Lag:          0 seconds (computed 1 second ago)
    orcl    - Physical standby database
              Transport Lag:      0 seconds (computed 1 second ago)
              Apply Lag:          0 seconds (computed 1 second ago)

  (*) Fast-Start Failover target

  Properties:
    FastStartFailoverThreshold      = '45'
    OperationTimeout                = '40'
    TraceLevel                      = 'SUPPORT'
    FastStartFailoverLagLimit       = '30'
    CommunicationTimeout            = '180'
    ObserverReconnect               = '5'
    FastStartFailoverAutoReinstate  = 'TRUE'
    FastStartFailoverPmyShutdown    = 'TRUE'
    BystandersFollowRoleChange      = 'ALL'
    ObserverOverride                = 'TRUE'
    ExternalDestination1            = ''
    ExternalDestination2            = ''
    PrimaryLostWriteAction          = 'CONTINUE'
    ConfigurationWideServiceName    = 'orcl_CFG'

Fast-Start Failover: Enabled in Potential Data Loss Mode
  Lag Limit:          30 seconds
  Threshold:          45 seconds
  Active Target:      orcl0dg
  Potential Targets:  "orcl0dg,orcl"
    orcl0dg    valid
    orcl       valid
  Observers:      (*) prdb19
                      prdg19
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: 5 seconds
  Observer Override:  TRUE

Configuration Status:
SUCCESS

DGMGRL>

Failure simulation method: see the figure. The primary did not shut down, and a switch occurred on the broker side; repeated testing is recommended.
Configuration - dg_config

  Primary:            orcl2dg
  Active Target:      orcl0dg

Observer "prdg19" - Master

  Host Name:                    prdg19
  Last Ping to Primary:         3 seconds ago
  Last Ping to Target:          1 second ago

Observer "prdb19" - Backup

  Host Name:                    prdb19
  Last Ping to Primary:         145 seconds ago
  Last Ping to Target:          114 seconds ago

DGMGRL>
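Because this scenario behaved differently across runs, it can help to record the fast-start failover settings in effect before each test. The following Python sketch pulls the key fields out of the Fast-Start Failover summary printed by SHOW CONFIGURATION. The function name `parse_fsfo_summary` is a hypothetical helper, and the sample text is the summary shown for this scenario. In the outputs recorded on this page the active target is the first name listed in the potential targets; the sketch only reports whether that holds for a given run and does not treat it as a guaranteed rule.

```python
import re

# Sample text: the Fast-Start Failover summary from this scenario.
SAMPLE = """\
Fast-Start Failover: Enabled in Potential Data Loss Mode
  Lag Limit:          30 seconds
  Threshold:          45 seconds
  Active Target:      orcl0dg
  Potential Targets:  "orcl0dg,orcl"
    orcl0dg    valid
    orcl       valid
"""


def parse_fsfo_summary(text):
    """Extract lag limit, threshold, active target and potential targets."""
    summary = {}
    for raw in text.splitlines():
        line = raw.strip()
        seconds = re.match(r"(Lag Limit|Threshold):\s+(\d+) seconds", line)
        if seconds:
            summary[seconds.group(1)] = int(seconds.group(2))
            continue
        active = re.match(r"Active Target:\s+(\S+)", line)
        if active:
            summary["Active Target"] = active.group(1)
            continue
        potential = re.match(r'Potential Targets:\s+"([^"]*)"', line)
        if potential:
            summary["Potential Targets"] = potential.group(1).split(",")
    return summary


summary = parse_fsfo_summary(SAMPLE)
print(summary)
# Observation check for this run only: is the active target the first listed name?
print("active target is first in list:",
      summary["Potential Targets"][0] == summary["Active Target"])
```

Saving one such record per run, together with the observer status afterwards, makes it easier to compare runs that behave differently.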
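Test scenario 4: FastStartFailoverTarget set to multiple names, with the third site listed first and the observer at the third site. The result was the same as yesterday's; not recommended.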
DGMGRL> SET MASTEROBSERVER to prdb19;
Succeeded.
DGMGRL> show observer

Configuration - dg_config

  Primary:            orcl2dg
  Active Target:      orcl0dg

Observer "prdb19" - Master

  Host Name:                    prdb19
  Last Ping to Primary:         0 seconds ago
  Last Ping to Target:          0 seconds ago

Observer "prdg19" - Backup

  Host Name:                    prdg19
  Last Ping to Primary:         1 second ago
  Last Ping to Target:          0 seconds ago

DGMGRL>
DGMGRL> disable fast_start failover
Disabled.
DGMGRL> edit database orcl2dg set property FastStartFailoverTarget='orcl,orcl0dg';
Property "faststartfailovertarget" updated
DGMGRL> enable fast_start failover
Enabled in Potential Data Loss Mode.
DGMGRL>
DGMGRL> SHOW CONFIGURATION lag verbose

  Protection Mode: MaxPerformance
  Members:
  orcl2dg - Primary database
    orcl    - (*) Physical standby database
              Transport Lag:      0 seconds (computed 0 seconds ago)
              Apply Lag:          0 seconds (computed 0 seconds ago)
    orcl0dg - Physical standby database
              Transport Lag:      0 seconds (computed 1 second ago)
              Apply Lag:          0 seconds (computed 1 second ago)

  (*) Fast-Start Failover target

  Properties:
    FastStartFailoverThreshold      = '45'
    OperationTimeout                = '40'
    TraceLevel                      = 'SUPPORT'
    FastStartFailoverLagLimit       = '30'
    CommunicationTimeout            = '180'
    ObserverReconnect               = '5'
    FastStartFailoverAutoReinstate  = 'TRUE'
    FastStartFailoverPmyShutdown    = 'TRUE'
    BystandersFollowRoleChange      = 'ALL'
    ObserverOverride                = 'TRUE'
    ExternalDestination1            = ''
    ExternalDestination2            = ''
    PrimaryLostWriteAction          = 'CONTINUE'
    ConfigurationWideServiceName    = 'orcl_CFG'

Fast-Start Failover: Enabled in Potential Data Loss Mode
  Lag Limit:          30 seconds
  Threshold:          45 seconds
  Active Target:      orcl
  Potential Targets:  "orcl,orcl0dg"
    orcl       valid
    orcl0dg    valid
  Observers:      (*) prdb19
                      prdg19
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: 5 seconds
  Observer Override:  TRUE

Configuration Status:
SUCCESS

Failure simulation method: see the figure.
The primary shut down:

2024-10-17T13:37:50.874595+08:00
Primary has heard from neither observer nor target standby within FastStartFailoverThreshold seconds.
It is likely an automatic failover has already occurred. Primary is shutting down.
2024-10-17T13:37:50.910293+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl2dg/orcl2dg/trace/orcl2dg_lgwr_1607.trc:
ORA-16830: primary isolated from fast-start failover partners longer than FastStartFailoverThreshold seconds: shutting down
LGWR (ospid: ): terminating the instance due to ORA error
2024-10-17T13:37:50.947834+08:00
System state dump requested by (instance=1, osid=1607 (LGWR)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/orcl2dg/orcl2dg/trace/orcl2dg_diag_1588.trc
2024-10-17T13:37:51.512835+08:00
Dumping diagnostic data in directory=[cdmp_20241017133750], requested by (instance=1, osid=1607 (LGWR)), summary=[abnormal instance termination].
2024-10-17T13:37:52.622822+08:00
Instance terminated by LGWR, pid = 1607

After about 80 seconds:

Configuration - dg_config

  Primary:            orcl2dg
  Active Target:      orcl

Observer "prdb19" - Master

  Host Name:                    prdb19
  Last Ping to Primary:         79 seconds ago
  Last Ping to Target:          77 seconds ago

Observer "prdg19" - Backup

  Host Name:                    prdg19
  Last Ping to Primary:         2 seconds ago
  Last Ping to Target:          78 seconds ago

DGMGRL> /
ORA-03135: connection lost contact
Process ID: 3938
Session ID: 423 Serial number: 56624

Configuration details cannot be determined by DGMGRL
DGMGRL> /