问题概述
在某套数据库升级迁移后,系统内的其他数据库收到了一些错误:
Fri Nov 25 08:00:47 2022
Errors in file /oracle/app/oracle/diag/rdbms/resdb/resdb2/trace/resdb2_reco_3277084.trc:
ORA-02062: distributed recovery received DBID ee3f1032, expected 8b98b2bc
ORA-02062: distributed recovery received DBID 36acbe83, expected a76256d8
ORA-02062: distributed recovery received DBID dd1695c4, expected b9e5f818
问题原因
数据库的reco进程,是一个与分布式事务有关的进程,结合伴随的报错信息,怀疑是数据库的分布式事务出现了异常。
Oracle会自动处理分布事务,保证分布事务的一致性,所有站点全部提交或全部回滚,当commit或rollback的时候,出现了连接中断,或数据库崩溃的情况,就无法继续执行,出现所谓的不确定性分布式事务(in-doubt distributed transaction),事务挂起的情况。
该数据库并未在升级迁移范围内,唯一的变动是对其中的一些数据库链接进行了重建(指向升级迁移系统的数据库链接);Oracle分布式事务中的关键组件就是数据库链接(database link),这很有可能导致了分布式事务连接的中断;至此,先检查重建数据库链接的可用性,发现重建的数据库链接并无异常。
SQL> select sysdate from dual@DBLNK_CRMCENDB1;
SYSDATE
---------
25-NOV-22
检查数据库内是否有挂起的分布式事务,可以看到,确实存在一些挂起的分布式事务;通过DBA_2PC_NEIGHBORS视图可以看到,这些挂起的分布式确实是与我们重建的database link相关。
state描述
- collecting:在收集数据过程中,产生异常
- prepared: 在接受到异步commit/rollback指令前, 产生异常
- forced rollback: 在使用rollback force出现
- forced commit:在使用commit force出现
SQL> select local_tran_id,state from dba_2pc_pending;
LOCAL_TRAN_ID STATE
---------------------- ----------------
122.22.3695788 collecting
217.26.3191060 collecting
321.30.2497885 collecting
404.32.1018036 collecting
473.21.1523563 collecting
818.19.1110703 collecting
850.26.1009115 collecting
7 rows selected.
SQL> SELECT LOCAL_TRAN_ID,IN_OUT,DATABASE,INTERFACE,DBUSER_OWNER FROM DBA_2PC_NEIGHBORS;
LOCAL_TRAN_ID IN_ DATABASE INTERFACE DBUSER_OWNER
-------------------- --- -------------------- -------------------- ---------------
404.32.1018036 in N EC_SMS
122.22.3695788 in N EC_SMS
321.30.2497885 in N EC_SMS
850.26.1009115 in N EC_SMS
473.21.1523563 in N EC_SMS
217.26.3191060 in N EC_SMS
818.19.1110703 in N EC_SMS
404.32.1018036 out DBLNK_CRMCENDB1 N EC_SMS
404.32.1018036 out DBLNK_CRMCENDB1 N EC_SMS
122.22.3695788 out DBLNK_CRMCENDB1 N EC_SMS
122.22.3695788 out DBLNK_CRMCENDB1 N EC_SMS
LOCAL_TRAN_ID IN_ DATABASE INTERFACE DBUSER_OWNER
-------------------- --- -------------------- -------------------- ---------------
321.30.2497885 out DBLNK_CRMCENDB1 N EC_SMS
321.30.2497885 out DBLNK_CRMCENDB1 N EC_SMS
850.26.1009115 out DBLNK_CRMCENDB1 N EC_SMS
850.26.1009115 out DBLNK_CRMCENDB1 N EC_SMS
473.21.1523563 out DBLNK_CRMCENDB1 N EC_SMS
473.21.1523563 out DBLNK_CRMCENDB1 N EC_SMS
217.26.3191060 out DBLNK_CRMCENDB1 N EC_SMS
217.26.3191060 out DBLNK_CRMCENDB1 N EC_SMS
818.19.1110703 out DBLNK_CRMCENDB1 N EC_SMS
20 rows selected.
解决方案
清理挂起的事务,该过程用于清除本地数据库所记载的远程事务入口,如下:
EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY(' ');
SQL> select ' exec DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('||chr(39)||LOCAL_TRAN_ID||chr(39)||');'||chr(10)||'commit;' from dba_2pc_pending;
'EXECDBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('||CHR(39)||LOCAL_TRAN_ID||CHR(39)|
-----------------------------------------------------------------------------
exec DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('122.22.3695788');
commit;
exec DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('217.26.3191060');
commit;
exec DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('321.30.2497885');
commit;
exec DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('404.32.1018036');
commit;
exec DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('473.21.1523563');
commit;
exec DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('818.19.1110703');
commit;
exec DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('850.26.1009115');
commit;
7 rows selected.
再次检查分布式事务挂起情况和alert日志,均再无相关内容。
SQL> select local_tran_id,state from dba_2pc_pending;
no rows selected
SQL>
SQL>
SQL>
SQL> SELECT LOCAL_TRAN_ID,IN_OUT,DATABASE,INTERFACE,DBUSER_OWNER FROM DBA_2PC_NEIGHBORS;
no rows selected
标签:CRMCENDB1,02062,SMS,EC,commit,DBLNK,ORA,out,分布式
From: https://blog.51cto.com/u_13482808/7974500