问题概述
用户反馈,查询视图gv$instance超过10秒,导致业务异常(程序的逻辑是先访问gv$instance ,判断实例是否存活,然后再开始干活,要是gv$instance不及时返回信息,程序就自杀),其他GV$视图查询性能正常。
问题原因
SQL语句执行消耗时间长,首先想到的是时间去哪了,在等待什么。event 10046非常合适来做这些,trace文件可以看到消耗的10秒时间都在等待事件“reliable message”上面。
Reliable Message 是一个非空闲等待事件,进程级别,用于跟踪Oracle数据库中的许多不同类型的通道通信,如果长时间等待,意味着某些进程的通道通讯受到阻塞。MOS 上对于reliable message的解释如下(WAITEVENT: "reliable message" Reference Note (Doc ID 69088.1)):
When a process sends a message using the 'KSR' intra-instance broadcast service,the message publisher waits on this wait-event until all subscribers have consumed the 'reliable message' just sent. The publisher waits on this wait-event for up to one second and then re-tests if all subscribers have consumed the message, or until posted. If the message is not fully consumed the wait recurs,repeating until either the message is consumed or until the waiter is interrupted.
查看当前数据库GV$SESSION,检查是否还有其他异常等待事件,发现该视图执行的时间也是非常长,结果显示等待事件最多的也是“reliable message”。也就是说,数据库中不仅仅是GV$INSTANCE查询缓慢,其他GV$视图也是非常缓慢的。
此等待事件,是针对各种channel的,不同的channel 处理方式不一。gv$channel_waits 视图里查询问题最严重的 channel,可以看到“kxfp control signal channel ”遥遥领先其他CHANNEL。
SELECT CHANNEL, SUM(wait_count) sum_wait_count
FROM GV$CHANNEL_WAITS
GROUP BY CHANNEL
ORDER BY SUM(wait_count) DESC
;
搜索12.1.0.2的bug list(12.1.0.2 Patch Set Updates - List of Fixes in each PSU (Doc ID 1924126.1)),有且仅有一个该等待事件的bug 20470877。
按照该bug的对应文档提示,执行dump ksim后,查看后台进程lck的trace文件,现象也是符合的。基本上可以判断,是这个bug导致问题。
Running the following will dump the ksim PGA nd SGA cache
SQL> oradebug setorapname lck0
SQL> oradebug call ksimPrintGroupMemPGA
SQL> oradebug call ksimPrintGroupMemCache
If you check the LCK0 trace file after the above you may
see many datapump/AQ generated groups left behind in the form of:
SYS.KUPC$S_<thread>_yyyymmddhhmiss
SYS.KUPC$C_<thread>_yyyymmddhhmiss
解决方案
临时workaround,逐个实例重启;根本解决方案,应用补丁20470877
参考文档
WAITEVENT: "reliable message" Reference Note (Doc ID 69088.1)
12.1.0.2 Patch Set Updates - List of Fixes in each PSU (Doc ID 1924126.1)
Excessive "reliable Message" waits seen on RAC after a few days uptime (Doc ID 1990796.1)
标签:GV,reliable,视图,message,channel,wait From: https://blog.51cto.com/u_13482808/7427765