[20240510]SQL语句存在问题与共享池内存分配.txt
--//五一前遇到的问题,生产系统应用程序升级,但是3个表忘记建立,而编写的程序可能存在问题,导致频繁调用这些根本不可能执行的sql
--//语句.很奇怪的是应用前台根本不报错,真不知道开发如何写代码,难道没做例外处理以及记录吗?
--//测试看看如果sql语句存在问题,是否消耗共享池以及parse的情况以及对性能相关问题.
1.环境:
SCOTT@test01p> @ver1
PORT_STRING VERSION BANNER CON_ID
------------------------------ -------------- -------------------------------------------------------------------------------- ----------
IBMPC/WIN_NT64-9.1.0 12.2.0.1.0 Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production 0
2.测试:
SCOTT@test01p> select sql_id from v$sql where sql_text='select count(1) from deptxxx';
no rows selected
--//相关语句不在共享池,并且deptxxx表不存在.
SCOTT@test01p> select count(1) from deptxxx;
select count(1) from deptxxx
*
ERROR at line 1:
ORA-00942: table or view does not exist
--//表deptxxx不存在!!
SCOTT@test01p> @ hash
HASH_VALUE SQL_ID CHILD_NUMBER KGL_BUCKET PLAN_HASH_VALUE HASH_HEX SQL_EXEC_START SQL_EXEC_ID
---------- ------------- ------------ ---------- --------------- ---------- ------------------- -----------
3744802176 07v29cmgma9c0 0 75136 903671040 df352580 2024-05-10 22:36:28 16777216
SCOTT@test01p> @ sql_id 07v29cmgma9c0
--SQL_ID = 07v29cmgma9c0
select sql_id from v$sql where sql_text='select count(1) from deptxxx';
--//是前一次正确执行的sql_id,这样无法获得无法执行sql语句的sql_id.
SCOTT@test01p> select sql_id from v$sqlarea where sql_text='select count(1) from deptxxx';
no rows selected
SCOTT@test01p> select sql_id from v$sql where sql_text='select count(1) from deptxxx';
no rows selected
--//查询v$sqlarea,v$sql视图根本不能发现对应sql_id.
--//使用我写的脚本计算sql_id与hash_value:
$ ./sql_idx.sh 'select count(1) from deptxxx'
sql_text = select count(1) from deptxxx\0
full_hash_value(16) = AA0B8A5E997323CE2D65F9B7AF91ED4F
hash_value(10) = 2945576271
sql_id(32) = 2utgtqyrt3vag
sql_id(32) = 2utgtqyrt3vag
sql_id(32) = 2utgtqyrt3vag
SYS@test> @ sharepool/shp4x 2utgtqyrt3vag 0
TEXT KGLHDADR KGLHDPAR C40 KGLHDLMD KGLHDPMD KGLHDIVC KGLOBHD0 KGLOBHD6 KGLOBHS0 KGLOBHS6 KGLOBT16 N0_6_16 N20 KGLNAHSH KGLOBT03 KGLOBT09
--------------------- ---------------- ---------------- ---------------------------- -------- -------- -------- ---------------- -------- -------- -------- -------- ------- ---- ---------- ------------- --------
child handle address 000007FF121630F0 000007FF16537BC8 select count(1) from deptxxx 0 0 3 00 00 0 0 3165 3165 3165 2945576271 2utgtqyrt3vag 0
parent handle address 000007FF16537BC8 000007FF16537BC8 select count(1) from deptxxx 1 0 3 000007FF12E82EA0 00 4072 0 0 4072 4072 2945576271 2utgtqyrt3vag 65535
--//可以发现这种情况一样建立父子光标,即使sql语句执行错误.不过子光标的堆0,堆6不存在罢了.
--//注:可以查询v$db_object_cache视图,比如name(对应sql文本)里面的特征字符,不过如果是生产系统,我估计会很慢!!
--//说明即使无法正确执行的sql语句也会消耗共享池内存.
3.看看parse的情况:
--//session 1:
SCOTT@test01p(265,16062)> @ spid
SID SERIAL# PROCESS SERVER SPID PID P_SERIAL# C50
--- ------- --------- --------- ---- --- ---------- --------------------------------------------------
265 16062 9104:6104 DEDICATED 9516 59 8 alter system kill session '265,16062' immediate;
--//session 2:
SYS@test> @ ses2 265 "parse count"
SID NAME VALUE
--- ---------------------- -----
265 parse count (total) 729
265 parse count (hard) 266
265 parse count (failures) 31
--//session 1:
--//执行3次,输出略.前面parse count (failures)=31,说明我已经重复执行多次.
select count(1) from deptxxx;
--//session 2:
SYS@test> @ ses2 265 "parse count"
SID NAME VALUE
--- ---------------------- -----
265 parse count (total) 732
265 parse count (hard) 269
265 parse count (failures) 34
--//可以发现parse count (failures),parse count (hard),parse count (total)各自增加3次.
--//可以想象,如果密集的执行这些sql语句将是一场"灾难",如果生产系统这些语句执行频繁,至少出现大量分析,并且因为hash_value一样,
--//会在相同bucket上出现争用,出现大量library cache: mutex X,shared pool latch争用,但是不明白生产系统还会出现library
--//cache lock.
--//注:事后测试发现,情况并不是那样,没有出现大量shared pool latch争用,另外写blog分析.
--//生产系统的当时的情况:
> @ dashtop event "event like 'library cache%'" "timestamp'2024-04-26 08:00:57'" "timestamp'2024-04-26 10:30:00'"
Total
Seconds AAS %This EVENT FIRST_SEEN LAST_SEEN
------- ------- ------- ---------------------- ------------------- -------------------
8164610 913.0 76% library cache lock 2024-04-26 08:23:22 2024-04-26 09:17:44
2587860 289.4 24% library cache: mutex X 2024-04-26 08:23:22 2024-04-26 09:17:44
10 .0 0% library cache pin 2024-04-26 10:00:40 2024-04-26 10:00:40
--//问题出现在2024-04-26 08:23:22,在2024-04-26 09:17:44问题解决.
--//同事分析缺乏经验,实际上知道sql_id,如果能知道sql语句,随便执行看看或者查看对象就可以定位问题.
--//当然定位不是查询gv$sqlarea,gv$sql视图,而是查询x$kglob底层结构.
4.继续:
--//session 1:
SCOTT@test01p(265,16062)> select count(1) from;
select count(1) from
*
ERROR at line 1:
ORA-00903: invalid table name
--//执行多次,看看这样不完整的sql语句的情况.
$ ./sql_idx.sh 'select count(1) from'
sql_text = select count(1) from\0
full_hash_value(16) = 17A4F71D1DD4E12BAB0FAF270B2672EB
hash_value(10) = 187069163
sql_id(32) = aq3xg4w5kcwrb
sql_id(32) = aq3xg4w5kcwrb
sql_id(32) = aq3xg4w5kcwrb
SYS@test> @ sharepool/shp4x aq3xg4w5kcwrb 0
TEXT KGLHDADR KGLHDPAR C40 KGLHDLMD KGLHDPMD KGLHDIVC KGLOBHD0 KGLOBHD6 KGLOBHS0 KGLOBHS6 KGLOBT16 N0_6_16 N20 KGLNAHSH KGLOBT03 KGLOBT09
--------------------- ---------------- ---------------- -------------------- -------- -------- -------- ---------------- -------- -------- -------- -------- ------- ---- --------- ------------- --------
child handle address 000007FF008F2D88 000007FF12F38968 select count(1) from 0 0 4 00 00 0 0 3157 3157 3157 187069163 aq3xg4w5kcwrb 0
parent handle address 000007FF12F38968 000007FF12F38968 select count(1) from 1 0 4 000007FF12B33EE8 00 4072 0 0 4072 4072 187069163 aq3xg4w5kcwrb 65535
--//情况与上面测试类似.
--//做这个测试主要原因在生产系统发现如下:
--//主要原因在生产系统看到如下:
> @ dashtop event,p1,p3 "event like 'library cache%'" "timestamp'2024-04-26 08:00:57'" "timestamp'2024-04-26 09:30:00'"
Total
Seconds AAS %This EVENT P1 P3 FIRST_SEEN LAST_SEEN
------- ------ ------- ---------------------- ------------ --------------- ------------------- -------------------
8163900 1528.0 76% library cache lock 81867324184 5373954 2024-04-26 08:23:22 2024-04-26 09:17:44
1545660 289.3 14% library cache: mutex X 3802446058 82 2024-04-26 08:23:22 2024-04-26 09:17:44
1041590 194.9 10% library cache: mutex X 3802446058 119 2024-04-26 08:23:33 2024-04-26 09:17:44
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
700 .1 0% library cache lock 81867324184 5373955 2024-04-26 08:23:22 2024-04-26 09:17:44
490 .1 0% library cache: mutex X 3802446058 106 2024-04-26 08:25:31 2024-04-26 09:17:44
40 .0 0% library cache: mutex X 3771887287 1 2024-04-26 09:17:11 2024-04-26 09:17:44
30 .0 0% library cache: mutex X 3802446058 124 2024-04-26 08:46:19 2024-04-26 08:58:18
20 .0 0% library cache: mutex X 3802446058 85 2024-04-26 09:17:11 2024-04-26 09:17:22
10 .0 0% library cache lock 81864522800 400827822964738 2024-04-26 08:43:39 2024-04-26 08:43:39
10 .0 0% library cache: mutex X 37799 49 2024-04-26 08:36:47 2024-04-26 08:36:47
10 .0 0% library cache: mutex X 106013 49 2024-04-26 08:47:04 2024-04-26 08:47:04
10 .0 0% library cache: mutex X 121048 49 2024-04-26 08:36:37 2024-04-26 08:36:37
12 rows selected.
> @ sharepool/shp4x 0 3802446058
> @ pr
==============================
TEXT : parent handle address
KGLHDADR : 00000013077E9500
KGLHDPAR : 00000013077E9500
C40 : select count(1) from
KGLHDLMD : 0
KGLHDPMD : 0
KGLHDIVC : 0
KGLOBHD0 : 00
KGLOBHD6 : 00
KGLOBHS0 : 0
KGLOBHS6 : 0
KGLOBT16 : 0
N0_6_16 : 0
N20 : 0
KGLNAHSH : 3802446058
~~~~~~~~~~~~~~~~~~~~~~
KGLOBT03 : --//没有值.
KGLOBT09 : 65535
PL/SQL procedure successfully completed.
--//小心,在生产系统访问x$kglob可能存在严重性能问题,不行快速kill相关进程.包括下面测试使用的fchaz.sql脚本.
--//我的查询可以使用索引估计问题不大!!
--//参数1 sql_id 参数2 hash_value.
> select replace(kglnaobj,chr(13),' ') c100 ,length(kglnaobj) n10 from x$kglob where KGLHDPAR=hextoraw('00000013077E9500');
C100 N10
--------------------- ---
select count(1) from 20
--//长度确实是20,但是KGLNAHSH=3802446058,完全与我的测试对不上,我的测试是KGLNAHSH=187069163.这个问题先放一放.
5.继续看看对象deptxxx:
--//继续昨天的测试,在scottt用户下执行select count(1) from deptxxx;多次.
SYS@test> select * from v$open_cursor where sid=263 and sql_text like '%count%';
no rows selected
--//语句执行有问题,光标不会缓存.
SYS@test> select * from v$db_object_cache where name = 'DEPTXXX'
2 @ prxx
==============================
OWNER : SCOTT
NAME : DEPTXXX
DB_LINK :
NAMESPACE : TABLE/PROCEDURE
TYPE : CURSOR
SHARABLE_MEM : 0
LOADS : 1
EXECUTIONS : 0
LOCKS : 0
PINS : 0
KEPT : NO
CHILD_LATCH : 77291
INVALIDATIONS : 0
HASH_VALUE : 1772563947
LOCK_MODE : NONE
PIN_MODE : NONE
STATUS : UNKOWN
TIMESTAMP :
PREVIOUS_TIMESTAMP :
LOCKED_TOTAL : 4
PINNED_TOTAL : 4
PROPERTY :
FULL_HASH_VALUE : ef603aaa09d90710c71ba16b69a72deb
CON_ID : 3
CON_NAME : TEST01P
ADDR : 000007FF14929138
EDITION :
==============================
OWNER : PUBLIC
NAME : DEPTXXX
DB_LINK :
NAMESPACE : TABLE/PROCEDURE
TYPE : CURSOR
SHARABLE_MEM : 0
LOADS : 1
EXECUTIONS : 0
LOCKS : 0
PINS : 0
KEPT : NO
CHILD_LATCH : 104220
INVALIDATIONS : 0
HASH_VALUE : 2989463324
LOCK_MODE : NONE
PIN_MODE : NONE
STATUS : UNKOWN
TIMESTAMP :
PREVIOUS_TIMESTAMP :
LOCKED_TOTAL : 4
PINNED_TOTAL : 4
PROPERTY :
FULL_HASH_VALUE : bee0db68379be71263a53e5fb22f971c
CON_ID : 3
CON_NAME : TEST01P
ADDR : 000007FEFFF8AF58
EDITION :
PL/SQL procedure successfully completed.
--//可以发现即使对象不存在,也会加载在共享池.一个owner=SCOTT,另外一个PUBLIC.oracle这样的目的是表示对象是否存在.
SYS@test> @ sharepool/shp4x 0 1772563947
TEXT KGLHDADR KGLHDPAR C40 KGLHDLMD KGLHDPMD KGLHDIVC KGLOBHD0 KGLOBHD6 KGLOBHS0 KGLOBHS6 KGLOBT16 N0_6_16 N20 KGLNAHSH KGLOBT03 KGLOBT09
--------------------- ---------------- ---------------- ------- --------- ---------- ---------- ---------------- ---------------- ---------- ---------- ---------- --------- ---------- ---------- ------------- ----------
parent handle address 000007FF14929138 000007FF14929138 DEPTXXX 0 0 0 00 00 0 0 0 0 0 1772563947 0
SYS@test> @ sharepool/shp4x 0 2989463324
TEXT KGLHDADR KGLHDPAR C40 KGLHDLMD KGLHDPMD KGLHDIVC KGLOBHD0 KGLOBHD6 KGLOBHS0 KGLOBHS6 KGLOBT16 N0_6_16 N20 KGLNAHSH KGLOBT03 KGLOBT09
--------------------- ---------------- ---------------- ------- --------- ---------- ---------- ---------------- ---------------- ---------- ---------- ---------- --------- ---------- ---------- ------------- ----------
parent handle address 000007FEFFF8AF58 000007FEFFF8AF58 DEPTXXX 0 0 0 00 00 0 0 0 0 0 2989463324 0
--//实际上v$db_object_cache视图来源就是x$kglob,v$db_object_cache.addr=KGLHDADR.
--//1772563947%2^17 = 77291 对应CHILD_LATCH = 77291
--//2989463324%2^17 = 104220 对应CHILD_LATCH = 104220
SYS@test> @ fchaz 000007FF14929138
LOC KSMCHPTR KSMCHIDX KSMCHDUR KSMCHCOM KSMCHSIZ KSMCHCLS KSMCHTYP KSMCHPAR KSMCHPTR_END
--- ---------------- ---------- ---------- ---------------- ---------- -------- ---------- ---------------- -----------------
SGA 000007FF14929108 1 1 KGLHD 816 recr 80 00 000007FF14929437
SYS@test> @ fchaz 000007FEFFF8AF58
LOC KSMCHPTR KSMCHIDX KSMCHDUR KSMCHCOM KSMCHSIZ KSMCHCLS KSMCHTYP KSMCHPAR KSMCHPTR_END
--- ---------------- ---------- ---------- ---------------- ---------- -------- ---------- ---------------- -----------------
SGA 000007FEFFF8AF28 1 1 KGLHD 816 recr 80 00 000007FEFFF8B257
--//2个对象消耗816字节.
6.结论:
--//可以看出几个特点即使sql语句存在问题,oracle还是消耗一定的共享池内存,建立父子光标,仅仅子光标堆0,堆6不存在.
--//并且每次执行都会执行产生1次硬分析.
--//相关表也会加载到共享池中.
--//下次测试环境下模拟生产系统大量执行这类sql语句时出现的情况.
7.附上shp4x.sql脚本:
$ cat shp4x.sql
column N0_6_16 format 99999999
SELECT /*+ USE_CONCAT(@"SEL$1" 8 OR_PREDICATES(1)) */ DECODE (kglhdadr,
kglhdpar, 'parent handle address',
'child handle address')
text,
kglhdadr,
kglhdpar,
substr(kglnaobj,1,40) c40,
KGLHDLMD,
KGLHDPMD,
kglhdivc,
kglobhd0,
kglobhd6,
kglobhs0,kglobhs6,kglobt16,
kglobhs0+kglobhs6+kglobt16 N0_6_16,
kglobhs0+kglobhs1+kglobhs2+kglobhs3+kglobhs4+kglobhs5+kglobhs6+kglobt16 N20,
kglnahsh,
kglobt03,
kglobt09
FROM x$kglob
WHERE kglobt03 = lower('&1') or KGLNAHSH= &2;
--//fchaz.sql脚本来源tpt fcha.sql,我仅仅注解了里面的提示信息,增加一个显示字段KSMCHPTR_END.
--//TO_CHAR(TO_NUMBER(KSMCHPTR,'XXXXXXXXXXXXXXXX') + KSMCHSIZ - 1,'FM0XXXXXXXXXXXXXXX') KSMCHPTR_END
--//注:在生产系统执行要小心!!
-- Copyright 2018 Tanel Poder. All rights reserved. More info at http://tanelpoder.com
-- Licensed under the Apache License, Version 2.0. See LICENSE.txt for terms & conditions.
--------------------------------------------------------------------------------
--
-- File name: fcha.sql (Find CHunk Address) v0.2
-- Purpose: Find in which heap (UGA, PGA or Shared Pool) a memory address resides
--
-- Author: Tanel Poder
-- Copyright: (c) http://blog.tanelpoder.com | @tanelpoder
--
-- Usage: @fchaz <addr_hex> --
-- @fchaz F6A14448 --
--
-- Other: This would only report an UGA/PGA chunk address if it belongs
-- to *your* process/session (x$ksmup and x$ksmpp do not see other
-- session/process memory)
--
--------------------------------------------------------------------------------
--prompt Find in which heap (UGA, PGA or Shared Pool) the memory address &1 resides...
--prompt
--prompt WARNING!!! This script will query X$KSMSP, which will cause heavy shared pool latch contention
--prompt in systems under load and with large shared pool. This may even completely hang
--prompt your instance until the query has finished! You probably do not want to run this in production!
--prompt
--pause Press ENTER to continue, CTRL+C to cancel...
select
'SGA' LOC,
KSMCHPTR,
KSMCHIDX,
KSMCHDUR,
KSMCHCOM,
KSMCHSIZ,
KSMCHCLS,
KSMCHTYP,
KSMCHPAR,
TO_CHAR(TO_NUMBER(KSMCHPTR,'XXXXXXXXXXXXXXXX') + KSMCHSIZ - 1,'FM0XXXXXXXXXXXXXXX') KSMCHPTR_END
from
x$ksmsp
where
to_number(substr('&1', instr(lower('&1'), 'x')+1) ,'XXXXXXXXXXXXXXXX')
between
to_number(ksmchptr,'XXXXXXXXXXXXXXXX')
and to_number(ksmchptr,'XXXXXXXXXXXXXXXX') + ksmchsiz - 1
union all
select
'UGA',
KSMCHPTR,
null,
null,
KSMCHCOM,
KSMCHSIZ,
KSMCHCLS,
KSMCHTYP,
KSMCHPAR,
TO_CHAR(TO_NUMBER(KSMCHPTR,'XXXXXXXXXXXXXXXX') + KSMCHSIZ - 1,'FM0XXXXXXXXXXXXXXX') KSMCHPTR_END
from
x$ksmup
where
to_number(substr('&1', instr(lower('&1'), 'x')+1) ,'XXXXXXXXXXXXXXXX')
between
to_number(ksmchptr,'XXXXXXXXXXXXXXXX')
and to_number(ksmchptr,'XXXXXXXXXXXXXXXX') + ksmchsiz - 1
union all
select
'PGA',
KSMCHPTR,
null,
null,
KSMCHCOM,
KSMCHSIZ,
KSMCHCLS,
KSMCHTYP,
KSMCHPAR,
TO_CHAR(TO_NUMBER(KSMCHPTR,'XXXXXXXXXXXXXXXX') + KSMCHSIZ - 1,'FM0XXXXXXXXXXXXXXX') KSMCHPTR_END
from
x$ksmpp
where
to_number(substr('&1', instr(lower('&1'), 'x')+1) ,'XXXXXXXXXXXXXXXX')
between
to_number(ksmchptr,'XXXXXXXXXXXXXXXX')
and to_number(ksmchptr,'XXXXXXXXXXXXXXXX') + ksmchsiz - 1
/