This is the first part in a series of articles entitled "The right tool for the right job". In each article, I will use tools and ideas made available to us by Oracle gurus like Tanel Poder, Jonathan Lewis, Tom Kyte, Kerry Osborne and many others. I will try to show you how to use these freely available tools to troubleshoot real-life problems I've encountered at client sites. You can use the scripts "as-is" (note that you should always test any script before using it on production systems), modify them to suit your needs, or even write your own based on ideas found in the original scripts.
Some background first: the database is 10.2.0.4 running in a Sun M9000 dynamic domain with 128 GB of memory and 32 CPUs (2520 MHz) allocated using the fair share scheduler (FSS). We are experiencing high "latch: cache buffers chains" (CBC) contention. The problem occurs intermittently (several times a day) and also resolves by itself. Here is an overview of the issue with Quest Performance Analysis (as the client pays for it, let's use it to show a pretty graph ^_^):
CBC latches are shown in pink. I did the whole analysis between 10:00 and 10:15 AM, while the problem was actually happening.
What is a latch?
Latches are simple, low-level serialization mechanisms that protect shared SGA data structures and shared code segments (such as various internal linked list modifications, shared pool memory allocation, library cache object lookups and so on) from simultaneous session access. They are designed to be acquired and freed very quickly. Latches are very low-level locks and are inaccessible to users and applications, which cannot directly acquire or release them. They are similar in purpose to locks: latches protect internal memory structures while locks protect data structures.
The database buffer cache is the portion of the SGA that holds copies of data blocks read from datafiles so they can be accessed faster than by reading them off disks.
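As a side note, you can get a quick feel for how busy these latches are and how many child latches protect the buffer cache hash chains with a couple of simple queries against v$latch and v$latch_children (a minimal sketch, nothing specific to this incident):

SQL> select name, gets, misses, sleeps
  2  from v$latch
  3  where name = 'cache buffers chains';

SQL> select count(*) child_latches
  2  from v$latch_children
  3  where name = 'cache buffers chains';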
We can usually get more information about a wait event by checking its P1, P2 and P3 parameters. Let's look up in V$EVENT_NAME what P1, P2 and P3 mean for "latch: cache buffers chains".
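The lookup itself is a one-liner against V$EVENT_NAME, whose PARAMETER1, PARAMETER2 and PARAMETER3 columns document the meaning of P1, P2 and P3 for each event (a minimal version shown here):

SQL> select name, parameter1, parameter2, parameter3
  2  from v$event_name
  3  where name = 'latch: cache buffers chains';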
From the above query, P1 is the address of the latch for the CBC latch wait. Now we can query ASH, grouping the CBC latch waits by latch address, to find out which address is experiencing the most contention (i.e. the most waits).
SQL> select * from (
  2    select
  3      event,
  4      trim(to_char(p1, 'XXXXXXXXXXXXXXXX')) latch_addr,
  5      trim(round(ratio_to_report(count(*)) over () * 100, 1))||'%' pct,
  6      count(*)
  7    from
  8      v$active_session_history
  9    where
 10      event = 'latch: cache buffers chains'
 11      and session_state = 'WAITING'
 12    group by event, p1
 13    order by count(*) desc
 14  )
 15  where rownum <= 10;

EVENT                                              LATCH_ADDR        PCT        COUNT(*)
-------------------------------------------------- ----------------- -------- ----------
latch: cache buffers chains                        967757968         78.8%           178
latch: cache buffers chains                        964F892C0         2.7%              6
latch: cache buffers chains                        CE2984B50         2.2%              5
latch: cache buffers chains                        9676FC808         1.3%              3
latch: cache buffers chains                        963019308         1.3%              3
latch: cache buffers chains                        963DE63F0         .9%               2
latch: cache buffers chains                        9655A1E38         .9%               2
latch: cache buffers chains                        966F65E28         .9%               2
latch: cache buffers chains                        963276CE8         .9%               2
latch: cache buffers chains                        CE1AD8E38         .9%               2

10 rows selected.
Looking at the top line of the output, 78.8% of the waits on "latch: cache buffers chains" involve the child latch at address 967757968.
Now that we know which CBC child latch address has the most waits, we can use Tanel Poder's latchprofx script (http://files.e2sn.com/scripts/latchprofx.sql) to find the root cause of the contention. We will monitor the holders (sid and sql_id) of this particular child latch. The column called "object", only available in x$ksuprlat, shows information about the object protected by a given latch. For cache buffers chains latches, this column shows the data block address (DBA) of the block whose access caused the latch get.
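The exact call is not reproduced here; from memory, latchprofx takes the list of columns to sample, a SID filter, a latch name or address filter and a sample count (check the header of latchprofx.sql for the authoritative parameter list), so the invocation looked roughly like this:

SQL> -- columns to sample, sid filter (% = all sessions), hot child latch address, number of samples
SQL> @latchprofx sid,name,sqlid,object % 967757968 100000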
The data block address accessed was 31B7E54, and the sessions holding the child latch most of the time were executing the query with sql_id amnf5uaxyn12c.
Next, we will convert that data block address to a file# and block# using dbms_utility and try to find which segment it belongs to by querying x$bh (bh stands for buffer headers). Once again, Tanel wrote a script for that (dba.sql, DBA standing for Data Block Address ^^). You will find it in tpt_public_unixmac.tar.gz, available on Tanel's new website.
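dba.sql wraps this up for you; the underlying idea is roughly the sketch below. The dbms_utility functions are standard; the x$bh lookup must be run as SYS, and the column names (obj, dbarfil, dbablk, tch) are given from memory, so double-check them against the real script:

SQL> select dbms_utility.data_block_address_file(to_number('31B7E54', 'XXXXXXXX'))  file#,
  2         dbms_utility.data_block_address_block(to_number('31B7E54', 'XXXXXXXX')) block#
  3  from dual;

SQL> -- map the buffer to its segment via the buffer headers (run as SYS)
SQL> select o.owner, o.object_name, o.object_type, bh.tch touch_count
  2  from x$bh bh, dba_objects o
  3  where bh.obj     = o.data_object_id
  4    and bh.dbarfil = &file_no
  5    and bh.dbablk  = &block_no;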
To solve the problem once and for all, we must find the SQL and understand why the application is hitting this block so hard.
We can get the sql_fulltext of a given sql_id from V$SQL, V$SQLAREA or V$SQLSTATS. I personally prefer getting it from V$SQLSTATS because it is faster, more scalable, and has greater data retention (the statistics may still appear in this view even after the cursor has been aged out of the shared pool).
I have a little script for that: sqltext.sql
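sqltext.sql is not reproduced here; at its core it boils down to a lookup of this kind (the &sqlid substitution variable is just my shorthand for the sql_id found above):

SQL> select sql_fulltext
  2  from v$sqlstats
  3  where sql_id = '&sqlid';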
If you have TOAD, one useful feature is its SQL formatting tool (don't get me wrong, I'm no big fan of GUI tools; I use SQL*Plus 99% of the time ;-)).
Let's have a look at the execution plan of this query using dbms_xplan.display_cursor:
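The plan comes straight from the cursor cache; a minimal call looks like this (passing NULL as the child number lists all child cursors; the plan output itself is not reproduced here, but the plan line numbers discussed below refer to it):

SQL> select * from table(dbms_xplan.display_cursor('amnf5uaxyn12c', null, 'TYPICAL'));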
The operation connect by with filtering (lines 16 to 22 of the plan) is used to process hierarchical queries. It is characterized by two child operations. The first one is used to get the root of the hierarchy, and the second one is executed once for each level in the hierarchy. Each of those two child operations includes an index range scan of itrn2tau.
Looking at lines 4 to 12 of the plan, the optimizer assumes that the nested loops join (lines 5 to 9) between ttrntau (accessed via index itrn2tau) and thdrpdttau will return one row. It also assumes that accessing table ttrntau via index itrn2tau will return one row (lines 11-12). It then simply puts those two rows together with a cartesian product and joins that result to table ttrntau, accessed via index itrn2tau (lines 13-14), with a filter operation (line 3). It is because of the low expected cardinality that the optimizer chose a merge join cartesian. This is a perfectly valid approach and is very fast here.
The filter operation used for the last join is very similar to a nested loops join: for each row in the driving row source, Oracle evaluates the filter condition to decide whether or not to keep the row. However, a filter can be much more efficient than a nested loop because it can remember the results of previous probes into the second table, effectively turning the probe into an in-memory lookup.
As seen previously, a single execution of this query accesses index itrn2tau several times. Furthermore, the query was executed concurrently by many sessions, which is why the latches were hammered so hard. This is all about scalability: the query runs fine on a single-user system but does not scale when run concurrently by hundreds of users.
Fortunately, this query was developed in-house, so I asked the dev team to work on it.