MySQL锁等待超时一例分析

标签：NULL join SQL buffer 一例 time MySQL 超时 id

1、问题现象

开发反馈某业务持续性报锁等待超时，相关错误信息如下：

Lock wait timeout exceeded; try restarting transaction

为了能精确定位问题，继续询问开发有没有锁等待超时相关SQL，开发又给了相关报错SQL：

INSERT INTO <TABLE_NAME> VALUES(...)

2、分析诊断

根据错误信息得知，单条insert语句锁等待超时，如果都是单条insert插入，不应该频繁报锁超时，似乎有点不寻常，当前数据库版本为5.6，锁等待超时参数设置时长30秒：

root@ (none)> show variables like 'innodb_lock_wait_timeout';
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| innodb_lock_wait_timeout | 30 |
+--------------------------+-------+

查看慢日志及 show engine innodb status\G ，发现有批量插入动作，由于自增锁竞争产生死锁：

询问开发，批量插入SQL为定时作业，查看当前innodb_autoinc_lock_mode参数设置：

+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| innodb_autoinc_lock_mode | 1 |
+--------------------------+-------+

innodb_autoinc_lock_mode=1，对于批量插入语句，需要等到语句执行结束才释放自增锁，故要解决锁等待超时，可以将参数值设置为2，但该参数为静态参数需要重启MySQL才能生效，不能重启情况下只能优化SQL执行时间，查看慢日志得知SQL执行一次需要100+秒，扫描行数86w，结果集却为0，说明SQL有优化空间：

Query_time: 108.527499 Lock_time: 0.000342 Rows_sent: 0 Rows_examined: 862584

分析SQL执行计划：

SELECT *
from ( SELECT * from aa WHERE add_time >= '2022-10-01' ) a
left JOIN ( SELECT * from bb WHERE add_time >= '2022-10-01' ) b
on a.account = b.accountb and a.end_time = b.end_timeb and a.app_id = b.app_idb WHERE
b.accountb is null;
+----+-------------+----------------+-------+---------------+--------------+---------
+------+--------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len |
ref | rows | Extra |
+----+-------------+----------------+-------+---------------+--------------+---------
+------+--------+----------------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL |
NULL | 2722 | NULL |
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL |
NULL | 595248 | Using where; Using join buffer (Block Nested Loop) |
| 3 | DERIVED | bb | ALL | NULL | NULL | NULL |
NULL | 595248 | Using where |
| 2 | DERIVED | aa | range | idx_add_time | idx_add_time | 6 |
NULL | 2722 | Using index condition |
+----+-------------+----------------+-------+---------------+--------------+---------
+------+--------+----------------------------------------------------+
4 rows in set (0.00 sec)

SQL有子查询，使用到了派生表，首先执行子查询

( SELECT * from aa WHERE add_time >= '2022-10-01' )

将结果集存入临时表derived2 ，然后执行子查询

( SELECT **** from bb WHERE add_time >= '2022-10-01' )

将结果集存入临时表derived3，最后derived2和derived3根据关联条件做表关联，使用Block Nested Loop算法，即使表chat_black（account , app_id , end_time）列有复合索引也使用不到。如果mysql版本是5.7的话，optimizer_switch 参数会增加一个选项：derived_merge=on ，满足一定条件，即子查询中没有如下条件：

Aggregate fun

ctions ( SUM() , MIN() , MAX() , COUNT() , and so forth)
DISTINCT
GROUP BY
HAVING
LIMIT
UNION or UNION ALL
Subqueries in the select list
Assignments to user variables
Refererences only to literal values (in this case, there is no underlying table)

子查询将被合并到外层查询。

3、问题解决

知道SQL慢的原因后，对SQL进行改写，执行计划如下：

SELECT * FROM (
select * from aa where add_time >= '2022-10-01') a
left join bb b
on ( b.add_time >= '2022-10-01' and a.account = b.account and a.end_time =
b.end_time and a.app_id = b.app_id)
where b.account is null;
+----+-------------+----------------+-------+------------------------+----------------
--------+---------+-------------------------------+------+-----------------------+
| id | select_type | table | type | possible_keys | key
| key_len | ref | rows | Extra |
+----+-------------+----------------+-------+------------------------+----------------
--------+---------+-------------------------------+------+-----------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL
| NULL | NULL | 3096 | NULL |
| 1 | PRIMARY | b | ref | idx_ac_app_id_end_time |
idx_ac_app_id_end_time | 1542 | a.account,a.app_id,a.end_time | 1 | Using where
|
| 2 | DERIVED | aa | range | idx_add_time | idx_add_time
| 6 | NULL | 3096 | Using index condition |
+----+-------------+----------------+-------+------------------------+----------------
--------+---------+-------------------------------+------+---------------------

执行时间从原来的100+秒降低不到1秒：

root@ xsj_chat_filter> SELECT count(*) FROM (
-> select * from aa where add_time >= '2022-10-01') a
-> left join bb b
-> on ( b.add_time >= '2022-10-01' and a.account = b.account and a.end_time
= b.end_time and a.app_id = b.app_id)
-> where b.account is null;
+----------+
| count(*) |
+----------+
| 23       |
+----------+
1 row in set (0.65 sec)

执行时间短了，自然就不存在自增锁等待超时了。

相关连接：转载于：https://blog.sina.com.cn/s/blog_a1e9c7910102x1bz.html

一、背景

从事DBA工作两年多以来，经常会遇到开发上线的SQL中含有多表关联join的查询，我自己本身也是比较抗拒的，很多DBA一般会建议开发拆分SQL，避免join带来的性能问题。但是我始终认为，任何事物存在必然有它的理由，不能全盘否定它！在5.5版本之前，MySQL只支持一种表间关联方式，也就是嵌套循环(Nested Loop)。如果关联的表数据量很大，那么join关联的时间会很长。在5.5版本以后，MySQL引入了BNL算法来优化嵌套循环。

二、Nested Loop Join算法

NLJ算法：将外层表的结果集作为循环的基础数据，然后循环从该结果集每次一条获取数据作为下一个表的过滤条件去查询数据，然后合并结果。如果有多个表join，那么应该将前面的表的结果集作为循环数据，取结果集中的每一行再到下一个表中继续进行循环匹配，获取结果集并返回给客户端。该算法的伪代码为：

1for each row in t1 matching range {

2 for each row in t2 matching reference key {

3 for each row in t3 {

4 if row satisfies join conditions,

5 send to client

6 }

7 }

8 }

普通的Nested-Loop Join算法一次只能将一行数据传入内存循环，所以外层循环结果集有多少行，那么内存循环就要执行多少次。

三、Block Nested-Loop Join算法

BNL算法原理：将外层循环的行/结果集存入join buffer，内存循环的每一行数据与整个buffer中的记录做比较，可以减少内层循环的扫描次数

举个简单的例子：外层循环结果集有1000行数据，使用NLJ算法需要扫描内层表1000次，但如果使用BNL算法，则先取出外层表结果集的100行存放到join buffer, 然后用内层表的每一行数据去和这100行结果集做比较，可以一次性与100行数据进行比较，这样内层表其实只需要循环1000/100=10次，减少了9/10。

伪代码如下：

1for each row in t1 matching range {

2 for each row in t2 matching reference key {

3 store used columns from t1, t2 in join buffer

4 if buffer is full {

5 for each row in t3 {

6 for each t1, t2 combination in join buffer {

7 if row satisfies join conditions,

8 send to client

9 }

10 }

11 empty buffer

12 }

13 }

14}

17if buffer is not empty {

18 for each row in t3 {

19 for each t1, t2 combination in join buffer {

20 if row satisfies join conditions,

21 send to client

22 }

23 }

24}

我们用一个示意图来简明阐述一下：

我们从这个图可以看到，BNL算法把t1和t2的结果集存放到join buffer中（t1表和t2表的关联，仍然是取出t1表的每一行记录和t2表的每一行记录进行匹配，t1表仍然是full scan全表扫描），而不是每次从t1表取出一条记录和t2表进行能过匹配得出结果集，就马上和t3进行关联。

对于放入到join buffer的列，是指所有参与查询的列，而不是只有join的列。如下面的这个查询：

SELECT a.col3 FROM a,b

WHERE a.col1 = b.col2

AND a.col2 > …. AND b.col2 = …

上述SQL语句外表是a,内表是b，那么存放在join buffer的列是所有参与查询的列，这里是：(a.col1、a.col2、a.col3).

我们可以从explain的sql执行计划中看到Extra有“using join buffer”，则说明mysql使用了join buffer来对sql进行优化.

使用join buffer的要点：

1）、join buffer size变量大小决定了buffer的大小

2）、只有在join类型为all 或者index或者range的时候，才可以使用join buffer（也就是说explain执行计划中，type为all或者index或者range的时候，才会出现Using join buffer）。如：

id: 1

select_type: SIMPLE

table: c

type: ALL

possible_keys: NULL

key: NULL

key_len: NULL

ref: NULL

rows: 96

Extra: Using where; Using join buffer

这里type: ALL，所以可以使用join buffer.

3）、在join之前就会分配join buffer，在query执行完毕之后，立即释放buffer

标签：NULL,join,SQL,buffer,一例,time,MySQL,超时,id
From： https://www.cnblogs.com/harda/p/16993830.html

MySQL锁等待超时一例分析

1、问题现象

2、分析诊断

3、问题解决

相关文章

赞助商

阅读排行