一、锁超时
-
先查看当前事务,看看有没有事务时间超时的
SELECT * FROM INFORMATION_SCHEMA.INNODB_TRX;
查到如下结果:
INSERT INTO information_schema.INNODB_TRX (trx_id,
trx_state,
trx_started,
trx_requested_lock_id,
trx_wait_started, trx_weight,
trx_mysql_thread_id,
trx_query, trx_operation_state,
trx_tables_in_use,
trx_tables_locked,
trx_lock_structs,
trx_lock_memory_bytes,
trx_rows_locked,
trx_rows_modified,
trx_concurrency_tickets,
trx_isolation_level,
trx_unique_checks,
trx_foreign_key_checks,
trx_last_foreign_key_error,
trx_adaptive_hash_latched,
trx_adaptive_hash_timeout,
trx_is_read_only,
trx_autocommit_non_locking)
VALUES ('560205677', 'RUNNING', '2022-08-13 12:07:28', null, null, 4, 8297971, null, null, 0, 1, 3, 1136, 2, 1, 0, 'READ COMMITTED', 1, 1, null, 0, 0, 0, 0);
(上面的数据是我从数据库里直接copy出来的)
如果trx_started这个时间距离现在已经过去很久了,说明这个事务一直没提交,我们需要把这个进程杀掉,这个放在最后处理,因为我们还要查看多少事务被这个影响了,以及查到这个长事务正在执行什么sql?
- 查看锁表或者锁行的事务
SELECT * FROM INFORMATION_SCHEMA.INNODB_LOCKS;
(mysql 8.0以上这个表已经替换成了performance_schema.data_locks)
- 查看等待锁释放的有哪些事务
SELECT * FROM INFORMATION_SCHEMA.INNODB_LOCK_WAITS;
(mysql 8.0以上这个表已经替换成了performance_schema.data_lock_waits)
- 查看超时的事务是在执行什么sql
SELECT t.trx_mysql_thread_id AS connection_id
,t.trx_id AS trx_id
,t.trx_state AS trx_state
,t.trx_started AS trx_started
,TIMESTAMPDIFF(SECOND,t.trx_started, now()) AS "trx_run_time(s)"
,t.trx_requested_lock_id AS trx_requested_lock_id
,t.trx_operation_state AS trx_operation_state
,t.trx_tables_in_use AS trx_tables_in_use
,t.trx_tables_locked AS trx_tables_locked
,t.trx_rows_locked AS trx_rows_locked
,t.trx_isolation_level AS trx_isolation_level
,t.trx_is_read_only AS trx_is_read_only
,t.trx_autocommit_non_locking AS trx_autocommit_non_locking
,e.event_name AS event_name
,e.timer_wait / 1000000000000 AS timer_wait
,e.sql_text
FROM information_schema.innodb_trx t,
performance_schema.events_statements_current e,
performance_schema.threads c
WHERE t.trx_mysql_thread_id = c.processlist_id
AND e.thread_id = c.thread_id;
-
最后查到结果后,把长事务给杀掉
把第一步查到的trx_mysql_thread_id拿出来,kill掉
kill trx_mysql_thread_id;
-
mysql无法检测长事务,会导致线上sql一直被这个事务占据
我们可以使用pt-kill写个脚本去自动处理,具体操作这里不明细
附上其他查询的sql
1.查询锁住数据库事务正在执行的sql
-- 查询锁住数据库事务正在执行的sql
select hs.THREAD_ID
,hs.EVENT_ID
,iet.trx_id
,hs.SQL_TEXT as historySql -- 这个事务历史执行的sql
,iet.SQL_TEXT as currentSql -- 这个事务当前执行的sql
,iet.trx_started -- 事务开始时间
, DATE_SUB(NOW(),INTERVAL (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME='UPTIME')- TIMER_START*10e-13 second) AS start_time
, DATE_SUB(NOW(),INTERVAL (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME='UPTIME')- TIMER_END*10e-13 second) AS end_time
from performance_schema.events_statements_history hs,
(select e.thread_id,t.trx_id,t.trx_started,e.SQL_TEXT
FROM information_schema.innodb_trx t,
performance_schema.events_statements_current e,
performance_schema.threads c
WHERE t.trx_mysql_thread_id = c.processlist_id
AND e.thread_id = c.thread_id) iet
where hs.THREAD_ID =iet.THREAD_ID
order by THREAD_ID,start_time asc;
2.查看被锁住的sql正在等待什么事务,执行什么sql
-- 查看被锁住的sql是在等待什么事务,执行什么sql
select hs.THREAD_ID
,hs.EVENT_ID
,iet.trx_id
,hs.SQL_TEXT as historySql -- 这个事务历史执行的sql
,iet.SQL_TEXT as currentSql -- 这个事务当前执行的sql
,iet.trx_started -- 事务开始时间
, DATE_SUB(NOW(),INTERVAL (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME='UPTIME')- TIMER_START*10e-13 second) AS start_time
, DATE_SUB(NOW(),INTERVAL (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME='UPTIME')- TIMER_END*10e-13 second) AS end_time
from performance_schema.events_statements_history hs,
(select e.thread_id,t.trx_id,t.trx_started,e.SQL_TEXT
FROM information_schema.innodb_trx t,
performance_schema.events_statements_current e,
performance_schema.threads c
WHERE t.trx_mysql_thread_id = c.processlist_id
AND e.thread_id = c.thread_id) iet,
performance_schema.data_lock_waits dlw
where hs.THREAD_ID =iet.THREAD_ID
and hs.THREAD_ID=dlw.BLOCKING_THREAD_ID
and hs.EVENT_ID=dlw.BLOCKING_EVENT_ID -- 加上这行查的是最后一条执行的sql,这行注释掉就是查事务历史的执行sql
order by THREAD_ID,start_time asc;
二、死锁
死锁的排查和锁超时差不多,死锁的信息可以通过show innodb status命令从数据库查询到,里面有死锁执行的具体sql
------------------------
LATEST DETECTED DEADLOCK
------------------------
2023-08-09 08:53:11 140130527762176
*** (1) TRANSACTION:
TRANSACTION 18445008, ACTIVE 19 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 5 lock struct(s), heap size 1128, 4 row lock(s), undo log entries 1
MySQL thread id 22862, OS thread handle 140126242903808, query id 405981864 192.168.60.123 root updating
/* ApplicationName=DataGrip 2022.3.3 */ update edu_health_student.tb_student_attend_1 set school_name ="龙港市第九小学1" where student_base_id=206470646604544
*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 1253 page no 16 n bits 792 index indx_student_id of table `edu_health_student`.`tb_student_attend_1` trx id 18445008 lock_mode X
Record lock, heap no 51 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 8; hex 8000bcefa1ab2b00; asc + ;;
1: len 8; hex 8000bcefa1ab2b00; asc + ;;
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1253 page no 16 n bits 792 index indx_student_id of table `edu_health_student`.`tb_student_attend_1` trx id 18445008 lock_mode X waiting
Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 8; hex 8000bbc8b18c2300; asc # ;;
1: len 8; hex 8000bbc8b18c2300; asc # ;;
*** (2) TRANSACTION:
TRANSACTION 18445009, ACTIVE 15 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 5 lock struct(s), heap size 1128, 4 row lock(s), undo log entries 1
MySQL thread id 22833, OS thread handle 140126251333376, query id 405981874 192.168.60.123 root updating
/* ApplicationName=DataGrip 2022.3.3 */ update edu_health_student.tb_student_attend_1 set school_name ="温州市蒲鞋市小学龟湖校区1" where student_base_id=207737395555072
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 1253 page no 16 n bits 792 index indx_student_id of table `edu_health_student`.`tb_student_attend_1` trx id 18445009 lock_mode X
Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 8; hex 8000bbc8b18c2300; asc # ;;
1: len 8; hex 8000bbc8b18c2300; asc # ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1253 page no 16 n bits 792 index indx_student_id of table `edu_health_student`.`tb_student_attend_1` trx id 18445009 lock_mode X waiting
Record lock, heap no 51 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 8; hex 8000bcefa1ab2b00; asc + ;;
1: len 8; hex 8000bcefa1ab2b00; asc + ;;
*** WE ROLL BACK TRANSACTION (2)
这里很清楚地能看到了死锁正在执行的sql,HOLDS THE LOCK(持有的锁)和WAITING FOR THIS LOCK TO BE GRANTED(等待被释放的锁),这种就是两个事务发生互相等待了,一般会有两种情况:
- 业务逻辑上操作相反的sql
- 逻辑上顺序一致,但是sql产生了间隙锁,两个sql互相锁住了间隙。可以看下上面日志里是否有gap关键字来判断