最近一个MySQL 5.7.21备库告警当天的备份失败,登录上去看的时候发现前一天的备份任务还没有结束,通过查看日志发现无法备份成功的原因是一直无法获取FTWRL锁,登录MySQL查看会话状态发现其中几个复制worker一致处于异常状态,下发STOP SLAVE命令时命令也一直被卡住,当时的会话状态如下:
> show processlist; +---------+-----------------+----------------------+------+---------+----------+---------------------------------------------+-----------------------------+ | Id | User | Host | db | Command | Time | State | Info | +---------+-----------------+----------------------+------+---------+----------+---------------------------------------------+-----------------------------+ | 1 | event_scheduler | localhost | NULL | Daemon | 47179666 | Waiting on empty queue | NULL | | 101 | system user | | NULL | Connect | 47085737 | Waiting for master to send event | NULL | | 102 | system user | | NULL | Connect | 112325 | Waiting for dependent transaction to commit | NULL | | 103 | system user | | NULL | Connect | 116446 | Waiting for commit lock | NULL | | 104 | system user | | NULL | Connect | 116434 | Waiting for preceding transaction to commit | NULL | | 105 | system user | | NULL | Connect | 590564 | Waiting for an event from Coordinator | NULL | | 106 | system user | | NULL | Connect | 2902104 | Waiting for an event from Coordinator | NULL | | 107 | system user | | NULL | Connect | 7057083 | Waiting for an event from Coordinator | NULL | | 108 | system user | | NULL | Connect | 10948598 | Waiting for an event from Coordinator | NULL | | 109 | system user | | NULL | Connect | 47085737 | Waiting for an event from Coordinator | NULL | | 110 | system user | | NULL | Connect | 47085737 | Waiting for an event from Coordinator | NULL | | 111 | system user | | NULL | Connect | 47085737 | Waiting for an event from Coordinator | NULL | | 112 | system user | | NULL | Connect | 47085737 | Waiting for an event from Coordinator | NULL | | 113 | system user | | NULL | Connect | 47085737 | Waiting for an event from Coordinator | NULL | | 114 | system user | | NULL | Connect | 47085737 | Waiting for an event from Coordinator | NULL | | 115 | system user | | NULL | Connect | 47085737 | Waiting for an event from Coordinator | NULL | | 116 | system user | | NULL | Connect | 47085737 | Waiting for an event from Coordinator | NULL | | 117 | system user | | NULL | Connect | 47085737 | Waiting for an event from Coordinator | NULL | | 118 | system user | | NULL | Connect | 47085737 | Waiting for an event from Coordinator | NULL | | 119 | system user | | NULL | Connect | 47085737 | Waiting for an event from Coordinator | NULL | | 120 | system user | | NULL | Connect | 47085737 | Waiting for an event from Coordinator | NULL | | 121 | system user | | NULL | Connect | 47085737 | Waiting for an event from Coordinator | NULL | | 122 | system user | | NULL | Connect | 47085737 | Waiting for an event from Coordinator | NULL | | 4409099 | backupxxx | xxx.xxx.xxx.xx:48300 | NULL | Query | 112222 | Waiting for commit lock | FLUSH TABLES WITH READ LOCK | +---------+-----------------+----------------------+------+---------+----------+---------------------------------------------+-----------------------------+根据当时的会话状态我们可以得知是发生了元数据锁等待,不过由于这个实例并没有开启元数据锁跟踪,所以并不能通过metadata_locks表来确认元数据锁的详细情况。不过有经验的DBA通过会话状态及故障现象应该能够猜测出来是发生了并行复制死锁,所以当时为了保留现场并快速恢复复制,就打了一个pstack便于后面详细分析问题,并通过kill掉当时的备份任务和会话来解除死锁状态,使复制能够正常运行下去。 接下来我们来分析一下当时的堆栈,还原一下当时MySQL的状态及问题发生的原因。 将堆栈归类,并将可能对我们分析问题有用的线程堆栈提取整理后如下:
#1. Waiting for dependent transaction to commit pthread_cond_wait,native_cond_wait(thr_cond.h:140),my_cond_wait(thr_cond.h:140),inline_mysql_cond_wait(thr_cond.h:140),Mts_submode_logical_clock::wait_for_last_committed_trx(thr_cond.h:140),我们给这5个线程编了号并大致说明了一下线程的状态,接下来我们详细分析一下每个线程的状态。 先看第一个线程#1所处的函数为Mts_submode_logical_clock::wait_for_last_committed_trx:
Mts_submode_logical_clock::schedule_next_event(rpl_mts_submode.cc:657),schedule_next_event(log_event.cc:2768),Log_event::get_slave_worker(log_event.cc:2909),Log_event::apply_event(log_event.cc:3468),
apply_event_and_update_pos(rpl_slave.cc:4713),exec_relay_log_event(rpl_slave.cc:5240),handle_slave_sql(rpl_slave.cc:5240),pfs_spawn_thread(pfs.cc:2190),start_thread(libpthread.so.0),clone(libc.so.6) #2. 持有意向全局读锁(IX) Waiting for preceding transaction to commit pthread_cond_wait,native_cond_wait(thr_cond.h:140),my_cond_wait(thr_cond.h:140),inline_mysql_cond_wait(thr_cond.h:140),Commit_order_manager::wait_for_its_turn(thr_cond.h:140),
MYSQL_BIN_LOG::ordered_commit(binlog.cc:9245),MYSQL_BIN_LOG::commit(binlog.cc:8529),ha_commit_trans(handler.cc:1796),trans_commit(transaction.cc:239),Xid_log_event::do_commit(log_event.cc:7016),
Xid_apply_log_event::do_apply_event_worker(log_event.cc:7111),slave_worker_exec_job_group(rpl_rli_pdb.cc:2594),handle_slave_worker(rpl_slave.cc:6161),pfs_spawn_thread(pfs.cc:2190),
start_thread(libpthread.so.0),clone(libc.so.6) #3. STOP SLAVE 等待worker完成当前任务并退出 pthread_cond_timedwait,native_cond_timedwait(thr_cond.h:129),my_cond_timedwait(thr_cond.h:129),inline_mysql_cond_timedwait(thr_cond.h:129),terminate_slave_thread(thr_cond.h:129),
terminate_slave_thread(rpl_slave.cc:1665),terminate_slave_threads(rpl_slave.cc:1665),stop_slave(rpl_slave.cc:10172),stop_slave(rpl_slave.cc:618),stop_slave_cmd(rpl_slave.cc:759),
mysql_execute_command(sql_parse.cc:3438),mysql_parse(sql_parse.cc:5582),dispatch_command(sql_parse.cc:1458),do_command(sql_parse.cc:999),handle_connection(connection_handler_per_thread.cc:300),
pfs_spawn_thread(pfs.cc:2190),start_thread(libpthread.so.0),clone(libc.so.6) #4. 等待获取意向全局读锁(IX) Waiting for commit lock pthread_cond_timedwait,native_cond_timedwait(thr_cond.h:129),my_cond_timedwait(thr_cond.h:129),inline_mysql_cond_timedwait(thr_cond.h:129),MDL_wait::timed_wait(thr_cond.h:129),
MDL_context::acquire_lock(mdl.cc:3695),ha_commit_trans(handler.cc:1761),trans_commit(transaction.cc:239),Xid_log_event::do_commit(log_event.cc:7016),
Xid_apply_log_event::do_apply_event_worker(log_event.cc:7111),slave_worker_exec_job_group(rpl_rli_pdb.cc:2594),handle_slave_worker(rpl_slave.cc:6161),
pfs_spawn_thread(pfs.cc:2190),start_thread(libpthread.so.0),clone(libc.so.6) #5. 备份线程 等待获取全局读锁(X) Waiting for commit lock pthread_cond_timedwait,native_cond_timedwait(thr_cond.h:129),my_cond_timedwait(thr_cond.h:129),inline_mysql_cond_timedwait(thr_cond.h:129),MDL_wait::timed_wait(thr_cond.h:129),
MDL_context::acquire_lock(mdl.cc:3695),Global_read_lock::make_global_read_lock_block_commit(lock.cc:1196),reload_acl_and_cache(sql_reload.cc:234),mysql_execute_command(sql_parse.cc:4170),
mysql_parse(sql_parse.cc:5582),dispatch_command(sql_parse.cc:1458),do_command(sql_parse.cc:999),handle_connection(connection_handler_per_thread.cc:300),pfs_spawn_thread(pfs.cc:2190),
start_thread(libpthread.so.0),clone(libc.so.6)
thd->ENTER_COND(&rli->logical_clock_cond, &rli->mts_gaq_LOCK, &stage_worker_waiting_for_commit_parent, &old_stage); do { mysql_cond_wait(&rli->logical_clock_cond, &rli->mts_gaq_LOCK); } while ((!rli->info_thd->killed && !is_error) && !clock_leq(last_committed_arg, estimate_lwm_timestamp()));其所处的stage为“Waiting for dependent transaction to commit”。目前其停留在do...while循环中,其正常需要满足两个条件才能跳出循环,第一:last_committed <= lwm,即满足事务满足并行回放的条件;第二:接收到logical_clock_cond条件变量,logical_clock_cond条件变量为worker回放完事务后会判断当前是否还有事务等待,如果存在则发送logical_clock_cond条件变量唤醒等待的coordinator线程。 线程#2所处的函数为Commit_order_manager::wait_for_its_turn,该函数被函数MYSQL_BIN_LOG::ordered_commit调用,其逻辑处于两阶段提交的FLSH STAGE之前,用于判断从库提交事务时是否需要按照和主库一样的顺序进行提交(slave_preserve_commit_order)。线程所处的位置为:
while (queue_front() != worker->id) { if (unlikely(worker->found_order_commit_deadlock())) { mysql_mutex_unlock(&m_mutex); thd->EXIT_COND(&old_stage); DBUG_RETURN(true); } mysql_cond_wait(cond, &m_mutex); }其也处在一个while循环中,其跳出循环的条件为只有当前worker是worker队列的头部时才可以。worker应用事务时,不仅会把事务分配给worker,还会将worker注册到工作队列中,其入队的堆栈如下:
#0 Commit_order_manager::queue_push (this=0x7fb1a8007f30, index=0) at /soft/mysql-5.7.21/sql/rpl_slave_commit_order_manager.h:123 #1 0x000000000183ba79 in Commit_order_manager::register_trx (this=0x7fb1a8007f30, worker=0x7fb1a8022a60) at /soft/mysql-5.7.21/sql/rpl_slave_commit_order_manager.cc:49 #2 0x000000000183adb9 in Mts_submode_logical_clock::get_least_occupied_worker (this=0x7fb1a8007ee0, rli=0xb340c70, ws=0xb343fa8, ev=0x7fb1a8691ce0) at /soft/mysql-5.7.21/sql/rpl_mts_submode.cc:916 #3 0x00000000017978b9 in Log_event::get_slave_worker (this=0x7fb1a8691ce0, rli=0xb340c70) at /soft/mysql-5.7.21/sql/log_event.cc:2954 #4 0x0000000001798f77 in Log_event::apply_event (this=0x7fb1a8691ce0, rli=0xb340c70) at /soft/mysql-5.7.21/sql/log_event.cc:3468 #5 0x000000000180b5a4 in apply_event_and_update_pos (ptr_ev=0x7fb255aca8c0, thd=0x7fb1a8000950, rli=0xb340c70) at /soft/mysql-5.7.21/sql/rpl_slave.cc:4713 #6 0x000000000180cc3c in exec_relay_log_event (thd=0x7fb1a8000950, rli=0xb340c70) at /soft/mysql-5.7.21/sql/rpl_slave.cc:5240 #7 0x00000000018131e3 in handle_slave_sql (arg=0xb2a31b0) at /soft/mysql-5.7.21/sql/rpl_slave.cc:7377 #8 0x0000000001ce0555 in pfs_spawn_thread (arg=0x7fb1b00ddab0) at /soft/mysql-5.7.21/storage/perfschema/pfs.cc:2190 #9 0x00007fb2f7745ea5 in start_thread () from /lib64/libpthread.so.0 #10 0x00007fb2f660bb0d in clone () from /lib64/libc.so.6其出队的堆栈如下:
#0 Commit_order_manager::queue_pop (this=0x7fb1a8007f30) at /soft/mysql-5.7.21/sql/rpl_slave_commit_order_manager.h:111 #1 0x000000000183bee9 in Commit_order_manager::unregister_trx (this=0x7fb1a8007f30, worker=0x7fb1a8022a60) at /soft/mysql-5.7.21/sql/rpl_slave_commit_order_manager.cc:126 #2 0x00000000017cd0a4 in Stage_manager::enroll_for (this=0x2d1c418 <mysql_bin_log+4568>, stage=Stage_manager::FLUSH_STAGE, thd=0x7fb1ac000950, stage_mutex=0x0) at /soft/mysql-5.7.21/sql/binlog.cc:2000 #3 0x00000000017dd652 in MYSQL_BIN_LOG::change_stage (this=0x2d1b240 <mysql_bin_log>, thd=0x7fb1ac000950, stage=Stage_manager::FLUSH_STAGE, queue=0x7fb1ac000950, leave_mutex=0x0, enter_mutex=0x2d1b248 <mysql_bin_log+8>) at /soft/mysql-5.7.21/sql/binlog.cc:8854 #4 0x00000000017de2f6 in MYSQL_BIN_LOG::ordered_commit (this=0x2d1b240 <mysql_bin_log>, thd=0x7fb1ac000950, all=false, skip_commit=false) at /soft/mysql-5.7.21/sql/binlog.cc:9251 #5 0x00000000017dcbbe in MYSQL_BIN_LOG::commit (this=0x2d1b240 <mysql_bin_log>, thd=0x7fb1ac000950, all=false) at /soft/mysql-5.7.21/sql/binlog.cc:8529 #6 0x0000000001632b78 in trans_commit_stmt (thd=0x7fb1ac000950) at /soft/mysql-5.7.21/sql/transaction.cc:463 #7 0x000000000153895a in mysql_execute_command (thd=0x7fb1ac000950, first_level=true) at /soft/mysql-5.7.21/sql/sql_parse.cc:5004 #8 0x0000000001539f91 in mysql_parse (thd=0x7fb1ac000950, parser_state=0x7fb255a88a80) at /soft/mysql-5.7.21/sql/sql_parse.cc:5582 #9 0x000000000179cf62 in Query_log_event::do_apply_event (this=0x7fb1a8691ce0, rli=0x7fb1a8022a60, query_arg=0x7fb1a869f2d7 "create table t(id int)", q_len_arg=22) at /soft/mysql-5.7.21/sql/log_event.cc:4711 #10 0x000000000179be51 in Query_log_event::do_apply_event (this=0x7fb1a8691ce0, rli=0x7fb1a8022a60) at /soft/mysql-5.7.21/sql/log_event.cc:4431 #11 0x00000000017bf204 in Log_event::do_apply_event_worker (this=0x7fb1a8691ce0, w=0x7fb1a8022a60) at /soft/mysql-5.7.21/sql/log_event.cc:758 #12 0x0000000001834bee in Slave_worker::slave_worker_exec_event (this=0x7fb1a8022a60, ev=0x7fb1a8691ce0) at /soft/mysql-5.7.21/sql/rpl_rli_pdb.cc:1833 #13 0x0000000001836bb8 in slave_worker_exec_job_group (worker=0x7fb1a8022a60, rli=0xb340c70) at /soft/mysql-5.7.21/sql/rpl_rli_pdb.cc:2594 #14 0x000000000180f9aa in handle_slave_worker (arg=0x7fb1a8022a60) at /soft/mysql-5.7.21/sql/rpl_slave.cc:6161 #15 0x0000000001ce0555 in pfs_spawn_thread (arg=0x7fb1a8027000) at /soft/mysql-5.7.21/storage/perfschema/pfs.cc:2190 #16 0x00007fb2f7745ea5 in start_thread () from /lib64/libpthread.so.0即事务在ordered_commit中当在切换到FLUSH STAGE的过程中会进行出队,因为此时已经可以通过FLSUH STAGE的&LOCK_LOG互斥量来保证该事务组一定会比后续的事务组先提交。 线程#3是我们下发STOP SLAVE命令的线程,其处于terminate_slave_thread函数中,其正在退出SQL线程:
while (*slave_running) // Should always be true { DBUG_PRINT("loop", ("killing slave thread")); mysql_mutex_lock(&thd->LOCK_thd_data); /* Error codes from pthread_kill are: EINVAL: invalid signal number (can't happen) ESRCH: thread already killed (can happen, should be ignored) */ #ifndef _WIN32 int err MY_ATTRIBUTE((unused))= pthread_kill(thd->real_id, SIGUSR1); DBUG_ASSERT(err != EINVAL); #endif thd->awake(THD::NOT_KILLED); mysql_mutex_unlock(&thd->LOCK_thd_data); /* There is a small chance that slave thread might miss the first alarm. To protect againts it, resend the signal until it reacts */ struct timespec abstime; set_timespec(&abstime,2); #ifndef DBUG_OFF int error= #endif mysql_cond_timedwait(term_cond, term_lock, &abstime); // 在这等待 if ((*stop_wait_timeout) >= 2) (*stop_wait_timeout)= (*stop_wait_timeout) - 2; else if (*slave_running) { if (need_lock_term) mysql_mutex_unlock(term_lock); DBUG_RETURN (1); } DBUG_ASSERT(error == ETIMEDOUT || error == 0); } DBUG_ASSERT(*slave_running == 0); if (need_lock_term) mysql_mutex_unlock(term_lock); DBUG_RETURN(0); }其也是处于一个while循环中,循环判断接收线程退出条件变量term_cond,退出条件变量的触发只有当worker线程发现了mi->abort_slave = true退出标记并完成当前的event_group之后才会退出(详细分析可见https://www.cnblogs.com/wagaga/p/17286379.html),所以如果当前worker线程存在未完成的event,用户的STOP SLAVE命令就无法执行完成。 线程#4正处于ha_commit_trans函数中,正在等待获取commit lock:
if (rw_trans && !ignore_global_read_lock) { /* Acquire a metadata lock which will ensure that COMMIT is blocked by an active FLUSH TABLES WITH READ LOCK (and vice versa: COMMIT in progress blocks FTWRL). We allow the owner of FTWRL to COMMIT; we assume that it knows what it does. */ MDL_REQUEST_INIT(&mdl_request, MDL_key::COMMIT, "", "", MDL_INTENTION_EXCLUSIVE, MDL_EXPLICIT); DBUG_PRINT("debug", ("Acquire MDL commit lock")); if (thd->mdl_context.acquire_lock(&mdl_request, thd->variables.lock_wait_timeout)) { ha_rollback_trans(thd, all); DBUG_RETURN(1); } release_mdl= true; DEBUG_SYNC(thd, "ha_commit_trans_after_acquire_commit_lock"); }即事务在提交之前需要先获取意向独占元数据锁(commit lock),如果当前实例存在全局读锁,则事务就会被阻塞,不能提交。 此时让我们再回头看线程#2,发现其堆栈中间也有ha_commit_trans函数,说明其已经获取到了意向独占元数据锁。 最后看线程#5,其很简单,就是由于xtrabackup备份而触发的需要获取全局读锁,且正处于等待状态。 此时,我们把待分析的线程都分析了一遍,这时候我们就可以理清他们之间的关系了:
- 线程#1可能在等待线程#2,也可能在等待线程#4,不过不重要,因为它也只是受害者
- 线程#2持有元数据意向独占锁(IX),并等待线程#4的提交
- 线程#3等待线程#2和#4完成后停止复制线程,不重要,受害者
- 线程#4想要获取意向独占锁(IX),但是却被阻塞,所以在等待线程#5
- 线程#5想要获取独占锁(X),但是却被阻塞,所以在等待线程#2