最近遇到一个Mysql由于create table时由于从库表表空间问题,从库sql重放进程执行sql失败,导致从库的sql应用进程挂掉。客户反馈,可以跳过失败的create table事务,本博文展示mysql 基于GITD模式跳过错误CTID事务的正确方法。
1.0 明确错误原因
root@mysqldb 15:53: [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.56.2
Master_User: repl
Master_Port: 9506
Connect_Retry: 60
Master_Log_File: mybinlog.006091
Read_Master_Log_Pos: 768927745
Relay_Log_File: TEST-DB-2-relay-bin.009546
Relay_Log_Pos: 543586669
Relay_Master_Log_File: mybinlog.006090
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1813
Last_Error: Error 'Tablespace '`test`.`p_gov_......_temp`' exists.' on query. Default database: 'test'. Query: 'CREATE TABLE `p_gov_......_temp` (
......
Skip_Counter: 0
Exec_Master_Log_Pos: 543586536
Relay_Log_Space: 10836318250
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1813
Last_SQL_Error: Error 'Tablespace '`test`.`p_gov_......_temp`' exists.' on query. Default database: 'test'. Query: 'CREATE TABLE `p_gov_......_temp` (
......
Replicate_Ignore_Server_Ids:
Master_Server_Id: 9506002
Master_UUID: 4efcbd9f-c1e2-11eb-b710-0050569e2112
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp: 230607 15:40:49
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 4efcbd9f-c1e2-11eb-b710-0050569e2112:1217191452-1442042781
Executed_Gtid_Set: 24fce019-edde-11ec-8f9e-0050569e562f:1-26407,
4efcbd9f-c1e2-11eb-b710-0050569e2112:1-1441877106:1441877108-1441877109,
54c6f166-c1e4-11eb-b023-0050569e562f:1
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
Master_public_key_path:
Get_master_public_key: 0
Network_Namespace:
1 row in set (0.00 sec)
root@mysqldb 15:53: [(none)]>
通过观察show slave status可以看到从库192.168.56.2的Slave_Sql由于1813错误,具体原因是表空间问题导致create table失败,引起从库Slave_Sql进程挂起。
2.0 问题处理方法
经过沟通,允许跳过失败的GTID事务。至于如何跳过GITD事务,网上有很多资料,有的从show slave status输出结果中跳过Retrieved_Gtid_Set,有的是Executed_Gtid_Set拼凑的,可以说均是错误的。另外,有些说从binlog里边挖,其实没有那么麻烦,要跳过失败事务的GTID可以通过sql直接查询到。
root@mysqldb 16:20: [(none)]> select * from performance_schema.replication_applier_status_by_worker\G
*************************** 1. row ***************************
CHANNEL_NAME:
WORKER_ID: 0
THREAD_ID: NULL
SERVICE_STATE: OFF
LAST_ERROR_NUMBER: 1813
LAST_ERROR_MESSAGE: Error 'Tablespace '`test`.`p_gov_......_temp`' exists.' on query. Default database: 'test'. Query: 'CREATE TABLE `p_gov_......_temp` (
......
LAST_ERROR_TIMESTAMP: 2023-06-07 16:11:59.433420
LAST_APPLIED_TRANSACTION:
LAST_APPLIED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION_START_APPLY_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP: 0000-00-00 00:00:00.000000
APPLYING_TRANSACTION: 4efcbd9f-c1e2-11eb-b710-0050569e2112:1441877107 《《《《《《《《《《《要跳过的事务
APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 2023-06-07 14:42:22.913515
APPLYING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 2023-06-07 14:42:22.913515
APPLYING_TRANSACTION_START_APPLY_TIMESTAMP: 2023-06-07 16:11:59.392908
LAST_APPLIED_TRANSACTION_RETRIES_COUNT: 0
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE:
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
APPLYING_TRANSACTION_RETRIES_COUNT: 0
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE:
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
1 row in set (0.00 sec)
查询结果中 APPLYING_TRANSACTION: 4efcbd9f-c1e2-11eb-b710-0050569e2112:1441877107 就是要跳过的事务GTID号。
3.0 GTID跳过失败事务执行过程
root@mysqldb 16:31: [(none)]> SET GTID_NEXT='4efcbd9f-c1e2-11eb-b710-0050569e2112:1441877107';
Query OK, 0 rows affected (0.00 sec)
root@mysqldb 16:33: [(none)]> select @@gtid_next;
+-------------------------------------------------+
| @@gtid_next |
+-------------------------------------------------+
| 4efcbd9f-c1e2-11eb-b710-0050569e2112:1441877107 |
+-------------------------------------------------+
1 row in set (0.00 sec)
root@mysqldb 16:33: [(none)]> BEGIN; COMMIT;
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.01 sec)
root@mysqldb 16:33: [(none)]> select @@gtid_next;
+-------------------------------------------------+
| @@gtid_next |
+-------------------------------------------------+
| 4efcbd9f-c1e2-11eb-b710-0050569e2112:1441877107 |
+-------------------------------------------------+
1 row in set (0.00 sec)
root@mysqldb 16:33: [(none)]> SET SESSION GTID_NEXT = AUTOMATIC;
Query OK, 0 rows affected (0.00 sec)
root@mysqldb 16:34: [(none)]> select @@gtid_next;
+-------------+
| @@gtid_next |
+-------------+
| AUTOMATIC |
+-------------+
1 row in set (0.00 sec)
root@mysqldb 16:34: [(none)]>
root@mysqldb 16:34: [(none)]>
root@mysqldb 16:34: [(none)]> start slave;
Query OK, 0 rows affected (0.00 sec)
root@mysqldb 16:34: [(none)]>
root@mysqldb 16:34: [(none)]> show slave statuS\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.56.2
Master_User: repl
Master_Port: 9506
Connect_Retry: 60
Master_Log_File: mybinlog.006092
Read_Master_Log_Pos: 726558126
Relay_Log_File: TEST-DB-2-relay-bin.009546
Relay_Log_Pos: 640919944
Relay_Master_Log_File: mybinlog.006090
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 640919811
Relay_Log_Space: 11867704299
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 6329
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 9506002
Master_UUID: 4efcbd9f-c1e2-11eb-b710-0050569e2112
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Reading event from the relay log
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 4efcbd9f-c1e2-11eb-b710-0050569e2112:1217191452-1442167357
Executed_Gtid_Set: 24fce019-edde-11ec-8f9e-0050569e562f:1-26407,
4efcbd9f-c1e2-11eb-b710-0050569e2112:1-1441891442,
54c6f166-c1e4-11eb-b023-0050569e562f:1
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
Master_public_key_path:
Get_master_public_key: 0
Network_Namespace:
1 row in set (0.00 sec)
root@mysqldb 16:34: [(none)]>