I don't have an MGR environment of my own; this is just to learn from someone else's experience. Original article: https://www.percona.com/blog/effective-strategies-for-recovering-mysql-group-replication-from-failures/
Group Replication is a fault-tolerant/highly available replication topology that ensures that if the primary node goes down, one of the other candidate or secondary members takes over, so writes and reads continue without interruption.
However, in certain scenarios group membership can be broken by failures, network partitions, or database crashes, or we can end up with isolated members. In such cases we have to perform a partial or full recovery to bring the whole topology back to life.
Consider the following scenario: all group nodes go down while the primary node [Node1] is serving read/write traffic.
1) Below is the Group Replication topology set up with the dbdeployer tool.
node1 [localhost:23637] {root} ((none)) > select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 00023637-1111-1111-1111-111111111111 | 127.0.0.1   | 23637       | ONLINE       | PRIMARY     | 8.0.36         | XCom                       |
| group_replication_applier | 00023638-2222-2222-2222-222222222222 | 127.0.0.1   | 23638       | ONLINE       | SECONDARY   | 8.0.36         | XCom                       |
| group_replication_applier | 00023639-3333-3333-3333-333333333333 | 127.0.0.1   | 23639       | ONLINE       | SECONDARY   | 8.0.36         | XCom                       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
3 rows in set (0.01 sec)
2) Run a load against the primary node (127.0.0.1:23637).
sysbench --db-driver=mysql --mysql-user=sbtest_user --mysql-password=Sbtest@2022 --mysql-db=sbtest --mysql-host=127.0.0.1 --mysql-port=23637 --tables=10 --table-size=10000 --create_secondary=off --threads=50 --time=0 --events=0 --report-interval=1 /opt/homebrew/Cellar/sysbench/1.0.20_6/share/sysbench/oltp_read_write.lua run
Output:
[ 1s ] thds: 50 tps: 1726.75 qps: 35178.95 (r/w/o: 24710.96/6964.73/3503.26) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 2s ] thds: 50 tps: 2528.19 qps: 50605.04 (r/w/o: 35405.76/10142.90/5056.39) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 3s ] thds: 50 tps: 2316.05 qps: 46257.26 (r/w/o: 32348.96/9290.15/4618.15) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
...
3) Here, we deliberately stop all the group nodes.
./stop_all
# executing 'stop' on /Users/aniljoshi/sandboxes/group_msb_8_0_36
executing 'stop' on node3
stop /Users/aniljoshi/sandboxes/group_msb_8_0_36/node3
executing 'stop' on node2
stop /Users/aniljoshi/sandboxes/group_msb_8_0_36/node2
executing 'stop' on node1
stop /Users/aniljoshi/sandboxes/group_msb_8_0_36/node1
At this point the cluster is completely down and all group members have stopped. Now we will try to bring the cluster back up.
Bootstrapping/recovering the cluster nodes
4) First, let's start each node. Since I am using dbdeployer, I can manage the nodes (start/stop) and perform other tasks with some of its built-in commands/scripts, shown below.
shell> ./start_all
# executing 'start' on /Users/aniljoshi/sandboxes/group_msb_8_0_36
executing "start" on node 1
... sandbox server started
executing "start" on node 2
.. sandbox server started
executing "start" on node 3
.. sandbox server started
5) Connect to each node and verify its current state.
node1 [localhost:23637] {root} ((none)) > select * from performance_schema.replication_group_members;
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier |           |             |        NULL | OFFLINE      |             |                |                            |
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+----------------------------+
1 row in set (0.00 sec)

node2 [localhost:23638] {root} ((none)) > select * from performance_schema.replication_group_members;
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier |           |             |        NULL | OFFLINE      |             |                |                            |
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+----------------------------+
1 row in set (0.01 sec)

node3 [localhost:23639] {root} ((none)) > select * from performance_schema.replication_group_members;
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier |           |             |        NULL | OFFLINE      |             |                |                            |
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+----------------------------+
1 row in set (0.01 sec)
At this point, all the nodes are in the OFFLINE state.
6) We can recover the cluster by bootstrapping one of the group nodes; once the other nodes are then started, they join through the distributed recovery process. Before doing that, however, it is important to understand the state of the certified/committed transactions on each node so that we do not bootstrap the wrong one. Bootstrapping the wrong node can cause severe data loss!
Note: in our case, received_transaction_set is empty because all the nodes were restarted. However, if the nodes had not been restarted and only Group Replication had been broken by a network outage or some other event, we would also see values there.
node1 [localhost:23637] {root} ((none)) > SELECT @@GLOBAL.GTID_EXECUTED;
+-----------------------------------------------+
| @@GLOBAL.GTID_EXECUTED                        |
+-----------------------------------------------+
| 00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148034 |
+-----------------------------------------------+
1 row in set (0.00 sec)

node1 [localhost:23637] {root} ((none)) > SELECT received_transaction_set FROM
    -> performance_schema.replication_connection_status WHERE
    -> channel_name="group_replication_applier";
+--------------------------+
| received_transaction_set |
+--------------------------+
|                          |
+--------------------------+
1 row in set (0.01 sec)

node2 [localhost:23638] {msandbox} ((none)) > SELECT @@GLOBAL.GTID_EXECUTED;
+----------------------------------------------+
| @@GLOBAL.GTID_EXECUTED                       |
+----------------------------------------------+
| 00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-96208 |
+----------------------------------------------+
1 row in set (0.00 sec)

node2 [localhost:23638] {msandbox} ((none)) > SELECT received_transaction_set FROM
    -> performance_schema.replication_connection_status WHERE
    -> channel_name="group_replication_applier";
+--------------------------+
| received_transaction_set |
+--------------------------+
|                          |
+--------------------------+
1 row in set (0.01 sec)

node3 [localhost:23639] {msandbox} ((none)) > SELECT @@GLOBAL.GTID_EXECUTED;
+----------------------------------------------+
| @@GLOBAL.GTID_EXECUTED                       |
+----------------------------------------------+
| 00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-76208 |
+----------------------------------------------+
1 row in set (0.00 sec)

node3 [localhost:23639] {msandbox} ((none)) > SELECT received_transaction_set FROM
    -> performance_schema.replication_connection_status WHERE
    -> channel_name="group_replication_applier";
+--------------------------+
| received_transaction_set |
+--------------------------+
|                          |
+--------------------------+
1 row in set (0.00 sec)
From the gtid_executed information here we can see that Node1 ("00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148034") has the most recent GTIDs, so this is the node we should bootstrap.
Note: after collecting the transaction sets from all group members, compare them to find which member has the largest set, including both executed transactions (gtid_executed) and certified transactions (on the group_replication_applier channel). You can compare the GTIDs manually, or compare the GTID sets with the built-in functions GTID_SUBSET() and GTID_SUBTRACT().
mysql> SELECT GTID_SUBSET(set1, set2);
mysql> SELECT GTID_SUBTRACT(set1, set2);
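As an illustration only, using the GTID sets captured above (the literal values come from this walkthrough, so adjust them to your own output), the comparison could look like this; the node whose set is a superset of all the others is the bootstrap candidate:
mysql> SELECT GTID_SUBSET('00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-96208', '00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148034');
-- returns 1: Node2's executed set is fully contained in Node1's set
mysql> SELECT GTID_SUBTRACT('00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148034', '00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-96208');
-- returns 00023636-bbbb-cccc-dddd-eeeeeeeeeeee:96209-148034, i.e. the transactions Node2 is missing compared to Node1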
7) Let's bootstrap Node1.
mysql> SET GLOBAL group_replication_bootstrap_group=ON;
mysql> START GROUP_REPLICATION;
mysql> SET GLOBAL group_replication_bootstrap_group=OFF;
Now, if we check the status of Node1, we can see that the member is ONLINE.
node1 [localhost:23637] {msandbox} ((none)) > select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 00023637-1111-1111-1111-111111111111 | 127.0.0.1   | 23637       | ONLINE       | PRIMARY     | 8.0.36         | XCom                       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
1 row in set (0.00 sec)
8) Now we simply need to start Group Replication on the remaining nodes. They will join through the distributed recovery mechanism, either via state transfer from the binary logs available on the source, or via a full data clone through the clone plugin if the required binary logs no longer exist on any available group node.
node2 [localhost:23638] {msandbox} ((none)) > start group_replication;

node2 [localhost:23638] {root} ((none)) > select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 00023637-1111-1111-1111-111111111111 | 127.0.0.1   | 23637       | ONLINE       | PRIMARY     | 8.0.36         | XCom                       |
| group_replication_applier | 00023638-2222-2222-2222-222222222222 | 127.0.0.1   | 23638       | ONLINE       | SECONDARY   | 8.0.36         | XCom                       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
2 rows in set (0.00 sec)

node3 [localhost:23639] {root} ((none)) > start group_replication;
Query OK, 0 rows affected (1.36 sec)

node3 [localhost:23639] {root} ((none)) > select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 00023637-1111-1111-1111-111111111111 | 127.0.0.1   | 23637       | ONLINE       | PRIMARY     | 8.0.36         | XCom                       |
| group_replication_applier | 00023638-2222-2222-2222-222222222222 | 127.0.0.1   | 23638       | ONLINE       | SECONDARY   | 8.0.36         | XCom                       |
| group_replication_applier | 00023639-3333-3333-3333-333333333333 | 127.0.0.1   | 23639       | ONLINE       | SECONDARY   | 8.0.36         | XCom                       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
3 rows in set (0.00 sec)
Note: if the binary logs have been rotated away or purged, we need to restore from a backup instead, or we can set up the clone plugin, which automates the full snapshot process.
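A quick sanity check for this (a hedged sketch, not from the original article; substitute the joiner's own gtid_executed value) is to compare the donor's gtid_purged against the joiner's executed set. If the result is non-empty, the donor has already purged transactions the joiner still needs, and binary-log-based recovery is no longer possible:
-- run on the prospective donor (e.g. Node1); the literal set is the joiner's @@GLOBAL.GTID_EXECUTED
mysql> SELECT GTID_SUBTRACT(@@GLOBAL.gtid_purged, '00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-76208') AS purged_but_needed;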
We can also verify whether a node is still performing distributed recovery by executing the command below. The replication threads [Slave_IO_Running/Slave_SQL_Running] stop once recovery completes and the node is shown in the "SECONDARY" state.
node2 [localhost:23638] {root} ((none)) > SHOW SLAVE STATUS FOR CHANNEL 'group_replication_recovery'\G
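On MySQL 8.0.22 and later, the equivalent replica-terminology command can be used instead (shown here as an alternative, not output captured from this test):
node2 [localhost:23638] {root} ((none)) > SHOW REPLICA STATUS FOR CHANNEL 'group_replication_recovery'\G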
At this stage, we have a fully functioning cluster again.
Recovering from a backup
In rare cases, when the data is completely corrupted, when we want to perform a partial recovery, or when we are adding a new node, we can perform a backup/restore directly.
Let's walk through the following scenario, where we restore the cluster using a physical backup taken with Percona XtraBackup.
1) For demonstration purposes, we took a backup from one of the group nodes.
shell> xtrabackup --host=127.0.0.1 -usbtest_user -pSbtest@2022 -P23637 --backup --target-dir=/Users/aniljoshi/backup/
Backup output:
...
2024-08-03T19:54:50.102188+05:30 0 [Note] [MY-011825] [Xtrabackup] Transaction log of lsn (266657170) to (266657878) was copied.
2024-08-03T19:54:50.329784+05:30 0 [Note] [MY-011825] [Xtrabackup] completed OK!
2) To make the backup usable, we also need to prepare it.
shell> xtrabackup --prepare --use-memory=2G --target-dir=/Users/aniljoshi/backup/
Output:
...
2024-08-03T19:58:00.432140+05:30 0 [Note] [MY-012980] [InnoDB] Shutdown completed; log sequence number 266658326
2024-08-03T19:58:00.434701+05:30 0 [Note] [MY-011825] [Xtrabackup] completed OK!
Now, suppose we want to restore the cluster using this data.
3) First, we clean up the data directory of the target member [Node1].
shell> rm -rf /Users/aniljoshi/sandboxes/group_msb_8_0_36/node1/data/
4) Then we copy the prepared data into the Node1 data directory.
shell> cp -aR /Users/aniljoshi/backup/ /Users/aniljoshi/sandboxes/group_msb_8_0_36/node1/data/
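Alternatively, the restore could be done with XtraBackup's own copy-back mode instead of cp (a sketch under the same sandbox paths; the target data directory must be empty and file ownership must match the mysqld user):
shell> xtrabackup --copy-back --target-dir=/Users/aniljoshi/backup/ --datadir=/Users/aniljoshi/sandboxes/group_msb_8_0_36/node1/data/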
5) Bootstrap the node [Node1].
mysql> SET GLOBAL group_replication_bootstrap_group=ON;
mysql> START GROUP_REPLICATION;
mysql> SET GLOBAL group_replication_bootstrap_group=OFF;

node1 [localhost:23637] {root} (performance_schema) > select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 11400bec-51a6-11ef-b950-29b40b1b64a7 | 127.0.0.1   | 23637       | ONLINE       | PRIMARY     | 8.0.36         | XCom                       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
1 row in set (0.00 sec)
The other nodes could join via distributed recovery/clone (if configured); here, however, we will demonstrate how to perform the restore/recovery using a backup.
6) Let's take a fresh backup from Node1 and note the GTID execution information.
shell> xtrabackup --host=127.0.0.1 -usbtest_user -pSbtest@2022 -P23637 --backup --target-dir=/Users/aniljoshi/backup/Primary/
Backup output:
...
2024-08-03T20:16:36.182978+05:30 0 [Note] [MY-011825] [Xtrabackup] Transaction log of lsn (266723771) to (266724479) was copied.
2024-08-03T20:16:36.408616+05:30 0 [Note] [MY-011825] [Xtrabackup] completed OK!
GTID details:
shell> cat /Users/aniljoshi/backup/Primary/xtrabackup_binlog_info
mysql-bin.000006        193     00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148040
7) Prepare the backup.
shell> xtrabackup --prepare --use-memory=2G --target-dir=/Users/aniljoshi/backup/Primary/
Output:
...
2024-08-03T20:17:47.817955+05:30 0 [Note] [MY-013072] [InnoDB] Starting shutdown...
2024-08-03T20:17:47.822229+05:30 0 [Note] [MY-013084] [InnoDB] Log background threads are being closed...
2024-08-03T20:17:47.865162+05:30 0 [Note] [MY-012980] [InnoDB] Shutdown completed; log sequence number 266724886
2024-08-03T20:17:47.867836+05:30 0 [Note] [MY-011825] [Xtrabackup] completed OK!
8) Clean up the Node2 data directory and copy the prepared backup onto Node2.
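The cleanup command itself is not shown in the original output; assuming the same sandbox layout as Node1, it would look like this, followed by the copy below:
shell> rm -rf /Users/aniljoshi/sandboxes/group_msb_8_0_36/node2/data/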
shell> cp -aR /Users/aniljoshi/backup/Primary/ /Users/aniljoshi/sandboxes/group_msb_8_0_36/node2/data/
9) After restarting the service, we can verify whether gtid_purged has been applied.
node2 [localhost:23638] {root} ((none)) > show global variables like '%gtid_purged%';
+---------------+-----------------------------------------------+
| Variable_name | Value                                         |
+---------------+-----------------------------------------------+
| gtid_purged   | 00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148040 |
+---------------+-----------------------------------------------+
1 row in set (0.01 sec)
If it is not set, we can set gtid_purged ourselves with the command below.
MySQL> set global gtid_purged="00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148040";
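Note that assigning gtid_purged directly is only accepted when the server's own GTID state allows it; if the restored node already reports a non-empty gtid_executed, the local GTID/binlog state usually has to be cleared first (a hedged sketch; RESET MASTER discards the node's own binary logs, so double-check the coordinates before running it):
node2 [localhost:23638] {root} ((none)) > RESET MASTER;
node2 [localhost:23638] {root} ((none)) > set global gtid_purged="00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148040";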
10) Finally, we can configure replication and start the Group Replication process.
node2 [localhost:23638] {root} ((none)) > change master to master_user='rsandbox', master_password='rsandbox' FOR CHANNEL 'group_replication_recovery';

node2 [localhost:23638] {root} ((none)) > start group_replication;
Query OK, 0 rows affected (1.97 sec)

node2 [localhost:23638] {root} ((none)) > select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 11400bec-51a6-11ef-b950-29b40b1b64a7 | 127.0.0.1   | 23637       | ONLINE       | PRIMARY     | 8.0.36         | XCom                       |
| group_replication_applier | c2b74588-51a7-11ef-81da-5f8da38bd06f | 127.0.0.1   | 23638       | ONLINE       | SECONDARY   | 8.0.36         | XCom                       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
2 rows in set (0.00 sec)
Similar steps can be performed later for the remaining node [Node3]. Besides Percona XtraBackup, we can also use logical backup/restore tools such as mydumper/mysqldump/MySQL Shell to perform the restore and establish the Group Replication process. We will not cover them here; the procedure is generally the same as in any asynchronous replication setup. Once the restore is done with the expected GTID coordinates, we can run the commands below.
MySQL> change master to master_user='rsandbox', master_password='rsandbox' FOR CHANNEL 'group_replication_recovery';
MySQL> start group_replication;
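For reference, a minimal mysqldump-based sketch of that logical path might look like the following (standard tool options, but the paths, port, and credentials are assumptions based on this sandbox). The dump records gtid_purged itself via --set-gtid-purged=ON, so after loading it into a node with an empty GTID state, only the two commands above remain:
shell> mysqldump --host=127.0.0.1 --port=23637 -usbtest_user -pSbtest@2022 --single-transaction --set-gtid-purged=ON --all-databases > /Users/aniljoshi/backup/full_dump.sql
shell> mysql --host=127.0.0.1 --port=23639 -uroot -p < /Users/aniljoshi/backup/full_dump.sql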
Additionally, since MySQL 8.0.17, the clone plugin can be used for remote cloning operations during distributed recovery in Group Replication. This requires some extra configuration and steps.
Recovery via clone
On the donor node [Node1]:
1) Install the plugin dynamically.
node1 [localhost:23637] {root} ((none)) > install plugin clone soname 'mysql_clone.so';
Query OK, 0 rows affected (0.16 sec)
2) Persist the change in the server configuration.
[mysqld]
plugin-load-add=mysql_clone.so
clone=FORCE_PLUS_PERMANENT
3) Grant the BACKUP_ADMIN privilege to the replication user "rsandbox", which is responsible for the distributed recovery process.
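The grant statement itself is not shown in the original output; it would be along these lines:
node1 [localhost:23637] {root} ((none)) > GRANT BACKUP_ADMIN ON *.* TO 'rsandbox'@'%';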
node1 [localhost:23637] {root} ((none)) > show grants for rsandbox@'%';
+---------------------------------------------+
| Grants for rsandbox@%                       |
+---------------------------------------------+
| GRANT USAGE ON *.* TO `rsandbox`@`%`        |
| GRANT BACKUP_ADMIN ON *.* TO `rsandbox`@`%` |
| GRANT `R_REPLICATION`@`%` TO `rsandbox`@`%` |
+---------------------------------------------+
3 rows in set (0.00 sec)
4) Verify the plugin status.
node3 [localhost:23639] {root} ((none)) > select plugin_name,plugin_status from information_schema.plugins where plugin_name='clone';
+-------------+---------------+
| plugin_name | plugin_status |
+-------------+---------------+
| clone       | ACTIVE        |
+-------------+---------------+
1 row in set (0.00 sec)
On the recipient node [Node3]:
1) Install the plugin dynamically.
node1 [localhost:23637] {root} ((none)) > install plugin clone soname 'mysql_clone.so';
Query OK, 0 rows affected (0.16 sec)
2) Persist the change in the server configuration.
[mysqld]
plugin-load-add=mysql_clone.so
clone=FORCE_PLUS_PERMANENT
3) Verify the plugin status.
node3 [localhost:23639] {root} ((none)) > select plugin_name,plugin_status from information_schema.plugins where plugin_name='clone';
+-------------+---------------+
| plugin_name | plugin_status |
+-------------+---------------+
| clone       | ACTIVE        |
+-------------+---------------+
1 row in set (0.00 sec)
4) Define the clone threshold for distributed recovery on all relevant/group nodes. A node will catch up using incremental data (binary logs) unless the transaction gap exceeds the threshold below, in which case a full data clone is performed. For this test we used a very small value, although the default is very large.
node3 [localhost:23639] {root} ((none)) > set global group_replication_clone_threshold = 10;
Query OK, 0 rows affected (0.00 sec)
5) Finally, start Group Replication on Node3. In the MySQL log we can see that distributed recovery via the clone plugin has started. This process replaces, or rather clones, the entire data directory from the donor/source [Node1].
2024-08-03T18:21:04.039835Z 0 [System] [MY-013471] [Repl] Plugin group_replication reported: 'Distributed recovery will transfer data using: Cloning from a remote group donor.'
2024-08-03T18:21:03.033878Z 0 [Warning] [MY-013469] [Repl] Plugin group_replication reported: 'This member will start distributed recovery using clone. It is due to the number of missing transactions being higher than the configured threshold of 10.'
2024-08-03T18:21:04.150730Z 132 [Warning] [MY-013460] [InnoDB] Clone removing all user data for provisioning: Started
...
/Users/aniljoshi/opt/mysql/8.0.36/bin/mysqld: Shutdown complete (mysqld 8.0.36)  MySQL Community Server - GPL.
2024-08-03T18:21:09.6NZ mysqld_safe mysqld restarted
6) Next, we can track the GTID execution and coordinates using the file and command below.
shell> cat #view_status
2 1 1722709264045254 1722709271116851 127.0.0.1:23637 0 ./mysql-bin.000006 4836 00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148056
Or:
node3 [localhost:23639] {root} ((none)) > select * from performance_schema.clone_status\G
*************************** 1. row ***************************
             ID: 1
            PID: 0
          STATE: Completed
     BEGIN_TIME: 2024-08-03 23:51:04.045
       END_TIME: 2024-08-03 23:51:11.117
         SOURCE: 127.0.0.1:23637
    DESTINATION: LOCAL INSTANCE
       ERROR_NO: 0
  ERROR_MESSAGE:
    BINLOG_FILE: mysql-bin.000006
BINLOG_POSITION: 4836
  GTID_EXECUTED: 00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148056
1 row in set (0.00 sec)

node3 [localhost:23639] {root} ((none)) > show variables like 'gtid_purged';
+---------------+-----------------------------------------------+
| Variable_name | Value                                         |
+---------------+-----------------------------------------------+
| gtid_purged   | 00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148056 |
+---------------+-----------------------------------------------+
1 row in set (0.01 sec)
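While a clone is still in progress, the per-stage status can also be watched from performance_schema (a standard table in MySQL 8.0, shown here as an extra hint rather than output captured from this test):
node3 [localhost:23639] {root} ((none)) > select stage, state, begin_time, end_time from performance_schema.clone_progress;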
If it is not set, we can set gtid_purged with the command below, passing the GTID value from the clone status above.
node3 [localhost:23639] {root} ((none)) > set global gtid_purged="00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148056";
7) Run the CHANGE REPLICATION SOURCE command and start Group Replication.
node3 [localhost:23639] {root} ((none)) > CHANGE REPLICATION SOURCE TO SOURCE_USER='rsandbox', SOURCE_PASSWORD='rsandbox' FOR CHANNEL 'group_replication_recovery';
Query OK, 0 rows affected, 2 warnings (0.01 sec)

node3 [localhost:23639] {root} ((none)) > start group_replication;
Query OK, 0 rows affected (10.69 sec)
8) Finally, we can see Node3 appearing in the Group Replication topology.
node3 [localhost:23639] {root} ((none)) > select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 00023639-3333-3333-3333-333333333333 | 127.0.0.1   | 23639       | ONLINE       | SECONDARY   | 8.0.36         | XCom                       |
| group_replication_applier | 11400bec-51a6-11ef-b950-29b40b1b64a7 | 127.0.0.1   | 23637       | ONLINE       | PRIMARY     | 8.0.36         | XCom                       |
| group_replication_applier | c2b74588-51a7-11ef-81da-5f8da38bd06f | 127.0.0.1   | 23638       | ONLINE       | SECONDARY   | 8.0.36         | XCom                       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
3 rows in set (0.01 sec)