案例说明:
主库主机系统重启,触发failover切换,原主库系统启动后,数据库服务被启动,出现“双主”,使用“repmgr node rejoin --force-rewind”恢复集群。通过本案例了解,集群“双主”产生的原因及解决方案,并熟悉sys_rewind在集群恢复中的应用。
数据库版本:
KingbaseES V8R6
一、集群架构
如下所示,集群原节点状态:
[kingbase@node202 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 17 | | host=192.168.1.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | standby | running | node1 | default | 100 | 17 | 0 bytes | host=192.168.1.202 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
二、主库系统重启
[root@node201 ~]# reboot
三、主库重启后集群状态
如下所示,原主库主机系统重启后,数据库服务自动启动,导致出现“双主”:
1、failover后集群节点状态
[kingbase@node202 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | - failed | ? | default | 100 | | | host=192.168.1.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | primary | * running | | default | 100 | 18 | | host=192.168.1.202 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
[WARNING] following issues were detected
- unable to connect to node "node1" (ID: 1)
2、查看节点上timeline变化
如下所示,新主库触发promote后,timeline为(18=1*16^1 + 2 * 16^0),原主库的timeline为(17):
新主库:
[kingbase@node202 bin]$ ls -lh ../data/sys_wal
-rw------- 1 kingbase kingbase 16M Jan 31 10:52 0000001100000000000000EB
-rw------- 1 kingbase kingbase 16M Jan 31 10:57 0000001100000000000000EC.partial
-rw------- 1 kingbase kingbase 694 Jan 26 09:41 00000011.history
-rw------- 1 kingbase kingbase 16M Jan 31 11:08 0000001200000000000000EC
-rw------- 1 kingbase kingbase 738 Jan 31 11:07 00000012.history
原主库:
[kingbase@node201 bin]$ ls -lh ../data/sys_wal
-rw------- 1 kingbase kingbase 16M Jan 31 10:52 0000001100000000000000EB
-rw------- 1 kingbase kingbase 16M Jan 31 11:06 0000001100000000000000EC
-rw------- 1 kingbase kingbase 16M Jan 31 11:06 0000001100000000000000ED
-rw------- 1 kingbase kingbase 694 Jan 3 16:27 00000011.history
drwx------ 2 kingbase kingbase 16K Jan 31 11:06 archive_status
四、模拟“双主”故障
1、触发failover切换
# 集群参数配置
[kingbase@node201 bin]$ cat ../etc/repmgr.conf |grep -E 'recovery|failover'
failover='automatic'
recovery='manual'
# 关闭主库数据库服务
[kingbase@node201 bin]$ ./sys_ctl stop -D ../data/
主库数据库服务被关闭后,将触发集群主备failover切换,在参数recovery=manual情况下,原主库将需要人工参与恢复为备库加入集群。
2、误启动原主库数据库服务
如下所示,如果需要人工将原主库恢复为新的备库加入到集群,需要在原主库data目录下创建standby.signal文件后,再启动数据库服务,数据库服务将启动到备库状态,如果在没有创建standby.signal标识文件下,直接启动原主库数据库服务,将直接进入主节点模式。
[kingbase@node201 bin]$ ./sys_ctl start -D ../data/
2、查看数据库服务进程
如下所示,原主库启动后,数据库服务进程启动为主库,集群出现“双主”。
[kingbase@node201 bin]$ ps -ef |grep kingbase
kingbase 4137 1 0 11:14 ? 00:00:00 /home/kingbase/cluster/R6C8/HAC8/kingbase/bin/kingbase -D ../data
kingbase 4138 4137 0 11:14 ? 00:00:00 kingbase: logger
kingbase 4140 4137 0 11:14 ? 00:00:00 kingbase: checkpointer
kingbase 4141 4137 0 11:14 ? 00:00:00 kingbase: background writer
kingbase 4142 4137 0 11:14 ? 00:00:00 kingbase: walwriter
kingbase 4143 4137 0 11:14 ? 00:00:00 kingbase: autovacuum launcher
kingbase 4144 4137 0 11:14 ? 00:00:00 kingbase: archiver last was 0000001100000000000000EC
kingbase 4145 4137 0 11:14 ? 00:00:00 kingbase: stats collector
kingbase 4146 4137 0 11:14 ? 00:00:00 kingbase: kwr collector
kingbase 4147 4137 0 11:14 ? 00:00:00 kingbase: ksh writer
kingbase 4148 4137 0 11:14 ? 00:00:00 kingbase: ksh collector
kingbase 4149 4137 0 11:14 ? 00:00:00 kingbase: logical replication launcher
# 数据库状态
test=# select sys_is_in_recovery();
sys_is_in_recovery()
----------------------
f
(1 row)
3、查看数据库日志
[kingbase@node201 sys_log]$ tail -1000 kingbase-2024-01-31_111429.csv
2024-01-31 11:14:29.129 CST,,,4137,,65b9bb15.1029,1,,2024-01-31 11:14:29 CST,,0,LOG,00000,"ending log output to stderr",,"Future log output will go to log destination ""csvlog"".",,,,,,,""
2024-01-31 11:14:29.149 CST,,,4139,,65b9bb15.102b,1,,2024-01-31 11:14:29 CST,,0,LOG,00000,"database system was shut down at 2024-01-31 11:06:32 CST",,,,,,,,,""
2024-01-31 11:14:29.164 CST,,,4137,,65b9bb15.1029,2,,2024-01-31 11:14:29 CST,,0,LOG,00000,"database system is ready to accept connections",,,,,,,,,""
2024-01-31 11:14:31.775 CST,"esrep","esrep",4156,"192.168.1.201:32776",65b9bb17.103c,1,"ALTER SYSTEM",2024-01-31 11:14:31 CST,4/21,0,LOG,XX000,"attention:user esrep is modifying synchronous_standby_names by ALTER SYSTEM SET statement",,,,,,"ALTER SYSTEM SET synchronous_standby_names = ''",,,"internal_rwcmgr"
2024-01-31 11:14:31.783 CST,,,4137,,65b9bb15.1029,3,,2024-01-31 11:14:29 CST,,0,LOG,00000,"received SIGHUP, reloading configuration files",,,,,,,,,""
2024-01-31 11:14:31.784 CST,,,4137,,65b9bb15.1029,4,,2024-01-31 11:14:29 CST,,0,LOG,00000,"parameter ""synchronous_standby_names"" changed to """"",,,,,,,,,""
4、集群节点状态
如下所示,集群出现“双主”状态:
[kingbase@node202 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | - failed | ? | default | 100 | | | host=192.168.1.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | primary | * running | | default | 100 | 18 | | host=192.168.1.202 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
[WARNING] following issues were detected
- unable to connect to node "node1" (ID: 1)
五、恢复集群
如下所示,将原主库通过node rejoin方式作为备库加入到集群,以下为日志信息:
1、执行‘repmgr node rejoin’
[kingbase@node201 bin]$ ./repmgr node rejoin -h 192.168.1.202 -U esrep -d esrep --force-rewind
......
[NOTICE] rejoin target is node "node2" (ID: 2)
[DEBUG] connecting to: "user=esrep connect_timeout=10 dbname=esrep host=192.168.1.202 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=internal_rwcmgr options=-csearch_path="
[DEBUG] local tli: 17; local_xlogpos: 0/EE000028; follow_target_history->tli: 17; follow_target_history->end: 0/EC000FA0
[NOTICE] sys_rewind execution required for this node to attach to rejoin target node 2
[DETAIL] rejoin target server's timeline 18 forked off current database system timeline 17 before current recovery point 0/EE000028
[NOTICE] executing sys_rewind
[DETAIL] sys_rewind command is "/home/kingbase/cluster/R6C8/HAC8/kingbase/bin/sys_rewind -D '/home/kingbase/cluster/R6C8/HAC8/kingbase/data' --source-server='host=192.168.1.202 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3'"
sys_rewind: servers diverged at WAL location 0/EC000FA0 on timeline 17
sys_rewind: rewinding from last common checkpoint at 0/EB000088 on timeline 17
sys_rewind: find last common checkpoint start time from 2024-01-31 11:18:07.003968 CST to 2024-01-31 11:18:07.204522 CST, in "0.200554" seconds.
sys_rewind: collect the number of WAL files to be processed:3, start time from 2024-01-31 11:18:07.204754 CST to 2024-01-31 11:18:07.320593 CST, cost "0.115839" seconds.
sys_rewind: read WAL in target server from 0/EB000088 to 0/EE0000A0 (endpoint 0/EE0000A0)
sys_rewind: read the local Wal file information, start time from 2024-01-31 11:18:07.204754 CST to 2024-01-31 11:18:07.321418 CST, cost "0.000793" seconds.
sys_rewind: file replication start time from 2024-01-31 11:18:07.321453 CST to 2024-01-31 11:18:07.827427 CST, cost "0.505974" seconds.
sys_rewind: update the control file: minRecoveryPoint is '0/EC013BE0', minRecoveryPointTLI is '18', and database state is 'in archive recovery'
sys_rewind: we will remove the dir '/home/kingbase/cluster/R6C8/HAC8/kingbase/data/sys_replslot/repmgr_slot_2.rewind' and all the file/dir in it.
sys_rewind: rewind start wal location 0/EB000058 (file 0000001100000000000000EB), end wal location 0/EC013BE0 (file 0000001200000000000000EC). wal data increment:16462(kB). time from 2024-01-31 11:18:07.321453 CST to 2024-01-31 11:18:08.543478 CST, in "1.539510" seconds.
sys_rewind: Done!
[NOTICE] 0 files copied to /home/kingbase/cluster/R6C8/HAC8/kingbase/data
[INFO] creating replication slot as user "esrep"
[DEBUG] CreateSlotBySQL(): creating slot "repmgr_slot_1" on upstream
[NOTICE] setting node 1's upstream to node 2
[WARNING] unable to ping "host=192.168.1.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000"
[DETAIL] KCIping() returned "KCIPING_NO_RESPONSE"
[NOTICE] begin to start server at 2024-01-31 11:18:08.556667
[NOTICE] starting server using "/home/kingbase/cluster/R6C8/HAC8/kingbase/bin/sys_ctl -w -t 90 -D '/home/kingbase/cluster/R6C8/HAC8/kingbase/data' -l /home/kingbase/cluster/R6C8/HAC8/kingbase/bin/logfile start"
[NOTICE] start server finish at 2024-01-31 11:18:09.265062
[NOTICE] NODE REJOIN successful
[DETAIL] node 1 is now attached to node 2
2、rewind对比原主库和新主库history文件
如下所示,sys_rewind通过对比源和目标库的history文件,查找节点之间的分叉点。
原主库:
[kingbase@node201 sys_wal]$ cat 00000011.history
1 0/180000A0 no recovery target specified
2 0/190000A0 no recovery target specified
3 0/250000A0 no recovery target specified
4 0/260000A0 no recovery target specified
5 0/AB0000A0 no recovery target specified
6 0/AD0000A0 no recovery target specified
7 0/B10000A0 no recovery target specified
8 0/B30000A0 no recovery target specified
9 0/B60000A0 no recovery target specified
10 0/B70000A0 no recovery target specified
11 0/BE0000A0 no recovery target specified
12 0/BF0000A0 no recovery target specified
13 0/C00000A0 no recovery target specified
14 0/C10000A0 no recovery target specified
15 0/C90000A0 no recovery target specified
16 0/CA0000A0 no recovery target specified
新主库:
[kingbase@node202 bin]$ cat ../data/sys_wal/00000012.history
1 0/180000A0 no recovery target specified
2 0/190000A0 no recovery target specified
3 0/250000A0 no recovery target specified
4 0/260000A0 no recovery target specified
5 0/AB0000A0 no recovery target specified
6 0/AD0000A0 no recovery target specified
7 0/B10000A0 no recovery target specified
8 0/B30000A0 no recovery target specified
9 0/B60000A0 no recovery target specified
10 0/B70000A0 no recovery target specified
11 0/BE0000A0 no recovery target specified
12 0/BF0000A0 no recovery target specified
13 0/C00000A0 no recovery target specified
14 0/C10000A0 no recovery target specified
15 0/C90000A0 no recovery target specified
16 0/CA0000A0 no recovery target specified
17 0/EC000FA0 no recovery target specified --history分叉点
源库和目标库的分叉点:
如下所示,在rewind过程中,源库和目标库之间通过history文件获取分叉点,然后从分叉点之前最近的checkpoint开始对wal日志执行recovery(必须保证从checkpoint开始,源库上的wal日志必须存在,否则将会因为缺失wal日志,rewind失败。):
3、查看新主库数据库日志
2024-01-31 11:18:09.261 CST,"esrep","",13596,"192.168.1.201:32307",65b9bbf1.351c,1,"idle",2024-01-31 11:18:09 CST,4/0,0,LOG,00000,"received replication command: IDENTIFY_SYSTEM",,,,,,,,,"node1"
2024-01-31 11:18:09.261 CST,"esrep","",13596,"192.168.1.201:32307",65b9bbf1.351c,2,"idle",2024-01-31 11:18:09 CST,4/0,0,LOG,00000,"received replication command: START_REPLICATION SLOT ""repmgr_slot_1"" 0/EC000000 TIMELINE 18",,,,,,,,,"node1"
2024-01-31 11:18:11.237 CST,"esrep","esrep",3552,"192.168.1.202:49745",65b9b688.de0,3,"ALTER SYSTEM",2024-01-31 10:55:04 CST,2/4414,0,LOG,XX000,"attention:user esrep is modifying synchronous_standby_names by ALTER SYSTEM SET statement",,,,,,"ALTER SYSTEM SET synchronous_standby_names = 'ANY 1(node1)'",,,"internal_rwcmgr"
2024-01-31 11:18:11.239 CST,,,2823,,65b9b66f.b07,25,,2024-01-31 10:54:39 CST,,0,LOG,00000,"received SIGHUP, reloading configuration files",,,,,,,,,""
2024-01-31 11:18:11.241 CST,,,2823,,65b9b66f.b07,26,,2024-01-31 10:54:39 CST,,0,LOG,00000,"parameter ""synchronous_standby_names"" changed to ""ANY 1(node1)""",,,,,,,,,""
2024-01-31 11:18:11.402 CST,"esrep","",13596,"192.168.1.201:32307",65b9bbf1.351c,3,"streaming 0/EC015300",2024-01-31 11:18:09 CST,4/0,0,LOG,00000,"standby ""node1"" is now a candidate for quorum synchronous standby",,,,,,,,,"node1"
4、集群恢复完成
[kingbase@node202 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | standby | running | node2 | default | 100 | 17 | 0 bytes | host=192.168.1.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | primary | * running | | default | 100 | 18 | | host=192.168.1.202 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
六、总结
sys_rewind检查源和目标集群的时间线历史以确定它们的分叉点,从分叉点之前最近的checkpoint位置开始解析WAL,解析出变动的数据页,然后将变动的数据页拷贝过来,并从分叉点最近的checkpoint开始应用wal日志,最终保证源库和目标库数据一致。
sys_rewind 的使用不限于故障转移,例如,可以提升备用服务器主库,运行一些写入事务,然后重新回滚再次成为备用服务器。