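KingbaseES V8R6 Cluster Operations Case: Switchover in a Same-City Dual-Center Cluster
Case description:
After an online switchover is performed in a same-city dual-center cluster, the dual-center architecture remains unchanged.
Applicable version:
KingbaseES V8R6
Cluster architecture:
A three-node cluster spread over two data centers in the same city: node1 and node2 are in the production center (location "production"), and node3 is in the same-city disaster-recovery center (location "local_disaster").
I. Cluster node status before the switchover
As shown below, before the switchover the cluster primary is located in the same-city disaster-recovery center. An online switchover is now performed to move the primary to node1 in the production center.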
[kingbase@node101 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------------+----------+----------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | standby | running | node3 | production | 100 | 4 | 0 bytes | host=192.168.1.101 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | standby | running | node1 | production | 100 | 4 | 0 bytes | host=192.168.1.102 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
3 | node3 | primary | * running | | local_disaster | 100 | 4 | | host=192.168.1.103 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
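Before switching, it can help to confirm mechanically which node is the primary and which center it sits in. The following is a minimal sketch, assuming the column order shown in the output above (Name in column 2, Role in column 3, Location in column 6); the function name find_primary is hypothetical and not part of repmgr.

```shell
#!/usr/bin/env bash
# Print "<name> <location>" for every node whose Role column is "primary".
# Input: the text of `repmgr cluster show` on stdin, columns separated by "|".
# Assumption: the column order matches the output shown in this article.
find_primary() {
  awk -F'|' '
    {
      # Trim leading and trailing blanks from every field.
      for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
    }
    $3 == "primary" { print $2, $6 }
  '
}

# Usage (run from the bin directory of a cluster node):
#   ./repmgr cluster show | find_primary
```

For the pre-switchover status above, this would print the node name and location of node3, confirming that the primary is in the disaster-recovery center.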
II. repmgr.conf configuration
[kingbase@node101 bin]$ cat ../etc/repmgr.conf |grep failover
failover='automatic'
failover_need_server_alive='none'
[kingbase@node101 bin]$ cat ../etc/repmgr.conf |grep sync
sync_in_same_location='0'
synchronous='sync'
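Note that `grep failover` matches every line containing that string, so it returns both failover and failover_need_server_alive. When a script needs exactly one setting, an anchored lookup is more precise. The sketch below assumes the key='value' layout shown above; the helper name conf_value is hypothetical.

```shell
#!/usr/bin/env bash
# Print the value of exactly one key from a repmgr.conf-style file.
# Assumption: settings are written as key='value', one per line.
# If the key appears more than once, the last occurrence is printed.
conf_value() {
  local key="$1" file="$2"
  # Strip "key =" from matching lines, keep the last one, drop quotes.
  sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" \
    | tail -n 1 | tr -d "'\""
}

# Usage:
#   conf_value failover ../etc/repmgr.conf
#   conf_value sync_in_same_location ../etc/repmgr.conf
```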
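III. Performing the online switchover
The switchover is run on node1, the standby that is to be promoted. The output of the switchover is shown below: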
[kingbase@node101 bin]$ ./repmgr standby switchover -h 192.168.1.102 -U esrep -d esrep
[WARNING] following problems with command line parameters detected:
database connection parameters not required when executing STANDBY SWITCHOVER
[NOTICE] executing switchover on node "node1" (ID: 1)
[INFO] The output from primary check cmd "repmgr node check --terse -LERROR --archive-ready --optformat" is: "--status=OK --files=0
"
[NOTICE] attempting to pause repmgrd on 3 nodes
[INFO] pausing repmgrd on node "node1" (ID 1)
[INFO] pausing repmgrd on node "node2" (ID 2)
[INFO] pausing repmgrd on node "node3" (ID 3)
[NOTICE] local node "node1" (ID: 1) will be promoted to primary; current primary "node3" (ID: 3) will be demoted to standby
[NOTICE] stopping current primary node "node3" (ID: 3)
[NOTICE] issuing CHECKPOINT on node "node3" (ID: 3)
[DETAIL] executing server command "/home/kingbase/cluster/tptc/rh6/kingbase/bin/sys_ctl -D '/data/kingbase/tptc/rh6/data' -l /home/kingbase/cluster/tptc/rh6/kingbase/bin/logfile -W -m fast stop"
[INFO] checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
[INFO] checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
[NOTICE] current primary has been cleanly shut down at location 0/C000028
[NOTICE] promoting standby to primary
[DETAIL] promoting server "node1" (ID: 1) using sys_promote()
[NOTICE] waiting for promotion to complete, replay lsn: 0/C0000A0
[NOTICE] STANDBY PROMOTE successful
[DETAIL] server "node1" (ID: 1) was successfully promoted to primary
[NOTICE] issuing CHECKPOINT
[NOTICE] node "node1" (ID: 1) promoted to primary, node "node3" (ID: 3) demoted to standby
[NOTICE] switchover was successful
[DETAIL] node "node1" is now primary and node "node3" is attached as standby
[INFO] unpausing repmgrd on node "node1" (ID 1)
[INFO] unpause node "node1" (ID 1) successfully
[INFO] unpausing repmgrd on node "node2" (ID 2)
[INFO] unpause node "node2" (ID 2) successfully
[INFO] unpausing repmgrd on node "node3" (ID 3)
[INFO] unpause node "node3" (ID 3) successfully
[NOTICE] STANDBY SWITCHOVER has completed successfully
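When the switchover is driven from a script, the captured log can be checked for the two notices that matter in the output above: the promoted/demoted pair and the final success line. This is a minimal sketch that relies only on the message texts shown in this article; the function names are hypothetical. Upstream repmgr also offers a --dry-run option for standby switchover; whether the KingbaseES build accepts it is an assumption to verify before relying on it.

```shell
#!/usr/bin/env bash
# Return 0 if the switchover log on stdin contains the final success notice.
switchover_succeeded() {
  grep -q 'STANDBY SWITCHOVER has completed successfully'
}

# Print "promoted=<node> demoted=<node>" from the role-change notice on stdin.
# Assumption: the notice wording matches the log shown in this article.
switchover_summary() {
  sed -n 's/.*node "\([^"]*\)" (ID: [0-9]*) promoted to primary, node "\([^"]*\)" (ID: [0-9]*) demoted to standby.*/promoted=\1 demoted=\2/p'
}

# Usage (the log file name is illustrative; per the WARNING in the output,
# database connection parameters are not required):
#   ./repmgr standby switchover 2>&1 | tee switchover.log
#   switchover_succeeded < switchover.log && switchover_summary < switchover.log
```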
IV. Cluster node status after the switchover
1. Node status
[kingbase@node101 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------------+----------+----------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | production | 100 | 5 | | host=192.168.1.101 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | standby | running | node1 | production | 100 | 4 | 0 bytes | host=192.168.1.102 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
3 | node3 | standby | running | node1 | local_disaster | 100 | 4 | 0 bytes | host=192.168.1.103 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
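As shown above, the cluster primary is now node1 in the production center, and both the standby in the disaster-recovery center (node3) and the standby in the production center (node2) have node1 as their upstream node.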
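The same check can be automated: there should be exactly one primary, it should be the expected node, and every standby should list it as upstream. The sketch below assumes the column layout of the output above (Name, Role and Upstream in columns 2, 3 and 5); check_topology is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Exit 0 only if the `repmgr cluster show` text on stdin has exactly one
# primary, that primary is named $1, and every standby has $1 as upstream.
# Assumption: the column order matches the output shown in this article.
check_topology() {
  awk -F'|' -v want="$1" '
    {
      # Trim leading and trailing blanks from every field.
      for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
    }
    $3 == "primary" { primaries++; if ($2 != want) bad++ }
    $3 == "standby" && $5 != want { bad++ }
    END { exit ((primaries == 1 && bad == 0) ? 0 : 1) }
  '
}

# Usage:
#   ./repmgr cluster show | check_topology node1 && echo "topology OK"
```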
2. Streaming replication status
test=# select * from sys_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state | reply_time
------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------+-----------+------------+---------------+------------+-------------------------------
7114 | 16385 | esrep | node2 | 192.168.1.102 | | 38948 | 2023-06-28 10:32:49.507374+08 | | streaming | 0/C0011A8 | 0/C0011A8 | 0/C0011A8 | 0/C0011A8 | | | | 1 | sync | 2023-06-28 10:35:47.016015+08
8255 | 16385 | esrep | node3 | 192.168.1.103 | | 27536 | 2023-06-28 10:34:25.418926+08 | | streaming | 0/C0011A8 | 0/C0011A8 | 0/C0011A8 | 0/C0011A8 | | | | 2 | potential | 2023-06-28 10:35:49.245420+08
(2 rows)
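---After the switchover, node1 in the production center is the primary and node2 in the same center is its synchronous standby (sync_state "sync"). node3 in the same-city disaster-recovery center streams from the production primary with sync_state "potential", so it currently acts as an asynchronous standby.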
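To check this without reading the full view, the two relevant columns can be queried and filtered by state. This is a rough sketch: it assumes that ksql accepts the psql-style options -A, -t, -F and -c, and the connection values are taken from the connection strings above for illustration only. The helper name standbys_in_state is hypothetical.

```shell
#!/usr/bin/env bash
# Print the application_name of every standby whose sync_state equals $1.
# Input: unaligned "application_name|sync_state" rows on stdin.
standbys_in_state() {
  awk -F'|' -v state="$1" '$2 == state { print $1 }'
}

# Usage (assumes ksql supports psql-style -A -t -F -c options):
#   ksql -h 192.168.1.101 -p 54321 -U esrep -d esrep -A -t -F'|' \
#     -c "select application_name, sync_state from sys_stat_replication" \
#     | standbys_in_state sync
```

For the rows shown above, filtering on "sync" selects node2 and filtering on "potential" selects node3.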
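V. Summary
A same-city dual-center cluster supports online switchover. Only the primary role moves between centers; the overall dual-center architecture is the same before and after the switchover.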