
KingbaseES V8R6 Cluster O&M Case Study: Same-City Dual-Center Switchover


Case description:
After an online switchover in a same-city dual-center cluster, the dual-center architecture remains unchanged.

Applicable version:
KingbaseES V8R6

Cluster architecture:
One production center hosting node1 and node2, and one same-city disaster recovery (DR) center hosting node3.

I. Cluster node status before the switchover
As shown below, before the switchover the cluster primary is located in the same-city DR center (node3). We now perform an online switchover to move the primary to node1 in the production center.

[kingbase@node101 bin]$ ./repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location       | Priority | Timeline | LSN_Lag | Connection string                                                                                                               
----+-------+---------+-----------+----------+----------------+----------+----------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | standby |   running | node3    | production     | 100      | 4        | 0 bytes | host=192.168.1.101 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | standby |   running | node1    | production     | 100      | 4        | 0 bytes | host=192.168.1.102 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 3  | node3 | primary | * running |          | local_disaster | 100      | 4        |         | host=192.168.1.103 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
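
Before executing the real switchover, the planned steps can be previewed without touching the cluster. A minimal sketch, assuming the repmgr bundled with KingbaseES V8R6 supports the standard repmgr --dry-run option and node check subcommand:

[kingbase@node101 bin]$ ./repmgr standby switchover --dry-run
[kingbase@node101 bin]$ ./repmgr node check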

II. repmgr.conf configuration

[kingbase@node101 bin]$ cat ../etc/repmgr.conf |grep failover
failover='automatic'
failover_need_server_alive='none'

[kingbase@node101 bin]$ cat ../etc/repmgr.conf |grep sync
sync_in_same_location='0'
synchronous='sync'
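
A reading of these four parameters (hedged: failover is standard repmgr, while failover_need_server_alive, sync_in_same_location, and synchronous are KingbaseES extensions, so the comments below are interpretations rather than authoritative definitions):

failover='automatic'                # repmgrd initiates failover automatically when the primary fails
failover_need_server_alive='none'   # assumed: failover does not require an additional server-alive check
sync_in_same_location='0'           # assumed: synchronous standbys are not restricted to the primary's location
synchronous='sync'                  # streaming replication runs in synchronous mode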

III. Performing the online switchover

The switchover process is shown below:

[kingbase@node101 bin]$ ./repmgr standby switchover -h 192.168.1.102 -U esrep -d esrep
[WARNING] following problems with command line parameters detected:
  database connection parameters not required when executing STANDBY SWITCHOVER
[NOTICE] executing switchover on node "node1" (ID: 1)
[INFO] The output from primary check cmd "repmgr node check --terse -LERROR --archive-ready --optformat" is: "--status=OK --files=0
"
[NOTICE] attempting to pause repmgrd on 3 nodes
[INFO] pausing repmgrd on node "node1" (ID 1)
[INFO] pausing repmgrd on node "node2" (ID 2)
[INFO] pausing repmgrd on node "node3" (ID 3)
[NOTICE] local node "node1" (ID: 1) will be promoted to primary; current primary "node3" (ID: 3) will be demoted to standby
[NOTICE] stopping current primary node "node3" (ID: 3)
[NOTICE] issuing CHECKPOINT on node "node3" (ID: 3)
[DETAIL] executing server command "/home/kingbase/cluster/tptc/rh6/kingbase/bin/sys_ctl  -D '/data/kingbase/tptc/rh6/data' -l /home/kingbase/cluster/tptc/rh6/kingbase/bin/logfile -W -m fast stop"
[INFO] checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
[INFO] checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
[NOTICE] current primary has been cleanly shut down at location 0/C000028
[NOTICE] promoting standby to primary
[DETAIL] promoting server "node1" (ID: 1) using sys_promote()
[NOTICE] waiting for promotion to complete, replay lsn: 0/C0000A0
[NOTICE] STANDBY PROMOTE successful
[DETAIL] server "node1" (ID: 1) was successfully promoted to primary
[NOTICE] issuing CHECKPOINT
[NOTICE] node "node1" (ID: 1) promoted to primary, node "node3" (ID: 3) demoted to standby
[NOTICE] switchover was successful
[DETAIL] node "node1" is now primary and node "node3" is attached as standby
[INFO] unpausing repmgrd on node "node1" (ID 1)
[INFO] unpause node "node1" (ID 1) successfully
[INFO] unpausing repmgrd on node "node2" (ID 2)
[INFO] unpause node "node2" (ID 2) successfully
[INFO] unpausing repmgrd on node "node3" (ID 3)
[INFO] unpause node "node3" (ID 3) successfully
[NOTICE] STANDBY SWITCHOVER has completed successfully
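
After the switchover completes, it is worth confirming that repmgrd has actually resumed on all three nodes. A sketch, assuming the bundled repmgr supports the standard service status subcommand:

[kingbase@node101 bin]$ ./repmgr service status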

IV. Cluster node status after the switchover

1. Node status

[kingbase@node101 bin]$ ./repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location       | Priority | Timeline | LSN_Lag | Connection string                                                                                                               
----+-------+---------+-----------+----------+----------------+----------+----------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | production     | 100      | 5        |         | host=192.168.1.101 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | standby |   running | node1    | production     | 100      | 4        | 0 bytes | host=192.168.1.102 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 3  | node3 | standby |   running | node1    | local_disaster | 100      | 4        | 0 bytes | host=192.168.1.103 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000

As the output shows, the cluster primary has switched to node1 in the production center, and the upstream node of both the production-center standby (node2) and the DR-center standby (node3) is node1.
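
The same topology can also be read from the repmgr metadata stored in the esrep database. A sketch, assuming the standard repmgr schema is present:

test=# select node_id, node_name, type, upstream_node_id, active from repmgr.nodes;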

2. Streaming replication status

test=# select * from sys_stat_replication;
 pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |          reply_time
------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------+-----------+------------+---------------+------------+-------------------------------
 7114 |    16385 | esrep   | node2            | 192.168.1.102 |                 |       38948 | 2023-06-28 10:32:49.507374+08 |              | streaming | 0/C0011A8 | 0/C0011A8 | 0/C0011A8 | 0/C0011A8  |           |  |            |             1 | sync       | 2023-06-28 10:35:47.016015+08
 8255 |    16385 | esrep   | node3            | 192.168.1.103 |                 |       27536 | 2023-06-28 10:34:25.418926+08 |              | streaming | 0/C0011A8 | 0/C0011A8 | 0/C0011A8 | 0/C0011A8  |           |  |            |             2 | potential  | 2023-06-28 10:35:49.245420+08
(2 rows)

---After the switchover, node1 in the production center is the primary; node2, in the same center, is the synchronous standby (sync_state = sync), while node3 in the same-city DR center is attached to the production primary as an asynchronous standby (its sync_state is 'potential', so the primary does not wait for it).
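
The sync/potential split above is governed by the synchronous_standby_names setting on the new primary, which can be inspected directly (a hedged check; the exact value depends on how the KingbaseES cluster scripts manage this parameter):

test=# show synchronous_standby_names;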

V. Summary
A same-city dual-center cluster supports online switchover, and the overall dual-center architecture remains unchanged before and after the switch.
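
Should the primary ever need to be moved back to the DR center, the same procedure applies in reverse: run the switchover from the standby that is to become primary. A sketch, where the node3 hostname (node103) is hypothetical:

[kingbase@node103 bin]$ ./repmgr standby switchover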

    传奇开心果博文系列系列博文目录Python自动化办公库技术点案例示例系列博文目录前言一、重要作用二、Python操作PDF文件转Word文档介绍三、提高效率示例代码四、保持一致性示例代码五、精确度与质量控制示例代码六、适应复杂需求示例代码七、可扩展性与与集成性示例代码......