首页 > 其他分享 >kingbaseES V8R6集群运维案例之---配置priority防止failover切换案例

kingbaseES V8R6集群运维案例之---配置priority防止failover切换案例

时间:2024-03-29 18:44:57浏览次数:23  
标签:03 01 V8R6 10 priority 案例 12 2021 keepalives

案例说明:

在一主多备的架构中,需要配置一台备库在主备切换时,不能选举为主库。对于repmgr主备切换主库的选择算法如下:

Tips:
Repmgr选举候选备节点会以以下顺序选举:LSN ---->Priority----> Node_ID。
系统会先选举一个LSN比较大者作为候选备节点;如LSN一样,会根据Priority优先级进行比较,该优先级是在配置文件中进行参数配置;如优先级也一样,会比较节点的Node ID,小者会优先选举。在选举主机过程中,权重高的备机具有升主的更高优先级,如果权重为0,则该备机永远不会升级为主机。

repmgr集群,默认主备库优先级都是100:

对于repmgr.conf中的参数priority:
priority=60

权重,在选举主机过程中,权重高的备机具有升主的更高优先级,如果权重为0,则该备机永远不会升级为主机

修改priority参数:

适用版本:
KingbaseES V8R6

测试案例:

1、切换前集群节点状态

[kingbase@node1 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node248 | primary | * running |          | default  | 100      | 10       | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 3  | node243 | standby |   running | node248  | default  | 0        | 10       | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

** 2、修改repmgr.conf中priority参数**

 [kingbase@node1 etc]$ vi repmgr.conf 
on_bmj=off
.......
priority=0

重新注册备库:

[kingbase@node3 bin]$ ./repmgr standby register --force
INFO: connecting to local node "node243" (ID: 3)
INFO: connecting to primary database
INFO: standby registration complete
NOTICE: standby node "node243" (ID: 3) successfully registered

在集群节点状态信息中,standby的priority显示为0:

3、重启主库系统做failover切换测试
[root@node1 ~]#reboot

4、在备库查看集群节点状态

=== 如下所示,集群未发生切换,只是显示主库不能连接===

[kingbase@node3 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status        | Upstream  | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+---------------+-----------+----------+----------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node248 | primary | ? unreachable |           | default  | 100      | ?        | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 3  | node243 | standby |   running     | ? node248 | default  | 0        | 10       | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

WARNING: following issues were detected
  - unable to connect to node "node248" (ID: 1)
  - node "node248" (ID: 1) is registered as an active primary but is unreachable
  - unable to connect to node "node243" (ID: 3)'s upstream node "node248" (ID: 1)
  - unable to determine if node "node243" (ID: 3) is attached to its upstream node "node248" (ID: 1)

5、主库系统重启后,启动数据库服务

[kingbase@node1 bin]$ ./sys_ctl start -D ../data
waiting for server to start....2021-03-01 12:03:41.870 CST [5257] LOG:  sepapower extension initialized
2021-03-01 12:03:41.893 CST [5257] LOG:  starting KingbaseES V008R006C003B0010 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
2021-03-01 12:03:41.893 CST [5257] LOG:  listening on IPv4 address "0.0.0.0", port 54321
2021-03-01 12:03:41.893 CST [5257] LOG:  listening on IPv6 address "::", port 54321
2021-03-01 12:03:42.030 CST [5257] LOG:  listening on Unix socket "/tmp/.s.KINGBASE.54321"
2021-03-01 12:03:42.454 CST [5257] LOG:  redirecting log output to logging collector process
2021-03-01 12:03:42.454 CST [5257] HINT:  Future log output will appear in directory "sys_log".
. done
server started

6、集群节点状态恢复正常

[kingbase@node1 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node248 | primary | * running |          | default  | 100      | 10       | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 3  | node243 | standby |   running | node248  | default  | 0        | 10       | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

7、查看备库repmgr.log日志

=== 提示备库priority=0 ,不能提升为主库===

[2021-03-01 12:43:30] [NOTICE] starting monitoring of node "node243" (ID: 3)
[2021-03-01 12:43:30] [INFO] "connection_check_type" set to "ping"
[2021-03-01 12:43:30] [INFO] monitoring connection to upstream node "node248" (ID: 1)
[2021-03-01 12:43:30] [NOTICE] wal catched_up state changed to 1
[2021-03-01 12:45:00] [WARNING] unable to ping "host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3"
[2021-03-01 12:45:00] [DETAIL] PQping() returned "PQPING_REJECT"
[2021-03-01 12:45:00] [WARNING] unable to connect to upstream node "node248" (ID: 1)
[2021-03-01 12:45:00] [INFO] sleeping 5 seconds until next reconnection attempt
[2021-03-01 12:45:05] [INFO] checking state of node 1, 1 of 3 attempts
[2021-03-01 12:45:15] [WARNING] unable to ping "user=esrep connect_timeout=10 dbname=esrep host=192.168.7.248 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr"
[2021-03-01 12:45:15] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-03-01 12:45:15] [INFO] sleeping 5 seconds until next reconnection attempt
[2021-03-01 12:45:20] [INFO] checking state of node 1, 2 of 3 attempts
[2021-03-01 12:45:30] [WARNING] unable to ping "user=esrep connect_timeout=10 dbname=esrep host=192.168.7.248 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr"
[2021-03-01 12:45:30] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-03-01 12:45:30] [INFO] sleeping 5 seconds until next reconnection attempt
[2021-03-01 12:45:35] [INFO] checking state of node 1, 3 of 3 attempts
[2021-03-01 12:45:45] [WARNING] unable to ping "user=esrep connect_timeout=10 dbname=esrep host=192.168.7.248 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr"
[2021-03-01 12:45:45] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-03-01 12:45:45] [WARNING] unable to reconnect to node 1 after 3 attempts
[2021-03-01 12:45:45] [NOTICE] setting "wal_retrieve_retry_interval" to 86405000 milliseconds
[2021-03-01 12:45:45] [INFO] sleeping 5 seconds
[2021-03-01 12:45:50] [NOTICE] killing WAL receiver with PID 14273
[2021-03-01 12:45:51] [INFO] WAL receiver with pid 14273 killed
[2021-03-01 12:45:52] [NOTICE] WAL receiver disconnected on all sibling nodes
[2021-03-01 12:45:52] [INFO] WAL receiver disconnected on all 0 sibling nodes
[2021-03-01 12:45:52] [NOTICE] this node's priority is 0 so will not be considered as an automatic promotion candidate
[2021-03-01 12:45:52] [NOTICE] setting "wal_retrieve_retry_interval" to 5000 ms
[2021-03-01 12:45:52] [INFO] follower node awaiting notification from a candidate node

8、修改备库优先级为默认优先级

1)注释或删除priority参数配置

2)重新注册standby

[kingbase@node3 bin]$ ./repmgr standby register --force
INFO: connecting to local node "node243" (ID: 3)
INFO: connecting to primary database
INFO: standby registration complete
NOTICE: standby node "node243" (ID: 3) successfully registered

3)查看集群节点状态

[kingbase@node3 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node248 | primary | * running |          | default  | 100      | 10       | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 3  | node243 | standby |   running | node248  | default  | 100      | 10       | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

标签:03,01,V8R6,10,priority,案例,12,2021,keepalives
From: https://www.cnblogs.com/kingbase/p/17923261.html

相关文章

  • KingbaseES V8R6集群运维案例之---级联备库upstream节点故障
    KingbaseESV8R6集群运维案例之---级联备库upstream节点故障案例说明:在KingbaseESV8R6集群,构建级联备库后,在其upstream的节点故障后,级联备库如何处理?适用版本:KingbaseESV8R6集群架构:案例一:一、配置集群的recovery参数(allnodes)Tips:关闭备库的aut-recovery机制......
  • 无人车网关案例:记SV900无人清扫车网关的现场应用
    ​随着无人驾驶技术的不断发展,无人车辆已经开始广泛应用于物流配送、环境保洁、巡逻监控等众多领域,助力城市运营更加高效智能。而要实现无人车辆的安全可靠运行,关键在于选择一款性能卓越的车载网络通信系统.在这一背景下,星创易联推出了SV900无人车网关系列产品。它集5G/4G网络......
  • Vue:实现子组件和父组件数据的双向绑定案例和sync修饰符简化
    实现子组件和父组件数据的双向绑定(实现App.vue中的selectId和子组件选中的数据进行双向绑定)代码案例BaseSelect.vue<template><div><select:value="selectId"@change="selectCity"><optionvalue="101">北京</option><op......
  • C/S结构与B/S结构的介绍,优缺点,区别,案例
    C/S结构与B/S结构的介绍,优缺点,区别,案例C/S结构与B/S结构是两种常见的软件架构模式,它们在网络应用和数据管理方面各有特色。以下是关于这两种结构的详细介绍、优缺点以及区别,并附带一些案例。一、C/S结构(客户端/服务器结构)C/S结构是一种软件系统体系结构,它将业务逻辑......
  • Java案例:考试奖励(利用if..else if实现)
    目录1:题目2:分析3:代码展示1:题目小明快期末考试了,小明爸爸对他说,会根据他不同的考试成绩,送他不同的礼物,假如你可以控制小明的得分,请用程序实现小明到底该获得什么样的礼物,并在控制台输出。2:分析1.键盘录入考试成绩2.由于奖励种类比较多.属于多......
  • Golang操作kafka遇到网络问题重试的案例
    草稿0、实际中会遇到网络抖动会导致消费者有一小段时间与kafka连接遇到问题~0、如何模拟网络问题?本地跑多个kafka实例直接关掉其中一个kafka服务??怎么模拟断网??1、kafka-go与sarama都演示一下2、一个consumer消费一个topic的例子;模拟网络问题可以把kafka服务关了~观察一下再开启k......
  • 杀死端口案例
    杀死8080端口号IT蓝月于2018-07-0917:44:35发布阅读量1w收藏6点赞数1版权在进行网络编程的时候,我们经常会遇到一系列问题,其中就有端口号被占用的问题,那我们该如何处理这个问题么? 其实简单。仅需两个步骤:1.找到占用8080端口号的进程2.杀掉该进程 实现方法(windo......
  • Scala第十三章节(作为值的函数及匿名函数、柯里化、闭包及控制抽象以及计算器案例)
    章节目标掌握作为值的函数及匿名函数的用法了解柯里化的用法掌握闭包及控制抽象的用法掌握计算器案例1.高阶函数介绍Scala混合了面向对象和函数式的特性,在函数式编程语言中,函数是“头等公民”,它和Int、String、Class等其他类型处于同等的地位,可以像其他类型的变量一样......
  • 分析方法案例
    https://boardmix.cn/app/share/CAE.CPCQwQwgASoQYfH6G5uMEQpJtTjgmX-1qTAFQAE/vvi2eu,1.正太分布检验拿到数据之后,先看数据是否满足正态分布2.统计分析描述性分析:集中趋势的度量:这些度量描述了数据的中心点或典型值,包括均值(平均数)、中位数和众数。均值是所有观......
  • 金融案例:构建高效统一的需求登记与管理方案
    在金融行业数字化转型背景下,银行等金融机构面临着业务模式创新与数据应用的深度融合。业务上所需要的不再是单纯的数据,而是数据背后映射的业务趋势洞察,只有和业务相结合转化为业务度量指标,经过数据分析处理呈现为报表进行展示,才能真正体现它们的价值。但在需求转化为指标的过程中......