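Red Hat Enterprise Linux 7.4 fully supports configuring a separate quorum device that acts as a third-party arbitration device for the cluster. Its primary use is to let a cluster sustain more node failures than the standard quorum rules allow. A quorum device is recommended for clusters with an even number of nodes. In a two-node cluster, a quorum device helps decide which node survives in a split-brain situation.
When configuring a quorum device, you must take the following into account.
It is recommended that the quorum device run on a different physical network at the same site as the cluster that uses it. Ideally, the quorum device host should be in a separate rack from the main cluster, or at least on a separate PSU, and not on the same network segment as the corosync ring or rings.
You cannot use more than one quorum device in a cluster at the same time.
Although a cluster cannot use more than one quorum device at the same time, a single quorum device may be used by several clusters at the same time. Each cluster using that quorum device can use different algorithms and quorum options, because those are stored on the cluster nodes themselves. For example, a single quorum device can be used by one cluster with the ffsplit (fifty/fifty split) algorithm and by a second cluster with the lms (last man standing) algorithm.
A quorum device should not be run on an existing cluster node.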
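The claim that a quorum device lets a cluster survive more node failures can be checked with simple vote arithmetic. The sketch below is illustrative only: the function names are hypothetical, every node is assumed to carry one vote, and the quorum device's vote is assumed to go to the surviving partition.

```shell
# quorum_threshold VOTES
# Majority threshold used by votequorum: floor(VOTES / 2) + 1.
quorum_threshold() {
    echo $(( $1 / 2 + 1 ))
}

# tolerated_failures NODES QDEVICE_VOTES
# Number of node failures the cluster can survive, assuming one vote per
# node and that the surviving partition receives the quorum device vote.
tolerated_failures() {
    local nodes qvotes quorum needed
    nodes=$1
    qvotes=$2
    quorum=$(quorum_threshold $(( nodes + qvotes )))
    # Node votes the survivors must contribute themselves.
    needed=$(( quorum - qvotes ))
    # At least one node has to stay alive.
    if [ "$needed" -lt 1 ]; then
        needed=1
    fi
    echo $(( nodes - needed ))
}

# Compare clusters of 2, 3 and 4 nodes with and without a quorum device.
for n in 2 3 4; do
    echo "nodes=$n without_qdevice=$(tolerated_failures "$n" 0) with_qdevice=$(tolerated_failures "$n" 1)"
done
```

Under these assumptions the even-sized clusters gain one tolerated failure, while the three-node cluster gains nothing, which matches the recommendation to use a quorum device with an even number of nodes.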
System environment:
[root@node201 ~]# cat /etc/centos-release
CentOS Linux release 7.9.2009 (Core)
System architecture:
[root@node203 ~]# cat /etc/hosts
192.168.1.201 node201
192.168.1.202 node202
192.168.1.203 node203 qdevice
I. System Environment Deployment
1. Cluster node deployment
[root@node201 ~]# dnf install corosync-qdevice
[root@node201 ~]# rpm -qa |grep qdevice
corosync-qdevice-2.4.5-7.el7_9.2.x86_64
[root@node201 ~]# rpm -qa |grep pcs
pcs-0.9.169-3.el7.centos.3.x86_64
pcsc-lite-libs-1.8.8-8.el7.x86_64
[root@node201 ~]# rpm -qa |egrep 'pacemaker|corosync'
corosynclib-2.4.5-7.el7_9.2.x86_64
pacemaker-1.1.23-1.el7_9.1.x86_64
pacemaker-libs-1.1.23-1.el7_9.1.x86_64
pacemaker-doc-1.1.23-1.el7_9.1.x86_64
corosync-qdevice-2.4.5-7.el7_9.2.x86_64
pacemaker-cluster-libs-1.1.23-1.el7_9.1.x86_64
pacemaker-cli-1.1.23-1.el7_9.1.x86_64
corosync-2.4.5-7.el7_9.2.x86_64
2. Quorum device node deployment
[root@node203 ~]# dnf install pcs corosync-qnetd
[root@node203 corosync]# yum install -y corosync-qdevice
# Start the pcsd service
[root@node203 ~]# systemctl start pcsd.service
[root@node203 ~]# systemctl status pcsd.service
[root@node203 ~]# systemctl enable pcsd.service
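II. Creating the Cluster
1. Set up user authentication
As shown below, create the hacluster user on the cluster nodes and on the qdevice node, and set its password: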
[root@node201 ~]# id hacluster
uid=003(hacluster) gid=1004(haclient) groups=1004(haclient)
[root@node202 ~]# id hacluster
uid=5001(hacluster) gid=5010(haclient) groups=5010(haclient)
[root@node203 ~]# id hacluster
uid=5001(hacluster) gid=1004(haclient) groups=1004(haclient)
As shown below, authenticate from the cluster nodes to the qdevice node:
[root@node201 ~]# pcs cluster auth node203
Username: hacluster
Password:
node203: Authorized
[root@node202 ~]# pcs cluster auth node203
Username: hacluster
Password:
node203: Authorized
2. Create the cluster
As shown below, create the cluster test_cluster:
[root@node201 pcs]# pcs cluster setup --name test_cluster node201 node202 node203 --force
Destroying cluster on nodes: node201, node202, node203...
node202: Stopping Cluster (pacemaker)...
node203: Stopping Cluster (pacemaker)...
node201: Stopping Cluster (pacemaker)...
node203: Successfully destroyed cluster
node202: Successfully destroyed cluster
node201: Successfully destroyed cluster
Sending 'pacemaker_remote authkey' to 'node201', 'node202', 'node203'
node201: successful distribution of the file 'pacemaker_remote authkey'
node203: successful distribution of the file 'pacemaker_remote authkey'
node202: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
node201: Succeeded
node202: Succeeded
node203: Succeeded
Synchronizing pcsd certificates on nodes node201, node202, node203...
node201: Success
node203: Success
node202: Success
Restarting pcsd on the nodes in order to reload the certificates...
node201: Success
node203: Success
node202: Success
[root@node201 pcs]# pcs cluster start --all
node201: Starting Cluster (corosync)...
node202: Starting Cluster (corosync)...
node203: Starting Cluster (corosync)...
node203: Starting Cluster (pacemaker)...
node202: Starting Cluster (pacemaker)...
node201: Starting Cluster (pacemaker)...
[root@node201 pcs]# pcs cluster status
Cluster Status:
Stack: unknown
Current DC: NONE
Last updated: Thu Aug 29 19:24:45 2024
Last change: Thu Aug 29 19:24:40 2024 by hacluster via crmd on node203
3 nodes configured
0 resource instances configured
PCSD Status:
node203: Online
node201: Online
node202: Online
[root@node201 pcs]# pcs cluster enable --all
node201: Cluster Enabled
node202: Cluster Enabled
node203: Cluster Enabled
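After the cluster is started, the "PCSD Status" section of the `pcs cluster status` output above is a quick indicator of whether every node is reachable. The following is a minimal sketch (the function name `offline_pcsd_nodes` is hypothetical) that reads that output on standard input and prints any node that is not reported as Online. It relies only on the text layout shown above.

```shell
# offline_pcsd_nodes
# Read `pcs cluster status` output on stdin and print the name of every
# node listed under "PCSD Status:" whose state is not "Online".
offline_pcsd_nodes() {
    awk '
        /^PCSD Status:/ { in_pcsd = 1; next }   # per-node list starts here
        in_pcsd && NF == 2 {
            sub(/:$/, "", $1)                    # drop the colon after the node name
            if ($2 != "Online") print $1
        }
    '
}
```

A possible use is `pcs cluster status | offline_pcsd_nodes`, which prints nothing when all nodes are online.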
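III. Configuring the Quorum Device
The quorum device model is net, which is currently the only supported model. The net model supports the following algorithms:
- ffsplit: fifty-fifty split. This provides exactly one vote to the partition with the highest number of active nodes.
- lms: last-man-standing. If a node is the only one left in the cluster that can see the qnetd server, it returns a vote.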
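The ffsplit rule can be illustrated with a small decision function. This is a simplified model, not the qnetd implementation: it only compares partition sizes and, on a tie, assumes the "lowest node ID" tie-breaker that `pcs quorum device status` reports as the default later in this article. The function names are hypothetical, and both partitions are assumed to be non-empty and able to reach the qnetd server.

```shell
# count_ids "ID ID ..."  -> number of node IDs in the list
count_ids() {
    # Intentionally unquoted: split the list into positional parameters.
    set -- $1
    echo $#
}

# min_id "ID ID ..."  -> lowest node ID in the list
min_id() {
    # Intentionally unquoted: print one ID per line, then sort numerically.
    printf '%s\n' $1 | sort -n | head -n 1
}

# ffsplit_winner "IDS_A" "IDS_B"
# Print A or B: the partition that would receive the single qdevice vote.
ffsplit_winner() {
    local a_ids b_ids a_count b_count
    a_ids=$1
    b_ids=$2
    a_count=$(count_ids "$a_ids")
    b_count=$(count_ids "$b_ids")
    if [ "$a_count" -gt "$b_count" ]; then
        echo A                      # larger partition wins
    elif [ "$b_count" -gt "$a_count" ]; then
        echo B
    elif [ "$(min_id "$a_ids")" -lt "$(min_id "$b_ids")" ]; then
        echo A                      # tie: partition holding the lowest node ID
    else
        echo B
    fi
}
```

For example, `ffsplit_winner "1 2" "3"` selects partition A by size, while `ffsplit_winner "2 3" "1 4"` selects partition B because it contains node ID 1.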
1. Configure and start the quorum device model net
[root@node203 ~]# pcs qdevice setup model net --enable --start
Quorum device 'net' initialized
quorum device enabled
Starting quorum device...
quorum device started
2. Check the quorum device status
[root@node203 ~]# pcs qdevice status net --full
QNetd address: *:5403
TLS: Supported (client certificate required)
Connected clients: 0
Connected clusters: 0
Maximum send/receive size: 32768/32768 bytes
As shown below, allow qdevice access through the firewall:
[root@node203 ~]# firewall-cmd --permanent --add-service=high-availability
FirewallD is not running
[root@node203 ~]# firewall-cmd --add-service=high-availability
FirewallD is not running
3. Add the quorum device to the cluster nodes
1) Configure corosync.conf on the cluster nodes (all cluster nodes)
[root@node202 corosync]# cat /etc/corosync/corosync.conf |grep -v ^#|grep -v ^$|grep -v '#'
totem {
    version: 2
    crypto_cipher: none
    crypto_hash: none
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.1.0
        mcastaddr: 239.255.1.1
        mcastport: 5405
        ttl: 1
    }
}
logging {
    fileline: off
    to_stderr: no
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
    debug: off
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}
quorum {
    provider: corosync_votequorum
    expected_votes: 7
}
nodelist {
    node {
        ring0_addr: node201
        nodeid: 1
    }
    node {
        ring0_addr: node202
        nodeid: 2
    }
}
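Notes on the quorum settings:
quorum {
    provider: corosync_votequorum    # Enables votequorum.
    expected_votes: 7                # 7 means seven nodes, so quorum is 4. If a nodelist is configured, expected_votes is ignored.
    wait_for_all: 1                  # When set to 1, quorum is withheld at cluster startup until all nodes are online and have joined the cluster. This option was added in Corosync 2.0.
    last_man_standing: 1             # 1 enables the LMS feature. It is disabled by default (value 0).
                                     # When enabled, if the cluster sits at the quorum boundary (for example expected_votes=7 with 4 nodes online) for longer than last_man_standing_window,
                                     # quorum is recalculated, and this can repeat until 2 nodes remain online. To let the cluster go down to 1 online node, auto_tie_breaker must also be enabled, which is not recommended in production.
    last_man_standing_window: 10000  # In milliseconds. Time to wait before recalculating quorum after one or more nodes are lost from the cluster.
}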
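The last_man_standing recalculation described in the comments can be traced with a few lines of arithmetic. The sketch below is a simplified model under stated assumptions: one vote per node, each loss of nodes is followed by a full last_man_standing_window before the next one, and auto_tie_breaker is not enabled. The function name `lms_recalculate` is hypothetical.

```shell
# lms_recalculate EXPECTED ONLINE
# Model one last_man_standing_window expiry.
# If the ONLINE nodes still hold quorum for EXPECTED votes, print the new
# expected_votes (the number of online nodes) and return 0.
# Otherwise print "blocked" and return 1.
lms_recalculate() {
    local expected online quorum
    expected=$1
    online=$2
    quorum=$(( expected / 2 + 1 ))
    if [ "$online" -lt "$quorum" ]; then
        echo blocked
        return 1
    fi
    echo "$online"
}

# Walk the example from the comments: expected_votes=7, nodes drop to 4, 3, 2, 1.
expected=7
for online in 4 3 2 1; do
    if next=$(lms_recalculate "$expected" "$online"); then
        expected=$next
        echo "online=$online -> expected_votes=$expected quorum=$(( expected / 2 + 1 ))"
    else
        echo "online=$online -> quorum lost, activity blocked"
        break
    fi
done
```

In this model expected_votes shrinks from 7 to 4, 3 and then 2, and the step from 2 nodes to 1 is blocked, which is the case that requires auto_tie_breaker.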
2) Restart the corosync service
[root@node201 corosync]# systemctl restart corosync
[root@node201 corosync]# systemctl status corosync
3) View and add the quorum configuration on the cluster nodes
[root@node201 ~]# pcs quorum config
Options:
Check the quorum status:
[root@node202 corosync]# pcs quorum status
Quorum information
------------------
Date: Thu Aug 29 17:56:32 2024
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 3232235978
Ring ID: -1062731319/85
Quorate: No
Votequorum information
----------------------
Expected votes: 7
Highest expected: 7
Total votes: 3
Quorum: 4 Activity blocked
Flags:
Membership information
----------------------
Nodeid Votes Qdevice Name
3232235977 1 NR node201
3232235978 1 NR node202 (local)
3232235979 1 NR node203
As shown below, add the quorum device from a cluster node and specify the ffsplit algorithm:
[root@node201 pcs]# pcs quorum device add model net host=node203 algorithm=ffsplit --force
Setting up qdevice certificates on nodes...
node201: Succeeded
node202: Succeeded
node203: Succeeded
Enabling corosync-qdevice...
node203: corosync-qdevice enabled
node201: corosync-qdevice enabled
node202: corosync-qdevice enabled
Sending updated corosync.conf to nodes...
node201: Succeeded
node202: Succeeded
node203: Succeeded
Corosync configuration reloaded
Starting corosync-qdevice...
node203: corosync-qdevice started
node201: corosync-qdevice started
node202: corosync-qdevice started
[root@node202 corosync]# pcs quorum device add model net host=node203 algorithm=ffsplit --force
Error: quorum device is already defined
4) View the configuration after adding the quorum device
[root@node201 pcs]# pcs quorum config
Options:
Device:
votes: 1
Model: net
algorithm: ffsplit
host: node203
[root@node201 pcs]# pcs quorum status
Quorum information
------------------
Date: Thu Aug 29 19:31:45 2024
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 1
Ring ID: 1/98
Quorate: Yes
Votequorum information
----------------------
Expected votes: 4
Highest expected: 4
Total votes: 4
Quorum: 3
Flags: Quorate Qdevice
Membership information
----------------------
Nodeid Votes Qdevice Name
1 1 A,V,NMW node201 (local)
2 1 A,V,NMW node202
3 1 A,V,NMW node203
0 1 Qdevice
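When comparing the quorum status before and after adding the quorum device, it can help to reduce the `pcs quorum status` output to its key fields. The following is a minimal sketch (the function name `quorum_summary` is hypothetical) that reads that output on standard input. It depends only on the field labels visible in the output above.

```shell
# quorum_summary
# Read `pcs quorum status` output on stdin and print one summary line with
# the Quorate flag, expected votes, total votes and the quorum threshold.
quorum_summary() {
    awk -F': *' '
        /^ *Quorate:/        { quorate = $2 }
        /^ *Expected votes:/ { expected = $2 }
        /^ *Total votes:/    { total = $2 }
        # The Quorum line may carry a suffix such as "Activity blocked".
        /^ *Quorum:/         { split($2, q, " "); quorum = q[1] }
        END {
            printf "quorate=%s expected=%s total=%s quorum=%s\n",
                   quorate, expected, total, quorum
        }
    '
}
```

A possible use is `pcs quorum status | quorum_summary`, run on any cluster node.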
[root@node201 pcs]# pcs quorum device status
Qdevice information
-------------------
Model: Net
Node ID: 1
Configured node list:
0 Node ID = 1
1 Node ID = 2
2 Node ID = 3
Membership node list: 1, 2, 3
Qdevice-net information
----------------------
Cluster name: test_cluster
QNetd host: node203:5403
Algorithm: Fifty-Fifty split
Tie-breaker: Node with lowest node ID
State: Connected
5) Check the qdevice connections on the quorum device node
[root@node203 corosync]# pcs qdevice status net --full
QNetd address: *:5403
TLS: Supported (client certificate required)
Connected clients: 3
Connected clusters: 1
Maximum send/receive size: 32768/32768 bytes
Cluster "test_cluster":
Algorithm: Fifty-Fifty split
Tie-breaker: Node with lowest node ID
Node ID 3:
Client address: ::ffff:192.168.1.203:45974
HB interval: 8000ms
Configured node list: 1, 2, 3
Ring ID: 1.62
Membership node list: 1, 2, 3
Heuristics: Undefined (membership: Undefined, regular: Undefined)
TLS active: Yes (client certificate verified)
Vote: ACK (ACK)
Node ID 1:
Client address: ::ffff:192.168.1.201:40657
HB interval: 8000ms
Configured node list: 1, 2, 3
Ring ID: 1.62
Membership node list: 1, 2, 3
Heuristics: Undefined (membership: Undefined, regular: Undefined)
TLS active: Yes (client certificate verified)
Vote: ACK (ACK)
Node ID 2:
Client address: ::ffff:192.168.1.202:35765
HB interval: 8000ms
Configured node list: 1, 2, 3
Ring ID: 1.62
Membership node list: 1, 2, 3
Heuristics: Undefined (membership: Undefined, regular: Undefined)
TLS active: Yes (client certificate verified)
Vote: No change (ACK)
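One way to sanity-check the qnetd output above is to compare the reported "Connected clients" count with the number of "Node ID N:" entries actually listed. The sketch below is illustrative only: the function name `qnetd_client_check` is hypothetical, and it assumes the text layout shown above with a single connected cluster.

```shell
# qnetd_client_check
# Read `pcs qdevice status net --full` output on stdin.
# Print the reported and listed client counts. Return 0 only when they
# match and at least one client is connected.
qnetd_client_check() {
    awk '
        /Connected clients:/     { reported = $NF }
        /^ *Node ID [0-9]+: *$/  { listed++ }
        END {
            printf "reported=%d listed=%d\n", reported, listed
            exit ((reported == listed && reported > 0) ? 0 : 1)
        }
    '
}
```

On the quorum device host this could be run as `pcs qdevice status net --full | qnetd_client_check`.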
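IV. Managing the Quorum Device
PCS can manage the quorum device service (corosync-qnetd) on the local host, as shown in the following examples. Note that these commands affect only the corosync-qnetd service.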
[root@qdevice:~]# pcs qdevice start net
[root@qdevice:~]# pcs qdevice stop net
[root@qdevice:~]# pcs qdevice enable net
[root@qdevice:~]# pcs qdevice disable net
[root@qdevice:~]# pcs qdevice kill net
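Appendix: Configuration Error Cases
Case 1. Error when checking the qdevice status
As shown below, a cluster node reports an error when checking the quorum status: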
[root@node201 ~]# pcs quorum status
Error: Unable to get quorum status: Unable to start votequorum status tracking: CS_ERR_BAD_HANDLE
1) Check the corosync.conf configuration
2) corosync fails to start
[root@node202 ~]# systemctl restart corosync
Job for corosync.service failed because the control process exited with error code. See "systemctl status corosync.service" and "journalctl -xe" for details.
3) Check the corosync log
[root@node201 corosync]# tail -1000 /var/log/cluster/corosync.log
Aug 29 17:14:31 [324] node203 corosync notice [SERV ] Service engine loaded: corosync profile loading service [4]
Aug 29 17:14:31 [324] node203 corosync notice [QUORUM] Using quorum provider corosync_votequorum
Aug 29 17:14:31 [324] node203 corosync crit [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
Aug 29 17:14:31 [324] node203 corosync error [SERV ] Service engine 'corosync_quorum' failed to load for reason 'configuration error: nodelist or quorum.expected_votes must be configured!'
Aug 29 17:14:31 [324] node203 corosync error [MAIN ] Corosync Cluster Engine exiting with status 20 at service.c:356.
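As the log shows, corosync.conf sets corosync_votequorum as the quorum provider, so expected_votes (or a nodelist) must also be configured:
4) Modify the corosync.conf configuration: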
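Before restarting corosync, the condition reported in the log can be checked with a rough text scan of the configuration file. The sketch below is a heuristic, not a real corosync.conf parser: it only looks for the relevant keys outside comment lines, and the function name `check_votequorum_conf` is hypothetical.

```shell
# check_votequorum_conf FILE
# Return 0 if FILE either does not use corosync_votequorum, or uses it and
# also contains an expected_votes setting or a nodelist section.
# Return 1 (with a message on stderr) otherwise.
check_votequorum_conf() {
    local conf body
    conf=$1
    # Ignore full-line comments.
    body=$(grep -v '^[[:space:]]*#' "$conf" || true)

    # Nothing to check unless votequorum is the quorum provider.
    if ! printf '%s\n' "$body" | grep -Eq '^[[:space:]]*provider:[[:space:]]*corosync_votequorum'; then
        return 0
    fi
    if printf '%s\n' "$body" | grep -Eq '^[[:space:]]*expected_votes:[[:space:]]*[0-9]+'; then
        return 0
    fi
    if printf '%s\n' "$body" | grep -Eq '^[[:space:]]*nodelist[[:space:]]*\{'; then
        return 0
    fi
    echo "$conf: corosync_votequorum needs a nodelist or quorum.expected_votes" >&2
    return 1
}
```

For example, `check_votequorum_conf /etc/corosync/corosync.conf && systemctl restart corosync` restarts the service only when the scan passes.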
5) Start the corosync service
[root@node201 corosync]# systemctl restart corosync
6) Check the qdevice status
[root@node202 ~]# pcs quorum status
Quorum information
------------------
Date: Thu Aug 29 17:41:02 2024
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 3232235978
Ring ID: -1062731319/67
Quorate: No
Votequorum information
----------------------
Expected votes: 7
Highest expected: 7
Total votes: 2
Quorum: 1 Activity blocked
Flags: 2Node WaitForAll LastManStanding
Unable to get node 3232235979 info
Membership information
----------------------
Nodeid Votes Qdevice Name
3232235977 1 NR node201
3232235978 1 NR node202 (local)
3232235979 0 NR node203
Case 2. Failure when adding the qdevice
As shown below, a Python error occurs when adding the qdevice from a cluster node:
[root@node201 corosync]# pcs quorum device add model net host=node203 algorithm=ffsplit
Setting up qdevice certificates on nodes...
Traceback (most recent call last):
File "/usr/sbin/pcs", line 9, in <module>
load_entry_point('pcs==0.9.169', 'console_scripts', 'pcs')()
......
File "/usr/lib/python2.7/site-packages/pcs/common/node_communicator.py", line 160, in url
host="[{0}]".format(self.host) if ":" in self.host else self.host,
TypeError: argument of type 'NoneType' is not iterable
Create the cluster:
[root@node201 pcs]# pcs cluster setup --name test_cluster node201 node202 node203 --force
Destroying cluster on nodes: node201, node202, node203...
node202: Stopping Cluster (pacemaker)...
node203: Stopping Cluster (pacemaker)...
node201: Stopping Cluster (pacemaker)...
node203: Successfully destroyed cluster
node202: Successfully destroyed cluster
node201: Successfully destroyed cluster
Sending 'pacemaker_remote authkey' to 'node201', 'node202', 'node203'
node201: successful distribution of the file 'pacemaker_remote authkey'
node203: successful distribution of the file 'pacemaker_remote authkey'
node202: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
node201: Succeeded
node202: Succeeded
node203: Succeeded
Synchronizing pcsd certificates on nodes node201, node202, node203...
node201: Success
node203: Success
node202: Success
Restarting pcsd on the nodes in order to reload the certificates...
node201: Success
node203: Success
node202: Success
As shown below, after the cluster is created, adding the qdevice from the node succeeds:
[root@node201 pcs]# pcs quorum device add model net host=node203 algorithm=ffsplit --force
Setting up qdevice certificates on nodes...
node201: Succeeded
node202: Succeeded
node203: Succeeded
Enabling corosync-qdevice...
node203: corosync-qdevice enabled
node201: corosync-qdevice enabled
node202: corosync-qdevice enabled
Sending updated corosync.conf to nodes...
node201: Succeeded
node202: Succeeded
node203: Succeeded
Corosync configuration reloaded
Starting corosync-qdevice...
node203: corosync-qdevice started
node201: corosync-qdevice started
node202: corosync-qdevice started
From: https://www.cnblogs.com/tiany1224/p/18388841