
Configuring a Quorum Device for a Linux High Availability Cluster


Red Hat Enterprise Linux 7.4 fully supports configuring a separate quorum device that acts as a third-party arbitration device for the cluster. Its primary use is to allow a cluster to sustain more node failures than standard quorum rules permit. A quorum device is recommended for clusters with an even number of nodes; for two-node clusters in particular, it gives a better decision about which node survives a split-brain situation.
When configuring a quorum device, you must take the following into account.

  • It is recommended that the quorum device run on a different physical network, at the same site as the cluster that uses it. Ideally, the quorum device host should be in a separate rack from the main cluster, or at least on a separate PSU, and not on the same network segment as the corosync ring or rings.
  • You cannot use more than one quorum device in a cluster at the same time.
  • Although you cannot use more than one quorum device in a cluster, several clusters may use a single quorum device at the same time. Each cluster using that quorum device can use different algorithms and quorum options, because these are stored on the cluster nodes themselves. For example, a single quorum device can be used by one cluster with the ffsplit (fifty/fifty split) algorithm and by a second cluster with the lms (last man standing) algorithm.
  • A quorum device should not run on an existing cluster node.

System environment:

[root@node201 ~]# cat /etc/centos-release
CentOS Linux release 7.9.2009 (Core)

System architecture:

[root@node203 ~]# cat /etc/hosts
192.168.1.201 node201
192.168.1.202 node202
192.168.1.203 node203  qdevice
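Consistent name resolution on every host is a prerequisite for the steps below. A quick check (a sketch; the output shown is simply what the hosts file above implies):

[root@node201 ~]# getent hosts node201 node202 node203
192.168.1.201   node201
192.168.1.202   node202
192.168.1.203   node203 qdevice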

I. System Environment Deployment
1. Cluster node deployment

[root@node201 ~]# dnf install corosync-qdevice
[root@node201 ~]# rpm -qa |grep qdevice
corosync-qdevice-2.4.5-7.el7_9.2.x86_64

[root@node201 ~]# rpm -qa |grep pcs
pcs-0.9.169-3.el7.centos.3.x86_64
pcsc-lite-libs-1.8.8-8.el7.x86_64

[root@node201 ~]# rpm -qa |egrep 'pacemaker|corosync'
corosynclib-2.4.5-7.el7_9.2.x86_64
pacemaker-1.1.23-1.el7_9.1.x86_64
pacemaker-libs-1.1.23-1.el7_9.1.x86_64
pacemaker-doc-1.1.23-1.el7_9.1.x86_64
corosync-qdevice-2.4.5-7.el7_9.2.x86_64
pacemaker-cluster-libs-1.1.23-1.el7_9.1.x86_64
pacemaker-cli-1.1.23-1.el7_9.1.x86_64
corosync-2.4.5-7.el7_9.2.x86_64

2. Quorum node deployment

[root@node203 ~]# dnf install pcs corosync-qnet
[root@node203 corosync]# yum install -y corosync-qdevice
# start the pcsd service
[root@node203 ~]# systemctl start pcsd.service
[root@node203 ~]# systemctl status pcsd.service
[root@node203 ~]# systemctl enable pcsd.service
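pcs cluster auth, used in the next section, requires pcsd to be running on the cluster nodes as well, so presumably the same is done on node201 and node202 (a sketch, following the same pattern as above):

[root@node201 ~]# systemctl start pcsd.service
[root@node201 ~]# systemctl enable pcsd.service
[root@node202 ~]# systemctl start pcsd.service
[root@node202 ~]# systemctl enable pcsd.service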

II. Creating the Cluster
1. Set up user authentication
As shown below, make sure the hacluster user exists on the cluster nodes and the qdevice node, and set its password on each:

[root@node201 ~]# id hacluster
uid=003(hacluster) gid=1004(haclient) groups=1004(haclient)
[root@node202 ~]# id hacluster
uid=5001(hacluster) gid=5010(haclient) groups=5010(haclient)
[root@node203 ~]# id hacluster
uid=5001(hacluster) gid=1004(haclient) groups=1004(haclient)
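Note that the UID/GID values differ across the nodes; only the password matters for pcs authentication. The hacluster account is created automatically when the pcs packages are installed; set its password on every node (a sketch, prompted interactively):

[root@node201 ~]# passwd hacluster
[root@node202 ~]# passwd hacluster
[root@node203 ~]# passwd hacluster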

As shown below, authenticate from the cluster nodes to the qdevice node:

[root@node201 ~]# pcs cluster auth node203
Username: hacluster
Password:
node203: Authorized
[root@node202 ~]# pcs cluster auth node203
Username: hacluster
Password:
node203: Authorized
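pcs 0.9 also accepts the credentials on the command line, so all three nodes can be authenticated in a single non-interactive call (a sketch; substitute the real password):

[root@node201 ~]# pcs cluster auth node201 node202 node203 -u hacluster -p <password>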

2. Create the cluster
As shown below, create the cluster test_cluster:

[root@node201 pcs]# pcs cluster setup --name test_cluster node201 node202 node203 --force
Destroying cluster on nodes: node201, node202, node203...
node202: Stopping Cluster (pacemaker)...
node203: Stopping Cluster (pacemaker)...
node201: Stopping Cluster (pacemaker)...
node203: Successfully destroyed cluster
node202: Successfully destroyed cluster
node201: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'node201', 'node202', 'node203'
node201: successful distribution of the file 'pacemaker_remote authkey'
node203: successful distribution of the file 'pacemaker_remote authkey'
node202: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
node201: Succeeded
node202: Succeeded
node203: Succeeded

Synchronizing pcsd certificates on nodes node201, node202, node203...
node201: Success
node203: Success
node202: Success
Restarting pcsd on the nodes in order to reload the certificates...
node201: Success
node203: Success
node202: Success

[root@node201 pcs]#  pcs cluster start --all
node201: Starting Cluster (corosync)...
node202: Starting Cluster (corosync)...
node203: Starting Cluster (corosync)...
node203: Starting Cluster (pacemaker)...
node202: Starting Cluster (pacemaker)...
node201: Starting Cluster (pacemaker)...

[root@node201 pcs]#  pcs cluster status
Cluster Status:
 Stack: unknown
 Current DC: NONE
 Last updated: Thu Aug 29 19:24:45 2024
 Last change: Thu Aug 29 19:24:40 2024 by hacluster via crmd on node203
 3 nodes configured
 0 resource instances configured
PCSD Status:
  node203: Online
  node201: Online
  node202: Online

[root@node201 pcs]# pcs cluster enable --all
node201: Cluster Enabled
node202: Cluster Enabled
node203: Cluster Enabled

III. Configuring the Quorum Device
The quorum device model is net, which is currently the only supported model. The net model supports the following algorithms:

  • ffsplit: fifty-fifty split. This provides exactly one vote to the partition with the highest number of active nodes.
  • lms: last-man-standing. If the node is the only one in the cluster that can see the qnetd server, then it returns a vote.

1. Configure and start the net quorum device model

[root@node203 ~]# pcs qdevice setup model net --enable --start
Quorum device 'net' initialized
quorum device enabled
Starting quorum device...
quorum device started
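pcs qdevice setup/enable/start wraps the corosync-qnetd systemd unit, so it can also be inspected directly (a sketch):

[root@node203 ~]# systemctl status corosync-qnetd.service
[root@node203 ~]# ss -tlnp | grep 5403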

2. Check the quorum device status

[root@node203 ~]# pcs qdevice status net --full
QNetd address:                  *:5403
TLS:                            Supported (client certificate required)
Connected clients:              0
Connected clusters:             0
Maximum send/receive size:      32768/32768 bytes

As shown below, add qdevice access to the firewall:

[root@node203 ~]# firewall-cmd --permanent --add-service=high-availability
FirewallD is not running
[root@node203 ~]# firewall-cmd --add-service=high-availability
FirewallD is not running
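FirewallD is not running on this host, so the rules above were not applied. If a firewall is in use, the qnetd port (TCP 5403, as shown in the status output above) must be reachable from all cluster nodes. A sketch assuming firewalld is meant to be active (the high-availability firewalld service includes 5403/tcp):

[root@node203 ~]# systemctl start firewalld
[root@node203 ~]# firewall-cmd --permanent --add-service=high-availability
[root@node203 ~]# firewall-cmd --reload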

3. Add the quorum device on the cluster nodes
1) Configure corosync.conf on the cluster nodes (all cluster nodes)

[root@node202 corosync]#  cat /etc/corosync/corosync.conf |grep -v ^#|grep -v ^$|grep -v '#'
totem {
        version: 2
        crypto_cipher: none
        crypto_hash: none
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.1.0
                mcastaddr: 239.255.1.1
                mcastport: 5405
                ttl: 1
        }
}
logging {
        fileline: off
        to_stderr: no
        to_logfile: yes
        logfile: /var/log/cluster/corosync.log
        to_syslog: yes
        debug: off
        timestamp: on
        logger_subsys {
                subsys: QUORUM
                debug: off
        }
}
quorum {
        provider: corosync_votequorum
        expected_votes: 7
}
nodelist {
        node { ring0_addr: node201
               nodeid: 1
        }
        node { ring0_addr: node202
               nodeid: 2
        }
}

Notes on the quorum configuration:

quorum {
        provider: corosync_votequorum      # enable votequorum
        expected_votes: 7                  # 7 expected votes, so quorum is 4; if a nodelist is configured, expected_votes is ignored
        wait_for_all: 1                    # when set to 1, quorum is blocked at cluster startup until all nodes are online
                                           # and have joined the cluster at the same time (new in Corosync 2.0)
        last_man_standing: 1               # 1 enables the LMS feature (off, i.e. 0, by default).
                                           # With this enabled, if the cluster sits at the edge of quorum (e.g. expected_votes=7
                                           # with only 4 nodes online) for longer than last_man_standing_window, quorum is
                                           # recalculated, down to online nodes=2. To allow online nodes=1, the
                                           # auto_tie_breaker option must also be enabled, which is not recommended in production.
        last_man_standing_window: 10000    # in milliseconds; how long to wait before recalculating quorum after one or
                                           # more hosts are lost from the cluster
}
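For reference, once the quorum device has been added in the steps below, the quorum section of corosync.conf on the cluster nodes should end up looking roughly like this (a sketch based on the standard corosync-qdevice configuration format, not the literal file from this environment):

quorum {
        provider: corosync_votequorum
        device {
                votes: 1
                model: net
                net {
                        host: node203
                        algorithm: ffsplit
                }
        }
}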

2) Restart the corosync service

[root@node201 corosync]# systemctl restart corosync
[root@node201 corosync]# systemctl status corosync

3) View and add the cluster quorum configuration

[root@node201 ~]# pcs quorum config
Options:

Check the quorum status:
[root@node202 corosync]#  pcs quorum status
Quorum information
------------------
Date:             Thu Aug 29 17:56:32 2024
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          3232235978
Ring ID:          -1062731319/85
Quorate:          No

Votequorum information
----------------------
Expected votes:   7
Highest expected: 7
Total votes:      3
Quorum:           4 Activity blocked
Flags:

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
3232235977          1         NR node201
3232235978          1         NR node202 (local)
3232235979          1         NR node203

In the Qdevice column above, NR means no quorum device is registered yet. As shown below, add the quorum device on a cluster node and specify the ffsplit algorithm:

[root@node201 pcs]# pcs quorum device add model net host=node203 algorithm=ffsplit --force
Setting up qdevice certificates on nodes...
node201: Succeeded
node202: Succeeded
node203: Succeeded
Enabling corosync-qdevice...
node203: corosync-qdevice enabled
node201: corosync-qdevice enabled
node202: corosync-qdevice enabled
Sending updated corosync.conf to nodes...
node201: Succeeded
node202: Succeeded
node203: Succeeded
Corosync configuration reloaded
Starting corosync-qdevice...
node203: corosync-qdevice started
node201: corosync-qdevice started
node202: corosync-qdevice started
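On each cluster node this starts the corosync-qdevice daemon, which can be verified like any other unit (a sketch):

[root@node201 ~]# systemctl status corosync-qdevice.service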

[root@node202 corosync]# pcs quorum device add model net host=node203 algorithm=ffsplit --force
Error: quorum device is already defined

The quorum device is defined cluster-wide and the configuration is synchronized to all nodes, so the add command only needs to be run once; repeating it on node202 is rejected.

4) View the configuration after adding the quorum device

[root@node201 pcs]# pcs quorum config
Options:
Device:
  votes: 1
  Model: net
    algorithm: ffsplit
    host: node203
	
[root@node201 pcs]# pcs quorum status
Quorum information
------------------
Date:             Thu Aug 29 19:31:45 2024
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          1
Ring ID:          1/98
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           3
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
         1          1    A,V,NMW node201 (local)
         2          1    A,V,NMW node202
         3          1    A,V,NMW node203
         0          1            Qdevice

In the Qdevice column, A means the connection to the qdevice is alive, V means the qdevice is casting a vote for this node, and NMW means the master_wins option is not set.



[root@node201 pcs]# pcs quorum device status
Qdevice information
-------------------
Model:                  Net
Node ID:                1
Configured node list:
    0   Node ID = 1
    1   Node ID = 2
    2   Node ID = 3
Membership node list:   1, 2, 3

Qdevice-net information
----------------------
Cluster name:           test_cluster
QNetd host:             node203:5403
Algorithm:              Fifty-Fifty split
Tie-breaker:            Node with lowest node ID
State:                  Connected

5) Check qdevice connections on the quorum node
(Note that in this lab node203 is both a cluster member and the qnetd host, which the guidance at the beginning of this article advises against for production.)

[root@node203 corosync]# pcs qdevice status net --full
QNetd address:                  *:5403
TLS:                            Supported (client certificate required)
Connected clients:              3
Connected clusters:             1
Maximum send/receive size:      32768/32768 bytes
Cluster "test_cluster":
    Algorithm:          Fifty-Fifty split
    Tie-breaker:        Node with lowest node ID
    Node ID 3:
        Client address:         ::ffff:192.168.1.203:45974
        HB interval:            8000ms
        Configured node list:   1, 2, 3
        Ring ID:                1.62
        Membership node list:   1, 2, 3
        Heuristics:             Undefined (membership: Undefined, regular: Undefined)
        TLS active:             Yes (client certificate verified)
        Vote:                   ACK (ACK)
    Node ID 1:
        Client address:         ::ffff:192.168.1.201:40657
        HB interval:            8000ms
        Configured node list:   1, 2, 3
        Ring ID:                1.62
        Membership node list:   1, 2, 3
        Heuristics:             Undefined (membership: Undefined, regular: Undefined)
        TLS active:             Yes (client certificate verified)
        Vote:                   ACK (ACK)
    Node ID 2:
        Client address:         ::ffff:192.168.1.202:35765
        HB interval:            8000ms
        Configured node list:   1, 2, 3
        Ring ID:                1.62
        Membership node list:   1, 2, 3
        Heuristics:             Undefined (membership: Undefined, regular: Undefined)
        TLS active:             Yes (client certificate verified)
        Vote:                   No change (ACK)

IV. Managing the Quorum Device
PCS can manage the quorum device service (corosync-qnetd) on the local host, as the following examples show. Note that these commands affect only the corosync-qnetd service.

[root@qdevice:~]# pcs qdevice start net
[root@qdevice:~]# pcs qdevice stop net
[root@qdevice:~]# pcs qdevice enable net
[root@qdevice:~]# pcs qdevice disable net
[root@qdevice:~]# pcs qdevice kill net
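To manage the quorum device from the cluster side rather than the qnetd daemon itself, pcs provides the quorum device subcommands; for example, detaching the device from the cluster configuration (a sketch):

[root@node201 ~]# pcs quorum device remove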

Appendix: Configuration Error Cases

Case 1: Querying the quorum status fails
As shown below, querying the quorum status on a cluster node returns an error (CS_ERR_BAD_HANDLE typically means the local corosync instance is not running properly or the votequorum service failed to load):

[root@node201 ~]#  pcs quorum status
Error: Unable to get quorum status: Unable to start votequorum status tracking: CS_ERR_BAD_HANDLE

1) Check the corosync.conf configuration

2) corosync fails to start

[root@node202 ~]# systemctl restart corosync
Job for corosync.service failed because the control process exited with error code. See "systemctl status corosync.service" and "journalctl -xe" for details.

3) Check the corosync log

[root@node201 corosync]# tail -1000 /var/log/cluster/corosync.log

Aug 29 17:14:31 [324] node203 corosync notice  [SERV  ] Service engine loaded: corosync profile loading service [4]
Aug 29 17:14:31 [324] node203 corosync notice  [QUORUM] Using quorum provider corosync_votequorum
Aug 29 17:14:31 [324] node203 corosync crit    [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
Aug 29 17:14:31 [324] node203 corosync error   [SERV  ] Service engine 'corosync_quorum' failed to load for reason 'configuration error: nodelist or quorum.expected_votes must be configured!'
Aug 29 17:14:31 [324] node203 corosync error   [MAIN  ] Corosync Cluster Engine exiting with status 20 at service.c:356.

As the log shows, when corosync.conf uses corosync_votequorum, a nodelist or quorum.expected_votes must be configured:

4) Modify the corosync.conf configuration:
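The exact edit is not shown here; based on the error message, the minimal fix is to give the quorum section an expected_votes value (or a nodelist). A sketch consistent with the configuration used earlier in this article:

quorum {
        provider: corosync_votequorum
        expected_votes: 7
}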

5) Start the corosync service
[root@node201 corosync]# systemctl restart corosync

6) Check the quorum status again

[root@node202 ~]# pcs quorum status
Quorum information
------------------
Date:             Thu Aug 29 17:41:02 2024
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          3232235978
Ring ID:          -1062731319/67
Quorate:          No

Votequorum information
----------------------
Expected votes:   7
Highest expected: 7
Total votes:      2
Quorum:           1 Activity blocked
Flags:            2Node WaitForAll LastManStanding
Unable to get node 3232235979 info

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
3232235977          1         NR node201
3232235978          1         NR node202 (local)
3232235979          0         NR node203

Case 2: Failure when adding the qdevice

As shown below, adding the qdevice on a cluster node raises a Python error:

[root@node201 corosync]# pcs quorum device add model net host=node203 algorithm=ffsplit
Setting up qdevice certificates on nodes...
Traceback (most recent call last):
  File "/usr/sbin/pcs", line 9, in <module>
    load_entry_point('pcs==0.9.169', 'console_scripts', 'pcs')()
......
  File "/usr/lib/python2.7/site-packages/pcs/common/node_communicator.py", line 160, in url
    host="[{0}]".format(self.host) if ":" in self.host else self.host,
TypeError: argument of type 'NoneType' is not iterable

The traceback shows that pcs could not determine the target host (self.host is None), which typically indicates an incomplete or inconsistent cluster configuration. Re-create the cluster:

[root@node201 pcs]# pcs cluster setup --name test_cluster node201 node202 node203 --force
Destroying cluster on nodes: node201, node202, node203...
node202: Stopping Cluster (pacemaker)...
node203: Stopping Cluster (pacemaker)...
node201: Stopping Cluster (pacemaker)...
node203: Successfully destroyed cluster
node202: Successfully destroyed cluster
node201: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'node201', 'node202', 'node203'
node201: successful distribution of the file 'pacemaker_remote authkey'
node203: successful distribution of the file 'pacemaker_remote authkey'
node202: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
node201: Succeeded
node202: Succeeded
node203: Succeeded

Synchronizing pcsd certificates on nodes node201, node202, node203...
node201: Success
node203: Success
node202: Success
Restarting pcsd on the nodes in order to reload the certificates...
node201: Success
node203: Success
node202: Success

As shown below, after the cluster is re-created, adding the qdevice on a cluster node succeeds:

[root@node201 pcs]# pcs quorum device add model net host=node203 algorithm=ffsplit --force
Setting up qdevice certificates on nodes...
node201: Succeeded
node202: Succeeded
node203: Succeeded
Enabling corosync-qdevice...
node203: corosync-qdevice enabled
node201: corosync-qdevice enabled
node202: corosync-qdevice enabled
Sending updated corosync.conf to nodes...
node201: Succeeded
node202: Succeeded
node203: Succeeded
Corosync configuration reloaded
Starting corosync-qdevice...
node203: corosync-qdevice started
node201: corosync-qdevice started
node202: corosync-qdevice started

From: https://www.cnblogs.com/tiany1224/p/18388841
