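The following procedure creates a Pacemaker cluster that runs a service and fails that service over from one node to another when it becomes unavailable on the node where it is running. Working through it shows how to create a service in a two-node cluster and what happens when the service fails on the node running it.
This example configures a two-node Pacemaker cluster running an Apache HTTP server. You can then stop the Apache service on one node and confirm that the service remains available.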


[root@node201 ~]# cat /etc/centos-release
CentOS Linux release 7.9.2009 (Core)


The nodes are node01.hw.net and node02.hw.net.
The floating IP address is .
[root@node01 ~]# cat /etc/hosts
 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
 node01 node01.hw.net
 node01 node02.hw.net


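  • Two nodes running CentOS 7.9 that can communicate with each other
  • A floating IP address on the same network as one of the nodes' statically assigned IP addresses
  • The name of each node listed in the /etc/hosts file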
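
The last prerequisite can be checked with a small script. The following is a minimal sketch that is not part of the original procedure: the function names hosts_has_name and check_cluster_hosts are hypothetical, and the script only inspects a hosts-format file; it does not test real name resolution or connectivity.

```shell
# Return 0 if the name in $2 appears as a hostname or alias
# in the hosts-format file given in $1.
hosts_has_name() {
    awk -v name="$2" '
        /^[ \t]*#/ { next }    # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# Print "missing: NAME" for each NAME absent from the file in $1.
# Returns 1 if at least one name is missing, 0 otherwise.
check_cluster_hosts() {
    hosts_file=$1
    shift
    rc=0
    for node in "$@"; do
        if ! hosts_has_name "$hosts_file" "$node"; then
            echo "missing: $node"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, check_cluster_hosts /etc/hosts node01.hw.net node02.hw.net prints nothing and returns 0 when both names are present.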

On both nodes, install the Red Hat High Availability Add-On packages from the High Availability channel, then start and enable the pcsd service.

[root@node01 ~]# dnf install pcs pacemaker fence-agents-all

[root@node01 ~]# dnf list pcs pacemaker fence-agents-all
Repository epel is listed more than once in the configuration
Repository epel-debuginfo is listed more than once in the configuration
Repository epel-source is listed more than once in the configuration
Last metadata expiration check: 0:00:15 ago on Wed 08 May 2024 02:30:49 PM CST.
Installed Packages
fence-agents-all.x86_64                              4.2.1-41.el7                                        @System
pacemaker.x86_64                                     1.1.23-1.el7                                        @System
pcs.x86_64                                           0.9.169-3.el7.centos                                @System
Available Packages
fence-agents-all.x86_64                              4.2.1-41.el7_9.6                                    updates
pacemaker.x86_64                                     1.1.23-1.el7_9.1                                    updates
pcs.x86_64                                           0.9.169-3.el7.centos.3                              updates

[root@node01 ~]# systemctl start pcsd.service
[root@node01 ~]# systemctl enable pcsd.service
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
[root@node01 ~]# systemctl status pcsd.service
● pcsd.service - PCS GUI and remote configuration interface
   Loaded: loaded (/usr/lib/systemd/system/pcsd.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2024-05-08 14:31:49 CST; 39s ago
     Docs: man:pcsd(8)
 Main PID: 27710 (pcsd)
   CGroup: /system.slice/pcsd.service
           └─27710 /usr/bin/ruby /usr/lib/pcsd/pcsd

May 08 14:31:48 node01 systemd[1]: Starting PCS GUI and remote configuration interface...
May 08 14:31:49 node01 systemd[1]: Started PCS GUI and remote configuration interface.

If the firewalld daemon is running, open the ports required by the Red Hat High Availability Add-On on both nodes.

# firewall-cmd --permanent --add-service=high-availability
# firewall-cmd --reload
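
To confirm that the service was added, the list of enabled services can be inspected. The helper below is a sketch only; the name service_listed is hypothetical, and it assumes the list is a single line of space-separated names, which is how firewall-cmd --list-services prints it.

```shell
# Return 0 if the service name in $1 is an exact entry in the
# space-separated list given in $2, and 1 otherwise.
service_listed() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

A possible use on each node is service_listed high-availability "$(firewall-cmd --list-services)" && echo "ports open".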

Set a password for the hacluster user on both nodes of the cluster.

[root@node02 ~]# id hacluster
uid=189(hacluster) gid=189(haclient) groups=189(haclient)

[root@node02 ~]# passwd hacluster
Changing password for user hacluster.
New password:
BAD PASSWORD: The password is shorter than 8 characters
Retype new password:
passwd: all authentication tokens updated successfully.

On the node from which you will run pcs commands, authenticate the hacluster user for each node in the cluster.

# Remove the old configuration
[root@node01 ~]# pcs cluster  node  remove www.hw.net --force
www.hw.net: Stopping Cluster (pacemaker)...
www.hw.net: Successfully destroyed cluster
Error: Unable to update any nodes

[root@node01 ~]#  pcs cluster auth node01.hw.net node02.hw.net
Username: hacluster
node01.hw.net: Authorized
node02.hw.net: Authorized

[root@node02 ~]# pcs cluster auth node01.hw.net node02.hw.net
node01.hw.net: Already authorized
node02.hw.net: Already authorized

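Create a cluster named my_cluster with both nodes as members. This command creates and starts the cluster. Because pcs configuration commands take effect across the whole cluster, you need to run it from only one node.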

[root@node01 ~]#  pcs cluster setup --name my_cluster --start node01.hw.net node02.hw.net
Destroying cluster on nodes: node01.hw.net, node02.hw.net...
node01.hw.net: Stopping Cluster (pacemaker)...
node02.hw.net: Stopping Cluster (pacemaker)...
node01.hw.net: Successfully destroyed cluster
node02.hw.net: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'node01.hw.net', 'node02.hw.net'
node01.hw.net: successful distribution of the file 'pacemaker_remote authkey'
node02.hw.net: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
node01.hw.net: Succeeded
node02.hw.net: Succeeded

Starting cluster on nodes: node01.hw.net, node02.hw.net...
node01.hw.net: Starting Cluster (corosync)...
node02.hw.net: Starting Cluster (corosync)...
node01.hw.net: Starting Cluster (pacemaker)...
node02.hw.net: Starting Cluster (pacemaker)...

Synchronizing pcsd certificates on nodes node01.hw.net, node02.hw.net...
node01.hw.net: Success
node02.hw.net: Success
Restarting pcsd on the nodes in order to reload the certificates...
node01.hw.net: Success
node02.hw.net: Success

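A Red Hat High Availability cluster requires that fencing be configured for the cluster. The reasons for this requirement are described in "Fencing in a Red Hat High Availability Cluster". This walkthrough, however, only demonstrates how failover works in this configuration, so fencing is disabled by setting the stonith-enabled cluster option to false.
Do not use stonith-enabled=false on a production cluster. It tells the cluster to simply assume that failed nodes have been safely fenced.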

[root@node01 ~]# pcs status
Cluster name: my_cluster
No stonith devices and stonith-enabled is not false

Stack: corosync
Current DC: node01.hw.net (version 1.1.23-1.el7-9acf116022) - partition with quorum
Last updated: Wed May  8 14:45:55 2024
Last change: Wed May  8 14:45:25 2024 by hacluster via crmd on node01.hw.net

2 nodes configured
0 resource instances configured

Online: [ node01.hw.net node02.hw.net ]
No resources
Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

[root@node02 ~]# pcs property set stonith-enabled=false

[root@node02 ~]# pcs status
Cluster name: my_cluster
Stack: corosync
Current DC: node01.hw.net (version 1.1.23-1.el7-9acf116022) - partition with quorum
Last updated: Wed May  8 14:46:29 2024
Last change: Wed May  8 14:46:17 2024 by root via cibadmin on node01.hw.net

2 nodes configured
0 resource instances configured
Online: [ node01.hw.net node02.hw.net ]
No resources
Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
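
The two status listings above differ in one line: the warning "No stonith devices and stonith-enabled is not false" disappears once the property is set. A script can test for that warning as sketched below. The function name stonith_warning_present is hypothetical, and the message text is taken from the transcript above, so it may differ in other pcs versions.

```shell
# Read `pcs status` output on stdin and return 0 if it still contains
# the warning that fencing is neither configured nor disabled.
stonith_warning_present() {
    grep -q 'stonith-enabled is not false'
}
```

For example, pcs status | stonith_warning_present && echo "fencing is not configured yet".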

When you run the pcs cluster status command, the output may differ slightly from this example while the system components are still starting up.

[root@node01 ~]# pcs cluster status
Cluster Status:
 Stack: corosync
 Current DC: node01.hw.net (version 1.1.23-1.el7-9acf116022) - partition with quorum
 Last updated: Wed May  8 14:47:00 2024
 Last change: Wed May  8 14:46:17 2024 by root via cibadmin on node01.hw.net
 2 nodes configured
 0 resource instances configured

PCSD Status:
  node01.hw.net: Online
  node02.hw.net: Online

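
On both nodes, install a web server and create a web page that displays a simple text message. If the firewalld daemon is running, open the ports required by httpd.
Do not use systemctl enable to make any service that the cluster will manage start at system boot.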

# dnf install -y httpd wget
# firewall-cmd --permanent --add-service=http
# firewall-cmd --reload

# cat <<-END >/var/www/html/index.html
<body>My Test Site - $(hostname)</body>
END

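For the Apache resource agent to obtain the status of Apache, create the following configuration on each node in the cluster, in addition to the existing configuration, to enable the status server URL.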

# cat <<-END > /etc/httpd/conf.d/status.conf
<Location /server-status>
SetHandler server-status
Order deny,allow
Deny from all
Allow from
Allow from ::1
</Location>
END

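Create the IPaddr2 and apache resources for the cluster to manage. The 'IPaddr2' resource is a floating IP address, which must not be an address already associated with a physical node. If the NIC device of the 'IPaddr2' resource is not specified, the floating IP must be on the same network as the node's statically assigned IP address.
You can display a list of all available resource types with the pcs resource list command, and the parameters you can set for a given resource type with the pcs resource describe resourcetype command. For example, the following command displays the parameters you can set for a resource of type apache: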

# pcs resource describe apache

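In this example, the IP address resource and the apache resource are both configured as part of a group named apachegroup, which ensures that the resources run on the same node.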

[root@node02 ~]# pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip= --group apachegroup
[root@node02 ~]# pcs resource create WebSite ocf:heartbeat:apache configfile=/etc/httpd/conf/httpd.conf statusurl="http://localhost/server-status" --group apachegroup
[root@node02 ~]# pcs status
Cluster name: my_cluster
Stack: corosync
Current DC: node01.hw.net (version 1.1.23-1.el7-9acf116022) - partition with quorum
Last updated: Wed May  8 14:54:16 2024
Last change: Wed May  8 14:54:06 2024 by root via cibadmin on node02.hw.net

2 nodes configured
2 resource instances configured

Online: [ node01.hw.net node02.hw.net ]

Full list of resources:

 Resource Group: apachegroup
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started node01.hw.net
     WebSite    (ocf::heartbeat:apache):        Started node01.hw.net

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
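
Since a resource group is meant to keep its members on one node, the status output can be checked for that property. The sketch below uses a hypothetical function name, started_nodes, and assumes the resource line layout shown above, where the last two fields are "Started" and the node name.

```shell
# Read `pcs status` output on stdin and print the distinct nodes on which
# at least one resource is reported as Started, one node per line.
started_nodes() {
    awk 'NF >= 2 && $(NF-1) == "Started" { print $NF }' | sort -u
}
```

With the listing above, pcs status | started_nodes prints a single line. More than one line would mean the resources are spread across nodes.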

# The VIP and the httpd service are running on node01
[root@node01 ~]# ps -ef |grep httpd
root      4868     1  0 14:54 ?        00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache    4869  4868  0 14:54 ?        00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache    4870  4868  0 14:54 ?        00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache    4871  4868  0 14:54 ?        00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache    4872  4868  0 14:54 ?        00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache    4873  4868  0 14:54 ?        00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid

[root@node01 ~]# ip add sh
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:6c:30:8f brd ff:ff:ff:ff:ff:ff
    inet brd scope global noprefixroute enp0s3
       valid_lft forever preferred_lft forever
    inet brd scope global secondary enp0s3
       valid_lft forever preferred_lft forever

[root@node02 ~]# ps -ef |grep httpd

Note that in this instance, the apachegroup service is running on node node01.hw.net.

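Point a browser at the website you created, using the floating IP address you configured. The page should display the text message you defined, including the name of the node on which the website is running.
Stop the Apache web service. Using killall -9 simulates an application-level crash.

# killall -9 httpd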

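Check the cluster status. You should see that stopping the web service caused a failed action, but that the cluster software restarted the service on the node where it had been running, so you should still be able to access the website.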


[root@node01 ~]# pcs status
Cluster name: my_cluster
Stack: corosync
Current DC: node01.hw.net (version 1.1.23-1.el7-9acf116022) - partition with quorum
Last updated: Wed May  8 14:57:38 2024
Last change: Wed May  8 14:54:06 2024 by root via cibadmin on node02.hw.net

2 nodes configured
2 resource instances configured

Online: [ node01.hw.net node02.hw.net ]

Full list of resources:

 Resource Group: apachegroup
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started node01.hw.net
     WebSite    (ocf::heartbeat:apache):        Started node01.hw.net

Failed Resource Actions:
* WebSite_monitor_10000 on node01.hw.net 'not running' (7): call=13, status=complete, exitreason='',
    last-rc-change='Wed May  8 14:57:27 2024', queued=0ms, exec=0ms

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
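
The Failed Resource Actions section can be extracted for monitoring or logging. This is a sketch only: failed_actions is a hypothetical name, and the parsing assumes the layout shown above (entries starting with "* " and a blank line ending the section), which may differ in other Pacemaker versions.

```shell
# Read `pcs status` output on stdin and print "ACTION NODE" for each entry
# in the "Failed Resource Actions:" section.
failed_actions() {
    awk '
        /^Failed Resource Actions:/ { in_section = 1; next }
        NF == 0                     { in_section = 0 }    # a blank line ends the section
        in_section && $1 == "*"     { print $2, $4 }
    '
}
```

For the status listing above, pcs status | failed_actions would print the failed monitor action together with the node it ran on.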


After verifying that the service is running again, clear the failure status.

[root@node01 ~]# pcs resource cleanup WebSite
Cleaned up ClusterIP on node02.hw.net
Cleaned up ClusterIP on node01.hw.net
Cleaned up WebSite on node02.hw.net
Cleaned up WebSite on node01.hw.net
Waiting for 1 reply from the CRMd. OK

[root@node01 ~]# pcs status
Cluster name: my_cluster
Stack: corosync
Current DC: node01.hw.net (version 1.1.23-1.el7-9acf116022) - partition with quorum
Last updated: Wed May  8 14:59:00 2024
Last change: Wed May  8 14:58:58 2024 by hacluster via crmd on node01.hw.net

2 nodes configured
2 resource instances configured

Online: [ node01.hw.net node02.hw.net ]

Full list of resources:

 Resource Group: apachegroup
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started node01.hw.net
     WebSite    (ocf::heartbeat:apache):        Started node01.hw.net

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

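Put the node on which the service is running into standby mode. The resources then fail over to the other node.

[root@node01 ~]# pcs node standby node01.hw.net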


[root@node01 ~]# pcs status
Cluster name: my_cluster
Stack: corosync
Current DC: node01.hw.net (version 1.1.23-1.el7-9acf116022) - partition with quorum
Last updated: Wed May  8 15:00:35 2024
Last change: Wed May  8 14:59:44 2024 by root via cibadmin on node01.hw.net

2 nodes configured
2 resource instances configured

Node node01.hw.net: standby
Online: [ node02.hw.net ]

Full list of resources:

 Resource Group: apachegroup
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started node02.hw.net
     WebSite    (ocf::heartbeat:apache):        Started node02.hw.net

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
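
To confirm from a script which nodes are in standby, the "Node ...: standby" lines of the status output can be parsed. The function name standby_nodes is hypothetical, and the sketch assumes the line format shown in the listing above.

```shell
# Read `pcs status` output on stdin and print the name of every node
# that is reported as being in standby mode.
standby_nodes() {
    awk '$1 == "Node" && $3 == "standby" {
        sub(/:$/, "", $2)    # drop the trailing colon from the node name
        print $2
    }'
}
```

For example, pcs status | standby_nodes lists the standby nodes and prints nothing once every node is online again.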

# The VIP and the httpd service have moved to node02
[root@node02 ~]# ip add sh
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:7c:f2:c3 brd ff:ff:ff:ff:ff:ff
    inet brd scope global noprefixroute enp0s3
       valid_lft forever preferred_lft forever
    inet brd scope global secondary enp0s3
       valid_lft forever preferred_lft forever

[root@node02 ~]# ps -ef |grep httpd
root     28894     1  0 14:59 ?        00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache   29023 28894  0 14:59 ?        00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache   29024 28894  0 14:59 ?        00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache   29025 28894  0 14:59 ?        00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache   29026 28894  0 14:59 ?        00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache   29027 28894  0 14:59 ?        00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid



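Bring node01 out of standby mode. Removing a node from standby does not by itself cause the resources to fail back to it.

[root@node01 ~]# pcs node unstandby node01.hw.net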
[root@node01 ~]# pcs status
Cluster name: my_cluster
Stack: corosync
Current DC: node01.hw.net (version 1.1.23-1.el7-9acf116022) - partition with quorum
Last updated: Wed May  8 15:03:00 2024
Last change: Wed May  8 15:02:56 2024 by root via cibadmin on node01.hw.net

2 nodes configured
2 resource instances configured

Online: [ node01.hw.net node02.hw.net ]

Full list of resources:

 Resource Group: apachegroup
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started node02.hw.net
     WebSite    (ocf::heartbeat:apache):        Started node02.hw.net

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled



Finally, stop the cluster services on both nodes.

[root@node01 ~]# pcs cluster stop --all
node02.hw.net: Stopping Cluster (pacemaker)...
node01.hw.net: Stopping Cluster (pacemaker)...
node02.hw.net: Stopping Cluster (corosync)...
node01.hw.net: Stopping Cluster (corosync)...

