Getting Started with Pacemaker: Single-Node High Availability Configuration and Management


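Case description:
To become familiar with the tools and processes you use to create a Pacemaker cluster, you can work through the following procedure. It is intended for users who want to see what the cluster software looks like and how it is administered, without needing to configure a working cluster.
Note
These steps do not create a supported Red Hat cluster. A supported Red Hat cluster requires at least two nodes and a configured fencing device. For details on Red Hat's support policies, requirements, and limitations for RHEL High Availability clusters, see Support Policies for RHEL High Availability Clusters.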

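2.1. Learning to use Pacemaker
By working through this procedure, you will learn how to use Pacemaker to set up a cluster, how to display the cluster status, and how to configure a cluster service. The example creates an Apache HTTP server as a cluster resource and shows how the cluster responds when the resource fails.
In this example:

The node is www.hw.net.
The floating IP address is 192.168.1.120.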

System version:

[root@node01 ~]# cat /etc/centos-release
CentOS Linux release 7.9.2009 (Core)

Host information:

[root@node01 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.1.11  node01 www.hw.net

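Prerequisites:

A single node running CentOS 7.9.
A floating IP address that resides on the same network as one of the node's statically assigned IP addresses.
The name of the node you are running on is listed in the /etc/hosts file.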
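
The last two prerequisites can be checked before any cluster software is installed. The following is a minimal sketch, not part of the original procedure: the function names ip_to_int, same_network, and name_in_hosts are hypothetical helpers, and the /24 prefix length in the usage note is an assumption taken from the interface output shown later in this article.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Succeed if two IPv4 addresses fall in the same network
# for the given prefix length (third argument).
same_network() (
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    [ $(( a & mask )) -eq $(( b & mask )) ]
)

# Succeed if a host name appears as a name or alias
# in a hosts-format file (second argument).
name_in_hosts() {
    awk -v name="$1" '
        /^[ \t]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit(found ? 0 : 1) }
    ' "$2"
}
```

With the values used in this article, the checks would be: same_network 192.168.1.11 192.168.1.120 24 && name_in_hosts www.hw.net /etc/hosts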

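Procedure
Install the Red Hat High Availability Add-On software packages from the High Availability channel, then start and enable the pcsd service. (The default package manager on CentOS 7 is yum; this transcript uses dnf, which was installed on the test host.)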

[root@node01 ~]# dnf install pcs pacemaker fence-agents-all

[root@node01 ~]# dnf list pcs pacemaker fence-agents-all
Repository epel is listed more than once in the configuration
Repository epel-debuginfo is listed more than once in the configuration
Repository epel-source is listed more than once in the configuration
Last metadata expiration check: 0:08:30 ago on Tue 07 May 2024 03:00:50 PM CST.
Installed Packages
fence-agents-all.x86_64                              4.2.1-41.el7                                        @System
pacemaker.x86_64                                     1.1.23-1.el7                                        @System
pcs.x86_64                                           0.9.169-3.el7.centos                                @System
Available Packages
fence-agents-all.x86_64                              4.2.1-41.el7_9.6                                    updates
pacemaker.x86_64                                     1.1.23-1.el7_9.1                                    updates
pcs.x86_64  

# Start the pcsd service and enable it at boot
# systemctl start pcsd.service
# systemctl enable pcsd.service

[root@node01 ~]# systemctl status pcsd.service
● pcsd.service - PCS GUI and remote configuration interface
   Loaded: loaded (/usr/lib/systemd/system/pcsd.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2024-05-07 15:09:57 CST; 17s ago
     Docs: man:pcsd(8)
           man:pcs(8)
 Main PID: 3790 (pcsd)
   CGroup: /system.slice/pcsd.service
           └─3790 /usr/bin/ruby /usr/lib/pcsd/pcsd

May 07 15:09:56 node01 systemd[1]: Starting PCS GUI and remote configuration interface...
May 07 15:09:57 node01 systemd[1]: Started PCS GUI and remote configuration interface.

If you are running the firewalld daemon, enable the ports that are required by the Red Hat High Availability Add-On.

# firewall-cmd --permanent --add-service=high-availability
# firewall-cmd --reload

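Set a password for the user hacluster on each node in the cluster, and authenticate user hacluster for each node in the cluster on the node from which you will run the pcs commands. This example uses only a single node, which is also the node you run the commands from. The step is included here because it is required when configuring a supported Red Hat High Availability multi-node cluster.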

[root@node01 ~]# id hacluster
uid=189(hacluster) gid=189(haclient) groups=189(haclient)

[root@node01 ~]# passwd hacluster
Changing password for user hacluster.
New password:
BAD PASSWORD: The password is shorter than 8 characters
Retype new password:
passwd: all authentication tokens updated successfully.

[root@node01 ~]# pcs cluster auth www.hw.net
Username: hacluster
Password:
www.hw.net: Authorized

Create a cluster named my_cluster with one member and check the status of the cluster. This command creates and starts the cluster in one step.

[root@node01 ~]#  pcs cluster setup --name my_cluster --start www.hw.net
Destroying cluster on nodes: www.hw.net...
www.hw.net: Stopping Cluster (pacemaker)...
www.hw.net: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'www.hw.net'
www.hw.net: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
www.hw.net: Succeeded

Starting cluster on nodes: www.hw.net...
www.hw.net: Starting Cluster (corosync)...
www.hw.net: Starting Cluster (pacemaker)...

Synchronizing pcsd certificates on nodes www.hw.net...
www.hw.net: Success
Restarting pcsd on the nodes in order to reload the certificates...
www.hw.net: Success


# Check the cluster and resource status
[root@node01 ~]# pcs status
Cluster name: my_cluster

WARNINGS:
No stonith devices and stonith-enabled is not false

Stack: corosync
Current DC: www.hw.net (version 1.1.23-1.el7-9acf116022) - partition with quorum
Last updated: Tue May  7 15:24:48 2024
Last change: Tue May  7 15:24:38 2024 by hacluster via crmd on www.hw.net

1 node configured
0 resource instances configured

Online: [ www.hw.net ]
No resources

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

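A Red Hat High Availability cluster requires that you configure fencing for the cluster. The reasons for this requirement are described in Fencing in a Red Hat High Availability Cluster. This introduction, however, is intended to show only how to use the basic Pacemaker commands, so fencing is disabled by setting the stonith-enabled cluster option to false.
Warning
Do not use stonith-enabled=false on a production cluster. It tells the cluster to simply assume that failed nodes have been safely fenced.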

[root@node01 ~]# pcs property set stonith-enabled=false
[root@node01 ~]# pcs status
Cluster name: my_cluster
Stack: corosync
Current DC: www.hw.net (version 1.1.23-1.el7-9acf116022) - partition with quorum
Last updated: Tue May  7 15:25:41 2024
Last change: Tue May  7 15:25:35 2024 by root via cibadmin on www.hw.net

1 node configured
0 resource instances configured

Online: [ www.hw.net ]
No resources
Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

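Install the Apache web server on your system and create a web page that displays a simple text message. If you are running the firewalld daemon, enable the ports that are required by httpd.
Note
Do not use systemctl enable to make any service that will be managed by the cluster start at system boot.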

# dnf install -y httpd wget
...
# firewall-cmd --permanent --add-service=http
# firewall-cmd --reload

# cat <<-END >/var/www/html/index.html
<html>
<body>My Test Site - $(hostname)</body>
</html>
END

In order for the Apache resource agent to get the status of Apache, add the following to the existing configuration to enable the status server URL.

# cat <<-END > /etc/httpd/conf.d/status.conf
<Location /server-status>
SetHandler server-status
Order deny,allow
Deny from all
Allow from 127.0.0.1
Allow from ::1
</Location>
END
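
The Order, Deny, and Allow directives above are Apache 2.2 syntax, which httpd 2.4 (the version shipped with CentOS 7) accepts through its mod_access_compat compatibility module. As an alternative sketch, the same restriction can be written with the 2.4-style Require local directive. The function name write_status_conf and its path argument are hypothetical and are used here only so the target file can be chosen by the caller.

```shell
#!/bin/sh
# Write a server-status configuration in Apache 2.4 syntax to the given path.
# "Require local" admits only requests that originate from the local host.
write_status_conf() {
    cat > "$1" <<'END'
<Location /server-status>
    SetHandler server-status
    Require local
</Location>
END
}
```

To use it in place of the configuration above, run write_status_conf /etc/httpd/conf.d/status.conf as root.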

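Create the IPaddr2 and apache resources for the cluster to manage. The 'IPaddr2' resource is a floating IP address that must not be one already associated with a physical node. If the NIC device of the 'IPaddr2' resource is not specified, the floating IP must reside on the same network as the node's statically assigned IP address.
You can display a list of all available resource types with the pcs resource list command. You can use the pcs resource describe resourcetype command to display the parameters you can set for the specified resource type. For example, the pcs resource describe apache command below displays the parameters you can set for a resource of type apache: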

[root@node01 ~]# pcs resource list
lsb:netconsole - Initializes network console logging
lsb:network - Bring up/down networking
ocf:heartbeat:aliyun-vpc-move-ip - Move IP within a VPC of the Aliyun ECS
ocf:heartbeat:apache - Manages an Apache Web server instance
ocf:heartbeat:aws-vpc-move-ip - Move IP within a VPC of the AWS EC2
ocf:heartbeat:aws-vpc-route53 - Update Route53 VPC record for AWS EC2
ocf:heartbeat:awseip - Amazon AWS Elastic IP Address Resource Agent
ocf:heartbeat:awsvip - Amazon AWS Secondary Private IP Address Resource Agent
ocf:heartbeat:azure-events - Microsoft Azure Scheduled Events monitoring agent
ocf:heartbeat:azure-lb - Answers Azure Load Balancer health probe requests
.......

[root@node01 ~]# pcs resource describe apache
......
Default operations:
  start: interval=0s timeout=40s
  stop: interval=0s timeout=60s
  monitor: interval=10s timeout=20s

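In this example, the IP address resource and the apache resource are both configured as part of a group named apachegroup, which ensures that the resources are kept together on the same node when you configure a working multi-node cluster.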

[root@node01 ~]# pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=192.168.1.120 --group apachegroup
[root@node01 ~]# pcs resource create WebSite ocf:heartbeat:apache configfile=/etc/httpd/conf/httpd.conf statusurl="http://localhost/server-status" --group apachegroup
[root@node01 ~]# pcs status
Cluster name: my_cluster
Stack: corosync
Current DC: www.hw.net (version 1.1.23-1.el7-9acf116022) - partition with quorum
Last updated: Tue May  7 15:36:51 2024
Last change: Tue May  7 15:36:41 2024 by root via cibadmin on www.hw.net
1 node configured
2 resource instances configured

Online: [ www.hw.net ]
Full list of resources:
Resource Group: apachegroup
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started www.hw.net
     WebSite    (ocf::heartbeat:apache):        Started www.hw.net

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

# Check the floating IP (VIP) resource
[root@node01 ~]# ip add sh
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:6c:30:8f brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.11/24 brd 192.168.1.255 scope global noprefixroute enp0s3
       valid_lft forever preferred_lft forever
    inet 192.168.1.120/24 brd 192.168.1.255 scope global secondary enp0s3
       valid_lft forever preferred_lft forever

# Check the httpd processes
[root@node01 ~]# ps -ef |grep httpd
root       339  1973  0 15:38 pts/0    00:00:00 grep --color=auto httpd
root     32004     1  0 15:36 ?        00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache   32005 32004  0 15:36 ?        00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache   32006 32004  0 15:36 ?        00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache   32007 32004  0 15:36 ?        00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache   32008 32004  0 15:36 ?        00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache   32009 32004  0 15:36 ?        00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
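
Reading the resource list by eye works for two resources, but the same check can be scripted. The sketch below assumes the pcs 0.9 status layout shown in this article (resource name, agent in parentheses followed by a colon, then the state); the function name unstarted_resources is hypothetical.

```shell
#!/bin/sh
# Read "pcs status" text on stdin and print the name of every resource
# whose state is anything other than "Started".
# Resource lines look like:
#   ClusterIP  (ocf::heartbeat:IPaddr2):  Started www.hw.net
unstarted_resources() {
    awk '$2 ~ /^\(.*\):$/ && $3 != "Started" { print $1 }'
}
```

For example, pcs status | unstarted_resources prints nothing when both ClusterIP and WebSite are running.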

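After you have configured a cluster resource, you can use the pcs resource show command to display the options that are configured for that resource.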

[root@node01 ~]#  pcs resource show WebSite
 Resource: WebSite (class=ocf provider=heartbeat type=apache)
  Attributes: configfile=/etc/httpd/conf/httpd.conf statusurl=http://localhost/server-status
  Operations: monitor interval=10s timeout=20s (WebSite-monitor-interval-10s)
              start interval=0s timeout=40s (WebSite-start-interval-0s)
              stop interval=0s timeout=60s (WebSite-stop-interval-0s)

[root@node01 ~]#  pcs resource show ClusterIP
 Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=192.168.1.120
  Operations: monitor interval=10s timeout=20s (ClusterIP-monitor-interval-10s)
              start interval=0s timeout=20s (ClusterIP-start-interval-0s)
              stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)

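Point your browser to the website you created, using the floating IP address you configured. The page should display the text message you defined.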
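
Without a browser, the same check can be made from the command line with wget, which was installed together with httpd earlier. This is a minimal sketch; the function name site_shows is hypothetical, and the 5-second timeout is an arbitrary choice.

```shell
#!/bin/sh
# Succeed if the page served at the given address contains the expected text.
# wget flags: -q quiet, -O- write the page to stdout, -T 5 time out after 5 s.
site_shows() {
    wget -q -O- -T 5 "http://$1/" | grep -qF -- "$2"
}
```

With the values from this article, the call would be: site_shows 192.168.1.120 "My Test Site"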
Stop the apache web service and check the status of the cluster. Using killall -9 simulates an application-level crash.
# killall -9 httpd

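Check the cluster status. You should see that stopping the web service caused a failed action, but that the cluster software restarted the service, so you should still be able to access the website.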

[root@node01 ~]#  pcs status
Cluster name: my_cluster
Stack: corosync
Current DC: www.hw.net (version 1.1.23-1.el7-9acf116022) - partition with quorum
Last updated: Tue May  7 15:41:09 2024
Last change: Tue May  7 15:36:41 2024 by root via cibadmin on www.hw.net

1 node configured
2 resource instances configured

Online: [ www.hw.net ]
Full list of resources:

 Resource Group: apachegroup
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started www.hw.net
     WebSite    (ocf::heartbeat:apache):        Started www.hw.net

Failed Resource Actions:
* WebSite_monitor_10000 on www.hw.net 'not running' (7): call=13, status=complete, exitreason='',
    last-rc-change='Tue May  7 15:41:02 2024', queued=0ms, exec=0ms

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

[root@node01 ~]# ps -ef |grep httpd
root      2308     1  0 15:41 ?        00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache    2309  2308  0 15:41 ?        00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache    2310  2308  0 15:41 ?        00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache    2311  2308  0 15:41 ?        00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache    2312  2308  0 15:41 ?        00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache    2313  2308  0 15:41 ?        00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
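
The failed action recorded above can also be pulled out of the status text by a script, for example to decide whether a cleanup is needed. The sketch below assumes the pcs 0.9 layout shown in this article, where each entry under "Failed Resource Actions:" starts with "* " and the section ends at a blank line; the function name failed_actions is hypothetical.

```shell
#!/bin/sh
# Read "pcs status" text on stdin and print the operation key of every
# entry listed under "Failed Resource Actions:".
failed_actions() {
    awk '
        /^Failed Resource Actions:/ { in_section = 1; next }
        in_section && /^[ \t]*$/    { in_section = 0 }
        in_section && $1 == "*"     { print $2 }
    '
}
```

Run as pcs status | failed_actions; empty output means no failures are currently recorded.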

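Once the service is up and running again, you can clear the failure status on the resource that failed. The failed action notice then no longer appears when you view the cluster status.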

[root@node01 ~]# pcs resource cleanup WebSite
Cleaned up ClusterIP on www.hw.net
Cleaned up WebSite on www.hw.net
Waiting for 1 reply from the CRMd. OK
[root@node01 ~]# pcs status
Cluster name: my_cluster
Stack: corosync
Current DC: www.hw.net (version 1.1.23-1.el7-9acf116022) - partition with quorum
Last updated: Tue May  7 16:52:14 2024
Last change: Tue May  7 16:52:09 2024 by hacluster via crmd on www.hw.net

1 node configured
2 resource instances configured

Online: [ www.hw.net ]
Full list of resources:
Resource Group: apachegroup
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started www.hw.net
     WebSite    (ocf::heartbeat:apache):        Started www.hw.net

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

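When you have finished looking at the cluster and the cluster status, stop the cluster services on the node. Even though the services were started on only one node here, the --all parameter is included because it would stop the cluster services on all nodes of an actual multi-node cluster. The transcript below also starts the cluster again; the pcs status output captured right after the start still shows Current DC: NONE and the node as OFFLINE, most likely because the node had not finished joining the cluster yet.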

[root@node01 ~]#  pcs cluster stop --all
www.hw.net: Stopping Cluster (pacemaker)...
www.hw.net: Stopping Cluster (corosync)...
[root@node01 ~]# pcs status
Error: cluster is not currently running on this node
[root@node01 ~]#  pcs cluster start --all
www.hw.net: Starting Cluster (corosync)...
www.hw.net: Starting Cluster (pacemaker)...

[root@node01 ~]# pcs status
Cluster name: my_cluster
Stack: corosync
Current DC: NONE
Last updated: Tue May  7 16:53:28 2024
Last change: Tue May  7 16:52:09 2024 by hacluster via crmd on www.hw.net

1 node configured
2 resource instances configured

OFFLINE: [ www.hw.net ]

Full list of resources:

 Resource Group: apachegroup
     ClusterIP  (ocf::heartbeat:IPaddr2):       Stopped
     WebSite    (ocf::heartbeat:apache):        Stopped

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
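
Because a status check taken right after pcs cluster start can still list the node as OFFLINE, it may help to poll until the node comes online instead of checking once. The following is a minimal sketch under that assumption; wait_until, node_online, and cluster_node_online are hypothetical helper names, and the retry count and delay in the usage note are arbitrary.

```shell
#!/bin/sh
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() (
    attempts=$1
    delay=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        if "$@"; then exit 0; fi
        attempts=$(( attempts - 1 ))
        [ "$attempts" -gt 0 ] && sleep "$delay"
    done
    exit 1
)

# Succeed if the status text on stdin lists the given node on its "Online:" line.
node_online() {
    awk -v node="$1" '
        $1 == "Online:" { for (i = 2; i <= NF; i++) if ($i == node) found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Succeed if the running cluster currently reports the node as online.
cluster_node_online() {
    pcs status 2>/dev/null | node_online "$1"
}
```

After pcs cluster start --all, a call such as wait_until 30 2 cluster_node_online www.hw.net would check every 2 seconds, up to 30 times, before giving up.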

From: https://www.cnblogs.com/tiany1224/p/18177824
