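Case description:
To get familiar with the tools and processes you use to create a Pacemaker cluster, you can work through the following procedure. It is intended for users who want to see what the cluster software looks like and how it is administered, without needing to configure a working cluster.
Note
These steps do not create a supported Red Hat cluster. A supported Red Hat cluster requires at least two nodes and a configured fencing device. For details on Red Hat's support policies, requirements, and limitations for RHEL High Availability clusters, see Support Policies for RHEL High Availability Clusters.
2.1. Learning to use Pacemaker
By working through this procedure, you will learn how to use Pacemaker to set up a cluster, how to display cluster status, and how to configure a cluster service. This example creates an Apache HTTP server as a cluster resource and shows how the cluster responds when the resource fails.
In this example:
The node is www.hw.net.
The floating IP address is 192.168.1.120.
System version: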
[root@node01 ~]# cat /etc/centos-release
CentOS Linux release 7.9.2009 (Core)
Host information:
[root@node01 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.11 node01 www.hw.net
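Prerequisites:
- A single node running CentOS 7.9
- A floating IP address that resides on the same network as one of the node's statically assigned IP addresses
- The name of the node you are running on is listed in the /etc/hosts file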
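The second prerequisite can be checked before any cluster software is installed. The following is a minimal sketch, not part of the original procedure: ip_to_int and same_network are hypothetical helper names, and the /24 prefix length is taken from the interface output shown later in this walkthrough.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# The function body runs in a subshell so the IFS change does not leak.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Succeed if both addresses fall in the same network for the given
# prefix length (arguments: address, address, prefix length).
same_network() (
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$2") & mask )) ]
)
```

For the addresses in this example, same_network 192.168.1.11 192.168.1.120 24 succeeds. The third prerequisite can be checked with getent hosts www.hw.net, which prints the matching /etc/hosts entry when the name resolves.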
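Procedure
Install the Red Hat High Availability Add-On packages from the High Availability channel, then start and enable the pcsd service. The host in this example has dnf installed; on a stock CentOS 7 system, use yum with the same arguments.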
[root@node01 ~]# dnf install pcs pacemaker fence-agents-all
[root@node01 ~]# dnf list pcs pacemaker fence-agents-all
Repository epel is listed more than once in the configuration
Repository epel-debuginfo is listed more than once in the configuration
Repository epel-source is listed more than once in the configuration
Last metadata expiration check: 0:08:30 ago on Tue 07 May 2024 03:00:50 PM CST.
Installed Packages
fence-agents-all.x86_64 4.2.1-41.el7 @System
pacemaker.x86_64 1.1.23-1.el7 @System
pcs.x86_64 0.9.169-3.el7.centos @System
Available Packages
fence-agents-all.x86_64 4.2.1-41.el7_9.6 updates
pacemaker.x86_64 1.1.23-1.el7_9.1 updates
pcs.x86_64
# Start the pcsd service
# systemctl start pcsd.service
# systemctl enable pcsd.service
[root@node01 ~]# systemctl status pcsd.service
● pcsd.service - PCS GUI and remote configuration interface
Loaded: loaded (/usr/lib/systemd/system/pcsd.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2024-05-07 15:09:57 CST; 17s ago
Docs: man:pcsd(8)
man:pcs(8)
Main PID: 3790 (pcsd)
CGroup: /system.slice/pcsd.service
└─3790 /usr/bin/ruby /usr/lib/pcsd/pcsd
May 07 15:09:56 node01 systemd[1]: Starting PCS GUI and remote configuration interface...
May 07 15:09:57 node01 systemd[1]: Started PCS GUI and remote configuration interface.
If you are running the firewalld daemon, enable the ports required by the Red Hat High Availability Add-On.
# firewall-cmd --permanent --add-service=high-availability
# firewall-cmd --reload
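Set a password for the user hacluster on each node in the cluster, and authenticate user hacluster for each node in the cluster on the node from which you will run the pcs commands. This example uses only a single node, which is also the node you run the commands from. The step is included here because it is required when configuring a supported Red Hat High Availability multi-node cluster.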
[root@node01 ~]# id hacluster
uid=189(hacluster) gid=189(haclient) groups=189(haclient)
[root@node01 ~]# passwd hacluster
Changing password for user hacluster.
New password:
BAD PASSWORD: The password is shorter than 8 characters
Retype new password:
passwd: all authentication tokens updated successfully.
[root@node01 ~]# pcs cluster auth www.hw.net
Username: hacluster
Password:
www.hw.net: Authorized
Create a cluster named my_cluster with one member and check the status of the cluster. This command creates and starts the cluster in one step.
[root@node01 ~]# pcs cluster setup --name my_cluster --start www.hw.net
Destroying cluster on nodes: www.hw.net...
www.hw.net: Stopping Cluster (pacemaker)...
www.hw.net: Successfully destroyed cluster
Sending 'pacemaker_remote authkey' to 'www.hw.net'
www.hw.net: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
www.hw.net: Succeeded
Starting cluster on nodes: www.hw.net...
www.hw.net: Starting Cluster (corosync)...
www.hw.net: Starting Cluster (pacemaker)...
Synchronizing pcsd certificates on nodes www.hw.net...
www.hw.net: Success
Restarting pcsd on the nodes in order to reload the certificates...
www.hw.net: Success
# View the cluster and resource status
[root@node01 ~]# pcs status
Cluster name: my_cluster
WARNINGS:
No stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: www.hw.net (version 1.1.23-1.el7-9acf116022) - partition with quorum
Last updated: Tue May 7 15:24:48 2024
Last change: Tue May 7 15:24:38 2024 by hacluster via crmd on www.hw.net
1 node configured
0 resource instances configured
Online: [ www.hw.net ]
No resources
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
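A Red Hat High Availability cluster requires that you configure fencing for the cluster. The reasons for this requirement are described in Fencing in a Red Hat High Availability Cluster. This introduction, however, is intended only to show how to use the basic Pacemaker commands, so disable fencing by setting the stonith-enabled cluster option to false.
Warning
Never use stonith-enabled=false on a production cluster. It tells the cluster to simply assume that failed nodes have been safely fenced.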
[root@node01 ~]# pcs property set stonith-enabled=false
[root@node01 ~]# pcs status
Cluster name: my_cluster
Stack: corosync
Current DC: www.hw.net (version 1.1.23-1.el7-9acf116022) - partition with quorum
Last updated: Tue May 7 15:25:41 2024
Last change: Tue May 7 15:25:35 2024 by root via cibadmin on www.hw.net
1 node configured
0 resource instances configured
Online: [ www.hw.net ]
No resources
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
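Install the Apache HTTP server (httpd) and wget on your system, and create a web page that displays a simple text message. If you are running the firewalld daemon, enable the ports required by httpd.
Note
Do not use systemctl enable to make any service that will be managed by the cluster start at system boot.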
# dnf install -y httpd wget
...
# firewall-cmd --permanent --add-service=http
# firewall-cmd --reload
# cat <<-END >/var/www/html/index.html
<html>
<body>My Test Site - $(hostname)</body>
</html>
END
For the Apache resource agent to get the status of Apache, add the following to the existing configuration to enable the status server URL.
# cat <<-END > /etc/httpd/conf.d/status.conf
<Location /server-status>
SetHandler server-status
Order deny,allow
Deny from all
Allow from 127.0.0.1
Allow from ::1
</Location>
END
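Create IPaddr2 and apache resources for the cluster to manage. The IPaddr2 resource is a floating IP address; it must not be an address that is already associated with a physical node. If the NIC device for the IPaddr2 resource is not specified, the floating IP must reside on the same network as the statically assigned IP address.
You can display a list of all available resource types with the pcs resource list command. You can use the pcs resource describe resourcetype command to display the parameters you can set for the specified resource type. For example, the following commands list the resource types and then display the parameters you can set for a resource of type apache: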
[root@node01 ~]# pcs resource list
lsb:netconsole - Initializes network console logging
lsb:network - Bring up/down networking
ocf:heartbeat:aliyun-vpc-move-ip - Move IP within a VPC of the Aliyun ECS
ocf:heartbeat:apache - Manages an Apache Web server instance
ocf:heartbeat:aws-vpc-move-ip - Move IP within a VPC of the AWS EC2
ocf:heartbeat:aws-vpc-route53 - Update Route53 VPC record for AWS EC2
ocf:heartbeat:awseip - Amazon AWS Elastic IP Address Resource Agent
ocf:heartbeat:awsvip - Amazon AWS Secondary Private IP Address Resource Agent
ocf:heartbeat:azure-events - Microsoft Azure Scheduled Events monitoring agent
ocf:heartbeat:azure-lb - Answers Azure Load Balancer health probe requests
.......
[root@node01 ~]# pcs resource describe apache
......
Default operations:
start: interval=0s timeout=40s
stop: interval=0s timeout=60s
monitor: interval=10s timeout=20s
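In this example, the IP address resource and the apache resource are both configured as part of a group named apachegroup, which ensures that the resources are kept together on the same node when you configure a working multi-node cluster.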
[root@node01 ~]# pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=192.168.1.120 --group apachegroup
[root@node01 ~]# pcs resource create WebSite ocf:heartbeat:apache configfile=/etc/httpd/conf/httpd.conf statusurl="http://localhost/server-status" --group apachegroup
[root@node01 ~]# pcs status
Cluster name: my_cluster
Stack: corosync
Current DC: www.hw.net (version 1.1.23-1.el7-9acf116022) - partition with quorum
Last updated: Tue May 7 15:36:51 2024
Last change: Tue May 7 15:36:41 2024 by root via cibadmin on www.hw.net
1 node configured
2 resource instances configured
Online: [ www.hw.net ]
Full list of resources:
Resource Group: apachegroup
ClusterIP (ocf::heartbeat:IPaddr2): Started www.hw.net
WebSite (ocf::heartbeat:apache): Started www.hw.net
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
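The apachegroup group keeps ClusterIP and WebSite on the same node and starts them in the order they were added. As a rough illustration of what the group provides, the same behavior could be expressed with explicit constraints if the two resources had been created without --group. This is a sketch under that assumption, not a step in this procedure: run and apply_equivalent_constraints are hypothetical helper names, and by default the commands are only printed.

```shell
# Print each command instead of running it unless RUN=1 is set, so the
# sketch can be reviewed safely on a machine that has no cluster.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Hypothetical alternative to --group apachegroup: express placement and
# start order with explicit constraints.
apply_equivalent_constraints() {
    # Keep WebSite on whichever node holds ClusterIP.
    run pcs constraint colocation add WebSite with ClusterIP INFINITY
    # Bring the address up before starting Apache.
    run pcs constraint order ClusterIP then WebSite
}
```

Calling apply_equivalent_constraints prints the two pcs commands for review. A group is usually the simpler choice for a small stack like this one, because a single name then controls both placement and ordering.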
# View the VIP resource
[root@node01 ~]# ip add sh
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:6c:30:8f brd ff:ff:ff:ff:ff:ff
inet 192.168.1.11/24 brd 192.168.1.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet 192.168.1.120/24 brd 192.168.1.255 scope global secondary enp0s3
valid_lft forever preferred_lft forever
# View the httpd processes
[root@node01 ~]# ps -ef |grep httpd
root 339 1973 0 15:38 pts/0 00:00:00 grep --color=auto httpd
root 32004 1 0 15:36 ? 00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache 32005 32004 0 15:36 ? 00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache 32006 32004 0 15:36 ? 00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache 32007 32004 0 15:36 ? 00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache 32008 32004 0 15:36 ? 00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache 32009 32004 0 15:36 ? 00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
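After you have configured a cluster resource, you can use the pcs resource show command to display the options that are configured for that resource.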
[root@node01 ~]# pcs resource show WebSite
Resource: WebSite (class=ocf provider=heartbeat type=apache)
Attributes: configfile=/etc/httpd/conf/httpd.conf statusurl=http://localhost/server-status
Operations: monitor interval=10s timeout=20s (WebSite-monitor-interval-10s)
start interval=0s timeout=40s (WebSite-start-interval-0s)
stop interval=0s timeout=60s (WebSite-stop-interval-0s)
[root@node01 ~]# pcs resource show ClusterIP
Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
Attributes: ip=192.168.1.120
Operations: monitor interval=10s timeout=20s (ClusterIP-monitor-interval-10s)
start interval=0s timeout=20s (ClusterIP-start-interval-0s)
stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)
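Point your browser to the website you created, using the floating IP address you configured. The page should display the text message you defined.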
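On a host without a graphical browser, the same check can be made from the command line. The sketch below assumes the wget package installed in the earlier step; page_contains and check_site are hypothetical helper names.

```shell
# Succeed if the HTML read from stdin contains the expected text.
page_contains() {
    grep -qF -- "$1"
}

# Fetch the page through the given IP address and check its body.
# Arguments: IP address, expected text.
check_site() {
    wget -qO- --timeout=5 "http://$1/" | page_contains "$2"
}
```

For this example, check_site 192.168.1.120 "My Test Site" should succeed while the apachegroup resources are started.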
Stop the apache web service and check the status of the cluster. Using killall -9 simulates an application-level crash.
# killall -9 httpd
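Check the cluster status. You should see that stopping the web service caused a failed action, but that the cluster software restarted the service, and you should still be able to access the website.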
[root@node01 ~]# pcs status
Cluster name: my_cluster
Stack: corosync
Current DC: www.hw.net (version 1.1.23-1.el7-9acf116022) - partition with quorum
Last updated: Tue May 7 15:41:09 2024
Last change: Tue May 7 15:36:41 2024 by root via cibadmin on www.hw.net
1 node configured
2 resource instances configured
Online: [ www.hw.net ]
Full list of resources:
Resource Group: apachegroup
ClusterIP (ocf::heartbeat:IPaddr2): Started www.hw.net
WebSite (ocf::heartbeat:apache): Started www.hw.net
Failed Resource Actions:
* WebSite_monitor_10000 on www.hw.net 'not running' (7): call=13, status=complete, exitreason='',
last-rc-change='Tue May 7 15:41:02 2024', queued=0ms, exec=0ms
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
[root@node01 ~]# ps -ef |grep httpd
root 2308 1 0 15:41 ? 00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache 2309 2308 0 15:41 ? 00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache 2310 2308 0 15:41 ? 00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache 2311 2308 0 15:41 ? 00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache 2312 2308 0 15:41 ? 00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache 2313 2308 0 15:41 ? 00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
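The recovery shown above takes a few seconds, because the failure is only noticed at the next monitor interval. To wait for it in a script rather than rerunning pcs status by hand, something like the following sketch could be used. It assumes the resource line layout shown in the pcs status output above; resource_state and wait_for_started are hypothetical helper names.

```shell
# Print the state (for example "Started" or "Stopped") that `pcs status`
# reports for one resource. Reads the status text from stdin.
resource_state() {
    awk -v name="$1" '$1 == name && $2 ~ /^\(/ { print $3; exit }'
}

# Poll `pcs status` once per second until the resource reports Started.
# Arguments: resource name, optional timeout in seconds (default 60).
# Returns non-zero if the timeout is reached first.
wait_for_started() (
    name=$1
    timeout=${2:-60}
    waited=0
    until [ "$(pcs status | resource_state "$name")" = "Started" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$(( waited + 1 ))
    done
)
```

After killall -9 httpd, running wait_for_started WebSite 30 returns once the cluster has restarted Apache, or fails after 30 seconds.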
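Once the service is up and running again, you can clear the failure status on the resource that failed. The failed action notice then no longer appears when you view the cluster status.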
[root@node01 ~]# pcs resource cleanup WebSite
Cleaned up ClusterIP on www.hw.net
Cleaned up WebSite on www.hw.net
Waiting for 1 reply from the CRMd. OK
[root@node01 ~]# pcs status
Cluster name: my_cluster
Stack: corosync
Current DC: www.hw.net (version 1.1.23-1.el7-9acf116022) - partition with quorum
Last updated: Tue May 7 16:52:14 2024
Last change: Tue May 7 16:52:09 2024 by hacluster via crmd on www.hw.net
1 node configured
2 resource instances configured
Online: [ www.hw.net ]
Full list of resources:
Resource Group: apachegroup
ClusterIP (ocf::heartbeat:IPaddr2): Started www.hw.net
WebSite (ocf::heartbeat:apache): Started www.hw.net
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
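When you are finished looking at the cluster and the cluster status, stop the cluster services on the node. Even though the services were started on only one node in this introduction, the --all parameter is included because it would stop cluster services on all nodes of an actual multi-node cluster. The transcript below then starts the cluster again. The pcs status output captured immediately after the start still shows Current DC: NONE and the node as OFFLINE, because it was taken before Pacemaker had finished starting up; the node typically comes online and the resources start a few seconds later.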
[root@node01 ~]# pcs cluster stop --all
www.hw.net: Stopping Cluster (pacemaker)...
www.hw.net: Stopping Cluster (corosync)...
[root@node01 ~]# pcs status
Error: cluster is not currently running on this node
[root@node01 ~]# pcs cluster start --all
www.hw.net: Starting Cluster (corosync)...
www.hw.net: Starting Cluster (pacemaker)...
[root@node01 ~]# pcs status
Cluster name: my_cluster
Stack: corosync
Current DC: NONE
Last updated: Tue May 7 16:53:28 2024
Last change: Tue May 7 16:52:09 2024 by hacluster via crmd on www.hw.net
1 node configured
2 resource instances configured
OFFLINE: [ www.hw.net ]
Full list of resources:
Resource Group: apachegroup
ClusterIP (ocf::heartbeat:IPaddr2): Stopped
WebSite (ocf::heartbeat:apache): Stopped
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
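The last status output above was captured before the node had rejoined the cluster. To wait until the node is reported online again, the Online: line of pcs status can be checked in a loop. This is a minimal sketch that assumes the output layout shown in this walkthrough; node_online is a hypothetical helper name.

```shell
# Succeed if the named node appears on the "Online:" line of the
# `pcs status` output read from stdin.
node_online() {
    awk -v node="$1" '
        $1 == "Online:" { for (i = 2; i <= NF; i++) if ($i == node) found = 1 }
        END { exit !found }
    '
}
```

A loop such as until pcs status 2>/dev/null | node_online www.hw.net; do sleep 2; done then returns once the node is listed as online, after which pcs status should show the apachegroup resources starting again.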
From: https://www.cnblogs.com/tiany1224/p/18177824