Case description:
Deploy a GFS2 shared-storage application in a CentOS 7 environment and bring up the clvmd service.
System environment:
[root@node201 ~]# cat /etc/centos-release
CentOS Linux release 7.9.2009 (Core)
System architecture:
[root@node203 ~]# cat /etc/hosts
192.168.1.201 node201
192.168.1.202 node202
192.168.1.203 node203 iscsi server
As shown below, the shared storage is provided through an iSCSI server, and a GFS2 file system is then created on that shared storage from the cluster nodes.
Cluster Logical Volume Manager (cLVM):
Overview
When shared storage is managed on a cluster, every node must be notified about changes made to the storage subsystem. Logical Volume Manager 2 (LVM2), widely used to manage local storage, has been extended to support transparent management of volume groups across an entire cluster. Clustered volume groups can be managed with the same commands used for local storage.
Clustered LVM2 is coordinated by several components:
- Distributed Lock Manager (DLM)
  Coordinates cLVM disk access and mediates metadata access through a locking mechanism.
- Logical Volume Manager 2 (LVM2)
  Allows a file system to be distributed flexibly over multiple disks. LVM2 provides a virtual pool of disk space.
- Cluster Logical Volume Manager (cLVM)
  Coordinates access to the LVM2 metadata so that every node is aware of changes. cLVM does not coordinate access to the shared data itself; to enable that, OCFS2 or another cluster-aware application (GFS2 in this case) must be configured on top of the cLVM-managed storage.
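Before clvmd can manage the shared volume group, LVM has to be switched to cluster-wide locking and the volume group has to be created as clustered. The commands below are a minimal sketch for CentOS 7 and are not part of the original session; /dev/sdX and the logical volume name gfslv are placeholders, while gfsvg is the volume group name that appears later in this case:
# switch LVM to cluster-wide locking (sets locking_type = 3 in /etc/lvm/lvm.conf)
lvmconf --enable-cluster
grep locking_type /etc/lvm/lvm.conf
# create a clustered volume group and a logical volume on the shared iSCSI disk
vgcreate -cy gfsvg /dev/sdX           # /dev/sdX is a placeholder for the shared LUN
lvcreate -n gfslv -l 100%FREE gfsvg   # gfslv is a placeholder LV name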
I. GFS2 configuration
The GFS2 configuration itself follows the document referenced below:
https://note.youdao.com/ynoteshare/index.html?id=e6803c96c40b9684a82dc2f81479abc6&type=note&_time=1725421394382
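The note above contains the full GFS2 setup; as a rough orientation, creating the file system on the clustered logical volume typically looks like the sketch below, where mycluster must match the corosync cluster name and gfslv is a placeholder LV name (both values are assumptions, not taken from this environment):
# lock_dlm protocol, lock table <clustername>:<fsname>, two journals for the two nodes
mkfs.gfs2 -p lock_dlm -t mycluster:gfs2fs -j 2 /dev/gfsvg/gfslv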
II. Starting the iscsid service
1. Check the iscsid service status
As shown below, after the iscsid service is started on the client node, the service is running but keeps logging iSCSI connection errors, and matching warnings appear in the system messages log:
[root@node201 ~]# systemctl status iscsid
● iscsid.service - Open-iSCSI
Loaded: loaded (/usr/lib/systemd/system/iscsid.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2024-09-04 11:04:02 CST; 11min ago
......
Main PID: 1076 (iscsid)
Status: "Ready to process requests"
Tasks: 1
CGroup: /system.slice/iscsid.service
└─1076 /sbin/iscsid -f
Sep 04 11:15:33 node201 iscsid[1076]: iscsid: connection1:0 is operational after recovery (1 attempts)
Sep 04 11:15:35 node201 iscsid[1076]: iscsid: Kernel reported iSCSI connection 1:0 error (1020 - ISCSI_...e (3)
Sep 04 11:15:37 node201 iscsid[1076]: iscsid: connection1:0 is operational after recovery (1 attempts)
Sep 04 11:15:39 node201 iscsid[1076]: iscsid: Kernel reported iSCSI connection 1:0 error (1020 - ISCSI_...e (3)
Sep 04 11:15:41 node201 iscsid[1076]: iscsid: connection1:0 is operational after recovery (1 attempts)
Sep 04 11:15:43 node201 iscsid[1076]: iscsid: Kernel reported iSCSI connection 1:0 error (1020 - ISCSI_...e (3)
Sep 04 11:15:45 node201 iscsid[1076]: iscsid: connection1:0 is operational after recovery (1 attempts)
Sep 04 11:15:47 node201 iscsid[1076]: iscsid: Kernel reported iSCSI connection 1:0 error (1020 - ISCSI_...e (3)
Sep 04 11:15:49 node201 iscsid[1076]: iscsid: connection1:0 is operational after recovery (1 attempts)
Sep 04 11:15:51 node201 iscsid[1076]: iscsid: Kernel reported iSCSI connection 1:0 error (1020 - ISCSI_...e (3)
Hint: Some lines were ellipsized, use -l to show in full.
2. System messages log
As shown below, the system messages log records recurring iSCSI connection errors:
[root@node201 KingbaseHA]# tail /var/log/messages
Sep 4 11:14:57 node201 iscsid: iscsid: connection1:0 is operational after recovery (1 attempts)
Sep 4 11:14:59 node201 kernel: connection1:0: detected conn error (1020)
Sep 4 11:14:59 node201 iscsid: iscsid: Kernel reported iSCSI connection 1:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
Sep 4 11:15:01 node201 iscsid: iscsid: connection1:0 is operational after recovery (1 attempts)
Sep 4 11:15:03 node201 kernel: connection1:0: detected conn error (1020)
Sep 4 11:15:03 node201 iscsid: iscsid: Kernel reported iSCSI connection 1:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
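To narrow down which initiator sessions are involved, the sessions can be listed on the client, and on the iSCSI server side if the target was built with targetcli (both checks are assumptions, not part of the original transcript):
# on the client: list active sessions and their connection state
iscsiadm -m session -P 1
# on the iSCSI server (only if it was configured with targetcli): show targets, LUNs and ACLs
targetcli ls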
3. Log out the iSCSI sessions on the other cluster node
[root@node202 ~]# iscsiadm -m node --logoutall=all
Logging out of session [sid: 1, target: iqn.2024-08.pip.cc:server, portal: 192.168.1.203,3260]
Logout of [sid: 1, target: iqn.2024-08.pip.cc:server, portal: 192.168.1.203,3260] successful.
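It can then be confirmed that no sessions remain on node202 (a verification step not shown in the original output):
# should report "No active sessions" once the logout has completed
iscsiadm -m session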
4. Restart the iscsid service on the current node
As shown below, after the iscsid service is restarted, its status is normal:
[root@node201 ~]# systemctl restart iscsid
[root@node201 ~]# systemctl status iscsid
● iscsid.service - Open-iSCSI
Loaded: loaded (/usr/lib/systemd/system/iscsid.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2024-09-04 11:16:36 CST; 1s ago
......
Main PID: 3153 (iscsid)
Status: "Syncing existing session(s)"
Tasks: 2
CGroup: /system.slice/iscsid.service
├─3153 /sbin/iscsid -f
└─3156 /sbin/iscsid -f
Sep 04 11:16:36 node201 systemd[1]: Stopped Open-iSCSI.
Sep 04 11:16:36 node201 systemd[1]: Starting Open-iSCSI...
Sep 04 11:16:36 node201 systemd[1]: Started Open-iSCSI.
# The system messages log is now normal
[root@node201 ~]# tail /var/log/messages
Sep 4 11:16:31 node201 dnf: Repository epel-debuginfo is listed more than once in the configuration
Sep 4 11:16:31 node201 dnf: Repository epel-source is listed more than once in the configuration
Sep 4 11:16:31 node201 dnf: Metadata cache refreshed recently.
Sep 4 11:16:31 node201 systemd: Started dnf makecache.
Sep 4 11:16:36 node201 systemd: Stopping Open-iSCSI...
Sep 4 11:16:36 node201 iscsid: iscsid: iscsid shutting down.
Sep 4 11:16:36 node201 systemd: Stopped Open-iSCSI.
Sep 4 11:16:36 node201 systemd: Starting Open-iSCSI...
Sep 4 11:16:36 node201 systemd: Started Open-iSCSI.
Sep 4 11:16:38 node201 iscsid: iscsid: connection1:0 is operational after recovery (1 attempts)
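The recovered session and the iSCSI-backed disks on node201 can be double-checked in the same way (not shown in the original transcript):
# list the active session to 192.168.1.203
iscsiadm -m session
# confirm the shared block devices are visible again
lsblk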
III. Starting the cluster services
[root@node201 ~]# pcs cluster start --all
node201: Starting Cluster (corosync)...
node202: Starting Cluster (corosync)...
node201: Starting Cluster (pacemaker)...
node202: Starting Cluster (pacemaker)...
# Check the cluster status
[root@node201 ~]# crm status
Cluster Summary:
* Stack: corosync
* Current DC: node202 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
* Last updated: Wed Sep 4 11:19:51 2024
* Last change: Wed Sep 4 10:08:27 2024 by root via crm_node on node201
* 2 nodes configured
* 0 resource instances configured
Node List:
* Online: [ node201 node202 ]
Full List of Resources:
* No resources
[root@node201 ~]# pcs cluster status
Cluster Status:
Stack: corosync
Current DC: node202 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Wed Sep 4 11:19:58 2024
Last change: Wed Sep 4 10:08:27 2024 by root via crm_node on node201
2 nodes configured
0 resource instances configured
PCSD Status:
node201: Online
node202: Online
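Before starting dlm and clvmd, it is worth confirming that corosync communication between the two nodes is healthy. A small verification sketch, not part of the original session:
# ring status of the local corosync node
corosync-cfgtool -s
# corosync membership as reported by pcs
pcs status corosync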
IV. Starting the clvmd service
1. Start the dlm service on all nodes
[root@node201 ~]# systemctl start dlm
[root@node202 ~]# systemctl start dlm
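Once dlm is running, the lockspaces that clvmd and the GFS2 mount will use can be inspected with dlm_tool; this verification is an addition, not part of the original transcript:
# list active DLM lockspaces (clvmd and each GFS2 mount register their own lockspace)
dlm_tool ls
# overall status of the dlm daemon and cluster membership
dlm_tool status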
2. Start the clvmd service
As shown below, the clvmd service starts normally on the current node:
[root@node201 ~]# systemctl start clvmd
[root@node201 ~]# systemctl status clvmd
● clvmd.service - LSB: This service is Clusterd LVM Daemon.
Loaded: loaded (/etc/rc.d/init.d/clvmd; bad; vendor preset: disabled)
Active: active (running) since Wed 2024-09-04 11:18:22 CST; 5s ago
Docs: man:systemd-sysv-generator(8)
Process: 3686 ExecStart=/etc/rc.d/init.d/clvmd start (code=exited, status=0/SUCCESS)
Main PID: 3696 (clvmd)
Tasks: 3
CGroup: /system.slice/clvmd.service
└─3696 /usr/sbin/clvmd -T30
Sep 04 11:18:21 node201 systemd[1]: Starting LSB: This service is Clusterd LVM Daemon....
Sep 04 11:18:22 node201 clvmd[3696]: Cluster LVM daemon started - connected to Corosync
Sep 04 11:18:22 node201 clvmd[3686]: Starting clvmd:
Sep 04 11:18:22 node201 clvmd[3686]: Activating VG(s): 1 logical volume(s) in volume group "gfsvg" now active
Sep 04 11:18:22 node201 clvmd[3686]: 3 logical volume(s) in volume group "centos" now active
Sep 04 11:18:22 node201 clvmd[3686]: [ OK ]
Sep 04 11:18:22 node201 systemd[1]: Started LSB: This service is Clusterd LVM Daemon..
Hint: Some lines were ellipsized, use -l to show in full.
# On the other cluster node
[root@node202 ~]# systemctl start clvmd
[root@node202 ~]# systemctl status clvmd
● clvmd.service - LSB: This service is Clusterd LVM Daemon.
Loaded: loaded (/etc/rc.d/init.d/clvmd; bad; vendor preset: disabled)
Active: active (running) since Wed 2024-09-04 11:59:37 CST; 5s ago
Docs: man:systemd-sysv-generator(8)
Process: 6954 ExecStart=/etc/rc.d/init.d/clvmd start (code=exited, status=0/SUCCESS)
Main PID: 6965 (clvmd)
Tasks: 3
Memory: 31.4M
CGroup: /system.slice/clvmd.service
└─6965 /usr/sbin/clvmd -T30
Sep 04 11:59:36 node202 systemd[1]: Starting LSB: This service is Clusterd LVM Daemon....
Sep 04 11:59:37 node202 clvmd[6965]: Cluster LVM daemon started - connected to Corosync
Sep 04 11:59:37 node202 clvmd[6954]: Starting clvmd:
Sep 04 11:59:37 node202 clvmd[6954]: Activating VG(s): 1 logical volume(s) in volume group "gfsvg" now active
Sep 04 11:59:37 node202 clvmd[6954]: 3 logical volume(s) in volume group "centos" now active
Sep 04 11:59:37 node202 clvmd[6954]: [ OK ]
Sep 04 11:59:37 node202 systemd[1]: Started LSB: This service is Clusterd LVM Daemon..
Hint: Some lines were ellipsized, use -l to show in full.
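With clvmd active on both nodes, the shared volume group should be flagged as clustered. A quick check (an assumed verification, not from the original output):
# the sixth character of the VG attribute string is 'c' for a clustered volume group
vgs -o vg_name,vg_attr,vg_size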
V. Re-registering the iSCSI session on the other cluster node
1. Log back in to the iSCSI target
[root@node202 ~]# /usr/sbin/iscsiadm -m node -T iqn.2024-08.pip.cc:server -p 192.168.1.203 --login
Logging in to [iface: default, target: iqn.2024-08.pip.cc:server, portal: 192.168.1.203,3260] (multiple)
Login to [iface: default, target: iqn.2024-08.pip.cc:server, portal: 192.168.1.203,3260] successful.
# Disk information
[root@node202 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 102.9G 0 disk
├─sda1 8:1 0 500M 0 part /boot
└─sda2 8:2 0 102.4G 0 part
├─centos-root 253:0 0 50G 0 lvm /
├─centos-swap 253:1 0 3G 0 lvm [SWAP]
└─centos-home 253:2 0 49.3G 0 lvm /home
sdb 8:16 0 10.7G 0 disk
sdc 8:32 0 4G 0 disk
sdd 8:48 0 128M 0 disk
sde 8:64 0 512M 0 disk
sdf 8:80 0 128M 0 disk
└─sdf1 8:81 0 96M 0 part
sdg 8:96 0 2.2G 0 disk
sdh 8:112 0 8.3G 0 disk
sr0 11:0 1 1024M 0 rom
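With the session restored and the shared disks visible again, the GFS2 file system can be mounted on both nodes. A minimal sketch, assuming the logical volume in gfsvg is named gfslv and /gfs2 is the mount point (both names are placeholders, not from this environment):
# run on each cluster node; the lock table baked into the file system must match the cluster name
mkdir -p /gfs2
mount -t gfs2 /dev/gfsvg/gfslv /gfs2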
2. Check the iscsid service status on the current node
As shown below, the iscsid service status is normal:
[root@node201 ~]# systemctl status iscsid
● iscsid.service - Open-iSCSI
Loaded: loaded (/usr/lib/systemd/system/iscsid.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2024-09-04 11:16:36 CST; 5min ago
......
Main PID: 3153 (iscsid)
Status: "Ready to process requests"
Tasks: 1
CGroup: /system.slice/iscsid.service
└─3153 /sbin/iscsid -f
Sep 04 11:16:36 node201 systemd[1]: Stopped Open-iSCSI.
Sep 04 11:16:36 node201 systemd[1]: Starting Open-iSCSI...
Sep 04 11:16:36 node201 systemd[1]: Started Open-iSCSI.
Sep 04 11:16:38 node201 iscsid[3153]: iscsid: connection1:0 is operational after recovery (1 attempts)
# The system messages log is normal
[root@node201 ~]# tail /var/log/messages
Sep 4 11:18:21 node201 kernel: dlm: Using TCP for communications
Sep 4 11:18:22 node201 clvmd: Cluster LVM daemon started - connected to Corosync
Sep 4 11:18:22 node201 clvmd: Starting clvmd:
Sep 4 11:18:22 node201 clvmd: Activating VG(s): 1 logical volume(s) in volume group "gfsvg" now active
Sep 4 11:18:22 node201 clvmd: 3 logical volume(s) in volume group "centos" now active
Sep 4 11:18:22 node201 clvmd: [ OK ]
Sep 4 11:18:22 node201 systemd: Started LSB: This service is Clusterd LVM Daemon..
Sep 4 11:20:01 node201 systemd: Started Session 4 of user root.
VI. Summary
When both cluster nodes, acting as iSCSI clients, established sessions with the iSCSI server at the same time, a session conflict occurred: the iscsid service ran in an abnormal state with repeated connection errors, and the clvmd service could not be started. The workaround is to log out the iSCSI session on one cluster node, start the clvmd service on the nodes, and then log that node back in to the iSCSI server.
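As an optional follow-up, the iSCSI node records can be switched to automatic startup so the sessions are re-established after a reboot instead of requiring a manual login (an assumption, not part of the original procedure):
# mark the target for automatic login on both cluster nodes
iscsiadm -m node -T iqn.2024-08.pip.cc:server -p 192.168.1.203 --op update -n node.startup -v automatic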