Ceph Cluster Maintenance
http://docs.ceph.org.cn/rados/ (Ceph cluster configuration, deployment, and operations)
Single-host management via the admin socket
Managing a Ceph node through its admin socket affects only that single host; it does not take effect on any other node.
OSD node:
root@ceph-node1:~# ll /var/run/ceph/
total 0
drwxrwx--- 2 ceph ceph 140 Nov 13 10:16 ./
drwxr-xr-x 25 root root 840 Nov 13 10:17 ../
srwxr-xr-x 1 ceph ceph 0 Nov 13 10:16 ceph-osd.0.asok=
srwxr-xr-x 1 ceph ceph 0 Nov 13 10:16 ceph-osd.1.asok=
srwxr-xr-x 1 ceph ceph 0 Nov 13 10:16 ceph-osd.2.asok=
srwxr-xr-x 1 ceph ceph 0 Nov 13 10:16 ceph-osd.3.asok=
srwxr-xr-x 1 ceph ceph 0 Nov 13 10:16 ceph-osd.4.asok=
mon node:
root@ceph-mon1:~# ll /var/run/ceph/
total 0
drwxrwx--- 2 ceph ceph 60 Nov 13 10:16 ./
drwxr-xr-x 25 root root 840 Nov 13 10:18 ../
srwxr-xr-x 1 ceph ceph 0 Nov 13 10:16 ceph-mon.ceph-mon1.asok=
On a node or mon host, the ceph command can manage the local mon or osd daemon.
First, sync the admin keyring to the mon or node hosts:
cephadmin@ceph-deploy:~/ceph-cluster$ pwd
/home/cephadmin/ceph-cluster
cephadmin@ceph-deploy:~/ceph-cluster$ scp ceph.client.admin.keyring root@172.16.100.31:/etc/ceph
View the usage help for an OSD's admin socket (the CLI flag is --admin-daemon):
root@ceph-node1:~# ceph --admin-daemon /var/run/ceph/ceph-osd.4.asok help | less
Get help for the --admin-daemon commands on the mon node:
root@ceph-mon1:~# ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-mon1.asok help
Check mon status:
root@ceph-mon1:~# ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-mon1.asok mon_status
Show the mon configuration:
root@ceph-mon1:~# ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-mon1.asok config show | less
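Under the hood, the .asok socket speaks a simple wire protocol that the ceph CLI wraps: the command is sent NUL-terminated, and the reply comes back as a 4-byte big-endian length prefix followed by the payload. The sketch below is a non-authoritative illustration of that framing; the stand-in daemon thread and its canned mon_status reply are assumptions so the example runs without a live cluster. Pointing asok_command at a real .asok path on a node should query the live daemon.

```python
import json
import os
import socket
import struct
import tempfile
import threading

def asok_command(sock_path, cmd):
    """Send one command string to an .asok socket and return the raw reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        s.sendall(cmd.encode() + b"\0")             # command is NUL-terminated
        (length,) = struct.unpack(">I", s.recv(4))  # reply: u32 length prefix
        buf = b""
        while len(buf) < length:
            buf += s.recv(length - len(buf))
        return buf.decode()

def _fake_daemon(server):
    """Stand-in for a ceph daemon: answers one request with canned JSON."""
    conn, _ = server.accept()
    with conn:
        cmd = b""
        while not cmd.endswith(b"\0"):              # read up to the NUL
            cmd += conn.recv(1)
        reply = json.dumps({"name": "ceph-mon1", "state": "leader"}).encode()
        conn.sendall(struct.pack(">I", len(reply)) + reply)

# Demo against the fake daemon (no cluster needed).
path = os.path.join(tempfile.mkdtemp(), "fake.asok")
server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(path)
server.listen(1)
threading.Thread(target=_fake_daemon, args=(server,), daemon=True).start()

state = json.loads(asok_command(path, "mon_status"))["state"]
print(state)  # leader
```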
Stopping or restarting the Ceph cluster
OSD maintenance
Before restarting, set flags on the cluster so that OSDs are not marked out, and disable backfill and recovery. This prevents OSDs from being evicted from the cluster and pools from starting data repair while a node's services are stopped. After node maintenance is complete, clear all of the flags.
cephadmin@ceph-deploy:~/ceph-cluster$ ceph osd set noout
noout is set
cephadmin@ceph-deploy:~$ ceph osd set norecover
norecover is set
cephadmin@ceph-deploy:~$ ceph osd set nobackfill
nobackfill is set
When the node is back, use unset to clear the flags so the cluster's OSDs return to service and data repair begins.
cephadmin@ceph-deploy:~/ceph-cluster$ ceph osd unset noout
noout is unset
cephadmin@ceph-deploy:~/ceph-cluster$ ceph osd unset nobackfill
nobackfill is unset
cephadmin@ceph-deploy:~/ceph-cluster$ ceph osd unset norecover
norecover is unset
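The set/unset pairs above can be wrapped into a small maintenance script. A minimal sketch, assuming the standard ceph CLI; the function names and the CEPH override (useful for a dry run, e.g. CEPH="echo ceph") are hypothetical:

```shell
# CEPH is overridable for a dry run, e.g. CEPH="echo ceph".
CEPH="${CEPH:-ceph}"

set_maintenance_flags() {
    # Keep OSDs in, and pause recovery/backfill, before node maintenance.
    for flag in noout norecover nobackfill; do
        $CEPH osd set "$flag" || return 1
    done
}

unset_maintenance_flags() {
    # Clear the flags in reverse order once the node is back.
    for flag in nobackfill norecover noout; do
        $CEPH osd unset "$flag" || return 1
    done
}
```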
Shutdown order for the Ceph cluster
1. Make sure the cluster currently has the noout, nobackfill, and norecover flags set
2. Stop the storage clients so no data is read or written
3. If RGW is in use, stop RGW
4. Stop the CephFS metadata service
5. Stop the Ceph OSDs
6. Stop the Ceph managers
7. Stop the Ceph monitors
Startup order for the Ceph cluster
Start the Ceph monitors
Start the Ceph managers
Start the Ceph OSDs
Start the CephFS metadata service
Start RGW
Start the storage clients
After the services are started, clear noout --> ceph osd unset noout
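On systemd-based installs, the per-role ceph targets (ceph-mon.target, ceph-osd.target, and so on) can drive this ordering on each host. A minimal sketch, assuming the stock Ceph systemd units; the RUN override is a hypothetical dry-run hook, and iterating over the hosts is left to the operator:

```shell
# RUN is overridable for a dry run, e.g. RUN="echo".
RUN="${RUN:-}"

stop_ceph() {
    # Reverse of startup order: rgw, mds, osd, mgr, mon (run on each host).
    for unit in ceph-radosgw ceph-mds ceph-osd ceph-mgr ceph-mon; do
        $RUN systemctl stop "$unit.target"
    done
}

start_ceph() {
    # Startup order: mon first, then mgr, osd, mds, rgw (run on each host).
    for unit in ceph-mon ceph-mgr ceph-osd ceph-mds ceph-radosgw; do
        $RUN systemctl start "$unit.target"
    done
}
```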
Adding a node server
0. Check the current number of OSDs
The cluster currently has 20 OSDs
1. On ceph-deploy, add a hosts entry for the new node node5 and distribute it to all ceph hosts
root@ceph-deploy:~# vim /etc/hosts
172.16.100.40 ceph-deploy.example.local ceph-deploy
172.16.100.31 ceph-node1.example.local ceph-node1
172.16.100.32 ceph-node2.example.local ceph-node2
172.16.100.33 ceph-node3.example.local ceph-node3
172.16.100.34 ceph-node4.example.local ceph-node4
172.16.100.41 ceph-node5.example.local ceph-node5
172.16.100.35 ceph-mon1.example.local ceph-mon1
172.16.100.36 ceph-mon2.example.local ceph-mon2
172.16.100.37 ceph-mon3.example.local ceph-mon3
172.16.100.38 ceph-mgr1.example.local ceph-mgr1
172.16.100.39 ceph-mgr2.example.local ceph-mgr2
2. Add the package repository on node5
# prerequisites for an https mirror repository:
root@ceph-node5:~# apt install -y apt-transport-https ca-certificates curl software-properties-common
# import the release key:
root@ceph-node5:~# wget -q -O- 'https://mirrors.tuna.tsinghua.edu.cn/ceph/keys/release.asc' | sudo apt-key add -
OK
# add the Ceph 16.x (Pacific) repository:
root@ceph-node5:~# apt-add-repository 'deb https://mirrors.tuna.tsinghua.edu.cn/ceph/debian-pacific/ bionic main'
# refresh the apt package index:
root@ceph-node5:~# apt update
3. Install the Python 2 environment on node5
root@ceph-node5:~# apt install python-pip
Configure a pip mirror (the transcript below was captured on ceph-node1; do the same on ceph-node5):
root@ceph-node1:~# mkdir .pip
root@ceph-node1:~# vim .pip/pip.conf
[global]
index-url = https://mirrors.aliyun.com/pypi/simple/
[install]
trusted-host=mirrors.aliyun.com
4. Create the cephadmin user on node5
root@ceph-node5:~# groupadd -r -g 2022 cephadmin && useradd -r -m -s /bin/bash -u 2022 -g 2022 cephadmin && echo cephadmin:123456 | chpasswd
Grant the cephadmin user sudo privileges:
root@ceph-node5:~# echo "cephadmin ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
5. Configure passwordless SSH from the ceph-deploy node to node5
cephadmin@ceph-deploy:~$ ssh-copy-id cephadmin@ceph-node5
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/cephadmin/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
6. Add node5 to the cluster with ceph-deploy
cephadmin@ceph-deploy:~/ceph-cluster$ ceph-deploy install --no-adjust-repos --nogpgcheck ceph-node5
7. Push the admin keyring to node5
cephadmin@ceph-deploy:~/ceph-cluster$ ceph-deploy admin ceph-node5
Verify the admin keyring on node5
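A minimal verification sketch for this step, assuming the default keyring path /etc/ceph/ceph.client.admin.keyring; verify_admin_key and the CEPH override are hypothetical helpers, not ceph-deploy commands:

```shell
# CEPH is overridable for a dry run, e.g. CEPH="echo ceph".
CEPH="${CEPH:-ceph}"

verify_admin_key() {
    # Check the pushed keyring exists, then try reaching the cluster.
    keyring="${1:-/etc/ceph/ceph.client.admin.keyring}"
    [ -f "$keyring" ] || { echo "missing $keyring" >&2; return 1; }
    $CEPH -s
}
```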
8. Install the Ceph runtime environment on node5
The Ceph runtime must be installed on a node before its OSD data disks can be wiped
cephadmin@ceph-deploy:~/ceph-cluster$ ceph-deploy install --release pacific ceph-node5
Remove all leftover unneeded packages:
root@ceph-node5:~# apt autoremove
List all the data disks on node5
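A common way to perform this step is ceph-deploy disk list from the deploy node (a standard ceph-deploy subcommand); the wrapper below is a hypothetical sketch, with DEPLOY overridable for a dry run:

```shell
# DEPLOY is overridable for a dry run, e.g. DEPLOY="echo ceph-deploy".
DEPLOY="${DEPLOY:-ceph-deploy}"

list_node_disks() {
    # Ask ceph-deploy to enumerate the block devices on the given node.
    node="${1:?usage: list_node_disks <node>}"
    $DEPLOY disk list "$node"
}
```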
From: https://www.cnblogs.com/punchlinux/p/17061830.html