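Machine role: host for CloudStack virtual machines; Ceph storage node.
Incident: A physical Ceph storage machine had a memory fault and needed to be shut down for replacement. The only preparation was to migrate the virtual machines off the host and put it into maintenance mode; it was then powered off directly. After the reboot, Ceph was in an abnormal state.
Cause: Because of the abrupt shutdown, the Ceph processes were not stopped cleanly, and their state (pid files and similar information) was not synced to the file system.
Symptoms and attempted fixes:
1) Check the overall OSD status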
[root@haha1 ~]# ceph osd tree
ID WEIGHT    TYPE NAME             UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 101.91998 root default
-2  25.48000     host haha-50
 1   3.64000         osd.1              up  1.00000          1.00000
 2   3.64000         osd.2              up  1.00000          1.00000
 3   3.64000         osd.3              up  1.00000          1.00000
 4   3.64000         osd.4              up  1.00000          1.00000
 5   3.64000         osd.5              up  1.00000          1.00000
 6   3.64000         osd.6              up  1.00000          1.00000
 0   3.64000         osd.0              up  1.00000          1.00000
-3  25.48000     host XKDHhost1-51
 7   3.64000         osd.7              up  1.00000          1.00000
 9   3.64000         osd.9              up  1.00000          1.00000
10   3.64000         osd.10           down        0          1.00000
11   3.64000         osd.11           down        0          1.00000
12   3.64000         osd.12             up  1.00000          1.00000
13   3.64000         osd.13             up  1.00000          1.00000
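With many OSDs it is easy to overlook a down entry in the tree. The following is a minimal sketch, not part of the original troubleshooting session, that filters `ceph osd tree` text for OSDs whose UP/DOWN column reads "down". The function name `list_down_osds` is hypothetical, and the sketch assumes the column layout shown above (OSD name in the third field, state in the fourth).

```shell
#!/bin/sh
# list_down_osds (hypothetical helper): read `ceph osd tree` text on stdin
# and print the name of every OSD whose UP/DOWN column is "down".
list_down_osds() {
    # OSD rows carry the name (osd.N) in field 3 and the state in field 4.
    # The header, root and host rows do not match the osd.N pattern.
    awk '$3 ~ /^osd\.[0-9]+$/ && $4 == "down" { print $3 }'
}
```

On a live cluster it could be used as `ceph osd tree | list_down_osds`; for the output above it would print osd.10 and osd.11.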
2) The OSD is shown as down in the tree, but the init script reports the process as running
[root@haha1 ~]# /etc/init.d/ceph status osd.11
=== osd.11 ===
osd.11: running {"version":"0.94.2"}
3) Restart osd.11 to try to fix it
[root@haha1 ~]# /etc/init.d/ceph restart osd.11
=== osd.11 ===
=== osd.11 ===
Stopping Ceph osd.11 on haha1...kill 7330...kill 7330...done # the old process was killed, so the restart proceeds normally
=== osd.11 ===
create-or-move updated item name 'osd.11' weight 3.64 at location {host=XKDHhost1-51,root=default} to crush map
Starting Ceph osd.11 on haha1...
Running as unit run-35058.service.
4) osd.10 fails to start
[root@haha1 ~]# /etc/init.d/ceph start osd.10
=== osd.10 ===
create-or-move updated item name 'osd.10' weight 3.64 at location {host=haha1,root=default} to crush map
Starting Ceph osd.10 on haha1...
Running as unit run-36525.service.
[root@haha1 ~]# /etc/init.d/ceph status osd.10
=== osd.10 ===
osd.10: not running.
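The start command above prints "Running as unit ..." even though the daemon does not stay up, so the status output is the more reliable signal. A minimal sketch of a check on that output, assuming the status lines look like the ones in this transcript ("osd.N: running {...}" versus "osd.N: not running."); the function name `osd_is_running` is hypothetical.

```shell
#!/bin/sh
# osd_is_running (hypothetical helper): read init-script status text on stdin
# and succeed only if it contains a line such as
#   osd.11: running {"version":"0.94.2"}
# A line such as "osd.10: not running." does not match.
osd_is_running() {
    grep -Eq '^osd\.[0-9]+: running'
}
```

A possible use after a start attempt: `/etc/init.d/ceph status osd.10 | osd_is_running || echo "osd.10 did not stay up"`.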
An example makes this clearer.
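Since the cause given above is leftover state such as pid files, one thing that can be checked is whether an OSD's pid file still points at a live process. The following is only a sketch of that single check, not a full diagnosis. The function name `pid_file_is_stale` is hypothetical, and the pid file path in the usage note is an assumption; the real location depends on the `pid file` setting and the init script in use.

```shell
#!/bin/sh
# pid_file_is_stale (hypothetical helper): succeed (exit 0) if the pid file
# is missing, empty, malformed, or names a PID with no live process.
pid_file_is_stale() {
    pid_file=$1
    [ -s "$pid_file" ] || return 0            # missing or empty file
    pid=$(head -n 1 "$pid_file" | tr -d '[:space:]')
    case $pid in
        ''|*[!0-9]*) return 0 ;;              # not a plain number
    esac
    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    # Run as root, otherwise a live process owned by another user looks dead.
    if kill -0 "$pid" 2>/dev/null; then
        return 1                              # process exists: not stale
    fi
    return 0
}
```

For example, `pid_file_is_stale /var/run/ceph/osd.10.pid && echo "stale pid file"` (path assumed). Note that a live PID does not prove the process is the OSD, because PIDs can be reused after a reboot.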