OSD cannot rejoin the cluster after being automatically marked out
Enterprise Cloud Platform Product Center shared knowledge base
Exported on 03/08/2021
Related download: OSD自然OUT之后无法再加入集群.pdf
https://iwiki.woa.com/download/attachments/119527381/OSD%E8%87%AA%E7%84%B6OUT%E4%B9%8B%E5%90%8E%E6%97%A0%E6%B3%95%E5%86%8D%E5%8A%A0%E5%85%A5%E9%9B%86%E7%BE%A4.pdf?api=v2&modificationDate=1586334532000&version=1
Problem description
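(Throughout this note, osd.1 stands for the OSD that fails to start in your environment; substitute its actual ID.)
After an OSD server has been powered off for a long time, its OSDs are automatically marked out. When the server comes back and the OSD tries to rejoin, the OSD log
/var/log/ceph/ceph-osd.1.log reports the following error: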
2020-04-03 10:36:33.740785 7fcf4dab4d80 0 osd.1 74 crush map has features 288514051259236352, adjusting msgr requires for clients
2020-04-03 10:36:33.740792 7fcf4dab4d80 0 osd.1 74 crush map has features 288514051259236352 was 8705, adjusting msgr requires for mons
2020-04-03 10:36:33.740796 7fcf4dab4d80 0 osd.1 74 crush map has features 1009089991638532096, adjusting msgr requires for osds
2020-04-03 10:36:33.899665 7fcf4dab4d80 0 osd.1 74 load_pgs
2020-04-03 10:36:37.564629 7fcf4dab4d80 0 osd.1 74 load_pgs opened 6001 pgs
2020-04-03 10:36:37.566801 7fcf4dab4d80 0 osd.1 74 using weightedpriority op queue with priority op cut off at 64.
2020-04-03 10:36:37.568280 7fcf4dab4d80 -1 osd.1 74 log_to_monitors {default=true}
2020-04-03 10:36:39.973545 7fcf4dab4d80 -1 osd.1 74 init authentication failed: (22) Invalid argument
The key message is: init authentication failed: (22) Invalid argument
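To check whether other OSD logs show the same failure, a small helper like the one below can be used. It is a minimal sketch: the function name find_auth_failure is hypothetical, and it assumes the default log location /var/log/ceph/ceph-osd.<id>.log and that the error appears on a single log line, as it does in the real log.

```shell
#!/bin/sh
# Hypothetical helper: search an OSD log for the cephx init failure shown above.
# Prints matching lines prefixed with their line numbers.
# Exit status: 0 if the message was found, 1 if not (2 if the file is unreadable).
find_auth_failure() {
    log_file="$1"
    # -F: treat the message as a fixed string, so "(22)" needs no escaping
    grep -nF 'init authentication failed: (22) Invalid argument' "$log_file"
}

# Usage (osd.1 is the example ID used in this note):
# find_auth_failure /var/log/ceph/ceph-osd.1.log && echo "osd.1 hit the auth failure"
```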
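Raising the monitor's auth debug level to at least 10 (the command below sets 20/20) reveals the following messages. Run the command on the monitor host, since ceph daemon talks to the local admin socket: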
ceph daemon mon.$HOSTNAME config set debug_auth 20/20
Monitor authentication error log:
2020-04-04 12:22:08.233215 7f4322b4e700 10 In get_auth_session_handler for protocol 0
2020-04-04 12:22:08.235600 7f4326b56700 10 cephx server osd.1: start_session server_challenge 1d0a94ecfe0b5ec9
2020-04-04 12:22:08.239489 7f4326b56700 10 cephx server osd.1: handle_request get_auth_session_key for osd.1
2020-04-04 12:22:08.239532 7f4326b56700 0 mon.openstack-con01@0(leader).auth v210 caught error when trying to handle auth request, probably malformed request
Root cause
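Debugging with gdb (the gdb session itself is not covered here) showed that the OSD keyring was configured incorrectly. The keyring line in the [global] section of /etc/ceph/ceph.conf applies to every daemon, so osd.1 loads /etc/ceph/ceph.client.admin.keyring instead of its own keyring (by default /var/lib/ceph/osd/ceph-1/keyring), and the monitor rejects its authentication request.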
Solution
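On the Ceph cluster servers, comment out the keyring line in the [global] section of /etc/ceph/ceph.conf so that it reads as follows, then restart the OSD:
#keyring=/etc/ceph/ceph.client.admin.keyring
systemctl restart ceph-osd@1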
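On servers with many ceph.conf copies, the edit can be scripted. The following is a minimal sketch, not part of the original procedure: the function name comment_global_keyring is hypothetical, and it assumes a standard INI-style ceph.conf where each section starts with a [name] header.

```shell
#!/bin/sh
# Hypothetical helper: comment out any "keyring = ..." line inside the [global]
# section of a ceph.conf, leaving keyring lines in other sections untouched.
# A backup of the original file is written to <file>.bak.
comment_global_keyring() {
    conf="$1"
    # The address range runs from the [global] header to the next section
    # header; only keyring lines inside that range get a leading '#'.
    sed -i.bak '/^\[global\]/,/^\[/ s/^\([[:space:]]*keyring[[:space:]]*=\)/#\1/' "$conf"
}

# Usage:
# comment_global_keyring /etc/ceph/ceph.conf && systemctl restart ceph-osd@1
```

Scoping the substitution to [global] matters because a keyring line under a client-specific section such as [client.admin] only affects that client. With the [global] override removed, client.admin should still find /etc/ceph/ceph.client.admin.keyring through Ceph's default keyring search path, which includes /etc/ceph/$cluster.$name.keyring.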
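ceph osd tree | grep -w 'osd\.1'
# Example output. The field after "up" is the REWEIGHT value, which shows the in/out state: 1.00000 means in; 0 or empty means out.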
1 hdd 2.49750 osd.1 up 1.00000 1.00000
Wait a while, then run ceph osd tree again. If the OSD is up and in, nothing else needs to be done. If it is up but not in, mark it in:
ceph osd in osd.1
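The up/in check above can be expressed as a small filter over ceph osd tree output. This is a sketch under assumptions: the function name osd_tree_state is hypothetical, and it assumes, as in the sample line, that the REWEIGHT column directly follows the STATUS column after the OSD name.

```shell
#!/bin/sh
# Hypothetical helper: read `ceph osd tree` output on stdin and print the
# status and in/out state of one OSD, e.g. "up in" or "up out".
# A REWEIGHT of 0, or a missing REWEIGHT field, is reported as "out".
# Exit status is 1 if the OSD name does not appear in the input.
osd_tree_state() {
    awk -v name="$1" '
        {
            for (i = 1; i < NF; i++) {
                if ($i == name) {            # exact match, so osd.1 != osd.10
                    status = $(i + 1)
                    inout = ($(i + 2) + 0 > 0) ? "in" : "out"
                    print status, inout
                    found = 1
                    exit
                }
            }
        }
        END { if (!found) exit 1 }'
}

# Usage: mark the OSD in only when it is up but out.
# if [ "$(ceph osd tree | osd_tree_state osd.1)" = "up out" ]; then
#     ceph osd in osd.1
# fi
```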
Verification
systemctl status ceph-osd@1    # the service should be active (running)
ceph osd tree
# the OSD should be shown as up and in
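Because the OSD may take a while to come up after the restart, the verification can be wrapped in a polling loop. This is a minimal sketch: the function name wait_for_osd and its default attempt count and interval are hypothetical, and it assumes that ceph osd dump prints one line per OSD starting with "osd.<id> up in" for a healthy OSD (the spacing between those fields varies, so the pattern allows any whitespace).

```shell
#!/bin/sh
# Hypothetical helper: poll until the OSD service is active and the OSD is
# reported as up and in, or give up after a number of attempts.
# Arguments: OSD id, number of checks (default 30), seconds between checks (default 10).
wait_for_osd() {
    id="$1"
    attempts="${2:-30}"
    interval="${3:-10}"
    n=0
    while [ "$n" -lt "$attempts" ]; do
        # systemctl is-active --quiet returns 0 only when the unit is active
        if systemctl is-active --quiet "ceph-osd@$id" &&
           ceph osd dump | grep -Eq "^osd\.$id[[:space:]]+up[[:space:]]+in[[:space:]]"; then
            echo "osd.$id is active, up and in"
            return 0
        fi
        n=$((n + 1))
        sleep "$interval"
    done
    echo "osd.$id is not up/in after $attempts checks" >&2
    return 1
}

# Usage: wait_for_osd 1 30 10
```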
From: https://www.cnblogs.com/xuning-xuning/p/17351965.html