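K8S Cluster Fails to Start
1. Symptoms
The trigger: a power outage on the second day of the Lunar New Year.
On my first day back at work I started restarting all the services and found that the k8s cluster would not come up: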
[root@test ~]# kubectl get nodes
The connection to the server 10.0.7.16:6443 was refused - did you specify the right host or port?
2. Troubleshooting
###Check the kubelet status
[root@test ~]# systemctl status kubelet.service
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Mon 2024-02-12 08:47:31 CST; 5 days ago
Docs: https://kubernetes.io/docs/
Main PID: 980 (kubelet)
CGroup: /system.slice/kubelet.service
└─980 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/con...
###Check the kubelet logs
Feb 17 11:40:30 test kubelet[980]: E0217 11:40:30.760521 980 kubelet.go:2291] "Error getting node" err="node \"test\" not found"
Feb 17 11:40:30 test kubelet[980]: E0217 11:40:30.861049 980 kubelet.go:2291] "Error getting node" err="node \"test\" not found"
Feb 17 11:40:30 test kubelet[980]: E0217 11:40:30.961809 980 kubelet.go:2291] "Error getting node" err="node \"test\" not found"
Feb 17 11:40:31 test kubelet[980]: E0217 11:40:31.062716 980 kubelet.go:2291] "Error getting node" err="node \"test\" not found"
Feb 17 11:40:31 test kubelet[980]: E0217 11:40:31.163402 980 kubelet.go:2291] "Error getting node" err="node \"test\" not found"
Feb 17 11:40:31 test kubelet[980]: E0217 11:40:31.264104 980 kubelet.go:2291] "Error getting node" err="node \"test\" not found"
Feb 17 11:40:31 test kubelet[980]: E0217 11:40:31.364707 980 kubelet.go:2291] "Error getting node" err="node \"test\" not found"
Feb 17 11:40:31 test kubelet[980]: E0217 11:40:31.465786 980 kubelet.go:2291] "Error getting node" err="node \"test\" not found"
Feb 17 11:40:31 test kubelet[980]: E0217 11:40:31.566598 980 kubelet.go:2291] "Error getting node" err="node \"test\" not found"
Feb 17 11:40:31 test kubelet[980]: E0217 11:40:31.667122 980 kubelet.go:2291] "Error getting node" err="node \"test\" not found"
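The post does not show which command produced these lines; on a systemd host they would normally come from `journalctl -u kubelet`. The repeated "node not found" messages are only a symptom (kubelet cannot reach the API server to look up its own Node object), so filtering them out makes the real error easier to spot. The x509 error quoted further down was logged with PID 1, which suggests it came from a container (the API server) rather than from kubelet itself. The sketch below covers both sources; it assumes Docker as the container runtime (with containerd, `crictl ps -a` and `crictl logs` play the same role), and the function names are hypothetical.

```shell
# Hypothetical helpers: define them in your shell, then call them.

# Recent kubelet messages without the repetitive "Error getting node" noise.
kubelet_errors() {
    local since="${1:-1 hour ago}"
    journalctl -u kubelet --since "$since" --no-pager \
        | grep -v 'Error getting node'
}

# x509-related lines from the most recent kube-apiserver container.
# Assumes Docker: kubelet names the container k8s_kube-apiserver_...
apiserver_x509_errors() {
    local id
    id="$(docker ps -a --filter 'name=k8s_kube-apiserver' --format '{{.ID}}' | head -n 1)"
    if [ -z "$id" ]; then
        echo "no kube-apiserver container found" >&2
        return 1
    fi
    # Control-plane components log to stderr, hence 2>&1.
    docker logs --tail 200 "$id" 2>&1 | grep -i 'x509'
}
```

Usage: run `kubelet_errors` first; if it only shows connection errors, `apiserver_x509_errors` looks one level deeper at why the API server itself is not serving.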
###Check the firewall
[root@test ~]# systemctl status firewalld.service
● firewalld.service - firewalld - dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
Active: inactive (dead)
Docs: man:firewalld(1)
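###Eventually the following error turned up in the logs: a certificate had expired
The cluster had been set up by a former colleague who left no documentation, so nobody knew the certificates were about to expire. The power outage then forced a restart, and the cluster failed to start.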
W0217 05:51:33.036279 1 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379 0 }. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate has expired or is not yet valid: current time 2024-02-17T05:51:33Z is after 2024-01-12T09:25:13Z". Reconnecting...
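The error says the certificate stopped being valid at 2024-01-12T09:25:13Z, roughly five weeks before the restart (kubeadm issues its leaf certificates with a one-year validity by default). Because the failing connection is the API server's gRPC link to etcd on 127.0.0.1:2379, the etcd certificates under `pki/etcd` and `apiserver-etcd-client.crt` are the first candidates. On a kubeadm cluster, `kubeadm certs check-expiration` (`kubeadm alpha certs check-expiration` before 1.20) lists all of them. The sketch below does a similar check with plain `openssl`, assuming the default PKI directory `/etc/kubernetes/pki` and GNU `date`; the helper names are hypothetical.

```shell
# Days until a certificate expires (negative = already expired).
# Optional 2nd argument: "now" as a Unix timestamp (defaults to the current time).
cert_days_left() {
    local crt="$1" now="${2:-$(date +%s)}" end
    # openssl prints e.g. "notAfter=Jan 12 09:25:13 2024 GMT"
    end="$(openssl x509 -enddate -noout -in "$crt" | cut -d= -f2)"
    # Integer division truncates toward zero, so this counts full days.
    echo $(( ( $(date -d "$end" +%s) - now ) / 86400 ))
}

# Print the remaining days for every certificate in the kubeadm PKI tree.
check_pki() {
    local dir="${1:-/etc/kubernetes/pki}" crt
    for crt in "$dir"/*.crt "$dir"/etcd/*.crt; do
        [ -f "$crt" ] || continue
        printf '%-50s %s days\n' "$crt" "$(cert_days_left "$crt")"
    done
}
```

For a certificate ending at 2024-01-12T09:25:13Z, `cert_days_left` evaluated at the log's timestamp (2024-02-17T05:51:33Z) gives -35, i.e. it had already been expired for 35 full days when the cluster was restarted.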
###Finally, renew the certificates to extend their validity, and record the procedure in the ops documentation
https://blog.csdn.net/gotheon/article/details/133700695
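The linked article is not reproduced here. On a kubeadm cluster the renewal generally comes down to `kubeadm certs renew all`, restarting the control-plane static Pods so they load the new files, and refreshing the kubeconfig used by `kubectl`. A minimal sketch, assuming kubeadm 1.20 or later, root on the control-plane node, and the default `/etc/kubernetes` layout; the function name and the 20-second wait are illustrative choices, not necessarily the exact steps of the linked article.

```shell
# Hypothetical helper: renew kubeadm certificates on a control-plane node.
# Assumes kubeadm >= 1.20 (older releases: "kubeadm alpha certs renew all"),
# root privileges, and the default kubeadm layout under /etc/kubernetes.
renew_control_plane_certs() {
    local k8s_dir="${1:-/etc/kubernetes}" parked

    # Keep a copy of the old certificates and kubeconfigs in case of trouble.
    cp -a "$k8s_dir" "$k8s_dir.bak.$(date +%Y%m%d%H%M%S)" || return 1

    # Re-issue the kubeadm-managed certificates, including the client certs
    # embedded in admin.conf, controller-manager.conf and scheduler.conf.
    kubeadm certs renew all || return 1

    # Static Pods (apiserver, controller-manager, scheduler, etcd) only load the
    # new files after a restart: move the manifests away so kubelet stops the
    # Pods, wait, then put them back so kubelet starts them again.
    parked="$(mktemp -d)"
    mv "$k8s_dir"/manifests/*.yaml "$parked"/ || return 1
    sleep 20
    mv "$parked"/*.yaml "$k8s_dir"/manifests/ || return 1
    rmdir "$parked"

    # Restart kubelet as well so it re-establishes its connection to the API server.
    systemctl restart kubelet

    # admin.conf now carries a renewed client certificate; refresh kubectl's copy.
    mkdir -p "$HOME/.kube"
    cp "$k8s_dir/admin.conf" "$HOME/.kube/config"
}
```

Afterwards, `kubectl get nodes` should reach the API server again; re-running the expiration check confirms the new end dates, which are worth writing into the ops documentation together with a reminder before the next expiry.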
From: https://www.cnblogs.com/world-of-yuan/p/18027174