Environment
OS: CentOS 7.9.2009
Cluster layout: three nodes, one master and two workers, Kubernetes v1.21.5, installed with KubeSphere, so effectively a kubeadm-deployed cluster
IPs: 192.168.106.130, 192.168.106.131, 192.168.106.132
Cluster status: the certificates on all three nodes have expired and the whole cluster is down
The cluster was installed in VMs back in 2022; the current date is March 29, 2024.
Error messages
[root@master ~]# kubectl get nodes
The connection to the server 192.168.123.130:6443 was refused - did you specify the right host or port?
Check the system log with less /var/log/messages
Mar 29 11:27:25 master systemd: Started Kubernetes systemd probe.
Mar 29 11:27:25 master kubelet: I0329 11:27:25.394468 5881 server.go:440] "Kubelet version" kubeletVersion="v1.21.5"
Mar 29 11:27:25 master kubelet: I0329 11:27:25.394653 5881 server.go:851] "Client rotation is on, will bootstrap in background"
Mar 29 11:27:25 master kubelet: E0329 11:27:25.395672 5881 bootstrap.go:265] part of the existing bootstrap client certificate in /etc/kubernetes/kubelet.conf is expired: 2023-08-28 17:29:43 +0000 UTC
Mar 29 11:27:25 master kubelet: E0329 11:27:25.395709 5881 server.go:292] "Failed to run kubelet" err="failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory"
Mar 29 11:27:25 master systemd: kubelet.service: main process exited, code=exited, status=1/FAILURE
Mar 29 11:27:25 master systemd: Unit kubelet.service entered failed state.
Mar 29 11:27:25 master systemd: kubelet.service failed.
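The same kubelet errors can also be followed through the systemd journal (assuming journald is in use; this is just an alternative to /var/log/messages):
journalctl -u kubelet -n 50 --no-pager
journalctl -u kubelet -f    # follow live output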
The certificates expired on 2023-08-28.
Confirm the certificate expiry dates with kubeadm certs check-expiration
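If kubeadm is not handy, the expiry of an individual certificate can also be checked directly with openssl (a quick alternative check; the path below is the standard kubeadm location for the API server certificate):
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -enddate
# prints a line such as: notAfter=Aug 28 17:29:43 2023 GMT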
Renewing the certificates
- First renew the certificates so that the kubelet can be brought back up
kubeadm certs renew all
The output is as follows
[renew] Reading configuration from the cluster...
[renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[renew] Error reading configuration from the Cluster. Falling back to default configuration
certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healthcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed
Done renewing certificates. You must restart the kube-apiserver, kube-controller-manager, kube-scheduler and etcd, so that they can use the new certificates.
- The end of the output above notes that you must restart the kube-apiserver, kube-controller-manager, kube-scheduler and etcd. Before doing that, verify that the certificates were actually renewed:
kubeadm certs check-expiration
[root@master ~]# kubeadm certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0329 17:20:26.792993 60459 utils.go:69] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.233.0.10]; the provided value is: [169.254.25.10]
CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Mar 29, 2025 08:39 UTC   364d                                    no
apiserver                  Mar 29, 2025 08:37 UTC   364d            ca                      no
apiserver-kubelet-client   Mar 29, 2025 08:37 UTC   364d            ca                      no
controller-manager.conf    Mar 29, 2025 08:39 UTC   364d                                    no
front-proxy-client         Mar 29, 2025 08:37 UTC   364d            front-proxy-ca          no
scheduler.conf             Mar 29, 2025 08:39 UTC   364d                                    no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Aug 25, 2032 17:29 UTC   8y              no
front-proxy-ca          Aug 25, 2032 17:29 UTC   8y              no
- The certificates have been renewed, but the kubeconfig files used by the kubelet and the other components still embed the old certificates, so the kubelet still cannot start. Delete those kubeconfig files and regenerate them with kubeadm:
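Before deleting them, it may be worth backing the old files up first (a precautionary sketch; the backup directory name is just an example):
mkdir -p /root/kubeconfig-backup
cp -a /etc/kubernetes/*.conf /root/kubeconfig-backup/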
rm -rf /etc/kubernetes/*.conf
kubeadm init phase kubeconfig all
root@master:~# rm -rf /etc/kubernetes/*.conf
root@master:~# kubeadm init phase kubeconfig all
I1212 23:35:49.775848 19629 version.go:255] remote version is much newer: v1.26.0; falling back to: stable-1.22
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
- Then restart the kubelet:
systemctl restart kubelet
[root@master ~]# systemctl restart kubelet
[root@master ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since 五 2024-03-29 17:24:56 CST; 4s ago
     Docs: http://kubernetes.io/docs/
 Main PID: 884 (kubelet)
    Tasks: 13
   Memory: 34.8M
   CGroup: /system.slice/kubelet.service
           └─884 /usr/local/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --conf...
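- As the renew output warned, the kube-apiserver, kube-controller-manager, kube-scheduler and etcd static pods should also be restarted so that they pick up the new certificates. One way to do this, assuming Docker is the container runtime on this v1.21 install (with containerd, crictl would be used instead), is to restart their containers directly:
docker ps | grep -E 'kube-apiserver|kube-controller-manager|kube-scheduler|etcd' | grep -v pause | awk '{print $1}' | xargs -r docker restart
Alternatively, the static pod manifests can be temporarily moved out of /etc/kubernetes/manifests and moved back after a short pause, letting the kubelet recreate the pods.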
- Update the admin kubeconfig: copy the newly generated admin.conf over the config file under ~/.kube
cp /etc/kubernetes/admin.conf ~/.kube/config
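If ~/.kube does not exist yet, or kubectl is used by a non-root user, the standard kubeadm post-init steps apply (a small sketch):
mkdir -p $HOME/.kube
cp -f /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config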
- Check the node status
kubectl get nodes
At this point the master node is fully recovered; next come the worker nodes.
---Worker nodes
The approach: since the whole cluster was built with kubeadm and etcd runs as static pods on the master node, the worker nodes only need to rejoin the cluster once the master is recovered and etcd is confirmed healthy, as checked below.
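A quick way to confirm that etcd and the other control-plane pods are healthy before rejoining the workers (run on the master; the /readyz endpoint is available in this Kubernetes version):
kubectl -n kube-system get pods -o wide | grep -E 'etcd|kube-apiserver|kube-controller-manager|kube-scheduler'
kubectl get --raw='/readyz?verbose'    # aggregate API server health, including etcd checks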
- Delete the worker nodes
kubectl delete nodes <worker-node-name>
root@master:~# kubectl delete nodes worker1
node "worker1" deleted
root@master:~# kubectl delete nodes worker2
node "worker2" deleted
- Generate the join command
kubeadm token create --print-join-command
root@master:~# kubeadm token create --print-join-command
kubeadm join 192.168.106.130:6443 --token 6vzr7y.mtrs8arvtt6xo6lg --discovery-token-ca-cert-hash sha256:3c816d1b3c2c8a54087876a31e2936b6b5cc247c0328feb12098e939cfea7467
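Note that a token created this way is valid for 24 hours by default; kubeadm token list shows the existing tokens and their TTLs, and --ttl 0 creates a non-expiring token if one is really needed (use with care).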
- Reset the worker nodes
kubeadm reset -f
(run on the 131 and 132 nodes)
[root@worker1 ~]# kubeadm reset -f
[preflight] Running pre-flight checks
W0329 16:49:24.099865 23892 removeetcdmember.go:79] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] No etcd config found. Assuming external etcd
[reset] Please, manually reset etcd to prevent further issues
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]
The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d
The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.
If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.
The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
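- Following the hints in the reset output above, the leftover CNI configuration, iptables/IPVS rules, and old kubeconfig can be cleaned up before rejoining (optional, but it avoids stale state; a sketch based on those messages):
rm -rf /etc/cni/net.d
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
ipvsadm --clear    # only if kube-proxy runs in IPVS mode
rm -rf $HOME/.kube/config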
- Rejoin the cluster
kubeadm join 192.168.106.130:6443 --token 6vzr7y.mtrs8arvtt6xo6lg --discovery-token-ca-cert-hash sha256:3c816d1b3c2c8a54087876a31e2936b6b5cc247c0328feb12098e939cfea7467
[root@worker1 ~]# kubeadm join 192.168.106.130:6443 --token 6vzr7y.mtrs8arvtt6xo6lg --discovery-token-ca-cert-hash sha256:3c816d1b3c2c8a54087876a31e2936b6b5cc247c0328feb12098e939cfea7467
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0329 16:51:16.036841 23959 utils.go:69] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.233.0.10]; the provided value is: [169.254.25.10]
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
- Now back on the master node, check the node status; the cluster is back to normal.
kubectl get nodes
[root@master ~]# kubectl get nodes
NAME      STATUS   ROLES                  AGE     VERSION
master    Ready    control-plane,master   578d    v1.21.5
worker1   Ready    <none>                 3m27s   v1.21.5
worker2   Ready    <none>                 3m21s   v1.21.5
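As a final check, it may be worth confirming that the system pods (CNI, kube-proxy, CoreDNS, and so on) come back to Running on the rejoined workers:
kubectl get pods -n kube-system -o wide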
From: https://www.cnblogs.com/wszzn/p/18141114