Background: this k8s cluster is a test cluster.
Failure 1:
Troubleshooting:
Check the status of the kube-apiserver service:
It can be seen that two container runtimes are in play, docker (via cri-dockerd) and containerd, so two CRI sockets are involved: unix:///run/containerd/containerd.sock and unix:///var/run/cri-dockerd.sock.
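Before passing --cri-socket anywhere, it can help to confirm which CRI endpoints actually exist on the node. This is a sanity-check sketch (not from the original post), using the two socket paths mentioned above:

```shell
# Report which of the two CRI endpoints exist on this host.
# -S tests for a Unix domain socket at the given path.
for sock in /run/containerd/containerd.sock /var/run/cri-dockerd.sock; do
  if [ -S "$sock" ]; then
    echo "present: $sock"
  else
    echo "missing: $sock"
  fi
done
```

Whichever socket is present (and matches the runtime the kubelet was configured for) is the one to pass to kubeadm.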
Check the status of the etcd service:
etcd's data files are corrupted, which would normally call for a data restore. Since this is a lab environment with no etcd backups, the only option is to reset the cluster.
Run on every machine:
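The exact command is not preserved at this point in the source. Given the later mention of running reset again, it was presumably a kubeadm reset against the same CRI socket, roughly:

```shell
# Presumed (not preserved in the source): tear down kubeadm state on each node.
# --cri-socket must match the runtime used at init/join time.
kubeadm reset -f --cri-socket unix:///var/run/cri-dockerd.sock
```

Note that reset does not clean up iptables rules or CNI state; kubeadm prints reminders about those after it finishes.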
Then initialize the cluster.
Run on the master node:
[root@master ~]# kubeadm init --ignore-preflight-errors=SystemVerification --cri-socket unix:///var/run/cri-dockerd.sock
I0409 07:02:10.310074   11794 version.go:256] remote version is much newer: v1.29.3; falling back to: stable-1.28
[init] Using Kubernetes version: v1.28.8
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
	[ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables contents are not set to 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
If you hit this error:
[ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables contents are not set to 1
Fix:
# echo "1" > /proc/sys/net/bridge/bridge-nf-call-iptables
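The echo above only lasts until the next reboot. The standard persistent fix from kubeadm's prerequisites (the file names below are conventional, not from the original post) is to load br_netfilter and set the sysctls via a drop-in:

```shell
# Load br_netfilter now and on every boot, so the bridge sysctls exist.
sudo modprobe br_netfilter
echo br_netfilter | sudo tee /etc/modules-load.d/k8s.conf

# Persist the bridge/iptables sysctls across reboots.
cat <<'EOF' | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF

# Apply all sysctl configuration without rebooting.
sudo sysctl --system
```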
When the output above appears, the init succeeded; then configure kubectl on the master and join node1 and node2 to the cluster:
[root@master ~]# mkdir -p $HOME/.kube
[root@master ~]# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
[root@master ~]# sudo chown $(id -u):$(id -g) $HOME/.kube/config
[root@master ~]# kubectl get nodes
NAME     STATUS   ROLES           AGE   VERSION
master   Ready    control-plane   82s   v1.28.2
node1    Ready    <none>          17s   v1.28.2
node2    Ready    <none>          6s    v1.28.2
However, kubectl apply -f kube-flannel.yml kept failing with:
Error registering network: failed to acquire lease: node "master" pod cidr not assigned
The error says no pod CIDR was assigned to the node, so run kubeadm reset again and re-run init with the CIDR flags added:
[root@master ~]# kubeadm init --ignore-preflight-errors=SystemVerification --cri-socket unix:///var/run/cri-dockerd.sock --service-cidr=10.96.0.0/16 --pod-network-cidr=10.244.0.0/16
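After re-initializing, you can confirm the node actually received a pod CIDR before retrying flannel. This is a verification sketch (not from the original post); note that 10.244.0.0/16 must match the Network value inside kube-flannel.yml:

```shell
# Should print a per-node subnet such as 10.244.0.0/24
# once --pod-network-cidr has taken effect.
kubectl get node master -o jsonpath='{.spec.podCIDR}{"\n"}'
```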
Then re-join node1 and node2:
[root@node1 ~]# kubeadm join 192.168.77.100:6443 --token caidkf.z5ygdrotujz09y1z \
>     --discovery-token-ca-cert-hash sha256:a753ca9b794a43912b3bfca5e52a788ca222e672a3630879b585f2eb841fc65e --cri-socket unix:///var/run/cri-dockerd.sock
[root@node2 ~]# kubeadm join 192.168.77.100:6443 --token caidkf.z5ygdrotujz09y1z \
>     --discovery-token-ca-cert-hash sha256:a753ca9b794a43912b3bfca5e52a788ca222e672a3630879b585f2eb841fc65e --cri-socket unix:///var/run/cri-dockerd.sock
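If the original bootstrap token has expired by the time you re-join (tokens default to a 24-hour TTL), a fresh join command can be generated on the master. This is standard kubeadm usage, not something shown in the original post:

```shell
# Prints a ready-to-run "kubeadm join ..." line
# with a new token and the current CA cert hash.
kubeadm token create --print-join-command
# On the nodes, remember to append the CRI socket:
#   ... --cri-socket unix:///var/run/cri-dockerd.sock
```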
Then set up the kubeconfig on the master node:
[root@master ~]# rm -rf /root/.kube/*
[root@master ~]# mkdir -p $HOME/.kube
[root@master ~]# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
[root@master ~]# sudo chown $(id -u):$(id -g) $HOME/.kube/config
The cluster status and pod status are as follows:
[root@master ~]# kubectl get nodes
NAME     STATUS   ROLES           AGE   VERSION
master   Ready    control-plane   59s   v1.28.2
node1    Ready    <none>          22s   v1.28.2
node2    Ready    <none>          27s   v1.28.2
If the following error appears during the steps above:
[root@master ~]# kubectl get nodes
Unable to connect to the server: tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
Fix:
First delete the stale kubeconfig cache, then recreate it:
[root@master ~]# rm -rf /root/.kube/*
[root@master ~]# mkdir -p $HOME/.kube
[root@master ~]# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
[root@master ~]# sudo chown $(id -u):$(id -g) $HOME/.kube/config
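The x509 error happens because the old $HOME/.kube/config still embeds the previous cluster's CA certificate. One way to verify the refreshed kubeconfig matches the new cluster is to compare CAs directly; this is a sketch using kubeadm's conventional PKI paths, not a step from the original post:

```shell
# Compare the CA embedded in the kubeconfig against the cluster's current CA.
# awk extracts the base64 field; diff against the on-disk CA cert.
diff <(awk '/certificate-authority-data/ {print $2}' "$HOME/.kube/config" | base64 -d) \
     /etc/kubernetes/pki/ca.crt \
  && echo "kubeconfig CA matches cluster CA"
```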
References:
https://zhuanlan.zhihu.com/p/646238661
https://blog.csdn.net/qq_40460909/article/details/114707380
https://blog.csdn.net/qq_21127151/article/details/120929170
From: https://www.cnblogs.com/jsonhc/p/18124777