Background:
The course code is all written for the amd64 architecture, which means my main arm64 machine cannot run the labs, so I had to do them on an x64 machine. The first step was to set up a K8S environment (setup steps omitted here). After I ran kubeadm init, the image pulls kept failing. I had left the image repository pointed at the official K8S registry (a note on why I did not use a domestic mirror: domestic mirror registries update too slowly, and some images simply cannot be found on them). I then enabled global proxy mode on my host machine, but the K8S cluster still could not pull images and kept reporting TimeOut.
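In hindsight the timeout makes sense: a "global" proxy toggled on the host only affects processes that inherit the shell's proxy variables, while the pulls are performed by the containerd daemon, which systemd started without them. A quick way to see the mismatch (a sketch, assuming the default registry.k8s.io registry and the proxy address used later in this post):

# The interactive shell sees the proxy and can reach the registry:
export HTTPS_PROXY=http://172.164.17.103:9999
curl -I https://registry.k8s.io

# But the actual pull is done by the containerd daemon, which has no proxy
# in its environment, so it still times out:
crictl pull registry.k8s.io/pause:3.9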
So I put the proxy configuration into the containerd.service unit file, as shown below:
root@Y76-Master01-16-181:~# cat /usr/lib/systemd/system/containerd.service
# Copyright The containerd Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target local-fs.target

[Service]
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/bin/containerd
Type=notify
Delegate=yes
KillMode=process
Restart=always
RestartSec=5

# Add the following three lines
Environment="HTTPS_PROXY=http://172.164.17.103:9999"
Environment="HTTP_PROXY=http://172.164.17.103:9999"
Environment="ALL_PROXY=socks5://172.164.17.103:9999"

# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity
LimitNOFILE=infinity
# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity
OOMScoreAdjust=-999

[Install]
WantedBy=multi-user.target
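For the new Environment lines to take effect, systemd must reload the unit and containerd must be restarted; whether the daemon really picked them up can then be checked with standard systemctl commands:

systemctl daemon-reload
systemctl restart containerd.service
# Should print the three proxy variables configured above:
systemctl show containerd.service --property=Environment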
After adding the proxy and running kubeadm init again, the K8S images were pulled down successfully, which can be verified with crictl:
root@Y76-Master01-16-181:~# crictl -r unix:///var/run/containerd/containerd.sock images
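As a cross-check, kubeadm can print the exact images this release expects, to compare against the crictl listing above:

# List the images kubeadm will pull for this version:
kubeadm config images list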
At this point all cluster components were healthy and in the Running state, so I went on to deploy calico. Every calico-node Pod reached Running, but the calico-kube-controllers Pod alone stayed stuck in creation, so I inspected it:
root@Y76-Master01-16-181:~# kubectl describe pod -n kube-system calico-kube-controllers-9449c44c5-v8ssv
Normal   Scheduled               72s               default-scheduler  Successfully assigned kube-system/calico-kube-controllers-57b57c56f-wz4wm to y76-node01-16-182
Warning  FailedCreatePodSandBox  52s               kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "c0805304ad1009d138d00cad8b5a4d9ddfdd27b8d6a8a886d4df4690cace4452": plugin type="calico" failed (add): error getting ClusterInformation: Get "https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": net/http: TLS handshake timeout
Normal   SandboxChanged          5s (x3 over 51s)  kubelet            Pod sandbox changed, it will be killed and re-created.
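The failing address is the key clue: 10.96.0.1 is the ClusterIP of the kubernetes Service, i.e. the in-cluster apiserver VIP. The calico CNI plugin is executed by containerd and inherits its environment, proxy variables included, so its apiserver request was being sent to the external proxy, which has no route to a cluster-internal VIP. A quick way to confirm the inheritance (a sketch using /proc):

# Show the proxy variables in the running containerd daemon's environment;
# every CNI plugin it execs inherits them:
tr '\0' '\n' < /proc/$(pidof containerd)/environ | grep -i proxy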
From here I went through a whole series of checks but could not fix the calico-kube-controllers problem; even brand-new Pods I created failed with exactly the same error as above. While I was still puzzling over it, I rolled all the VMs back to their earlier snapshots, filled in a domestic mirror registry address instead, ran kubeadm init again, and this time every component's Pod came up Running.
Reflection:
The only difference was the registry address, yet the outcomes were opposite; that should not happen. It reminded me of my earlier proxy change (the proxy configured in containerd.service). So I rolled the VMs back to the snapshot once more, pointed kubeadm at the official K8S registry again, hit the same problem during kubeadm init, and then changed the containerd.service configuration to the following:
root@Y76-Master01-16-181:~# cat /usr/lib/systemd/system/containerd.service
# Copyright The containerd Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target local-fs.target

[Service]
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/bin/containerd
Type=notify
Delegate=yes
KillMode=process
Restart=always
RestartSec=5

# Changed to the following
Environment="HTTPS_PROXY=http://172.164.17.103:9999"
Environment="HTTP_PROXY=http://172.164.17.103:9999"
Environment="NO_PROXY=localhost,127.0.0.1,172.16.0.0/12,10.96.0.0/12,10.244.0.0/16"

# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity
LimitNOFILE=infinity
# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity
OOMScoreAdjust=-999

[Install]
WantedBy=multi-user.target
I then immediately synced the configuration to the other nodes and restarted the containerd service:
root@Y76-Master01-16-181:~# ansible all -m copy -a "src=/usr/lib/systemd/system/containerd.service dest=/usr/lib/systemd/system/containerd.service"
root@Y76-Master01-16-181:~# ansible all -m shell -a "systemctl daemon-reload && systemctl restart containerd.service"
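A final sanity check across all nodes (same ansible pattern as above) confirms that every containerd daemon now carries the NO_PROXY setting:

# Each node should list HTTPS_PROXY, HTTP_PROXY, and NO_PROXY:
ansible all -m shell -a "systemctl show containerd.service --property=Environment"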
Sure enough, all Pods came up healthy:
root@Y76-Master01-16-181:~# kubectl get pod -n kube-system
NAME                                          READY   STATUS    RESTARTS       AGE
calico-kube-controllers-9449c44c5-v8ssv       1/1     Running   0              92m
calico-node-97qbc                             1/1     Running   3 (38m ago)    6h1m
calico-node-bl59h                             1/1     Running   2 (178m ago)   6h1m
calico-node-rzzq7                             1/1     Running   2 (178m ago)   6h1m
coredns-567c556887-8knp9                      1/1     Running   3 (51m ago)    8h
coredns-567c556887-dwg6d                      1/1     Running   2 (178m ago)   8h
etcd-y76-master01-16-181                      1/1     Running   3 (178m ago)   8h
kube-apiserver-y76-master01-16-181            1/1     Running   3 (178m ago)   8h
kube-controller-manager-y76-master01-16-181   1/1     Running   6 (46m ago)    5h46m
kube-proxy-88nd6                              1/1     Running   2 (178m ago)   5h47m
kube-proxy-vrgtp                              1/1     Running   2 (178m ago)   5h47m
kube-proxy-z5jmc                              1/1     Running   2 (178m ago)   5h47m
kube-scheduler-y76-master01-16-181            1/1     Running   6 (46m ago)    8h
Summary:
When troubleshooting, think back over every operation you performed, and when chasing a problem, dig into what side effects each of those operations could have. In this case I had handed the proxy role to my host: any process that inherited the proxy variables would forward its traffic to the host-side proxy and communicate through it. But the peers those requests needed to reach live in the cluster's own virtual networks that I had defined, the service CIDR 10.96.0.0/12 (the failing 10.96.0.1 is exactly the apiserver's Service VIP) and the pod CIDR 10.244.0.0/16; an external proxy has no route to those addresses, so going through the host could never find the other end.
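The effect is easy to reproduce by hand from a cluster node (a sketch; note that curl only honors CIDR notation in NO_PROXY in recent versions, roughly 7.86 and later, while Go programs such as containerd and the calico plugin have supported it far longer):

# With only the proxy set, the request to the service VIP is handed to the
# external proxy and hangs until it times out:
HTTPS_PROXY=http://172.164.17.103:9999 curl -sk --max-time 5 https://10.96.0.1:443/version; echo "exit=$?"

# With the service CIDR excluded, the request goes out directly and reaches
# the apiserver:
HTTPS_PROXY=http://172.164.17.103:9999 NO_PROXY=10.96.0.0/12 curl -sk --max-time 5 https://10.96.0.1:443/version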