Failed to get system container stats
- Checking the logs shows the following error
[root@k8s-master ~]# tail -f /var/log/messages
Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"
- Solution
vi /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
[Service]
CPUAccounting=true ## add CPUAccounting=true to enable systemd CPU accounting
MemoryAccounting=true ## add MemoryAccounting=true to enable systemd memory accounting
systemctl daemon-reload
systemctl restart kubelet
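The same fix can be applied as a standalone systemd drop-in instead of editing 10-kubeadm.conf in place. A minimal sketch, assuming a drop-in named 20-accounting.conf (the name is an assumption); it writes to a scratch directory by default, so on a real node point DROPIN_DIR at /usr/lib/systemd/system/kubelet.service.d and then run daemon-reload and restart as above:

```shell
# Sketch: enable systemd accounting for kubelet via a drop-in file.
# DROPIN_DIR defaults to a scratch directory here for safety; on a real node
# set DROPIN_DIR=/usr/lib/systemd/system/kubelet.service.d before running.
DROPIN_DIR="${DROPIN_DIR:-$(mktemp -d)}"
mkdir -p "$DROPIN_DIR"
cat > "$DROPIN_DIR/20-accounting.conf" <<'EOF'
[Service]
CPUAccounting=true
MemoryAccounting=true
EOF
echo "wrote $DROPIN_DIR/20-accounting.conf"
```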
ComponentStatus (cs) Unhealthy
- The cs status looks like this
[root@k8s-master ~]# kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS      MESSAGE                                                                                       ERROR
controller-manager   Unhealthy   Get "http://127.0.0.1:10252/healthz": dial tcp 127.0.0.1:10252: connect: connection refused
scheduler            Unhealthy   Get "http://127.0.0.1:10251/healthz": dial tcp 127.0.0.1:10251: connect: connection refused
etcd-0               Healthy     {"health":"true"}
- Solution (in both of these manifests, comment out the port=0 line and save; this re-opens ports 10251 and 10252 for the scheduler and controller-manager)
[root@k8s-master ~]# vim /etc/kubernetes/manifests/kube-controller-manager.yaml
[root@k8s-master ~]# vim /etc/kubernetes/manifests/kube-scheduler.yaml
#- --port=0
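The manual edit above can also be scripted with sed. A sketch that runs against a scratch copy by default (the manifest paths above are the real targets on the master; set MANIFEST to one of them to apply it for real):

```shell
# Sketch: comment out "- --port=0" in a static-pod manifest with sed.
# When MANIFEST is unset, a scratch file with sample content is used so the
# snippet is safe to run anywhere.
if [ -z "${MANIFEST:-}" ]; then
  MANIFEST=$(mktemp)
  printf '    - --leader-elect=true\n    - --port=0\n' > "$MANIFEST"  # sample content
fi
# Prefix the flag with '#' while preserving indentation.
sed -i 's/^\( *\)- --port=0/\1#- --port=0/' "$MANIFEST"
cat "$MANIFEST"
```

The kubelet watches /etc/kubernetes/manifests, so after editing the real files the static pods restart on their own.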
[root@k8s-master ~]# kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS    MESSAGE   ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   {"health":"true"}
Installing a CNI plugin on k8s fails with: dial tcp 10.96.0.1:443: i/o timeout
- Check the container logs
[root@k8s-master ~]# kubectl logs -n kube-system calico-kube-controllers-848c5d445f-xpxrn
2023-07-27 20:56:59.010 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get "https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 10.96.0.1:443: i/o timeout
Root cause: my Pod subnet overlapped with the host network (which is why only the one calico Pod kept failing its health check)
- Solution: reinstall the k8s cluster with a changed Pod subnet
# Delete the calico resources
kubectl delete -f calico.yaml
# Reset the cluster and remove leftover files (on every node in the cluster)
# kubeadm reset
# rm -rf /etc/kubernetes
# rm -rf /var/lib/etcd/
# rm -rf $HOME/.kube
# Edit the configuration file
[root@k8s-master ~]# cat init-config.yaml
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
- system:bootstrappers:kubeadm:default-node-token
token: abcdef.0123456789abcdef
ttl: 24h0m0s
usages:
- signing
- authentication
kind: InitConfiguration
localAPIEndpoint:
advertiseAddress: 192.168.110.9 # change to your master node's IP address
bindPort: 6443
nodeRegistration:
criSocket: /var/run/dockershim.sock
name: k8s-master
# taints: # the taint can simply be commented out
# - effect: NoSchedule
# key: node-role.kubernetes.io/master
---
apiServer:
timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
type: CoreDNS
etcd:
local:
dataDir: /var/lib/etcd
#imageRepository: k8s.gcr.io
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers # the default registry is hosted overseas; switch to the Aliyun mirror
kind: ClusterConfiguration
kubernetesVersion: v1.19.0
networking:
dnsDomain: cluster.local
serviceSubnet: 10.96.0.0/12
podSubnet: 10.244.0.0/12 # this must not overlap with the host network
scheduler: {}
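Before re-initializing, it is worth confirming that the chosen podSubnet really does not overlap the host network. A rough pure-bash sketch of an IPv4 CIDR overlap test (the `overlaps` helper is hypothetical, not part of kubeadm):

```shell
# Sketch: IPv4 CIDR overlap check in bash. Two CIDRs overlap iff their network
# addresses agree under the shorter of the two prefix masks.
ip2int() { local IFS=. a b c d; read -r a b c d <<< "$1"; echo $(( (a<<24) | (b<<16) | (c<<8) | d )); }
overlaps() {
  local p1=${1#*/} p2=${2#*/} m mask
  m=$(( p1 < p2 ? p1 : p2 ))                                  # shorter prefix wins
  mask=$(( m == 0 ? 0 : (0xFFFFFFFF << (32 - m)) & 0xFFFFFFFF ))
  [ $(( $(ip2int "${1%/*}") & mask )) -eq $(( $(ip2int "${2%/*}") & mask )) ]
}
# podSubnet vs the host network used in this post:
overlaps 10.244.0.0/12 192.168.110.0/24 && echo overlap || echo ok   # prints "ok"
```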
# Re-initialize the master node
[root@k8s-master ~]# kubeadm init --config=init-config.yaml
[root@k8s-master ~]# kubectl apply -f calico.yaml
- Check Pod status (everything is Running now)
[root@k8s-master ~]# kubectl get po -A -owide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system calico-kube-controllers-848c5d445f-8xk6m 1/1 Running 0 3m40s 10.255.156.65 k8s-node1 <none> <none>
kube-system calico-node-hdtpg 1/1 Running 0 3m40s 192.168.110.11 k8s-node2 <none> <none>
kube-system calico-node-jrldz 1/1 Running 0 3m40s 192.168.110.10 k8s-node1 <none> <none>
kube-system calico-node-xcbtg 1/1 Running 0 3m40s 192.168.110.9 k8s-master <none> <none>
kube-system coredns-6c76c8bb89-dbrmq 1/1 Running 0 14m 10.252.82.193 k8s-master <none> <none>
kube-system coredns-6c76c8bb89-qhs76 1/1 Running 0 14m 10.252.82.194 k8s-master <none> <none>
kube-system etcd-k8s-master 1/1 Running 0 14m 192.168.110.9 k8s-master <none> <none>
kube-system kube-apiserver-k8s-master 1/1 Running 0 14m 192.168.110.9 k8s-master <none> <none>
kube-system kube-controller-manager-k8s-master 1/1 Running 0 14m 192.168.110.9 k8s-master <none> <none>
kube-system kube-proxy-6bgbx 1/1 Running 0 11m 192.168.110.11 k8s-node2 <none> <none>
kube-system kube-proxy-szxlq 1/1 Running 0 11m 192.168.110.10 k8s-node1 <none> <none>
kube-system kube-proxy-wjjrc 1/1 Running 0 14m 192.168.110.9 k8s-master <none> <none>
kube-system kube-scheduler-k8s-master 1/1 Running 0 14m 192.168.110.9 k8s-master <none> <none>
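A listing like the one above can be scanned mechanically for pods stuck outside Running. A small awk sketch (OUT defaults to a built-in sample here; on the master you would feed it saved `kubectl get po -A` output instead):

```shell
# Sketch: print the NAME of every pod whose STATUS column (field 4) is not
# "Running" in a saved "kubectl get po -A" listing.
if [ -z "${OUT:-}" ]; then
  OUT=$(mktemp)   # sample listing for illustration
  printf '%s\n' \
    'NAMESPACE NAME READY STATUS RESTARTS AGE' \
    'kube-system calico-node-xcbtg 1/1 Running 0 3m40s' \
    'kube-system coredns-6c76c8bb89-dbrmq 0/1 CrashLoopBackOff 4 14m' > "$OUT"
fi
not_running=$(awk 'NR>1 && $4 != "Running" {print $2}' "$OUT")
echo "${not_running:-all Running}"
```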
etcd cluster deployment: the service will not start
- Check whether the etcd cluster is healthy
[root@k8s-master1 ~]# etcdctl --cacert=/etc/kubernetes/pki/ca.crt --cert=/etc/etcd/pki/etcd_client.crt --key=/etc/etcd/pki/etcd_client.key --endpoints=https://192.168.110.9:2380,https://192.168.110.10:2380,https://192.168.110.11:2380 endpoint health
{"level":"warn","ts":"2023-08-02T00:31:09.970+0800","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-c05bba48-35ca-4dd8-97ca-ecfd8ffdc87b/192.168.110.11:2380","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 192.168.110.11:2380: connect: connection refused\""}
{"level":"warn","ts":"2023-08-02T00:31:09.970+0800","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-879806a0-0f1f-4196-9c9e-71d6094b0294/192.168.110.9:2380","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 192.168.110.9:2380: connect: connection refused\""}
{"level":"warn","ts":"2023-08-02T00:31:09.971+0800","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-562b2de9-28c9-4505-b29c-642c44b2993f/192.168.110.10:2380","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 192.168.110.10:2380: connect: connection refused\""}
https://192.168.110.11:2380 is unhealthy: failed to commit proposal: context deadline exceeded
https://192.168.110.9:2380 is unhealthy: failed to commit proposal: context deadline exceeded
https://192.168.110.10:2380 is unhealthy: failed to commit proposal: context deadline exceeded
- Check the logs: the peer URL uses http where https is expected, so the configuration file must be wrong
[root@k8s-master1 ~]# journalctl -ex --no-pager
--initial-cluster has etcd1=https://192.168.110.9:2380 but missing from --initial-advertise-peer-urls=http://192.168.110.9:2380 ("http://192.168.110.9:2380"(resolved from "http://192.168.110.9:2380") != "https://192.168.110.9:2380"(resolved from "https://192.168.110.9:2380"))
- Solution: fix the configuration file (on every etcd node)
# vim /etc/etcd/etcd.conf
ETCD_LISTEN_PEER_URLS=https://192.168.110.9:2380
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://192.168.110.9:2380
# Restart the etcd service
systemctl daemon-reload && systemctl restart etcd
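This class of typo can be caught before restarting with a quick scheme-consistency check on the two peer-URL settings. A sketch (CONF defaults to a built-in sample file here; on a real node set CONF=/etc/etcd/etcd.conf):

```shell
# Sketch: verify that ETCD_LISTEN_PEER_URLS and
# ETCD_INITIAL_ADVERTISE_PEER_URLS use the same URL scheme (http vs https).
if [ -z "${CONF:-}" ]; then
  CONF=$(mktemp)   # sample config for illustration
  printf '%s\n' \
    'ETCD_LISTEN_PEER_URLS=https://192.168.110.9:2380' \
    'ETCD_INITIAL_ADVERTISE_PEER_URLS=https://192.168.110.9:2380' > "$CONF"
fi
# Extract everything before the first ':' in each URL value.
listen_scheme=$(sed -n 's/^ETCD_LISTEN_PEER_URLS=\([a-z]*\):.*/\1/p' "$CONF")
adv_scheme=$(sed -n 's/^ETCD_INITIAL_ADVERTISE_PEER_URLS=\([a-z]*\):.*/\1/p' "$CONF")
if [ "$listen_scheme" = "$adv_scheme" ]; then
  echo "schemes match: $listen_scheme"
else
  echo "scheme mismatch: $listen_scheme vs $adv_scheme"
fi
```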
From: https://www.cnblogs.com/AsahiNine/p/17602369.html