
Kubernetes Troubleshooting


Unable to get container statistics (Failed to get system container stats)

  • The log shows the following error
[root@k8s-master ~]# tail -f /var/log/messages
Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"
  • Solution (a quick verification follows the snippet)
vi /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
[Service]
# add CPUAccounting=true to enable systemd CPU accounting
CPUAccounting=true
# add MemoryAccounting=true to enable systemd memory accounting
MemoryAccounting=true

systemctl daemon-reload
systemctl restart kubelet
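
To confirm the change took effect, you can query the unit's accounting properties (a quick check, assuming a systemd version that exposes these properties):

systemctl show kubelet --property=CPUAccounting,MemoryAccounting
# expected once the drop-in is active:
# CPUAccounting=yes
# MemoryAccounting=yes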

ComponentStatus (cs) Unhealthy

  • The cs status is as follows
[root@k8s-master ~]# kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS      MESSAGE                                                                                       ERROR
controller-manager   Unhealthy   Get "http://127.0.0.1:10252/healthz": dial tcp 127.0.0.1:10252: connect: connection refused
scheduler            Unhealthy   Get "http://127.0.0.1:10251/healthz": dial tcp 127.0.0.1:10251: connect: connection refused
etcd-0               Healthy     {"health":"true"}
  • Solution: edit these two manifest files, comment out the --port=0 line, and save; this re-enables port 10251 for the scheduler and 10252 for the controller-manager (a scripted variant follows the snippet)
[root@k8s-master ~]# vim /etc/kubernetes/manifests/kube-controller-manager.yaml
[root@k8s-master ~]# vim /etc/kubernetes/manifests/kube-scheduler.yaml
#- --port=0
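
If you prefer to script the edit, a minimal sed sketch (assuming the flag appears exactly as `- --port=0` in both manifests; the kubelet watches the manifest directory and restarts the static pods on its own):

# comment out --port=0 in both static-pod manifests (GNU sed assumed)
sed -i 's/^\([[:space:]]*\)- --port=0/\1#- --port=0/' \
    /etc/kubernetes/manifests/kube-scheduler.yaml \
    /etc/kubernetes/manifests/kube-controller-manager.yaml

# verify the ports are listening again
ss -lntp | grep -E '1025[12]'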

[root@k8s-master ~]# kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS    MESSAGE             ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   {"health":"true"}

Installing a CNI plugin on k8s fails: dial tcp 10.96.0.1:443: i/o timeout

  • Check the container logs
[root@k8s-master ~]# kubectl logs -n kube-system calico-kube-controllers-848c5d445f-xpxrn
2023-07-27 20:56:59.010 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get "https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 10.96.0.1:443: i/o timeout

Root cause: my Pod subnet overlapped with the host network (the clue was that only a single calico Pod failed its health check).

  • Solution: reinstall the k8s cluster with a changed Pod subnet
# Delete the calico resources
kubectl delete -f calico.yaml

# Reset the cluster and remove leftover files (on all cluster nodes; see the extra CNI cleanup note below)
# kubeadm reset
# rm -rf /etc/kubernetes
# rm -rf /var/lib/etcd/
# rm -rf $HOME/.kube
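
Note that kubeadm reset does not clean CNI configuration on disk; clearing it as well keeps the old Pod subnet from lingering after the rebuild (an extra step not in the original notes; the calico paths are assumptions and may vary by version):

# remove leftover CNI config and calico state (all nodes)
# rm -rf /etc/cni/net.d
# rm -rf /var/lib/calico /var/run/calico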

# Edit the config file
[root@k8s-master ~]# cat init-config.yaml
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.110.9      # change to your master node's IP address
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: k8s-master                          
#  taints:                 # the taints can simply be commented out
#  - effect: NoSchedule
#    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
#imageRepository: k8s.gcr.io
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers   # the default registry is hosted abroad; switch to the Aliyun mirror
kind: ClusterConfiguration
kubernetesVersion: v1.19.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/12                                           # this must not overlap with the host network
scheduler: {}
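
Before re-initializing, it is worth confirming the planned podSubnet really is clear of the host addressing (a quick check, not in the original notes; also note that 10.244.0.0/12 is not aligned to a /12 boundary, and 10.244.0.0/16 is the usual choice):

# compare host addresses and routes against the planned podSubnet
ip -4 addr show
ip route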

# Re-initialize the master node
[root@k8s-master ~]# kubeadm init --config=init-config.yaml
[root@k8s-master ~]# kubectl apply -f calico.yaml
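One caveat: many calico.yaml versions pin the pool via the CALICO_IPV4POOL_CIDR env var (often commented out, defaulting to 192.168.0.0/16); if yours sets it, it has to match the podSubnet above. A quick check, assuming the stock manifest layout:

grep -B1 -A1 CALICO_IPV4POOL_CIDR calico.yaml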
  • Check the Pod status (everything is Running; a DNS sanity check follows the listing)
[root@k8s-master ~]# kubectl get po -A -owide   
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE     IP               NODE         NOMINATED NODE   READINESS GATES
kube-system   calico-kube-controllers-848c5d445f-8xk6m   1/1     Running   0          3m40s   10.255.156.65    k8s-node1    <none>           <none>
kube-system   calico-node-hdtpg                          1/1     Running   0          3m40s   192.168.110.11   k8s-node2    <none>           <none>
kube-system   calico-node-jrldz                          1/1     Running   0          3m40s   192.168.110.10   k8s-node1    <none>           <none>
kube-system   calico-node-xcbtg                          1/1     Running   0          3m40s   192.168.110.9    k8s-master   <none>           <none>
kube-system   coredns-6c76c8bb89-dbrmq                   1/1     Running   0          14m     10.252.82.193    k8s-master   <none>           <none>
kube-system   coredns-6c76c8bb89-qhs76                   1/1     Running   0          14m     10.252.82.194    k8s-master   <none>           <none>
kube-system   etcd-k8s-master                            1/1     Running   0          14m     192.168.110.9    k8s-master   <none>           <none>
kube-system   kube-apiserver-k8s-master                  1/1     Running   0          14m     192.168.110.9    k8s-master   <none>           <none>
kube-system   kube-controller-manager-k8s-master         1/1     Running   0          14m     192.168.110.9    k8s-master   <none>           <none>
kube-system   kube-proxy-6bgbx                           1/1     Running   0          11m     192.168.110.11   k8s-node2    <none>           <none>
kube-system   kube-proxy-szxlq                           1/1     Running   0          11m     192.168.110.10   k8s-node1    <none>           <none>
kube-system   kube-proxy-wjjrc                           1/1     Running   0          14m     192.168.110.9    k8s-master   <none>           <none>
kube-system   kube-scheduler-k8s-master                  1/1     Running   0          14m     192.168.110.9    k8s-master   <none>           <none>
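
As a final sanity check, exercising cluster DNS from a throwaway Pod confirms the new Pod network end to end (a sketch; busybox:1.28 is an assumption, any image with a working nslookup will do):

kubectl run dns-test --rm -it --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default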

etcd cluster deployment: the service will not start

  • Check whether the etcd cluster is healthy
[root@k8s-master1 ~]# etcdctl --cacert=/etc/kubernetes/pki/ca.crt --cert=/etc/etcd/pki/etcd_client.crt --key=/etc/etcd/pki/etcd_client.key --endpoints=https://192.168.110.9:2380,https://192.168.110.10:2380,https://192.168.110.11:2380 endpoint health
{"level":"warn","ts":"2023-08-02T00:31:09.970+0800","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-c05bba48-35ca-4dd8-97ca-ecfd8ffdc87b/192.168.110.11:2380","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 192.168.110.11:2380: connect: connection refused\""}
{"level":"warn","ts":"2023-08-02T00:31:09.970+0800","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-879806a0-0f1f-4196-9c9e-71d6094b0294/192.168.110.9:2380","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 192.168.110.9:2380: connect: connection refused\""}
{"level":"warn","ts":"2023-08-02T00:31:09.971+0800","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-562b2de9-28c9-4505-b29c-642c44b2993f/192.168.110.10:2380","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 192.168.110.10:2380: connect: connection refused\""}
https://192.168.110.11:2380 is unhealthy: failed to commit proposal: context deadline exceeded
https://192.168.110.9:2380 is unhealthy: failed to commit proposal: context deadline exceeded
https://192.168.110.10:2380 is unhealthy: failed to commit proposal: context deadline exceeded
  • Check the logs: the peer URL uses http where https is expected, so the config file was written wrong
[root@k8s-master1 ~]# journalctl -ex --no-pager
--initial-cluster has etcd1=https://192.168.110.9:2380 but missing from --initial-advertise-peer-urls=http://192.168.110.9:2380 ("http://192.168.110.9:2380"(resolved from "http://192.168.110.9:2380") != "https://192.168.110.9:2380"(resolved from "https://192.168.110.9:2380"))
  • Solution: fix the config file (on all etcd nodes)
# vim /etc/etcd/etcd.conf
ETCD_LISTEN_PEER_URLS=https://192.168.110.9:2380
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://192.168.110.9:2380

# Restart the etcd service
systemctl daemon-reload && systemctl restart etcd
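
With etcd back up on every node, re-running the health check should now succeed (a sketch; the original command targets the peer port 2380, but client requests normally go to 2379, so match your listen-client-urls):

etcdctl --cacert=/etc/kubernetes/pki/ca.crt --cert=/etc/etcd/pki/etcd_client.crt --key=/etc/etcd/pki/etcd_client.key \
  --endpoints=https://192.168.110.9:2379,https://192.168.110.10:2379,https://192.168.110.11:2379 endpoint health
# expected: each endpoint reports "is healthy: successfully committed proposal"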

