Collection of Common k8s Commands
Common k8s commands:
# List all nodes in the current cluster
kubectl get node
# Show detailed information about a node (rarely needed)
kubectl describe node node1
# List all pods
kubectl get pod --all-namespaces
# Show detailed pod information
kubectl get pods -o wide --all-namespaces
# List all services that have been created
kubectl get service
# List all deployments
kubectl get deploy
# Restart a pod (this deletes the original pod and recreates it, which achieves the restart)
# Restart when you have the yaml file
kubectl replace --force -f xxx.yaml
# Restart when you do not have the yaml file
kubectl get pod <POD_NAME> -n <NAMESPACE> -o yaml | kubectl replace --force -f -
# Show detailed information about a pod
kubectl describe pod nfs-client-provisioner-65c77c7bf9-54rdp -n default
# Create Pod resources from a yaml file
kubectl apply -f pod.yaml
# Delete the Pod defined in pod.yaml
kubectl delete -f pod.yaml
# View a container's logs
kubectl logs <pod-name>
# Follow the logs in real time
kubectl logs -f <pod-name>
# If the pod has only one container, -c can be omitted
kubectl logs <pod-name> -c <container_name>
# Return the merged logs of all pods labeled app=frontend
kubectl logs -l app=frontend
# Get a TTY of a container in a pod via bash, equivalent to logging in to the container
# kubectl exec -it <pod-name> -c <container-name> -- bash
e.g.:
kubectl exec -it redis-master-cln81 -- bash
# List endpoints
kubectl get endpoints
# List existing tokens
kubeadm token list
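One more restart option worth knowing. The restart-by-delete approach above works for any pod; for pods managed by a Deployment, DaemonSet, or StatefulSet, a gentler alternative (available since kubectl 1.15; the resource name below is just a placeholder) is a rolling restart:
# Restart all pods of a deployment without deleting the object itself
kubectl rollout restart deployment <deployment-name> -n <namespace>
# Watch the rollout progress
kubectl rollout status deployment <deployment-name> -n <namespace>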
Troubleshooting 1: Node unavailable (pod kube-flannel-ds-xxxx status: Init:ImagePullBackOff)
The node status is NotReady:
[root@M001 ~]# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
m001 Ready control-plane 78m v1.26.0 192.168.11.120 <none> CentOS Stream 9 5.14.0-325.el9.x86_64 containerd://1.6.21
n001 NotReady <none> 31m v1.26.0 192.168.11.121 <none> CentOS Stream 9 5.14.0-325.el9.x86_64 containerd://1.6.21
n002 Ready <none> 27m v1.26.0 192.168.11.122 <none> CentOS Stream 9 5.14.0-171.el9.x86_64 containerd://1.6.21
[root@M001 ~]#
What has been confirmed so far:
When a node's status in Kubernetes shows "NotReady", it usually means the node cannot work properly or has lost contact with the control plane.
The following are common causes of a "NotReady" node and how to address them:
- Network problems: check whether the node can communicate with the other nodes and with the control plane. Make sure the network configuration is correct and that the node can reach the other components over the required ports and protocols.
== Communication is normal.
- CNI plugin problems: check the configuration and status of the Container Network Interface (CNI) plugin. The CNI plugin provides networking for containers and sets up network interfaces on the node. Make sure it is installed correctly and running.
== We are using Flannel.
2.1 Check the configuration files
On the NotReady node, /etc/cni/net.d/ contains no Flannel configuration file (the Ready nodes do have one).
2.2 Check the pod status
Check on M001:
[root@M001 ~]# kubectl get pods -n kube-flannel
NAME READY STATUS RESTARTS AGE
kube-flannel-ds-5nzr9 0/1 Init:ImagePullBackOff 1 40m
kube-flannel-ds-g94hc 0/1 CrashLoopBackOff 18 (3m54s ago) 73m
kube-flannel-ds-jmc5d 0/1 CrashLoopBackOff 11 (4m24s ago) 37m
[root@M001 ~]#
Note:
"Init:ImagePullBackOff" means the Flannel pod's init container is unable to pull its image.
This error is usually caused by one of the following:
- Wrong image address: make sure the Flannel image address is correct and reachable. Check the pod or deployment manifest and confirm the image name and tag. If necessary, switch to another available Flannel image source or version.
- Registry credential problems: if a private registry is used, valid credentials may be needed to pull the image. Confirm that the correct registry credentials are configured in a Kubernetes Secret and that they match the registry being used.
- Network problems: if the node cannot reach the internet or the required image registry, the pull will fail. Make sure the node has the same network configuration as the healthy nodes and can reach the required resources, including the registry.
- Firewall problems: some firewall rules may block the node's outbound access and cause image pulls to fail. Check the node's firewall rules and make sure the required ports and traffic are allowed.
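To see exactly which image the init container is failing to pull and the underlying error, describe the pod and read the Events section at the bottom (the pod name here is the one from the output above):
kubectl describe pod kube-flannel-ds-5nzr9 -n kube-flannel
# The events typically show "Failed to pull image ..." followed by the concrete reason (DNS failure, unauthorized, timeout, etc.)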
2.3 Fix the abnormal pod state by removing the pods
Before removing any pod, list all pods running on the node:
[root@M001 ~]# kubectl get pods --all-namespaces --field-selector spec.nodeName=n001
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-flannel kube-flannel-ds-5nzr9 0/1 Init:ImagePullBackOff 1 46m
kube-system kube-proxy-5q92t 1/1 Running 1 (34m ago) 46m
[root@M001 ~]#
Then delete each pod returned, using:
kubectl delete pod <pod-name> --namespace <namespace>
Here <pod-name> is the name of the pod to delete (case-sensitive) and <namespace> is the namespace it belongs to. Repeat until no pods are left running on the node.
kubectl delete pod kube-flannel-ds-5nzr9 --namespace kube-flannel
kubectl delete pod kube-proxy-5q92t --namespace kube-system
Tip: if the delete command hangs with no response, check on the worker node (note: the worker node, not the master) whether the containerd service is running, and start it manually if it is not.
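If there are many pods to remove, a small loop can do it in one go. This is only a sketch: it assumes bash on the control-plane node and that you really do want to delete every pod scheduled on n001 (DaemonSet and controller-managed pods will be recreated automatically):
# List namespace and name of every pod on node n001, then delete them one by one
kubectl get pods --all-namespaces --field-selector spec.nodeName=n001 \
  -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name --no-headers | \
while read ns name; do
  kubectl delete pod "$name" -n "$ns"
done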
[root@M001 ~]# kubectl delete pod kube-flannel-ds-5nzr9 --namespace kube-flannel
pod "kube-flannel-ds-5nzr9" deleted
[root@M001 ~]#
After the deletion succeeds, check again:
[root@M001 ~]# kubectl get pods --all-namespaces --field-selector spec.nodeName=n001
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-flannel kube-flannel-ds-77b9l 0/1 CrashLoopBackOff 3 (49s ago) 3m11s
kube-system kube-proxy-5q92t 1/1 Running 2 (3m17s ago) 58m
[root@M001 ~]#
The situation has already changed, so we stop deleting the remaining pods.
[root@M001 ~]# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
m001 Ready control-plane 115m v1.26.0 192.168.11.120 <none> CentOS Stream 9 5.14.0-325.el9.x86_64 containerd://1.6.21
n001 Ready <none> 67m v1.26.0 192.168.11.121 <none> CentOS Stream 9 5.14.0-325.el9.x86_64 containerd://1.6.21
n002 Ready <none> 64m v1.26.0 192.168.11.122 <none> CentOS Stream 9 5.14.0-171.el9.x86_64 containerd://1.6.21
[root@M001 ~]#
Next we try to fix the new failure (status: CrashLoopBackOff).
(For completeness, the remaining common causes of a "NotReady" node:)
- Insufficient resources: check the node's CPU, memory and storage usage. A node that is short of resources may stop working properly and be marked "NotReady". Consider raising the node's resources or redistributing workloads to lighten its load.
- Node failure: check the node's own state and health. It may have hit a hardware fault, an operating-system problem or some other unexpected failure. Make sure the node is running normally with no errors.
- Container runtime problems: Kubernetes manages containers through a container runtime such as Docker, containerd or CRI-O. Check the runtime's logs and status and make sure it is running without errors or faults.
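For the last point, a quick health check on the worker node itself might look like this (these commands assume containerd with systemd, as used in this cluster, and that crictl is installed):
# Is the container runtime up?
systemctl status containerd
# Can the runtime list containers? (crictl talks to containerd over the CRI socket)
crictl ps
# Any recent kubelet errors?
journalctl -u kubelet --since "10 minutes ago" --no-pager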
Troubleshooting 2: pod kube-flannel-ds-xxxx on every node is in status CrashLoopBackOff
A pod in CrashLoopBackOff keeps crashing and restarting after startup. This is usually caused by a persistent error or exception inside one of the pod's containers.
To resolve a CrashLoopBackOff, work through the following steps:
- View the pod's logs: use the following command to get more detail about the error or exception:
kubectl logs <pod-name> -n <namespace>
Here <pod-name> is the name of the pod in CrashLoopBackOff and <namespace> is its namespace. Check the log for error messages; they often point to the root cause.
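For a crash-looping container, the current log is often empty because the container has just restarted; the --previous flag (a standard kubectl option) shows the log of the last terminated instance instead:
kubectl logs <pod-name> -n <namespace> --previous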
[root@M001 ~]# kubectl logs kube-flannel-ds-77b9l -n kube-flannel
Defaulted container "kube-flannel" out of: kube-flannel, install-cni-plugin (init), install-cni (init)
I0616 09:13:06.428205 1 main.go:211] CLI flags config: {etcdEndpoints:http://127.0.0.1:4001,http://127.0.0.1:2379 etcdPrefix:/coreos.com/network etcdKeyfile: etcdCertfile: etcdCAFile: etcdUsername: etcdPassword: version:false kubeSubnetMgr:true kubeApiUrl: kubeAnnotationPrefix:flannel.alpha.coreos.com kubeConfigFile: iface:[] ifaceRegex:[] ipMasq:true ifaceCanReach: subnetFile:/run/flannel/subnet.env publicIP: publicIPv6: subnetLeaseRenewMargin:60 healthzIP:0.0.0.0 healthzPort:0 iptablesResyncSeconds:5 iptablesForwardRules:true netConfPath:/etc/kube-flannel/net-conf.json setNodeNetworkUnavailable:true useMultiClusterCidr:false}
W0616 09:13:06.428264 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
E0616 09:13:06.439102 1 main.go:228] Failed to create SubnetManager: error parsing subnet config: invalid character '#' looking for beginning of object key string
[root@M001 ~]#
This looks like a problem in the configuration file, and flannel's configuration file is kube-flannel.yml.
Remove the comment lines from the network configuration, then re-apply the flannel configuration.
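For reference, this is roughly what the net-conf.json entry in the kube-flannel-cfg ConfigMap inside kube-flannel.yml should look like once the comments are removed; it must be pure JSON (JSON does not allow # comments), and the Network value shown here assumes the corrected pod CIDR that is settled on later in this article:
  net-conf.json: |
    {
      "Network": "10.112.0.0/12",
      "Backend": {
        "Type": "vxlan"
      }
    }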
# Delete
[root@M001 ~]# kubectl delete -f kube-flannel.yml
namespace "kube-flannel" deleted
clusterrole.rbac.authorization.k8s.io "flannel" deleted
clusterrolebinding.rbac.authorization.k8s.io "flannel" deleted
serviceaccount "flannel" deleted
configmap "kube-flannel-cfg" deleted
daemonset.apps "kube-flannel-ds" deleted
[root@M001 ~]#
# Check
[root@M001 ~]# kubectl get ns
NAME STATUS AGE
default Active 143m
kube-node-lease Active 143m
kube-public Active 143m
kube-system Active 143m
[root@M001 ~]# kubectl get pods --all-namespaces --field-selector spec.nodeName=m001
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-5bbd96d687-c948c 0/1 ContainerCreating 0 143m
kube-system coredns-5bbd96d687-tpqs6 0/1 ContainerCreating 0 143m
kube-system etcd-m001 1/1 Running 0 143m
kube-system kube-apiserver-m001 1/1 Running 0 143m
kube-system kube-controller-manager-m001 1/1 Running 0 143m
kube-system kube-proxy-khc5n 1/1 Running 0 143m
kube-system kube-scheduler-m001 1/1 Running 0 143m
# Re-apply
[root@M001 ~]# kubectl apply -f kube-flannel.yml
namespace/kube-flannel created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created
[root@M001 ~]#
[root@M001 ~]# kubectl get ns
NAME STATUS AGE
default Active 144m
kube-flannel Active 5s
kube-node-lease Active 144m
kube-public Active 144m
kube-system Active 144m
[root@M001 ~]# kubectl get pods --all-namespaces --field-selector spec.nodeName=m001
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-flannel kube-flannel-ds-fc2fb 0/1 CrashLoopBackOff 1 (11s ago) 19s
kube-system coredns-5bbd96d687-c948c 0/1 ContainerCreating 0 143m
kube-system coredns-5bbd96d687-tpqs6 0/1 ContainerCreating 0 143m
kube-system etcd-m001 1/1 Running 0 144m
kube-system kube-apiserver-m001 1/1 Running 0 144m
kube-system kube-controller-manager-m001 1/1 Running 0 144m
kube-system kube-proxy-khc5n 1/1 Running 0 143m
kube-system kube-scheduler-m001 1/1 Running 0 144m
[root@M001 ~]#
The problem persists.
[root@M001 ~]# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default ngx 0/1 ContainerCreating 0 75m
kube-flannel kube-flannel-ds-bssdt 0/1 CrashLoopBackOff 4 (28s ago) 2m4s
kube-flannel kube-flannel-ds-fc2fb 0/1 CrashLoopBackOff 4 (28s ago) 2m5s
kube-flannel kube-flannel-ds-hrlb4 0/1 CrashLoopBackOff 4 (26s ago) 2m4s
kube-system coredns-5bbd96d687-c948c 0/1 ContainerCreating 0 145m
kube-system coredns-5bbd96d687-tpqs6 0/1 ContainerCreating 0 145m
kube-system etcd-m001 1/1 Running 0 146m
kube-system kube-apiserver-m001 1/1 Running 0 145m
kube-system kube-controller-manager-m001 1/1 Running 0 145m
kube-system kube-proxy-5q92t 1/1 Running 2 (43m ago) 98m
kube-system kube-proxy-b245t 1/1 Running 0 95m
kube-system kube-proxy-khc5n 1/1 Running 0 145m
kube-system kube-scheduler-m001 1/1 Running 0 145m
[root@M001 ~]#
Check the logs again:
[root@M001 ~]# kubectl logs kube-flannel-ds-bssdt -n kube-flannel
Defaulted container "kube-flannel" out of: kube-flannel, install-cni-plugin (init), install-cni (init)
I0616 09:34:32.875469 1 main.go:211] CLI flags config: {etcdEndpoints:http://127.0.0.1:4001,http://127.0.0.1:2379 etcdPrefix:/coreos.com/network etcdKeyfile: etcdCertfile: etcdCAFile: etcdUsername: etcdPassword: version:false kubeSubnetMgr:true kubeApiUrl: kubeAnnotationPrefix:flannel.alpha.coreos.com kubeConfigFile: iface:[] ifaceRegex:[] ipMasq:true ifaceCanReach: subnetFile:/run/flannel/subnet.env publicIP: publicIPv6: subnetLeaseRenewMargin:60 healthzIP:0.0.0.0 healthzPort:0 iptablesResyncSeconds:5 iptablesForwardRules:true netConfPath:/etc/kube-flannel/net-conf.json setNodeNetworkUnavailable:true useMultiClusterCidr:false}
W0616 09:34:32.875698 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0616 09:34:32.886229 1 kube.go:144] Waiting 10m0s for node controller to sync
I0616 09:34:32.886617 1 kube.go:485] Starting kube subnet manager
I0616 09:34:33.887163 1 kube.go:151] Node controller sync successful
I0616 09:34:33.887187 1 main.go:231] Created subnet manager: Kubernetes Subnet Manager - n002
I0616 09:34:33.887192 1 main.go:234] Installing signal handlers
I0616 09:34:33.887331 1 main.go:542] Found network config - Backend type: vxlan
I0616 09:34:33.887347 1 match.go:206] Determining IP address of default interface
I0616 09:34:33.887627 1 match.go:259] Using interface with name ens33 and address 192.168.11.122
I0616 09:34:33.887651 1 match.go:281] Defaulting external address to interface address (192.168.11.122)
I0616 09:34:33.887793 1 vxlan.go:140] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
E0616 09:34:33.888320 1 main.go:334] Error registering network: failed to acquire lease: node "n002" pod cidr not assigned
W0616 09:34:33.888508 1 reflector.go:347] github.com/flannel-io/flannel/pkg/subnet/kube/kube.go:486: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding
I0616 09:34:33.888729 1 main.go:522] Stopping shutdownHandler...
[root@M001 ~]# kubectl logs kube-flannel-ds-fc2fb -n kube-flannel
Defaulted container "kube-flannel" out of: kube-flannel, install-cni-plugin (init), install-cni (init)
I0616 09:34:27.296223 1 main.go:211] CLI flags config: {etcdEndpoints:http://127.0.0.1:4001,http://127.0.0.1:2379 etcdPrefix:/coreos.com/network etcdKeyfile: etcdCertfile: etcdCAFile: etcdUsername: etcdPassword: version:false kubeSubnetMgr:true kubeApiUrl: kubeAnnotationPrefix:flannel.alpha.coreos.com kubeConfigFile: iface:[] ifaceRegex:[] ipMasq:true ifaceCanReach: subnetFile:/run/flannel/subnet.env publicIP: publicIPv6: subnetLeaseRenewMargin:60 healthzIP:0.0.0.0 healthzPort:0 iptablesResyncSeconds:5 iptablesForwardRules:true netConfPath:/etc/kube-flannel/net-conf.json setNodeNetworkUnavailable:true useMultiClusterCidr:false}
W0616 09:34:27.296364 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0616 09:34:27.304363 1 kube.go:144] Waiting 10m0s for node controller to sync
I0616 09:34:27.304426 1 kube.go:485] Starting kube subnet manager
I0616 09:34:28.304699 1 kube.go:151] Node controller sync successful
I0616 09:34:28.304801 1 main.go:231] Created subnet manager: Kubernetes Subnet Manager - m001
I0616 09:34:28.304825 1 main.go:234] Installing signal handlers
I0616 09:34:28.304987 1 main.go:542] Found network config - Backend type: vxlan
I0616 09:34:28.305082 1 match.go:206] Determining IP address of default interface
I0616 09:34:28.305483 1 match.go:259] Using interface with name ens33 and address 192.168.11.120
I0616 09:34:28.305598 1 match.go:281] Defaulting external address to interface address (192.168.11.120)
I0616 09:34:28.305650 1 vxlan.go:140] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
E0616 09:34:28.305914 1 main.go:334] Error registering network: failed to acquire lease: node "m001" pod cidr not assigned
I0616 09:34:28.306086 1 main.go:522] Stopping shutdownHandler...
W0616 09:34:28.306091 1 reflector.go:347] github.com/flannel-io/flannel/pkg/subnet/kube/kube.go:486: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding
[root@M001 ~]# kubectl logs kube-flannel-ds-hrlb4 -n kube-flannel
Defaulted container "kube-flannel" out of: kube-flannel, install-cni-plugin (init), install-cni (init)
I0616 09:34:30.437216 1 main.go:211] CLI flags config: {etcdEndpoints:http://127.0.0.1:4001,http://127.0.0.1:2379 etcdPrefix:/coreos.com/network etcdKeyfile: etcdCertfile: etcdCAFile: etcdUsername: etcdPassword: version:false kubeSubnetMgr:true kubeApiUrl: kubeAnnotationPrefix:flannel.alpha.coreos.com kubeConfigFile: iface:[] ifaceRegex:[] ipMasq:true ifaceCanReach: subnetFile:/run/flannel/subnet.env publicIP: publicIPv6: subnetLeaseRenewMargin:60 healthzIP:0.0.0.0 healthzPort:0 iptablesResyncSeconds:5 iptablesForwardRules:true netConfPath:/etc/kube-flannel/net-conf.json setNodeNetworkUnavailable:true useMultiClusterCidr:false}
W0616 09:34:30.437290 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0616 09:34:30.449034 1 kube.go:144] Waiting 10m0s for node controller to sync
I0616 09:34:30.449066 1 kube.go:485] Starting kube subnet manager
I0616 09:34:31.449763 1 kube.go:151] Node controller sync successful
I0616 09:34:31.449795 1 main.go:231] Created subnet manager: Kubernetes Subnet Manager - n001
I0616 09:34:31.449803 1 main.go:234] Installing signal handlers
I0616 09:34:31.449885 1 main.go:542] Found network config - Backend type: vxlan
I0616 09:34:31.449906 1 match.go:206] Determining IP address of default interface
I0616 09:34:31.450696 1 match.go:259] Using interface with name ens33 and address 192.168.11.121
I0616 09:34:31.450794 1 match.go:281] Defaulting external address to interface address (192.168.11.121)
I0616 09:34:31.450902 1 vxlan.go:140] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
E0616 09:34:31.451147 1 main.go:334] Error registering network: failed to acquire lease: node "n001" pod cidr not assigned
W0616 09:34:31.451351 1 reflector.go:347] github.com/flannel-io/flannel/pkg/subnet/kube/kube.go:486: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding
I0616 09:34:31.451362 1 main.go:522] Stopping shutdownHandler...
[root@M001 ~]#
Based on the error message:
Make sure that node "m001" is configured correctly and that it has been assigned the proper Pod CIDR. You can inspect the node's configuration with:
[root@M001 ~]# kubectl describe node m001
Name: m001
Roles: control-plane
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=m001
kubernetes.io/os=linux
node-role.kubernetes.io/control-plane=
node.kubernetes.io/exclude-from-external-load-balancers=
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Fri, 16 Jun 2023 15:07:33 +0800
Taints: node-role.kubernetes.io/control-plane:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: m001
AcquireTime: <unset>
RenewTime: Fri, 16 Jun 2023 17:36:58 +0800
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Fri, 16 Jun 2023 17:32:33 +0800 Fri, 16 Jun 2023 15:07:31 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 16 Jun 2023 17:32:33 +0800 Fri, 16 Jun 2023 15:07:31 +0800 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Fri, 16 Jun 2023 17:32:33 +0800 Fri, 16 Jun 2023 15:07:31 +0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Fri, 16 Jun 2023 17:32:33 +0800 Fri, 16 Jun 2023 16:26:16 +0800 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 192.168.11.120
Hostname: m001
Capacity:
cpu: 2
ephemeral-storage: 38700584Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 1789588Ki
pods: 110
Allocatable:
cpu: 2
ephemeral-storage: 35666458156
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 1687188Ki
pods: 110
System Info:
Machine ID: 714a13eae5c04693ad91bf1cdbcf706f
System UUID: 9ca14d56-178a-3d0f-d26c-94bcbee0f0e5
Boot ID: f943b00b-7105-4d26-ac7e-1cbf4b261bd5
Kernel Version: 5.14.0-325.el9.x86_64
OS Image: CentOS Stream 9
Operating System: linux
Architecture: amd64
Container Runtime Version: containerd://1.6.21
Kubelet Version: v1.26.0
Kube-Proxy Version: v1.26.0
Non-terminated Pods: (8 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
kube-flannel kube-flannel-ds-fc2fb 100m (5%) 0 (0%) 50Mi (3%) 0 (0%) 5m34s
kube-system coredns-5bbd96d687-c948c 100m (5%) 0 (0%) 70Mi (4%) 170Mi (10%) 149m
kube-system coredns-5bbd96d687-tpqs6 100m (5%) 0 (0%) 70Mi (4%) 170Mi (10%) 149m
kube-system etcd-m001 100m (5%) 0 (0%) 100Mi (6%) 0 (0%) 149m
kube-system kube-apiserver-m001 250m (12%) 0 (0%) 0 (0%) 0 (0%) 149m
kube-system kube-controller-manager-m001 200m (10%) 0 (0%) 0 (0%) 0 (0%) 149m
kube-system kube-proxy-khc5n 0 (0%) 0 (0%) 0 (0%) 0 (0%) 149m
kube-system kube-scheduler-m001 100m (5%) 0 (0%) 0 (0%) 0 (0%) 149m
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 950m (47%) 0 (0%)
memory 290Mi (17%) 340Mi (20%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal CIDRNotAvailable 4m30s (x38 over 149m) cidrAllocator Node m001 status is now: CIDRNotAvailable
[root@M001 ~]#
Check the flannel subnet file and the container logs:
[root@M001 ~]# cat /run/flannel/subnet.env
cat: /run/flannel/subnet.env: No such file or directory
[root@M001 ~]# kubectl logs kube-flannel-ds-fc2fb -n kube-flannel
Defaulted container "kube-flannel" out of: kube-flannel, install-cni-plugin (init), install-cni (init)
I0619 08:24:03.670815 1 main.go:211] CLI flags config: {etcdEndpoints:http://127.0.0.1:4001,http://127.0.0.1:2379 etcdPrefix:/coreos.com/network etcdKeyfile: etcdCertfile: etcdCAFile: etcdUsername: etcdPassword: version:false kubeSubnetMgr:true kubeApiUrl: kubeAnnotationPrefix:flannel.alpha.coreos.com kubeConfigFile: iface:[] ifaceRegex:[] ipMasq:true ifaceCanReach: subnetFile:/run/flannel/subnet.env publicIP: publicIPv6: subnetLeaseRenewMargin:60 healthzIP:0.0.0.0 healthzPort:0 iptablesResyncSeconds:5 iptablesForwardRules:true netConfPath:/etc/kube-flannel/net-conf.json setNodeNetworkUnavailable:true useMultiClusterCidr:false}
W0619 08:24:03.670900 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0619 08:24:03.681661 1 kube.go:144] Waiting 10m0s for node controller to sync
I0619 08:24:03.681716 1 kube.go:485] Starting kube subnet manager
I0619 08:24:04.682401 1 kube.go:151] Node controller sync successful
I0619 08:24:04.682488 1 main.go:231] Created subnet manager: Kubernetes Subnet Manager - m001
I0619 08:24:04.682494 1 main.go:234] Installing signal handlers
I0619 08:24:04.682865 1 main.go:542] Found network config - Backend type: vxlan
I0619 08:24:04.682913 1 match.go:206] Determining IP address of default interface
I0619 08:24:04.683997 1 match.go:259] Using interface with name ens33 and address 192.168.11.120
I0619 08:24:04.684037 1 match.go:281] Defaulting external address to interface address (192.168.11.120)
I0619 08:24:04.684245 1 vxlan.go:140] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
E0619 08:24:04.684536 1 main.go:334] Error registering network: failed to acquire lease: node "m001" pod cidr not assigned
I0619 08:24:04.684744 1 main.go:522] Stopping shutdownHandler...
W0619 08:24:04.685011 1 reflector.go:347] github.com/flannel-io/flannel/pkg/subnet/kube/kube.go:486: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding
[root@M001 ~]#
We see:
Error registering network: failed to acquire lease: node "m001" pod cidr not assigned
So the pod CIDR configuration still looks like the problem.
Check the cluster's pod CIDR configuration:
[root@M001 ~]# kubectl cluster-info dump | grep -m 1 cluster-cidr
"--cluster-cidr=10.100.0.0/16",
[root@M001 ~]#
This does not look right; check the yml configuration file that was used to initialize k8s:
[root@M001 ~]#
[root@M001 ~]# cat kubeadm-init.yml
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.11.120
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  # name: node
  taints: null
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: xml_k8s
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: 1.26.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.100.0.0/16
scheduler: {}
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd   # declare systemd as the cgroup driver
failSwapOn: false
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
[root@M001 ~]#
The podSubnet above was a typo; the wrong address was written.
The cause is found; next is how to fix it, and of course without deleting and rebuilding the whole cluster.
Steps:
a) Modify the cluster configuration
Edit the cluster configuration ConfigMap and correct podSubnet under networking.
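If you want to see the current value before editing (kubeadm-config and kube-system are the kubeadm defaults for this ConfigMap), you can dump it first:
kubectl get cm kubeadm-config -n kube-system -o yaml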
[root@M001 ~]# kubectl edit cm kubeadm-config -n kube-system
configmap/kubeadm-config edited
[root@M001 ~]#
---
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  ClusterConfiguration: |
    apiServer:
      extraArgs:
        authorization-mode: Node,RBAC
      timeoutForControlPlane: 4m0s
    apiVersion: kubeadm.k8s.io/v1beta3
    certificatesDir: /etc/kubernetes/pki
    clusterName: xml_k8s
    controllerManager: {}
    dns: {}
    etcd:
      local:
        dataDir: /var/lib/etcd
    imageRepository: registry.aliyuncs.com/google_containers
    kind: ClusterConfiguration
    kubernetesVersion: v1.26.0
    networking:
      dnsDomain: cluster.local
      podSubnet: 10.112.0.0/12   # change this so it does not clash with the service subnet; it must also match the flannel config file
      serviceSubnet: 10.96.0.0/12
    scheduler: {}
kind: ConfigMap
metadata:
  creationTimestamp: "2023-06-16T07:07:35Z"
  name: kubeadm-config
  namespace: kube-system
  resourceVersion: "199"
  uid: 1c652fe9-a733-4793-a6f0-cc9d2d779644
---
b) Modify the startup arguments of the controller-manager static pod
Add:
--allocate-node-cidrs=true
--cluster-cidr=10.112.0.0/12
[root@M001 ~]# kubectl edit cm kubeadm-config -n kube-system
configmap/kubeadm-config edited
[root@M001 ~]# vi /etc/kubernetes/manifests/kube-controller-manager.yaml
[root@M001 ~]# cat /etc/kubernetes/manifests/kube-controller-manager.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-controller-manager
    tier: control-plane
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-controller-manager
    - --allocate-node-cidrs=true
    - --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --bind-address=127.0.0.1
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --allocate-node-cidrs=true       # add this
    - --cluster-cidr=10.112.0.0/12     # change this so it matches the flannel config file
    - --cluster-name=xml_k8s
    - --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
    - --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
    - --controllers=*,bootstrapsigner,tokencleaner
    - --kubeconfig=/etc/kubernetes/controller-manager.conf
    - --leader-elect=true
    - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
    - --root-ca-file=/etc/kubernetes/pki/ca.crt
    - --service-account-private-key-file=/etc/kubernetes/pki/sa.key
    - --service-cluster-ip-range=10.96.0.0/12
    - --use-service-account-credentials=true
    image: registry.aliyuncs.com/google_containers/kube-controller-manager:v1.26.0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10257
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: kube-controller-manager
    resources:
      requests:
        cpu: 200m
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10257
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /etc/ssl/certs
      name: ca-certs
      readOnly: true
    - mountPath: /etc/pki
      name: etc-pki
      readOnly: true
    - mountPath: /usr/libexec/kubernetes/kubelet-plugins/volume/exec
      name: flexvolume-dir
    - mountPath: /etc/kubernetes/pki
      name: k8s-certs
      readOnly: true
    - mountPath: /etc/kubernetes/controller-manager.conf
      name: kubeconfig
      readOnly: true
  hostNetwork: true
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /etc/ssl/certs
      type: DirectoryOrCreate
    name: ca-certs
  - hostPath:
      path: /etc/pki
      type: DirectoryOrCreate
    name: etc-pki
  - hostPath:
      path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec
      type: DirectoryOrCreate
    name: flexvolume-dir
  - hostPath:
      path: /etc/kubernetes/pki
      type: DirectoryOrCreate
    name: k8s-certs
  - hostPath:
      path: /etc/kubernetes/controller-manager.conf
      type: FileOrCreate
    name: kubeconfig
status: {}
[root@M001 ~]#
c) Check that the configuration has taken effect
[root@M001 ~]# kubectl cluster-info dump | grep -m 1 cluster-cidr
"--cluster-cidr=10.112.0.0/12",
[root@M001 ~]#
Note: if the update propagates slowly, you can delete the affected pods manually, e.g.
kubectl delete pod -n kube-system kube-flannel-ds-amd64-???
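Since the flannel pods in this cluster live in the kube-flannel namespace, one way to restart them all at once is to delete them by label and let the DaemonSet recreate them. The app=flannel label is the one used by the upstream kube-flannel.yml; verify it first with kubectl get pods -n kube-flannel --show-labels before relying on it:
kubectl delete pod -n kube-flannel -l app=flannel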
d) Confirm the settings in the flannel configuration file
flannel's configuration file is kube-flannel.yml; check the Network value in it and make sure it matches the settings from the previous steps exactly.
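You can also check what the running cluster actually has, rather than the local file, by looking at the kube-flannel-cfg ConfigMap that kube-flannel.yml created:
kubectl get cm kube-flannel-cfg -n kube-flannel -o yaml
# the Network value inside data.net-conf.json must match the podSubnet / --cluster-cidr set above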
e) Check the status of each node one by one
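Before reading each node's full describe output, a quick way to see every node's assigned Pod CIDR at a glance (spec.podCIDR is a standard Node field) is:
kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDR:.spec.podCIDR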
#M001
[root@M001 ~]# kubectl describe node m001
Name: m001
Roles: control-plane
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=m001
kubernetes.io/os=linux
node-role.kubernetes.io/control-plane=
node.kubernetes.io/exclude-from-external-load-balancers=
Annotations: flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"b2:50:68:29:17:b5"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 192.168.11.120
kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Fri, 16 Jun 2023 15:07:33 +0800
Taints: node-role.kubernetes.io/control-plane:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: m001
AcquireTime: <unset>
RenewTime: Mon, 19 Jun 2023 16:58:48 +0800
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Mon, 19 Jun 2023 16:55:02 +0800 Mon, 19 Jun 2023 16:55:02 +0800 FlannelIsUp Flannel is running on this node
MemoryPressure False Mon, 19 Jun 2023 16:54:31 +0800 Fri, 16 Jun 2023 15:07:31 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Mon, 19 Jun 2023 16:54:31 +0800 Fri, 16 Jun 2023 15:07:31 +0800 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Mon, 19 Jun 2023 16:54:31 +0800 Fri, 16 Jun 2023 15:07:31 +0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Mon, 19 Jun 2023 16:54:31 +0800 Fri, 16 Jun 2023 16:26:16 +0800 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 192.168.11.120
Hostname: m001
Capacity:
cpu: 2
ephemeral-storage: 38700584Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 1789588Ki
pods: 110
Allocatable:
cpu: 2
ephemeral-storage: 35666458156
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 1687188Ki
pods: 110
System Info:
Machine ID: 714a13eae5c04693ad91bf1cdbcf706f
System UUID: 9ca14d56-178a-3d0f-d26c-94bcbee0f0e5
Boot ID: f943b00b-7105-4d26-ac7e-1cbf4b261bd5
Kernel Version: 5.14.0-325.el9.x86_64
OS Image: CentOS Stream 9
Operating System: linux
Architecture: amd64
Container Runtime Version: containerd://1.6.21
Kubelet Version: v1.26.0
Kube-Proxy Version: v1.26.0
PodCIDR: 10.112.0.0/24 # now assigned
PodCIDRs: 10.112.0.0/24 # now assigned
Non-terminated Pods: (8 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
kube-flannel kube-flannel-ds-fc2fb 100m (5%) 0 (0%) 50Mi (3%) 0 (0%) 2d23h
kube-system coredns-5bbd96d687-c948c 100m (5%) 0 (0%) 70Mi (4%) 170Mi (10%) 3d1h
kube-system coredns-5bbd96d687-tpqs6 100m (5%) 0 (0%) 70Mi (4%) 170Mi (10%) 3d1h
kube-system etcd-m001 100m (5%) 0 (0%) 100Mi (6%) 0 (0%) 3d1h
kube-system kube-apiserver-m001 250m (12%) 0 (0%) 0 (0%) 0 (0%) 3d1h
kube-system kube-controller-manager-m001 200m (10%) 0 (0%) 0 (0%) 0 (0%) 4m55s
kube-system kube-proxy-khc5n 0 (0%) 0 (0%) 0 (0%) 0 (0%) 3d1h
kube-system kube-scheduler-m001 100m (5%) 0 (0%) 0 (0%) 0 (0%) 3d1h
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 950m (47%) 0 (0%)
memory 290Mi (17%) 340Mi (20%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal CIDRNotAvailable 6m59s (x92 over 3d1h) cidrAllocator Node m001 status is now: CIDRNotAvailable
Normal RegisteredNode 4m26s node-controller Node m001 event: Registered Node m001 in Controller
[root@M001 ~]#
#N001
[root@M001 ~]# kubectl describe node n001
Name: n001
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=n001
kubernetes.io/os=linux
Annotations: flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"4e:99:fd:f9:6a:d5"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 192.168.11.121
kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Fri, 16 Jun 2023 15:55:01 +0800
Taints: <none>
Unschedulable: false
Lease:
HolderIdentity: n001
AcquireTime: <unset>
RenewTime: Mon, 19 Jun 2023 17:00:56 +0800
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Mon, 19 Jun 2023 16:56:45 +0800 Mon, 19 Jun 2023 16:56:45 +0800 FlannelIsUp Flannel is running on this node
MemoryPressure False Mon, 19 Jun 2023 16:59:45 +0800 Fri, 16 Jun 2023 16:50:08 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Mon, 19 Jun 2023 16:59:45 +0800 Fri, 16 Jun 2023 16:50:08 +0800 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Mon, 19 Jun 2023 16:59:45 +0800 Fri, 16 Jun 2023 16:50:08 +0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Mon, 19 Jun 2023 16:59:45 +0800 Fri, 16 Jun 2023 16:51:48 +0800 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 192.168.11.121
Hostname: n001
Capacity:
cpu: 2
ephemeral-storage: 38700584Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 1789588Ki
pods: 110
Allocatable:
cpu: 2
ephemeral-storage: 35666458156
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 1687188Ki
pods: 110
System Info:
Machine ID: 7e170c574de842a6b03a8c74efbc7755
System UUID: 6d2b4d56-c0f1-6a16-8354-64e1136889f0
Boot ID: bff2448c-9be0-4c49-94bd-e13d66671e55
Kernel Version: 5.14.0-325.el9.x86_64
OS Image: CentOS Stream 9
Operating System: linux
Architecture: amd64
Container Runtime Version: containerd://1.6.21
Kubelet Version: v1.26.0
Kube-Proxy Version: v1.26.0
PodCIDR: 10.112.1.0/24
PodCIDRs: 10.112.1.0/24
Non-terminated Pods: (2 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
kube-flannel kube-flannel-ds-hrlb4 100m (5%) 0 (0%) 50Mi (3%) 0 (0%) 2d23h
kube-system kube-proxy-5q92t 0 (0%) 0 (0%) 0 (0%) 0 (0%) 3d1h
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 100m (5%) 0 (0%)
memory 50Mi (3%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal CIDRNotAvailable 8m42s (x99 over 3d1h) cidrAllocator Node n001 status is now: CIDRNotAvailable
Normal RegisteredNode 6m27s node-controller Node n001 event: Registered Node n001 in Controller
[root@M001 ~]#
#N002
[root@M001 ~]# kubectl describe node n002
Name: n002
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=n002
kubernetes.io/os=linux
Annotations: flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"f6:03:6c:3e:42:cb"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 192.168.11.122
kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Fri, 16 Jun 2023 15:58:11 +0800
Taints: <none>
Unschedulable: false
Lease:
HolderIdentity: n002
AcquireTime: <unset>
RenewTime: Mon, 19 Jun 2023 17:01:16 +0800
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Mon, 19 Jun 2023 16:56:13 +0800 Mon, 19 Jun 2023 16:56:13 +0800 FlannelIsUp Flannel is running on this node
MemoryPressure False Mon, 19 Jun 2023 16:58:22 +0800 Fri, 16 Jun 2023 15:58:11 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Mon, 19 Jun 2023 16:58:22 +0800 Fri, 16 Jun 2023 15:58:11 +0800 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Mon, 19 Jun 2023 16:58:22 +0800 Fri, 16 Jun 2023 15:58:11 +0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Mon, 19 Jun 2023 16:58:22 +0800 Fri, 16 Jun 2023 15:59:40 +0800 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 192.168.11.122
Hostname: n002
Capacity:
cpu: 1
ephemeral-storage: 38700584Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 756904Ki
pods: 110
Allocatable:
cpu: 1
ephemeral-storage: 35666458156
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 654504Ki
pods: 110
System Info:
Machine ID: a9e6288c521d4f868c036c3b466209c4
System UUID: 03794d56-7ee4-42f8-c910-3bd20730aef1
Boot ID: c3775db2-5fae-4906-adca-23427f4c22d3
Kernel Version: 5.14.0-171.el9.x86_64
OS Image: CentOS Stream 9
Operating System: linux
Architecture: amd64
Container Runtime Version: containerd://1.6.21
Kubelet Version: v1.26.0
Kube-Proxy Version: v1.26.0
PodCIDR: 10.112.2.0/24
PodCIDRs: 10.112.2.0/24
Non-terminated Pods: (3 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
default ngx 0 (0%) 0 (0%) 0 (0%) 0 (0%) 3d
kube-flannel kube-flannel-ds-bssdt 100m (10%) 0 (0%) 50Mi (7%) 0 (0%) 2d23h
kube-system kube-proxy-b245t 0 (0%) 0 (0%) 0 (0%) 0 (0%) 3d1h
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 100m (10%) 0 (0%)
memory 50Mi (7%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal CIDRNotAvailable 10m (x83 over 3d1h) cidrAllocator Node n002 status is now: CIDRNotAvailable
Normal RegisteredNode 6m52s node-controller Node n002 event: Registered Node n002 in Controller
[root@M001 ~]#
f) Check that the pods are running
# Running status
[root@M001 ~]# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default ngx 1/1 Running 0 3d
kube-flannel kube-flannel-ds-bssdt 1/1 Running 60 (11m ago) 2d23h
kube-flannel kube-flannel-ds-fc2fb 1/1 Running 60 (12m ago) 2d23h
kube-flannel kube-flannel-ds-hrlb4 1/1 Running 60 (10m ago) 2d23h
kube-system coredns-5bbd96d687-c948c 1/1 Running 0 3d1h
kube-system coredns-5bbd96d687-tpqs6 1/1 Running 0 3d1h
kube-system etcd-m001 1/1 Running 0 3d1h
kube-system kube-apiserver-m001 1/1 Running 0 3d1h
kube-system kube-controller-manager-m001 1/1 Running 0 8m37s
kube-system kube-proxy-5q92t 1/1 Running 2 (3d ago) 3d1h
kube-system kube-proxy-b245t 1/1 Running 0 3d1h
kube-system kube-proxy-khc5n 1/1 Running 0 3d1h
kube-system kube-scheduler-m001 1/1 Running 0 3d1h
[root@M001 ~]#
# Logs from one of them
[root@M001 ~]# kubectl logs kube-flannel-ds-hrlb4 -n kube-flannel
Defaulted container "kube-flannel" out of: kube-flannel, install-cni-plugin (init), install-cni (init)
I0619 08:56:44.161875 1 main.go:211] CLI flags config: {etcdEndpoints:http://127.0.0.1:4001,http://127.0.0.1:2379 etcdPrefix:/coreos.com/network etcdKeyfile: etcdCertfile: etcdCAFile: etcdUsername: etcdPassword: version:false kubeSubnetMgr:true kubeApiUrl: kubeAnnotationPrefix:flannel.alpha.coreos.com kubeConfigFile: iface:[] ifaceRegex:[] ipMasq:true ifaceCanReach: subnetFile:/run/flannel/subnet.env publicIP: publicIPv6: subnetLeaseRenewMargin:60 healthzIP:0.0.0.0 healthzPort:0 iptablesResyncSeconds:5 iptablesForwardRules:true netConfPath:/etc/kube-flannel/net-conf.json setNodeNetworkUnavailable:true useMultiClusterCidr:false}
W0619 08:56:44.162426 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0619 08:56:44.172385 1 kube.go:144] Waiting 10m0s for node controller to sync
I0619 08:56:44.172539 1 kube.go:485] Starting kube subnet manager
I0619 08:56:44.176719 1 kube.go:506] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.112.2.0/24]
I0619 08:56:44.176756 1 kube.go:506] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.112.0.0/24]
I0619 08:56:45.173029 1 kube.go:151] Node controller sync successful
I0619 08:56:45.173057 1 main.go:231] Created subnet manager: Kubernetes Subnet Manager - n001
I0619 08:56:45.173062 1 main.go:234] Installing signal handlers
I0619 08:56:45.173321 1 main.go:542] Found network config - Backend type: vxlan
I0619 08:56:45.173344 1 match.go:206] Determining IP address of default interface
I0619 08:56:45.173709 1 match.go:259] Using interface with name ens33 and address 192.168.11.121
I0619 08:56:45.173730 1 match.go:281] Defaulting external address to interface address (192.168.11.121)
I0619 08:56:45.173769 1 vxlan.go:140] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
W0619 08:56:45.182753 1 main.go:595] no subnet found for key: FLANNEL_SUBNET in file: /run/flannel/subnet.env
I0619 08:56:45.182820 1 main.go:481] Current network or subnet (10.112.0.0/12, 10.112.1.0/24) is not equal to previous one (0.0.0.0/0, 0.0.0.0/0), trying to recycle old iptables rules
I0619 08:56:45.183633 1 kube.go:506] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.112.1.0/24]
I0619 08:56:45.204871 1 main.go:356] Setting up masking rules
I0619 08:56:45.207593 1 main.go:407] Changing default FORWARD chain policy to ACCEPT
I0619 08:56:45.209693 1 iptables.go:290] generated 7 rules
I0619 08:56:45.212622 1 main.go:435] Wrote subnet file to /run/flannel/subnet.env
I0619 08:56:45.212704 1 main.go:439] Running backend.
I0619 08:56:45.213367 1 iptables.go:290] generated 3 rules
I0619 08:56:45.214654 1 vxlan_network.go:64] watching for new subnet leases
I0619 08:56:45.215811 1 watch.go:51] Batch elem [0] is { subnet.Event{Type:0, Lease:subnet.Lease{EnableIPv4:true, EnableIPv6:false, Subnet:ip.IP4Net{IP:0xa700200, PrefixLen:0x18}, IPv6Subnet:ip.IP6Net{IP:(*ip.IP6)(nil), PrefixLen:0x0}, Attrs:subnet.LeaseAttrs{PublicIP:0xc0a80b7a, PublicIPv6:(*ip.IP6)(nil), BackendType:"vxlan", BackendData:json.RawMessage{0x7b, 0x22, 0x56, 0x4e, 0x49, 0x22, 0x3a, 0x31, 0x2c, 0x22, 0x56, 0x74, 0x65, 0x70, 0x4d, 0x41, 0x43, 0x22, 0x3a, 0x22, 0x66, 0x36, 0x3a, 0x30, 0x33, 0x3a, 0x36, 0x63, 0x3a, 0x33, 0x65, 0x3a, 0x34, 0x32, 0x3a, 0x63, 0x62, 0x22, 0x7d}, BackendV6Data:json.RawMessage(nil)}, Expiration:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Asof:0}} }
I0619 08:56:45.215870 1 watch.go:51] Batch elem [0] is { subnet.Event{Type:0, Lease:subnet.Lease{EnableIPv4:true, EnableIPv6:false, Subnet:ip.IP4Net{IP:0xa700000, PrefixLen:0x18}, IPv6Subnet:ip.IP6Net{IP:(*ip.IP6)(nil), PrefixLen:0x0}, Attrs:subnet.LeaseAttrs{PublicIP:0xc0a80b78, PublicIPv6:(*ip.IP6)(nil), BackendType:"vxlan", BackendData:json.RawMessage{0x7b, 0x22, 0x56, 0x4e, 0x49, 0x22, 0x3a, 0x31, 0x2c, 0x22, 0x56, 0x74, 0x65, 0x70, 0x4d, 0x41, 0x43, 0x22, 0x3a, 0x22, 0x62, 0x32, 0x3a, 0x35, 0x30, 0x3a, 0x36, 0x38, 0x3a, 0x32, 0x39, 0x3a, 0x31, 0x37, 0x3a, 0x62, 0x35, 0x22, 0x7d}, BackendV6Data:json.RawMessage(nil)}, Expiration:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Asof:0}} }
I0619 08:56:45.219194 1 main.go:460] Waiting for all goroutines to exit
I0619 08:56:45.235656 1 iptables.go:283] bootstrap done
I0619 08:56:45.242457 1 iptables.go:283] bootstrap done
[root@M001 ~]#
g) Check the pods' IP addresses
[root@M001 ~]# kubectl get pods -o wide --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
default ngx 1/1 Running 0 3d 10.112.2.2 n002 <none> <none>
kube-flannel kube-flannel-ds-bssdt 1/1 Running 60 (17m ago) 2d23h 192.168.11.122 n002 <none> <none>
kube-flannel kube-flannel-ds-fc2fb 1/1 Running 60 (18m ago) 2d23h 192.168.11.120 m001 <none> <none>
kube-flannel kube-flannel-ds-hrlb4 1/1 Running 60 (16m ago) 2d23h 192.168.11.121 n001 <none> <none>
kube-system coredns-5bbd96d687-c948c 1/1 Running 0 3d2h 10.112.0.3 m001 <none> <none>
kube-system coredns-5bbd96d687-tpqs6 1/1 Running 0 3d2h 10.112.0.2 m001 <none> <none>
kube-system etcd-m001 1/1 Running 0 3d2h 192.168.11.120 m001 <none> <none>
kube-system kube-apiserver-m001 1/1 Running 0 3d2h 192.168.11.120 m001 <none> <none>
kube-system kube-controller-manager-m001 1/1 Running 0 14m 192.168.11.120 m001 <none> <none>
kube-system kube-proxy-5q92t 1/1 Running 2 (3d ago) 3d1h 192.168.11.121 n001 <none> <none>
kube-system kube-proxy-b245t 1/1 Running 0 3d1h 192.168.11.122 n002 <none> <none>
kube-system kube-proxy-khc5n 1/1 Running 0 3d2h 192.168.11.120 m001 <none> <none>
kube-system kube-scheduler-m001 1/1 Running 0 3d2h 192.168.11.120 m001 <none> <none>
[root@M001 ~]#
[root@M001 ~]# kubectl get po -n kube-flannel -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-flannel-ds-bssdt 1/1 Running 60 (24m ago) 2d23h 192.168.11.122 n002 <none> <none>
kube-flannel-ds-fc2fb 1/1 Running 60 (25m ago) 2d23h 192.168.11.120 m001 <none> <none>
kube-flannel-ds-hrlb4 1/1 Running 60 (23m ago) 2d23h 192.168.11.121 n001 <none> <none>
[root@M001 ~]#
At this point, the problem is solved.
If your situation is different, you can continue with the following steps:
- Check the container configuration: make sure the container's configuration is correct and complete. Check the image, startup command, environment variables and other parameters, and confirm they match what the application expects. A mistake here can make the container fail to start and fall into CrashLoopBackOff.
- Update the container image or configuration: if you find a problem with the image or the configuration, update it. Change the image version or rewrite the configuration as needed, then apply the change with:
kubectl apply -f <config-file>
where <config-file> is the path to the updated configuration file.
- Check resource limits: make sure the resource limits set on the pod (such as CPU and memory) fit the application running in the container. Limits that are too low can keep the application from running properly and push it into CrashLoopBackOff.
- Check dependencies: if the application depends on other services or resources (databases, configuration data, etc.), make sure those dependencies are configured correctly and reachable. Unmet dependencies can crash the application and cause CrashLoopBackOff.
Verification
Create a Pod
Create:
[root@M001 ~]# kubectl run ngx --image=nginx:alpine --port=80
pod/ngx created
[root@M001 ~]#
Check:
[root@M001 ~]# kubectl get pods ngx -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ngx 1/1 Running 0 3d1h 10.112.2.2 n002 <none> <none>
[root@M001 ~]#
# wait until the status becomes Running
Create a SVC
Create:
[root@M001 ~]# kubectl expose pod ngx --target-port 80 --type NodePort
service/ngx exposed
[root@M001 ~]#
# the 80 here refers to port 80 of the ngx pod
Check:
[root@M001 ~]# kubectl get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 3d2h
ngx NodePort 10.107.222.138 <none> 80:31718/TCP 14s
[root@M001 ~]# kubectl get service ngx
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ngx NodePort 10.102.254.255 <none> 80:31718/TCP 20s
[root@M001 ~]#
# To access it, use
# http://192.168.11.120:31718
Verify access
The nginx svc is exposed via NodePort, so you can access port 31718 directly in a browser.
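A quick check from any machine that can reach the node (using the node IP and NodePort shown above) is:
curl http://192.168.11.120:31718
# the default nginx welcome page should come back as HTML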
How do you pin the service's port to a fixed value?
To create a Kubernetes Service bound to a fixed port, create a Service object and specify a port for it.
Here is an example YAML file for a Service with a fixed port:
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: ClusterIP
  ports:
  - protocol: TCP
    port: <port-number>
    targetPort: <target-port>
  selector:
    app: my-app
Replace <port-number> with the port number you want to use and <target-port> with the container port the traffic should be forwarded to; my-service is the Service name you want, and app: my-app is a label selector that ties the Service to pods carrying the app=my-app label.
Save the YAML above as service.yaml, then create the Service with:
kubectl apply -f service.yaml
This creates a Service named my-service bound to the specified port.
Note that this example creates a ClusterIP Service, which is reachable only inside the cluster. If you need to reach it from outside the cluster, consider another Service type such as NodePort or LoadBalancer.
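If you also want the externally visible port to be fixed, a NodePort Service lets you pin it with the nodePort field. This is only a sketch: the selector run: ngx is the label that kubectl run gave the ngx pod created above, and 30080 is an arbitrary choice inside the default NodePort range (30000-32767):
apiVersion: v1
kind: Service
metadata:
  name: ngx-fixed
spec:
  type: NodePort
  selector:
    run: ngx
  ports:
  - protocol: TCP
    port: 80          # port exposed on the ClusterIP
    targetPort: 80    # container port of the ngx pod
    nodePort: 30080   # fixed port opened on every node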
That is all for this article; the next one will cover how to deploy a graphical management system for k8s.
This article was prepared in a hurry and may contain typos, unclear wording or even mistakes; if you spot anything wrong, you are sincerely welcome to leave a comment and I will do my best to correct it.
If you enjoyed this article, please like, comment and share. Thank you!