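Note: in day-to-day operations you will sooner or later need to change the configuration of a k8s node, or take a faulty node in the cluster out of service.
Taking node 172.24.80.2, which needs its capacity increased, as an example, the procedure has three steps:
# Pause scheduling on node 172.24.80.2: the node is marked unschedulable and receives no new pods (pods already running on it are not affected)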
kubectl cordon 172.24.80.2
# Evict the pods running on the node so that they are rescheduled onto other nodes
kubectl drain --ignore-daemonsets --delete-emptydir-data 172.24.80.2
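Notes: --delete-emptydir-data lets the drain proceed even when pods use emptyDir volumes; the emptyDir data is deleted together with the pod.
--ignore-daemonsets skips DaemonSet-managed pods. Without it, drain refuses to proceed, because a deleted DaemonSet pod would simply be recreated on the node by the DaemonSet controller.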
# After the change to the node is complete, make 172.24.80.2 schedulable again
kubectl uncordon 172.24.80.2
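The first two steps above can be wrapped in a small shell function. This is only a minimal sketch: the function name drain_for_maintenance is a hypothetical helper introduced for illustration, while the kubectl subcommands and flags are exactly the ones shown above. The uncordon step is deliberately left manual, since it should only run once the change to the node is finished.

```shell
# Sketch: cordon and drain a node before maintenance.
# drain_for_maintenance is a hypothetical helper name, not part of kubectl.
drain_for_maintenance() {
  local node="${1:-}"
  if [ -z "$node" ]; then
    echo "usage: drain_for_maintenance <node>" >&2
    return 2
  fi
  # Step 1: stop new pods from being scheduled onto the node.
  kubectl cordon "$node" || return 1
  # Step 2: evict running pods; DaemonSet pods are skipped and emptyDir data is deleted.
  kubectl drain --ignore-daemonsets --delete-emptydir-data "$node" || return 1
  # Step 3 is left to the operator, after the node change is done.
  echo "node $node drained; run 'kubectl uncordon $node' when the change is complete"
}
```

Usage: drain_for_maintenance 172.24.80.2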
Below is the full session, including the attempts that failed
[root@cvm-01 ~]# kubectl cordon 172.24.80.2
node/172.24.80.2 cordoned
Running drain without --ignore-daemonsets and --delete-emptydir-data fails, and the error message says to add these flags
[root@cvm-01 ~]# kubectl drain 172.24.80.2
node/172.24.80.2 already cordoned
DEPRECATED WARNING: Aborting the drain command in a list of nodes will be deprecated in v1.23.
The new behavior will make the drain command go through all nodes even if one or more nodes failed during the drain.
For now, users can try such experience via: --ignore-errors
error: unable to drain node "172.24.80.2", aborting command...
There are pending nodes to be drained:
172.24.80.2
cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-system/csi-cbs-node-8r7nw, kube-system/ip-masq-agent-kvbcs, kube-system/kube-proxy-7xzt4, kube-system/node-problem-detector-542vq, kube-system/oom-guard-8wf28, kube-system/serf-agent-rh5sb, kube-system/tke-bridge-agent-tkz8f, kube-system/tke-cni-agent-xcj4g, kube-system/tke-log-agent-8qq72, kube-system/tke-monitor-agent-bfg9n, kube-system/tke-node-exporter-pw2hk, monitoring/node-exporter-pz8wp
cannot delete Pods with local storage (use --delete-emptydir-data to override): monitoring/alertmanager-main-1, monitoring/prometheus-k8s-1
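The error above names the two kinds of pods that block a plain drain. One way to see them in advance is to list the pods on the node together with the kind of object that owns each one. The following is a sketch under assumptions: pods_on_node is a hypothetical helper name, and the OWNER column only shows the first owner reference of each pod. Pods whose owner is DaemonSet are the ones that --ignore-daemonsets skips.

```shell
# Sketch: list every pod scheduled on a node, with the kind of its first owner.
# pods_on_node is a hypothetical helper name, not part of kubectl.
pods_on_node() {
  local node="${1:-}"
  if [ -z "$node" ]; then
    echo "usage: pods_on_node <node>" >&2
    return 2
  fi
  # spec.nodeName is a supported field selector for pods.
  kubectl get pods --all-namespaces \
    --field-selector "spec.nodeName=${node}" \
    -o 'custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,OWNER:.metadata.ownerReferences[0].kind'
}
```

Usage: pods_on_node 172.24.80.2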
Adding only --delete-emptydir-data fails as well, because --ignore-daemonsets is still missing.
[root@cvm-01 ~]# kubectl drain --delete-emptydir-data 172.24.80.2
node/172.24.80.2 already cordoned
DEPRECATED WARNING: Aborting the drain command in a list of nodes will be deprecated in v1.23.
The new behavior will make the drain command go through all nodes even if one or more nodes failed during the drain.
For now, users can try such experience via: --ignore-errors
error: unable to drain node "172.24.80.2", aborting command...
There are pending nodes to be drained:
172.24.80.2
error: cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-system/csi-cbs-node-8r7nw, kube-system/ip-masq-agent-kvbcs, kube-system/kube-proxy-7xzt4, kube-system/node-problem-detector-542vq, kube-system/oom-guard-8wf28, kube-system/serf-agent-rh5sb, kube-system/tke-bridge-agent-tkz8f, kube-system/tke-cni-agent-xcj4g, kube-system/tke-log-agent-8qq72, kube-system/tke-monitor-agent-bfg9n, kube-system/tke-node-exporter-pw2hk, monitoring/node-exporter-pz8wp
The correct way to evict the pods, with both flags
[root@cvm-01 ~]# kubectl drain --ignore-daemonsets --delete-emptydir-data 172.24.80.2
node/172.24.80.2 already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/csi-cbs-node-8r7nw, kube-system/ip-masq-agent-kvbcs, kube-system/kube-proxy-7xzt4, kube-system/node-problem-detector-542vq, kube-system/oom-guard-8wf28, kube-system/serf-agent-rh5sb, kube-system/tke-bridge-agent-tkz8f, kube-system/tke-cni-agent-xcj4g, kube-system/tke-log-agent-8qq72, kube-system/tke-monitor-agent-bfg9n, kube-system/tke-node-exporter-pw2hk, monitoring/node-exporter-pz8wp
evicting pod monitoring/prometheus-k8s-1
evicting pod easypie-pre/pre-easypie-nacos-1
evicting pod kube-system/coredns-78964c5667-f76ds
evicting pod kube-system/serf-holder-7869fcfdf5-6nzkb
evicting pod monitoring/alertmanager-main-1
pod/coredns-78964c5667-f76ds evicted
pod/serf-holder-7869fcfdf5-6nzkb evicted
pod/prometheus-k8s-1 evicted
pod/alertmanager-main-1 evicted
pod/pre-easypie-nacos-1 evicted
node/172.24.80.2 evicted
Screenshot of the session
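After a drain the node stays cordoned until it is explicitly uncordoned, so it is worth checking its state before and after the final step. The following is a sketch, assuming a hypothetical helper named is_cordoned. It reads the node's .spec.unschedulable field, which is true while the node is cordoned.

```shell
# Sketch: succeed (exit 0) if the node is cordoned, fail (exit 1) otherwise.
# is_cordoned is a hypothetical helper name, not part of kubectl.
is_cordoned() {
  local node="${1:-}"
  if [ -z "$node" ]; then
    echo "usage: is_cordoned <node>" >&2
    return 2
  fi
  local flag
  # .spec.unschedulable prints "true" on a cordoned node; otherwise it is empty or false.
  flag="$(kubectl get node "$node" -o jsonpath='{.spec.unschedulable}')" || return 2
  [ "$flag" = "true" ]
}
```

Usage: is_cordoned 172.24.80.2 && echo "172.24.80.2 is still cordoned"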