Node Taints and Pod Tolerations
When we create a Pod, we are effectively choosing a node for it. Taints are applied on the node. You can think of taints and tolerations like two people dating: both sides have flaws, and if one person has a flaw the other cannot tolerate, the relationship will not work out. A node carries some taints; a taint is not "bad" in itself, it is simply a label-like marker attached to the node. If a Pod can tolerate the taints present on a node, that Pod can be scheduled onto the node; if it cannot, the Pod cannot be placed on any node carrying that taint. That is the relationship between taints and tolerations.
Taints are set on nodes, but tolerations are defined inside the Pod.
taint: defined on the node
tolerations: defined in the Pod resource object
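As a quick sketch of what that pairing looks like (the node name, key and value below are placeholders, not from this cluster), a taint on a node and the toleration a pod needs to match it:

# taint applied to the node:
kubectl taint node <node-name> example-key=example-value:NoSchedule

# matching toleration in the pod spec:
tolerations:
- key: "example-key"
  operator: "Equal"
  value: "example-value"
  effect: "NoSchedule"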
The figure above shows three nodes. node1 and node2 carry taints: node1's taints are a spider and a crab, node2's taints are a mouse and a spider.
Suppose pod-A can only tolerate the crab and spider taints. It can then run only on node1 or node3: node3 has no taints at all, so it certainly qualifies; node2 carries a taint pod-A cannot tolerate, so pod-A cannot be scheduled there and cannot run there. What about a pod with no tolerations? pod-B has no tolerations, so it can only run on node3, the node without any taints.
Every cluster we bootstrap comes with a taint already applied to the control-plane (master) node:
[root@k8s-master1 9-7]# kubectl describe nodes k8s-master1
Name: k8s-master1
Roles: control-plane
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=k8s-master1
kubernetes.io/os=linux
node-role.kubernetes.io/control-plane=
node.kubernetes.io/exclude-from-external-load-balancers=
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: 0
projectcalico.org/IPv4Address: 172.17.8.0/16
projectcalico.org/IPv4IPIPTunnelAddr: 192.26.159.128
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Mon, 04 Sep 2023 21:14:35 +0800
Taints: node-role.kubernetes.io/control-plane:NoSchedule ####### the taint
#node-role.kubernetes.io/control-plane:NoSchedule
# NoSchedule means "not schedulable": a pod without a matching toleration cannot be scheduled onto this node
# that is why none of the ordinary pods we create ever run on the master
# except for some special cases, which can run on the master; see below
Unschedulable: false
Lease:
HolderIdentity: k8s-master1
AcquireTime: <unset>
RenewTime: Thu, 07 Sep 2023 11:11:50 +0800
Conditions:
######### remaining output omitted ###########
The system components created when the cluster is first bootstrapped all define tolerations:
[root@k8s-master1 9-7]# kubectl get pods -n kube-system -owide
NAME READY STATUS RESTARTS IP NODE
calico-kube-controllers-6bd6b69df9-2kmtw 1/1 Running 6 (142m ago) 192.26.159.149 k8s-master1
#this pod can run on the master because it defines a matching toleration
calico-node-5hl6q 1/1 Running 7 (142m ago) 172.17.8.0 k8s-master1
#this pod can run on the master because it defines a matching toleration
calico-node-869cl 1/1 Running 7 (141m ago) 172.17.8.3 k8s-node3.guoguo.com
calico-node-qbcbp 1/1 Running 6 (141m ago) 172.17.8.2 k8s-node2.guoguo.com
calico-node-w5vfm 1/1 Running 7 (141m ago) 172.17.8.1 k8s-node1.guoguo.com
calico-typha-77fc8866f5-tg2r8 1/1 Running 7 (141m ago) 172.17.8.1 k8s-node1.guoguo.com
coredns-567c556887-2qwgk 1/1 Running 6 (142m ago) 192.26.159.148 k8s-master1
coredns-567c556887-lqzxx 1/1 Running 6 (142m ago) 192.26.159.147 k8s-master1
etcd-k8s-master1 1/1 Running 6 (142m ago) 172.17.8.0 k8s-master1
kube-apiserver-k8s-master1 1/1 Running 6 (142m ago) 172.17.8.0 k8s-master1
kube-controller-manager-k8s-master1 1/1 Running 6 (142m ago) 172.17.8.0 k8s-master1
kube-proxy-jhcz7 1/1 Running 6 (141m ago) 172.17.8.2 k8s-node2.guoguo.com
kube-proxy-k4crc 1/1 Running 6 (142m ago) 172.17.8.0 k8s-master1
kube-proxy-rnm8f 1/1 Running 7 (141m ago) 172.17.8.1 k8s-node1.guoguo.com
kube-proxy-zn9gn 1/1 Running 6 (141m ago) 172.17.8.3 k8s-node3.guoguo.com
kube-scheduler-k8s-master1 1/1 Running 7 (142m ago) 172.17.8.0 k8s-master1
metrics-server-684999f4d6-8sqk6 1/1 Running 6 (141m ago) 192.28.252.233 k8s-node2.guoguo.com
Let's look at one of the pods running on the master node:
[root@k8s-master1 9-7]# kubectl describe pods -n kube-system calico-node-5hl6q
Name: calico-node-5hl6q
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Service Account: calico-node
Node: k8s-master1/172.17.8.0
Start Time: Mon, 04 Sep 2023 21:16:18 +0800
Labels: controller-revision-hash=87fbfdcf
k8s-app=calico-node
pod-template-generation=1
Annotations: <none>
Status: Running
########### middle of output omitted ########
kube-api-access-c2fc4:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: :NoSchedule op=Exists
#tolerations: this one tolerates any NoSchedule taint
:NoExecute op=Exists
CriticalAddonsOnly op=Exists
node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/network-unavailable:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events: <none>
The network plugin on the master node exists so the master can communicate with the worker nodes; without the network plugin there would be no communication.
By default the master node is created with a taint whose effect is NoSchedule (node-role.kubernetes.io/control-plane:NoSchedule). If a pod cannot tolerate that taint, it cannot run on the master.
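As a minimal sketch (the pod name is made up, and the image just reuses the registry seen elsewhere in this post), a pod that is allowed onto the master would carry a toleration like this:

apiVersion: v1
kind: Pod
metadata:
  name: demo-on-master                              # hypothetical name
spec:
  containers:
  - name: demo
    image: images.guoguo.com/apps/nginx:1.22.1      # image used elsewhere in this post
  tolerations:
  - key: "node-role.kubernetes.io/control-plane"    # the taint kubeadm puts on the master
    operator: "Exists"                              # tolerate it regardless of value
    effect: "NoSchedule"

Keep in mind a toleration only permits scheduling onto the tainted node; it does not force the pod there. To pin the pod to the master you would still combine this with a nodeSelector or node affinity.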
Taint and toleration concepts
The effect field
effect is the taint's "effect identifier" (效用标识): it defines what happens to Pods that do not tolerate the taint. The NoSchedule taint on the master shown above is one example.
effect takes one of three values:
NoSchedule
The first is NoSchedule: a Pod that does not tolerate this taint cannot be scheduled onto the node. It is a hard constraint, but adding the taint has no effect on Pods already running on the node.
In other words, if a pod was already bound to node-1 before the taint existed, and the taint is added to node-1 afterwards, the pods already running on node-1 are not evicted; existing pods are unaffected.
PreferNoSchedule
The second is PreferNoSchedule: a soft version of NoSchedule. The scheduler tries to avoid placing Pods that cannot tolerate this taint onto the node, but this is best effort, not a guarantee: if no better node is available, such a Pod can still end up running there. Adding a taint with this effect likewise does not affect Pods already on the node. Think of it as: you can't stand celery, so I try not to serve you celery, but if celery is all that is left, you end up eating celery.
NoExecute
The third is NoExecute: a new Pod that does not tolerate this taint cannot be scheduled onto the node (a hard constraint), and Pods already running on the node that stop matching, because the node's taints changed or the Pod's tolerations changed, are evicted. In addition, when defining tolerations on a Pod, two operators are supported (see below).
This is the blunt one: if you can't put up with it, out you go.
It affects not only new Pods but existing ones as well.
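For reference, all three effects are applied with the same kubectl taint syntax; the node name and key=value below are placeholders:

kubectl taint node <node-name> demo-key=demo-value:NoSchedule         # hard: new non-tolerating pods are not scheduled here
kubectl taint node <node-name> demo-key=demo-value:PreferNoSchedule   # soft: the scheduler avoids this node when it can
kubectl taint node <node-name> demo-key=demo-value:NoExecute          # hard, and evicts existing non-tolerating pods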
Adding a taint to a node
[root@k8s-master1 9-7]# kubectl taint node k8s-node1.guoguo.com node-key=production:NoSchedule
node/k8s-node1.guoguo.com tainted
#a taint is a key=value pair; you choose the key and the value yourself; after the colon comes the effect, here NoSchedule
#this command adds a taint named node-key=production with effect NoSchedule to node k8s-node1.guoguo.com
#a taint is a Kubernetes mechanism for controlling which nodes a pod may be scheduled onto; with effect NoSchedule, a pod that has no matching toleration will not be scheduled onto this node
#the result is that node k8s-node1.guoguo.com is marked as reserved for production pods; non-production pods will not be scheduled onto it
#in short, the command dedicates node k8s-node1.guoguo.com to production workloads; to undo it, run kubectl taint node k8s-node1.guoguo.com node-key=production:NoSchedule- to remove the taint
In other words, from now on a newly created pod that does not tolerate the node-key=production taint cannot be scheduled onto node1.
Check the taint
[root@k8s-master1 9-7]# kubectl describe nodes k8s-node1.guoguo.com | grep -i taint
Taints: node-key=production:NoSchedule
#-i makes grep case-insensitive
Print the node in YAML format
[root@k8s-master1 9-7]# kubectl get node k8s-node1.guoguo.com -o yaml
apiVersion: v1
kind: Node
metadata:
annotations:
kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: "0"
projectcalico.org/IPv4Address: 172.17.8.1/16
projectcalico.org/IPv4IPIPTunnelAddr: 192.26.131.128
volumes.kubernetes.io/controller-managed-attach-detach: "true"
creationTimestamp: "2023-09-04T13:15:02Z"
labels:
beta.kubernetes.io/arch: amd64
beta.kubernetes.io/os: linux
gpu: ""
kubernetes.io/arch: amd64
kubernetes.io/hostname: k8s-node1.guoguo.com
kubernetes.io/os: linux
rack: "001"
name: k8s-node1.guoguo.com
resourceVersion: "147117"
uid: 6abbae3f-4fe3-47b7-8aba-aabfc097219c
spec:
podCIDR: 172.16.1.0/24
podCIDRs:
- 172.16.1.0/24
  taints:                    # taints
  - effect: NoSchedule       # the effect
    key: node-key            # the taint key
    value: production        # the taint value
status:
addresses:
- address: 172.17.8.1
type: InternalIP
- address: k8s-node1.guoguo.com
type: Hostname
##### remaining output omitted ########
If you add taints with the same key and value but different effects, they count as multiple separate taints.
#add the same key=value again, only with a different effect: the previous one used NoSchedule, this one uses PreferNoSchedule
[root@k8s-master1 9-7]# kubectl taint node k8s-node1.guoguo.com node-key=production:PreferNoSchedule
node/k8s-node1.guoguo.com tainted
Check it:
[root@k8s-master1 9-7]# kubectl describe node k8s-node1.guoguo.com | grep -A1 -i taint
Taints: node-key=production:NoSchedule
node-key=production:PreferNoSchedule
#same key and same value, but a different effect, so it shows up as a distinct taint
Removing a taint
[root@k8s-master1 9-7]# kubectl taint node k8s-node1.guoguo.com node-key:PreferNoSchedule-
node/k8s-node1.guoguo.com untainted
Check again:
[root@k8s-master1 9-7]# kubectl describe node k8s-node1.guoguo.com | grep -A1 -i taint
Taints: node-key=production:NoSchedule
Unschedulable: false
Removing all taints at once
[root@k8s-master1 9-7]# kubectl patch nodes k8s-node1.guoguo.com -p '{"spec":{"taint":[]}}'
Warning: unknown field "spec.taint"
node/k8s-node1.guoguo.com patched (no change)
#note the warning: the field name here is wrong, it is spec.taints (plural), so this patch changed nothing
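A corrected version of the patch, plus an alternative that uses kubectl taint to remove every effect for a given key, would look roughly like this (a sketch; verify against your own cluster before relying on it):

kubectl patch node k8s-node1.guoguo.com -p '{"spec":{"taints":[]}}'   # sets spec.taints to an empty list
kubectl taint node k8s-node1.guoguo.com node-key-                     # removes all taints whose key is node-key, regardless of effect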
Tolerations
[root@k8s-master1 9-7]# kubectl explain pod.spec.tolerations
KIND: Pod
VERSION: v1
RESOURCE: tolerations <[]Object>
DESCRIPTION:
If specified, the pod's tolerations.
The pod this Toleration is attached to tolerates any taint that matches the
triple <key,value,effect> using the matching operator <operator>.
FIELDS:
effect <string>
#the taint effect to tolerate; valid values are NoSchedule, PreferNoSchedule and NoExecute
Effect indicates the taint effect to match. Empty means match all taint
effects. When specified, allowed values are NoSchedule, PreferNoSchedule
and NoExecute.
Possible enum values:
- `"NoExecute"` Evict any already-running pods that do not tolerate the
taint. Currently enforced by NodeController.
- `"NoSchedule"` Do not allow new pods to schedule onto the node unless
they tolerate the taint, but allow all pods submitted to Kubelet without
going through the scheduler to start, and allow all already-running pods to
continue running. Enforced by the scheduler.
- `"PreferNoSchedule"` Like TaintEffectNoSchedule, but the scheduler tries
not to schedule new pods onto the node, rather than prohibiting new pods
from scheduling onto the node entirely. Enforced by the scheduler.
key <string>
#the key of the taint to tolerate
Key is the taint key that the toleration applies to. Empty means match all
taint keys. If the key is empty, operator must be Exists; this combination
means to match all values and all keys.
operator <string>
#operator: defaults to Equal; the other option is Exists
#Equal means the value must be exactly equal (like eq)
Operator represents a key's relationship to the value. Valid operators are
Exists and Equal. Defaults to Equal. Exists is equivalent to wildcard for
value, so that a pod can tolerate all taints of a particular category.
Possible enum values:
- `"Equal"`
- `"Exists"`
tolerationSeconds <integer>
#how long to keep tolerating a taint with the NoExecute effect
#if a node's taint uses the NoExecute effect (forced eviction) and a pod on it does not otherwise tolerate the taint, the pod is evicted; this field defines how many seconds the pod may remain, i.e. an eviction countdown
#note: if you set this field, the effect field above must be NoExecute
TolerationSeconds represents the period of time the toleration (which must
be of effect NoExecute, otherwise this field is ignored) tolerates the
taint. By default, it is not set, which means tolerate the taint forever
(do not evict). Zero and negative values will be treated as 0 (evict
immediately) by the system.
value <string>
#the value of the taint to tolerate; empty means match any value
Value is the taint value the toleration matches to. If the operator is
Exists, the value should be empty, otherwise just a regular string.
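Putting those fields together, a tolerations block usually takes one of the following two shapes (the key and value here are illustrative):

tolerations:
- key: "node-key"            # Equal form: key, value and effect must all match the taint
  operator: "Equal"
  value: "production"
  effect: "NoSchedule"
- key: "node-key"            # Exists form: no value, tolerates any taint with this key
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300     # once a matching NoExecute taint appears, stay at most 300s before eviction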
Once a taint has been defined on a node, if you want a pod to run on that tainted node, all you need to do is add a matching toleration to the pod.
Let's write one.
First check where the current pod is running:
[root@k8s-master1 pod]# kubectl get pods -owide
NAME READY STATUS RESTARTS AGE IP NODE
nginx-test-1 1/1 Running 0 5s 192.26.131.184 k8s-node1.guoguo.com
Now give node1 a taint with the NoExecute effect, one the pod cannot tolerate:
[root@k8s-master1 pod]# kubectl taint node k8s-node1.guoguo.com node-key=production:NoExecute
node/k8s-node1.guoguo.com tainted
[root@k8s-master1 pod]# kubectl get pods -owide
No resources found in default namespace.
The pod was evicted and deleted outright. Because it was a bare pod we created directly, not one managed by a Deployment, eviction simply removed it.
Next add taints to node2 and node3 as well, both with the NoExecute effect:
[root@k8s-master1 pod]# kubectl taint node k8s-node2.guoguo.com node-key=backup:NoExecute
node/k8s-node2.guoguo.com tainted
[root@k8s-master1 pod]# kubectl taint node k8s-node3.guoguo.com node-key=backup:NoExecute
node/k8s-node3.guoguo.com tainted
Now node1's taint is node-key=production:NoExecute,
and node2 and node3 both carry node-key=backup:NoExecute.
Let's write a Deployment whose pods tolerate the taint node-key=production:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tolerations-nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: images.guoguo.com/apps/nginx:1.22.1
        ports:
        - containerPort: 80
      tolerations:               # tolerations
      - key: "node-key"          # taint key
        value: "production"      # taint value; with the default Equal operator it must match exactly (use operator: Exists to match any value)
        effect: NoExecute        # paired with tolerationSeconds below: if the node gains a taint the pod cannot tolerate, the pod is evicted
        tolerationSeconds: 10    # only valid when effect is set to NoExecute
The pods are scheduled onto node1:
[root@k8s-master1 9-7]# kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE
tolerations-nginx-85954fcc7f-25l6g 1/1 Running 0 4s 192.26.131.173 k8s-node1.guoguo.com
tolerations-nginx-85954fcc7f-bl2g5 1/1 Running 0 4s 192.26.131.174 k8s-node1.guoguo.com
tolerations-nginx-85954fcc7f-npngb 1/1 Running 0 4s 192.26.131.175 k8s-node1.guoguo.com
Only node1's taint can be tolerated by these pods;
the taints on the other nodes cannot be tolerated.
Now let's change node1's taints and see whether the pods move. If node1 ends up with only a taint the pods cannot tolerate, they will have nowhere to go unless another node frees up.
Remove node1's old taint (the one the pods tolerate), add a new taint they cannot tolerate, and also remove node2's previous taint (which they could not tolerate), then see whether the pods get rescheduled onto node2.
[root@k8s-master1 ~]# kubectl taint node k8s-node2.guoguo.com node-key:NoExecute-
node/k8s-node2.guoguo.com untainted
#remove node2's taint
[root@k8s-master1 ~]# kubectl taint node k8s-node1.guoguo.com node-key:NoExecute-
node/k8s-node1.guoguo.com untainted
#remove node1's taint (the one the pods could tolerate)
[root@k8s-master1 ~]# kubectl taint node k8s-node1.guoguo.com node-key=backup:NoExecute
node/k8s-node1.guoguo.com tainted
#add a new taint to node1 (one the pods cannot tolerate)
Check again:
[root@k8s-master1 ~]# kubectl get pods -owide
NAME READY STATUS RESTARTS AGE IP NODE
tolerations-nginx-85954fcc7f-6s6z9 1/1 Running 0 3m16s 192.28.252.253 k8s-node2.guoguo.com
tolerations-nginx-85954fcc7f-b2dkc 1/1 Running 0 5m47s 192.28.252.252 k8s-node2.guoguo.com
tolerations-nginx-85954fcc7f-p7ksb 1/1 Running 0 3m16s 192.28.252.254 k8s-node2.guoguo.com
All pods have been rescheduled onto node2: node1 now carries a taint the pods cannot tolerate, while node2 has no taints at all.
[root@k8s-master1 ~]# kubectl describe nodes k8s-node2.guoguo.com | grep -A2 -i taint
Taints: <none>
Unschedulable: false
Lease:
[root@k8s-master1 ~]# kubectl describe nodes k8s-node1.guoguo.com | grep -A2 -i taint
Taints: node-key=backup:NoExecute
Unschedulable: false
Lease: