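Scheduling policies in K8S: node affinity and pod affinity
The previous section experimented with nodeName and nodeSelector. nodeName picks a node by its name, which is still useful in certain scenarios, for example pinning a pod to a particular high-performance node, but it is rather rigid. nodeSelector instead selects nodes by their labels. nodeAffinity also schedules pods by node label, while podAffinity and podAntiAffinity schedule a pod relative to the labels of pods that are already running.
Affinity syntax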
kubectl explain pod.spec.affinity
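requiredDuringSchedulingIgnoredDuringExecution: the scheduler can only schedule the Pod if the rule is met. This works like nodeSelector, but with a more expressive syntax.
preferredDuringSchedulingIgnoredDuringExecution: the scheduler tries to find a node that meets the rule. If no matching node is available, the scheduler still schedules the Pod.
IgnoredDuringExecution means that if the node labels change after Kubernetes has scheduled the Pod, the Pod continues to run.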
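The two rule types can be combined in a single Pod spec. The following is a minimal sketch, not one of the experiments in this post: the pod name affinity-syntax-demo and the labels disktype=ssd and zone=zone-a are hypothetical and chosen only for illustration. The required rule filters the nodes, and the preferred rule then ranks the ones that remain.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-syntax-demo          # hypothetical name
spec:
  containers:
  - name: demo
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ['/bin/sh','-c','sleep 12345']
  affinity:
    nodeAffinity:
      # Hard rule: only nodes labeled disktype=ssd are candidates.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype             # hypothetical label key
            operator: In
            values:
            - ssd
      # Soft rule: among the candidates, favor nodes labeled zone=zone-a.
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50                    # valid range is 1-100
        preference:
          matchExpressions:
          - key: zone                 # hypothetical label key
            operator: In
            values:
            - zone-a
```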
Node affinity (nodeAffinity): hard affinity
requiredDuringSchedulingIgnoredDuringExecution
[root@master-worker-node-1 pod]# cat test-node-affinity.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-node-affinity-1
  labels:
    function: test-node-affinity
spec:
  containers:
  - name: test-node-affinity-1
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ['/bin/sh','-c','sleep 12345']
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: test-node-affinity
            operator: In
            values:
            - target-node
Create the pod
[root@master-worker-node-1 pod]# kubectl apply -f test-node-affinity.yaml
pod/test-node-affinity-1 created
No node carries this label, so the pod cannot be scheduled and does not run
[root@master-worker-node-1 pod]# kubectl get nodes --show-labels | grep target
[root@master-worker-node-1 pod]# kubectl get pods -w
NAME READY STATUS RESTARTS AGE
test-node-affinity-1 0/1 Pending 0 3m50s
The pod stays in the Pending state because it cannot be scheduled
[root@master-worker-node-1 pod]# kubectl describe pods test-node-affinity-1 | tail -4
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 4m42s default-scheduler 0/4 nodes are available: 2 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 4 node(s) didn't match Pod's node affinity/selector. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
Add the label to one of the nodes
[root@master-worker-node-1 pod]# kubectl label nodes only-worker-node-3 test-node-affinity=target-node
node/only-worker-node-3 labeled
Shortly afterwards, the pod is scheduled and runs normally
[root@master-worker-node-1 pod]# kubectl get pods -w
NAME READY STATUS RESTARTS AGE
test-node-affinity-1 0/1 Pending 0 3m50s
test-node-affinity-1 0/1 Pending 0 6m51s
test-node-affinity-1 0/1 ContainerCreating 0 6m51s
test-node-affinity-1 0/1 ContainerCreating 0 6m52s
test-node-affinity-1 1/1 Running 0 6m53s
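The manifest in this section uses a single term with a single expression. As a hedged sketch of how richer required rules compose: all expressions inside one matchExpressions list must match, while separate entries under nodeSelectorTerms are alternatives. The pod name, the gpu label key, and the fallback-node value are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-node-affinity-terms      # hypothetical name
spec:
  containers:
  - name: test-node-affinity-terms
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ['/bin/sh','-c','sleep 12345']
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        # Term 1: both expressions must match (AND).
        - matchExpressions:
          - key: test-node-affinity
            operator: In
            values:
            - target-node
          - key: gpu                  # hypothetical label key
            operator: Exists          # Exists takes no values
        # Term 2: an alternative to term 1 (OR).
        - matchExpressions:
          - key: test-node-affinity
            operator: In
            values:
            - fallback-node           # hypothetical label value
```

With this rule, a node labeled test-node-affinity=fallback-node would qualify even without a gpu label, because matching any one term is enough.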
Node affinity (nodeAffinity): soft affinity
preferredDuringSchedulingIgnoredDuringExecution
First remove the test-node-affinity label that was added to the node in the previous experiment
[root@master-worker-node-1 pod]# kubectl label nodes only-worker-node-3 test-node-affinity-
node/only-worker-node-3 unlabeled
[root@master-worker-node-1 pod]# cat test-node-affinity-2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-node-affinity-2
  labels:
    function: test-node-affinity
spec:
  containers:
  - name: test-node-affinity-2
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ['/bin/sh','-c','sleep 12345']
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 40
        preference:
          matchExpressions:
          - key: test-node-affinity
            operator: In
            values:
            - target-node-1
Create the pod
[root@master-worker-node-1 pod]# kubectl apply -f test-node-affinity-2.yaml
pod/test-node-affinity-2 created
Although no node has the label the pod prefers, the pod is still scheduled and runs normally
[root@master-worker-node-1 pod]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test-node-affinity-2 1/1 Running 0 26s 10.244.54.8 only-worker-node-4 <none> <none>
The weight field of soft affinity only becomes meaningful when there are at least two preferences to choose between
[root@master-worker-node-1 pod]# cat test-node-affinity-3.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-node-affinity-3
  labels:
    function: test-node-affinity
spec:
  containers:
  - name: test-node-affinity-3
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ['/bin/sh','-c','sleep 12345']
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 40   # weight only matters when there are two or more preferences
        preference:
          matchExpressions:
          - key: test-node-affinity
            operator: In
            values:
            - target-node-1
      - weight: 80
        preference:
          matchExpressions:
          - key: test-node-affinity
            operator: In
            values:
            - target-node-2
Add the corresponding labels to the nodes
[root@master-worker-node-1 pod]# kubectl label nodes only-worker-node-3 test-node-affinity=target-node-1
node/only-worker-node-3 labeled
[root@master-worker-node-1 pod]# kubectl label nodes only-worker-node-4 test-node-affinity=target-node-2
node/only-worker-node-4 labeled
Because the preference for target-node-2 carries the larger weight, the pod should, all else being equal, be scheduled onto only-worker-node-4
[root@master-worker-node-1 pod]# kubectl apply -f test-node-affinity-3.yaml
pod/test-node-affinity-3 created
[root@master-worker-node-1 pod]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test-node-affinity-2 1/1 Running 0 18m 10.244.54.8 only-worker-node-4 <none> <none>
test-node-affinity-3 1/1 Running 0 12s 10.244.54.9 only-worker-node-4 <none> <none>
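The result matches the expectation. For each node that passes filtering, the scheduler adds up the weights of the preferences that node satisfies and folds the sum into the node's overall score. A label key holds only one value per node, so no node in the experiment above can match both preferences. A sketch with two different label keys shows how weights can accumulate; the pod name and the disktype and zone labels are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-node-affinity-weights    # hypothetical name
spec:
  containers:
  - name: test-node-affinity-weights
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ['/bin/sh','-c','sleep 12345']
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      # A node with disktype=ssd contributes 80 to its affinity sum.
      - weight: 80
        preference:
          matchExpressions:
          - key: disktype             # hypothetical label key
            operator: In
            values:
            - ssd
      # A node with zone=zone-a contributes 20; a node with both labels sums to 100.
      - weight: 20
        preference:
          matchExpressions:
          - key: zone                 # hypothetical label key
            operator: In
            values:
            - zone-a
```

Other scoring criteria also contribute to a node's final score, so the weights bias the choice rather than dictate it.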
Node anti-affinity
Node anti-affinity has no dedicated syntax. It can be achieved with the NotIn and DoesNotExist operators, or with node taints.
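As a minimal sketch of the operator-based approach, the required rule below keeps a pod away from nodes labeled test-node-affinity=target-node-1 and from any node that carries a dedicated label. The pod name and the dedicated key are assumptions made for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-node-anti-affinity       # hypothetical name
spec:
  containers:
  - name: test-node-anti-affinity
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ['/bin/sh','-c','sleep 12345']
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          # Reject nodes whose test-node-affinity label equals target-node-1.
          # Nodes without the key also pass this expression.
          - key: test-node-affinity
            operator: NotIn
            values:
            - target-node-1
          # Reject nodes that have a "dedicated" label, whatever its value.
          - key: dedicated            # hypothetical label key
            operator: DoesNotExist    # DoesNotExist takes no values
```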
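Summary
Node affinity comes in a hard form (required) and a soft form (preferred)
A required rule must be satisfied before the pod is scheduled; a preferred rule is satisfied on a best-effort basis, and the pod is still scheduled when it cannot be
Node anti-affinity can be achieved with NotIn, DoesNotExist, and node taints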