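1. Fundamentals
1.1 Introduction
From our study of Docker we know that a container environment built on image packaging behaves like a "black box" once it is running: by default we cannot see what the runtime environment inside looks like. To monitor the state of a containerized application in real time, containers provide the inspect method, which lets us actively collect state data, but this approach is too cumbersome.
What we actually need is a way to obtain a container's runtime state promptly. In a container orchestration environment, applications should therefore be designed to proactively expose their runtime data through a data-exposure interface. The most common form is an API endpoint that serves a large number of metrics.
For Pods inside k8s, the common endpoints of this kind are:
  process health   health check endpoint
  metrics          monitoring metrics endpoint
  readiness        container readiness endpoint
  liveness         container liveness endpoint
  tracing          instrumentation (probe) endpoint for end-to-end tracing
  logs             container log endpoint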
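1.2 Probe fields
1.2.1 LivenessProbe
livenessProbe: the liveness probe determines whether a container is healthy. The kubelet probes according to the configuration (a process, a port, whether a command succeeds, and so on) to decide whether the container is working normally.
If the probe fails, the container is considered unhealthy (you can configure how many consecutive failures count as unhealthy). The kubelet then kills the container and handles it according to the restartPolicy set on the Pod, which decides whether the container is restarted.
If no liveness probe is configured, the container is treated as passing (Success) once it has started. In other words, the probe result is always Success, and the Pod status stays Running.
Reference: kubectl explain pod.spec.containers.livenessProbe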
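The "consecutive failures" rule described above can be sketched in a few lines of Python. This is only an illustrative model, not kubelet code: the class name LivenessState and its method record are hypothetical, and the default of 3 mirrors the failureThreshold default listed later in this article.

```python
from dataclasses import dataclass


@dataclass
class LivenessState:
    """Hypothetical helper: tracks consecutive liveness failures for one container."""
    failure_threshold: int = 3
    consecutive_failures: int = 0

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result; return True when the container should be killed."""
        if probe_ok:
            # Any success resets the failure streak.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures >= self.failure_threshold


state = LivenessState(failure_threshold=3)
probe_results = (True, False, False, True, False, False, False)
decisions = [state.record(ok) for ok in probe_results]
print(decisions)
```

Only the last result triggers a kill, because the success in the middle resets the streak of failures.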
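1.2.2 ReadinessProbe
readinessProbe: the readiness probe determines whether the program inside the container is ready to serve requests. Only when the program (service) is working properly does the container start receiving network traffic (started and ready).
After the container starts, it is probed according to the readinessProbe configuration. When the probe succeeds, the result is Success, the ready state becomes true, and the READY column changes from 0/1 to 1/1. If the probe fails, READY stays at 0/1 and the ready state is false. If no readiness probe is configured, the container is treated as Success as soon as it starts.
The relationship between the Pod, its Service, and the Endpoints object is also driven by the Pod's Ready state. If Ready becomes false while the Pod is running, the system automatically removes the Pod from the Endpoints list associated with the Service, so when the Service receives a request, kube-proxy does not route it to this Pod. This mechanism prevents traffic from being forwarded to an unavailable Pod.
When the Pod returns to Ready, it is added back to the Endpoints list, and kube-proxy may again send traffic to it through load balancing.
Reference: kubectl explain pod.spec.containers.readinessProbe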
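As a rough illustration of the Endpoints behaviour described above, the following Python sketch filters a list of Pods by their Ready state. The Pod names, the dictionary layout, and the function ready_endpoints are assumptions made for this example; the real Endpoints controller works on API objects, not on plain dictionaries.

```python
def ready_endpoints(pods):
    """Return the IPs of Pods whose Ready state is true (illustrative model only)."""
    return [pod["ip"] for pod in pods if pod["ready"]]


# Hypothetical Pods behind one Service.
pods = [
    {"name": "web-a", "ip": "10.244.3.81", "ready": True},
    {"name": "web-b", "ip": "10.244.3.82", "ready": False},
]

print(ready_endpoints(pods))   # only web-a receives traffic

# When web-b's readiness probe succeeds again, it rejoins the list.
pods[1]["ready"] = True
print(ready_endpoints(pods))
```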
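1.2.3 StartupProbe
k8s added the startupProbe in version 1.16. It mainly addresses complex programs for which readinessProbe and livenessProbe alone cannot reliably tell whether the program has started or is alive. startupProbe was introduced to support the other two probes.
How startupProbe differs from the other two:
  If all three probes are configured, startupProbe runs first and the other two are temporarily disabled until the container satisfies the startupProbe conditions; only then do the other two start. If the conditions are not met, the container is restarted according to the rules.
  The other two probes keep running, according to their configuration, from container start until the container terminates. startupProbe stops once it has succeeded a single time and performs no further probing.
Reference: kubectl explain pod.spec.containers.startupProbe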
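The gating rule can be modelled as a small Python function. This is a sketch under stated assumptions: active_probes and startup_budget_seconds are hypothetical helper names, and the budget formula (failureThreshold multiplied by periodSeconds) follows the way the Kubernetes documentation describes the maximum startup time; it ignores initialDelaySeconds.

```python
def active_probes(configured, startup_succeeded):
    """Return the set of probes that would currently be running (illustrative model)."""
    if "startup" in configured and not startup_succeeded:
        # While the startup probe has not passed, the other probes are held back.
        return {"startup"}
    # After one success the startup probe stops for good.
    return set(configured) - {"startup"}


def startup_budget_seconds(failure_threshold, period_seconds):
    """Approximate time an application has to start before it is restarted."""
    return failure_threshold * period_seconds


all_three = {"startup", "liveness", "readiness"}
print(active_probes(all_three, startup_succeeded=False))
print(active_probes(all_three, startup_succeeded=True))
print(startup_budget_seconds(failure_threshold=30, period_seconds=10))
```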
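1.3 Probe mechanisms
1.3.1 ExecAction
Runs a command inside the container; the probe succeeds if the command returns successfully (exit code 0).
1.3.2 TCPSocketAction
The probe succeeds if a TCP connection to the port can be opened.
1.3.3 HTTPGetAction
Sends an HTTP GET request to the specified path; a 2xx or 3xx response code means success.
1.3.4 Summary
Note: each of the probes above (liveness, readiness, startup) supports all three of these mechanisms.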
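To make the three mechanisms concrete, here is a minimal Python sketch of equivalent checks using only the standard library. It is not how the kubelet implements probes; the function names and the 1-second default timeout are assumptions for illustration. One difference to keep in mind: urllib follows redirects on its own, so a 3xx status is rarely observed directly here.

```python
import socket
import subprocess
import urllib.error
import urllib.request


def exec_probe(command, timeout=1.0):
    """ExecAction analogue: success means the command exits with status 0."""
    try:
        return subprocess.run(command, timeout=timeout).returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False


def tcp_probe(host, port, timeout=1.0):
    """TCPSocketAction analogue: success means the TCP connection can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_status_ok(status):
    """HTTPGetAction rule: 2xx and 3xx count as success."""
    return 200 <= status < 400


def http_probe(url, timeout=1.0):
    """HTTPGetAction analogue: send a GET request and classify the status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return http_status_ok(response.status)
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; classify them the same way.
        return http_status_ok(err.code)
    except OSError:
        # Connection refused, timeout, DNS failure, and similar errors.
        return False
```

For example, exec_probe(["test", "-e", "/tmp/healthy"]) mirrors the exec probe used in the examples of this article, and http_status_ok(404) returns False.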
1.4 Related fields
spec:
  containers:
  - name: …
    image: …
    livenessProbe:
      exec <Object>                    # command probe
      httpGet <Object>                 # HTTP GET probe
      tcpSocket <Object>               # TCP socket probe
      initialDelaySeconds <integer>    # delay before the first probe is sent
      periodSeconds <integer>          # interval between probes
      timeoutSeconds <integer>         # probe timeout, default 1
      successThreshold <integer>       # consecutive successes required to count as healthy, default 1
      failureThreshold <integer>       # consecutive failures required to count as unhealthy, default 3
Note: only the livenessProbe fields are listed here; readinessProbe has the same fields as livenessProbe.
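The defaults can be applied to a partial probe definition as sketched below. The names PROBE_DEFAULTS and effective_probe are hypothetical. The values for timeoutSeconds, successThreshold, and failureThreshold come from the listing above; initialDelaySeconds = 0 and periodSeconds = 10 are the defaults given in the Kubernetes documentation. The check only knows the three mechanisms covered in this article.

```python
PROBE_DEFAULTS = {
    "initialDelaySeconds": 0,   # default per the Kubernetes documentation
    "periodSeconds": 10,        # default per the Kubernetes documentation
    "timeoutSeconds": 1,
    "successThreshold": 1,
    "failureThreshold": 3,
}

MECHANISMS = ("exec", "httpGet", "tcpSocket")


def effective_probe(spec):
    """Merge a probe definition with the defaults and check its mechanism."""
    merged = {**PROBE_DEFAULTS, **spec}
    chosen = [name for name in MECHANISMS if name in merged]
    if len(chosen) != 1:
        raise ValueError("a probe needs exactly one of: " + ", ".join(MECHANISMS))
    return merged


probe = effective_probe({
    "exec": {"command": ["test", "-e", "/tmp/healthy"]},
    "initialDelaySeconds": 1,
    "periodSeconds": 3,
})
print(probe["periodSeconds"], probe["timeoutSeconds"], probe["failureThreshold"])
```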
2. A simple getting-started example
2.1 exec
2.1.1 Liveness probe YAML
cat >pod-health-cmd.yml<<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec-pod
  namespace: default
spec:
  containers:
  - name: liveness-exec-container
    image: busybox
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh","-c","touch /tmp/healthy; sleep 3; rm -rf /tmp/healthy;sleep 3600"]
    livenessProbe:
      exec:
        command: ["test", "-e","/tmp/healthy"]
      initialDelaySeconds: 1
      periodSeconds: 3
EOF
# What it does: once the container starts, it creates a file, waits 3 seconds, and deletes the file; the liveness probe then checks whether the file exists.
2.1.2 Result
[root@master1 deplay]# kubectl apply -f pod-health-cmd.yml && kubectl get pod liveness-exec-pod -w
pod/liveness-exec-pod created
NAME                READY   STATUS              RESTARTS      AGE
liveness-exec-pod   0/1     ContainerCreating   0             0s
liveness-exec-pod   1/1     Running             0             1s
liveness-exec-pod   1/1     Running             1 (0s ago)    43s
liveness-exec-pod   1/1     Running             2 (0s ago)    85s
liveness-exec-pod   1/1     Running             3 (0s ago)    2m7s
liveness-exec-pod   1/1     Running             4 (0s ago)    2m49s
liveness-exec-pod   1/1     Running             5 (0s ago)    3m31s
# after the 5th restart, the Pod reports CrashLoopBackOff
liveness-exec-pod   0/1     CrashLoopBackOff    5 (0s ago)    4m13s
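CrashLoopBackOff means the kubelet is waiting before it restarts the container again. According to the Kubernetes Pod lifecycle documentation, this delay grows exponentially (10s, 20s, 40s, ...) and is capped at five minutes. The sketch below only reproduces that documented sequence; the function name backoff_delays is hypothetical, and it does not try to explain the exact timestamps in the output above.

```python
def backoff_delays(restarts, base=10, cap=300):
    """Return the documented back-off delay (in seconds) before each restart."""
    return [min(base * 2 ** attempt, cap) for attempt in range(restarts)]


print(backoff_delays(7))   # [10, 20, 40, 80, 160, 300, 300]
```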
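4. Restart policy for probes
# help command
kubectl explain pod.spec.restartPolicy
Always:     always restart the container when it terminates; this is the default policy
OnFailure:  restart the container only when it exits abnormally (non-zero exit code)
Never:      never restart the container when it terminates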
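The three policies reduce to a simple decision on the container's exit code. The following Python function is a minimal sketch of that decision; should_restart is a hypothetical name and not part of any Kubernetes client library.

```python
def should_restart(policy, exit_code):
    """Decide whether a terminated container is restarted under a restartPolicy."""
    if policy == "Always":
        return True
    if policy == "OnFailure":
        return exit_code != 0
    if policy == "Never":
        return False
    raise ValueError(f"unknown restartPolicy: {policy!r}")


for policy in ("Always", "OnFailure", "Never"):
    print(policy, should_restart(policy, 0), should_restart(policy, 1))
```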
5. Liveness probe (liveness)
cat >/usr/local/bin/demo.py<<'EOF'
#!/usr/bin/python3
#
from flask import Flask, request, make_response
import sys, os, getopt, socket, time

app = Flask(__name__)

@app.route('/')
def index():
    return ('kubernetes pod-test v0.1!! ClientIP: {}, ServerName: {}, '
            'ServerIP: {}!\n'.format(request.remote_addr, socket.gethostname(),
                                     socket.gethostbyname(socket.gethostname())))

@app.route('/hostname')
def hostname():
    return ('ServerName: {}\n'.format(socket.gethostname()))

# Health state that can be changed at runtime through POST requests
health_status = {'livez': 'OK', 'readyz': 'OK'}
probe_count = {'livez': 0, 'readyz': 0}

@app.route('/livez', methods=['GET','POST'])
def livez():
    if request.method == 'POST':
        # POST livez=<value> overrides the liveness state
        status = request.form['livez']
        health_status['livez'] = status
        return ''
    else:
        # The first probe is delayed by 5 seconds
        if probe_count['livez'] == 0:
            time.sleep(5)
        probe_count['livez'] += 1
        if health_status['livez'] == 'OK':
            return make_response((health_status['livez']), 200)
        else:
            return make_response((health_status['livez']), 506)

@app.route('/readyz', methods=['GET','POST'])
def readyz():
    if request.method == 'POST':
        # POST readyz=<value> overrides the readiness state
        status = request.form['readyz']
        health_status['readyz'] = status
        return ''
    else:
        # The first probe is delayed by 15 seconds
        if probe_count['readyz'] == 0:
            time.sleep(15)
        probe_count['readyz'] += 1
        if health_status['readyz'] == 'OK':
            return make_response((health_status['readyz']), 200)
        else:
            return make_response((health_status['readyz']), 507)

@app.route('/configs')
def configs():
    return ('DEPLOYENV: {}\nRELEASE: {}\n'.format(os.environ.get('DEPLOYENV'),
                                                  os.environ.get('RELEASE')))

@app.route("/user-agent")
def view_user_agent():
    # user_agent=request.headers.get('User-Agent')
    return('User-Agent: {}\n'.format(request.headers.get('user-agent')))

def main(argv):
    port = 80
    host = '0.0.0.0'
    debug = False

    # Environment variables override the built-in defaults
    if os.environ.get('PORT') is not None:
        port = os.environ.get('PORT')

    if os.environ.get('HOST') is not None:
        host = os.environ.get('HOST')

    # Command-line options override the environment
    try:
        opts, args = getopt.getopt(argv,"vh:p:",["verbose","host=","port="])
    except getopt.GetoptError:
        print('server.py -p <portnumber>')
        sys.exit(2)

    for opt, arg in opts:
        if opt in ("-p", "--port"):
            port = arg
        elif opt in ("-h", "--host"):
            host = arg
        elif opt in ("-v", "--verbose"):
            debug = True

    app.run(host=str(host), port=int(port), debug=bool(debug))

if __name__ == "__main__":
    main(sys.argv[1:])
EOF
Python test code used by the probe examples
5.1 exec
5.1.1 Liveness probe YAML
cat > pod-liveness-exec.yml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec-demo
  namespace: default
spec:
  restartPolicy: OnFailure
  containers:
  - name: demo
    image: 192.168.10.33:80/k8s/pod_test:v0.1
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        command: ["test", "-e","/tmp/healthy"]
      initialDelaySeconds: 5
      timeoutSeconds: 1
      periodSeconds: 5
EOF
# Purpose: probe whether the file exists
5.1.2 Result
]# kubectl apply -f pod-liveness-exec.yml && kubectl get pods -w
pod/liveness-exec-demo created
NAME                 READY   STATUS              RESTARTS      AGE
liveness-exec-demo   0/1     ContainerCreating   0             0s
liveness-exec-demo   1/1     Running             0             1s
liveness-exec-demo   1/1     Running             1 (1s ago)    46s
liveness-exec-demo   1/1     Running             2 (1s ago)    91s
liveness-exec-demo   1/1     Running             3 (1s ago)    2m16s
liveness-exec-demo   1/1     Running             4 (1s ago)    3m1s
liveness-exec-demo   1/1     Running             5 (0s ago)    3m45s
# This shows that when the liveness probe fails, the container is restarted.
5.2 http
5.2.1 Simulated failure
cat >liveness-httpget-fail.yml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: liveness-httpget-pod
spec:
  containers:
  - name: liveness-httpget-container
    image: 192.168.10.33:80/k8s/pod_test:v0.1
    ports:
    - name: http
      containerPort: 80
    livenessProbe:
      httpGet:
        port: http
        path: /index.html
      initialDelaySeconds: 1
      periodSeconds: 3
EOF
-----------------------
# on failure, the container is restarted periodically
]# kubectl apply -f liveness-httpget-fail.yml && kubectl get pods -w
pod/liveness-httpget-pod created
NAME                   READY   STATUS              RESTARTS      AGE
liveness-httpget-pod   0/1     ContainerCreating   0             0s
liveness-httpget-pod   1/1     Running             0             1s
liveness-httpget-pod   1/1     Running             1 (1s ago)    40s
-----------------------
# failure events
]# kubectl describe pod liveness-httpget-pod
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  22s                default-scheduler  Successfully assigned default/liveness-httpget-pod to node1
  Normal   Pulled     22s                kubelet            Container image "192.168.10.33:80/k8s/pod_test:v0.1" already present on machine
  Normal   Created    22s                kubelet            Created container liveness-httpget-container
  Normal   Started    22s                kubelet            Started container liveness-httpget-container
  Warning  Unhealthy  14s (x3 over 20s)  kubelet            Liveness probe failed: HTTP probe failed with statuscode: 404
  Normal   Killing    14s                kubelet            Container liveness-httpget-container failed liveness probe, will be restarted
5.2.2 Simulated success
cat >liveness-httpget-suc.yml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: liveness-httpget-demo
  namespace: default
spec:
  containers:
  - name: demo
    image: 192.168.10.33:80/k8s/pod_test:v0.1
    imagePullPolicy: IfNotPresent
    livenessProbe:
      httpGet:
        path: '/hostname'
        port: 80
        scheme: HTTP
      initialDelaySeconds: 5
      periodSeconds: 3
EOF
---------------------------
]# kubectl apply -f liveness-httpget-suc.yml && kubectl get pods -w
pod/liveness-httpget-demo created
NAME                    READY   STATUS              RESTARTS   AGE
liveness-httpget-demo   0/1     ContainerCreating   0          0s
liveness-httpget-demo   1/1     Running             0          1s
# the probe keeps succeeding
]# kubectl logs --tail 5 liveness-httpget-demo
10.244.3.1 - - [19/Mar/2023 09:49:25] "GET /hostname HTTP/1.1" 200 -
10.244.3.1 - - [19/Mar/2023 09:49:28] "GET /hostname HTTP/1.1" 200 -
10.244.3.1 - - [19/Mar/2023 09:49:31] "GET /hostname HTTP/1.1" 200 -
10.244.3.1 - - [19/Mar/2023 09:49:34] "GET /hostname HTTP/1.1" 200 -
10.244.3.1 - - [19/Mar/2023 09:49:37] "GET /hostname HTTP/1.1" 200 -
5.3 tcpsocket
5.3.1 Simulated failure
cat >liveness-tcpsocket-fail.yml<<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: liveness-tcpsocket-demo
  namespace: default
spec:
  containers:
  - name: demo
    image: 192.168.10.33:80/k8s/pod_test:v0.1
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 801
    livenessProbe:
      tcpSocket:
        port: http
      periodSeconds: 5
      initialDelaySeconds: 5
EOF
-------------------
# the container runs first, then the liveness probe starts; if it fails, the container is restarted
]# kubectl apply -f liveness-tcpsocket-fail.yml && kubectl get pods -w -o wide
pod/liveness-tcpsocket-demo created
NAME                      READY   STATUS              RESTARTS      AGE   IP            NODE    NOMINATED NODE   READINESS GATES
liveness-tcpsocket-demo   0/1     ContainerCreating   0             0s    <none>        node1   <none>           <none>
liveness-tcpsocket-demo   1/1     Running             0             0s    10.244.3.86   node1   <none>           <none>
liveness-tcpsocket-demo   1/1     Running             1 (0s ago)    45s   10.244.3.86   node1   <none>           <none>
-------------------
]# kubectl describe pod liveness-tcpsocket-demo
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  44s                default-scheduler  Successfully assigned default/liveness-tcpsocket-demo to node1
  Normal   Pulled     45s                kubelet            Container image "192.168.10.33:80/k8s/pod_test:v0.1" already present on machine
  Normal   Created    45s                kubelet            Created container demo
  Normal   Started    45s                kubelet            Started container demo
  Warning  Unhealthy  30s (x3 over 40s)  kubelet            Liveness probe failed: dial tcp 10.244.3.86:801: connect: connection refused
  Normal   Killing    30s                kubelet            Container demo failed liveness probe, will be restarted
5.3.2 Simulated success
cat >liveness-tcpsocket-suc.yml<<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: liveness-tcpsocket-demo
  namespace: default
spec:
  containers:
  - name: demo
    image: 192.168.10.33:80/k8s/pod_test:v0.1
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
    livenessProbe:
      tcpSocket:
        port: http
      periodSeconds: 5
      initialDelaySeconds: 5
EOF
-------------------
]# kubectl apply -f liveness-tcpsocket-suc.yml && kubectl get pods -w -o wide
pod/liveness-tcpsocket-demo created
NAME                      READY   STATUS              RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
liveness-tcpsocket-demo   0/1     ContainerCreating   0          0s    <none>        node1   <none>           <none>
liveness-tcpsocket-demo   1/1     Running             0          1s    10.244.3.84   node1   <none>           <none>
6. Readiness probe (readiness)
6.1 http
6.1.1 Simulated failure
cat >readiness-httpget-fail.yml<<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: readiness-httpget-pod
spec:
  containers:
  - name: readiness-httpget-container
    image: busybox
    ports:
    - name: http
      containerPort: 80
    readinessProbe:
      httpGet:
        port: http
        path: /index.html
      initialDelaySeconds: 1
      periodSeconds: 3
EOF
--------------------------
# The busybox image serves no /index.html, so the Pod never becomes ready (READY stays 0/1).
# The restarts below are not caused by the readiness probe: without a command, the busybox container exits right away (Completed) and is restarted until it enters CrashLoopBackOff.
]# kubectl apply -f readiness-httpget-fail.yml && kubectl get pods -w
pod/readiness-httpget-pod created
NAME                    READY   STATUS              RESTARTS      AGE
readiness-httpget-pod   0/1     ContainerCreating   0             0s
readiness-httpget-pod   0/1     Completed           0             3s
readiness-httpget-pod   0/1     Completed           1 (3s ago)    6s
readiness-httpget-pod   0/1     CrashLoopBackOff    1 (2s ago)    7s
--------------------------
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  97s                default-scheduler  Successfully assigned default/readiness-httpget-pod to node1
  Normal   Pulled     95s                kubelet            Successfully pulled image "busybox" in 2.408374442s (2.408383198s including waiting)
  Normal   Pulled     93s                kubelet            Successfully pulled image "busybox" in 2.399310727s (2.399315236s including waiting)
  Normal   Pulled     75s                kubelet            Successfully pulled image "busybox" in 2.446511445s (2.446515052s including waiting)
  Normal   Pulling    49s (x4 over 98s)  kubelet            Pulling image "busybox"
  Normal   Created    47s (x4 over 95s)  kubelet            Created container readiness-httpget-container
  Normal   Started    47s (x4 over 95s)  kubelet            Started container readiness-httpget-container
  Normal   Pulled     47s                kubelet            Successfully pulled image "busybox" in 2.409887055s (2.40989074s including waiting)
  Warning  BackOff    29s (x9 over 92s)  kubelet            Back-off restarting failed container
6.1.2 Simulated success
cat > readiness-httpget-suc.yml<<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: readiness-httpget-demo
  namespace: default
spec:
  containers:
  - name: demo
    image: 192.168.10.33:80/k8s/pod_test:v0.1
    imagePullPolicy: IfNotPresent
    readinessProbe:
      httpGet:
        path: '/readyz'
        port: 80
        scheme: HTTP
      initialDelaySeconds: 15
      timeoutSeconds: 2
      periodSeconds: 5
      failureThreshold: 3
  restartPolicy: Always
EOF
-----------------------------
]# kubectl apply -f readiness-httpget-suc.yml && kubectl get pods -w -o wide
pod/readiness-httpget-demo created
NAME                     READY   STATUS              RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
readiness-httpget-demo   0/1     ContainerCreating   0          0s    <none>        node1   <none>           <none>
readiness-httpget-demo   0/1     Running             0          1s    10.244.3.81   node1   <none>           <none>
readiness-httpget-demo   1/1     Running             0          31s   10.244.3.81   node1   <none>           <none>
-----------------------------
# simulate a failure
]# curl -XPOST -d 'readyz=FAIL' http://10.244.3.81:80/readyz
# simulate a recovery
]# curl -XPOST -d 'readyz=OK' http://10.244.3.81:80/readyz
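The two curl commands can also be issued from Python with the standard library. The sketch below only builds the request, so it can be inspected without a running Pod; build_toggle_request is a hypothetical helper, and the IP address is the one from the output above and will differ in another cluster.

```python
import urllib.parse
import urllib.request


def build_toggle_request(pod_ip, endpoint, status, port=80):
    """Build the form-encoded POST that switches /livez or /readyz in demo.py."""
    body = urllib.parse.urlencode({endpoint: status}).encode()
    url = f"http://{pod_ip}:{port}/{endpoint}"
    return urllib.request.Request(url, data=body, method="POST")


fail_request = build_toggle_request("10.244.3.81", "readyz", "FAIL")
print(fail_request.get_method(), fail_request.full_url, fail_request.data)

# To actually send it from a machine that can reach the Pod network:
# urllib.request.urlopen(fail_request, timeout=2)
```

After the FAIL request is sent, /readyz returns 507, and once failureThreshold consecutive probes have failed the READY column should drop back to 0/1 until readyz=OK is posted.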
6.2 tcpsocket
6.2.1 Simulated failure
# just change the port to one that nothing listens on
cat > readiness-tcpsocket.yml<<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: readiness-tcpsocket-pod
spec:
  containers:
  - name: readiness-tcpsocket-pod
    image: 192.168.10.33:80/k8s/my_nginx:v1
    readinessProbe:
      tcpSocket:
        port: 801
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:
      tcpSocket:
        port: 801
      initialDelaySeconds: 15
      periodSeconds: 20
EOF
# Note: the readiness probe starts first here (initialDelaySeconds 5 versus 15). The two probes run independently; the liveness probe does not wait for the readiness probe to succeed.
-----------
# readiness first, then liveness
]# kubectl apply -f readiness-tcpsocket.yml && kubectl get pods -w -o wide
pod/readiness-tcpsocket-pod created
NAME                      READY   STATUS              RESTARTS      AGE   IP            NODE    NOMINATED NODE   READINESS GATES
readiness-tcpsocket-pod   0/1     ContainerCreating   0             0s    <none>        node1   <none>           <none>
readiness-tcpsocket-pod   0/1     Running             0             1s    10.244.3.83   node1   <none>           <none>
readiness-tcpsocket-pod   0/1     Running             1 (1s ago)    61s   10.244.3.83   node1   <none>           <none>
-----------
# Both probes fail: the Pod never becomes ready, and the failed liveness probe makes the kubelet restart the container repeatedly
]# kubectl describe pod readiness-tcpsocket-pod
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  60s                default-scheduler  Successfully assigned default/readiness-tcpsocket-pod to node1
  Normal   Pulled     1s (x2 over 60s)   kubelet            Container image "192.168.10.33:80/k8s/my_nginx:v1" already present on machine
  Normal   Created    1s (x2 over 60s)   kubelet            Created container readiness-tcpsocket-pod
  Warning  Unhealthy  1s (x7 over 51s)   kubelet            Readiness probe failed: dial tcp 10.244.3.83:801: connect: connection refused
  Warning  Unhealthy  1s (x3 over 41s)   kubelet            Liveness probe failed: dial tcp 10.244.3.83:801: connect: connection refused
  Normal   Killing    1s                 kubelet            Container readiness-tcpsocket-pod failed liveness probe, will be restarted
  Normal   Started    0s (x2 over 60s)   kubelet            Started container readiness-tcpsocket-pod
6.2.2 Simulated success
cat > readiness-tcpsocket.yml<<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: readiness-tcpsocket-pod
spec:
  containers:
  - name: readiness-tcpsocket-pod
    image: 192.168.10.33:80/k8s/my_nginx:v1
    readinessProbe:
      tcpSocket:
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:
      tcpSocket:
        port: 80
      initialDelaySeconds: 15
      periodSeconds: 20
EOF
# Note: the readiness probe starts first here (initialDelaySeconds 5 versus 15); once it succeeds the Pod becomes ready, and the liveness probe begins after its own delay.
-----------
# readiness first, then liveness
]# kubectl apply -f readiness-tcpsocket.yml && kubectl get pods -w -o wide
pod/readiness-tcpsocket-pod created
NAME                      READY   STATUS              RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
readiness-tcpsocket-pod   0/1     ContainerCreating   0          1s    <none>        node1   <none>           <none>
readiness-tcpsocket-pod   0/1     Running             0          1s    10.244.3.82   node1   <none>           <none>
readiness-tcpsocket-pod   1/1     Running             0          11s   10.244.3.82   node1   <none>           <none>
From: https://www.cnblogs.com/ygbh/p/17232266.html