Environment
Software | Version |
---|---|
Linux | CentOS 7.9 |
k8s | v1.26.9 |
Docker | 25.0.4 |
kube-prometheus | v0.13.0 |
nginx-ingress-controller | v1.10.1 |
Kubernetes cluster information (set up your own cluster beforehand; this article does not cover cluster installation)
Hostname | IP |
---|---|
k8s-master | 192.168.2.11 |
k8s-node01 | 192.168.2.12 |
k8s-node02 | 192.168.2.13 |
I. Install Prometheus Operator
Choose a version, copy its download URL, and download the archive locally:
wget https://github.com/prometheus-operator/kube-prometheus/archive/refs/tags/v0.13.0.tar.gz
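Downloads from GitHub can be slow, so you may want to go through an accelerating proxy. I used: wget https://mirror.ghproxy.com/https://github.com/prometheus-operator/kube-prometheus/archive/refs/tags/v0.13.0.tar.gz
You are free to pick a different version; to compare versions, see: https://github.com/prometheus-operator/kube-prometheus/releases
1. Extract the archive and enter the manifests directory: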
tar -zxvf v0.13.0.tar.gz && cd kube-prometheus-0.13.0/manifests
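Pitfall 1: registry.k8s.io is not reachable from mainland China, so every image address in the manifests that points to that registry has to be replaced.
Some posts recommend the bitnami repository and others registry.aliyuncs.com/google_containers, but pulls from both failed for me. In the end I used docker search and found images described as synced from the official ones. Since this is a test environment, I used them without further checks.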
2. Replace the image addresses
sed -i 's#registry.k8s.io/kube-state-metrics#jerrymei#' kubeStateMetrics-deployment.yaml
sed -i 's#registry.k8s.io/prometheus-adapter#jerrymei#' prometheusAdapter-deployment.yaml
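Before touching the real manifests, it can help to try the substitution on a throwaway file. The sketch below is only an illustration: the sample line and its tag v2.9.2 are assumptions standing in for the image field of kubeStateMetrics-deployment.yaml, the dots in the pattern are escaped so they match literally, and GNU sed is assumed (as on CentOS).

```shell
#!/bin/sh
set -eu

# Throwaway file with one sample line; the tag v2.9.2 is an assumption,
# check the real value in your own kubeStateMetrics-deployment.yaml.
workdir="$(mktemp -d)"
sample="${workdir}/sample-deployment.yaml"
printf '        image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.9.2\n' > "${sample}"

# Same substitution as in the article, with the dots escaped.
sed -i 's#registry\.k8s\.io/kube-state-metrics#jerrymei#' "${sample}"

# Print the rewritten line without its indentation.
result="$(sed 's/^ *//' "${sample}")"
echo "${result}"

# Count what is still pointing at registry.k8s.io (grep -c exits 1 on zero matches).
remaining="$(grep -c 'registry\.k8s\.io' "${sample}" || true)"
echo "remaining registry.k8s.io references: ${remaining}"
```

Running the same grep recursively over the manifests directory after the real substitution shows whether any other file still references the unreachable registry.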
3. Deploy Prometheus
kubectl apply --server-side -f ./setup
kubectl create -f ./
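If the second command reports that a resource kind cannot be found, the CRDs from ./setup may not be registered yet. The kube-prometheus README, as far as I recall, puts a kubectl wait between the two steps. The sketch below wraps that sequence in a function; the function name is hypothetical, and it assumes you are in the manifests directory with a reachable cluster.

```shell
#!/bin/sh
set -eu

# Hypothetical wrapper around the two deployment steps, with a wait in between.
# Assumes the current directory is kube-prometheus-0.13.0/manifests.
deploy_kube_prometheus() {
  # Namespace and CRDs first.
  kubectl apply --server-side -f ./setup
  # Block until every CRD reports the Established condition.
  kubectl wait --for condition=Established --all CustomResourceDefinition --namespace=monitoring
  # Then the remaining manifests, which depend on those CRDs.
  kubectl create -f ./
}

# Nothing is executed until the function is called explicitly:
#   deploy_kube_prometheus
```

Keeping the steps in a function means the block can be sourced safely and only touches the cluster once you call it.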
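Alternatively, pull the images first and retag them with the names the manifests expect (the retagged image must be present on every node that may run the Pod). In that case imagePullPolicy must not be Always, otherwise the kubelet still contacts the registry. I did not find imagePullPolicy set in the manifests; when it is omitted, Kubernetes defaults to IfNotPresent for an image with an explicit tag other than latest, and to Always only when the tag is latest or missing. To check or change the value after deployment, find the Deployment with kubectl -n monitoring get deploy, then run kubectl -n monitoring edit deploy <YOUR DEPLOY NAME>.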
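On the default pull policy: as far as I know, the documented Kubernetes rule is that an omitted imagePullPolicy becomes Always when the image tag is latest or missing, and IfNotPresent when the image has another tag or a digest. The helper below is a hypothetical illustration of that rule, not part of kubectl, and the image names are only examples.

```shell
#!/bin/sh
set -eu

# default_pull_policy IMAGE
# Prints the policy applied when imagePullPolicy is omitted.
# Hypothetical helper for illustration only.
default_pull_policy() {
  image="$1"
  # An image pinned by digest defaults to IfNotPresent.
  case "${image}" in
    *@*) echo "IfNotPresent"; return 0 ;;
  esac
  # Look for a tag only in the last path component, so a registry port
  # such as localhost:5000/app is not mistaken for a tag.
  name="${image##*/}"
  case "${name}" in
    *:*) tag="${name##*:}" ;;
    *)   tag="" ;;
  esac
  case "${tag}" in
    ""|latest) echo "Always" ;;
    *)         echo "IfNotPresent" ;;
  esac
}

default_pull_policy "jerrymei/kube-state-metrics:v2.9.2"   # IfNotPresent
default_pull_policy "nginx"                                 # Always
default_pull_policy "nginx:latest"                          # Always
```

Since the images in this article carry version tags, the effective policy should already be IfNotPresent unless a manifest sets it explicitly.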
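4. Provide external access through Ingress
The cluster needs an ingress controller; I chose the ingress-nginx controller.
If one is already installed, or you are using a different ingress controller, skip this step or refer to the official documentation: https://v1-26.docs.kubernetes.io/zh-cn/docs/concepts/services-networking/ingress-controllers/
1. Install the ingress-nginx controller (it can also be deployed with Helm; see the official documentation for details):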
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.10.1/deploy/static/provider/cloud/deploy.yaml
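Note: the ingress controller pod may fail to pull its image. If that happens, download the YAML file first and change the image to registry.cn-hangzhou.aliyuncs.com/google_containers/nginx-ingress-controller:v1.10.1 (adjust the version to your situation).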
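The image swap can be scripted once deploy.yaml is saved locally. The sketch below runs on a stand-in line, not the real file. It assumes the upstream manifest pins the controller image by tag plus sha256 digest (my recollection, worth verifying in your copy), and the digest shown is a dummy value. Because the replacement image in this article is given by tag only, the digest suffix is dropped together with the repository.

```shell
#!/bin/sh
set -eu

# Stand-in for the controller image line of a locally saved deploy.yaml.
# The all-zero digest is a dummy value for illustration.
workdir="$(mktemp -d)"
manifest="${workdir}/deploy-sample.yaml"
printf '          image: registry.k8s.io/ingress-nginx/controller:v1.10.1@sha256:%s\n' \
  "0000000000000000000000000000000000000000000000000000000000000000" > "${manifest}"

mirror="registry.cn-hangzhou.aliyuncs.com/google_containers/nginx-ingress-controller:v1.10.1"

# Replace repository, tag and optional digest in one go (GNU sed assumed).
sed -i -E "s#registry\.k8s\.io/ingress-nginx/controller:[^@[:space:]]+(@sha256:[0-9a-f]+)?#${mirror}#" "${manifest}"

# Show the resulting image reference.
controller_image="$(sed -E 's/^ *image: //' "${manifest}")"
echo "${controller_image}"
```

Other images in the same file that come from registry.k8s.io may need the same treatment; a grep for registry.k8s.io in the edited file would reveal them.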
2. Deploy an Ingress
kubectl apply -f ingress-prometheus.yaml
ingress-prometheus.yaml:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: monitoring
  name: ingress-monitoring
spec:
  ingressClassName: nginx
  rules:
  - host: "www.prometheus.com"
    http:
      paths:
      - pathType: Prefix
        path: "/"
        backend:
          service:
            name: prometheus-k8s
            port:
              number: 9090
  - host: "www.grafana.com"
    http:
      paths:
      - pathType: Prefix
        path: "/"
        backend:
          service:
            name: grafana
            port:
              number: 3000
  - host: "www.alertmanager.com"
    http:
      paths:
      - pathType: Prefix
        path: "/"
        backend:
          service:
            name: alertmanager-main
            port:
              number: 9093
II. Test access from the local host (firewall and SELinux are disabled)
kubectl get ingress -n monitoring
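Test the domains from inside the network. Change the Host header to the host actually configured for each service; 10.99.98.214 is the ClusterIP of the ingress controller in my cluster, so substitute your own.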
curl -H "host: www.prometheus.com" 10.99.98.214
curl -H "host: www.grafana.com" 10.99.98.214
curl -H "host: www.alertmanager.com" 10.99.98.214
All three returned: 504 Gateway Time-out
Pitfall 2: curling the ingress controller's ClusterIP directly returns 504. My troubleshooting steps follow.
kubectl get pods -n monitoring -owide
kubectl get svc -n monitoring
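1. Checked the IPVS load-balancing rules: the svc-to-pod rules were normal.
ipvsadm -L -n | egrep "3000\s"
2. Entered a pod and curled the services: normal. Pods could reach each other from any pod, and the prometheus, grafana, and alertmanager services were all healthy.
kubectl -n monitoring exec -it grafana-79f47474f7-hxjh9 -- /bin/bash
3. Accessed the svc ClusterIP and the backend pod IP directly: no response from either.
4. Used port-forward to forward a local port to the svc and to the pod in turn: both worked.
kubectl port-forward --address=0.0.0.0 svc/grafana 3000 -nmonitoring
kubectl port-forward --address=0.0.0.0 pod/grafana-79f47474f7-hxjh9 3000 -nmonitoring
5. In the end, a post (https://zhuanlan.zhihu.com/p/624478715) revealed the problem: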
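Fix: the kube-prometheus manifests create NetworkPolicy objects by default, and the services only become reachable after these are deleted manually (the paths below are relative to the kube-prometheus-0.13.0 directory):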
kubectl delete -f manifests/prometheus-networkPolicy.yaml
kubectl delete -f manifests/grafana-networkPolicy.yaml
kubectl delete -f manifests/alertmanager-networkPolicy.yaml
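Reflection: if the NetworkPolicy ingress rules are what blocked access, I was curious why port-forwarding to the svc works while going through the svc's ClusterIP does not; my understanding of Kubernetes networking clearly still has gaps. A likely explanation is that kubectl port-forward is tunneled through the API server and the kubelet and connects from inside the Pod's network namespace, so it never crosses the path where the CNI plugin enforces NetworkPolicy, whereas traffic to the ClusterIP does.
Since the NetworkPolicy rules are the cause, I looked at the NetworkPolicy files of the three services. If you do not want to remove the NetworkPolicy objects, it should also be possible to add a label that the policies allow to the ingress-nginx controller's YAML (not tested).
Alternatively, modify the NetworkPolicy rules of the three services so that they allow a label the ingress controller already carries (tested successfully):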
app.kubernetes.io/name: ingress-nginx
Following this guess, I tried modifying grafana's NetworkPolicy:
kubectl get networkPolicy -n monitoring
kubectl edit networkPolicy -n monitoring grafana
curl -H "host: www.grafana.com" 10.99.98.214
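As a variation on editing the shipped policy, the same effect can probably be reached by adding a separate NetworkPolicy, since policies are additive. The sketch below only writes a manifest file. Its assumptions are: the controller runs in the ingress-nginx namespace (the one the static manifest creates), that namespace carries the automatic kubernetes.io/metadata.name label, the grafana Pods carry app.kubernetes.io/name: grafana, and the policy and file names are hypothetical. A podSelector on its own in a from clause only matches Pods in the policy's own namespace, which is why it is paired with a namespaceSelector here.

```shell
#!/bin/sh
set -eu

# Hypothetical file and policy name; review before applying.
policy_file="grafana-allow-ingress-nginx.yaml"

cat > "${policy_file}" <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: grafana-allow-ingress-nginx
  namespace: monitoring
spec:
  # Assumed label on the grafana Pods; verify with kubectl get pods --show-labels.
  podSelector:
    matchLabels:
      app.kubernetes.io/name: grafana
  policyTypes:
  - Ingress
  ingress:
  - from:
    # Both selectors in one list item: Pods with this label in that namespace.
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx
      podSelector:
        matchLabels:
          app.kubernetes.io/name: ingress-nginx
    ports:
    - protocol: TCP
      port: 3000
EOF

echo "wrote ${policy_file}"
# Apply it once reviewed:
#   kubectl apply -f grafana-allow-ingress-nginx.yaml
```

The same pattern should carry over to prometheus-k8s (port 9090) and alertmanager-main (port 9093) with their respective Pod labels, though I have only sketched the grafana case.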
References:
Kubernetes official documentation, Ingress:
https://v1-26.docs.kubernetes.io/zh-cn/docs/concepts/services-networking/ingress/
kube-prometheus on GitHub:
https://github.com/prometheus-operator/kube-prometheus
ingress-nginx official site:
https://kubernetes.github.io/ingress-nginx/deploy/
Other:
https://zhuanlan.zhihu.com/p/624478715
https://cloud.tencent.com/developer/article/2327634