Table of Contents
- Project information
- Project steps
- IP plan
- 1. Deploy the machines in the k8s cluster: one master and two nodes
- 2. Deploy a firewall server and a bastion host to protect the web cluster
- 3. Deploy ansible for automated operations
- 4. Deploy an NFS server and provide data storage for the web cluster via PV, PVC, and volume mounts
- 5. Use HPA to run the nginx and MySQL pods, scaling horizontally when CPU usage exceeds 80%
- 6. Build a CI/CD environment: install and deploy Jenkins and Harbor for code releases
- 7. Use ingress to provide domain-based load balancing for the web services
- 8. Use probes to monitor the web pods and restart them immediately on failure, improving reliability
- 9. Use the dashboard to oversee the resources of the whole web cluster
- 10. Install and deploy Prometheus + Grafana
- Project takeaways
Project information
Project architecture diagram
Project description
Simulate a production-style k8s environment: deploy web, MySQL, NFS, Harbor, Prometheus, Jenkins, and other applications to build a high-performance, highly available web cluster.
Project environment
CentOS 7, k8s, docker, Prometheus, nfs, jumpserver, harbor, ansible, Jenkins, etc.
Project steps
IP plan
k8s-master:192.168.121.101
k8s-node1:192.168.121.102
k8s-node2:192.168.121.103
nfs:192.168.121.104
harbor:192.168.121.105
firewalld:192.168.121.106
jumpserver:192.168.121.107
Prometheus:192.168.121.108
ansible:192.168.121.109
1. Deploy the machines in the k8s cluster: one master and two nodes
Set the hostnames
hostnamectl set-hostname master && bash
hostnamectl set-hostname node1 && bash
hostnamectl set-hostname node2 && bash
Add name resolution entries
vim /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.121.101 master
192.168.121.102 node1
192.168.121.103 node2
Disable firewalld and SELinux
# Stop the firewall
service firewalld stop
# Prevent the firewall from starting at boot
systemctl disable firewalld
# Disable SELinux temporarily
setenforce 0
# Disable SELinux permanently
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
Configure a static IP address (example for the master)
TYPE="Ethernet"
BOOTPROTO="none"
NAME="ens33"
DEVICE="ens33"
ONBOOT="yes"
IPADDR="192.168.121.101"
PREFIX=24
GATEWAY="192.168.121.2"
DNS1=114.114.114.114
Disable the swap partition
To keep performance predictable, k8s does not allow swap by default
# Disable swap temporarily
swapoff -a
# Disable swap permanently: comment out the swap entry in /etc/fstab
vim /etc/fstab
#/dev/mapper/centos-swap swap swap defaults 0 0
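A minimal sketch of doing the same edit non-interactively and then verifying it, assuming the swap entry in /etc/fstab is an uncommented line containing the word swap:
# Comment out every swap entry in /etc/fstab in one step
sed -ri 's/^([^#].*\sswap\s)/#\1/' /etc/fstab
# Verify: the Swap line should report 0 total after swapoff -a (or after a reboot)
free -m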
Tune the Linux kernel parameters
# Load the bridge filtering and overlay modules needed for forwarding pod traffic
modprobe br_netfilter
modprobe overlay
# Persist the modules (creates the file if it does not exist, overwrites it otherwise)
cat << EOF | tee /etc/modules-load.d/k8s.conf
br_netfilter
overlay
EOF
# Write /etc/sysctl.d/k8s.conf
cat > /etc/sysctl.d/k8s.conf <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sysctl --system
Update and configure the package repositories
# Install basic packages
yum install vim wget -y
# Add the Aliyun yum repository
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
# Rebuild the yum metadata cache
yum clean all && yum makecache
# Add the Aliyun Docker CE yum repository
yum install yum-utils -y
yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
Enable ipvs support
kube-proxy will use ipvs for load balancing
# Install the ipset and ipvsadm tools
yum install ipset ipvsadm -y
# Put the required modules in a script so they are loaded automatically after a reboot
cat <<EOF > /etc/sysconfig/modules/ipvs.modules
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4
EOF
# Make the script executable
chmod +x /etc/sysconfig/modules/ipvs.modules
# Run the script
/bin/bash /etc/sysconfig/modules/ipvs.modules
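To confirm the modules were actually loaded, a quick check using only tools already installed above:
# The ip_vs and nf_conntrack modules should all appear in the output
lsmod | grep -e ip_vs -e nf_conntrack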
Synchronize the time
# Start and enable the chronyd service
systemctl start chronyd
systemctl enable chronyd
# Set the time zone
timedatectl set-timezone Asia/Shanghai
Install docker
yum install docker-ce-20.10.24-3.el7 docker-ce-cli-20.10.24-3.el7 containerd.io -y
# Start docker and enable it at boot
systemctl enable --now docker
# Verify docker is running
systemctl status docker
Configure docker
# Create the configuration directory
mkdir -p /etc/docker
# Edit the configuration
cat > /etc/docker/daemon.json <<EOF
{
"registry-mirrors": [
"https://registry.docker-cn.com",
"http://hub-mirror.c.163.com",
"https://reg-mirror.qiniu.com",
"https://docker.mirrors.ustc.edu.cn"
],
"exec-opts": ["native.cgroupdriver=systemd"],
"data-root": "/opt/lib/docker"
}
EOF
# Reload the systemd configuration
systemctl daemon-reload
# Restart the docker service
systemctl restart docker
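A quick sanity check that the options in daemon.json took effect (the driver and path are the ones configured above):
# Should report "Cgroup Driver: systemd" and "Docker Root Dir: /opt/lib/docker"
docker info | grep -i -e "cgroup driver" -e "docker root dir"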
Configure the k8s cluster environment
# Run on every machine in the k8s cluster
# Configure the Kubernetes package repository
cat <<EOF | tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
# Rebuild the local yum cache
yum makecache
# Install the k8s components
yum install -y kubeadm-1.23.17-0 kubelet-1.23.17-0 kubectl-1.23.17-0 --disableexcludes=kubernetes
cat <<EOF > /etc/sysconfig/kubelet
KUBELET_CGROUP_ARGS="--cgroup-driver=systemd"
KUBE_PROXY_MODE="ipvs"
EOF
# Enable kubelet so it starts at boot
systemctl enable --now kubelet
Initialize the cluster
Initialize the master node
# Run on the master
# apiserver-advertise-address is the IP address of the master in the cluster
[root@master ~]# kubeadm init \
--kubernetes-version=v1.23.17 \
--pod-network-cidr=10.224.0.0/16 \
--service-cidr=10.96.0.0/12 \
--apiserver-advertise-address=192.168.121.101 \
--image-repository=registry.aliyuncs.com/google_containers
# On success kubeadm prints follow-up instructions; run the following
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# The kubeadm join command printed at the end will be run on the nodes in a moment
kubeadm join 192.168.121.101:6443 --token 8yrrzd.993ke86hqr3aq72w \
--discovery-token-ca-cert-hash sha256:6efa53848ef3465aeb871304b9614d5b3771c411fe1cbfeb02de974c448d117b
Initialize the node machines
# Run on node1 and node2
# Join the nodes to the cluster
[root@node1 ~]# kubeadm join 192.168.121.101:6443 --token 8yrrzd.993ke86hqr3aq72w \
--discovery-token-ca-cert-hash sha256:6efa53848ef3465aeb871304b9614d5b3771c411fe1cbfeb02de974c448d117b
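If the token has expired or the join command was lost, it can be regenerated on the master with a standard kubeadm subcommand (the token and hash will differ from the values above):
# Prints a fresh kubeadm join command with a new token
kubeadm token create --print-join-command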
Check the cluster node status and assign roles
# Run on the master
[root@master ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master NotReady control-plane,master 15m v1.23.17
node1 NotReady <none> 6m31s v1.23.17
node2 NotReady <none> 6m32s v1.23.17
# Assign the worker role (run on the master)
[root@master ~]# kubectl label node node1 node-role.kubernetes.io/worker=worker
[root@master ~]# kubectl label node node2 node-role.kubernetes.io/worker=worker
Install the Calico network plugin
# Run on the master
[root@master ~]# kubectl apply -f https://docs.projectcalico.org/archive/v3.25/manifests/calico.yaml
# Verify that the STATUS of every node is Ready
[root@master ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master Ready control-plane,master 23m v1.23.17
node1 Ready worker 14m v1.23.17
node2 Ready worker 14m v1.23.17
Switch kube-proxy to ipvs
# Run on the master
[root@master ~]# kubectl edit configmap kube-proxy -n kube-system
--> mode: "ipvs"
# Delete all kube-proxy pods so they are recreated with the new mode
[root@master ~]# kubectl delete pods -n kube-system -l k8s-app=kube-proxy
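Once the kube-proxy pods have been recreated, the ipvs rule table can be inspected on any node with the ipvsadm tool installed earlier:
# Virtual servers for the service network (10.96.0.0/12) should be listed
ipvsadm -Ln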
2. Deploy a firewall server and a bastion host to protect the web cluster
Deploy the firewall server
# The firewall server has two network interfaces
# ens33, NAT mode
IPADDR=192.168.121.106
PREFIX=24
GATEWAY=192.168.121.2
DNS1=114.114.114.114
# ens34, bridged mode
IPADDR=192.168.2.111
PREFIX=24
GATEWAY=192.168.2.1
DNS1=114.114.114.114
# Write the NAT rules script
[root@firewalld ~]# vim nat.sh
#!/bin/bash
# Enable IP forwarding
echo 1 >/proc/sys/net/ipv4/ip_forward
# Stop firewalld
systemctl stop firewalld
systemctl disable firewalld
# Flush the existing rules
iptables -F
iptables -t nat -F
# Add the SNAT rule:
# masquerade the private 192.168.121.0/24 addresses behind the outward-facing interface
iptables -t nat -A POSTROUTING -s 192.168.121.0/24 -o ens34 -j MASQUERADE
iptables -t filter -P INPUT ACCEPT
# On the master inside the cluster, open the required ports
[root@master ~]# cat openport.sh
#!/bin/bash
# Allow ssh
iptables -t filter -A INPUT -p tcp --dport 22 -j ACCEPT
# Allow dns
iptables -t filter -A INPUT -p udp --dport 53 -s 192.168.121.0/24 -j ACCEPT
# Allow dhcp
iptables -t filter -A INPUT -p udp --dport 67 -j ACCEPT
# Allow http and https
iptables -t filter -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -t filter -A INPUT -p tcp --dport 443 -j ACCEPT
# Allow MySQL
iptables -t filter -A INPUT -p tcp --dport 3306 -j ACCEPT
iptables -t filter -P INPUT DROP
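Rules added this way are lost on reboot. A minimal sketch of persisting them, assuming the iptables-services package is acceptable on these CentOS 7 hosts:
# Save the current rules and let the iptables service restore them at boot
yum install iptables-services -y
service iptables save
systemctl enable iptables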
Deploy the bastion host
# Run the one-step Jumpserver installer
[root@jumpserver ~]# curl -sSL https://resource.fit2cloud.com/jumpserver/jumpserver/releases/latest/download/quick_start.sh | bash
# Web access
http://192.168.121.107:80
Default user: admin   Default password: admin
3. Deploy ansible for automated operations
# Set up passwordless ssh: generate a key pair on the ansible host
[root@ansible ~]# ssh-keygen -t rsa
# Copy the public key to the root account's home directory on every server
[root@ansible ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.121.101
[root@ansible ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.121.102
[root@ansible ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.121.103
......
# Write the host inventory
[root@ansible ~]# cd /etc/ansible
[root@ansible ansible]# vim hosts
[master]
192.168.121.101
[node]
192.168.121.102
192.168.121.103
[nfs]
192.168.121.104
[harbor]
192.168.121.105
[firewalld]
192.168.121.106
[jumpserver]
192.168.121.107
[Prometheus]
192.168.121.108
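With the inventory in place, an ad-hoc check confirms that ansible can reach every group defined above (standard ansible modules only):
# Ping every host in the inventory over the passwordless ssh channel
[root@ansible ansible]# ansible all -m ping
# Example of running the same command on a whole group
[root@ansible ansible]# ansible node -m shell -a "uptime"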
4. Deploy an NFS server and provide data storage for the web cluster via PV, PVC, and volume mounts
# Stop the firewall
[root@nfs ~]# service firewalld stop
[root@nfs ~]# systemctl disable firewalld
# Disable SELinux
[root@nfs ~]# setenforce 0
[root@nfs ~]# sed -i '/^SELINUX=/ s/enforcing/disabled/' /etc/selinux/config
Install nfs on the NFS server and on every machine in the web cluster
[root@nfs ~]# yum install nfs-utils -y
Configure the NFS share
# Run on the nfs server
[root@nfs ~]# vim /etc/exports
/web 192.168.121.0/24(rw,no_root_squash,sync)
# Create the shared directory
[root@nfs ~]# mkdir /web
[root@nfs ~]# cd /web
[root@nfs web]# echo "welcome to china" >index.html
# Re-export the nfs shares
[root@nfs web]# exportfs -rv
exporting 192.168.121.0/24:/web
# Restart the service and enable it at boot
[root@nfs web]# systemctl restart nfs && systemctl enable nfs
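Before wiring the share into k8s, it is worth checking from one of the worker nodes that the export is visible (nfs-utils was installed on the cluster machines above):
# List the exports published by the nfs server
[root@node1 ~]# showmount -e 192.168.121.104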
Create a PV backed by the shared directory on the nfs server
# Run on the master
[root@master pv]# mkdir /pv
[root@master pv]# cd /pv/
[root@master pv]# vim nfs-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv-web
labels:
type: pv-web
spec:
capacity:
storage: 10Gi
accessModes:
- ReadWriteMany
storageClassName: nfs
nfs:
path: "/web"
server: 192.168.121.104
readOnly: false
[root@master pv]# kubectl apply -f nfs-pv.yaml
persistentvolume/pv-web created
[root@master pv]# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pv-web 10Gi RWX Retain Available nfs 2m
# Create the PVC
[root@master pv]# vim nfs-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc-web
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 1Gi
storageClassName: nfs
[root@master pv]# kubectl apply -f nfs-pvc.yaml
persistentvolumeclaim/pvc-web created
[root@master pv]# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
pvc-web Bound pv-web 10Gi RWX nfs 11s
# Create the pods
[root@master pv]# vim nginx-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-deployment
labels:
app: nginx
spec:
replicas: 3
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
volumes:
- name: pv-storage-nfs
persistentVolumeClaim:
claimName: pvc-web
containers:
- name: pv-container-nfs
image: nginx
imagePullPolicy: IfNotPresent
ports:
- containerPort: 80
name: "http-server"
volumeMounts:
- mountPath: "/usr/share/nginx/html"
name: pv-storage-nfs
[root@master pv]# kubectl apply -f nginx-deployment.yaml
deployment.apps/nginx-deployment created
[root@master pv]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deployment-5ffbcf544-kqq5p 1/1 Running 0 35s 10.224.166.131 node1 <none> <none>
nginx-deployment-5ffbcf544-sgfxt 1/1 Running 0 35s 10.224.104.3 node2 <none> <none>
nginx-deployment-5ffbcf544-x27kd 1/1 Running 0 35s 10.224.166.130 node1 <none> <none>
# Test
[root@master pv]# curl 10.224.166.131
welcome to china
# Modify the content of index.html on the nfs server
[root@nfs web]# vim index.html
welcome to china
hello world!
[root@master pv]# curl 10.224.166.131
welcome to china
hello world!
5. Use HPA to run the nginx and MySQL pods, scaling horizontally when CPU usage exceeds 80%
Deploy the MySQL pod
# Write the yaml file
[root@master ~]# mkdir /mysql
[root@master ~]# cd /mysql/
[root@master mysql]# vim mysql.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: mysql
name: mysql
spec:
replicas: 1
selector:
matchLabels:
app: mysql
template:
metadata:
labels:
app: mysql
spec:
containers:
- image: mysql:latest
name: mysql
imagePullPolicy: IfNotPresent
env:
- name: MYSQL_ROOT_PASSWORD
value: "123456"
ports:
- containerPort: 3306
---
apiVersion: v1
kind: Service
metadata:
labels:
app: svc-mysql
name: svc-mysql
spec:
selector:
app: mysql
type: NodePort
ports:
- port: 3306
protocol: TCP
targetPort: 3306
nodePort: 30007
# Deploy the pod
[root@master mysql]# kubectl apply -f mysql.yaml
[root@master mysql]# kubectl get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 22h
svc-mysql NodePort 10.101.185.14 <none> 3306:30007/TCP 70s
[root@master mysql]# kubectl get pods
NAME READY STATUS RESTARTS AGE
mysql-597ff9595d-vkx7t 1/1 Running 0 50s
nginx-deployment-5ffbcf544-kqq5p 1/1 Running 0 26m
nginx-deployment-5ffbcf544-sgfxt 1/1 Running 0 26m
nginx-deployment-5ffbcf544-x27kd 1/1 Running 0 26m
# Open a shell inside the MySQL pod
[root@master mysql]# kubectl exec -it mysql-597ff9595d-vkx7t -- bash
bash-4.4# mysql -uroot -p123456
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 8.3.0 MySQL Community Server - GPL
Copyright (c) 2000, 2024, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
Deploy nginx with HPA enabled
# Write the yaml file
[root@master ~]# mkdir /hpa
[root@master ~]# cd /hpa
[root@master hpa]# vim nginx-hpa.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: myweb
spec:
selector:
matchLabels:
run: myweb
template:
metadata:
labels:
run: myweb
spec:
containers:
- name: myweb
image: nginx
imagePullPolicy: IfNotPresent
ports:
- containerPort: 80
resources:
limits:
cpu: 200m
requests:
cpu: 50m
---
apiVersion: v1
kind: Service
metadata:
name: myweb-svc
labels:
run: myweb-svc
spec:
type: NodePort
ports:
- port: 80
targetPort: 80
nodePort: 31000
selector:
run: myweb
[root@master hpa]# kubectl apply -f nginx-hpa.yaml
deployment.apps/myweb created
service/myweb-svc created
# Enable HPA horizontal autoscaling
[root@master hpa]# kubectl autoscale deployment myweb --cpu-percent=80 --min=1 --max=10
horizontalpodautoscaler.autoscaling/myweb autoscaled
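The same autoscaler can also be written declaratively instead of with kubectl autoscale; a minimal sketch using the autoscaling/v2 API available in k8s 1.23 (the file name myweb-hpa.yaml would be illustrative):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myweb
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myweb
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80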
Install metrics-server
# Run on the master
# Download the components.yaml manifest
wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Edit components.yaml: use a mirror image and add the two kubelet flags shown below
image: registry.aliyuncs.com/google_containers/metrics-server:v0.6.0
imagePullPolicy: IfNotPresent
args:
# add the two parameters below
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalDNS,InternalIP,ExternalDNS,ExternalIP,Hostname
[root@master ~]# cat components.yaml
The relevant part of the file now looks like this
spec:
containers:
- args:
- --kubelet-insecure-tls
- --cert-dir=/tmp
- --secure-port=10250
- --kubelet-preferred-address-types=InternalDNS,InternalIP,ExternalDNS,ExternalIP,Hostname
- --kubelet-use-node-status-port
- --metric-resolution=15s
image: registry.aliyuncs.com/google_containers/metrics-server:v0.6.0
imagePullPolicy: IfNotPresent
# Deploy it
[root@master ~]# kubectl apply -f components.yaml
# Check that it is running
[root@master ~]# kubectl get pods -o wide -n kube-system |grep metrics-server
metrics-server-5bd756b4b8-788qj 1/1 Running 0 51s 10.224.104.10 node2 <none> <none>
[root@master ~]# kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
master 163m 8% 1183Mi 32%
node1 88m 4% 1186Mi 32%
node2 76m 3% 656Mi 17%
Run ab load tests from the nfs server
# Install the httpd-tools package
[root@nfs ~]# yum install httpd-tools -y
# Simulate user traffic against the service
[root@nfs ~]# ab -n 1000 -c50 http://192.168.121.101:31000/index.html
# Watch the HPA status on the master
[root@master ~]# kubectl get hpa --watch
# Increase the concurrency and the number of requests
[root@nfs ~]# ab -n 5000 -c100 http://192.168.121.101:31000/index.html
[root@nfs ~]# ab -n 10000 -c200 http://192.168.121.101:31000/index.html
[root@nfs ~]# ab -n 20000 -c400 http://192.168.121.101:31000/index.html
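Alongside the HPA watch, the replica count itself can be followed while the load runs (the label matches the run: myweb selector used in the deployment above):
# Pods should be added as CPU usage passes 80% and removed again a few minutes after the load stops
[root@master ~]# kubectl get pods -l run=myweb -w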
6. Build a CI/CD environment: install and deploy Jenkins and Harbor for code releases
Deploy and install Jenkins
# Install git
[root@master ~]# mkdir /jenkins
[root@master ~]# cd /jenkins
[root@master jenkins]# yum install git -y
# Clone the yaml files
[root@master jenkins]# git clone https://github.com/scriptcamp/kubernetes-jenkins
# Create a namespace
[root@master jenkins]# cd kubernetes-jenkins/
[root@master kubernetes-jenkins]# cat namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
name: devops-tools
[root@master kubernetes-jenkins]# kubectl apply -f namespace.yaml
namespace/devops-tools created
# List the namespaces
[root@master kubernetes-jenkins]# kubectl get ns
NAME STATUS AGE
default Active 39h
devops-tools Active 89s
kube-node-lease Active 39h
kube-public Active 39h
kube-system Active 39h
# Create the service account and the cluster role binding
[root@master kubernetes-jenkins]# vim serviceAccount.yaml
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: jenkins-admin
rules:
- apiGroups: [""]
resources: ["*"]
verbs: ["*"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: jenkins-admin
namespace: devops-tools
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: jenkins-admin
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: jenkins-admin
subjects:
- kind: ServiceAccount
name: jenkins-admin
namespace: devops-tools
[root@master kubernetes-jenkins]# kubectl apply -f serviceAccount.yaml
clusterrole.rbac.authorization.k8s.io/jenkins-admin unchanged
serviceaccount/jenkins-admin unchanged
clusterrolebinding.rbac.authorization.k8s.io/jenkins-admin created
# Create the volume
[root@master kubernetes-jenkins]# vim volume.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: jenkins-pv-volume
labels:
type: local
spec:
storageClassName: local-storage
claimRef:
name: jenkins-pv-claim
namespace: devops-tools
capacity:
storage: 10Gi
accessModes:
- ReadWriteOnce
local:
path: /mnt
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- node1 # change to the name of a node in your own k8s cluster
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: jenkins-pv-claim
namespace: devops-tools
spec:
storageClassName: local-storage
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 3Gi
[root@master kubernetes-jenkins]# kubectl apply -f volume.yaml
storageclass.storage.k8s.io/local-storage created
persistentvolume/jenkins-pv-volume created
persistentvolumeclaim/jenkins-pv-claim created
[root@master kubernetes-jenkins]# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
jenkins-pv-volume 10Gi RWO Retain Bound devops-tools/jenkins-pv-claim local-storage 22s
pv-web 10Gi RWX Retain Bound default/pvc-web nfs 17h
[root@master kubernetes-jenkins]# kubectl describe pv jenkins-pv-volume
Name: jenkins-pv-volume
Labels: type=local
Annotations: <none>
Finalizers: [kubernetes.io/pv-protection]
StorageClass: local-storage
Status: Bound
Claim: devops-tools/jenkins-pv-claim
Reclaim Policy: Retain
Access Modes: RWO
VolumeMode: Filesystem
Capacity: 10Gi
Node Affinity:
Required Terms:
Term 0: kubernetes.io/hostname in [node1]
Message:
Source:
Type: LocalVolume (a persistent volume backed by local storage on a node)
Path: /mnt
Events: <none>
# Deploy Jenkins
[root@master kubernetes-jenkins]# cat deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: jenkins
namespace: devops-tools
spec:
replicas: 1
selector:
matchLabels:
app: jenkins-server
template:
metadata:
labels:
app: jenkins-server
spec:
securityContext:
fsGroup: 1000
runAsUser: 1000
serviceAccountName: jenkins-admin
containers:
- name: jenkins
image: jenkins/jenkins:lts
imagePullPolicy: IfNotPresent
resources:
limits:
memory: "2Gi"
cpu: "1000m"
requests:
memory: "500Mi"
cpu: "500m"
ports:
- name: httpport
containerPort: 8080
- name: jnlpport
containerPort: 50000
livenessProbe:
httpGet:
path: "/login"
port: 8080
initialDelaySeconds: 90
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 5
readinessProbe:
httpGet:
path: "/login"
port: 8080
initialDelaySeconds: 60
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
volumeMounts:
- name: jenkins-data
mountPath: /var/jenkins_home
volumes:
- name: jenkins-data
persistentVolumeClaim:
claimName: jenkins-pv-claim
[root@master kubernetes-jenkins]# kubectl apply -f deployment.yaml
deployment.apps/jenkins created
[root@master kubernetes-jenkins]# kubectl get deploy -n devops-tools
NAME READY UP-TO-DATE AVAILABLE AGE
jenkins 1/1 1 1 7m
[root@master kubernetes-jenkins]# kubectl get pod -n devops-tools
NAME READY STATUS RESTARTS AGE
jenkins-b96f7764f-2gzrk 1/1 Running 0 6m7s
# Expose the Jenkins pod as a service
[root@master kubernetes-jenkins]# cat service.yaml
apiVersion: v1
kind: Service
metadata:
name: jenkins-service
namespace: devops-tools
annotations:
prometheus.io/scrape: 'true'
prometheus.io/path: /
prometheus.io/port: '8080'
spec:
selector:
app: jenkins-server
type: NodePort
ports:
- port: 8080
targetPort: 8080
nodePort: 32000
[root@master kubernetes-jenkins]# kubectl apply -f service.yaml
service/jenkins-service created
[root@master kubernetes-jenkins]# kubectl get svc -n devops-tools
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
jenkins-service NodePort 10.98.133.177 <none> 8080:32000/TCP 8s
# Access the service in a browser via a node IP plus the NodePort
http://192.168.121.101:32000
# Get the initial admin password from inside the pod
[root@master kubernetes-jenkins]# kubectl exec -it jenkins-b96f7764f-2gzrk -n devops-tools -- bash
jenkins@jenkins-b96f7764f-2gzrk:/$ cat /var/jenkins_home/secrets/initialAdminPassword
af75ec6ce21d47f2b111f0e60b69ebb9
Install and deploy harbor
# Prepare a VM with 2 CPU cores and 4 GB of memory
# Configure the Aliyun repositories
[root@harbor ~]# yum install yum-utils -y
[root@harbor ~]# yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
# Install docker
[root@harbor ~]# yum install docker-ce-20.10.6 -y
# Start docker and enable it at boot
[root@harbor ~]# systemctl start docker
[root@harbor ~]# systemctl enable docker.service
# Check the docker and docker compose versions
[root@harbor ~]# docker version
[root@harbor ~]# docker compose version
# Download harbor
[root@harbor ~]# wget https://github.com/goharbor/harbor/releases/download/v2.8.3/harbor-offline-installer-v2.8.3.tgz
# Extract the archive
[root@harbor ~]# ls
anaconda-ks.cfg harbor-offline-installer-v2.8.3.tgz
[root@harbor ~]# tar xf harbor-offline-installer-v2.8.3.tgz
# Edit the configuration file
[root@harbor ~]# ls
anaconda-ks.cfg harbor harbor-offline-installer-v2.8.3.tgz
[root@harbor ~]# cd harbor
[root@harbor harbor]# ls
common.sh harbor.yml.tmpl LICENSE
harbor.v2.8.3.tar.gz install.sh prepare
[root@harbor harbor]# vim harbor.yml.tmpl
# The IP address or hostname to access admin UI and registry service.
# DO NOT use localhost or 127.0.0.1, because Harbor needs to be accessed by external clients.
hostname: 192.168.121.105 # change to this host's IP address
# http related config
http:
# port for http, default is 80. If https enabled, this port will redirect to https port
port: 5001 # the port can be changed
# Leave the rest of the file unchanged
# Install harbor
[root@harbor harbor]# mv harbor.yml.tmpl harbor.yml
[root@harbor harbor]# ./install.sh
[+] Running 9/10
⠇ Network harbor_harbor Created 2.8s
✔ Container harbor-log Started 0.5s
✔ Container registry Started 1.5s
✔ Container harbor-db Started 1.2s
✔ Container harbor-portal Started 1.6s
✔ Container redis Started 1.5s
✔ Container registryctl Started 1.2s
✔ Container harbor-core Started 1.9s
✔ Container harbor-jobservice Started 2.4s
✔ Container nginx Started 2.5s
✔ ----Harbor has been installed and started successfully.----
[root@harbor harbor]# docker compose ps|grep harbor
WARN[0000] /root/harbor/docker-compose.yml: `version` is obsolete
harbor-core goharbor/harbor-core:v2.8.3 "/harbor/entrypoint.…" core 2 minutes ago Up 2 minutes (healthy)
harbor-db goharbor/harbor-db:v2.8.3 "/docker-entrypoint.…" postgresql 2 minutes ago Up 2 minutes (healthy)
harbor-jobservice goharbor/harbor-jobservice:v2.8.3 "/harbor/entrypoint.…" jobservice 2 minutes ago Up 2 minutes (healthy)
harbor-log goharbor/harbor-log:v2.8.3 "/bin/sh -c /usr/loc…" log 2 minutes ago Up 2 minutes (healthy) 127.0.0.1:1514->10514/tcp
harbor-portal goharbor/harbor-portal:v2.8.3 "nginx -g 'daemon of…" portal 2 minutes ago Up 2 minutes (healthy)
nginx goharbor/nginx-photon:v2.8.3 "nginx -g 'daemon of…" proxy 2 minutes ago Up 2 minutes (healthy) 0.0.0.0:5001->8080/tcp, :::5001->8080/tcp
redis goharbor/redis-photon:v2.8.3 "redis-server /etc/r…" redis 2 minutes ago Up 2 minutes (healthy)
registry goharbor/registry-photon:v2.8.3 "/home/harbor/entryp…" registry 2 minutes ago Up 2 minutes (healthy)
registryctl goharbor/harbor-registryctl:v2.8.3 "/home/harbor/start.…" registryctl 2 minutes ago Up 2 minutes (healthy)
# Open the UI in a browser to verify
http://192.168.121.105:5001/
Username: admin
Password: Harbor12345
Create a project: k8s-harbor
Create a user: user
Password: Aa12345678
Grant user "user" access to the k8s-harbor project with the project admin role
# Let the k8s cluster use the harbor registry
[root@master ~]# vim /etc/docker/daemon.json
{
"registry-mirrors": [
"https://registry.docker-cn.com",
"http://hub-mirror.c.163.com",
"https://reg-mirror.qiniu.com",
"https://docker.mirrors.ustc.edu.cn"
],
"insecure-registries":["192.168.121.105:5001"],
"exec-opts": ["native.cgroupdriver=systemd"],
"data-root": "/opt/lib/docker"
}
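The new insecure-registries entry only takes effect after docker reloads its configuration, so on each cluster machine whose daemon.json was changed:
# Re-read daemon.json and restart the docker daemon
[root@master ~]# systemctl daemon-reload
[root@master ~]# systemctl restart docker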
Test pushing and pulling images with harbor
# On the host where harbor was installed, bring the harbor containers back up if needed
[root@harbor ~]# cd /root/harbor
[root@harbor harbor]# docker compose up -d
# Log in to the harbor registry from the master
[root@master ~]# docker login 192.168.121.105:5001
Username: user
Password:
WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store
Login Succeeded
# Tag the nginx image on the master and push it to the registry
[root@master ~]# docker tag nginx:latest 192.168.121.105:5001/k8s-harbor/nginx:latest
[root@master ~]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
192.168.121.105:5001/k8s-harbor/nginx latest 92b11f67642b 7 weeks ago 187MB
[root@master ~]# docker push 192.168.121.105:5001/k8s-harbor/nginx:latest
# Pull the nginx image from the harbor registry on the nfs server
[root@nfs ~]# vim /etc/docker/daemon.json
{
"registry-mirrors": ["https://ruk1gp3w.mirror.aliyuncs.com"],
"insecure-registries" : ["192.168.121.105:5001"]
}
# Restart docker
[root@nfs ~]# systemctl daemon-reload
[root@nfs ~]# systemctl restart docker
# Log in to the harbor registry
[root@nfs ~]# docker login 192.168.121.105:5001
[root@nfs ~]# docker pull 192.168.121.105:5001/k8s-harbor/nginx:latest
[root@nfs ~]# docker images|grep nginx
192.168.121.105:5001/k8s-harbor/nginx latest 92b11f67642b 7 weeks ago 187MB
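The release flow itself, which a Jenkins job would execute, comes down to a few commands. A sketch under the assumption that the application image is called myapp, the tag is v1, and the rollout targets the myweb deployment/container created earlier; all of those names are illustrative and none of them are configured in the steps above:
# Build the application image and push it to harbor (run by the Jenkins job)
docker build -t 192.168.121.105:5001/k8s-harbor/myapp:v1 .
docker login 192.168.121.105:5001 -u user -p Aa12345678
docker push 192.168.121.105:5001/k8s-harbor/myapp:v1
# Roll the new image out to the cluster
kubectl set image deployment/myweb myweb=192.168.121.105:5001/k8s-harbor/myapp:v1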
7. Use ingress to provide domain-based load balancing for the web services
Install the ingress controller
[root@master ~]# mkdir /ingress
[root@master ~]# cd /ingress/
[root@master ingress]# ls
ingress-controller-deploy.yaml kube-webhook-certgen-v1.1.0.tar.gz
ingress-nginx-controllerv1.1.0.tar.gz nginx-svc-1.yaml
ingress.yaml nginx-svc-2.yaml
# Copy the images to every node
[root@master ingress]# scp kube-webhook-certgen-v1.1.0.tar.gz node2:/root
[root@master ingress]# scp kube-webhook-certgen-v1.1.0.tar.gz node1:/root
[root@master ingress]# scp ingress-nginx-controllerv1.1.0.tar.gz node1:/root
[root@master ingress]# scp ingress-nginx-controllerv1.1.0.tar.gz node2:/root
# Load the images on node1 and node2
[root@node1 ~]# docker load -i ingress-nginx-controllerv1.1.0.tar.gz
[root@node1 ~]# docker load -i kube-webhook-certgen-v1.1.0.tar.gz
[root@node1 ~]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
registry.cn-hangzhou.aliyuncs.com/google_containers/nginx-ingress-controller v1.1.0 ae1a7201ec95 2 years ago 285MB
registry.cn-hangzhou.aliyuncs.com/google_containers/kube-webhook-certgen v1.1.1 c41e9fcadf5a 2 years ago 47.7MB
# Start the ingress controller from ingress-controller-deploy.yaml
[root@master ingress]# kubectl apply -f ingress-controller-deploy.yaml
namespace/ingress-nginx created
serviceaccount/ingress-nginx created
configmap/ingress-nginx-controller created
clusterrole.rbac.authorization.k8s.io/ingress-nginx created
clusterrolebinding.rbac.authorization.k8s.io/ingress-nginx created
role.rbac.authorization.k8s.io/ingress-nginx created
rolebinding.rbac.authorization.k8s.io/ingress-nginx created
service/ingress-nginx-controller-admission created
service/ingress-nginx-controller created
deployment.apps/ingress-nginx-controller created
ingressclass.networking.k8s.io/nginx created
validatingwebhookconfiguration.admissionregistration.k8s.io/ingress-nginx-admission created
serviceaccount/ingress-nginx-admission created
clusterrole.rbac.authorization.k8s.io/ingress-nginx-admission created
clusterrolebinding.rbac.authorization.k8s.io/ingress-nginx-admission created
role.rbac.authorization.k8s.io/ingress-nginx-admission created
rolebinding.rbac.authorization.k8s.io/ingress-nginx-admission created
job.batch/ingress-nginx-admission-create created
job.batch/ingress-nginx-admission-patch created
# List the namespaces
[root@master ingress]# kubectl get namespace
NAME STATUS AGE
default Active 44h
devops-tools Active 5h30m
ingress-nginx Active 14s
kube-node-lease Active 44h
kube-public Active 44h
kube-system Active 44h
# Check the related services
[root@master ingress]# kubectl get svc -n ingress-nginx
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ingress-nginx-controller NodePort 10.101.156.215 <none> 80:32023/TCP,443:30325/TCP 2m47s
ingress-nginx-controller-admission ClusterIP 10.105.220.120 <none> 443/TCP 2m47s
# Check the related pods
[root@master ingress]# kubectl get pod -n ingress-nginx
NAME READY STATUS RESTARTS AGE
ingress-nginx-admission-create-cpd95 0/1 Completed 0 3m28s
ingress-nginx-admission-patch-jdk4w 0/1 Completed 1 3m28s
ingress-nginx-controller-7cd558c647-2d878 1/1 Running 0 3m28s
ingress-nginx-controller-7cd558c647-ct69k 1/1 Running 0 3m28s
Create the pods and expose them as services
[root@master ingress]# vim nginx-svc-1.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-deploy-1
labels:
app: nginx-1
spec:
replicas: 3
selector:
matchLabels:
app: nginx-1
template:
metadata:
labels:
app: nginx-1
spec:
containers:
- name: nginx-1
image: nginx
imagePullPolicy: IfNotPresent
ports:
- containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
name: nginx-svc-1
labels:
app: nginx-svc-1
spec:
selector:
app: nginx-1
ports:
- name: name-of-service-port
protocol: TCP
port: 80
targetPort: 80
[root@master ingress]# vim nginx-svc-2.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-deploy-2
labels:
app: nginx-2
spec:
replicas: 3
selector:
matchLabels:
app: nginx-2
template:
metadata:
labels:
app: nginx-2
spec:
containers:
- name: nginx-2
image: nginx
imagePullPolicy: IfNotPresent
ports:
- containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
name: nginx-svc-2
labels:
app: nginx-svc-2
spec:
selector:
app: nginx-2
ports:
- name: name-of-service-port
protocol: TCP
port: 80
targetPort: 80
[root@master ingress]# kubectl apply -f nginx-svc-1.yaml
deployment.apps/nginx-deploy-1 created
service/nginx-svc-1 created
[root@master ingress]# kubectl apply -f nginx-svc-2.yaml
deployment.apps/nginx-deploy-2 created
service/nginx-svc-2 created
Create the ingress that ties the ingress controller to the services
[root@master ingress]# vim ingress-url.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: simple-fanout-example
annotations:
kubernetes.io/ingress.class: nginx
spec:
ingressClassName: nginx
rules:
- host: www.qqx.com
http:
paths:
- path: /foo
pathType: Prefix
backend:
service:
name: nginx-svc-1
port:
number: 80
- path: /bar
pathType: Prefix
backend:
service:
name: nginx-svc-2
port:
number: 80
[root@master ingress]# kubectl apply -f ingress-url.yaml
ingress.networking.k8s.io/simple-fanout-example created
[root@master ingress]# kubectl get ingress
NAME CLASS HOSTS ADDRESS PORTS AGE
simple-fanout-example nginx www.qqx.com 192.168.121.102,192.168.121.103 80 44m
Test from the nfs server; name resolution records must be added to its /etc/hosts first
[root@nfs ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.121.102 www.qqx.com
192.168.121.103 www.qqx.com
# Enter one of the pods behind service nginx-svc-1
[root@master ingress]# kubectl exec -it nginx-deploy-1-75d4755db9-bfkd2 -- bash
# Create the foo directory and an index.html page
root@nginx-deploy-1-75d4755db9-bfkd2:/# cd /usr/share/nginx/html/
root@nginx-deploy-1-75d4755db9-bfkd2:/usr/share/nginx/html# mkdir foo
root@nginx-deploy-1-75d4755db9-bfkd2:/usr/share/nginx/html# cd foo/
root@nginx-deploy-1-75d4755db9-bfkd2:/usr/share/nginx/html/foo# echo "this is www.qqx.com/foo/" >index.html
root@nginx-deploy-1-75d4755db9-bfkd2:/usr/share/nginx/html/foo# exit
# Likewise, enter a pod behind service nginx-svc-2 and create the bar directory and an index.html page
[root@master ingress]# kubectl exec -it nginx-deploy-2-5c47798b5f-bnpxj -- bash
root@nginx-deploy-2-5c47798b5f-bnpxj:/# cd /usr/share/nginx/html/
root@nginx-deploy-2-5c47798b5f-bnpxj:/usr/share/nginx/html# mkdir bar
root@nginx-deploy-2-5c47798b5f-bnpxj:/usr/share/nginx/html# cd bar/
root@nginx-deploy-2-5c47798b5f-bnpxj:/usr/share/nginx/html/bar# echo "this is www.qqx.com/bar/" >index.html
root@nginx-deploy-2-5c47798b5f-bnpxj:/usr/share/nginx/html/bar# exit
# Back on the nfs server
[root@nfs ~]# curl www.qqx.com/foo/index.html
this is www.qqx.com/foo/
[root@nfs ~]# curl www.qqx.com/bar/index.html
this is www.qqx.com/bar/
8. Use probes to monitor the web pods and restart them immediately on failure, improving reliability
[root@master /]# mkdir /probe
[root@master /]# cd /probe/
# To distinguish these from the myweb objects in step 5, everything below uses the name myweb2
[root@master probe]# vim myweb2.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: myweb2
name: myweb2
spec:
replicas: 3
selector:
matchLabels:
app: myweb2
template:
metadata:
labels:
app: myweb2
spec:
containers:
- name: myweb2
image: nginx
imagePullPolicy: IfNotPresent
ports:
- containerPort: 8000
resources:
limits:
cpu: 300m
requests:
cpu: 100m
livenessProbe:
exec:
command:
- ls
- /tmp
initialDelaySeconds: 5
periodSeconds: 5
readinessProbe:
exec:
command:
- ls
- /tmp
initialDelaySeconds: 5
periodSeconds: 5
startupProbe:
httpGet:
path: /
port: 8000
failureThreshold: 30
periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
labels:
app: myweb2-svc
name: myweb2-svc
spec:
selector:
app: myweb2
type: NodePort
ports:
- port: 8000
protocol: TCP
targetPort: 8000
nodePort: 30001
[root@master probe]# kubectl apply -f myweb2.yaml
deployment.apps/myweb2 unchanged
service/myweb2-svc created
[root@master probe]# kubectl get pod |grep -i myweb2
myweb2-7c4dcb8459-7vn8d 1/1 Running 2 (84s ago) 11m
myweb2-7c4dcb8459-jxdpf 1/1 Running 2 (84s ago) 11m
myweb2-7c4dcb8459-zc9n7 1/1 Running 2 (84s ago) 11m
[root@master probe]# kubectl describe pod myweb2-7c4dcb8459-zc9n7
Liveness: exec [ls /tmp] delay=5s timeout=1s period=5s #success=1 #failure=3
Readiness: exec [ls /tmp] delay=5s timeout=1s period=5s #success=1 #failure=3
Startup: http-get http://:8000/ delay=0s timeout=1s period=10s #success=1 #failure=30
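A simple way to see the liveness probe act is to break the condition it checks in one pod: the exec probes above run ls /tmp, so removing that directory should trigger a restart (the pod name is taken from the listing above):
# Remove the directory the probe lists, then watch the RESTARTS counter increase
[root@master probe]# kubectl exec myweb2-7c4dcb8459-zc9n7 -- rm -rf /tmp
[root@master probe]# kubectl get pod myweb2-7c4dcb8459-zc9n7 -w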
9. Use the dashboard to oversee the resources of the whole web cluster
[root@master ~]# mkdir /dashboard
[root@master ~]# cd /dashboard
[root@master dashboard]# vim recommended.yaml
---
kind: Service
apiVersion: v1
metadata:
labels:
k8s-app: kubernetes-dashboard
name: kubernetes-dashboard
namespace: kubernetes-dashboard
spec:
# expose the service as NodePort
type: NodePort
ports:
- port: 443
targetPort: 8443
# the port opened on the nodes
nodePort: 30088
selector:
k8s-app: kubernetes-dashboard
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: serviceaccount-cluster-admin
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cluster-admin
subjects:
- kind: User
apiGroup: rbac.authorization.k8s.io
name: system:serviceaccount:kubernetes-dashboard:kubernetes-dashboard
---
[root@master dashboard]# kubectl apply -f recommended.yaml
namespace/kubernetes-dashboard created
serviceaccount/kubernetes-dashboard created
service/kubernetes-dashboard created
secret/kubernetes-dashboard-certs created
secret/kubernetes-dashboard-csrf created
secret/kubernetes-dashboard-key-holder created
configmap/kubernetes-dashboard-settings created
role.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrole.rbac.authorization.k8s.io/kubernetes-dashboard created
rolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
deployment.apps/kubernetes-dashboard created
service/dashboard-metrics-scraper created
deployment.apps/dashboard-metrics-scraper created
# List the namespaces
[root@master dashboard]# kubectl get ns
NAME STATUS AGE
default Active 2d13h
devops-tools Active 22h
ingress-nginx Active 16h
kube-node-lease Active 2d13h
kube-public Active 2d13h
kube-system Active 2d13h
kubernetes-dashboard Active 20m
# Check that the pods are running
[root@master dashboard]# kubectl get pods -n kubernetes-dashboard
NAME READY STATUS RESTARTS AGE
dashboard-metrics-scraper-799d786dbf-mqrlw 1/1 Running 0 24s
kubernetes-dashboard-546cbc58cd-9s6gk 1/1 Running 0 24s
# Check that the service is up
[root@master dashboard]# kubectl get svc -n kubernetes-dashboard
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
dashboard-metrics-scraper ClusterIP 10.97.45.28 <none> 8000/TCP 32s
kubernetes-dashboard NodePort 10.111.113.122 <none> 443:30088/TCP 32s
Open https://192.168.121.101:30088/ in a browser
# Get the login token
[root@master dashboard]# kubectl get secret -n kubernetes-dashboard|grep dashboard-token
kubernetes-dashboard-token-ltt6g kubernetes.io/service-account-token 3 31m
[root@master dashboard]# kubectl describe secret -n kubernetes-dashboard kubernetes-dashboard-token-ltt6g
10. Install and deploy Prometheus + Grafana
Install Prometheus
[root@prometheus ~]# mkdir /prom
[root@prometheus ~]# cd /prom
# Copy the Prometheus tarball into the current directory
[root@prometheus /prom]# ls
prometheus-2.43.0.linux-amd64.tar.gz
# Extract it
[root@prometheus prom]# tar xf prometheus-2.43.0.linux-amd64.tar.gz
[root@prometheus prom]# ls
prometheus-2.43.0.linux-amd64 prometheus-2.43.0.linux-amd64.tar.gz
[root@prometheus prom]# mv prometheus-2.43.0.linux-amd64 prometheus
[root@prometheus prom]# cd prometheus
# Add the directory to PATH
[root@prometheus prometheus]# PATH=/prom/prometheus:$PATH
[root@prometheus prometheus]# vim /etc/profile
PATH=/prom/prometheus:$PATH # append at the end of the file
# Run Prometheus in the background
[root@prometheus prometheus]# nohup prometheus --config.file=/prom/prometheus/prometheus.yml &
# Check the Prometheus process
[root@prometheus prometheus]# ps aux|grep prom
root 8197 1.4 2.1 798956 40900 pts/0 Sl 14:56 0:00 prometheus --config.file=/prom/prometheus/prometheus.yml
root 8204 0.0 0.0 112824 972 pts/0 S+ 14:56 0:00 grep --color=auto prom
# Check the Prometheus port
[root@prometheus prometheus]# netstat -anplut | grep prom
tcp6 0 0 :::9090 :::* LISTEN 8197/prometheus
tcp6 0 0 ::1:9090 ::1:41618 ESTABLISHED 8197/prometheus
tcp6 0 0 ::1:41618 ::1:9090 ESTABLISHED 8197/prometheus
# Stop the firewall
[root@prometheus prometheus]# service firewalld stop
Redirecting to /bin/systemctl stop firewalld.service
[root@prometheus prometheus]# systemctl disable firewalld
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
# Create a systemd unit for the Prometheus service
[root@prometheus prometheus]# vim /usr/lib/systemd/system/prometheus.service
[Unit]
Description=prometheus
[Service]
ExecStart=/prom/prometheus/prometheus --config.file=/prom/prometheus/prometheus.yml
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
[Install]
WantedBy=multi-user.target
# Disable SELinux
[root@prometheus prometheus]# setenforce 0
[root@prometheus prometheus]# sed -i '/^SELINUX=/ s/enforcing/disabled/' /etc/selinux/config
# Reload the systemd configuration
[root@prometheus prometheus]# systemctl daemon-reload
[root@prometheus prometheus]# ps aux|grep prom
root 8734 0.0 2.6 930284 48580 pts/0 Sl 17:55 0:00 prometheus --config.file=/prom/prometheus/prometheus.yml
[root@prometheus prometheus]# kill -9 8734
[root@prometheus prometheus]# service prometheus restart
# Open IP:9090 in a browser
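With the unit file in place, the service can also be enabled at boot and checked like any other systemd service:
[root@prometheus prometheus]# systemctl enable prometheus
[root@prometheus prometheus]# systemctl status prometheus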
Install an exporter on every machine that should be monitored
# Run on the machine being monitored; here we use the master
[root@master ~]# tar xf node_exporter-1.4.0-rc.0.linux-amd64.tar.gz
[root@master ~]# mv node_exporter-1.4.0-rc.0.linux-amd64 /node_exporter
[root@master ~]# cd /node_exporter/
[root@master node_exporter]# ls
LICENSE node_exporter NOTICE
# Update the PATH variable
[root@master node_exporter]# PATH=/node_exporter/:$PATH
[root@master node_exporter]# vim /root/.bashrc
PATH=/node_exporter/:$PATH
# Run it in the background
[root@master node_exporter]# nohup node_exporter --web.listen-address 0.0.0.0:8090 &
[1] 4844
[root@master node_exporter]# nohup: ignoring input and appending output to 'nohup.out'
# Check the process
[root@master node_exporter]# ps aux |grep node_exporter
root 84412 0.0 0.3 716544 13104 pts/0 Sl 15:51 0:00 node_exporter --web.listen-address 0.0.0.0:8090
root 84846 0.0 0.0 112824 980 pts/0 S+ 15:52 0:00 grep --color=auto node_exporter
# Stop the firewall on the monitored machine
[root@master ~]# service firewalld stop
Redirecting to /bin/systemctl stop firewalld.service
[root@master ~]# systemctl disable firewalld
# Disable SELinux
setenforce 0
sed -i '/^SELINUX=/ s/enforcing/disabled/' /etc/selinux/config
Open IP:8090 in a browser
# Make node_exporter start at boot
[root@master node_exporter]# vim /etc/rc.local
nohup /node_exporter/node_exporter --web.listen-address 0.0.0.0:8090 &
[root@master node_exporter]# chmod +x /etc/rc.d/rc.local
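Instead of rc.local, the exporter could also be run as a systemd unit, mirroring what was done for Prometheus; a minimal sketch (the unit file name node_exporter.service is illustrative):
[root@master ~]# vim /usr/lib/systemd/system/node_exporter.service
[Unit]
Description=node_exporter
[Service]
ExecStart=/node_exporter/node_exporter --web.listen-address 0.0.0.0:8090
Restart=on-failure
[Install]
WantedBy=multi-user.target
[root@master ~]# systemctl daemon-reload && systemctl enable --now node_exporter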
Add the machine to /prom/prometheus/prometheus.yml on the Prometheus server
- job_name: "prometheus"
static_configs:
- targets: ["localhost:9090"]
- job_name: "master"
static_configs:
- targets: ["192.168.121.101:8090"]
# Restart the Prometheus service to pick up the change
[root@prometheus prometheus]# service prometheus restart
Redirecting to /bin/systemctl restart prometheus.service
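The edited configuration can also be validated before a restart with the promtool binary that ships in the same Prometheus directory:
# Reports SUCCESS if prometheus.yml is syntactically valid
[root@prometheus prometheus]# ./promtool check config /prom/prometheus/prometheus.yml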
Install and deploy Grafana
[root@prometheus ~]# ls
anaconda-ks.cfg grafana-enterprise-9.1.2-1.x86_64.rpm node_exporter-1.4.0-rc.0.linux-amd64.tar.gz
[root@prometheus ~]# yum install grafana-enterprise-9.1.2-1.x86_64.rpm -y
Start it
[root@prometheus ~]# service grafana-server start
Starting grafana-server (via systemctl): [ OK ]
[root@prometheus ~]# ps aux|grep grafana
grafana 8230 6.6 3.8 1195912 71472 ? Ssl 16:48 0:00 /usr/sbin/grafana-server --config=/etc/grafana/grafana.ini --pidfile=/var/run/grafana/grafana-server.pid --packaging=rpm cfg:default.paths.logs=/var/log/grafana cfg:default.paths.data=/var/lib/grafana cfg:default.paths.plugins=/var/lib/grafana/plugins cfg:default.paths.provisioning=/etc/grafana/provisioning
root 8238 0.0 0.0 112824 976 pts/0 R+ 16:49 0:00 grep --color=auto grafana
[root@prometheus ~]# netstat -anplut|grep grafana
tcp 0 0 192.168.121.108:44674 34.120.177.193:443 ESTABLISHED 8230/grafana-server
tcp6 0 0 :::3000 :::* LISTEN 8230/grafana-server
# Open IP:3000 in a browser
Default username: admin
Default password: admin
Configuration --> Add data source --> choose Prometheus
Enter http://192.168.121.108:9090
Dashboards --> Import --> enter template ID 1860 --> choose the Prometheus data source
Project takeaways
1. Gained a better understanding of the relationship between development and operations
2. Became more familiar with services such as ansible and Prometheus
3. Improved my troubleshooting skills
4. Gained new insight into load balancing, high availability, and autoscaling