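Preparation: setting up the host environment
This environment uses four hosts (one control-plane node and three workers) running CentOS Linux release 8.5.2111. The setup needs Internet access and a yum repository for Kubernetes, among other things.
IP addresses: 172.17.136.28/29/32/33, with hostnames gip28, gip29, gip32, and gip33 respectively. The Kubernetes control-plane (master) node is gip28.
Note: unless stated otherwise, run every step below on all nodes.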
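Because the nodes are addressed by hostname later on (for example --node-name=gip28), it can help to make the names resolvable on every host. A minimal sketch, assuming the IP-to-hostname mapping above and that /etc/hosts is used for name resolution (this is not a step from the original walkthrough); print_hosts_entries is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Print /etc/hosts entries for the four nodes described above.
# The mapping (172.17.136.NN -> gipNN) comes from the text; appending it to
# /etc/hosts is an assumption, not a step from the original guide.
print_hosts_entries() {
  local suffix
  for suffix in 28 29 32 33; do
    printf '172.17.136.%s gip%s\n' "$suffix" "$suffix"
  done
}

# Usage (as root, on every node):
#   print_hosts_entries >> /etc/hosts
#   hostnamectl set-hostname gip28   # use the matching name on each host
```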
I. Install Docker
Configure the official Docker yum repository and install the Docker components:
yum install -y yum-utils
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sed -i 's/enabled=0/enabled=1/g' /etc/yum.repos.d/CentOS-Linux-BaseOS.repo
yum install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
Start the Docker service:
systemctl start docker
systemctl enable docker
systemctl status docker
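Download and install cri-dockerd, the CRI adapter that lets Kubernetes talk to Docker:
wget https://github.com/Mirantis/cri-dockerd/releases/download/v0.3.4/cri-dockerd-0.3.4-3.el7.x86_64.rpm
# Or copy the files from a local host: scp [email protected]:/db/storage/zhaoxiangqian/k8s-0807/* ./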
rpm -ivh cri-dockerd-0.3.4-3.el7.x86_64.rpm
Reload the systemd daemon, enable cri-dockerd at boot, then start it:
systemctl daemon-reload
systemctl enable cri-docker.socket cri-docker
systemctl start cri-docker.socket cri-docker
systemctl status cri-docker.socket
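II. Install Kubernetes
kubectl is the Kubernetes command-line tool and an essential component for operating and inspecting a cluster. Here the latest stable release (v1.27.4 at the time of writing) is installed with curl. Download the binary and its checksum file, then verify the binary: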
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl.sha256"
echo "$(cat kubectl.sha256)  kubectl" | sha256sum --check
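The verification step compares the digest of the downloaded binary with the published one. The same comparison can be written out explicitly, which makes the pass/fail result easy to use in a script. This is only a sketch; verify_sha256 is a hypothetical helper name and not part of kubectl:

```shell
#!/usr/bin/env bash
# Compare a file's SHA-256 digest with the digest stored in a checksum file.
# verify_sha256 is a hypothetical helper name, not part of kubectl.
verify_sha256() {
  local file="$1" sumfile="$2" expected actual
  # The checksum file may hold just the digest or "digest  filename".
  expected="$(awk '{print $1; exit}' "$sumfile")"
  actual="$(sha256sum "$file" | awk '{print $1}')"
  if [ "$expected" = "$actual" ]; then
    echo "$file: OK"
  else
    echo "$file: FAILED" >&2
    return 1
  fi
}

# Usage: verify_sha256 kubectl kubectl.sha256 && echo "safe to install"
```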
Install kubectl:
install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
kubectl version --client
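Install kubeadm, kubelet, and kubectl. First configure the yum repository file. Google's servers cannot be reached directly from mainland China, so replace the Google repository given in the official documentation with a domestic mirror, here Aliyun: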
vim /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=1
gpgkey=http://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg
       http://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
exclude=kubelet kubeadm kubectl
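When the same file has to be created on several nodes, writing it non-interactively may be more convenient than editing it in vim. A sketch under that assumption; write_k8s_repo is a hypothetical helper, and the target path is a parameter so the function can be tried on a scratch file first:

```shell
#!/usr/bin/env bash
# Write the Aliyun Kubernetes repo definition shown above to a file.
# write_k8s_repo is a hypothetical helper; the target path is passed in so the
# function can be tried on a scratch file before touching /etc/yum.repos.d.
write_k8s_repo() {
  local target="$1"
  cat > "$target" <<'EOF'
[kubernetes]
name=Kubernetes
baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=1
gpgkey=http://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg
       http://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
exclude=kubelet kubeadm kubectl
EOF
}

# Usage (as root): write_k8s_repo /etc/yum.repos.d/kubernetes.repo
```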
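Run the installation:
yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
When yum reports the three packages as installed, the installation succeeded. Then enable and start kubelet: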
systemctl enable --now kubelet
Install runc, a runtime that Kubernetes requires. Download the binary and upload it to the server.
Download page: https://github.com/opencontainers/runc/releases
install -m 755 runc.amd64 /usr/local/bin/runc
runc -v # check that the installation succeeded
Set the required sysctl parameters; they persist across reboots:
vim /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
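The file alone does not change the running kernel. A sketch of writing it and applying it follows. It assumes the br_netfilter kernel module also needs to be loaded, because the net.bridge.* keys only exist once that module is present. The modules-load file and the helper name write_k8s_sysctl are additions for illustration and do not appear in the original steps:

```shell
#!/usr/bin/env bash
# Write the sysctl settings from the text, plus a modules-load file for
# br_netfilter. The modules-load step is an assumption added here: the
# net.bridge.* keys only exist once that kernel module is loaded.
write_k8s_sysctl() {
  local sysctl_file="$1" modules_file="$2"
  printf '%s\n' br_netfilter > "$modules_file"
  cat > "$sysctl_file" <<'EOF'
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
}

# Usage (as root):
#   write_k8s_sysctl /etc/sysctl.d/k8s.conf /etc/modules-load.d/k8s.conf
#   modprobe br_netfilter   # load the module now
#   sysctl --system         # apply all sysctl.d files without rebooting
```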
Remove the entry that disables the CRI plugin from containerd's default configuration file.
Then restart the containerd service:
systemctl stop containerd;systemctl start containerd;systemctl status containerd
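The configuration change that precedes the restart might look like the sketch below. It assumes the file is /etc/containerd/config.toml and that it contains a line such as disabled_plugins = ["cri"], which is how the containerd.io package from the Docker repository ships it. enable_containerd_cri is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Comment out a disabled_plugins line that lists "cri" in a containerd config.
# Assumption: the file is /etc/containerd/config.toml and contains a line such
# as  disabled_plugins = ["cri"] ; enable_containerd_cri is a hypothetical name.
enable_containerd_cri() {
  local config="$1"
  # Keep a .bak copy; lines that are already commented out are not matched.
  sed -i.bak -E 's/^([[:space:]]*disabled_plugins[[:space:]]*=.*"cri".*)$/#\1/' "$config"
}

# Usage (as root), followed by the restart shown above:
#   enable_containerd_cri /etc/containerd/config.toml
```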
III. Initialize the control-plane node and join the worker nodes
kubeadm init --node-name=gip28 --kubernetes-version=v1.28.0 \
--image-repository=registry.aliyuncs.com/google_containers \
--cri-socket=unix:///var/run/cri-dockerd.sock \
--apiserver-advertise-address=172.17.136.28 \
--pod-network-cidr=10.144.0.0/16 \
--service-cidr=10.96.0.0/12
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 172.17.136.28:6443 --token woj9g8.0v1p79sikkypfwlh \
--discovery-token-ca-cert-hash sha256:c6adaa9e966eaf59afc771b063d5fae5e701bf35163c856699e28431409f2d89
[root@gip28 ~]#
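Explanation of the flags (adapted from a Zhihu write-up found online):
--image-repository=registry.aliyuncs.com/google_containers # Pull the container images from the Aliyun mirror; otherwise the pulls fail for network reasons and the initialization cannot succeed.
--cri-socket=unix:///var/run/cri-dockerd.sock # Select the container runtime. containerd is itself a Docker component and is installed together with Docker, so kubeadm detects more than one runtime endpoint during initialization and one must be chosen explicitly. This also shows that containerd is considerably more lightweight than Docker.
--apiserver-advertise-address=172.17.136.28 # The address the API server advertises. Use this host's IPv4 address; do not change it unless the API server should be reachable at a different address.
--pod-network-cidr=10.144.0.0/16 # The IP range available to the pod network. If unsure, keep this value.
--service-cidr=10.96.0.0/12 # A separate IP range for the virtual IPs of Services. If unsure, keep this value.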
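The pod range and the service range should not overlap, and a /12 covers more addresses than it may seem at first glance. The check can be done with integer arithmetic, as in this sketch. The helper names ip_to_int, cidr_bounds, and cidrs_overlap are hypothetical; only IPv4 is handled:

```shell
#!/usr/bin/env bash
# Check whether two IPv4 CIDR ranges overlap (helper names are hypothetical).
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Print "first last" (as integers) for a CIDR such as 10.96.0.0/12.
cidr_bounds() {
  local ip="${1%/*}" prefix="${1#*/}" base mask
  base="$(ip_to_int "$ip")"
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  echo "$(( base & mask )) $(( (base & mask) | (~mask & 0xFFFFFFFF) ))"
}

# Exit status 0 if the two ranges share at least one address.
cidrs_overlap() {
  local s1 e1 s2 e2
  read -r s1 e1 <<< "$(cidr_bounds "$1")"
  read -r s2 e2 <<< "$(cidr_bounds "$2")"
  [ "$s1" -le "$e2" ] && [ "$s2" -le "$e1" ]
}

# Usage:
#   cidrs_overlap 10.144.0.0/16 10.96.0.0/12 && echo "overlap" || echo "ok"
```

Here 10.96.0.0/12 spans 10.96.0.0 through 10.111.255.255, so 10.144.0.0/16 lies outside it.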
After initialization completes, configure the environment on the current node as the initialization output instructs:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
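Set the environment variable for root on the control-plane node (add the export line to the file, then reload it):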
vim ~root/.bash_profile
export KUBECONFIG=/etc/kubernetes/admin.conf
source ~root/.bash_profile
Possible errors
Error 1:
[root@gip28 ~]# kubeadm init --node-name=gip28 --image-repository=registry.aliyuncs.com/google_containers --cri-socket=unix:///var/run/cri-dockerd.sock --apiserver-advertise-address=172.17.136.28 --pod-network-cidr=10.1.0.0/16 --service-cidr=10.255.0.0/16
[init] Using Kubernetes version: v1.27.4
[preflight] Running pre-flight checks
[WARNING FileExisting-tc]: tc not found in system path
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR CRI]: container runtime is not running: output: time="2023-08-11T10:50:36+08:00" level=fatal msg="validate service connection: validate CRI v1 runtime API for endpoint \"unix:///var/run/cri-dockerd.sock\": rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/run/cri-dockerd.sock: connect: connection refused\"", error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with --ignore-preflight-errors=...
To see the stack trace of this error execute with --v=5 or higher
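Cause: the CRI endpoint unix:///var/run/cri-dockerd.sock refused the connection, so cri-dockerd is not serving requests. kubelet was also not running, or had failed to start:
systemctl status kubelet
Check the system log with vim /var/log/messages: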
E0811 11:05:11.397387 8439 run.go:74] "command failed" err="failed to load kubelet config file, error: failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file \"/var/lib/kubelet/config.yaml\", error: open /var/lib/kubelet/config.yaml: no such file or directory, path: /var/lib/kubelet/config.yaml"
Aug 11 11:05:11 gip28 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Aug 11 11:05:11 gip28 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Fix: retry the initialization.
Run kubeadm init again, or run kubeadm reset -f first, and then systemctl restart kubelet.
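Install and configure the network plugin; this guide uses flannel.
Download the kube-flannel.yml file and upload it to the server.
kube-flannel.yml download page: https://github.com/flannel-io/flannel/releases/tag/v0.22.0
After downloading, install it on the control-plane node: kubectl apply -f kube-flannel.yml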
mkdir /run/flannel/
touch /run/flannel/subnet.env
vim /run/flannel/subnet.env
FLANNEL_NETWORK=10.144.0.0/16
FLANNEL_SUBNET=10.144.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
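The kube-flannel.yml manifest carries its own pod network in net-conf.json, with 10.244.0.0/16 as the default, and that value has to match the --pod-network-cidr passed to kubeadm init. flannel normally writes /run/flannel/subnet.env itself once its pod starts successfully. A sketch of patching the manifest before applying it, assuming the manifest contains a line of the form "Network": "10.244.0.0/16"; set_flannel_network is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Replace the "Network" value in a kube-flannel.yml manifest.
# Assumption: the manifest has a line like  "Network": "10.244.0.0/16",
# inside net-conf.json; set_flannel_network is a hypothetical helper name.
set_flannel_network() {
  local manifest="$1" cidr="$2"
  # '#' is the sed delimiter because the CIDR contains '/'.
  sed -i.bak -E "s#(\"Network\":[[:space:]]*\")[^\"]+(\")#\1${cidr}\2#" "$manifest"
}

# Usage on the control-plane node:
#   set_flannel_network kube-flannel.yml 10.144.0.0/16
#   kubectl apply -f kube-flannel.yml
```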
# Check the current node status: kubectl get nodes -o wide
[root@gip33 ~]# kubectl get nodes -A
NAME STATUS ROLES AGE VERSION
gip28 Ready control-plane 7h39m v1.27.4
gip29 Ready <none> 4m18s v1.27.4
gip32 Ready <none> 3m v1.27.4
gip33 Ready <none> 21s v1.27.4
[root@gip33 ~]
Copy admin.conf from the control-plane node to the three worker nodes:
scp /etc/kubernetes/admin.conf 172.17.136.29:/etc/kubernetes/
scp /etc/kubernetes/admin.conf 172.17.136.32:/etc/kubernetes/
scp /etc/kubernetes/admin.conf 172.17.136.33:/etc/kubernetes/
# On each worker node, check that admin.conf was transferred
ls /etc/kubernetes
Join the worker nodes: run the join command on the three hosts that are not the control-plane node:
echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile
source ~/.bash_profile
kubeadm join 172.17.136.28:6443 --token woj9g8.0v1p79sikkypfwlh \
--discovery-token-ca-cert-hash sha256:c6adaa9e966eaf59afc771b063d5fae5e701bf35163c856699e28431409f2d89 \
--cri-socket unix:///var/run/cri-dockerd.sock
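Bootstrap tokens expire after 24 hours by default, so the join command printed by kubeadm init may no longer be valid by the time a node is added. kubeadm token create --print-join-command prints a fresh one on the control-plane node, but without the --cri-socket flag this setup needs. A small sketch that appends it; with_cri_socket is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Append the cri-dockerd socket flag to a kubeadm join command line,
# unless the command already carries a --cri-socket flag.
# with_cri_socket is a hypothetical helper name.
with_cri_socket() {
  local join_cmd="$1" socket="${2:-unix:///var/run/cri-dockerd.sock}"
  case "$join_cmd" in
    *--cri-socket*) printf '%s\n' "$join_cmd" ;;
    *) printf '%s --cri-socket %s\n' "$join_cmd" "$socket" ;;
  esac
}

# Usage on the control-plane node (prints a join command to run on a worker):
#   with_cri_socket "$(kubeadm token create --print-join-command)"
```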
Errors that may occur if --cri-socket unix:///var/run/cri-dockerd.sock is omitted:
Error 1:
kubeadm join 172.17.136.28:6443 --token 9i5gdl.h5o1on5zaeys6nn6 --discovery-token-ca-cert-hash sha256:dbf918ea4d19f7dd10a456505124808aeea0af6ec934d7b4614647d8c063780c
Add the --cri-socket unix:///var/run/cri-dockerd.sock option when joining a node to the control plane; otherwise the join fails with the following error:
Found multiple CRI endpoints on the host. Please define which one do you wish to use by setting the 'criSocket' field in the kubeadm configuration file: unix:///var/run/containerd/containerd.sock, unix:///var/run/cri-dockerd.sock
To see the stack trace of this error execute with --v=5 or higher
Error 2 (possibly caused by a network timeout, among other things):
[root@gip33 ~]# kubeadm join 172.17.136.28:6443 --token woj9g8.0v1p79sikkypfwlh --discovery-token-ca-cert-hash sha256:c6adaa9e966eaf59afc771b063d5fae5e701bf35163c856699e28431409f2d89 --cri-socket unix:///var/run/cri-dockerd.sock
[preflight] Running pre-flight checks
[WARNING FileExisting-tc]: tc not found in system path
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[kubelet-check] Initial timeout of 40s passed.
error execution phase kubelet-start: error uploading crisocket: Unauthorized
To see the stack trace of this error execute with --v=5 or higher
[root@gip33 ~]# kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-flannel kube-flannel-ds-84nvr 0/1 CrashLoopBackOff 4 (87s ago) 4m30s
kube-flannel kube-flannel-ds-jj87c 0/1 CrashLoopBackOff 77 (2m52s ago) 6h16m
kube-flannel kube-flannel-ds-w54r2 0/1 CrashLoopBackOff 3 (47s ago) 3m12s
kube-flannel kube-flannel-ds-xdqgh 0/1 Init:0/2 0 33s
kube-system coredns-7bdc4cb885-8wc97 1/1 Running 0 7h39m
kube-system coredns-7bdc4cb885-q6l2c 1/1 Running 0 7h39m
kube-system etcd-gip28 1/1 Running 0 7h40m
kube-system kube-apiserver-gip28 1/1 Running 0 7h40m
kube-system kube-controller-manager-gip28 1/1 Running 0 7h40m
kube-system kube-proxy-4xd5c 1/1 Running 0 33s
kube-system kube-proxy-5jvqg 1/1 Running 0 7h39m
kube-system kube-proxy-786q9 1/1 Running 0 3m12s
kube-system kube-proxy-sfndf 1/1 Running 0 4m30s
kube-system kube-scheduler-gip28 1/1 Running 0 7h40m
[root@gip33 ~]#
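In a listing like the one above, the pods that need attention are those whose STATUS is not Running, here the kube-flannel ones. A sketch that filters them out of the table; not_running_pods is a hypothetical helper that reads kubectl get pod -A output on stdin and assumes STATUS is the fourth column:

```shell
#!/usr/bin/env bash
# List pods whose STATUS column is not "Running" from `kubectl get pod -A`
# output. not_running_pods is a hypothetical helper; it reads the table on
# stdin and assumes the columns NAMESPACE NAME READY STATUS ...
not_running_pods() {
  awk 'NR > 1 && $4 != "Running" { print $1 "/" $2 " " $4 }'
}

# Usage:
#   kubectl get pod -A | not_running_pods
#   kubectl -n kube-flannel logs kube-flannel-ds-84nvr   # inspect one of them
```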
IV. Uninstall or reset the Kubernetes node environment
Clean up Docker images and configuration files:
docker stop $(docker ps -qa);
docker rm -f $(docker ps -qa);
docker rmi -f $(docker images -q) ;
systemctl stop docker ;
rm -rf /var/lib/etcd ;
rm -rf /etc/kubernetes/* ;
iptables -F;
systemctl stop kubelet
1. Stop the related services
systemctl stop kubelet
systemctl stop etcd
systemctl stop docker
2. Uninstall Kubernetes
kubeadm reset -f
3. Delete the Kubernetes directories
rm -rf ~/.kube/
rm -rf /etc/kubernetes/*
rm -rf /etc/cni
rm -rf /opt/cni
rm -rf /var/lib/etcd
rm -rf /var/etcd
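The reset steps can be collected into one function with a dry-run switch, so the command list can be reviewed before anything is deleted. This is a sketch and not the original procedure: reset_node, run, and DRY_RUN are hypothetical names, kubeadm reset is placed first so the container runtime is still available while it cleans up, and the --cri-socket flag is added to match the join command used earlier.

```shell
#!/usr/bin/env bash
# Run (or, with DRY_RUN=1, only print) the reset steps for one node.
# reset_node, run, and DRY_RUN are hypothetical names for this sketch; the
# --cri-socket flag on kubeadm reset is an addition matching the join step.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

reset_node() {
  run kubeadm reset -f --cri-socket unix:///var/run/cri-dockerd.sock
  run systemctl stop kubelet
  run rm -rf "$HOME/.kube" /etc/cni /opt/cni /var/lib/etcd /var/etcd
  # Empty /etc/kubernetes but keep the directory itself.
  run find /etc/kubernetes -mindepth 1 -delete
}

# Usage (as root):
#   DRY_RUN=1 reset_node   # print the commands only
#   reset_node             # actually run them
```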