1. Cluster environment

IP allocation:
192.168.10.110  k8s-deploy-harbor           2c2g
192.168.10.111  k8s-master1-etcd1-haproxy1  2c4g
192.168.10.112  k8s-master2-etcd2-haproxy2  2c4g
192.168.10.113  k8s-master3-etcd3-haproxy3  2c4g
192.168.10.114  k8s-node1                   2c3g
192.168.10.115  k8s-node2                   2c3g
192.168.10.116  k8s-node3                   2c3g
VIP: 192.168.10.118

2. Introduction to etcd

etcd is an open-source project started by the CoreOS team in June 2013. Its goal is to build a highly available distributed key-value database. Internally, etcd uses the Raft protocol as its consensus algorithm, and it is implemented in Go.

3. Features of etcd

Simple: easy to install and configure, and it provides an HTTP API that is easy to work with
Secure: supports SSL certificate authentication
Fast: according to the official benchmarks, a single instance supports 2k+ reads per second
Reliable: uses the Raft algorithm to provide availability and consistency for data in a distributed system
Fully replicated: every node in the cluster can use the complete data set
Highly available: etcd can be used to avoid single points of failure in hardware or network problems
Consistent: every read returns the latest write across hosts

4. etcd client operations

etcd has multiple API versions. v1 is deprecated. etcd v2 and v3 are essentially two independent applications that share the same Raft protocol code; their interfaces differ, their storage differs, and their data is isolated from each other. In other words, after upgrading from etcd v2 to etcd v3, the old v2 data can still only be accessed through the v2 API, and data created through the v3 API can only be accessed through the v3 API.

etcdctl usage:
root@k8s-deploy:~/shell# etcdctl --help

4.1 Viewing etcd cluster member information

Health of the etcd cluster members:

root@k8s-master1-etcd1-haproxy1:/usr/local/bin# export NODE_IPS="192.168.10.111 192.168.10.112 192.168.10.113" && for ip in ${NODE_IPS}; do ETCDCTL_API=3 /usr/local/bin/etcdctl --endpoints=https://${ip}:2379 --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/etcd.pem --key=/etc/kubernetes/ssl/etcd-key.pem endpoint health; done
https://192.168.10.111:2379 is healthy: successfully committed proposal: took = 44.530175ms
https://192.168.10.112:2379 is healthy: successfully committed proposal: took = 10.252366ms
https://192.168.10.113:2379 is healthy: successfully committed proposal: took = 10.760717ms
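Every etcdctl call below repeats the same endpoint and TLS flags. A small wrapper keeps commands shorter and avoids typos in the flags; this is a minimal sketch assuming this cluster's certificate layout (the function name etcdctl3 is an illustrative choice, not an etcd-provided command):

# convenience wrapper around etcdctl with this cluster's TLS flags
etcdctl3() {
  local endpoint="${1:-https://127.0.0.1:2379}"; shift
  ETCDCTL_API=3 /usr/local/bin/etcdctl \
    --endpoints="${endpoint}" \
    --cacert=/etc/kubernetes/ssl/ca.pem \
    --cert=/etc/kubernetes/ssl/etcd.pem \
    --key=/etc/kubernetes/ssl/etcd-key.pem "$@"
}
# usage: etcdctl3 https://192.168.10.111:2379 endpoint health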
etcd cluster member status:

root@k8s-master1-etcd1-haproxy1:/usr/local/bin# export NODE_IPS="192.168.10.111 192.168.10.112 192.168.10.113" && for ip in ${NODE_IPS}; do ETCDCTL_API=3 /usr/local/bin/etcdctl --write-out=table endpoint status --endpoints=https://${ip}:2379 --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/etcd.pem --key=/etc/kubernetes/ssl/etcd-key.pem; done
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.10.111:2379 | 1463ac8ab5f4a708 |  3.4.13 |  3.4 MB |     false |      false |         5 |     843466 |             843466 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.10.112:2379 | 6d195d5665446f3c |  3.4.13 |  3.4 MB |      true |      false |         5 |     843466 |             843466 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.10.113:2379 | cce14a1a3ff3b166 |  3.4.13 |  3.3 MB |     false |      false |         5 |     843466 |             843466 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

Display detailed information about the etcd cluster:

root@k8s-master1-etcd1-haproxy1:~# for ip in ${NODE_IPS}; do ETCDCTL_API=3 /usr/local/bin/etcdctl --write-out=table endpoint status --endpoints=https://${ip}:2379 --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/etcd.pem --key=/etc/kubernetes/ssl/etcd-key.pem; done
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.10.111:2379 | 1463ac8ab5f4a708 |  3.4.13 |  3.6 MB |     false |      false |         2 |       1313 |               1313 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.10.112:2379 | 6d195d5665446f3c |  3.4.13 |  3.6 MB |      true |      false |         2 |       1313 |               1313 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.10.113:2379 | cce14a1a3ff3b166 |  3.4.13 |  3.6 MB |     false |      false |         2 |       1313 |               1313 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

192.168.10.112 is the current leader of the etcd cluster.
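Scanning the IS LEADER column by eye works, but the leader can also be extracted programmatically from the JSON output; a minimal sketch, assuming jq is installed and the field names of etcdctl 3.4's JSON output:

# print only the leader's endpoint (prints nothing for followers)
for ip in ${NODE_IPS}; do
  ETCDCTL_API=3 /usr/local/bin/etcdctl --write-out=json endpoint status \
    --endpoints=https://${ip}:2379 \
    --cacert=/etc/kubernetes/ssl/ca.pem \
    --cert=/etc/kubernetes/ssl/etcd.pem \
    --key=/etc/kubernetes/ssl/etcd-key.pem |
  jq -r 'select(.[0].Status.header.member_id == .[0].Status.leader) | .[0].Endpoint'
done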
To test failover, deliberately stop the current leader (192.168.10.112) and check whether leadership moves to another etcd member.

#1. Stop etcd on the leader
root@k8s-master2-etcd2-haproxy2:~# systemctl stop etcd

#2. Check status from another etcd member
root@k8s-master1-etcd1-haproxy1:/usr/local/bin# for ip in ${NODE_IPS}; do ETCDCTL_API=3 /usr/local/bin/etcdctl --write-out=table endpoint status --endpoints=https://${ip}:2379 --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/etcd.pem --key=/etc/kubernetes/ssl/etcd-key.pem; done
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.10.111:2379 | 1463ac8ab5f4a708 |  3.4.13 |  3.4 MB |     false |      false |         6 |     844734 |             844734 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
{"level":"warn","ts":"2023-01-07T09:50:18.198+0800","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"passthrough:///https://192.168.10.112:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = \"transport: Error while dialing dial tcp 192.168.10.112:2379: connect: connection refused\""}
Failed to get the status of endpoint https://192.168.10.112:2379 (context deadline exceeded)
+----------+----+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------+----+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
+----------+----+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.10.113:2379 | cce14a1a3ff3b166 |  3.4.13 |  3.3 MB |      true |      false |         6 |     844750 |             844750 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

The leader has now moved to 192.168.10.113.
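A three-member cluster keeps quorum (2 of 3) with one member down, which is why reads and writes continue throughout the test. Bringing the stopped member back is just a service restart; a short sketch:

# rejoin the stopped member and confirm it is healthy again
systemctl start etcd
ETCDCTL_API=3 /usr/local/bin/etcdctl \
  --endpoints=https://192.168.10.112:2379 \
  --cacert=/etc/kubernetes/ssl/ca.pem \
  --cert=/etc/kubernetes/ssl/etcd.pem \
  --key=/etc/kubernetes/ssl/etcd-key.pem endpoint health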
4.2 Viewing the data in the etcd cluster

4.2.1 List all keys

root@k8s-master1-etcd1-haproxy1:/usr/local/bin# ETCDCTL_API=3 /usr/local/bin/etcdctl get / --prefix --keys-only   # list all keys by path
...................................
/registry/pods/default/net-test1
/registry/pods/default/net-test3
/registry/flowschemas/workload-leader-election
/registry/leases/kube-node-lease/192.168.10.111
/registry/leases/kube-node-lease/192.168.10.112
/registry/leases/kube-node-lease/192.168.10.113
/registry/leases/kube-node-lease/192.168.10.114
/registry/leases/kube-node-lease/192.168.10.115
/registry/leases/kube-node-lease/192.168.10.116

4.2.2 List all pods in Kubernetes

root@k8s-master1-etcd1-haproxy1:~# ETCDCTL_API=3 /usr/local/bin/etcdctl get / --prefix --keys-only | grep pods
/registry/pods/default/net-test1
/registry/pods/default/net-test2
/registry/pods/default/net-test3
/registry/pods/kube-system/calico-kube-controllers-78bc9689fb-dld4v
/registry/pods/kube-system/calico-node-5f6cq
/registry/pods/kube-system/calico-node-6rkmq
/registry/pods/kube-system/calico-node-759hl
/registry/pods/kube-system/calico-node-hpxd2
/registry/pods/kube-system/calico-node-vmzlj
/registry/pods/kube-system/calico-node-zbbct
/registry/pods/kube-system/coredns-f97dc456d-tghvb

# The same pod information as seen from Kubernetes
root@k8s-master1-etcd1-haproxy1:~# kubectl get pod -A
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
default       net-test1                                  1/1     Running   0          79m
default       net-test2                                  1/1     Running   0          79m
default       net-test3                                  1/1     Running   0          76m
kube-system   calico-kube-controllers-78bc9689fb-dld4v   1/1     Running   0          80m
kube-system   calico-node-5f6cq                          1/1     Running   0          80m
kube-system   calico-node-6rkmq                          1/1     Running   0          80m
kube-system   calico-node-759hl                          1/1     Running   0          80m
kube-system   calico-node-hpxd2                          1/1     Running   0          80m
kube-system   calico-node-vmzlj                          1/1     Running   0          80m
kube-system   calico-node-zbbct                          1/1     Running   0          80m
kube-system   coredns-f97dc456d-tghvb                    1/1     Running   0          32m

4.2.3 List all namespaces in Kubernetes

root@k8s-master1-etcd1-haproxy1:/usr/local/bin# ETCDCTL_API=3 /usr/local/bin/etcdctl get / --prefix --keys-only | grep namespaces
/registry/namespaces/default
/registry/namespaces/kube-node-lease
/registry/namespaces/kube-public
/registry/namespaces/kube-system
/registry/namespaces/kubernetes-dashboard

# The namespace information as seen from Kubernetes
root@k8s-master1-etcd1-haproxy1:/usr/local/bin# kubectl get namespaces
NAME              STATUS   AGE
default           Active   116s
kube-node-lease   Active   112s
kube-public       Active   112s
kube-system       Active   117s

4.2.4 List all deployments in Kubernetes

root@k8s-master1-etcd1-haproxy1:~# ETCDCTL_API=3 etcdctl get / --prefix --keys-only | grep deployments
/registry/deployments/kube-system/calico-kube-controllers
/registry/deployments/kube-system/coredns

# The deployments as seen from Kubernetes
root@k8s-master1-etcd1-haproxy1:~# kubectl get deployments -A
NAMESPACE     NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
kube-system   calico-kube-controllers   1/1     1            1           82m
kube-system   coredns                   1/1     1            1           34m
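The same --keys-only listing can be turned into a quick inventory of how many objects of each type the cluster stores; a minimal sketch using standard shell tools:

# count stored objects per resource type under /registry
ETCDCTL_API=3 /usr/local/bin/etcdctl get / --prefix --keys-only |
  grep '^/registry/' | awk -F/ '{print $3}' | sort | uniq -c | sort -rn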
4.2.5 Viewing the Calico network component information

root@k8s-master1-etcd1-haproxy1:/usr/local/bin# ETCDCTL_API=3 /usr/local/bin/etcdctl get / --prefix --keys-only | grep calico
.................................................
/calico/resources/v3/projectcalico.org/profiles/ksa.kube-system.calico-node
/registry/clusterrolebindings/calico-node
/registry/clusterroles/calico-node
/registry/controllerrevisions/kube-system/calico-node-69bd9bc66c
/registry/controllerrevisions/kube-system/calico-node-fbb95bfdf
/registry/daemonsets/kube-system/calico-node
/registry/pods/kube-system/calico-node-4nxsv
/registry/pods/kube-system/calico-node-bzccb
/registry/pods/kube-system/calico-node-msvg7
/registry/pods/kube-system/calico-node-wfv28
/registry/pods/kube-system/calico-node-xr2df
/registry/pods/kube-system/calico-node-zbtc8
/registry/secrets/kube-system/calico-node-token-9vrg6
/registry/serviceaccounts/kube-system/calico-node

4.2.6 Viewing a specific key

# View the key for the default namespace
root@k8s-master1-etcd1-haproxy1:~# ETCDCTL_API=3 etcdctl get /registry/namespaces/default
/registry/namespaces/default
k8s v1 Namespace default"*$60c1aa10-122b-41e2-a80d-24aead782cda2°䛆Z& ubernetes.io/metadata.namedefaultz{ kube-apiserverUpdatev°䛆FieldsV1:I G{"f:metadata":{"f:labels":{".":{},"f:kubernetes.io/metadata.name":{}}}} kubernetes Active"

# View a Calico key
root@k8s-master1-etcd1-haproxy1:/usr/local/bin# ETCDCTL_API=3 /usr/local/bin/etcdctl get /calico/ipam/v2/assignment/ipv4/block/10.200.107.192-26
/calico/ipam/v2/assignment/ipv4/block/10.200.107.192-26
{"cidr":"10.200.107.192/26","affinity":"host:k8s-node3","allocations":[0,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null],"unallocated":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63],"attributes":[{"handle_id":"ipip-tunnel-addr-k8s-node3","secondary":{"node":"k8s-node3","type":"ipipTunnelAddress"}}],"deleted":false}
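The unreadable characters in the /registry/namespaces/default value are expected: the Kubernetes API server stores most objects in etcd as binary protobuf rather than JSON (the Calico key above, written by Calico itself, is plain JSON). The practical way to read a /registry object in decoded form is to go through the API server instead:

# read the same namespace object in decoded form, via the API server
kubectl get namespace default -o yaml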
4.2.7 Viewing all Calico data

root@k8s-master1-etcd1-haproxy1:~# ETCDCTL_API=3 etcdctl get --keys-only --prefix /calico
/calico/ipam/v2/assignment/ipv4/block/10.200.107.192-26
/calico/ipam/v2/assignment/ipv4/block/10.200.159.128-26
/calico/ipam/v2/assignment/ipv4/block/10.200.169.128-26
/calico/ipam/v2/assignment/ipv4/block/10.200.219.0-26
/calico/ipam/v2/assignment/ipv4/block/10.200.36.64-26
/calico/ipam/v2/assignment/ipv4/block/10.200.97.0-26
/calico/ipam/v2/handle/ipip-tunnel-addr-k8s-master1-etcd1-haproxy1
/calico/ipam/v2/handle/ipip-tunnel-addr-k8s-master2-etcd2-haproxy2
/calico/ipam/v2/handle/ipip-tunnel-addr-k8s-master3-etcd3-haproxy3
/calico/ipam/v2/handle/ipip-tunnel-addr-k8s-node1
/calico/ipam/v2/handle/ipip-tunnel-addr-k8s-node2
/calico/ipam/v2/handle/ipip-tunnel-addr-k8s-node3
....................................................................................

4.3 Creating, updating and deleting data in etcd

4.3.1 Adding data

# Put a key into etcd
root@k8s-master1-etcd1-haproxy1:/usr/local/bin# ETCDCTL_API=3 /usr/local/bin/etcdctl put /magedu/n56 linux
OK
# Verify the data that was added
root@k8s-master1-etcd1-haproxy1:/usr/local/bin# ETCDCTL_API=3 /usr/local/bin/etcdctl get /magedu/n56
/magedu/n56
linux

4.3.2 Updating data

# Change the value of /magedu/n56 to test (a put to an existing key overwrites it)
root@k8s-master1-etcd1-haproxy1:/usr/local/bin# ETCDCTL_API=3 /usr/local/bin/etcdctl put /magedu/n56 test
OK
root@k8s-master1-etcd1-haproxy1:/usr/local/bin# ETCDCTL_API=3 /usr/local/bin/etcdctl get /magedu/n56
/magedu/n56
test

4.3.3 Deleting data

# Delete the key. Caution: with two arguments, del removes the whole key range [/magedu/n56, test), which is why 348 deletions were reported below; to delete just this one key, run etcdctl del /magedu/n56 with no second argument.
root@k8s-master1-etcd1-haproxy1:/usr/local/bin# ETCDCTL_API=3 /usr/local/bin/etcdctl del /magedu/n56 test
348
root@k8s-master1-etcd1-haproxy1:/usr/local/bin# ETCDCTL_API=3 /usr/local/bin/etcdctl get /magedu/n56

4.4 The etcd watch mechanism

A watch continuously monitors data and proactively notifies the client whenever the data changes. The etcd v3 watch mechanism supports watching a single fixed key as well as watching a range.

Compared with etcd v2, the main changes in etcd v3 are:
1) The interface is an RPC interface provided over gRPC, abandoning the v2 HTTP interface. The advantage is a clear efficiency gain from long-lived connections; the drawback is that it is less convenient to use, especially in scenarios where maintaining long-lived connections is awkward.
2) The original directory structure was abandoned in favor of pure key-value storage; users can simulate directories through prefix matching.
3) Values are no longer kept in memory, so the same amount of memory can hold many more keys.
4) The watch mechanism is more stable; data can essentially be fully synchronized through watches.
5) Batch operations and a transaction mechanism are provided; users can implement etcd v2's CAS semantics with batched transactional requests (batch transactions support if-condition checks).

Watch the key /name on etcd1:
root@k8s-master1-etcd1-haproxy1:~# ETCDCTL_API=3 /usr/local/bin/etcdctl watch /name

Modify the /name key on etcd2 and verify on etcd1:

# Add a value for /name
root@k8s-master2-etcd2-haproxy2:~# ETCDCTL_API=3 /usr/local/bin/etcdctl put /name zhai
OK
root@k8s-master1-etcd1-haproxy1:~# ETCDCTL_API=3 /usr/local/bin/etcdctl watch /name
PUT
/name
zhai
# Modify the value of /name
root@k8s-master2-etcd2-haproxy2:~# ETCDCTL_API=3 /usr/local/bin/etcdctl put /name aaa
OK
root@k8s-master1-etcd1-haproxy1:~# ETCDCTL_API=3 /usr/local/bin/etcdctl watch /name
PUT
/name
zhai
PUT
/name
aaa
# Delete /name
root@k8s-master2-etcd2-haproxy2:~# ETCDCTL_API=3 /usr/local/bin/etcdctl del /name
1
root@k8s-master1-etcd1-haproxy1:~# ETCDCTL_API=3 /usr/local/bin/etcdctl watch /name
PUT
/name
zhai
PUT
/name
aaa
DELETE
/name

4.5 etcd data backup and restore

WAL is short for write-ahead log: a log written before the actual write operation is executed. The wal directory stores these write-ahead logs; their biggest role is recording the entire history of data changes. In etcd, every data modification must be written to the WAL before it is committed.

4.5.1 Manual backup and restore of etcd cluster v3 data

Backing up the data:

root@k8s-master1-etcd1-haproxy1:/tools# ETCDCTL_API=3 /usr/local/bin/etcdctl snapshot save etcd-20230107-bak.db
{"level":"info","ts":1673060607.7006812,"caller":"snapshot/v3_snapshot.go:119","msg":"created temporary db file","path":"etcd-20230107-bak.db.part"}
{"level":"info","ts":"2023-01-07T11:03:27.701+0800","caller":"clientv3/maintenance.go:200","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":1673060607.7012792,"caller":"snapshot/v3_snapshot.go:127","msg":"fetching snapshot","endpoint":"127.0.0.1:2379"}
{"level":"info","ts":"2023-01-07T11:03:27.735+0800","caller":"clientv3/maintenance.go:208","msg":"completed snapshot read; closing"}
{"level":"info","ts":1673060607.7730155,"caller":"snapshot/v3_snapshot.go:142","msg":"fetched snapshot","endpoint":"127.0.0.1:2379","size":"3.4 MB","took":0.072251129}
{"level":"info","ts":1673060607.7731135,"caller":"snapshot/v3_snapshot.go:152","msg":"saved","path":"etcd-20230107-bak.db"}
Snapshot saved at etcd-20230107-bak.db
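In practice the snapshot is usually taken on a schedule with a timestamped filename, with its integrity checked right away; a minimal sketch suitable for cron (the /tools backup directory follows this cluster's layout and is an assumption):

#!/bin/bash
# timestamped etcd snapshot plus an immediate integrity check
BACKUP_DIR=/tools
FILE=${BACKUP_DIR}/etcd-$(date +%Y%m%d-%H%M%S).db
ETCDCTL_API=3 /usr/local/bin/etcdctl snapshot save "${FILE}"
# print hash, revision, total keys and size of the snapshot file
ETCDCTL_API=3 /usr/local/bin/etcdctl snapshot status "${FILE}" --write-out=table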
Restoring the data

A restore must be performed on every member of the etcd cluster.

# Step 1: stop the workloads and etcd
root@k8s-master1-etcd1-haproxy1:~# systemctl stop etcd
root@k8s-master2-etcd2-haproxy2:~# systemctl stop etcd
root@k8s-master3-etcd3-haproxy3:~# systemctl stop etcd

# Step 2: remove the old etcd data directories
root@k8s-master1-etcd1-haproxy1:~# rm -rf /var/lib/etcd
root@k8s-master2-etcd2-haproxy2:~# rm -rf /var/lib/etcd
root@k8s-master3-etcd3-haproxy3:~# rm -rf /var/lib/etcd

# The restore command
ETCDCTL_API=3 etcdctl snapshot restore /tools/etcd-20230107-bak.db \
--name=etcd-192.168.10.111 \
--initial-cluster=etcd-192.168.10.111=https://192.168.10.111:2380,etcd-192.168.10.112=https://192.168.10.112:2380,etcd-192.168.10.113=https://192.168.10.113:2380 \
--initial-cluster-token=etcd-cluster-0 \
--initial-advertise-peer-urls=https://192.168.10.111:2380 \
--data-dir=/var/lib/etcd

# Note: the values for --name, --initial-cluster, --initial-cluster-token, --initial-advertise-peer-urls and --data-dir can be looked up in /etc/systemd/system/etcd.service on each node.

# Step 3: restore the data on every etcd member

# Restore the data on etcd01
ETCDCTL_API=3 etcdctl snapshot restore /tools/etcd-20230107-bak.db \
--name=etcd-192.168.10.111 \
--initial-cluster=etcd-192.168.10.111=https://192.168.10.111:2380,etcd-192.168.10.112=https://192.168.10.112:2380,etcd-192.168.10.113=https://192.168.10.113:2380 \
--initial-cluster-token=etcd-cluster-0 \
--initial-advertise-peer-urls=https://192.168.10.111:2380 \
--data-dir=/var/lib/etcd
----------------------------------------------------------------------------------------------
root@k8s-master1-etcd1-haproxy1:/var/lib# ETCDCTL_API=3 etcdctl snapshot restore /tools/etcd-20230107-bak.db \
> --name=etcd-192.168.10.111 \
> --initial-cluster=etcd-192.168.10.111=https://192.168.10.111:2380,etcd-192.168.10.112=https://192.168.10.112:2380,etcd-192.168.10.113=https://192.168.10.113:2380 \
> --initial-cluster-token=etcd-cluster-0 \
> --initial-advertise-peer-urls=https://192.168.10.111:2380 \
> --data-dir=/var/lib/etcd
{"level":"info","ts":1673061700.2319326,"caller":"snapshot/v3_snapshot.go:296","msg":"restoring snapshot","path":"/tools/etcd-20230107-bak.db","wal-dir":"/var/lib/etcd/member/wal","data-dir":"/var/lib/etcd","snap-dir":"/var/lib/etcd/member/snap"}
{"level":"info","ts":1673061700.2965689,"caller":"mvcc/kvstore.go:380","msg":"restored last compact revision","meta-bucket-name":"meta","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":677669}
{"level":"info","ts":1673061700.302237,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"ff9cf838088b5a51","local-member-id":"0","added-peer-id":"1463ac8ab5f4a708","added-peer-peer-urls":["https://192.168.10.111:2380"]}
{"level":"info","ts":1673061700.30231,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"ff9cf838088b5a51","local-member-id":"0","added-peer-id":"6d195d5665446f3c","added-peer-peer-urls":["https://192.168.10.112:2380"]}
{"level":"info","ts":1673061700.30234,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"ff9cf838088b5a51","local-member-id":"0","added-peer-id":"cce14a1a3ff3b166","added-peer-peer-urls":["https://192.168.10.113:2380"]}
{"level":"info","ts":1673061700.309144,"caller":"snapshot/v3_snapshot.go:309","msg":"restored snapshot","path":"/tools/etcd-20230107-bak.db","wal-dir":"/var/lib/etcd/member/wal","data-dir":"/var/lib/etcd","snap-dir":"/var/lib/etcd/member/snap"}
# The snapshot can only be restored into a new, non-existent directory; to restore into the original directory, delete it first.
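Only --name and --initial-advertise-peer-urls differ between the three nodes, which makes this command easy to get wrong when copy-pasting. A variable-driven sketch of the same restore, to be run on each node with its own IP (the variable name NODE_IP is ours):

# run on each etcd node, with NODE_IP set to that node's own address
NODE_IP=192.168.10.111   # use 192.168.10.112 / 192.168.10.113 on the other nodes
ETCDCTL_API=3 etcdctl snapshot restore /tools/etcd-20230107-bak.db \
  --name=etcd-${NODE_IP} \
  --initial-cluster=etcd-192.168.10.111=https://192.168.10.111:2380,etcd-192.168.10.112=https://192.168.10.112:2380,etcd-192.168.10.113=https://192.168.10.113:2380 \
  --initial-cluster-token=etcd-cluster-0 \
  --initial-advertise-peer-urls=https://${NODE_IP}:2380 \
  --data-dir=/var/lib/etcd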
# Restore the data on etcd02
ETCDCTL_API=3 etcdctl snapshot restore /tools/etcd-20230107-bak.db \
--name=etcd-192.168.10.112 \
--initial-cluster=etcd-192.168.10.111=https://192.168.10.111:2380,etcd-192.168.10.112=https://192.168.10.112:2380,etcd-192.168.10.113=https://192.168.10.113:2380 \
--initial-cluster-token=etcd-cluster-0 \
--initial-advertise-peer-urls=https://192.168.10.112:2380 \
--data-dir=/var/lib/etcd
----------------------------------------------------------------------------------------------
root@k8s-master2-etcd2-haproxy2:~# ETCDCTL_API=3 etcdctl snapshot restore /tools/etcd-20230107-bak.db \
> --name=etcd-192.168.10.112 \
> --initial-cluster=etcd-192.168.10.111=https://192.168.10.111:2380,etcd-192.168.10.112=https://192.168.10.112:2380,etcd-192.168.10.113=https://192.168.10.113:2380 \
> --initial-cluster-token=etcd-cluster-0 \
> --initial-advertise-peer-urls=https://192.168.10.112:2380 \
> --data-dir=/var/lib/etcd
{"level":"info","ts":1673061838.0671895,"caller":"snapshot/v3_snapshot.go:296","msg":"restoring snapshot","path":"/tools/etcd-20230107-bak.db","wal-dir":"/var/lib/etcd/member/wal","data-dir":"/var/lib/etcd","snap-dir":"/var/lib/etcd/member/snap"}
{"level":"info","ts":1673061838.1501634,"caller":"mvcc/kvstore.go:380","msg":"restored last compact revision","meta-bucket-name":"meta","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":677669}
{"level":"info","ts":1673061838.1551597,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"ff9cf838088b5a51","local-member-id":"0","added-peer-id":"1463ac8ab5f4a708","added-peer-peer-urls":["https://192.168.10.111:2380"]}
{"level":"info","ts":1673061838.1553109,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"ff9cf838088b5a51","local-member-id":"0","added-peer-id":"6d195d5665446f3c","added-peer-peer-urls":["https://192.168.10.112:2380"]}
{"level":"info","ts":1673061838.1553912,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"ff9cf838088b5a51","local-member-id":"0","added-peer-id":"cce14a1a3ff3b166","added-peer-peer-urls":["https://192.168.10.113:2380"]}
{"level":"info","ts":1673061838.169916,"caller":"snapshot/v3_snapshot.go:309","msg":"restored snapshot","path":"/tools/etcd-20230107-bak.db","wal-dir":"/var/lib/etcd/member/wal","data-dir":"/var/lib/etcd","snap-dir":"/var/lib/etcd/member/snap"}
# The snapshot can only be restored into a new, non-existent directory; to restore into the original directory, delete it first.

# Restore the data on etcd03
ETCDCTL_API=3 etcdctl snapshot restore /tools/etcd-20230107-bak.db \
--name=etcd-192.168.10.113 \
--initial-cluster=etcd-192.168.10.111=https://192.168.10.111:2380,etcd-192.168.10.112=https://192.168.10.112:2380,etcd-192.168.10.113=https://192.168.10.113:2380 \
--initial-cluster-token=etcd-cluster-0 \
--initial-advertise-peer-urls=https://192.168.10.113:2380 \
--data-dir=/var/lib/etcd
----------------------------------------------------------------------------------------------
root@k8s-master3-etcd3-haproxy3:~# ETCDCTL_API=3 etcdctl snapshot restore /tools/etcd-20230107-bak.db \
> --name=etcd-192.168.10.113 \
> --initial-cluster=etcd-192.168.10.111=https://192.168.10.111:2380,etcd-192.168.10.112=https://192.168.10.112:2380,etcd-192.168.10.113=https://192.168.10.113:2380 \
> --initial-cluster-token=etcd-cluster-0 \
> --initial-advertise-peer-urls=https://192.168.10.113:2380 \
> --data-dir=/var/lib/etcd
{"level":"info","ts":1673061839.653366,"caller":"snapshot/v3_snapshot.go:296","msg":"restoring snapshot","path":"/tools/etcd-20230107-bak.db","wal-dir":"/var/lib/etcd/member/wal","data-dir":"/var/lib/etcd","snap-dir":"/var/lib/etcd/member/snap"}
{"level":"info","ts":1673061839.8076127,"caller":"mvcc/kvstore.go:380","msg":"restored last compact revision","meta-bucket-name":"meta","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":677669}
member","cluster-id":"ff9cf838088b5a51","local-member-id":"0","added-peer-id":"1463ac8ab5f4a708","added-peer-peer-urls":["https://192.168.10.111:2380"]} {"level":"info","ts":1673061839.8133879,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"ff9cf838088b5a51","local-member-id":"0","added-peer-id":"6d195d5665446f3c","added-peer-peer-urls":["https://192.168.10.112:2380"]} {"level":"info","ts":1673061839.8134158,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"ff9cf838088b5a51","local-member-id":"0","added-peer-id":"cce14a1a3ff3b166","added-peer-peer-urls":["https://192.168.10.113:2380"]} {"level":"info","ts":1673061839.819993,"caller":"snapshot/v3_snapshot.go:309","msg":"restored snapshot","path":"/tools/etcd-20230107-bak.db","wal-dir":"/var/lib/etcd/member/wal","data-dir":"/var/lib/etcd","snap-dir":"/var/lib/etcd/member/snap"} #只能将数据恢复到一个新的不存在的目录中,如果需要恢复到原目录,需要将原目录删除 #第四步重启etcd systemctl daemon-reload systemctl restart etcd #etcd1 root@k8s-master1-etcd1-haproxy1:~# systemctl daemon-reload root@k8s-master1-etcd1-haproxy1:~# systemctl restart etcd #etcd2 root@k8s-master2-etcd2-haproxy2:~# systemctl daemon-reload root@k8s-master2-etcd2-haproxy2:~# systemctl restart etcd #etcd3 root@k8s-master3-etcd3-haproxy3:~# systemctl daemon-reload root@k8s-master3-etcd3-haproxy3:~# systemctl restart etcd 第五步验证 root@k8s-master1-etcd1-haproxy1:/var/lib/etcd# export NODE_IPS="192.168.10.111 192.168.10.112 192.168.10.113" && for ip in ${NODE_IPS}; do ETCDCTL_API=3 /usr/local/bin/etcdctl --write-out=table endpoint status --endpoints=https://${ip}:2379 --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/etcd.pem --key=/etc/kubernetes/ssl/etcd-key.pem; done +-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+ | ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS | +-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+ | https://192.168.10.111:2379 | 1463ac8ab5f4a708 | 3.4.13 | 3.4 MB | true | false | 443 | 11 | 11 | | +-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+ +-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+ | ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS | +-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+ | https://192.168.10.112:2379 | 6d195d5665446f3c | 3.4.13 | 3.4 MB | false | false | 443 | 11 | 11 | | +-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+ +-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+ | ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS | +-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+ | https://192.168.10.113:2379 | cce14a1a3ff3b166 | 3.4.13 | 3.4 MB | 
# Step 5: verify
root@k8s-master1-etcd1-haproxy1:/var/lib/etcd# export NODE_IPS="192.168.10.111 192.168.10.112 192.168.10.113" && for ip in ${NODE_IPS}; do ETCDCTL_API=3 /usr/local/bin/etcdctl --write-out=table endpoint status --endpoints=https://${ip}:2379 --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/etcd.pem --key=/etc/kubernetes/ssl/etcd-key.pem; done
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.10.111:2379 | 1463ac8ab5f4a708 |  3.4.13 |  3.4 MB |      true |      false |       443 |         11 |                 11 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.10.112:2379 | 6d195d5665446f3c |  3.4.13 |  3.4 MB |     false |      false |       443 |         11 |                 11 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.10.113:2379 | cce14a1a3ff3b166 |  3.4.13 |  3.4 MB |     false |      false |       443 |         11 |                 11 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

4.5.3 Backing up and restoring etcd with kubeasz

A backup with ezctl likewise connects to one etcd node, takes a snapshot there, and then copies the snapshot back to the machine the command was run from.

# Back up
./ezctl backup k8s-01
# Restore
./ezctl restore k8s-01

# Back up
root@k8s-harbor-deploy:/etc/kubeasz# ./ezctl backup k8s-01
# The snapshot files just created
root@k8s-harbor-deploy:/etc/kubeasz# ls /etc/kubeasz/clusters/k8s-01/backup/
snapshot.db  snapshot_202301071605.db

# Current pods
root@k8s-master1-etcd1-haproxy1:~# kubectl get pod -A
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
default       net-test1                                  1/1     Running   0          60m
default       net-test2                                  1/1     Running   0          60m
default       net-test3                                  1/1     Running   0          57m
default       net-test4                                  1/1     Running   0          57m
kube-system   calico-kube-controllers-78bc9689fb-dld4v   1/1     Running   0          61m
kube-system   calico-node-5f6cq                          1/1     Running   0          61m
kube-system   calico-node-6rkmq                          1/1     Running   0          61m
kube-system   calico-node-759hl                          1/1     Running   0          61m
kube-system   calico-node-hpxd2                          1/1     Running   0          61m
kube-system   calico-node-vmzlj                          1/1     Running   0          61m
kube-system   calico-node-zbbct                          1/1     Running   0          61m
kube-system   coredns-f97dc456d-tghvb                    1/1     Running   0          13m

# Delete some pods
root@k8s-master1-etcd1-haproxy1:~# kubectl delete pod net-test1 net-test2 net-test3
pod "net-test1" deleted
pod "net-test2" deleted
pod "net-test3" deleted

# Remaining pods
root@k8s-master1-etcd1-haproxy1:~# kubectl get pod -A
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
default       net-test4                                  1/1     Running   0          5s
kube-system   calico-kube-controllers-78bc9689fb-dld4v   1/1     Running   0          65m
kube-system   calico-node-5f6cq                          1/1     Running   0          65m
kube-system   calico-node-6rkmq                          1/1     Running   0          65m
kube-system   calico-node-759hl                          1/1     Running   0          65m
kube-system   calico-node-hpxd2                          1/1     Running   0          65m
kube-system   calico-node-vmzlj                          1/1     Running   0          65m
kube-system   calico-node-zbbct                          1/1     Running   0          65m
kube-system   coredns-f97dc456d-tghvb                    1/1     Running   0          17m

# Find net-test4 in etcd
root@k8s-master1-etcd1-haproxy1:~# ETCDCTL_API=3 /usr/local/bin/etcdctl get / --prefix --keys-only | grep net-test4
/registry/events/default/net-test4.1737f87a712e63a9
/registry/events/default/net-test4.1737f87a9b8149c3
/registry/events/default/net-test4.1737f87a9ed434ec
/registry/events/default/net-test4.1737f87aa2f33577
/registry/events/default/net-test4.1737f87aab51831f
/registry/pods/default/net-test4

# Delete the pod directly in etcd
root@k8s-master1-etcd1-haproxy1:~# ETCDCTL_API=3 /usr/local/bin/etcdctl del /registry/pods/default/net-test4

# Check pods
root@k8s-master1-etcd1-haproxy1:~# kubectl get pod -A
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-78bc9689fb-dld4v   1/1     Running   0          67m
kube-system   calico-node-5f6cq                          1/1     Running   0          67m
kube-system   calico-node-6rkmq                          1/1     Running   0          67m
kube-system   calico-node-759hl                          1/1     Running   0          67m
kube-system   calico-node-hpxd2                          1/1     Running   0          67m
kube-system   calico-node-vmzlj                          1/1     Running   0          67m
kube-system   calico-node-zbbct                          1/1     Running   0          67m
kube-system   coredns-f97dc456d-tghvb                    1/1     Running   0          19m

# Restore from the etcd backup. A restore through ezctl stops kube-apiserver, etcd and kube-controller-manager on all three masters, and also stops kube-proxy and kubelet on the node machines.
./ezctl restore k8s-01
root@k8s-harbor-deploy:/etc/kubeasz# ./ezctl restore k8s-01
ansible-playbook -i clusters/k8s-01/hosts -e @clusters/k8s-01/config.yml playbooks/95.restore.yml
2023-01-07 16:12:51 INFO cluster:k8s-01 restore begins in 5s, press any key to abort:
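The backup directory holds both snapshot.db and a timestamped copy. Assuming the restore playbook (playbooks/95.restore.yml above) reads clusters/<cluster>/backup/snapshot.db — an assumption about the kubeasz convention worth verifying before relying on it — restoring a specific older snapshot means copying the desired file over snapshot.db first:

# pick a specific snapshot before running ./ezctl restore
# (the snapshot.db filename convention is an assumption; verify in playbooks/95.restore.yml)
cd /etc/kubeasz/clusters/k8s-01/backup/
cp snapshot_202301071605.db snapshot.db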
# The pods deleted earlier are back after restoring the etcd backup
root@k8s-master1-etcd1-haproxy1:~# kubectl get pod -A
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
default       net-test1                                  1/1     Running   0          69m
default       net-test2                                  1/1     Running   0          69m
default       net-test3                                  1/1     Running   0          66m
default       net-test4                                  1/1     Running   0          66m
kube-system   calico-kube-controllers-78bc9689fb-dld4v   1/1     Running   0          70m
kube-system   calico-node-5f6cq                          1/1     Running   0          70m
kube-system   calico-node-6rkmq                          1/1     Running   0          70m
kube-system   calico-node-759hl                          1/1     Running   0          70m
kube-system   calico-node-hpxd2                          1/1     Running   0          70m
kube-system   calico-node-vmzlj                          1/1     Running   0          70m
kube-system   calico-node-zbbct                          1/1     Running   0          70m
kube-system   coredns-f97dc456d-tghvb                    1/1     Running   0          22m

# Verify that the etcd cluster again has one leader and two followers
export NODE_IPS="192.168.10.111 192.168.10.112 192.168.10.113" && for ip in ${NODE_IPS}; do ETCDCTL_API=3 /usr/local/bin/etcdctl --write-out=table endpoint status --endpoints=https://${ip}:2379 --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/etcd.pem --key=/etc/kubernetes/ssl/etcd-key.pem; done
root@k8s-master2-etcd2-haproxy2:~# export NODE_IPS="192.168.10.111 192.168.10.112 192.168.10.113" && for ip in ${NODE_IPS}; do ETCDCTL_API=3 /usr/local/bin/etcdctl --write-out=table endpoint status --endpoints=https://${ip}:2379 --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/etcd.pem --key=/etc/kubernetes/ssl/etcd-key.pem; done
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.10.111:2379 | 1463ac8ab5f4a708 |  3.4.13 |  3.6 MB |     false |      false |         2 |        148 |                148 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.10.112:2379 | 6d195d5665446f3c |  3.4.13 |  3.6 MB |      true |      false |         2 |        148 |                148 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.10.113:2379 | cce14a1a3ff3b166 |  3.4.13 |  3.6 MB |     false |      false |         2 |        148 |                148 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
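Beyond spot-checking that the deleted pods came back, comparing the total key count before and after a restore is a cheap sanity check; a minimal sketch:

# count all keys currently stored; run before and after the restore and compare
ETCDCTL_API=3 /usr/local/bin/etcdctl get / --prefix --keys-only | grep -c '^/'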
4.5.4 Adding and removing etcd nodes with kubeasz

root@k8s-deploy:/etc/kubeasz# ./ezctl --help
Usage: ezctl COMMAND [args]
-------------------------------------------------------------------------------------
Cluster setups:
    list                          to list all of the managed clusters
    checkout    <cluster>         to switch default kubeconfig of the cluster
    new         <cluster>         to start a new k8s deploy with name 'cluster'
    setup       <cluster> <step>  to setup a cluster, also supporting a step-by-step way
    start       <cluster>         to start all of the k8s services stopped by 'ezctl stop'
    stop        <cluster>         to stop all of the k8s services temporarily
    upgrade     <cluster>         to upgrade the k8s cluster
    destroy     <cluster>         to destroy the k8s cluster
    backup      <cluster>         to backup the cluster state (etcd snapshot)
    restore     <cluster>         to restore the cluster state from backups
    start-aio                     to quickly setup an all-in-one cluster with 'default' settings

Cluster ops:
    add-etcd    <cluster> <ip>    to add a etcd-node to the etcd cluster
    add-master  <cluster> <ip>    to add a master node to the k8s cluster
    add-node    <cluster> <ip>    to add a work node to the k8s cluster
    del-etcd    <cluster> <ip>    to delete a etcd-node from the etcd cluster
    del-master  <cluster> <ip>    to delete a master node from the k8s cluster
    del-node    <cluster> <ip>    to delete a work node from the k8s cluster

Extra operation:
    kcfg-adm    <cluster> <args>  to manage client kubeconfig of the k8s cluster

Use "ezctl help <command>" for more information about a given command.

# Add a new etcd node
root@k8s-harbor-deploy:/etc/kubeasz# ./ezctl add-etcd k8s-01 192.168.10.114
# Remove an etcd node
root@k8s-harbor-deploy:/etc/kubeasz# ./ezctl del-etcd k8s-01 192.168.10.114

4.6 etcd data recovery workflow

1. Restore the server operating systems
2. Redeploy the etcd cluster
3. Stop kube-apiserver/controller-manager/scheduler/kubelet/kube-proxy
4. Stop the etcd cluster
5. Restore the same backup data onto each etcd node
6. Start each node and verify the etcd cluster
7. Start kube-apiserver/controller-manager/scheduler/kubelet/kube-proxy
8. Verify the Kubernetes master state and pod data

A rough orchestration of steps 3 through 7 is sketched below.
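This is only a sketch, assuming passwordless SSH as root, the service names used by this deployment, and that the snapshot file has already been copied to /tools on every master node (MASTERS/NODES are illustrative variable names):

#!/bin/bash
# rough orchestration of recovery steps 3-7
MASTERS="192.168.10.111 192.168.10.112 192.168.10.113"
NODES="192.168.10.114 192.168.10.115 192.168.10.116"
SNAPSHOT=/tools/etcd-20230107-bak.db

# steps 3 and 4: stop the kubernetes services, then etcd
for ip in ${NODES}; do ssh root@${ip} "systemctl stop kubelet kube-proxy"; done
for ip in ${MASTERS}; do ssh root@${ip} "systemctl stop kube-apiserver kube-controller-manager kube-scheduler kubelet kube-proxy etcd"; done

# step 5: restore the same snapshot on every etcd node (flags as in 4.5.1)
for ip in ${MASTERS}; do
  ssh root@${ip} "rm -rf /var/lib/etcd && ETCDCTL_API=3 etcdctl snapshot restore ${SNAPSHOT} \
    --name=etcd-${ip} \
    --initial-cluster=etcd-192.168.10.111=https://192.168.10.111:2380,etcd-192.168.10.112=https://192.168.10.112:2380,etcd-192.168.10.113=https://192.168.10.113:2380 \
    --initial-cluster-token=etcd-cluster-0 \
    --initial-advertise-peer-urls=https://${ip}:2380 \
    --data-dir=/var/lib/etcd"
done

# steps 6 and 7: start etcd first, then the kubernetes services
for ip in ${MASTERS}; do ssh root@${ip} "systemctl start etcd"; done
for ip in ${MASTERS}; do ssh root@${ip} "systemctl start kube-apiserver kube-controller-manager kube-scheduler kubelet kube-proxy"; done
for ip in ${NODES}; do ssh root@${ip} "systemctl start kubelet kube-proxy"; done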