使用Patroni管理LightDB高可用

标签：01 LightDB 可用 16 patroni 10.19 12 Patroni lightdb1

测试环境

CPU: 海光x86
OS: Kylin Advanced Server V10 SP1
LightDB: 13.8-22.3
Patroni: 2.1.3
etcd: 3.5.4

安装部署etcd集群

需要3台机器。
centos/RHEL等可以从epel获取etcd。
麒麟ky10,ky10sp1没有etcd包，可以使用lightdb预编译的etcd-3.5.4。
解压并设置PATH：

cd $HOME
tar xzf etcd-3.5.4-ky10.hygon.tar.gz
export PATH=$HOME/etcd/bin:$PATH
which etcd
etcd --version

启动集群，

TOKEN=token-01
CLUSTER_STATE=new
NAME_1=machine-2
NAME_2=machine-34
NAME_3=machine-35
HOST_1=10.19.64.2
HOST_2=10.19.64.34
HOST_3=10.19.64.35
CLUSTER=${NAME_1}=http://${HOST_1}:2380,${NAME_2}=http://${HOST_2}:2380,${NAME_3}=http://${HOST_3}:2380

# 在10.19.62.2 （machine-2）启动
THIS_IP=${HOST_1}
THIS_NAME=${NAME_1}
etcd --data-dir=data.etcd --name ${THIS_NAME} \
	--initial-advertise-peer-urls http://${THIS_IP}:2380 --listen-peer-urls http://${THIS_IP}:2380 \
	--advertise-client-urls http://${THIS_IP}:2379 --listen-client-urls http://${THIS_IP}:2379 \
  --enable-v2=true \
	--initial-cluster ${CLUSTER} \
	--initial-cluster-state ${CLUSTER_STATE} --initial-cluster-token ${TOKEN}

# 在另外两台机器启动，只需要修改THIS_NAME和THIS_IP

安装LightDB

数据库部署在机器A、B，每一个LightDB实例会有对应一个patroni，1对1。
在机器A安装单机版lightdb-13.8-22.3，安装时选择创建实例。这个实例会作为primary。
在机器B安装单机版lightdb-13.8-22.3，安装时选择不创建实例。机器B会作为standby。
安装步骤详见LightDB安装手册。
本文示例中机器A为10.19.64.76, 机器B为10.19.64.77。

安装patroni

在机器A, B安装patroni。
解压，安装依赖：

tar xzf patroni-2.1.3-lightdb.tar.gz
cd patroni-2.1.3-lightdb
pip3 install --user -r requirements.txt
pip3 install --user psycopg
sudo yum install python3-psutil

或离线安装依赖：

unzip patroni-requirements-ky10sp1.hygon.zip
pip3 install --user *.whl
pip3 install --user ydiff-1.2.tar.gz
pip3 install --user python-etcd-0.4.5.tar.gz
pip3 install --user pysyncobj-0.3.11.tar.gz
pip3 install --user psutil-5.7.3.tar.gz
pip3 install --user cffi-1.15.1.tar.gz

使用patroni启动lightdb集群

参考patroni文档: https://patroni.readthedocs.io/en/latest/existing_data.html

机器A： primary

进入patroni-2.1.3-lightdb目录, 根据实际IP、端口修改lightdb0.yml

# etcd配置段中添加hosts, 去掉原有hosts、host配置
  hosts:
  - 10.19.64.2:2379
  - 10.19.64.34:2379
  - 10.19.64.35:2379

# postgresql配置段
  # 修改listen,设置为机器A上实例的监听端口和IP
  listen: 127.0.0.1,10.19.64.76:55432
  connect_addr: 10.19.64.76:55432
  # 修改data_dir，同前文安装时一致
  data_dir: /home/lightdb/cluster_A/lightdb0
  # 修改superuser用户名密码, 同前文安装时保持一致, 如
  superuser:
    username: lightdb
    password: lightdb999

检查环境变量是否正确配置。
此时如果db实例未启动则启动：lt_ctl start
连接实例，根据ligntdb0.yml中authentication创建patroni需要的db用户：

ltsql -p 55432 -U lightdb
create user replicator with replication encrypted password 'rep-pass';
create user rewind_user with encrypted password 'rewind_password';

因为不是patroni初始化（bootstrap）的实例, yml配置文件中的pg_hba不会更新到lt_hba.conf, 需要手动加入，并reload:

# $LTDATA/lt_hba.conf中添加这一行
host    replication     replicator             10.19.64.0/24                 trust
# 重新加载lt_hba.conf
lt_ctl -D cluster_A/lightdb0/ reload

启动patroni, patroni会接管lightdb0.yml配置的实例:

./patroni.py lightdb0.yml

机器B： standby

进入patroni-2.1.3-lightdb目录, 修改lightdb1.yml

# etcd配置段中添加hosts, 去掉原有hosts、host配置
  hosts:
  - 10.19.64.2:2379
  - 10.19.64.34:2379
  - 10.19.64.35:2379

# postgresql配置段中
  # 修改listen,设置为机器A上实例的监听端口和IP
  listen: 127.0.0.1,10.19.64.77:55433
  connect_addr: 10.19.64.77:55432
  # 修改data_dir, 会从primary复制到配置的dir, 目录有读写权限
  data_dir: ....
  # authentication与primary保持一致,如：
  authentication:
    replication:
      username: replicator
      password: rep-pass
    superuser:
      username: lightdb
      password: lightdb999
    rewind:  # Has no effect on postgres 10 and lower
      username: rewind_user
      password: rewind_password

启动patroni, patroni会根据lightdb1.yml配置,从primary做basebackup并成为standby:

./patroni.py lightdb1.yml

验证复制模式

primary:

ltsql -p 55432 -h 10.19.64.76 -U lightdb
create table t1(id int);
insert into t1 values(55432);
select * from t1;
select * from pg_stat_replication;
show synchronous_standby_names;
show synchronous_commit;
\db+

standby:

ltsql -p 55433 -h 10.19.64.77 -U lightdb
create table t1(id int);
insert into t1 values(55433);
select * from t1;
select * from pg_stat_replication;
show synchronous_standby_names;
show synchronous_commit;
\db+

同步复制

patroni默认使用异步复制，可通过restapi在线修改复制模式：
在primary机器上执行：

curl -s localhost:8008/config | jq .
curl -s -XPATCH -d '{"synchronous_mode":true}' http://localhost:8008/config | jq .

postgres=# select usename,client_addr,sync_state from pg_stat_replication ;
  usename   | client_addr | sync_state 
------------+-------------+------------
 replicator | 10.19.64.77 | sync
(1 row)

不允许standby提升

修改lightdb1.yml，tags中添加nofailover, 重启:

tags:
  # ...
  nofailover: true

把primary（机器A，lightdb0）停止后，可在standby看到如下日志I am not allowed to promote：

2022-12-01 16:20:18,035 INFO: Lock owner: lightdb0; I am lightdb1
2022-12-01 16:20:18,035 INFO: establishing a new patroni connection to the postgres cluster
2022-12-01 16:20:18,054 INFO: no action. I am (lightdb1), a secondary, and following a leader (lightdb0)
2022-12-01 16:20:25,115 INFO: no action. I am (lightdb1), a secondary, and following a leader (lightdb0)
2022-12-01 16:20:35,114 INFO: no action. I am (lightdb1), a secondary, and following a leader (lightdb0)
server signaled
2022-12-01 16:20:45,423 INFO: following a different leader because I am not allowed to promote

级联复制

使用配置lightdb2.yml, 与lightdb1在同一主机; 可比较patroni目录中lightdb1.yml与lightdb2.yml区别。
lightdb2.yml中tags包含了replicatefrom，让lightdb2这个实例从lightdb1复制，而不是lightdb0：

tags:
  # ...
  replicatefrom: lightdb1

在lightdb0（primary，patroni中叫做leader）可以看到lightdb1正在复制：

[lightdb@localhost ~]$ ltsql -p 55432 -h 10.19.64.76
ltsql (13.8-22.3)
Type "help" for help.

lightdb@lt_test=# select usename, application_name, client_addr, sync_state from pg_stat_replication ;
   usename   | application_name | client_addr | sync_state 
-------------+------------------+-------------+-------------
  replicator | lightdb1         | 10.19.64.77 | async  
(1 row)

在lightdb1（standby）上可以看到lightdb2正在复制（cascading standby）, lightdb2与lightdb都在10.19.64.77：

[lightdb@localhost ~]$ ltsql -p 55433 -h 10.19.64.77
ltsql (13.8-22.3)
Type "help" for help.

lightdb@lt_test=# select usename, application_name, client_addr, sync_state from pg_stat_replication ;
   usename   | application_name | client_addr | sync_state 
-------------+------------------+-------------+-------------
  replicator | lightdb2         | 10.19.64.77 | async  
(1 row)

多standby

使用上一节的lightdb2.yml，去掉tags中的replicatefrom, 重启。

从lightdb0可以看到lightdb1，lightdb2都在复制：

[lightdb@localhost ~]$ ltsql -p 55432 -h 10.19.64.76
ltsql (13.8-22.3)
Type "help" for help.

lightdb@lt_test=# select usename, application_name, client_addr, sync_state from pg_stat_replication ;
   usename   | application_name | client_addr | sync_state 
-------------+------------------+-------------+-------------
  replicator | lightdb1         | 10.19.64.77 | async  
  replicator | lightdb2         | 10.19.64.77 | async  
(2 rows)

把lightdb0停掉，观察lightdb1，lightdb2的日志，可以看到lightdb2成为leader（primary），lightdb1跟随了lightdb2：
lightdb1:

2022-12-01 16:54:07,307 INFO: no action. I am (lightdb1), a secondary, and following a leader (lightdb0)
2022-12-01 16:54:17,815 INFO: no action. I am (lightdb1), a secondary, and following a leader (lightdb0)
2022-12-01 16:54:21,232 INFO: Local timeline=2 lsn=0/C00000A0
server signaled
2022-12-01 16:54:21,249 INFO: following a different leader because I am not allowed to promote
2022-12-01 16:54:21,358 INFO: Lock owner: lightdb2; I am lightdb1
2022-12-01 16:54:21,363 INFO: Local timeline=2 lsn=0/C00000A0
2022-12-01 16:54:21,371 INFO: no action. I am (lightdb1), a secondary, and following a leader (lightdb2)
2022-12-01 16:54:21,427 INFO: Lock owner: lightdb2; I am lightdb1
2022-12-01 16:54:21,432 INFO: Local timeline=2 lsn=0/C00000A0
2022-12-01 16:54:21,441 INFO: master_timeline=3
2022-12-01 16:54:21,442 INFO: master: history=1	0/6093AB80	no recovery target specified
2	0/C00000A0	no recovery target specified
2022-12-01 16:54:21,449 INFO: no action. I am (lightdb1), a secondary, and following a leader (lightdb2)
2022-12-01 16:54:31,430 INFO: no action. I am (lightdb1), a secondary, and following a leader (lightdb2)

lightdb2:

2022-12-01 16:54:07,307 INFO: no action. I am (lightdb2), a secondary, and following a leader (lightdb0)
2022-12-01 16:54:17,823 INFO: no action. I am (lightdb2), a secondary, and following a leader (lightdb0)
2022-12-01 16:54:19,244 WARNING: Request failed to lightdb0: GET http://10.19.64.76:8008/patroni (HTTPConnectionPool(host='10.19.64.76', port=8008): Max retries exceeded with url: /patroni (Caused by ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))))
2022-12-01 16:54:19,320 WARNING: Could not activate Linux watchdog device: "Can't open watchdog device: [Errno 2] No such file or directory: '/dev/watchdog'"
2022-12-01 16:54:19,325 INFO: promoted self to leader by acquiring session lock
server promoting
2022-12-01 16:54:19,329 INFO: cleared rewind state after becoming the leader
2022-12-01 16:54:21,381 INFO: no action. I am (lightdb2), the leader with the lock
2022-12-01 16:54:21,419 INFO: no action. I am (lightdb2), the leader with the lock

ltcluster集群转patroni

ltcluster service pause
停止ltclusterd
停止keepalived
如果有witness，停止witness实例

以上步骤完成后，集群不会自动切换主备，再按照上文安装部署etcd集群, 安装patroni， 使用patroni启动lightdb集群章节中步骤配置即可。

集群监控

可使用rest api来获取集群状态:
https://patroni.readthedocs.io/en/latest/rest_api.html#health-check-endpoints
https://patroni.readthedocs.io/en/latest/rest_api.html#monitoring-endpoint

下载链接

todo

标签：01,LightDB,可用,16,patroni,10.19,12,Patroni,lightdb1
From： https://www.cnblogs.com/faxx/p/16950374.html

使用Patroni管理LightDB高可用

测试环境

安装部署etcd集群

安装LightDB

安装patroni

使用patroni启动lightdb集群

机器A： primary

机器B： standby

验证复制模式

同步复制

不允许standby提升

级联复制

多standby

ltcluster集群转patroni

集群监控

下载链接

todo

相关文章

赞助商

阅读排行