1. PostgreSQL一主一从高可用搭建(VIP)
IP | 主机名 | 角色 |
---|---|---|
192.168.198.146 | test11 | 主服务器 |
192.168.198.143 | test12 | 备用服务器 |
192.168.198.144 | test13 | 见证服务器 |
一、基本环境配置
1、hosts文件配置(所有节点)
vim /etc/hosts
# 新增
192.168.198.146 msp-app01
192.168.198.143 msp-app02
192.168.198.144 msp-app03
2、配置SSH免密登录
root用户下免密登录设置(所有节点都执行)
ssh-keygen
ssh-copy-id 192.168.198.146
ssh-copy-id 192.168.198.143
ssh-copy-id 192.168.198.144
postgres用户下免密登录设置,postgresql需要postgres用户
#创建postgres用户
groupadd postgres
useradd -g postgres postgres
passwd postgres
su - postgres
ssh-keygen
ssh-copy-id 192.168.198.146
ssh-copy-id 192.168.198.143
ssh-copy-id 192.168.198.144
3、添加sudo权限
主节点与备用节点postgres用户添加sudo权限,用于执行VIP自动切换脚本
sudo visudo
postgres ALL=(ALL) NOPASSWD:ALL
如果上面的方法无法使用:
创建完用户后,需要给用户添加权限。
[root@localhost ~]# chmod -v u+w /etc/sudoers
mode of '/etc/sudoers' changed from 0440 (r--r-----) to 0640 (rw-r-----)
修改sudoers权限,使其可编辑,编辑sudoers文件。
vim /etc/sudoers
## Allow root to run any commands anywhere
root ALL=(ALL) ALL
postgres ALL=(ALL) NOPASSWD:ALL
将sudoers文件权限修改为不可编辑。
[root@localhost ~]# chmod -v u-w /etc/sudoers
mode of "/etc/sudoers" changed from 0640 (rw-r-----) to 0440 (r--r-----)
可能出现问题:
1、使用test用户进行切换(su root)root用户,提示鉴定故障。
可能在权限不足带来的问题。
检查办法如下:
1)检查/etc目录下passwd的权限,该文件应该具备的权限如下:
[root@localhost ~]# ll /etc/passwd
-rw-r--r--. 1 root root 2347 4f 16 21:00 /etc/passwd
如果该权限不足,则使用chmod 补充完成。
2)检查/bin/su文件是否有s位权限 也有可能是所属用户为普通用户,该文件应该具备的权限如下:
[root@localhost ~]# ll /bin/su
-rwsrwxrwx. 1 root root 32128 8f 9 2019 /bin/su
如果该权限不足,则使用chmod u+s /bin/su补充完成即可。
二、安装配置PostgreSQL
1、下载postgresql-16源码包(所有节点)
wget https://ftp.postgresql.org/pub/source/v16.0/postgresql-16.0.tar.gz --no-check-certificate
2、安装依赖包(所有节点)
yum install gcc gcc-c++ readline-devel readline readline-dev zlib-devel
3、解压编译(所有节点)
tar -zxvf postgresql-16.0.tar.gz -C /app
cd /app/postgresql-16.0
./configure --prefix=/app/pgsql
或
./configure --prefix=/app/pgsql --without-icu
make && make install
mv /app/postgresql-16.0 /app/pgsql
4、配置数据、日志目录(所有节点)
mkdir -p /app/pgsql/{data,log}
chown -R postgres:postgres /app/pgsql
5、初始化数据库(主节点操作)
只需要在主数据库服务器上对 PostgreSQL数据库服务进行初始化并且正常启动即可,其余备服务器数据库可利用 repmgr 工具进行初始化搭建部署。
su - postgres
/app/pgsql/bin/initdb -D /app/pgsql/data/
6、启动数据库(主节点操作)
/app/pgsql/bin/pg_ctl -D /app/pgsql/data/ -l logfile start
7、验证测试(主节点操作)
#软链接
ln -s /app/pgsql/bin/psql /bin/psql
ln -s /app/pgsql/bin/pg_config /bin/pg_config
su - postgres
#测试登录
psql -U postgres -d postgres
8、配置环境变量(所有节点)
vim /etc/profile
# PG安装目录
export PGHOME=/app/pgsql/
# PG数据目录
export PGDATA=/app/pgsql/data/
export PATH=$PGHOME/bin:$PATH
export PG_CONFIG=/bin/pg_config
source /etc/profile
三、安装repmgr
1、下载源码包
wget --no-check-certificate https://www.repmgr.org/download/repmgr-5.4.1.tar.gz
2、解压编译
yum install flex libcurl-devel json-c-devel
tar -zxvf repmgr-5.4.1.tar.gz -C /app/
cd /app/repmgr-5.4.1
./configure && make install
make install 成功后,pg_bin_path 里会有 repmgr、repmgrd 两个可执行文件
四、主节点配置
1、postgresql.conf配置文件
vim /app/pgsql/data/postgresql.conf
#直接添加在配置文件最下面就可以
listen_addresses = '*'
port = 5432
max_wal_senders = 10
max_replication_slots = 10
wal_level = 'hot_standby'
hot_standby = on
archive_mode = on
archive_command = '/bin/true'
wal_log_hints = on
2、pg_hba.conf配置文件
vim /app/pgsql/data/pg_hba.conf
#直接在文件最下面添加
local replication repmgr trust
host replication repmgr 127.0.0.1/32 trust
host replication repmgr msp-app01 trust
host replication repmgr msp-app02 trust
host replication repmgr msp-app03 trust
local repmgr repmgr trust
host repmgr repmgr 127.0.0.1/32 trust
host repmgr repmgr msp-app01 trust
host repmgr repmgr msp-app02 trust
host repmgr repmgr msp-app03 trust
3、创建repmgr用户和库
[root@msp-app01 src]# su - postgres
Last login: Fri Jan 26 10:09:52 CST 2024 on pts/0
[postgres@msp-app01 ~]$ createuser -s repmgr #创建repmgr用户
[postgres@msp-app01 ~]$ createdb repmgr -O repmgr #创建repmgr库,并将其所有者设置为repmgr用户
[postgres@msp-app01 ~]$ psql -U repmgr -d repmgr #以repmgr身份登录到repmgr库
psql (16.0)
Type "help" for help.
repmgr=# alter user repmgr with password '123456'; #修改repmgr密码
ALTER ROLE
repmgr=#
4、配置用户登录数据库免密
# root用户下添加以下内容到~/.pgpass文件,用户、数据库和密码修改为自己的即可
vim /home/postgres/.pgpass
# ip:port:repmgr:repmgr:repmgr
*:*:repmgr:repmgr:123456
# 修改 ~/.pgpass 文件权限
chmod 600 /home/postgres/.pgpass
5、重启生效
[root@msp-app01 src]# su - postgres -c "pg_ctl -D /app/pgsql/data/ -l /app/pgsql/log/start.log stop"
waiting for server to shut down.... done
server stopped
[root@msp-app01 src]# su - postgres -c "pg_ctl -D /app/pgsql/data/ -l /app/pgsql/log/start.log start"
waiting for server to start.... done
server started
6、创建repmgr.conf配置文件
mkdir -p /etc/repmgr/{log,conf}
chown -R postgres:postgres /etc/repmgr
配置文件内容:
vim /etc/repmgr/conf/repmgr.conf
# 节点ID,各节点标识
node_id=1
# 节点名称
node_name='msp-app01'
# 本节点数据库连接信息
conninfo='host=msp-app01 user=repmgr dbname=repmgr password=123456 connect_timeout=2'
# pg数据目录
data_directory='/app/pgsql/data'
# 流复制数据库用户,默认使用repmgr
replication_user='repmgr'
# repmgr软件目录
repmgr_bindir='/app/pgsql/bin'
# pg软件目录
pg_bindir='/app/pgsql/bin'
# 日志管理
log_level=INFO
# log文件需要提前创建
log_file='/etc/repmgr/log/repmgrd.log'
# 此设置导致repmgrd以指定的时间间隔(以秒为单位,默认为300)发出状态日志行,描述repmgrd的当前状态
log_status_interval=10
# pg、repmgr服务管理命令
service_start_command='pg_ctl -D /app/pgsql/data/ start -o \'-c config_file=/app/pgsql/data/postgresql.conf\' -l /app/pgsql/log/start.log'
service_stop_command='pg_ctl -D /app/pgsql/data/ stop -o \'-c config_file=/app/pgsql/data/postgresql.conf\' -l /app/pgsql/log/start.log'
service_restart_command='pg_ctl -D /app/pgsql/data/ restart -o \'-c config_file=/app/pgsql/data/postgresql.conf\' -l /app/pgsql/log/start.log'
service_reload_command='pg_ctl reload'
# repmgrd运行时的pid文件
repmgrd_pid_file='/etc/repmgr/log/repmgrd.pid'
repmgrd_service_start_command='repmgrd start'
repmgrd_service_stop_command='kill -9 `cat /etc/repmgr/log/repmgrd.pid`'
# failover设置
failover='automatic'
# 当repmgrd确定当前节点将成为新的主节点时,将在故障转移情况下执行promote_command中定义的程序或脚本
promote_command='repmgr standby promote --log-to-file'
# %n将被替换repmgrd与新的主节点的ID,如果没有提供,repmgr standby follow将尝试自行确定新的主repmgr standby follow节点,但如果在新主节点提升后原主节点重新上线,则存在导致节点继续跟随原主节点的风险
follow_command='repmgr standby follow --log-to-file --upstream-node-id=%n'
#发生特定事件时要执行的命令,比如升级备用节点为主节点
event_notification_command='bash /usr/local/bin/repmgr_promote_self.sh %n'
#要通知的事件类型,例如 'standby_promote' 表示备用节点升级为主节点时触发通知
event_notifications='standby_promote'
# 高可用参数设置
# 定义节点位置的任意字符串,在故障转移期间用于检查当前主节点的可见性
location='location1'
# 节点优先级,选主时可能使用到(lsn > priority > node_id),0代表该节点不会被提升为主节点
priority=100
# 是否将监控数据写入monitoring_history表
monitoring_history=yes
# 故障转移之前,尝试重新连接的间隔(以秒为单位)
reconnect_interval=5
# 故障转移之前,尝试重新连接的次数
reconnect_attempts=3
# ping: repmg使用PQPing()方法测试连接
# connection: 尝试与节点建立新的连接
# query:通过现有连接在节点上执行SQL语句
connection_check_type=ping
# 写入监控数据的间隔
monitor_interval_secs=5
use_replication_slots=true
7、VIP自动切换脚本
vim /usr/local/bin/repmgr_promote_self.sh
#!/usr/bin/env bash
#虚拟IP地址
VIP=192.168.198.145
#网络接口名称
IFACE=eno16777736
#postgresql的bin目录
PATH=/app/pgsql/bin:$PATH
VIP_CONF="${VIP}/23 dev ${IFACE} label ${IFACE}"
##IPDOWN
#通过SSH连接到另一台服务器上,删除指定的VIP配置
ssh postgres@msp-app02 sudo ip addr del ${VIP_CONF}
sleep 5;
##IPUP
#本地添加VIP
sudo /usr/sbin/ip addr add ${VIP_CONF}
sudo /usr/sbin/arping -b -A -c 3 -I ${IFACE} ${VIP}
exit 0
脚本授权
chmod +x /usr/local/bin/repmgr_promote_self.sh
8、注册本地节点为主服务器节点
su - postgres
repmgr -f /etc/repmgr/conf/repmgr.conf primary register
9、查看集群信息
repmgr -f /etc/repmgr/conf/repmgr.conf cluster show
10、数据库查看集群基础信息
[postgres@msp-app01 ~]$ psql -U repmgr -d repmgr
psql (16.0)
Type "help" for help.
repmgr=# \x 1
Expanded display is on.
repmgr=# SELECT * FROM repmgr.nodes;
-[ RECORD 1 ]----+---------------------------------------------------------------------------
node_id | 1
upstream_node_id |
active | t
node_name | msp-app01
type | primary
location | location1
priority | 100
conninfo | host=msp-app01 user=repmgr dbname=repmgr password=123456 connect_timeout=2
repluser | repmgr
slot_name | repmgr_slot_1
config_file | /etc/repmgr/conf/repmgr.conf
11、启动repmgrd服务
修改postgresql.conf文件,加入 repmgr 共享库。
vim /app/pgsql/data/postgresql.conf
# 添加如下内容
shared_preload_libraries = 'repmgr'
重启数据库
pg_ctl -D /app/pgsql/data/ -l logfile restart
启动repmgrd服务
[postgres@msp-app01 ~]$ repmgrd -f /etc/repmgr/conf/repmgr.conf start
[2024-01-26 12:50:31] [NOTICE] redirecting logging output to "/etc/repmgr/log/repmgrd.log"
#查看/etc/repmgr/log/repmgrd.log启动是否有报错
12、添加VIP
sudo ip addr add 192.168.198.145/23 dev eno16777736
五、备用节点配置
1、创建repmgr.conf配置文件
mkdir -p /etc/repmgr/{log,conf}
chown -R postgres:postgres /etc/repmgr
配置文件内容:
vim /etc/repmgr/conf/repmgr.conf
# 节点ID,各节点标识
node_id=2
# 节点名称
node_name='msp-app02'
# 本节点数据库连接信息
conninfo='host=msp-app02 user=repmgr dbname=repmgr password=123456 connect_timeout=2'
# pg数据目录
data_directory='/app/pgsql/data'
# 流复制数据库用户,默认使用repmgr
replication_user='repmgr'
# repmgr软件目录
repmgr_bindir='/app/pgsql/bin'
# pg软件目录
pg_bindir='/app/pgsql/bin'
# 日志管理
log_level=INFO
# log文件需要提前创建
log_file='/etc/repmgr/log/repmgrd.log'
# 此设置导致repmgrd以指定的时间间隔(以秒为单位,默认为300)发出状态日志行,描述repmgrd的当前状态
log_status_interval=10
# pg、repmgr服务管理命令
service_start_command='pg_ctl -D /app/pgsql/data/ start -o \'-c config_file=/app/pgsql/data/postgresql.conf\' -l /app/pgsql/log/start.log'
service_stop_command='pg_ctl -D /app/pgsql/data/ stop -o \'-c config_file=/app/pgsql/data/postgresql.conf\' -l /app/pgsql/log/start.log'
service_restart_command='pg_ctl -D /app/pgsql/data/ restart -o \'-c config_file=/app/pgsql/data/postgresql.conf\' -l /app/pgsql/log/start.log'
service_reload_command='pg_ctl reload'
# repmgrd运行时的pid文件
repmgrd_pid_file='/etc/repmgr/log/repmgrd.pid'
repmgrd_service_start_command='repmgrd start'
repmgrd_service_stop_command='kill -9 `cat /etc/repmgr/log/repmgrd.pid`'
# failover设置
failover='automatic'
# 当repmgrd确定当前节点将成为新的主节点时,将在故障转移情况下执行promote_command中定义的程序或脚本
promote_command='repmgr standby promote --log-to-file'
# %n将被替换repmgrd与新的主节点的ID,如果没有提供,repmgr standby follow将尝试自行确定新的主repmgr standby follow节点,但如果在新主节点提升后原主节点重新上线,则存在导致节点继续跟随原主节点的风险
follow_command='repmgr standby follow --log-to-file --upstream-node-id=%n'
#发生特定事件时要执行的命令,比如升级备用节点为主节点
event_notification_command='bash /usr/local/bin/repmgr_promote_self.sh %n'
#要通知的事件类型,例如 'standby_promote' 表示备用节点升级为主节点时触发通知
event_notifications='standby_promote'
# 高可用参数设置
# 定义节点位置的任意字符串,在故障转移期间用于检查当前主节点的可见性
location='location1'
# 节点优先级,选主时可能使用到(lsn > priority > node_id),0代表该节点不会被提升为主节点
priority=100
# 是否将监控数据写入monitoring_history表
monitoring_history=yes
# 故障转移之前,尝试重新连接的间隔(以秒为单位)
reconnect_interval=5
# 故障转移之前,尝试重新连接的次数
reconnect_attempts=3
# ping: repmg使用PQPing()方法测试连接
# connection: 尝试与节点建立新的连接
# query:通过现有连接在节点上执行SQL语句
connection_check_type=ping
# 写入监控数据的间隔
monitor_interval_secs=5
use_replication_slots=true
2、VIP自动切换脚本
vim /usr/local/bin/repmgr_promote_self.sh
#!/usr/bin/env bash
#虚拟IP地址
VIP=192.168.198.145
#网络接口名称
IFACE=eno16777736
#postgresql的bin目录
PATH=/app/pgsql/bin:$PATH
VIP_CONF="${VIP}/23 dev ${IFACE} label ${IFACE}"
##IPDOWN
#通过SSH连接到另一台服务器上,删除指定的VIP配置
ssh postgres@msp-app01 sudo ip addr del ${VIP_CONF}
sleep 5;
##IPUP
#本地添加VIP
sudo /usr/sbin/ip addr add ${VIP_CONF}
sudo /usr/sbin/arping -b -A -c 3 -I ${IFACE} ${VIP}
exit 0
脚本授权
chmod +x /usr/local/bin/repmgr_promote_self.sh
3、检查备库是否可克隆
su - postgres
# --dry-run表示命令测试,并不会实际执行,可用于验证是否会出现一些基本错误
repmgr -h msp-app01 -U repmgr -d repmgr -f /etc/repmgr/conf/repmgr.conf standby clone --dry-run
# 实际执行pg的克隆操作
repmgr -h msp-app01 -U repmgr -d repmgr -f /etc/repmgr/conf/repmgr.conf standby clone
4、启动备库
pg_ctl -D /app/pgsql/data/ -l /app/pgsql/log/start.log start
5、注册从库为备用服务器
repmgr -f /etc/repmgr/conf/repmgr.conf --upstream-node-id=1 standby register
6、查看当前集群信息
repmgr -f /etc/repmgr/conf/repmgr.conf cluster show
[postgres@msp-app02 ~]$ repmgr -f /etc/repmgr/conf/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-----------+---------+-----------+-----------+-----------+----------+----------+----------------------------------------------------------------------------
1 | msp-app01 | primary | * running | | location1 | 100 | 1 | host=msp-app01 user=repmgr dbname=repmgr password=123456 connect_timeout=2
2 | msp-app02 | standby | running | msp-app01 | location1 | 100 | 1 | host=msp-app02 user=repmgr dbname=repmgr password=123456 connect_timeout=2
7、启动repmgrd服务
# 启动 repmgrd 服务
[postgres@msp-app02 ~]$ repmgrd -f /etc/repmgr/conf/repmgr.conf start
[2024-01-26 13:11:12] [NOTICE] redirecting logging output to "/etc/repmgr/log/repmgrd.log"
#查看/etc/repmgr/log/repmgrd.log启动是否有报错
六、见证节点配置
1、启动节点postgres服务
su - postgres
#初始化数据库
initdb -D /app/pgsql/data
#启动
pg_ctl -D /app/pgsql/data/ -l /app/pgsql/log/start.log start
2、创建repmgr.conf配置文件
mkdir -p /etc/repmgr/{log,conf}
chown -R postgres:postgres /etc/repmgr
配置文件内容:
vim /etc/repmgr/conf/repmgr.conf
# 基本信息
node_id=3
# 节点名称,高可用集群各节点名称
node_name='msp-app03'
# 本节点数据库连接信息
conninfo='host=msp-app03 user=repmgr dbname=repmgr password=123456 connect_timeout=2'
# pg数据目录
data_directory='/app/pgsql/data'
# 流复制数据库用户,默认使用repmgr
replication_user='repmgr'
# repmgr软件目录
repmgr_bindir='/app/pgsql/bin'
# pg软件目录
pg_bindir='/app/pgsql/bin'
# 日志管理
log_level=INFO
log_file='/etc/repmgr/log/repmgrd.log'
log_status_interval=10
# pg、repmgr服务管理命令
service_start_command='pg_ctl -D /app/pgsql/data/ start -o \'-c config_file=/app/pgsql/data/postgresql.conf\' -l /app/pgsql/log/start.log'
service_stop_command='pg_ctl -D /app/pgsql/data/ stop -o \'-c config_file=/app/pgsql/data/postgresql.conf\' -l /app/pgsql/log/start.log'
service_restart_command='pg_ctl -D /app/pgsql/data/ restart -o \'-c config_file=/app/pgsql/data/postgresql.conf\' -l /app/pgsql/log/start.log'
service_reload_command='pg_ctl reload'
# repmgrd运行时的pid文件
repmgrd_pid_file='/etc/repmgr/log/repmgrd.pid'
repmgrd_service_start_command='repmgrd -f /etc/repmgr/conf/repmgr.conf start'
repmgrd_service_stop_command='kill -9 `cat /etc/repmgr/log/repmgrd.pid`'
# failover设置
failover='automatic'
promote_command='repmgr standby promote -f /etc/repmgr/conf/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f /etc/repmgr/conf/repmgr.conf --log-to-file --upstream-node-id=%n'
# 高可用参数设置
location='location1'
priority=100
monitoring_history=yes
reconnect_interval=5
reconnect_attempts=3
connection_check_type=ping
monitor_interval_secs=5
use_replication_slots=true
3、启动repmgrd服务
修改postgresql.conf,加入repmgr共享库。
vim /app/pgsql/data/postgresql.conf
# 添加如下内容
listen_addresses = '*'
shared_preload_libraries = 'repmgr'
修改 pg_hba.conf 文件
vim /app/pgsql/data/pg_hba.conf
# 添加如下内容
host repmgr repmgr 0.0.0.0/0 trust
重启数据库,创建repmgr用户和库。
su - postgres
#重启
pg_ctl -D /app/pgsql/data/ -l logfile restart
#创建repmgr用户和库
[postgres@msp-app03 ~]$ psql -U postgres -d postgres
psql (16.0)
Type "help" for help.
postgres=# create user repmgr with password '123456' superuser replication;
CREATE ROLE
postgres=# create database repmgr owner repmgr;
CREATE DATABASE
postgres=# \q
启动repmgrd服务
repmgrd -f /etc/repmgr/conf/repmgr.conf start
4、注册witness
repmgr -f /etc/repmgr/conf/repmgr.conf witness register -h msp-app01 -U repmgr -d repmgr
5、查看集群
repmgr -f /etc/repmgr/conf/repmgr.conf cluster show
[postgres@msp-app03 ~]$ repmgr -f /etc/repmgr/conf/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-----------+---------+-----------+-----------+-----------+----------+----------+----------------------------------------------------------------------------
1 | msp-app01 | primary | * running | | location1 | 100 | 1 | host=msp-app01 user=repmgr dbname=repmgr password=123456 connect_timeout=2
2 | msp-app02 | standby | running | msp-app01 | location1 | 100 | 1 | host=msp-app02 user=repmgr dbname=repmgr password=123456 connect_timeout=2
3 | msp-app03 | witness | * running | msp-app01 | location1 | 0 | n/a | host=msp-app03 user=repmgr dbname=repmgr password=123456 connect_timeout=2
6、集群最终效果
[postgres@msp-app01 pgsql]$ psql -U repmgr -d repmgr
psql (16.0)
Type "help" for help.
repmgr=# \x 1
Expanded display is on.
repmgr=# SELECT * FROM repmgr.nodes;
-[ RECORD 1 ]----+---------------------------------------------------------------------------
node_id | 1
upstream_node_id |
active | t
node_name | msp-app01
type | primary
location | location1
priority | 100
conninfo | host=msp-app01 user=repmgr dbname=repmgr password=123456 connect_timeout=2
repluser | repmgr
slot_name | repmgr_slot_1
config_file | /etc/repmgr/conf/repmgr.conf
-[ RECORD 2 ]----+---------------------------------------------------------------------------
node_id | 2
upstream_node_id | 1
active | t
node_name | msp-app02
type | standby
location | location1
priority | 100
conninfo | host=msp-app02 user=repmgr dbname=repmgr password=123456 connect_timeout=2
repluser | repmgr
slot_name | repmgr_slot_2
config_file | /etc/repmgr/conf/repmgr.conf
-[ RECORD 3 ]----+---------------------------------------------------------------------------
node_id | 3
upstream_node_id | 1
active | t
node_name | msp-app03
type | witness
location | location1
priority | 0
conninfo | host=msp-app03 user=repmgr dbname=repmgr password=123456 connect_timeout=2
repluser | repmgr
slot_name | repmgr_slot_3
config_file | /etc/repmgr/conf/repmgr.conf
七、开启守护进程(所有节点)
su - postgres
/app/pgsql/bin/repmgrd -f /etc/repmgr/conf/repmgr.conf -d -p /etc/repmgr/log/repmgrd.pid
八、模拟故障自动转换
1、查看集群状态
[postgres@msp-app01 ~]$ repmgr -f /etc/repmgr/conf/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-----------+---------+-----------+-----------+-----------+----------+----------+----------------------------------------------------------------------------
1 | msp-app01 | primary | * running | | location1 | 100 | 1 | host=msp-app01 user=repmgr dbname=repmgr password=123456 connect_timeout=2
2 | msp-app02 | standby | running | msp-app01 | location1 | 100 | 1 | host=msp-app02 user=repmgr dbname=repmgr password=123456 connect_timeout=2
3 | msp-app03 | witness | * running | msp-app01 | location1 | 0 | n/a | host=msp-app03 user=repmgr dbname=repmgr password=123456 connect_timeout=2
主节点IP信息
[postgres@msp-app01 ~]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eno16777736: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:ad:ac:5b brd ff:ff:ff:ff:ff:ff
inet 192.168.198.146/24 brd 192.168.198.255 scope global dynamic eno16777736
valid_lft 1390sec preferred_lft 1390sec
inet 192.168.198.145/23 scope global eno16777736
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fead:ac5b/64 scope link
valid_lft forever preferred_lft forever
2、手动关闭主节点(primary),模拟故障宕机
pg_ctl -D /app/pgsql/data/ -l /app/pgsql/log/start.log stop
3、通过见证服务器查看集群信息变化
[postgres@msp-app01 ~]$ repmgr -f /etc/repmgr/conf/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-----------+---------+----------------------+-------------+-----------+----------+----------+----------------------------------------------------------------------------
1 | msp-app01 | primary | * running | | location1 | 100 | 1 | host=msp-app01 user=repmgr dbname=repmgr password=123456 connect_timeout=2
2 | msp-app02 | standby | ! running as primary | | location1 | 100 | 2 | host=msp-app02 user=repmgr dbname=repmgr password=123456 connect_timeout=2
3 | msp-app03 | witness | * running | ! msp-app02 | location1 | 0 | n/a | host=msp-app03 user=repmgr dbname=repmgr password=123456 connect_timeout=2
#提升新的primary节点完成后,旧的primary节点被至为failed
[postgres@msp-app03 ~]$ repmgr -f /etc/repmgr/conf/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-----------+---------+-----------+-----------+-----------+----------+----------+----------------------------------------------------------------------------
1 | msp-app01 | primary | - failed | ? | location1 | 100 | | host=msp-app01 user=repmgr dbname=repmgr password=123456 connect_timeout=2
2 | msp-app02 | primary | * running | | location1 | 100 | 2 | host=msp-app02 user=repmgr dbname=repmgr password=123456 connect_timeout=2
3 | msp-app03 | witness | * running | msp-app02 | location1 | 0 | n/a | host=msp-app03 user=repmgr dbname=repmgr password=123456 connect_timeout=2
WARNING: following issues were detected
- unable to connect to node "msp-app01" (ID: 1)
HINT: execute with --verbose option to see connection error messages
4、重新将旧的 primary 节点以 standby 的身份加入到集群中
测试是否能重新加入集群
pg_ctl -D /app/pgsql/data/ -l /app/pgsql/log/start.log stop
repmgr node rejoin -f /etc/repmgr/conf/repmgr.conf -h msp-app02 -U repmgr -d repmgr --dry-run --force-rewind
加入集群
repmgr node rejoin -f /etc/repmgr/conf/repmgr.conf -h msp-app02 -U repmgr -d repmgr --force-rewind
5、查看最终集群状态
[postgres@msp-app02 ~]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eno16777736: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:40:16:78 brd ff:ff:ff:ff:ff:ff
inet 192.168.198.143/24 brd 192.168.198.255 scope global dynamic eno16777736
valid_lft 1563sec preferred_lft 1563sec
inet6 fe80::20c:29ff:fe40:1678/64 scope link
valid_lft forever preferred_lft forever
查看集群信息
[postgres@msp-app03 ~]$ repmgr -f /etc/repmgr/conf/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-----------+---------+-----------+-----------+-----------+----------+----------+------------------
1 | msp-app01 | standby | running | msp-app02 | location1 | 100 | 4 | host=msp-app01 user=rep
2 | msp-app02 | primary | * running | | location1 | 100 | 4 | host=msp-app02 user=rep
3 | msp-app03 | witness | * running | msp-app02 | location1 | 0 | n/a | host=msp-app03 user=rep
标签:PostgreSQL,log,app,msp,pgsql,一主一,conf,repmgr,搭建
From: https://www.cnblogs.com/zreo2home/p/18353390