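Overview
In a PostgreSQL hot-standby setup, when the primary fails, the standby has to be activated (promoted) so that it can take over the primary's work. There are two ways to do this:
- Set trigger_file in the standby's recovery.conf. It names a trigger file; as soon as that file exists, the standby is promoted.
- Run pg_ctl promote on the standby.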
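The trigger-file method is not used in the walkthrough, so here is a minimal sketch of it. The path /tmp/pg_promote.trigger and the function name request_promotion are hypothetical; the only requirement is that the path matches the trigger_file setting in the standby's recovery.conf.

```shell
# Hypothetical trigger path. It must match the standby's recovery.conf, e.g.:
#   trigger_file = '/tmp/pg_promote.trigger'
TRIGGER_FILE="${TRIGGER_FILE:-/tmp/pg_promote.trigger}"

# Create the trigger file. The standby checks for it periodically and
# leaves recovery (promotes itself) once the file exists.
request_promotion() {
    touch "$TRIGGER_FILE"
}

# Usage (run on the standby host as the postgres OS user):
#   request_promotion
```

The file has to be created on the standby host and must be visible to the server process, so the postgres OS user is the natural one to create it.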
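Demonstration
This walkthrough simulates an abnormal shutdown of the primary, promotes the standby to primary, and then, once the original primary has been repaired, brings it back as the new standby.
Environment
Hostname | IP address | Role | Data directory |
master | 192.168.20.133 | primary | /var/lib/pgsql/11/data |
slave | 192.168.20.134 | standby | /var/lib/pgsql/11/data |
Check the current state
On the primary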
lei=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid | 3274
usesysid | 16774
usename | repuser
application_name | walreceiver
client_addr | 192.168.20.134
client_hostname | slave
client_port | 49896
backend_start | 2019-05-30 02:40:58.253032-04
backend_xmin |
state | streaming
sent_lsn | 0/180003C8
write_lsn | 0/180003C8
flush_lsn | 0/180003C8
replay_lsn | 0/180003C8
write_lag |
flush_lag |
replay_lag |
sync_priority | 0
sync_state | async
Shut down the primary
[root@master data]# systemctl stop postgresql-11
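Activate the standby
Promote the standby so that it runs as the new primary; afterwards, table test in database lei is dropped and table tt is created.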
[postgres@slave ~]$ pg_ctl -D /var/lib/pgsql/11/data/ promote
waiting for server to promote.... done
server promoted
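Before writing to the promoted server, it can be worth confirming that it really has left recovery. The sketch below wraps pg_is_in_recovery(), which returns f on a primary and t on a standby; the function name server_role is an assumption made for illustration, and any connection options are passed straight through to psql.

```shell
# Print "primary" or "standby" for the server reached with the given psql options.
server_role() {
    # -A (unaligned) and -t (tuples only) reduce the output to a bare "t" or "f".
    in_recovery="$(psql -At "$@" -c 'select pg_is_in_recovery()')" || return 1
    case "$in_recovery" in
        f) echo primary ;;
        t) echo standby ;;
        *) return 1 ;;   # unexpected output, treat as an error
    esac
}

# Usage:
#   server_role -h slave -U postgres -d postgres
```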
Drop table test and create table tt
[postgres@slave ~]$ psql lei;
psql (11.3)
Type "help" for help.
lei=# \dt
List of relations
Schema | Name | Type | Owner
--------+------+-------+----------
public | lei | table | postgres
public | t | table | postgres
public | test | table | postgres
(3 rows)
lei=# drop table test;
DROP TABLE
lei=# create table tt(id int);
CREATE TABLE
Switch WAL segments manually a few times
lei=# select pg_switch_wal();
pg_switch_wal
---------------
0/19019058
(1 row)
lei=# select pg_switch_wal();
pg_switch_wal
---------------
0/1A000078
(1 row)
lei=# select pg_switch_wal();
pg_switch_wal
---------------
0/1B000000
(1 row)
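Recover the original primary
Use pg_rewind to resynchronize the original primary, which becomes the new standby, with the new primary. pg_rewind requires that the target cluster was shut down cleanly and that it has wal_log_hints = on or data checksums enabled.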
[postgres@master ~]$ pg_rewind --target-pgdata /var/lib/pgsql/11/data/ --source-server='host=slave port=5432 user=postgres dbname=postgres' -P
connected to server
servers diverged at WAL location 0/19000098 on timeline 3
rewinding from last common checkpoint at 0/19000028 on timeline 3
reading source file list
reading target file list
reading WAL in target
need to copy 133 MB (total source directory size is 165 MB)
136230/136230 kB (100%) copied
creating backup label and updating control file
syncing target data directory
Done!
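The preconditions for a pg_rewind run like the one above can be checked on the target beforehand by reading its control file. This is a sketch under assumptions: the function name rewind_ready is made up for illustration, and the parsing relies on the English field labels printed by pg_controldata, which is why LC_ALL=C is forced.

```shell
# Succeed if the data directory in $1 looks usable as a pg_rewind target:
# the cluster was shut down cleanly, and either wal_log_hints is on or
# data checksums are enabled.
rewind_ready() {
    info="$(LC_ALL=C pg_controldata "$1")" || return 1
    # A stopped primary reports "shut down"; a stopped standby "shut down in recovery".
    echo "$info" | grep -Eq '^Database cluster state: *shut down( in recovery)?$' || return 1
    echo "$info" | grep -Eq '^wal_log_hints setting: *on$' && return 0
    # Checksum version 0 means data checksums are disabled.
    echo "$info" | grep -Eq '^Data page checksum version: *[1-9]' && return 0
    return 1
}

# Usage (on the old primary, with the server stopped):
#   rewind_ready /var/lib/pgsql/11/data && echo "ok to run pg_rewind"
```

pg_rewind also accepts --dry-run (-n), which goes through the same steps without modifying the target directory.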
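Edit recovery.conf
pg_rewind copied the configuration files over from the new primary, including its recovery.done. Rename that file to recovery.conf and change primary_conninfo so that it points at the new primary.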
[postgres@master ~]$ mv /var/lib/pgsql/11/data/recovery.done /var/lib/pgsql/11/data/recovery.conf
[postgres@master ~]$ vi /var/lib/pgsql/11/data/recovery.conf
primary_conninfo = 'host=slave port=5432 user=replica password=replica'
Start the new standby
[root@master data]# systemctl start postgresql-11
Verify that the data has been synchronized
On the new standby, table test is gone and table tt has appeared.
postgres=# \c lei;
You are now connected to database "lei" as user "postgres".
lei=# \dt
List of relations
Schema | Name | Type | Owner
--------+------+-------+----------
public | lei | table | postgres
public | t | table | postgres
public | tt | table | postgres
(3 rows)
Check the replication status on the new primary
lei=# \x
Expanded display is on.
lei=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid | 8625
usesysid | 16774
usename | repuser
application_name | walreceiver
client_addr | 192.168.20.133
client_hostname | master
client_port | 55306
backend_start | 2019-05-30 03:26:14.645623-04
backend_xmin |
state | streaming
sent_lsn | 0/1E0000D0
write_lsn | 0/1E0000D0
flush_lsn | 0/1E0000D0
replay_lsn | 0/1E0000D0
write_lag | 00:00:00.001552
flush_lag | 00:00:00.002167
replay_lag | 00:00:00.002169
sync_priority | 0
sync_state | async
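The check above can be scripted by reading the state column of pg_stat_replication for a given standby address. In this sketch the function name replication_state is an assumption; the query is fed on standard input rather than via -c because psql does not interpolate variables such as :'addr' in -c commands.

```shell
# Print the replication state (e.g. "streaming") of the standby whose
# client address is $1. Remaining arguments are passed to psql.
replication_state() {
    addr="$1"
    shift
    # :'addr' is expanded by psql into a properly quoted SQL literal.
    printf '%s\n' "select state from pg_stat_replication where client_addr = :'addr';" |
        psql -At -v addr="$addr" "$@"
}

# Usage (against the new primary):
#   replication_state 192.168.20.133 -h slave -U postgres -d postgres
```

An empty result means no walsender is serving that address, which usually points back to primary_conninfo or pg_hba.conf.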
If anything goes wrong, check the database logs to locate the problem. It usually lies in one of these configuration files:
- pg_hba.conf
- postgresql.conf
- recovery.conf
This completes the PostgreSQL primary/standby switchover.