案例说明:
在主备流复制集群运维过程中,对于同步复制节点,在读写分离环境应用需要强一致性访问时,需要监控主备之间流复制的延迟。
适用版本:
KingbaseES V8R6
一、主库查询流复制延迟
1、 通过sys_stat_replication
获取sent_lsn、write_lsn、flush_lsn、replay_lsn获取备库延迟信息
1)执行语句
select usename,application_name,client_addr,state,
pg_wal_lsn_diff(pg_current_wal_insert_lsn(), sent_lsn) sent_delay,
pg_wal_lsn_diff(pg_current_wal_insert_lsn(), write_lsn) write_delay,
pg_wal_lsn_diff(pg_current_wal_insert_lsn(), flush_lsn) flush_delay,
pg_wal_lsn_diff(pg_current_wal_insert_lsn(), replay_lsn) replay_delay
from pg_stat_replication;
2)执行结果
prod=# select usename,application_name,client_addr,state,
prod-# pg_wal_lsn_diff(pg_current_wal_insert_lsn(), sent_lsn) sent_delay,
prod-# pg_wal_lsn_diff(pg_current_wal_insert_lsn(), write_lsn) write_delay,
prod-# pg_wal_lsn_diff(pg_current_wal_insert_lsn(), flush_lsn) flush_delay,
prod-# pg_wal_lsn_diff(pg_current_wal_insert_lsn(), replay_lsn) replay_delay
prod-# from pg_stat_replication;
usename | application_name | client_addr | state | sent_delay | write_delay | flush_delay | replay_delay
---------+------------------+---------------+-----------+------------+-------------+-------------+--------------
esrep | node2 | 192.168.1.202 | streaming | 0 | 0 | 0 | 0
(1 row)
2、通过slot获取延迟信息
restart_lsn 最老的WAL日志的LSN 在复制槽中
数据库会把所有复制槽里restart_lsn最小的作为最老的LSN,这个LSN往后的所有较新的日志都会保留下来。
1)查询语句
select slot_name,plugin,slot_type,database,active,
xmin,catalog_xmin,
pg_wal_lsn_diff(pg_current_wal_insert_lsn(),restart_lsn) restart_delay
from pg_replication_slots;
2)查询结果
prod=# select slot_name,plugin,slot_type,database,active,
prod-# xmin,catalog_xmin,
prod-# pg_wal_lsn_diff(pg_current_wal_insert_lsn(),restart_lsn) restart_delay
prod-# from pg_replication_slots;
slot_name | plugin | slot_type | database | active | xmin | catalog_xmin | restart_delay
---------------+--------+-----------+----------+--------+------+--------------+---------------
repmgr_slot_2 | | physical | | t | 2130 | | 0
(1 row)
3)查询语句
select slot_name,plugin,slot_type,database,temporary,active,active_pid,
xmin,catalog_xmin,
pg_wal_lsn_diff(pg_current_wal_insert_lsn(),restart_lsn) restart_delay,
pg_wal_lsn_diff(pg_current_wal_insert_lsn(),confirmed_flush_lsn) flush_delay
from pg_replication_slots;
4)查询结果
prod=# select slot_name,plugin,slot_type,database,temporary,active,active_pid,
prod-# xmin,catalog_xmin,
prod-# pg_wal_lsn_diff(pg_current_wal_insert_lsn(),restart_lsn) restart_delay,
prod-# pg_wal_lsn_diff(pg_current_wal_insert_lsn(),confirmed_flush_lsn) flush_delay
prod-# from pg_replication_slots;
slot_name | plugin | slot_type | database | temporary | active | active_pid | xmin | catalog_xmin | restart_
delay | flush_delay
---------------+--------+-----------+----------+-----------+--------+------------+------+--------------+---------
------+-------------
repmgr_slot_2 | | physical | | f | t | 10895 | 2130 | |
0 |
(1 row)
二、从备库执行
1)查询语句
select pg_is_in_recovery(),pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(),
pg_last_xact_replay_timestamp();
2)查询结果
test=# select pg_is_in_recovery(),pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(),
test-# pg_last_xact_replay_timestamp();
pg_is_in_recovery | pg_last_wal_receive_lsn | pg_last_wal_replay_lsn | pg_last_xact_replay_timestamp
-------------------+-------------------------+------------------------+-------------------------------
t | 2/1D000B50 | 2/1D000B50 | 2023-10-10 11:50:16.738548+08
(1 row)
3)查询语句
select pg_is_in_recovery(),pg_is_wal_replay_paused(), pg_last_wal_receive_lsn(),
pg_last_wal_replay_lsn(),pg_last_xact_replay_timestamp();
4)查询结果
test=# select pg_is_in_recovery(),pg_is_wal_replay_paused(), pg_last_wal_receive_lsn(),
test-# pg_last_wal_replay_lsn(),pg_last_xact_replay_timestamp();
pg_is_in_recovery | pg_is_wal_replay_paused | pg_last_wal_receive_lsn | pg_last_wal_replay_lsn | pg_last_xact_re
play_timestamp
-------------------+-------------------------+-------------------------+------------------------+----------------
---------------
t | f | 2/1D000B50 | 2/1D000B50 | 2023-10-10 11:5
0:16.738548+08
(1 row)
5)查询语句
SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
THEN 0
ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
END AS log_delay;
6)查询结果
test=# SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
test-# THEN 0
test-# ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
test-# END AS log_delay;
log_delay
-----------
0
(1 row)
三、查看主备之间的延迟wal差异量
1)查看主库
prod=# SELECT pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
2/1D000B50
(1 row)
2)查看备库
select pg_is_in_recovery(),pg_is_wal_replay_paused(), pg_last_wal_receive_lsn(),
pg_last_wal_replay_lsn(), pg_last_xact_replay_timestamp();
test=# select pg_is_in_recovery(),pg_is_wal_replay_paused(), pg_last_wal_receive_lsn(),
test-# pg_last_wal_replay_lsn(), pg_last_xact_replay_timestamp();
pg_is_in_recovery | pg_is_wal_replay_paused | pg_last_wal_receive_lsn | pg_last_wal_replay_lsn | pg_last_xact_re
play_timestamp
-------------------+-------------------------+-------------------------+------------------------+----------------
---------------
t | f | 1/1D000B50 | 1/1D000B50 | 2023-10-10 11:5
0:16.738548+08
(1 row)
3)查看lsn差异函数
pg_wal_lsn_diff(lsn pg_lsn, lsn pg_lsn)
pg_xlog_location_diff (location pg_lsn, location pg_lsn)
4)获取wal日志差异
prod=# select pg_wal_lsn_diff('2/1D000B50','1/1D000B50');
pg_wal_lsn_diff
-----------------
4294967296
(1 row)
prod=# select round(4294967296/pow(1024,3.0),2) missing_lsn_GiB;
missing_lsn_GiB
-----------------
4.00
(1 row)
四、总结
可以通过脚本的方式监控主备流复制之间的延迟,第一时间获悉集群运行状态。