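Case description:
This case reproduces a user dropping a table by mistake (drop table), parses the WAL to find when the erroneous operation happened, and then performs point-in-time recovery (PITR).
Applicable version:
KingbaseES V8R6
I. Simulate operations in the business environment
1. View the current objects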
prod=# \d
List of relations
Schema | Name | Type | Owner
--------+---------------------+-------+--------
public | dual | view | system
public | sys_stat_statements | view | system
public | t_centerprises | table | system
public | t1 | table | system
public | TB1 | table | system
public | tb2 | table | system
public | tbl_test | table | system
(7 rows)
2. Run transactions
prod=# select count(*) from t1;
count
-------
10000
(1 row)
prod=# insert into t1 values(generate_series(1001,2000),'usr'||generate_series(1001,2000));
INSERT 0 1000
prod=# select count(*) from t1;
count
-------
11000
(1 row)
# LSN of the current WAL position
prod=# select sys_current_wal_lsn();
sys_current_wal_lsn
---------------------
0/3D000058
(1 row)
prod=# insert into t1 values(generate_series(11001,20000),'usr'||generate_series(11001,20000));
INSERT 0 9000
prod=# select count(*) from t1;
count
-------
20000
(1 row)
# Check the database's current WAL file
prod=# select pg_current_wal_lsn(),pg_walfile_name(pg_current_wal_lsn()),pg_walfile_name_offset(pg_current_wal_lsn());
pg_current_wal_lsn | pg_walfile_name | pg_walfile_name_offset
--------------------+--------------------------+-----------------------------------
0/3D0A5090 | 00000001000000000000003D | (00000001000000000000003D,675984)
(1 row)
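The file name and offset reported above follow directly from the LSN. As a minimal sketch, assuming the default 16 MB WAL segment size and the PostgreSQL-style naming scheme (timeline, then the high and low parts of the segment number, each as 8 hex digits), the mapping can be reproduced in the shell. The function name `wal_file_for_lsn` is hypothetical and not part of KingbaseES.

```shell
# Print "<wal file name> <byte offset>" for a timeline ID and an LSN such as 0/3D0A5090.
# Assumption: 16 MB segments (the default); adjust seg_size if the cluster differs.
wal_file_for_lsn() {
  tli=$1
  hi=${2%%/*}                                  # part before the slash (high 32 bits)
  lo=${2##*/}                                  # part after the slash (low 32 bits)
  seg_size=$((16 * 1024 * 1024))
  pos=$(( (0x$hi << 32) + 0x$lo ))             # absolute byte position in the WAL stream
  segno=$(( pos / seg_size ))                  # running segment number
  segs_per_id=$(( 0x100000000 / seg_size ))    # segments per 4 GB "log id"
  printf '%08X%08X%08X %d\n' "$tli" \
    "$(( segno / segs_per_id ))" "$(( segno % segs_per_id ))" "$(( pos % seg_size ))"
}

wal_file_for_lsn 1 0/3D0A5090
```

For timeline 1 and LSN 0/3D0A5090 this prints `00000001000000000000003D 675984`, matching the query result above.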
# The user drops the table by mistake
prod=# drop table t1;
DROP TABLE
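II. Check the database's current physical backups
As shown in the figure below, this physical backup was taken before the erroneous operation.
III. Parse the WAL with sys_waldump (to find the exact time of the erroneous operation)
Tips:
To run PITR after a user error, you need the exact time of the erroneous operation. In production, start from the approximate time of the error and parse the archived and online WAL files around it. To keep things simple, this case parses only one WAL file.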
[kingbase@node102 sys_wal]$ /opt/Kingbase/ES/V8R6_C6/Server/bin/sys_waldump 00000001000000000000003D -s '0/3D000058'
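When several files around the approximate time have to be parsed, as the tip above suggests for production, the candidate segments can be picked by name first. The following is only a sketch: the function name `wal_segments_between` is hypothetical, and it assumes all segments are on the same timeline, so that their 24-character hex names sort in WAL order.

```shell
# List WAL segment files in directory $1 whose names fall between $2 and $3 (inclusive).
# Non-segment entries (history files, archive_status, ...) are skipped.
wal_segments_between() {
  for f in "$1"/*; do
    name=${f##*/}
    [ "${#name}" -eq 24 ] || continue              # segment names are 24 characters long
    case $name in *[!0-9A-F]*) continue ;; esac    # and consist of hex digits only
    if [ "$name" \< "$2" ] || [ "$name" \> "$3" ]; then continue; fi
    printf '%s\n' "$name"
  done
}

# Example (run from the sys_wal directory; the segment range is illustrative):
#   for seg in $(wal_segments_between . 00000001000000000000003C 00000001000000000000003E); do
#     /opt/Kingbase/ES/V8R6_C6/Server/bin/sys_waldump "$seg" | grep COMMIT
#   done
```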
1. WAL records for the insert
prod=# select oid,relname from sys_class where oid=16500;
oid | relname
-------+---------
16500 | t1
(1 row)
2. WAL records for updates to the pg_statistic system table
prod=# select oid,relname from sys_class where oid=2696 or oid=2619;
oid | relname
------+----------------------------------
2619 | pg_statistic
2696 | pg_statistic_relid_att_inh_index
(2 rows)
3. WAL records for drop table
# drop table removes the object's entries from the system catalogs
prod=# select oid,relname from sys_class where oid=2608;
oid | relname
------+-----------
2608 | pg_depend
(1 row)
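As shown in the figure above, the WAL contains the commit time of the 'drop table' transaction: "COMMIT 2022-09-27 11:09:45.005736 CST". This commit time is the reference point for PITR: recovering to a moment just before it (the restore below uses 2022-09-27 11:09:45) brings back the table the user dropped by mistake.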
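Scanning the dump by eye works for one file, but the commit records can also be filtered out mechanically. The sketch below assumes that sys_waldump prints one record per line in the same layout as PostgreSQL's pg_waldump (`... lsn: X/Y, ... desc: COMMIT <date> <time> <tz>`); the COMMIT text matches what is quoted above, while the rest of the layout is an assumption. The function name `commit_times` is hypothetical.

```shell
# Read waldump-style text on stdin and print "<lsn> <commit date> <commit time>"
# for every transaction COMMIT record.
commit_times() {
  awk '/desc: COMMIT / {
    lsn = ""; ts = ""
    for (i = 1; i <= NF; i++) {
      if ($i == "lsn:")   { lsn = $(i + 1); sub(/,$/, "", lsn) }   # strip trailing comma
      if ($i == "COMMIT") { ts = $(i + 1) " " $(i + 2) }           # date and time fields
    }
    print lsn, ts
  }'
}

# Example, using the same invocation as in this section:
#   /opt/Kingbase/ES/V8R6_C6/Server/bin/sys_waldump 00000001000000000000003D -s '0/3D000058' | commit_times
```

The last commit listed before the catalog cleanup records is the candidate for the drop; it still needs to be confirmed against the relation OIDs as done above.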
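IV. Perform point-in-time recovery (PITR)
Tips:
In production you can restore elsewhere: run PITR on a separate instance, then import the recovered data into the production instance. This case restores onto the same instance, so the database service has to be shut down first.
1. Stop the database service and back up the data directory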
[kingbase@node102 bin]$ ./sys_ctl stop -D /data/kingbase/v8r6_c6/data
[kingbase@node102 v8r6_c6]$ mv data data.bk
2. Run the PITR restore
[kingbase@node102 bin]$ /opt/Kingbase/ES/V8R6_C6/Server/bin/sys_rman --config=/home/kingbase/kbbr7_repo/sys_rman.conf --stanza=king --type=time --target='2022-09-27 11:09:45' restore
.......
2022-09-27 13:34:05.234 P00 INFO: Restore Process: FILE: 2605 / 2605 100% SZIE: 105160230 bytes / 105160230 bytes 100.3MB / 100.3MB 100%
2022-09-27 13:34:05.239 P00 INFO: write updated /data/kingbase/v8r6_c6/data/kingbase.auto.conf
2022-09-27 13:34:05.244 P00 INFO: restore global/sys_control (performed last to ensure aborted restores cannot be started)
2022-09-27 13:34:05.245 P00 INFO: restore size = 100.3MB, file total = 2605
2022-09-27 13:34:05.246 P00 INFO: restore command end: completed successfully (2308ms)
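The --target value in the restore above is the commit time of the drop with its fractional seconds removed, which places the target just before the commit. A small helper can derive such a value and refuse to do so when truncation would not move the target earlier. This is a sketch under the assumption that recovery stops at or before the target time, as with PostgreSQL's recovery_target_time; the function name `target_before_commit` is hypothetical. Any transaction committed within the dropped fraction of a second is not recovered.

```shell
# Print the commit timestamp without fractional seconds (and without time zone).
# Returns 1 when there is no non-zero fraction to drop, because the result
# would then not be earlier than the commit time itself.
target_before_commit() {
  case $1 in
    *.*) ;;
    *) echo "no fractional seconds: choose an earlier target manually" >&2; return 1 ;;
  esac
  whole=${1%%.*}          # "2022-09-27 11:09:45"
  frac=${1#*.}            # "005736 CST" or "005736"
  frac=${frac%% *}        # drop a trailing time zone, if any
  case $frac in
    *[!0]*) printf '%s\n' "$whole" ;;
    *) echo "fraction is zero: choose an earlier target manually" >&2; return 1 ;;
  esac
}

target_before_commit '2022-09-27 11:09:45.005736 CST'
```

With the commit time found in the WAL, this prints `2022-09-27 11:09:45`, the value passed to --target above.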
3. Check the restored data
1) Check the kingbase.auto.conf settings
[kingbase@node102 data]$ cat kingbase.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
# Recovery settings generated by sys_rman restore on 2022-09-27 13:39:58
restore_command = '/opt/Kingbase/ES/V8R6_C6/KESRealPro/V008R006C006B0013/Server/bin/sys_rman --config=/home/kingbase/kbbr7_repo/sys_rman.conf --stanza=king archive-get %f "%p"'
recovery_target_time = '2022-09-27 11:09:45'
2) Start the database service
[kingbase@node102 bin]$ ./sys_ctl start -D /data/kingbase/v8r6_c6/data
3) Check the sys_log log
4) Check the restored data
5) Clean up the kingbase.auto.conf file, then restart the database
[kingbase@node102 bin]$ ./sys_ctl restart -D /data/kingbase/v8r6_c6/data
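The article does not spell out which lines to clean up. One plausible reading, stated here as an assumption, is to remove the recovery settings that sys_rman wrote (restore_command and the recovery_target_* entries shown earlier) so that they do not affect later restarts. The sketch below prints the cleaned content to standard output and leaves the original file untouched; the function name `strip_recovery_settings` is hypothetical.

```shell
# Print file $1 without restore_command and recovery_target_* assignments.
# Assumption: these are the only recovery-related lines that need removing.
strip_recovery_settings() {
  sed -E '/^[[:space:]]*(restore_command|recovery_target[a-z_]*)[[:space:]]*=/d' "$1"
}

# Example: review the result before replacing the live file.
#   cd /data/kingbase/v8r6_c6/data
#   strip_recovery_settings kingbase.auto.conf > kingbase.auto.conf.new
#   diff kingbase.auto.conf kingbase.auto.conf.new
```

Writing to a separate file first makes it easy to check the difference before the restart.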
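V. Summary
For a database running in archive mode, as long as the backup and the WAL (archived and online) are complete, point-in-time recovery can restore data lost to a user error. The exact time of the erroneous transaction can be obtained by parsing the WAL.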