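Case description:
After a physical backup has been taken with sys_rman in the production environment, a test environment needs to be built on a different host. This case walks through the full procedure for restoring a physical backup onto another machine.
Applicable version:
KingbaseES V8R3
Node information: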
[kingbase@node102 bin]$ cat /etc/hosts
......
192.168.1.101 node101 # production node
192.168.1.102 node102 # test node
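I. Run a sys_rman physical backup on the production database
1. Relevant configuration parameters in the production environment
# Archiving is enabled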
test=# show archive_mode ;
archive_mode
--------------
on
(1 row)
# Archive file storage path
test=# show archive_dest ;
archive_dest
--------------------------
/data/kingbase/arch/c290
(1 row)
# Archive command
test=# show archive_command ;
archive_command
----------------------------------------------------------------------------
test ! -f /data/kingbase/arch/c290/%f && cp %p /data/kingbase/arch/c290/%f
(1 row)
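The archive_command above can be read as "copy the segment only if it is not already in the archive". The sketch below emulates that logic in temporary directories so the exit codes can be observed without a running server. The directory variables, the function name archive_segment, and the dummy segment file are placeholders, not part of the original setup.

```shell
#!/usr/bin/env bash
# Sketch: emulate the archive_command logic for a single WAL segment.
# All paths here are temporary stand-ins for the real directories.
set -u

arch_dir=$(mktemp -d)   # stands in for /data/kingbase/arch/c290
xlog_dir=$(mktemp -d)   # stands in for the server's WAL directory
seg=000000010000000000000009
echo "dummy wal contents" > "$xlog_dir/$seg"

# Same shape as: test ! -f <arch>/%f && cp %p <arch>/%f
archive_segment() {
    local src=$1 name=$2
    test ! -f "$arch_dir/$name" && cp "$src" "$arch_dir/$name"
}

archive_segment "$xlog_dir/$seg" "$seg"; first_rc=$?
archive_segment "$xlog_dir/$seg" "$seg"; second_rc=$?

echo "first attempt exit code:  $first_rc"
echo "second attempt exit code: $second_rc"
```

The second attempt exits non-zero because the file already exists, so an archived segment is never overwritten. In PostgreSQL, from which this parameter appears to derive, the server treats a non-zero exit as a failed archive attempt and retries later; it is assumed here that KingbaseES V8R3 behaves the same way.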
# WAL level
test=# show wal_level ;
wal_level
-----------
replica
(1 row)
2. Run the sys_rman physical backup
1) Initialize the backup directory
[kingbase@node101 ~]$ mkdir -p /data/kingbase/bk/c290
[kingbase@node101 bin]$ ./sys_rman -U system -W 123456 -d test -D /opt/Kingbase/ES/C290/data -B /data/kingbase/bk/c290/ init
2) Run a full database backup
[kingbase@node101 bin]$ ./sys_rman -U system -W 123456 -d test -D /opt/Kingbase/ES/C290/data -B /data/kingbase/bk/c290/ -b full backup
INFO: validate: RTATKP backup and archive log files by CRC
[kingbase@node101 bin]$ ./sys_rman -U system -W 123456 -d test -D /opt/Kingbase/ES/C290/data -B /data/kingbase/bk/c290/ validate
INFO: validate: RTATKP backup and archive log files by CRC
INFO: backup validation completed successfully
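Every sys_rman call in this case repeats the same connection and path options. A small shell function, sketched below, could keep them in one place. The function name rman and the SYS_RMAN variable are hypothetical helpers; the option values are simply the ones used above.

```shell
#!/usr/bin/env bash
# Sketch: wrap the options that every sys_rman call above has in common.
# SYS_RMAN and rman are hypothetical names introduced for illustration.
SYS_RMAN=${SYS_RMAN:-./sys_rman}

rman() {
    # Fixed options first, then whatever the caller passes (e.g. "-b full backup").
    "$SYS_RMAN" -U system -W 123456 -d test \
        -D /opt/Kingbase/ES/C290/data \
        -B /data/kingbase/bk/c290/ "$@"
}

# Intended usage from the bin directory (not executed here):
#   rman init
#   rman -b full backup
#   rman validate
#   rman show
```

Keeping the password on the command line mirrors the original commands; in a shared environment it would be visible in the process list and shell history.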
3) Run an incremental backup
# Generate some changes
prod=# create table t2 as select * from t1;
SELECT 10000
prod=# select count(*) from t2;
count
-------
10000
(1 row)
# Switch the WAL segment and create a checkpoint (this shortens the recovery phase at restore time).
prod=# select sys_switch_xlog();
sys_switch_xlog
-----------------
0/70000A0
(1 row)
prod=# checkpoint;
CHECKPOINT
# Run the incremental (page) backup
[kingbase@node101 bin]$ ./sys_rman -U system -W 123456 -d test -D /opt/Kingbase/ES/C290/data -B /data/kingbase/bk/c290/ -b page backup
INFO: validate: RTATU1 backup and archive log files by CRC
4) View backup information
[kingbase@node101 bin]$ ./sys_rman -U system -W 123456 -d test -D /opt/Kingbase/ES/C290/data -B /data/kingbase/bk/c290/ show
==========================================================================================================
ID      Recovery time        Mode  Current/Parent TLI  Time  Data   start_lsn  stop_lsn   Status
==========================================================================================================
RTATU1  2023-04-18 14:54:03  PAGE  1 / 0               2s    628kB  0/9000028  0/A000078  OK
RTATKP  2023-04-18 14:48:27  FULL  1 / 0               2s    80MB   0/3000028  0/3000130  OK
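Before copying the catalog to another host, it can help to confirm that every backup in the listing has status OK. The sketch below parses a listing of the shape shown above with awk. The function names are hypothetical, and the sample text is copied from this case; in practice the output of the real show command would be piped in instead.

```shell
#!/usr/bin/env bash
# Sketch: inspect a sys_rman "show" listing (fields separated by whitespace).
# Data rows are recognized by a date in the second field; the status is the last field.
set -u

show_output='ID Recovery time Mode Current/Parent TLI Time Data start_lsn stop_lsn Status
RTATU1 2023-04-18 14:54:03 PAGE 1 / 0 2s 628kB 0/9000028 0/A000078 OK
RTATKP 2023-04-18 14:48:27 FULL 1 / 0 2s 80MB 0/3000028 0/3000130 OK'

# Print "ID MODE STATUS" for each backup row.
list_backups() {
    awk '$2 ~ /^[0-9]+-[0-9]+-[0-9]+$/ { print $1, $4, $NF }'
}

# Print how many backup rows have a status other than OK.
count_not_ok() {
    awk '$2 ~ /^[0-9]+-[0-9]+-[0-9]+$/ && $NF != "OK" { bad++ } END { print bad + 0 }'
}

printf '%s\n' "$show_output" | list_backups
echo "backups not OK: $(printf '%s\n' "$show_output" | count_not_ok)"
```

Matching on the date field rather than on fixed column positions keeps the header and the separator lines out of the count.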
5) View the backup files
[kingbase@node101 c290]$ ls -lh
total 4.0K
drwx------ 4 kingbase kingbase 32 Apr 18 14:54 backups
-rw-r--r-- 1 kingbase kingbase 41 Apr 18 14:47 sys_rman.conf
lrwxrwxrwx 1 kingbase kingbase 25 Apr 18 15:40 wal -> /data/kingbase/arch/c290/
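II. Restore on a different host with sys_rman
Tips:
Restoring from a physical backup generally takes two steps:
restore: copy the backed-up data files back into the data directory.
recovery: after the instance starts, replay the xlog (WAL) from the most recent checkpoint until the database reaches a consistent state, then open the database.
1. Prepare the database environment
1) Install the same database version on the test host as on the production host.
2) Create the same backup storage path and xlog archive path as on production.
3) Configure archiving and WAL the same way as on the production database.
2. Copy the production backup to the test host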
[kingbase@node101 c290]$ scp -r * node102:/data/kingbase/bk/c290/
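After the copy, the two trees can be compared by checksum before any restore is attempted. The sketch below computes one SHA-256 digest per directory tree; the function name tree_digest and the demo directories are hypothetical. It assumes GNU find, sort, xargs and sha256sum are available. On real hosts the function would be run once on node101 and once on node102 and the two digests compared.

```shell
#!/usr/bin/env bash
# Sketch: compare two directory trees by a digest over all per-file checksums.
set -u

tree_digest() {
    # -L follows symlinks, so a symlinked directory on the source and the
    # plain directory that "scp -r" creates on the target are treated alike.
    ( cd "$1" && find -L . -type f -print0 | LC_ALL=C sort -z \
        | xargs -0 -r sha256sum | sha256sum | cut -d' ' -f1 )
}

# Demo with temporary directories standing in for the two backup catalogs.
src=$(mktemp -d); dst=$(mktemp -d)
mkdir -p "$src/backups/demo"
echo "page data" > "$src/backups/demo/file1"
echo "sample conf" > "$src/sys_rman.conf"
cp -r "$src/." "$dst/"

src_digest=$(tree_digest "$src")
dst_digest=$(tree_digest "$dst")
if [ "$src_digest" = "$dst_digest" ]; then
    echo "copy verified"
else
    echo "copy differs"
fi
```

File paths are part of the sha256sum output, so a missing or renamed file changes the digest as well as a corrupted one.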
3. Run the sys_rman restore
1) Restore the backup into the data directory
# Back up the test database's existing data directory
[kingbase@node102 c290]$ cd /opt/Kingbase/ES/C290/
[kingbase@node102 C290]$ mv data data.bk
# Create the data directory and set its permissions
[kingbase@node102 bin]$ mkdir -p /opt/Kingbase/ES/C290/data
[kingbase@node102 bin]$ chmod 700 /opt/Kingbase/ES/C290/data
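The three commands above (move the old data directory aside, recreate it, restrict its mode) can be wrapped in a function that refuses to overwrite an earlier data.bk. This is only a sketch: the function name prepare_data_dir is hypothetical, the demo runs in a temporary directory rather than under /opt/Kingbase/ES/C290, and it assumes the test instance is stopped before its data directory is moved.

```shell
#!/usr/bin/env bash
# Sketch: set aside an existing data directory and create an empty one with mode 700.
set -u

prepare_data_dir() {
    local base=$1
    if [ -e "$base/data.bk" ]; then
        echo "refusing: $base/data.bk already exists" >&2
        return 1
    fi
    if [ -d "$base/data" ]; then
        mv "$base/data" "$base/data.bk"
    fi
    mkdir -p "$base/data" && chmod 700 "$base/data"
}

# Demo in a temporary directory that stands in for the installation path.
base=$(mktemp -d)
mkdir "$base/data"
touch "$base/data/old_file"

prepare_data_dir "$base"
echo "mode of new data directory: $(stat -c '%a' "$base/data")"
```

A second call with the same base directory returns non-zero instead of clobbering the saved copy.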
# View the backup information on the test host
[kingbase@node102 bin]$ ./sys_rman -U system -W 123456 -d test -D /opt/Kingbase/ES/C290/data -B /data/kingbase/bk/c290/ show
==========================================================================================================
ID      Recovery time        Mode  Current/Parent TLI  Time  Data   start_lsn  stop_lsn   Status
==========================================================================================================
RTATU1  2023-04-18 14:54:03  PAGE  1 / 0               2s    628kB  0/9000028  0/A000078  OK
RTATKP  2023-04-18 14:48:27  FULL  1 / 0               2s    80MB   0/3000028  0/3000130  OK
# Run sys_rman restore
[kingbase@node102 bin]$ ./sys_rman -U system -W 123456 -d test -D /opt/Kingbase/ES/C290/data -B /data/kingbase/bk/c290/ restore
INFO: validate: RTATKP backup and archive log files by SIZE
INFO: validate: RTATU1 backup and archive log files by SIZE
INFO: restore complete. Recovery starts automatically when the Kingbase server is started.
The figure below shows the restore being run:
2) Start the test instance to perform recovery
# Start the database instance
[kingbase@node102 bin]$ ./sys_ctl start -D ../../data
server starting
.......
# Check the sys_log log
[kingbase@node102 sys_log]$ tail -1000 kingbase-2023-04-18_154713.log
2023-04-18 15:47:13 CST LOG: database system was interrupted; last known up at 2023-04-18 14:54:01 CST
2023-04-18 15:47:13 CST LOG: creating missing WAL directory "sys_xlog/archive_status"
2023-04-18 15:47:13 CST LOG: starting archive recovery
2023-04-18 15:47:13 CST LOG: restored log file "000000010000000000000009" from archive
2023-04-18 15:47:13 CST LOG: redo starts at 0/9000028
2023-04-18 15:47:13 CST LOG: redo wal segment count 1
2023-04-18 15:47:13 CST LOG: restored log file "00000001000000000000000A" from archive
2023-04-18 15:47:13 CST LOG: consistent recovery state reached at 0/A000078
2023-04-18 15:47:13 CST LOG: restored log file "00000001000000000000000B" from archive
2023-04-18 15:47:13 CST LOG: restored log file "00000001000000000000000C" from archive
cp: cannot stat ‘/data/kingbase/bk/c290//wal/00000001000000000000000D’: No such file or directory
2023-04-18 15:47:13 CST LOG: complete: 1/1
2023-04-18 15:47:13 CST LOG: redo done at 0/C0000D0
2023-04-18 15:47:13 CST LOG: last completed transaction was at log time 2023-04-18 14:54:03.704661+08
2023-04-18 15:47:13 CST LOG: restored log file "00000001000000000000000C" from archive
cp: cannot stat ‘/data/kingbase/bk/c290//wal/00000002.history’: No such file or directory
2023-04-18 15:47:13 CST LOG: selected new timeline ID: 2
2023-04-18 15:47:13 CST LOG: archive recovery complete
cp: cannot stat ‘/data/kingbase/bk/c290//wal/00000001.history’: No such file or directory
2023-04-18 15:47:13 CST LOG: MultiXact member wraparound protections are now enabled
2023-04-18 15:47:13 CST LOG: autovacuum launcher started
2023-04-18 15:47:13 CST LOG: database system is ready to accept connections
2023-04-18 15:47:13 CST LOG: starting syslogical supervisor
2023-04-18 15:47:13 CST LOG: starting syslogical database manager for database TEST
2023-04-18 15:47:13 CST LOG: manager worker [11755] at slot 0 generation 1 detaching cleanly
2023-04-18 15:47:13 CST LOG: starting syslogical database manager for database TEMPLATE1
2023-04-18 15:47:13 CST LOG: manager worker [11757] at slot 0 generation 2 detaching cleanly
2023-04-18 15:47:13 CST LOG: starting syslogical database manager for database TEMPLATE2
2023-04-18 15:47:13 CST LOG: manager worker [11758] at slot 0 generation 3 detaching cleanly
2023-04-18 15:47:13 CST LOG: starting syslogical database manager for database SAMPLES
2023-04-18 15:47:13 CST LOG: manager worker [11759] at slot 0 generation 4 detaching cleanly
2023-04-18 15:47:13 CST LOG: starting syslogical database manager for database SECURITY
2023-04-18 15:47:13 CST LOG: manager worker [11760] at slot 0 generation 5 detaching cleanly
2023-04-18 15:47:13 CST LOG: starting syslogical database manager for database prod
2023-04-18 15:47:13 CST LOG: manager worker [11761] at slot 0 generation 6 detaching cleanly
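In the log above, the "cp: cannot stat" lines come from recovery asking for a segment or history file that was never archived, which is how archive recovery finds the end of the available WAL. The milestones that matter are "archive recovery complete" and "database system is ready to accept connections". The sketch below checks a log file for them; the function name recovery_finished is hypothetical, and the sample log is a subset of the lines shown above written to a temporary file.

```shell
#!/usr/bin/env bash
# Sketch: check a server log for the messages that mark a finished recovery.
set -u

recovery_finished() {
    local log=$1
    grep -q 'archive recovery complete' "$log" &&
        grep -q 'database system is ready to accept connections' "$log"
}

# Sample lines taken from the log above.
log=$(mktemp)
cat > "$log" <<'EOF'
2023-04-18 15:47:13 CST LOG: starting archive recovery
2023-04-18 15:47:13 CST LOG: redo starts at 0/9000028
2023-04-18 15:47:13 CST LOG: consistent recovery state reached at 0/A000078
2023-04-18 15:47:13 CST LOG: redo done at 0/C0000D0
2023-04-18 15:47:13 CST LOG: archive recovery complete
2023-04-18 15:47:13 CST LOG: database system is ready to accept connections
EOF

# Extract the LSN at which redo stopped.
redo_done=$(sed -n 's/.*redo done at \([0-9A-F/]*\).*/\1/p' "$log")

if recovery_finished "$log"; then
    echo "recovery finished, redo done at $redo_done"
else
    echo "recovery not finished yet"
fi
```

Against a real instance the function would be pointed at the newest file in the sys_log directory instead of the temporary file.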
The figure below shows the database performing recovery:
III. Connect to the test database
[kingbase@node102 bin]$ ./ksql -U system -W 123456 test
ksql (V008R003C002B0290)
Type "help" for help.
test=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+--------+----------+-------------+-------------+--------------------
prod | SYSTEM | UTF8 | zh_CN.UTF-8 | zh_CN.UTF-8 |
SAMPLES | SYSTEM | UTF8 | zh_CN.UTF-8 | zh_CN.UTF-8 |
SECURITY | SYSTEM | UTF8 | zh_CN.UTF-8 | zh_CN.UTF-8 |
TEMPLATE0 | SYSTEM | UTF8 | zh_CN.UTF-8 | zh_CN.UTF-8 | =c/SYSTEM +
| | | | | SYSTEM=CTcb/SYSTEM
TEMPLATE1 | SYSTEM | UTF8 | zh_CN.UTF-8 | zh_CN.UTF-8 | =c/SYSTEM +
| | | | | SYSTEM=CTcb/SYSTEM
TEMPLATE2 | SYSTEM | UTF8 | zh_CN.UTF-8 | zh_CN.UTF-8 | =Tc/SYSTEM +
| | | | | SYSTEM=CTcb/SYSTEM
TEST | SYSTEM | UTF8 | zh_CN.UTF-8 | zh_CN.UTF-8 |
(7 rows)
test=# \c prod
You are now connected to database "prod" as user "system".
prod=# \d
List of relations
Schema | Name | Type | Owner
--------+-------------------------------+-------+--------
PUBLIC | pathman_cache_stats | view | SYSTEM
PUBLIC | pathman_concurrent_part_tasks | view | SYSTEM
PUBLIC | pathman_config | table | SYSTEM
PUBLIC | pathman_config_params | table | SYSTEM
PUBLIC | pathman_partition_list | view | SYSTEM
PUBLIC | t1 | table | SYSTEM
PUBLIC | t2 | table | SYSTEM
(7 rows)
prod=# select count(*) from t1;
count
-------
10000
(1 row)
prod=# select count(*) from t2;
count
-------
10000
(1 row)
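--- As shown above, the test database has been recovered to the most recent backup point.
IV. Summary
sys_rman physical backups can be restored on a different host, and the procedure is fairly simple. Alternatively, the production backup directory can be exported over NFS and mounted on the test host, so the backup does not have to be copied from the production host to the test host.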
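For the NFS variant, the mount commands on the test host might look like the ones printed by the sketch below. It only prints the commands and does not run them, since mounting needs root and a configured NFS export on node101 (the export configuration itself is not shown and depends on the environment). The function name nfs_mount_cmd is hypothetical. Because the wal entry in the backup directory is a symlink to the absolute path /data/kingbase/arch/c290/, it would presumably resolve on the client side, so the sketch assumes the archive directory is shared at the same path as well.

```shell
#!/usr/bin/env bash
# Sketch: print (not execute) NFS mount commands for the backup and archive directories.
set -u

prod_host=node101
backup_dir=/data/kingbase/bk/c290
arch_dir=/data/kingbase/arch/c290

nfs_mount_cmd() {
    # $1 = NFS server host, $2 = directory exported by the server,
    # mounted at the same path on the client.
    printf 'mount -t nfs %s:%s %s\n' "$1" "$2" "$2"
}

echo "# to be run as root on the test host once $prod_host exports the directories:"
nfs_mount_cmd "$prod_host" "$backup_dir"
nfs_mount_cmd "$prod_host" "$arch_dir"
```

Mounting at the same paths as on production keeps the -B option and the wal symlink valid without further changes.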