一、概述
1、基础信息
gs_probackup是一个用于管理openGauss数据库备份和恢复的工具。它对openGauss实例进行定期备份,以便在数据库出现故障时能够恢复服务器。
- 可用于备份单机数据库,也可对主机或者主节点数据库备机进行备份,为物理备份。
- 可备份外部目录的内容,如脚本文件、配置文件、日志文件、dump文件等。
- 支持增量备份、定期备份和远程备份。
- 可设置备份的留存策略。
2、前提条件
- 可以正常连接openGauss数据库。
- 若要使用PTRACK增量备份,需在postgresql.conf中手动添加参数“enable_cbm_tracking = on”。
- 为了防止xlog在传输结束前被清理,请适当调高postgresql.conf文件中wal_keep_segements的值。
3、限制说明
- 备份必须由运行数据库服务器的用户执行。
- 备份和恢复的数据库服务器的主版本号必须相同。
- 如果要通过ssh在远程模式下备份数据库,需要在本地和远程主机安装相同主版本的数据库,并通过ssh-copy-id remote_user@remote_host命令设置本地主机备份用户和远程主机数据库用户的无密码ssh连接。
- 远程模式下只能执行add-instance、backup、restore子命令。
- 使用restore子命令前,应先停止gaussdb进程。
- 当存在用户自定义表空间时,备份的时候要加上 --external-dirs 参数,否则,该表空间不会被备份。
- 当备份的规模比较大时,为了防止备份过程中timeout发生,请适当调整postgresql.conf文件的参数 session_timeout、wal_sender_timeout。并且在备份的命令行参数中适当调整参数--rw-timeout的值。
- 恢复时,使用-T选项把备份中的外部目录重定向到新目录时,请同时指定参数--external-mapping。
- 当使用远程备份时,请确保远程机器和备份机器的时钟同步,以防止使用--recovery-target-time恢复的场合,启动gaussdb时有可能会失败。
- 当远程备份有效时(remote-proto=ssh),请确保-h和--remote-host指定的是同一台机器。当远程备份无效时,如果指定了-h选项,请确保-h指定的是本机地址或本机主机名。
- 当前仅支持备份发布订阅的逻辑复制槽。
- 备份时,请确保服务器用户对备份的目录下所有文件有读写的权限,以防止在恢复时因权限不足的问题而失败。
- 在资源池化模式下当前仅支持本地主机全量备份和全量恢复。
- 备份将执行checkpoint与xlog switch操作,此行为将产生新的xlog,并提交事务。一主一备或一主多备场景备份时,若配置文件中synchronous_commit设置为on,备机关停可能会导致主机同步提交事务失败,进而导致备份失败。此场景下,请确认各节点状态正常,或将synchronous_commit设置为off以避免备份失败。
二、gs_probackup本地备份恢复测试
1、打开参数enable_cbm_tracking
打开参数enable_cbm_tracking,跟踪数据页的变化
在node2主执行:
postgres=# show enable_cbm_tracking;
enable_cbm_tracking
---------------------
off
(1 row)
postgres=# alter system set enable_cbm_tracking=on;
ALTER SYSTEM SET
2、本地初始化备份目录
node2执行
mkdir -p /opt/ogdata01
chown -R omm: /opt/ogdata01
chmod -R 700 /opt/ogdata01
cd /opt
mkdir opgaussbak
chown -R omm:dbgrp opgaussbak/
[omm@node2 ~]$ gs_probackup init -B /opt/opgaussbak
INFO: Backup catalog '/opt/opgaussbak' successfully inited
[omm@node2 opgaussbak]$ ls
backups wal
对目录的初始化操作实际是在备份目录下创建backups/和wal/子目录,分别用于存放备份文件和WAL文件。
打开数据库归档模式:修改postgres.conf
wal_level = hot_standby
archive_mode = on
archive_dest = '/opt/archive6543'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql.log' #
log_file_mode = 0600
3、添加本地备份实例
Node2执行:
su - omm
[omm@node2 ~]$ gs_probackup add-instance -B /opt/opgaussbak -D /opt/ogdata --instance node2bak
INFO: Instance 'node2bak' successfully inited
查看备份集
[omm@node2 ~]$ gs_probackup show -B /opt/opgaussbak/
4、本地执行一次全量备份
[omm@node2 ~]$ gs_probackup backup -B /opt/opgaussbak/ --instance node2bak -b full -D /opt/ogdata -d postgres -p 6543
--progress --log-filename=full_postgres_log --retention-redundancy=2 --compress --note='This is full backup set.'
INFO: Progress: (2044/2044). Validate file "database_map"
INFO: Backup QSDACN data files are valid
INFO: Backup QSDACN resident size: 702MB
INFO: Backup QSDACN completed
5、查看本地全备备份集
gs_probackup show -B /opt/opgaussbak/
6、第一次执行增量备份
增量之前创建测试数据
创建测试数据在postgres中
create table t4(id number);
insert into t4 values('4');
本地node2执行:
gs_probackup backup -B /opt/opgaussbak/ --instance node2bak -b PTRACK -D /opt/ogdata -d postgres -p 6543 --progress
--log-filename=incr1.log --delete-expired --delete-wal --retention-redundancy=2 --compress --note='This is the first incremental backup set.'
7、查看本地增量备份集
gs_probackup show -B /opt/opgaussbak/
8、第二次执行增量备份
创建测试表t5
create table t5(id number);
insert into t5 values('5');
gs_probackup backup -B /opt/opgaussbak/ --instance node2bak -b PTRACK -D /opt/ogdata -d postgres -p 6543 --progress
--log-filename=incr1.log --delete-expired --delete-wal --retention-redundancy=2 --compress --note='This is the first incremental backup set.'
9、查看本地第二次增量备份集
10、删除数据库并进行全量恢复
gs_ctl stop
rm -rf /opt/ogdata
[omm@node2 ~]$ gs_probackup restore -B /opt/opgaussbak/ -D /opt/ogdata -i QSDACN --instance node2bak
LOG: Restore begin.
LOG: there is no file tablespace_map
LOG: check tablespace directories of backup QSDACN
LOG: check external directories of backup QSDACN
WARNING: Process 22144 which used backup QSDACN no longer exists
INFO: Validating backup QSDACN
INFO: Backup QSDACN data files are valid
LOG: Thread [1]: Opening WAL segment "/opt/opgaussbak/backups/node2bak/QSDACN/database/pg_xlog/00000001000000000000009C"
INFO: Backup QSDACN WAL segments are valid
INFO: Backup QSDACN is valid.
INFO: Restoring the database from backup at 2021-04-30 16:02:47+08
LOG: there is no file tablespace_map
LOG: Restore directories and symlinks...
INFO: Start restoring backup files. PGDATA size: 1296MB
LOG: Start thread 1
INFO: Backup files are restored. Transfered bytes: 1312MB, time elapsed: 21s
INFO: Restore incremental ratio (less is better): 101% (1312MB/1296MB)
INFO: Syncing restored files to disk
INFO: Restored backup files are synced, time elapsed: 0
INFO: Restore of backup QSDACN completed.
启动数据库查看数据库:
gs_ctl start
gssql postgres –p 6543
可以看到全量的恢复并没有包含我们第一次增量和第二次增量的数据。我们进行第一次增量恢复。
11、执行全量+第一次增量恢复测试并验证数据
查看需要恢复数据的增量点:我们恢复数据库到ID为QSDAT3。这个点有我们新创建的表T4.
执行恢复:
gs_probackup restore -B /opt/opgaussbak/ -D /opt/ogdata01 -i QSDAT3 --instance node2bak
gs_ctl start -D /opt/ogdata01/
gsql postgres -p 6543
12、执行全量+第二次增量恢复测试并验证数据
查看需要恢复数据的增量点:我们恢复数据库到ID为QSER8W。这个点有我们新创建的表T5.
gs_probackup restore -B /opt/opgaussbak/ -D /opt/ogdata -i QSER8W --instance node2bak
gs_ctl start -D /opt/ogdata01
gsql postgres -p 6543
13、gs_probackup配置文件解析
pg_probackup.conf文件,设置备份配置策略前的配置:
[root@node2 node2bak]# pwd
/opt/opgaussbak/backups/node2bak
[root@node2 node2bak]# cat pg_probackup.conf
# Backup instance information
pgdata = /opt/ogdata
system-identifier = 3422924873445789
设置后
gs_probackup set-config -B /opt/opgaussbak/ --instance node2bak --retention-redundancy=2 --compress-algorithm=zlib --compress-level=6
cd /opt/opgaussbak/backups/node2bak
[root@node2 node2bak]# cat pg_probackup.conf
# Backup instance information
pgdata = /opt/ogdata
system-identifier = 3422924873445789
# Retention parameters
retention-redundancy = 2
# Compression parameters
compress-algorithm = zlib
compress-level = 6
备份集目录下的backup.control文件(描述备份集的属性信息)
[omm@node2 QSEVBS]$ cat backup.control
#Configuration
backup-mode = FULL
stream = true
compress-alg = zlib
compress-level = 1
from-replica = false
#Compatibility
block-size = 8192
xlog-block-size = 8192
checksum-version = 0
program-version = 2.4.2
server-version = 9.2
#Result backup info
timelineid = 1
start-lsn = 0/A8000028
stop-lsn = 0/A80002F8
start-time = '2021-05-01 12:33:28+08'
merge-time = '2021-05-01 12:36:05+08'
end-time = '2021-05-01 12:36:33+08'
recovery-xid = 14835
recovery-time = '2021-05-01 12:33:29+08'
recovery-name = 'backup QSEV3L'
data-bytes = 1152575411
wal-bytes = 16777216
uncompressed-bytes = 2429062395
pgdata-bytes = 1342753913
status = OK
note = 'This is the first incremental backup set.'
修改备份信息后
gs_probackup set-backup -B /opt/opgaussbak/ --instance node2bak -i QSEVBS --note 'backup.control setting' --ttl 20d
INFO: Backup QSEVBS is pinned until '2021-05-21 12:33:29+08'
INFO: Adding note to backup QSEVBS: 'backup.control setting'
[root@node2 QSEVBS]# cat backup.control
#Configuration
backup-mode = FULL
stream = true
compress-alg = zlib
compress-level = 1
from-replica = false
#Compatibility
block-size = 8192
xlog-block-size = 8192
checksum-version = 0
program-version = 2.4.2
server-version = 9.2
#Result backup info
timelineid = 1
start-lsn = 0/A8000028
stop-lsn = 0/A80002F8
start-time = '2021-05-01 12:33:28+08'
merge-time = '2021-05-01 12:36:05+08'
end-time = '2021-05-01 12:36:33+08'
recovery-xid = 14835
recovery-time = '2021-05-01 12:33:29+08'
expire-time = '2021-05-21 12:33:29+08'
recovery-name = 'backup QSEV3L'
data-bytes = 1152575411
wal-bytes = 16777216
uncompressed-bytes = 2429062395
pgdata-bytes = 1342753913
status = OK
note = 'backup.control setting'
content-crc = 1707874668
14、其他常用命令
查看备份集详细信息。
[omm@node2 ~]$ gs_probackup show -B /opt/opgaussbak/ --instance node2bak -i QSERBG
删除备份实例
[omm@node2 ~]$ gs_probackup del-instance -B /opt/opgaussbak/ --instance node2bak
合并备份集
[omm@node2 ~]$ gs_probackup merge -B /opt/opgaussbak/ -i QSEVBS --instance node2bak
查看合并后的备份集:
gs_probackup show -B /opt/opgaussbak/
大致流程:
校验备份集
合并备份集
删除备份集
重命名QSEV3L为QSEVBS (将原先全备的备份ID重命名为最近的刚被删除的那个增备ID:QSEVBS)
校验新的备份集QSEVBS数据文件是否有效。
验证合并后的备份集
[omm@node2 ~]$ gs_probackup validate -B /opt/opgaussbak/ -i QSEVBS --instance node2bak
三、gs_probackup异地备份恢复测试
1、配置远程复制用户
create user rep1 with sysadmin replication identified by 'asdfg.1314';
rep1权限:sysadmin+replication
ssh互信配置
第一步:在本地机器上使用ssh-keygen产生公钥私钥对
$ ssh-keygen
第二步:用ssh-copy-id将公钥复制到远程机器中
ssh-copy-id -i .ssh/id_rsa.pub [email protected]
注意ssh-copy-id 将key写到远程机器的 ~/ .ssh/authorized_key.文件中
第三步: 登录到远程机器不用输入密码。
2、打开参数enable_cbm_tracking,跟踪数据页的变化
在node2主执行:
show enable_cbm_tracking;
alter system set enable_cbm_tracking=on;
3、异机初始化备份目录
Node1执行
mkdir -p /opt/ogdata01
chown -R omm: /opt/ogdata01
chmod -R 700 /opt/ogdata01
cd /opt
mkdir opgaussbak
chown -R omm:dbgrp opgaussbak/
[omm@node1 ~]$ gs_probackup init -B /opt/opgaussbak
INFO: Backup catalog '/opt/opgaussbak' successfully inited
#对目录的初始化操作实际是在备份目录下创建backups/和wal/子目录,分别用于存放备份文件和WAL文件。
4、添加异机备份实例
node1执行:
[omm@node1 ~]$ gs_probackup add-instance -B /opt/opgaussbak -D /opt/ogdata --instance node1bak
--remote-proto=ssh --remote-host=192.168.126.130 --remote-port=22 --remote-path=/opt/og/bin --remote-user=omm
报错1:
LOG: Start SSH client process, pid 20743
ERROR: Agent error: /opt/og/bin/gs_probackup: error while loading shared libraries: libssl.so.1.1: cannot open
shared object file: No such file or directory
解决方法:
默认情况下,可执行文件运行时只会去lib和/usr/lib中寻找,如果库安装在别的地方,则需要更新ld.so.conf文件
vi /etc/ld.so.conf
在该文件中,追加lib的路径如下(不要加include):
[root@node1 ~]# cat /etc/ld.so.conf
include ld.so.conf.d/*.conf
/opt/og/lib
然后运行ldconfig命令让修改生效:
sudo /sbin/ldconfig –v
node1再次执行:
[omm@node1 ~]$ /opt/og/bin/gs_probackup add-instance -B /opt/opgaussbak -D /opt/ogdata --instance node1bak
--remote-proto=ssh --remote-host=192.168.126.130 --remote-port=22 --remote-path=/opt/og/bin --remote-user=omm
LOG: Start SSH client process, pid 23822
INFO: Instance 'node1bak' successfully inited
5、异地执行全量备份
node1执行:
gs_probackup backup –B /opt/opgaussbak --instance=node1 -b full -D /opt/ogdata -h 192.168.126.130 -p 6543 -d
postgres -U rep1 -W asdfg.1314 --remote-host=192.168.126.130 --remote-proto=ssh --remote-port=22 --remote-user=omm --remote-path=/opt/og/bin
报错如下:
remote-proto=ssh --remote-port=22 --remote-user=omm --remote-path=/opt/og/bin
INFO: Backup start, gs_probackup version: 2.4.2, instance: node1bak, backup ID: QSIXPL, backup mode: FULL, wal mode: STREAM, remote: true, compress-algorithm: none, compress-level: 1
LOG: Backup destination is initialized
WARNING: This openGauss instance was initialized without data block checksums. gs_probackup have no way to detect data block corruption without them. Reinitialize PGDATA with option '--data-checksums'.
LOG: Start SSH client process, pid 23937
LOG: Database backup start
ERROR: could not connect to database postgres: FATAL: no pg_hba.conf entry for replication connection from host "192.168.126.129", user "rep1", SSL off
WARNING: backup in progress, stop backup
INFO: wait for pg_stop_backup()
INFO: pg_stop backup() successfully executed
WARNING: Backup QSIXPL is running, setting its status to ERROR
解决如下:
添加 192.168.126.129 node1 可以访问node2进行备份。
host replication rep1 192.168.126.129/32 md5
再次执行
6、查看异机全量备份
[omm@node1 ~]$ gs_probackup show -B /opt/opgaussbak
7、异机增量备份
Node2主库插入数据
insert into t4 values('4');
Node1增量备份
gs_probackup backup -B /opt/opgaussbak --instance=node1bak -b PTRACK -D /opt/ogdata -h 192.168.126.130 -p 6543 -d
postgres -U rep1 -W asdfg.1314 --remote-host=192.168.126.130 --remote-proto=ssh --remote-port=22 --remote-user=omm --remote-path=/opt/og/bin
8、查看异机增量备份
9、执行全量恢复异机恢复
全量备份恢复
gs_probackup restore -B /opt/opgaussbak/ -D /opt/ogdata01 -i QSIYFH --instance node1bak
增量备份恢复
[omm@node1 ~]$ gs_probackup restore -B /opt/opgaussbak/ -D /opt/ogdata01 -i QSIZOI --instance node1bak
注意:恢复之前需要创建恢复目录,启动数据库进行数据验证
标签:opt,gs,--,备份,openGauss,backup,probackup
From: https://blog.51cto.com/u_13236892/6171289