首页 > 其他分享 >KingbaseES V8R6集群备份恢复案例之---主库single-pro备份恢复

KingbaseES V8R6集群备份恢复案例之---主库single-pro备份恢复

时间:2024-11-01 17:42:41浏览次数:1  
标签:主库 11 01 V8R6 -- 备份 kingbase 2024 keepalives

案例说明:
KingbaseES V8R6集群物理备份支持single-pro方式,本案例在集群执行single-pro方式备份并多次切换集群后,对集群执行了恢复测试,文档记录了恢复的详细过程。

适用版本:
KingbaseES V8R6

集群架构:

 ID | Name  | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node1 | primary | * running |          | running | 44482 | no      | n/a
 2  | node2 | standby |   running | node1    | running | 15369 | no      | 1 second(s) ago

一、查看集群备份
在执行single-prod模式的物理备份的初始化(sys_backup.sh init)后,同时会在主备库节点执行备份:

1、主库备份

[kingbase@node201 bin]$ /home/kingbase/cluster/R6C8/HAC8/kingbase/bin/sys_rman --config=/home/kingbase/kbbr_repo/sys_rman.conf --stanza=kingbase info
WARN: set process-max 4 is too large, auto set to CPU core count 1
stanza: kingbase
    status: ok
    cipher: none

    db (current)
        wal archive min/max (V008R006C008B0014): 000000490000000300000040/00000049000000030000004A

        full backup: 20241101-105946F
            timestamp start/stop: 2024-11-01 10:59:46 / 2024-11-01 10:59:53
            wal start/stop: 000000490000000300000048 / 000000490000000300000048
            database size: 378.4MB, database backup size: 378.4MB
            repo1: backup set size: 378.4MB, backup size: 378.4MB

        incr backup: 20241101-105946F_20241101-110356I
            timestamp start/stop: 2024-11-01 11:03:56 / 2024-11-01 11:03:58
            wal start/stop: 00000049000000030000004A / 00000049000000030000004A
            database size: 378.8MB, database backup size: 25.1MB
            repo1: backup set size: 378.8MB, backup size: 25.1MB
            backup reference list: 20241101-105946F

2、备库备份

[kingbase@node202 bin]$ /home/kingbase/cluster/R6C8/HAC8/kingbase/bin/sys_rman --config=/home/kingbase/kbbr_repo/sys_rman.conf --stanza=kingbase info
WARN: set process-max 4 is too large, auto set to CPU core count 1
stanza: kingbase
    status: ok
    cipher: none

    db (current)
        wal archive min/max (V008R006C008B0014): 000000490000000300000040/0000004B000000030000004D

        full backup: 20241101-105544F
            timestamp start/stop: 2024-11-01 10:55:44 / 2024-11-01 10:55:53
            wal start/stop: 000000490000000300000046 / 000000490000000300000046
            database size: 378.4MB, database backup size: 378.4MB
            repo2: backup set size: 378.4MB, backup size: 378.4MB

二、切换集群测试

1、执行failover切换后

[kingbase@node201 bin]$ ./repmgr cluster show

 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                                                                       
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | standby |   running | node2    | default  | 100      | 73       | 0 bytes | host=192.168.1.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | primary | * running |          | default  | 100      | 74       |         | host=192.168.1.202 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000

2、执行switchover切换后

[kingbase@node201 bin]$ ./repmgr cluster show

 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                                                                       
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 75       |         | host=192.168.1.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | standby |   running | node1    | default  | 100      | 74       | 0 bytes | host=192.168.1.202 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000

3、主库timeline时间线变化
如下所示,主库节点timeline变化:

三、主库数据恢复测试

1、模拟数据丢失

prod=# drop table t3;
DROP TABLE
prod=# drop table t2;
DROP TABLE
prod=# drop table t1;
DROP TABLE
prod=# \d
                       List of relations
 Schema |          Name           |       Type        | Owner
--------+-------------------------+-------------------+--------
 public | sys_roles               | table             | system
 public | sys_stat_statements     | view              | system
 public | sys_stat_statements_all | view              | system
 public | tb1                     | table             | system
 public | teachers                | table             | system
(15 rows)

2、通过物理备份恢复数据
1)关闭集群
[kingbase@node201 bin]$ ./sys_monitor.sh stop

2)备份原data
[kingbase@node201 kingbase]$ mv data data.bk

3)执行全量恢复
如下所示,主库节点数据库恢复成功:

[kingbase@node201 bin]$ /home/kingbase/cluster/R6C8/HAC8/kingbase/bin/sys_rman --config=/home/kingbase/kbbr_repo/sys_rman.conf --stanza=kingbase restore
2024-11-01 11:14:17.027 P00   INFO: restore command begin 2.27: --band-width=0 --config=/home/kingbase/kbbr_repo/sys_rman.conf --exec-id=56849-a18b87e7 --kb2-host=192.168.1.202 --kb1-path=/home/kingbase/cluster/R6C8/HAC8/kingbase/data --kb2-path=/home/kingbase/cluster/R6C8/HAC8/kingbase/data --link-all --log-level-console=info --log-level-file=info --log-path=/home/kingbase/cluster/R6C8/HAC8/kingbase/log --log-subprocess --non-archived-space=1024 --process-max=4 --repo1-path=/home/kingbase/kbbr_repo --stanza=kingbase
WARN: set process-max 4 is too large, auto set to CPU core count 1
2024-11-01 11:14:17.065 P00   INFO: repo1: restore backup set 20241101-105946F_20241101-110356I, recovery will start at 2024-11-01 11:03:56
2024-11-01 11:14:17.981 P00   INFO: Restore Process: FILE: 1 / 4572 0%       SZIE: 182419456 bytes / 397204747 bytes 174.0MB / 378.8MB 45%
........

2024-11-01 11:14:23.434 P00   INFO: Restore Process: FILE: 4572 / 4572 100%       SZIE: 397204747 bytes / 397204747 bytes 378.8MB / 378.8MB 100%
2024-11-01 11:14:23.435 P00   INFO: write updated /home/kingbase/cluster/R6C8/HAC8/kingbase/data/kingbase.auto.conf
2024-11-01 11:14:23.438 P00   INFO: restore global/sys_control (performed last to ensure aborted restores cannot be started)
2024-11-01 11:14:23.439 P00   INFO: restore size = 378.8MB, file total = 4572
2024-11-01 11:14:23.440 P00   INFO: restore command end: completed successfully (6417ms)

4)查看数据恢复状态
如下所示,主库节点数据库恢复完成:

# 启动主库数据库服务
[kingbase@node201 bin]$ ./sys_ctl start -D ../data

# 访问数据库
[kingbase@node201 bin]$ ./ksql -U system test
Type "help" for help.

test=# \c prod
You are now connected to database "prod" as userName "system".
prod=# \d
                       List of relations
 Schema |          Name           |       Type        | Owner
--------+-------------------------+-------------------+--------
 public | sys_roles               | table             | system
 public | sys_stat_statements     | view              | system
 public | sys_stat_statements_all | view              | system
 public | t1                      | table             | system
 public | t2                      | table             | system
 public | t3                      | table             | system
 public | tb1                     | table             | system
 public | teachers                | table             | system
(18 rows)

prod=# select count(*) from t3;
 count(*)
----------
    10000
(1 row)

四、执行备库恢复
如下所示,在执行备库clone前,将主库kingbase.auto.conf中的“restore_command”选项注释:

[kingbase@node201 bin]$ cat ../data/kingbase.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
.......

# Recovery settings generated by sys_rman restore on 2024-11-01 11:14:23
# restore_command = '/home/kingbase/cluster/R6C8/HAC8/kingbase/bin/sys_rman --config=/home/kingbase/kbbr_repo/sys_rman.conf --stanza=kingbase archive-get %f "%p"'

1、备库clone
[kingbase@node202 bin]$ ./repmgr standby clone -h 192.168.1.201 -U esrep -d esrep

2、启动备库数据库服务

[kingbase@node202 bin]$ ./sys_ctl start -D ../data

3、注册备库
[kingbase@node202 bin]$ ./repmgr standby register --force

4、查看集群状态

[kingbase@node201 bin]$ ./repmgr cluster show

 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                                                                       
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 74       |         | host=192.168.1.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | standby |   running | node1    | default  | 100      | 74       | 0 bytes | host=192.168.1.202 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
You have mail in /var/spool/mail/kingbase

---如上所示,集群恢复完成 !

五、恢复故障案例

1、备库clone后故障
如下所示,在主库kingbase.auto.conf中没有注释“restore_command”参数后,备库执行clone,然后启动数据库服务,备库从归档日志开始恢复数据,导致主备库数据时间线timeline不一致,主备流复制建立失败:

备库sys_log日志:

[kingbase@node202 sys_log]$ tail -1000 kingbase-2024-11-01_113351.csv
2024-11-01 11:33:51.683 CST,,,32336,,67244c1f.7e50,1,,2024-11-01 11:33:51 CST,,0,LOG,00000,"ending log output to stderr",,"Future log output will go to log destination ""csvlog"".",,,,,,,""
2024-11-01 11:33:51.688 CST,,,32338,,67244c1f.7e52,1,,2024-11-01 11:33:51 CST,,0,LOG,00000,"database system was interrupted; last known up at 2024-11-01 11:33:21 CST",,,,,,,,,""
2024-11-01 11:33:52.247 CST,,,32338,,67244c1f.7e52,2,,2024-11-01 11:33:51 CST,,0,LOG,00000,"restored log file ""0000004B.history"" from archive",,,,,,,,,""
2024-11-01 11:33:52.259 CST,,,32338,,67244c1f.7e52,3,,2024-11-01 11:33:51 CST,,0,LOG,00000,"entering standby mode",,,,,,,,,""
2024-11-01 11:33:52.269 CST,,,32338,,67244c1f.7e52,4,,2024-11-01 11:33:51 CST,,0,LOG,00000,"restored log file ""0000004B.history"" from archive",,,,,,,,,""
2024-11-01 11:33:52.289 CST,,,32338,,67244c1f.7e52,5,,2024-11-01 11:33:51 CST,,0,FATAL,XX000,"requested timeline 75 is not a child of this server's history","Latest checkpoint is at 3/51000058 on timeline 74, but in the history of the requested timeline, the server forked off from that timeline at 3/4D0000A0.",,,,,,,,""
2024-11-01 11:33:52.290 CST,,,32336,,67244c1f.7e50,2,,2024-11-01 11:33:51 CST,,0,LOG,00000,"startup process (PID 32338) exited with exit code 1",,,,,,,,,""
2024-11-01 11:33:52.290 CST,,,32336,,67244c1f.7e50,3,,2024-11-01 11:33:51 CST,,0,LOG,00000,"aborting startup due to startup process failure",,,,,,,,,""
2024-11-01 11:33:52.297 CST,,,32336,,67244c1f.7e50,4,,2024-11-01 11:33:51 CST,,0,LOG,00000,"database system is shut down",,,,,,,,,""

2、注释主库restore_command参数

3、备库执行clone

如下所示,备库执行clone后,集群恢复正常:

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                                                                       
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1  | node1 | primary | * running |          | default  | 100      | 74       |         | host=192.168.1.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
2  | node2 | standby |   running | node1    | default  | 100      | 74       | 0 bytes | host=192.168.1.202 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000

六、总结
本文详细记录了,在集群环境下通过single-pro模式执行物理备份后的,集群恢复过程,可以用于数据库数据恢复的参考。

标签:主库,11,01,V8R6,--,备份,kingbase,2024,keepalives
From: https://www.cnblogs.com/tiany1224/p/18520950

相关文章

  • 安装xtrabackup备份mysql
    一、工具xtrabackup说明说明:1、PerconaXtraBackup:是一个适用于Mysql的开源热备份工具,它在备份期间不锁表,而且可以备份InnoDB、XtraDB以及MyISAM存储引擎的表。2、PerconaXtraBackup2.4可以备份MySQL5.5、5.6和5.7,但不支持备份MySQL8.0,如果需要备份MYSQL8.......
  • 太硬核!企业数据备份解决方案有哪些?这4个备份妙计,让人眼前一亮又一亮!
    在数字化时代,企业数据的安全与稳定是企业持续发展的关键。面对日益复杂多变的数据环境,企业数据备份已成为保障数据安全不可或缺的一环。本文将为您揭秘四个企业数据备份解决方案,旨在为企业提供全方位、高效、可靠的数据保护策略。一、外部硬盘备份外部硬盘是数据备份的常用......
  • Linux+winds定时备份服务
    系统文件定时备份简介数据备份是容灾的基础,是指为防止系统出现操作失误或系统故障导致数据丢失,而将全部或部分数据从应用主机的硬盘或阵列复制到其他存储介质的过程。灾难备份是信息系统安全的关键基础设施。重要信息系统必须构建灾难备份和恢复系统,以防范和抵御灾难所带来的......
  • RMAN之环境配置(二)---Backups to a Media Manager备份到介质管理器
    在生产库中,一般都选用第三方的磁带管理软件,但是基本对于oracle的备份和恢复都是通过调用RMAN来实现的。确定mediamanagerLibrary(媒体管理库)的位置在尝试将RMAN与媒体管理器一起使用之前,请确定媒体管理库的位置。分配或配置RMAN与媒体管理器通信的通道时,在命令行ALL......
  • python 备份文件,从 D盘 到Z盘。并且保留15天的文件
    备份文件,从D盘到Z盘。并且保留15天的文件importosimportshutilfromdatetimeimportdatetime,timedeltadefmove_and_clean_folders(a_folder,b_folder,keep_count=15):try:#获取前两天的日期yesterday=datetime.now()-timedelta(days=......
  • Dedecms数据库恢复与备份的两种方法
    使用phpMyAdmin进行备份和恢复备份步骤:登录主机控制面板。点击“数据库管理”->“登录phpMyAdmin”。使用数据库用户名和密码登录。找到要备份的数据库,全选数据表。点击“导出”,选择导出格式(如SQL),点击“执行”。恢复步骤:登录phpMyAdmin。选择要导入的数据库。......
  • Dedecms备份的数据文件位置及备份数据库的方法
    备份数据文件的位置Dedecms备份的数据库文件通常保存在以下目录中:  /dede/backup_data/用织梦DEDECMS后台备份数据库的方法第一种方法登录原站后台登录到您的Dedecms后台管理界面。备份数据库进入“系统”->“数据库备份/还原”。选择需要备份的表,建议......
  • KingbaseES V8R6集群部署案例之---install.conf配置错误
    案例说明:KingbaseESV8R6集群在通过脚本部署时,提示节点无法通过securcmdd连接,部署失败。适用版本:KingbaseESV8R6一、问题现象如下所示,在执行脚本部署时,提示无法和集群节点通过8890端口建立连接:二、问题分析1、测试节点securecmdd互信连接如下所示,节点之间securecmdd......
  • GaussDB数据库的备份与恢复
    ​1.逻辑备份-gs_dumpgs_dump是一款用于导出数据库相关信息的工具,支持导出完整一致的数据库对象(数据库、模式、表、视图等)数据,同时不影响用户对数据库的正常访问。备份sql语句gs_dump是openGauss用于导出数据库相关信息的工具,用户可以自定义导出一个数据库或其中的对象(模式......
  • 【数据库】数据库管理(下)存储过程 触发器 慢查询日志 备份与恢复
    文章目录存储过程参数控制语句触发器慢日志备份&恢复备份方法恢复方法导入导出数据存储过程数据库存储过程(StoredProcedure)是一组为了完成特定功能的SQL语句集合,这些语句被预先编译并保存在数据库中。存储过程可以接受输入参数、执行复杂的业务逻辑,并返回结果......