
openGauss 5.0 Primary/Standby Cluster: Routine Operations and Maintenance

Date: 2023-05-07 10:22:10

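In an earlier post we walked through setting up an openGauss primary/standby cluster:

openGauss 5.0 One Primary, Two Standbys: Replication Environment Setup Guide
https://www.cndba.cn/dave/article/116528

In this post we look at maintaining that cluster.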

1 Checking cluster status

View all nodes in the cluster:

[[email protected] ~]$ gs_om -t status --detail
[  CMServer State   ]

node       node_ip         instance                                     state
-------------------------------------------------------------------------------
1  oracle  192.168.56.105  1    /data/openGauss/data/cmserver/cm_server Primary
2  oracle2 192.168.56.106  2    /data/openGauss/data/cmserver/cm_server Standby
3  oracle3 192.168.56.107  3    /data/openGauss/data/cmserver/cm_server Standby

[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
balanced        : No
current_az      : AZ_ALL

[  Datanode State   ]

node       node_ip         instance                             state            
---------------------------------------------------------------------------------
1  oracle  192.168.56.105  6001 /data/openGauss/install/data/dn P Standby Normal
2  oracle2 192.168.56.106  6002 /data/openGauss/install/data/dn S Primary Normal
3  oracle3 192.168.56.107  6003 /data/openGauss/install/data/dn S Standby Normal
[[email protected] ~]$
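
For scripting, the current primary can be pulled out of that output. The following is a minimal sketch and not part of the openGauss tooling: primary_datanode is a hypothetical helper name, and it assumes the column layout shown above (the role is the seventh whitespace-separated field of each Datanode State row, and the data path contains no spaces).

```shell
# Print "<node_name> <node_ip>" for every datanode whose role is Primary.
# Reads the text of `gs_om -t status --detail` from stdin.
# Assumption: the layout matches the sample output in this post.
primary_datanode() {
  awk '
    # Any "[ ... ]" line starts a new section; remember whether it is the datanode one.
    /^\[/ { in_dn = ($0 ~ /Datanode State/); next }
    # Datanode rows: id, name, ip, instance id, path, initial role (P/S), role, state.
    in_dn && $7 == "Primary" { print $2, $3 }
  '
}
```

Usage would be gs_om -t status --detail | primary_datanode. With the output above, this prints oracle2 192.168.56.106.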

View a single node:

[[email protected] ~]$ gs_om -t status -h oracle
-----------------------------------------------------------------------

cluster_state             : Normal
redistributing            : No
balanced                  : No

-----------------------------------------------------------------------

node                      : 1
node_name                 : oracle

node                      : 1
instance_id               : 1
node_ip                   : 192.168.56.105
data_path                 : /data/openGauss/data/cmserver/cm_server
type                      : CMServer
instance_state            : Primary

node                      : 1
instance_id               : 6001
node_ip                   : 192.168.56.105
data_path                 : /data/openGauss/install/data/dn
type                      : Datanode
instance_state            : Standby
dcf_role                  : FOLLOWER
static_connections        : 2
HA_state                  : Normal
reason                    : Normal
sender_sent_location      : 0/6011EA8
sender_write_location     : 0/6011EA8
sender_flush_location     : 0/6011EA8
sender_replay_location    : 0/6011EA8
receiver_received_location: 0/6011EA8
receiver_write_location   : 0/6011EA8
receiver_flush_location   : 0/6011EA8
receiver_replay_location  : 0/6011E08
sync_state                : Async

node                      : 1
node_name                 : oracle

node                      : 1
instance_id               : 1
node_ip                   : 192.168.56.105
data_path                 : /data/openGauss/data/cmserver/cm_server
type                      : CMServer
instance_state            : Primary

node                      : 1
node_ip                   : 192.168.56.105
type                      : Fenced UDF
state                     : Normal

-----------------------------------------------------------------------

node_state                : Normal
-----------------------------------------------------------------------
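
The location fields in this output can be compared to estimate how far a standby is behind. The sketch below assumes the values use the PostgreSQL-style notation of two hexadecimal halves separated by a slash (high 32 bits / low 32 bits); lsn_to_bytes and lsn_diff are hypothetical helper names, not openGauss commands.

```shell
# Convert a location such as 0/6011EA8 to an absolute byte offset.
# Assumption: "high/low" hexadecimal halves, each covering 32 bits.
lsn_to_bytes() {
  hi=${1%%/*}   # text before the slash
  lo=${1##*/}   # text after the slash
  echo $(( (0x$hi << 32) + 0x$lo ))
}

# Print how many bytes location $1 is ahead of location $2.
lsn_diff() {
  echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}
```

For the node shown above, lsn_diff 0/6011EA8 0/6011E08 compares receiver_received_location with receiver_replay_location and prints 160, meaning replay on that standby was 160 bytes behind what it had received at that moment.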

2 Starting and stopping the cluster

Run the following as the omm user on any node of the cluster.


[[email protected] ~]$ gs_om -t stop
Stopping cluster.
=========================================
Successfully stopped cluster.
=========================================
End stop cluster.


[[email protected] ~]$ gs_om -t start
Starting cluster.
======================================================================
Successfully started primary instance. Wait for standby instance.
======================================================================
.
Successfully started cluster.
======================================================================
cluster_state      : Normal
redistributing     : No
node_count         : 3
Datanode State
    primary           : 1
    standby           : 2
    secondary         : 0
    cascade_standby   : 0
    building          : 0
    abnormal          : 0
    down              : 0

Successfully started cluster.


[[email protected] ~]$ gs_om -t status --detail
[  CMServer State   ]

node       node_ip         instance                                     state
-------------------------------------------------------------------------------
1  oracle  192.168.56.105  1    /data/openGauss/data/cmserver/cm_server Primary
2  oracle2 192.168.56.106  2    /data/openGauss/data/cmserver/cm_server Standby
3  oracle3 192.168.56.107  3    /data/openGauss/data/cmserver/cm_server Standby

[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
balanced        : No
current_az      : AZ_ALL

[  Datanode State   ]

node       node_ip         instance                             state            
---------------------------------------------------------------------------------
1  oracle  192.168.56.105  6001 /data/openGauss/install/data/dn P Standby Normal
2  oracle2 192.168.56.106  6002 /data/openGauss/install/data/dn S Primary Normal
3  oracle3 192.168.56.107  6003 /data/openGauss/install/data/dn S Standby Normal
[[email protected] ~]$
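
In a maintenance script it can be useful to wait until the cluster reports Normal again after a start. The following is a rough sketch under stated assumptions: cluster_state and wait_for_normal are hypothetical helper names, the status command is passed in as arguments so that it can be replaced, and the parsing relies on the "cluster_state : ..." line format shown above.

```shell
# Print the value of the first "cluster_state" line found on stdin.
cluster_state() {
  awk -F: '
    $1 ~ /^cluster_state[ \t]*$/ && !seen { gsub(/[ \t]/, "", $2); print $2; seen = 1 }
  '
}

# wait_for_normal TRIES DELAY_SECONDS CMD [ARGS...]
# Runs CMD up to TRIES times and returns 0 as soon as it reports cluster_state Normal.
# Note: uses plain (global) shell variables to stay POSIX sh compatible.
wait_for_normal() {
  wfn_tries=$1
  wfn_delay=$2
  shift 2
  while [ "$wfn_tries" -gt 0 ]; do
    wfn_state=$("$@" 2>/dev/null | cluster_state)
    if [ "$wfn_state" = "Normal" ]; then
      return 0
    fi
    wfn_tries=$((wfn_tries - 1))
    if [ "$wfn_tries" -gt 0 ]; then
      sleep "$wfn_delay"
    fi
  done
  return 1
}
```

A possible call is wait_for_normal 30 5 gs_om -t status --detail, which polls every 5 seconds for up to 30 attempts. Passing the command as arguments keeps the loop testable with a stub in place of gs_om.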

3 Switchover

First check the cluster status:

[[email protected] ~]$ gs_om -t status --detail
……
cluster_state   : Normal
redistributing  : No
balanced        : No
current_az      : AZ_ALL

[  Datanode State   ]

node       node_ip         instance                             state            
---------------------------------------------------------------------------------
1  oracle  192.168.56.105  6001 /data/openGauss/install/data/dn P Standby Normal
2  oracle2 192.168.56.106  6002 /data/openGauss/install/data/dn S Primary Normal
3  oracle3 192.168.56.107  6003 /data/openGauss/install/data/dn S Standby Normal
[[email protected] ~]$

Here the primary is 192.168.56.106. To promote 192.168.56.105 to primary, run the following as omm on 56.105:

[[email protected] ~]$ gs_ctl switchover -D /data/openGauss/install/data/dn
[2023-04-07 17:55:53.995][16727][][gs_ctl]: gs_ctl switchover ,datadir is /data/openGauss/install/data/dn 
[2023-04-07 17:55:53.995][16727][][gs_ctl]: switchover term (1)
[2023-04-07 17:55:54.008][16727][][gs_ctl]: waiting for server to switchover........
[2023-04-07 17:55:59.069][16727][][gs_ctl]: done
[2023-04-07 17:55:59.069][16727][][gs_ctl]: switchover completed (/data/openGauss/install/data/dn)

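For a given database, a new switchover cannot be started until the previous one has finished. If a switchover is initiated while the workload is active, threads on the primary may fail to stop, and the switchover reports a timeout even though it is still running in the background; it completes once those threads stop. For example, while a large partitioned table is being dropped on the primary, the primary may not respond to the signal sent by the switchover.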
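
Given those constraints, a script might check the cluster before it issues gs_ctl switchover. The sketch below is only one possible pre-check and not an openGauss requirement: can_switchover is a hypothetical helper that succeeds when cluster_state is Normal and the named node's datanode row reads "Standby Normal", assuming the gs_om -t status --detail layout shown earlier in this post.

```shell
# can_switchover NODE_NAME
# Reads `gs_om -t status --detail` text from stdin.
# Exit status 0 only if the cluster is Normal and NODE_NAME is a healthy standby.
can_switchover() {
  awk -v node="$1" '
    /^cluster_state/ { if ($NF == "Normal") cluster_ok = 1 }
    # Track whether we are inside the "[ Datanode State ]" section.
    /^\[/ { in_dn = ($0 ~ /Datanode State/); next }
    # Assumed columns: $2 node name, $7 current role, $8 state.
    in_dn && $2 == node && $7 == "Standby" && $8 == "Normal" { standby_ok = 1 }
    END { exit !(cluster_ok && standby_ok) }
  '
}
```

It could be combined with the command above as: gs_om -t status --detail | can_switchover oracle && gs_ctl switchover -D /data/openGauss/install/data/dn. This does not detect a switchover that is still running in the background, so the timeout case described above still needs a manual look.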

After a successful switchover or failover, run the following command to record the current primary/standby information:

[[email protected] ~]$ gs_om -t refreshconf
Generating dynamic configuration file for all nodes.
Successfully generated dynamic configuration file.

[[email protected] ~]$ gs_om -t status --detail
[  CMServer State   ]

node       node_ip         instance                                     state
-------------------------------------------------------------------------------
1  oracle  192.168.56.105  1    /data/openGauss/data/cmserver/cm_server Primary
2  oracle2 192.168.56.106  2    /data/openGauss/data/cmserver/cm_server Standby
3  oracle3 192.168.56.107  3    /data/openGauss/data/cmserver/cm_server Standby

[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
balanced        : Yes
current_az      : AZ_ALL

[  Datanode State   ]

node       node_ip         instance                             state            
---------------------------------------------------------------------------------
1  oracle  192.168.56.105  6001 /data/openGauss/install/data/dn P Primary Normal
2  oracle2 192.168.56.106  6002 /data/openGauss/install/data/dn S Standby Normal
3  oracle3 192.168.56.107  6003 /data/openGauss/install/data/dn S Standby Normal
[[email protected] ~]$

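One detail worth noting: while the cluster is healthy, killing the gaussdb process or stopping the primary with gs_ctl triggers an automatic primary/standby switch, and the stopped instance is then brought back up automatically as a standby: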

[[email protected] ~]$ ps -ef|grep openG
omm       4154     1  2 10:54 ?        00:11:15 /data/openGauss/install/app/bin/om_monitor -L /var/log/omm/omm/cm/om_monitor
omm      13943  4154 25 17:50 ?        00:03:07 /data/openGauss/install/app/bin/cm_agent
omm      13963     1 15 17:50 ?        00:01:50 /data/openGauss/install/app/bin/cm_server
omm      21983     1 15 18:02 ?        00:00:05 /data/openGauss/install/app/bin/gaussdb -D /data/openGauss/install/data/dn -M pending
omm      22761     1  0 18:03 ?        00:00:00 python3 /data/openGauss/install/om/script/local/CheckSshAgent.py
omm      22805  4878  0 18:03 pts/1    00:00:00 grep --color=auto openG
[[email protected] ~]$ kill -9 21983
[[email protected] ~]$ gs_ctl stop -D /data/openGauss/install/data/dn
[2023-04-07 18:06:25.733][24541][][gs_ctl]: gs_ctl stopped ,datadir is /data/openGauss/install/data/dn 
waiting for server to shut down..... done
server stopped
[[email protected] ~]$
[[email protected] ~]$ ps -ef|grep openG
omm       4154     1  2 10:54 ?        00:11:15 /data/openGauss/install/app/bin/om_monitor -L /var/log/omm/omm/cm/om_monitor
omm      13943  4154 25 17:50 ?        00:03:11 /data/openGauss/install/app/bin/cm_agent
omm      13963     1 15 17:50 ?        00:01:52 /data/openGauss/install/app/bin/cm_server
omm      22968     1 57 18:03 ?        00:00:01 /data/openGauss/install/app/bin/gaussdb -D /data/openGauss/install/data/dn -M pending
omm      22991  4878  0 18:03 pts/1    00:00:00 grep --color=auto openG
[[email protected] ~]$ 

[[email protected] ~]$ gs_om -t status --detail
[  CMServer State   ]

node       node_ip         instance                                     state
-------------------------------------------------------------------------------
1  oracle  192.168.56.105  1    /data/openGauss/data/cmserver/cm_server Primary
2  oracle2 192.168.56.106  2    /data/openGauss/data/cmserver/cm_server Standby
3  oracle3 192.168.56.107  3    /data/openGauss/data/cmserver/cm_server Standby

[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
balanced        : No
current_az      : AZ_ALL

[  Datanode State   ]

node       node_ip         instance                             state            
---------------------------------------------------------------------------------
1  oracle  192.168.56.105  6001 /data/openGauss/install/data/dn P Standby Normal
2  oracle2 192.168.56.106  6002 /data/openGauss/install/data/dn S Primary Normal
3  oracle3 192.168.56.107  6003 /data/openGauss/install/data/dn S Standby Normal
[[email protected] ~]$

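4 Failover

The previous section covered the normal case. If the primary fails, run the failover command on a standby.

If the failover command is run while the original primary is still healthy, it still succeeds, and high availability is restored automatically.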

[[email protected] ~]$ gs_ctl failover -D /data/openGauss/install/data/dn
[2023-04-07 18:37:21.152][9364][][gs_ctl]: gs_ctl failover ,datadir is /data/openGauss/install/data/dn 
[2023-04-07 18:37:21.152][9364][][gs_ctl]: failover term (1)
[2023-04-07 18:37:21.163][9364][][gs_ctl]:  waiting for server to failover...
.[2023-04-07 18:37:22.193][9364][][gs_ctl]:  done
[2023-04-07 18:37:22.193][9364][][gs_ctl]:  failover completed (/data/openGauss/install/data/dn)

[[email protected] ~]$ gs_om -t refreshconf
Generating dynamic configuration file for all nodes.
Successfully generated dynamic configuration file.

[[email protected] ~]$ gs_om -t status --detail
[  CMServer State   ]

node       node_ip         instance                                     state
-------------------------------------------------------------------------------
1  oracle  192.168.56.105  1    /data/openGauss/data/cmserver/cm_server Standby
2  oracle2 192.168.56.106  2    /data/openGauss/data/cmserver/cm_server Primary
3  oracle3 192.168.56.107  3    /data/openGauss/data/cmserver/cm_server Standby

[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
balanced        : Yes
current_az      : AZ_ALL

[  Datanode State   ]

node       node_ip         instance                             state            
---------------------------------------------------------------------------------
1  oracle  192.168.56.105  6001 /data/openGauss/install/data/dn P Primary Normal
2  oracle2 192.168.56.106  6002 /data/openGauss/install/data/dn S Standby Normal
3  oracle3 192.168.56.107  6003 /data/openGauss/install/data/dn S Standby Normal
[[email protected] ~]$

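When the cluster is running normally, the primary/standby relationship is restored automatically after the switch. If a node shows Standby Need repair(Disconnected) and cannot recover on its own, that node has to be rebuilt.

Run the rebuild command on the node whose standby instance needs rebuilding: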

[[email protected] ~]$ gs_ctl build -b auto -D /data/openGauss/install/data/dn
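
To find which nodes are in that condition, the status output can be filtered. This is a sketch with assumptions: nodes_needing_repair is a hypothetical helper, and it assumes the "Need repair" text appears on the node's row in the Datanode State section of gs_om -t status --detail. The sample row in the comment is illustrative and was not captured from this cluster.

```shell
# Print the node name of every datanode whose state contains "Need repair".
# Reads `gs_om -t status --detail` text from stdin.
# Illustrative row (assumed format):
#   1  oracle  192.168.56.105  6001 /data/openGauss/install/data/dn P Standby Need repair(Disconnected)
nodes_needing_repair() {
  awk '
    /^\[/ { in_dn = ($0 ~ /Datanode State/); next }
    in_dn && /Need repair/ { print $2 }
  '
}
```

Running gs_om -t status --detail | nodes_needing_repair lists the candidate nodes; the gs_ctl build command is then still run by hand on each of them.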

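5 Handling a dual-primary condition

If the connection between the primary and standby instances is lost during a switchover because of a network failure, a full disk, or a similar problem, and two primaries appear, handle it as follows:

1. Query the current instance status of the database:

gs_om -t status --detail

If the result shows two instances in the Primary state, the cluster is in an abnormal state.

2. Decide which node to demote to standby, and stop the service on that node. (The data directory in the commands below is an example path; in this environment it would be /data/openGauss/install/data/dn.)

gs_ctl stop -D /home/omm/cluster/dn1/

3. Start that node in standby mode:

gs_ctl start -D /home/omm/cluster/dn1/ -M standby

4. Save the primary/standby information:

gs_om -t refreshconf

5. Check the database status and confirm that the instance state has recovered.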
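
The check in step 1 can be automated by counting Primary rows. A minimal sketch, assuming the Datanode State layout shown earlier in this post; count_datanode_primaries is a hypothetical helper name.

```shell
# Print how many datanode rows report the Primary role.
# Reads `gs_om -t status --detail` text from stdin.
# Assumption: the role is the seventh whitespace-separated field of each row.
count_datanode_primaries() {
  awk '
    /^\[/ { in_dn = ($0 ~ /Datanode State/); next }
    in_dn && $7 == "Primary" { n++ }
    END { print n + 0 }   # n + 0 prints 0 when no row matched
  '
}
```

A monitoring job could run gs_om -t status --detail | count_datanode_primaries and alert whenever the result is not 1: a value of 2 or more indicates the dual-primary condition described here, and 0 means no primary is currently reported.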

From: https://www.cnblogs.com/yaoyangding/p/17378974.html
