[My Story with openGauss] Upgrading an openGauss 3.1.1 Enterprise Edition Primary/Standby Cluster to 5.0.0: An Operations Guide


Shang Lei (尚雷), openGauss, posted 2023-07-29 17:58 from Sichuan

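Preface: A few days ago I test-deployed openGauss 5.0 and wrote [Installing and Deploying openGauss 5.0 Enterprise Edition on CentOS/RHEL 7: One Primary, Two Standbys, and One Cascade Standby](http://mp.weixin.qq.com/s?__biz=MzIyMDE3ODk1Nw==&mid=2247510278&idx=1&sn=399a4a82472f5c30967e33a556c66420&chksm=97cd1664a0ba9f72b034fe055129128f74ecb337e8ec1badbbde6226ae650b02c08fdc797641&scene=21#wechat_redirect). More recently I tested upgrading openGauss from 3.1.1 to 5.0.0 and ran into a few problems along the way, which are described in the appendix. If you hit problems while following this guide, or disagree with anything in it, I would be grateful to hear from you.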

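1. Environment Overview

The environment is an openGauss 3.1.1 Enterprise Edition cluster with one primary and one standby. Before openGauss 3.1.1 was installed, the hosts were prepared as the openGauss website requires: dependency packages installed, the firewall and SELinux disabled, kernel parameters tuned, and so on. The environment details are as follows.

If you are unfamiliar with deploying an openGauss 3 Enterprise Edition cluster, see my earlier article [openGauss 3.1.0 on CentOS 7: Installing and Deploying a One-Primary, Two-Standby Cluster]: https://www.modb.pro/db/551221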

1.1 Hostnames

Hostname        Description
---------------------------------------
opengauss-db1   Primary node hostname
opengauss-db2   Standby node hostname

1.2 Host addresses

IP address      Description
-----------------------------------------
10.110.3.155    Primary node IP address
10.110.3.156    Standby node IP address

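1.3 Ports

Port    Parameter             Description
--------------------------------------------------------------
15300   cmServerPortBase      Primary CM Server port
15300   cmServerPortStandby   Standby CM Server port
26000   dataPortBase          Base port of the database node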

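1.4 User and group

Item    Name    Type               Recommendation
------------------------------------------------------------------------------------
User    omm     Operating system   Use the same password and user ID on every node
Group   dbgrp   Operating system   Use the same group ID on every node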

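1.5 Software directories

Directory                      Parameter         Purpose
------------------------------------------------------------------------------------------
/opt/software/openGauss        software          Installation package directory
/opt/gaussdb/install/app       gaussdbAppPath    Database installation directory
/var/log/omm                   gaussdbLogPath    Log directory
/opt/gaussdb/tmp               tmpMppdbPath      Temporary file directory
/opt/gaussdb/install/om        gaussdbToolPath   Database tool directory
/opt/gaussdb/corefile          corePath          Database core file directory
/opt/gaussdb/install/cm        cmDir             CM data directory
/opt/gaussdb/install/data/dn   dataNode          Data directory of the primary and standby database nodes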

1.6 XML configuration file

<?xml version="1.0" encoding="UTF-8"?>
<ROOT>
    <!-- Cluster-wide openGauss settings -->
    <CLUSTER>
        <!-- Database (cluster) name -->
        <PARAM name="clusterName" value="openGSDB" />
        <!-- Database node names (hostnames) -->
        <PARAM name="nodeNames" value="opengauss-db1,opengauss-db2" />
        <!-- Node IPs, in the same order as nodeNames -->
        <PARAM name="backIp1s" value="10.110.3.155,10.110.3.156"/>
        <!-- Database installation directory -->
        <PARAM name="gaussdbAppPath" value="/opt/gaussdb/install/app" />
        <!-- Log directory -->
        <PARAM name="gaussdbLogPath" value="/var/log/omm" />
        <!-- Temporary file directory -->
        <PARAM name="tmpMppdbPath" value="/opt/gaussdb/tmp"/>
        <!-- Database tool directory -->
        <PARAM name="gaussdbToolPath" value="/opt/gaussdb/install/om" />
        <!-- Database core file directory -->
        <PARAM name="corePath" value="/opt/gaussdb/corefile"/>
        <!-- openGauss cluster type; "single-inst" means a single-instance deployment with one primary and multiple standbys -->
        <PARAM name="clusterType" value="single-inst"/>
    </CLUSTER>
    <!-- Deployment information for the nodes on each server -->
    <DEVICELIST>
        <!-- Deployment information for the node on opengauss-db1 -->
        <DEVICE sn="1000001">
            <!-- Hostname of opengauss-db1 -->
            <PARAM name="name" value="opengauss-db1"/>
            <!-- AZ of opengauss-db1 and the AZ priority -->
            <PARAM name="azName" value="AZ1"/>
            <PARAM name="azPriority" value="1"/>
            <!-- If the server has only one usable NIC, set backIp1 and sshIp1 to the same IP -->
            <PARAM name="backIp1" value="10.110.3.155"/>
            <PARAM name="sshIp1" value="10.110.3.155"/>

            <!-- CM -->
            <!-- CM data directory -->
            <PARAM name="cmDir" value="/opt/gaussdb/install/cm" />
            <PARAM name="cmsNum" value="1" />
            <!-- CM listening port -->
            <PARAM name="cmServerPortBase" value="15300" />
            <PARAM name="cmServerlevel" value="1" />
            <!-- Node names and listening IPs of all CM instances -->
            <PARAM name="cmServerListenIp1" value="10.110.3.155,10.110.3.156" />
            <PARAM name="cmServerRelation" value="opengauss-db1,opengauss-db2" />

            <!-- DB node -->
            <PARAM name="dataNum" value="1"/>
            <!-- DB node port -->
            <PARAM name="dataPortBase" value="26000"/>
            <!-- Data directory on the primary DB node, followed by the standby host and its data directory -->
            <PARAM name="dataNode1" value="/opt/gaussdb/install/data/dn,opengauss-db2,/opt/gaussdb/install/data/dn"/>
            <!-- Number of nodes set to synchronous mode for the DB node -->
            <PARAM name="dataNode1_syncNum" value="0"/>
        </DEVICE>

        <!-- Deployment information for the node on opengauss-db2; "name" is set to the hostname -->
        <DEVICE sn="1000002">
            <PARAM name="name" value="opengauss-db2"/>
            <PARAM name="azName" value="AZ1"/>
            <PARAM name="azPriority" value="1"/>
            <!-- If the server has only one usable NIC, set backIp1 and sshIp1 to the same IP -->
            <PARAM name="backIp1" value="10.110.3.156"/>
            <PARAM name="sshIp1" value="10.110.3.156"/>
            <PARAM name="cmDir" value="/opt/gaussdb/install/cm" />
        </DEVICE>
    </DEVICELIST>
</ROOT>

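2. Preparation

2.1 Download the 5.0.0 installation package

2.1.1 Download the package

Log in to the download page of the openGauss website, https://www.opengauss.org/zh/download/, with a registered account. Choose the openGauss_5.0.0 Enterprise Edition package that matches your operating system, download it, and upload it to /opt/software/openGauss on the server.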

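Note: if the database server has Internet access, you can download the package directly on the server instead. Right-click the download button on the website, choose "Copy link", and pass the copied URL to wget.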

# Run as root [primary node]
[root@opengauss-db1 ~]# cd /opt/software/openGauss
[root@opengauss-db1 openGauss]# wget https://opengauss.obs.cn-south-1.myhuaweicloud.com/5.0.0/x86/openGauss-5.0.0-CentOS-64bit-all.tar.gz

2.1.2 Verify the package

On the download page, click the SHA256 icon for the package to copy its checksum, and paste the copied value into a text file. The value is aa9fc724c5030f4cc79dad201675183029c8f36a07667028e681169a2f6482f5. Then run sha256sum against the downloaded file and compare the two values to confirm that the package is intact.

# Run as root [primary node]
[root@opengauss-db1 openGauss]# sha256sum openGauss-5.0.0-CentOS-64bit-all.tar.gz
aa9fc724c5030f4cc79dad201675183029c8f36a07667028e681169a2f6482f5  openGauss-5.0.0-CentOS-64bit-all.tar.gz
-- If the computed value matches the SHA256 value on the website, the file is intact
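
The comparison can also be scripted so that a mismatch stops the procedure instead of relying on reading two long strings by eye. This is a minimal sketch and not part of the openGauss tooling: verify_sha256 is a hypothetical helper name, and it assumes GNU sha256sum and awk are available.

```shell
# verify_sha256 FILE EXPECTED_HEX
# Hypothetical helper: returns 0 when the SHA-256 digest of FILE equals
# EXPECTED_HEX, and 1 otherwise (including when FILE cannot be read).
verify_sha256() {
    vs_actual=$(sha256sum "$1" 2>/dev/null | awk '{print $1}')
    if [ -n "$vs_actual" ] && [ "$vs_actual" = "$2" ]; then
        echo "OK: $1"
    else
        echo "MISMATCH: $1 (got '$vs_actual')" >&2
        return 1
    fi
}
```

For the package in this guide, the call would be verify_sha256 openGauss-5.0.0-CentOS-64bit-all.tar.gz followed by the checksum copied from the website.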

2.1.3 Extract the package

# Run as root [primary node]
[root@opengauss-db1 ~]# cd /opt/software/openGauss
[root@opengauss-db1 openGauss]# tar -zxvf openGauss-5.0.0-CentOS-64bit-all.tar.gz 
[root@opengauss-db1 openGauss]# tar -zxvf openGauss-5.0.0-CentOS-64bit-om.tar.gz
[root@opengauss-db1 openGauss]# ll
total 261040
drwxr-xr-x 14 root root       302 Mar 29 03:22 lib
-rw-r--r--  1 root root 133071038 Mar 29 20:11 openGauss-5.0.0-CentOS-64bit-all.tar.gz
-rw-r--r--  1 root root       105 Mar 29 03:23 openGauss-5.0.0-CentOS-64bit-cm.sha256
-rw-r--r--  1 root root  22356000 Mar 29 03:23 openGauss-5.0.0-CentOS-64bit-cm.tar.gz
-rw-r--r--  1 root root        65 Mar 29 03:22 openGauss-5.0.0-CentOS-64bit-om.sha256
-rw-r--r--  1 root root  11963876 Mar 29 03:22 openGauss-5.0.0-CentOS-64bit-om.tar.gz
-rw-r--r--  1 root root        65 Mar 29 03:23 openGauss-5.0.0-CentOS-64bit.sha256
-rw-r--r--  1 root root  99384569 Mar 29 03:23 openGauss-5.0.0-CentOS-64bit.tar.bz2
drwxr-xr-x 10 root root      4096 Mar 29 03:22 script
-rw-------  1 root root        65 Mar 29 03:21 upgrade_sql.sha256
-rw-------  1 root root    493211 Mar 29 03:21 upgrade_sql.tar.gz
-rw-r--r--  1 root root        32 Mar 29 03:22 version.cfg

2.2 Check OS health

# Run as root [any node]
-- Run gs_checkos -i A
[root@opengauss-dbxxx ~]# /opt/software/openGauss/script/gs_checkos -i A --detail
Checking items:
    A1. [ OS version status ]                                   : Normal     
        [opengauss-db1]
        centos_7.9.2009_64bit     
    A2. [ Kernel version status ]                               : Normal     
        The names about all kernel versions are same. The value is "3.10.0-1160.92.1.el7.x86_64".
    A3. [ Unicode status ]                                      : Normal     
        The values of all unicode are same. The value is "LANG=en_US.UTF-8".
    A4. [ Time zone status ]                                    : Normal     
        The informations about all timezones are same. The value is "+0800".
    A5. [ Swap memory status ]                                  : Normal     
        The value about swap memory is correct.            
    A6. [ System control parameters status ]                    : Normal     
        All values about system control  parameters are correct.
    A7. [ File system configuration status ]                    : Normal     
        Both soft nofile and hard nofile are correct.      
    A8. [ Disk configuration status ]                           : Normal     
        The value about XFS mount parameters is correct.   
    A9. [ Pre-read block size status ]                          : Normal     
        The value about Logical block size is correct.     
    A11.[ Network card configuration status ]                   : Normal     
        The configuration about network card is correct.   
    A12.[ Time consistency status ]                             : Normal     
        The ntpd service is started, local time is "2023-07-21 16:24:44".
    A13.[ Firewall service status ]                             : Normal     
        The firewall service is stopped.                   
    A14.[ THP service status ]                                  : Normal     
        The THP service is stopped.                        
Total numbers:13. Abnormal numbers:0. Warning numbers:0.
-- Fix any item whose status is not Normal
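
With thirteen items in the report, it can help to filter out everything that is already Normal. The sketch below parses saved gs_checkos output; non_normal_items is a hypothetical helper name, and the parsing assumes the item lines keep the "A<n>. [ name ] : status" layout shown above.

```shell
# non_normal_items: reads `gs_checkos -i A --detail` output on stdin and
# prints "<item> <status>" for every check item whose status is not Normal.
# Hypothetical helper; it relies only on the line layout shown in this guide.
non_normal_items() {
    awk '/^[ \t]*A[0-9]+\.[ \t]*\[/ {
        n = split($0, parts, ":")      # the status follows the last colon
        status = parts[n]
        gsub(/[ \t\r]/, "", status)    # strip the padding around it
        id = $1
        sub(/\..*/, "", id)            # "A11.[" -> "A11"
        if (status != "Normal") print id, status
    }'
}
```

A typical use is to pipe the command straight into it, for example gs_checkos -i A --detail | non_normal_items, and treat empty output as a pass.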

2.3 Check disk space

# Run as root [all nodes]
-- Use df -h and df -i to confirm that enough disk space and inodes are available
-- df -h shows disk space usage
-- df -i shows free inodes
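
The two df checks can be combined into one pass/fail test per filesystem. This is a sketch under stated assumptions: free_kb, free_inodes and has_room are hypothetical helper names, the thresholds are whatever you decide is enough for your data volume (this guide does not prescribe a number), and the column positions assume the POSIX output format selected by df -P.

```shell
# free_kb PATH      -> free space, in KiB, on the filesystem holding PATH
# free_inodes PATH  -> free inodes on that filesystem
# Both read column 4 of the POSIX-format (-P) df output.
free_kb()     { df -Pk "$1" | awk 'NR==2 {print $4}'; }
free_inodes() { df -Pi "$1" | awk 'NR==2 {print $4}'; }

# has_room PATH MIN_KB MIN_INODES
# Succeeds only if both the free space and the free inode count meet the
# given minimums. The minimums are the caller's own choice.
has_room() {
    [ "$(free_kb "$1")" -ge "$2" ] && [ "$(free_inodes "$1")" -ge "$3" ]
}
```

For example, has_room /opt/gaussdb 10485760 100000 would require about 10 GiB and 100000 free inodes on the filesystem that holds the database directories; both figures are illustrative only.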

2.4 Check version information

-- Run as omm [any node]
-- Query the version on all nodes
[root@opengauss-dbxxx ~]# su - omm
Last login: Fri Jul 21 16:07:06 CST 2023 on pts/1
[omm@opengauss-dbxxx ~]$ gs_ssh -c "gsql -V"
Successfully execute command on all nodes.

Output:
[SUCCESS] opengauss-db1:
gsql (openGauss 3.1.1 build 70980198) compiled at 2023-01-06 09:34:59 commit 0 last mr  
[SUCCESS] opengauss-db2:
gsql (openGauss 3.1.1 build 70980198) compiled at 2023-01-06 09:34:59 commit 0 last mr

2.5 Check cluster status

-- Run as omm [any node]
[omm@opengauss-dbxxx ~]$ gs_om -t status --detail
[  CMServer State   ]

node             node_ip         instance                               state
-------------------------------------------------------------------------------
1  opengauss-db1 10.110.3.155    1    /opt/gaussdb/install/cm/cm_server Primary
2  opengauss-db2 10.110.3.156    2    /opt/gaussdb/install/cm/cm_server Standby

[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
balanced        : Yes
current_az      : AZ_ALL

[  Datanode State   ]

node             node_ip         instance                          state            
------------------------------------------------------------------------------------
1  opengauss-db1 10.110.3.155    6001 /opt/gaussdb/install/data/dn P Primary Normal
2  opengauss-db2 10.110.3.156    6002 /opt/gaussdb/install/data/dn S Standby Normal

2.6 Back up the database

Take a physical backup of the database.

-- Run as omm [primary node]
[root@opengauss-db1 ~]# su - omm
Last login: Fri Jul 21 16:51:53 CST 2023 on pts/1

-- Create the backup directory
[omm@opengauss-db1 ~]$ BACKUP_DIR=/opt/gaussdb/backup/`date '+%Y%m%d_%H%M%S'`
[omm@opengauss-db1 ~]$ mkdir -p $BACKUP_DIR

-- Run the physical backup
[omm@opengauss-db1 backup]$ gs_basebackup -D $BACKUP_DIR -p 26000 -P -l $BACKUP_DIR
INFO:  The starting position of the xlog copy of the full build is: 0/400E8B0. The slot minimum LSN is: 0/400E8B0. The disaster slot minimum LSN is: 0/0. The logical slot minimum LSN is: 0/0.
[2023-07-21 17:11:55]:begin build tablespace list
[2023-07-21 17:11:55]:finish build tablespace list
[2023-07-21 17:11:55]:begin get xlog by xlogstream
 check identify system successpace[2023-07-21 17:11:55]:                                            
[2023-07-21 17:11:55]: send START_REPLICATION 0/4000000 success                                     
[2023-07-21 17:11:55]: keepalive message is received                                                
[2023-07-21 17:11:55]: keepalive message is received                                                
97981/97981 kB (100%), 1/1 tablespace
[2023-07-21 17:12:00]:gs_basebackup: base backup successfully

-- Inspect the backup
[omm@opengauss-db1 ~]$ ls -l /opt/gaussdb/backup/20230721_171855
total 5084
-rw------- 1 omm dbgrp     216 Jul 21 17:19 backup_label
-rw------- 1 omm dbgrp     198 Jul 21 17:19 backup_label.old
drwx------ 5 omm dbgrp    4096 Jul 21 17:19 base
-rw------- 1 omm dbgrp       0 Jul 21 17:19 build_completed.done
-rw------- 1 omm dbgrp    4399 Jul 21 17:19 cacert.pem
drwx------ 4 omm dbgrp    4096 Jul 21 17:19 dbe_perf_standby
-rw------- 1 omm dbgrp      56 Jul 21 17:19 full_backup_label
drwx------ 2 omm dbgrp    4096 Jul 21 17:19 global
-rw------- 1 omm dbgrp 4915200 Jul 21 17:19 gswlm_userinfo.cfg
-rw------- 1 omm dbgrp   21016 Jul 21 17:19 mot.conf
drwx------ 2 omm dbgrp    4096 Jul 21 17:19 pg_clog
drwx------ 2 omm dbgrp    4096 Jul 21 17:19 pg_csnlog
-rw------- 1 omm dbgrp       0 Jul 21 17:19 pg_ctl.lock
drwx------ 2 omm dbgrp    4096 Jul 21 17:19 pg_errorinfo
-rw------- 1 omm dbgrp    4676 Jul 21 17:19 pg_hba.conf
-rw------- 1 omm dbgrp    4676 Jul 21 17:19 pg_hba.conf.bak
-rw------- 1 omm dbgrp    1024 Jul 21 17:19 pg_hba.conf.lock
-rw------- 1 omm dbgrp    1636 Jul 21 17:19 pg_ident.conf
drwx------ 4 omm dbgrp    4096 Jul 21 17:19 pg_llog
drwx------ 2 omm dbgrp    4096 Jul 21 17:19 pg_logical
drwx------ 4 omm dbgrp    4096 Jul 21 17:19 pg_multixact
drwx------ 2 omm dbgrp    4096 Jul 21 17:19 pg_notify
drwx------ 2 omm dbgrp    4096 Jul 21 17:19 pg_replslot
drwx------ 2 omm dbgrp    4096 Jul 21 17:19 pg_serial
drwx------ 2 omm dbgrp    4096 Jul 21 17:19 pg_snapshots
drwx------ 2 omm dbgrp    4096 Jul 21 17:19 pg_stat_tmp
drwx------ 2 omm dbgrp    4096 Jul 21 17:19 pg_tblspc
drwx------ 2 omm dbgrp    4096 Jul 21 17:19 pg_twophase
-rw------- 1 omm dbgrp       4 Jul 21 17:19 PG_VERSION
drwx------ 3 omm dbgrp    4096 Jul 21 17:19 pg_xlog
-rw------- 1 omm dbgrp   35919 Jul 21 17:19 postgresql.conf
-rw------- 1 omm dbgrp   35919 Jul 21 17:19 postgresql.conf.guc.bak
-rw------- 1 omm dbgrp    1024 Jul 21 17:19 postgresql.conf.lock
-rw------- 1 omm dbgrp   35919 Jul 21 17:19 postgresql.conf.wal.bak
-rw------- 1 omm dbgrp       0 Jul 21 17:19 postmaster.pid.lock
-rw------- 1 omm dbgrp      10 Jul 21 17:19 rewind_lable
-rw------- 1 omm dbgrp    4402 Jul 21 17:19 server.crt
-rw------- 1 omm dbgrp    1766 Jul 21 17:19 server.key
-rw------- 1 omm dbgrp      56 Jul 21 17:19 server.key.cipher
-rw------- 1 omm dbgrp      24 Jul 21 17:19 server.key.rand
-rw------- 1 omm dbgrp       4 Jul 21 17:19 term_file
drwx------ 5 omm dbgrp    4096 Jul 21 17:19 undo


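2.7 Stop the cluster

A gray upgrade does not require this step. The cluster is stopped here only so that its directories can be backed up while it is offline, which makes rollback easier if the upgrade fails.

-- Stop the cluster; run as omm [primary node]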
[omm@opengauss-db1 ~]$ gs_om -t stop
Stopping cluster.
=========================================
Successfully stopped cluster.
=========================================
End stop cluster.
[omm@opengauss-db1 ~]$ gs_om -t status --detail --all
[  CMServer State   ]

node             node_ip         instance                               state
-------------------------------------------------------------------------------
1  opengauss-db1 10.110.3.155    1    /opt/gaussdb/install/cm/cm_server Down
2  opengauss-db2 10.110.3.156    2    /opt/gaussdb/install/cm/cm_server Down

cm_ctl: can't connect to cm_server.
Maybe cm_server is not running, or timeout expired. Please try again.

2.8 Back up directories and files

-- Run as root [all nodes]
-- Before upgrading, back up the directories and files listed in cluster_config.xml in case the upgrade fails
-- The directories in this test environment are listed below; adapt them to your production environment
<PARAM name="gaussdbAppPath" value="/opt/gaussdb/install/app" />
<PARAM name="gaussdbLogPath" value="/var/log/omm" />
<PARAM name="tmpMppdbPath" value="/opt/gaussdb/tmp" />
<PARAM name="gaussdbToolPath" value="/opt/gaussdb/install/om" />
<PARAM name="corePath" value="/opt/gaussdb/corefile" />
<PARAM name="dataNode1" value="/opt/gaussdb/install/data/dn,opengauss-db2,/opt/gaussdb/install/data/dn"/>

-- Back up the directories (the archive is gzip-compressed, hence the .tar.gz name)
[root@opengauss-dbxxx ~]# cd /opt
[root@opengauss-dbxxx opt]# tar -czf gaussdb_3.1.1.tar.gz ./gaussdb/
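
A backup is only useful if the archive can be read back, so it is worth listing it once right after creating it. The following sketch wraps both steps; archive_dir is a hypothetical helper name and the paths in the usage note are the ones from this test environment.

```shell
# archive_dir PARENT NAME ARCHIVE
# Hypothetical helper: packs the directory PARENT/NAME into the
# gzip-compressed ARCHIVE, then lists the archive to confirm that tar can
# read it back. Returns non-zero if either step fails.
archive_dir() {
    tar -czf "$3" -C "$1" "$2" && tar -tzf "$3" > /dev/null
}
```

In this environment the call would be archive_dir /opt gaussdb /opt/gaussdb_3.1.1.tar.gz, run as root on every node while the cluster is stopped.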

2.9 Start the cluster

-- Start the cluster; run as omm [primary node]
[omm@opengauss-db1 ~]$ gs_om -t start
Starting cluster.
======================================================================
Successfully started primary instance. Wait for standby instance.
======================================================================
.
Successfully started cluster.
======================================================================
cluster_state      : Normal
redistributing     : No
node_count         : 2
Datanode State
    primary           : 1
    standby           : 1
    secondary         : 0
    cascade_standby   : 0
    building          : 0
    abnormal          : 0
    down              : 0

Successfully started cluster.
[omm@opengauss-db1 ~]$ gs_om -t status --detail --all
[  CMServer State   ]

node             node_ip         instance                               state
-------------------------------------------------------------------------------
1  opengauss-db1 10.110.3.155    1    /opt/gaussdb/install/cm/cm_server Primary
2  opengauss-db2 10.110.3.156    2    /opt/gaussdb/install/cm/cm_server Standby

[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
balanced        : Yes
current_az      : AZ_ALL

[  Datanode State   ]

node             node_ip         instance                          state            
------------------------------------------------------------------------------------
1  opengauss-db1 10.110.3.155    6001 /opt/gaussdb/install/data/dn P Primary Normal
2  opengauss-db2 10.110.3.156    6002 /opt/gaussdb/install/data/dn S Standby Normal

3. Perform the Upgrade

This upgrade uses the gray upgrade mode.

3.1 Pre-upgrade check

# Run as root [primary node]
[root@opengauss-db1 ~]# python3 /opt/software/openGauss/script/gs_preinstall -U omm -G dbgrp -X /opt/software/openGauss/cluster_config.xml
Parsing the configuration file.
Successfully parsed the configuration file.
Installing the tools on the local node.
Successfully installed the tools on the local node.
Are you sure you want to create trust for root (yes/no)?yes  -- enter yes
Please enter password for root
Password: 
Successfully created SSH trust for the root permission user.
Setting host ip env
Successfully set host ip env.
Distributing package.
Begin to distribute package to tool path.
Successfully distribute package to tool path.
Begin to distribute package to package path.
Successfully distribute package to package path.
Successfully distributed package.
Are you sure you want to create the user[omm] and create trust for it (yes/no)? no  -- enter no
Preparing SSH service.
Successfully prepared SSH service.
Installing the tools in the cluster.
Successfully installed the tools in the cluster.
Checking hostname mapping.
Successfully checked hostname mapping.
Checking OS software.
Successfully check os software.
Checking OS version.
Successfully checked OS version.
Creating cluster's path.
Successfully created cluster's path.
Set and check OS parameter.
Setting OS parameters.
Successfully set OS parameters.
Warning: Installation environment contains some warning messages.
Please get more details by "/opt/software/openGauss/script/gs_checkos -i A -h opengauss-db1,opengauss-db2 --detail".
Set and check OS parameter completed.
Preparing CRON service.
Successfully prepared CRON service.
Setting user environmental variables.
Successfully set user environmental variables.
Setting the dynamic link library.
Successfully set the dynamic link library.
Setting Core file
Successfully set core path.
Setting pssh path
Successfully set pssh path.
Setting Cgroup.
Successfully set Cgroup.
Set ARM Optimization.
No need to set ARM Optimization.
Fixing server package owner.
Setting finish flag.
Successfully set finish flag.
Preinstallation succeeded.

-- Run /opt/software/openGauss/script/gs_checkos -i A -h opengauss-db1,opengauss-db2 --detail to see the detailed check results, and resolve any warnings it reports

3.2 Run the upgrade

# Run as root [primary node]
[root@opengauss-db1 ~]# chmod -R 755 /opt/software/openGauss/script/
[root@opengauss-db1 ~]# chown -R omm:dbgrp /opt/software/openGauss/script/

-- Gray upgrade; run as omm [primary node]
[omm@opengauss-db1 ~]$ /opt/software/openGauss/script/gs_upgradectl -t auto-upgrade --grey -X /opt/software/openGauss/cluster_config.xml
Static configuration matched with old static configuration files.
Wait for the cluster status normal or degrade.
Start check CMS parameter.
Old cluster version number less than 92574.
Successfully set upgrade_mode to 0.
Checking upgrade environment.
Successfully checked upgrade environment.
Start to do health check.
Successfully checked cluster status.
Upgrade all nodes.
NOTICE: The directory /opt/gaussdb/install/app_70980198 will be deleted after commit-upgrade, please make sure there is no personal data.
Performing grey rollback.
No need to rollback.
The directory /opt/gaussdb/install/app_70980198 will be deleted after commit-upgrade, please make sure there is no personal data.
Installing new binary.
Wait for the cluster status normal or degrade.
copy certs from /opt/gaussdb/install/app_70980198 to /opt/gaussdb/install/app_a07d57c3.
Successfully copy certs from /opt/gaussdb/install/app_70980198 to /opt/gaussdb/install/app_a07d57c3.
Successfully backup hotpatch config file.
Sync cluster configuration.
Successfully synced cluster configuration.
Switch symbolic link to new binary directory.
Successfully switch symbolic link to new binary directory.
Start check CMS parameter.
Old cluster version number less than 92574.
Switching all db processes.
Check cluster state.
Cluster state: [   Cluster State   ]

cluster_state   : Normal
redistributing  : No
current_az      : AZ_ALL

[  Datanode State   ]

    node         node_ip         port      instance     state
-----------------------------------------------------------------------------
1  opengauss-db1 10.110.3.155    26000      6001       P Primary Normal
2  opengauss-db2 10.110.3.156    26000      6002       S Standby Normal
Wait for the cluster status normal or degrade.
Wait for the cluster status normal or degrade.
Create checkpoint before switching.
Start to wait for om_monitor.
Switching DN processes.
Switch DN processes for rolling upgrade.
Ready to grey start cluster.
Grey start cluster successfully.
Wait for the cluster status normal or degrade.
Successfully switch all process version
The nodes ['opengauss-db1', 'opengauss-db2'] have been successfully upgraded to new version. Then do health check.
Start to do health check.
Successfully checked cluster status.
Waiting for the cluster status to become normal.
.
The cluster status is normal.
Upgrade main process has been finished, user can do some check now.
Once the check done, please execute following command to commit upgrade:

    gs_upgradectl -t commit-upgrade -X /opt/software/openGauss/cluster_config.xml   

Successfully upgrade all nodes.

-- Commit the upgrade
[omm@opengauss-db1 ~]$ gs_upgradectl -t commit-upgrade -X /opt/software/openGauss/cluster_config.xml 
Wait for the cluster status normal or degrade.
Start check CMS parameter.
Old cluster version number less than 92574.
Start to do health check.
Successfully checked cluster status.
Wait for the cluster status normal or degrade.
Wait for the cluster status normal or degrade.
Start check CMS parameter.
Old cluster version number less than 92574.
Successfully cleaned old install path.
Commit upgrade succeeded.
Start check CMS parameter.
Old cluster version number less than 92574.

3.3 Verification

3.3.1 Check version information

# Run as omm [any node]
-- Check the version
-- The OM version is now 5.0.0
[omm@opengauss-db1 ~]$ gs_om -V
gs_om (openGauss OM 5.0.0 build 244a7e05) compiled at 2023-03-29 03:22:22 commit 0 last mr

-- Check the database version on both nodes; both have been upgraded to 5.0.0
[omm@opengauss-db1 ~]$ gs_ssh -c "gsql -V"
Successfully execute command on all nodes.

Output:
[SUCCESS] opengauss-db1:
gsql (openGauss 5.0.0 build a07d57c3) compiled at 2023-03-29 03:07:56 commit 0 last mr  
[SUCCESS] opengauss-db2:
gsql (openGauss 5.0.0 build a07d57c3) compiled at 2023-03-29 03:07:56 commit 0 last mr
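
On a cluster with more nodes, reading every gsql line by eye gets tedious. The sketch below checks the saved gs_ssh output in one step; all_nodes_at is a hypothetical helper name, and it assumes each node reports a line of the form "gsql (openGauss <version> build ...)" as in the output above.

```shell
# all_nodes_at VERSION
# Hypothetical helper: reads the output of `gs_ssh -c "gsql -V"` on stdin
# and succeeds only if at least one gsql version line is present and every
# such line reports VERSION.
all_nodes_at() {
    awk -v want="$1" '
        /^gsql \(openGauss / { total++; if ($3 == want) ok++ }
        END { exit !(total > 0 && ok == total) }
    '
}
```

For example, gs_ssh -c "gsql -V" | all_nodes_at 5.0.0 && echo "all nodes upgraded" prints the message only when no node is still on the old version.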

3.3.2 Check cluster status

# Run as omm [any node]
-- Cluster status
[omm@opengauss-db1 ~]$ gs_om -t status --detail --all
[  CMServer State   ]

node             node_ip         instance                               state
-------------------------------------------------------------------------------
1  opengauss-db1 10.110.3.155    1    /opt/gaussdb/install/cm/cm_server Primary
2  opengauss-db2 10.110.3.156    2    /opt/gaussdb/install/cm/cm_server Standby

[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
balanced        : No
current_az      : AZ_ALL

[  Datanode State   ]

node             node_ip         instance                          state            
------------------------------------------------------------------------------------
1  opengauss-db1 10.110.3.155    6001 /opt/gaussdb/install/data/dn P Standby Normal
2  opengauss-db2 10.110.3.156    6002 /opt/gaussdb/install/data/dn S Primary Normal

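-- The output shows that a primary/standby switchover took place after the upgrade: opengauss-db2 is now Primary and opengauss-db1 is Standby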
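
Because the roles can move during an upgrade, scripts that need the primary should read it from the status output rather than assume it is still opengauss-db1. This is a minimal sketch: current_primary is a hypothetical helper name, and it assumes the Datanode State lines keep the eight-column layout shown above, where the seventh column is the current role. The single letter before the role (P or S) appears to stay with the originally configured role, which is why it no longer matches after the switchover.

```shell
# current_primary: reads `gs_om -t status --detail` output on stdin and
# prints the hostname of the datanode whose current role is Primary.
# CMServer lines have fewer columns, so their "Primary" state is ignored.
current_primary() {
    awk 'NF >= 8 && $7 == "Primary" { print $2 }'
}
```

A call such as gs_om -t status --detail | current_primary prints opengauss-db2 for the state shown above. CM's cm_ctl tool has a switchover subcommand that can move the primary role; check the cm_ctl documentation for your version before using it.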

3.3.3 Check the database

# Run as omm [any node]
[omm@opengauss-db1 ~]$ gs_om -t status --detail --all
[  CMServer State   ]

node             node_ip         instance                               state
-------------------------------------------------------------------------------
1  opengauss-db1 10.110.3.155    1    /opt/gaussdb/install/cm/cm_server Primary
2  opengauss-db2 10.110.3.156    2    /opt/gaussdb/install/cm/cm_server Standby

[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
balanced        : No
current_az      : AZ_ALL

[  Datanode State   ]

node             node_ip         instance                          state            
------------------------------------------------------------------------------------
1  opengauss-db1 10.110.3.155    6001 /opt/gaussdb/install/data/dn P Standby Normal
2  opengauss-db2 10.110.3.156    6002 /opt/gaussdb/install/data/dn S Primary Normal
[omm@opengauss-db1 ~]$ 
[omm@opengauss-db1 ~]$ 
[omm@opengauss-db1 ~]$ gsql -d postgres -p 26000
gsql ((openGauss 5.0.0 build a07d57c3) compiled at 2023-03-29 03:07:56 commit 0 last mr  )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.

openGauss=# CREATE DATABASE gaussdb WITH ENCODING 'UTF8' template = template0;
ERROR:  cannot execute CREATE DATABASE in a read-only transaction
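-- Because of the switchover, opengauss-db1 is now a standby, and a database cannot be created while connected to a standby node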

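4. Appendix

4.1 Change the owner and group of version.cfg

Before upgrading, change the owner and group of /opt/software/openGauss/version.cfg on both the primary and standby nodes. Otherwise the upgrade fails with an error.

-- If the owner and group of version.cfg are not changed on both nodes, the upgrade reports the following error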
[omm@opengauss-db1 ~]$ /opt/software/openGauss/script/gs_upgradectl -t auto-upgrade --grey -X /opt/software/openGauss/cluster_config.xml
[Errno 13] Permission denied: '/opt/software/openGauss/version.cfg'
[Errno 13] Permission denied: '/opt/software/openGauss/version.cfg'
Start check CMS parameter.
float() argument must be a string or a number, not 'NoneType'
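
A quick ownership check on each node before starting the upgrade avoids this failure. The sketch below assumes the file should belong to omm:dbgrp, which is an inference from the chown applied to the script directory in section 3.2 and from the user and group in section 1.4; owner_is is a hypothetical helper name and it relies on GNU stat.

```shell
# owner_is FILE USER GROUP
# Hypothetical helper: succeeds if FILE is owned by USER and GROUP.
# Uses the GNU stat format string %U:%G (owner name and group name).
owner_is() {
    [ "$(stat -c '%U:%G' "$1")" = "$2:$3" ]
}

# Assumed fix, run as root on BOTH nodes when the check fails:
#   chown omm:dbgrp /opt/software/openGauss/version.cfg
```

For example, owner_is /opt/software/openGauss/version.cfg omm dbgrp || echo "fix ownership first" can be run on each node before gs_upgradectl is started.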

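4.2 Changing the NIC MTU can break SSH between the primary and standby nodes

If the NIC MTU on the primary and standby nodes is changed during the pre-upgrade check, gs_upgradectl hangs and the upgrade eventually fails. In that state the two nodes can still ping each other but cannot connect to each other over SSH.

The fix is to set the MTU back to the default of 1500 and restart the SSH service.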
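
To confirm what the MTU currently is before and after the fix, the value can be read from the ip tool's one-line output. This is a sketch under stated assumptions: mtu_of is a hypothetical helper name, eth0 is a placeholder interface name that will differ on your hosts, and the parsing assumes the "<index>: <name>: <flags> mtu <value> ..." layout printed by ip -o link show.

```shell
# mtu_of IFACE: reads `ip -o link show` output on stdin and prints the MTU
# of the interface named IFACE.
mtu_of() {
    awk -v ifc="$1:" '$2 == ifc {
        for (i = 3; i < NF; i++)
            if ($i == "mtu") { print $(i + 1); exit }
    }'
}

# Restoring the default as described above, run as root on each node.
# "eth0" is a placeholder for the real interface name:
#   ip link set dev eth0 mtu 1500
#   systemctl restart sshd
```

For example, ip -o link show | mtu_of eth0 should print 1500 on both nodes once the default has been restored. The ip link command changes only the running configuration, so a value written to the interface's configuration file also needs to be reverted to survive a reboot.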

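-- The pre-upgrade check recommended raising the MTU on both nodes from 1500 to 8192. After the NIC MTU was changed, the gs_upgradectl upgrade hung and finally failed. The log contains the following related entries, written by gs_preinstall and gs_sshexkey: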
[2023-07-21 22:45:39.414838][20984][gs_sshexkey][DEBUG]:Successfully to add id_rsa in ssh-agent
[2023-07-21 22:45:39.415698][20984][gs_sshexkey][DEBUG]:Ssh agent register successfully.
[2023-07-21 22:45:39.416461][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step5]:Successfully created the local key files.
[2023-07-21 22:45:39.417283][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step6]:Appending local ID to authorized_keys.
[2023-07-21 22:45:39.418192][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step6]:Successfully appended local ID to authorized_keys.
[2023-07-21 22:45:39.429370][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step7]:Updating the known_hosts file.
[2023-07-21 22:45:40.311033][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step7]:Successfully updated the known_hosts file.
[2023-07-21 22:45:40.311665][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step8]:Appending authorized_key on the remote node.
[2023-07-21 22:45:40.679766][20984][gs_sshexkey][DEBUG]:Send to 10.110.3.156
Successfully appended authorized_key on remote node 10.110.3.156.
[2023-07-21 22:45:40.864480][20984][gs_sshexkey][DEBUG]:Send to 10.110.3.155
Successfully appended authorized_key on remote node 10.110.3.155.
[2023-07-21 22:45:40.921407][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step8]:Successfully appended authorized_key on all remote node.
[2023-07-21 22:45:40.921956][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step9]:Checking common authentication file content.
[2023-07-21 22:45:40.927562][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step9]:Successfully checked common authentication content.
[2023-07-21 22:45:40.928391][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step10]:Distributing SSH trust file to all node.
[2023-07-21 22:47:41.046988][20984][gs_sshexkey][DEBUG]:send_trust_file failed, coutdown 3, retry again.
[2023-07-21 22:47:41.047776][20984][gs_sshexkey][DEBUG]:errorinfo: hostip: 10.110.3.156, status: 1, output: lost connection, 
[2023-07-21 22:47:41.089878][20984][gs_sshexkey][DEBUG]:check os info: drwx------   2 root root  4096 Jul 21 22:45 .ssh
-rwxr-xr-x   1 root root   885 Dec 12  2022 ssh_key.sh
-rw-r--r--   1 root root   521 Jul 21 11:36 sshtrust.sh
 total 32
drwx------   2 root root 4096 Jul 21 22:45 .
dr-xr-x---. 11 root root 4096 Jul 21 22:45 ..
-rw-------   1 root root  504 Jul 21 22:45 authorized_keys
-rw-------   1 root root  464 Jul 21 22:45 id_om
-rw-------   1 root root  100 Jul 21 22:45 id_om.pub
-rw-------   1 root root 1679 Jul 21 11:35 id_rsa
-rw-------   1 root root  400 Jul 21 11:35 id_rsa.pub
-rw-------   1 root root 1012 Jul 21 22:45 known_hosts
[2023-07-21 22:49:51.205162][20984][gs_sshexkey][DEBUG]:send_trust_file failed, coutdown 2, retry again.
[2023-07-21 22:49:51.206276][20984][gs_sshexkey][DEBUG]:errorinfo: hostip: 10.110.3.156, status: 1, output: lost connection, 
[2023-07-21 22:49:51.240173][20984][gs_sshexkey][DEBUG]:check os info: drwx------   2 root root  4096 Jul 21 22:45 .ssh
-rwxr-xr-x   1 root root   885 Dec 12  2022 ssh_key.sh
-rw-r--r--   1 root root   521 Jul 21 11:36 sshtrust.sh
 total 32
drwx------   2 root root 4096 Jul 21 22:45 .
dr-xr-x---. 11 root root 4096 Jul 21 22:45 ..
-rw-------   1 root root  504 Jul 21 22:45 authorized_keys
-rw-------   1 root root  464 Jul 21 22:45 id_om
-rw-------   1 root root  100 Jul 21 22:45 id_om.pub
-rw-------   1 root root 1679 Jul 21 11:35 id_rsa
-rw-------   1 root root  400 Jul 21 11:35 id_rsa.pub
-rw-------   1 root root 1012 Jul 21 22:45 known_hosts
[2023-07-21 22:52:01.367717][20984][gs_sshexkey][DEBUG]:send_trust_file failed, coutdown 1, retry again.
[2023-07-21 22:52:01.368465][20984][gs_sshexkey][DEBUG]:errorinfo: hostip: 10.110.3.156, status: 1, output: lost connection, 
[2023-07-21 22:52:01.425251][20984][gs_sshexkey][DEBUG]:check os info: drwx------   2 root root  4096 Jul 21 22:45 .ssh
-rwxr-xr-x   1 root root   885 Dec 12  2022 ssh_key.sh
-rw-r--r--   1 root root   521 Jul 21 11:36 sshtrust.sh
 total 32
drwx------   2 root root 4096 Jul 21 22:45 .
dr-xr-x---. 11 root root 4096 Jul 21 22:45 ..
-rw-------   1 root root  504 Jul 21 22:45 authorized_keys
-rw-------   1 root root  464 Jul 21 22:45 id_om
-rw-------   1 root root  100 Jul 21 22:45 id_om.pub
-rw-------   1 root root 1679 Jul 21 11:35 id_rsa
-rw-------   1 root root  400 Jul 21 11:35 id_rsa.pub
-rw-------   1 root root 1012 Jul 21 22:45 known_hosts
[2023-07-21 22:54:11.538969][20984][gs_sshexkey][ERROR]:[GAUSS-50223] : Failed to update the authentication files.cmd is source /root/.bashrc;scp -q -o "BatchMode yes" -o "NumberOfPasswordPrompts 0" /root/.ssh/id_om /root/.ssh/id_om.pub 10.110.3.156:.ssh/ && temp_auth=$(grep '#OM' /root/.ssh/authorized_keys) && ssh 10.110.3.156 "sed -i '/#OM/d' /root/.ssh/authorized_keys; echo *** >> /root/.ssh/authorized_keys" && temp_auth=$(grep '#OM' /root/.ssh/known_hosts) && ssh 10.110.3.156 "sed -i '/#OM/d' /root/.ssh/known_hosts; echo *** >> /root/.ssh/known_hosts"; Node:10.110.3.156. Error:
 1, lost connection
[2023-07-21 22:54:12.110072][20463][gs_preinstall][DEBUG]:The $GAUSSHOME/bin is exist.
[2023-07-21 22:54:12.111040][20463][gs_preinstall][DEBUG]:The $GAUSS_ENV is 2.
[2023-07-21 22:54:12.111678][20463][gs_preinstall][DEBUG]:There is the upgrade is in progress.
[2023-07-21 22:54:12.112467][20463][gs_preinstall][DEBUG]:In upgrade process, no need to delete /opt/gaussdb/install/om.
[2023-07-21 22:54:12.113237][20463][gs_preinstall][ERROR]:[GAUSS-51632] : Failed to do gs_sshexkey.Error: Please enter password for current user[root].
Checking network information.
All nodes in the network are Normal.
Successfully checked network information.
Creating SSH trust.
Creating the local key file.
Successfully created the local key files.
Appending local ID to authorized_keys.
Successfully appended local ID to authorized_keys.
Updating the known_hosts file.
Successfully updated the known_hosts file.
Appending authorized_key on the remote node.
Successfully appended authorized_key on all remote node.
Checking common authentication file content.
Successfully checked common authentication content.
Distributing SSH trust file to all node.
[GAUSS-50223] : Failed to update the authentication files.cmd is source /root/.bashrc;scp -q -o "BatchMode yes" -o "NumberOfPasswordPrompts 0" /root/.ssh/id_om /root/.ssh/id_om.pub 10.110.3.156:.ssh/ && temp_auth=$(grep '#OM' /root/.ssh/authorized_keys) && ssh 10.110.3.156 "sed -i '/#OM/d' /root/.ssh/authorized_keys; echo *** >> /root/.ssh/authorized_keys" && temp_auth=$(grep '#OM' /root/.ssh/known_hosts) && ssh 10.110.3.156 "sed -i '/#OM/d' /root/.ssh/known_hosts; echo *** >> /root/.ssh/known_hosts"; Node:10.110.3.156. Error:
 1, lost connection
 
-- At this point the SSH service status on the primary and standby nodes also looks abnormal
 [root@opengauss-db2 ~]# systemctl status sshd.service
● sshd.service - OpenSSH server daemon
   Loaded: loaded (/usr/lib/systemd/system/sshd.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2023-07-21 11:03:03 CST; 12h ago
     Docs: man:sshd(8)
           man:sshd_config(5)
 Main PID: 2160 (sshd)
    Tasks: 1
   Memory: 4.2M
   CGroup: /system.slice/sshd.service
           └─2160 /usr/sbin/sshd -D

Jul 21 17:44:20 opengauss-db2 sshd[6374]: Accepted publickey for root from 10.110.3.155 port 63717 ssh2: ED25519 SHA256:hUo4iBgUOVXW5ONlVeD2QMdS+4snKsRs0K1K3jBLO8E
Jul 21 17:44:22 opengauss-db2 sshd[6417]: Accepted publickey for root from 10.110.3.155 port 63721 ssh2: ED25519 SHA256:hUo4iBgUOVXW5ONlVeD2QMdS+4snKsRs0K1K3jBLO8E
Jul 21 17:44:24 opengauss-db2 sshd[6463]: Accepted publickey for root from 10.110.3.155 port 63723 ssh2: ED25519 SHA256:hUo4iBgUOVXW5ONlVeD2QMdS+4snKsRs0K1K3jBLO8E
Jul 21 22:45:32 opengauss-db2 sshd[4829]: Accepted password for root from 10.110.3.155 port 30166 ssh2
Jul 21 22:45:37 opengauss-db2 sshd[4883]: Accepted password for root from 10.110.3.155 port 30172 ssh2
Jul 21 22:45:39 opengauss-db2 sshd[4922]: Connection closed by 10.110.3.155 port 30178 [preauth]
Jul 21 22:45:39 opengauss-db2 sshd[4928]: Connection closed by 10.110.3.155 port 30182 [preauth]
Jul 21 22:45:40 opengauss-db2 sshd[4930]: Accepted password for root from 10.110.3.155 port 30188 ssh2
Jul 21 23:06:46 opengauss-db2 sshd[13949]: Connection closed by 10.110.3.156 port 50810 [preauth]
Jul 21 23:27:22 opengauss-db2 sshd[22723]: Connection closed by 10.110.3.155 port 31050 [preauth]

-- Readjust the MTU, then restart the SSH service on both the primary and standby nodes
[root@opengauss-db2 ~]# systemctl restart sshd.service
[root@opengauss-db2 ~]# systemctl status sshd.service 
● sshd.service - OpenSSH server daemon
   Loaded: loaded (/usr/lib/systemd/system/sshd.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2023-07-21 23:33:29 CST; 1s ago
     Docs: man:sshd(8)
           man:sshd_config(5)
 Main PID: 25303 (sshd)
    Tasks: 1
   Memory: 1.3M
   CGroup: /system.slice/sshd.service
           └─25303 /usr/sbin/sshd -D

Jul 21 23:33:28 opengauss-db2 systemd[1]: Starting OpenSSH server daemon...
Jul 21 23:33:29 opengauss-db2 sshd[25303]: Server listening on 0.0.0.0 port 60002.
Jul 21 23:33:29 opengauss-db2 sshd[25303]: Server listening on :: port 60002.
Jul 21 23:33:29 opengauss-db2 systemd[1]: Started OpenSSH server daemon.
Jul 21 23:33:29 opengauss-db2 sshd[25303]: Server listening on 0.0.0.0 port 22.
Jul 21 23:33:29 opengauss-db2 sshd[25303]: Server listening on :: port 22.
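
The MTU adjustment itself is not shown in the output above. The following is a minimal sketch of how the path MTU between the two nodes could be probed before and after the change. It assumes a Linux host with iputils `ping` and iproute2; the interface name `eth0` and the MTU value 1500 are hypothetical and must be replaced with the values of your own network.

```shell
#!/usr/bin/env bash

# Largest ICMP payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes (IPv4 header) minus 8 bytes (ICMP header).
icmp_payload() {
    local mtu="$1"
    echo $(( mtu - 28 ))
}

# Send three pings with the Don't Fragment bit set (-M do).
# If the path cannot carry packets of this MTU, ping reports an error
# instead of silently fragmenting.
probe_mtu() {
    local peer="$1" mtu="$2"
    ping -M do -c 3 -s "$(icmp_payload "$mtu")" "$peer"
}

# Example usage (run manually):
#   probe_mtu 10.110.3.156 1500        # standby node IP from this article
#   ip link show dev eth0              # eth0 is a hypothetical interface name
#   ip link set dev eth0 mtu 1500      # as root; 1500 is an assumed value
```

If the probe fails at the configured MTU but succeeds at a smaller value, an MTU mismatch is a plausible cause of `scp` reporting "lost connection" while short SSH commands still work.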

4.3 A python3 fault prevents the cluster status from being displayed

-- If the installed python3 is faulty, gs_om cannot display the cluster status
[omm@opengauss-db1 ~]$ gs_om -t status --detail --all
-bash: /opt/gaussdb/install/om/script/gs_om: Permission denied
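
As a starting point for diagnosing this error, the sketch below checks the two things a shell needs in order to launch a script: the execute bit on the script and an executable interpreter on its shebang line. The function name `check_script` is hypothetical, and the sketch does not claim to identify the exact python3 fault described here.

```shell
#!/usr/bin/env bash

# Report why launching a script might fail with "Permission denied".
# Prints one of: missing, not-executable, bad-interpreter: <path>, ok
check_script() {
    local script="$1" interp
    if [ ! -e "$script" ]; then
        echo "missing"; return 1
    fi
    if [ ! -x "$script" ]; then
        echo "not-executable"; return 1
    fi
    # Extract the interpreter path from the shebang line, if there is one.
    interp="$(head -n 1 "$script" | sed -n 's/^#![[:space:]]*//p' | awk '{print $1}')"
    if [ -n "$interp" ] && [ ! -x "$interp" ]; then
        echo "bad-interpreter: $interp"; return 1
    fi
    echo "ok"
}

# Example usage (run manually as the omm user):
#   check_script /opt/gaussdb/install/om/script/gs_om
#   python3 --version
```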

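4.4 A primary/standby switchover occurs after the cluster upgrade

After the upgrade the primary and standby roles are swapped. A session connected to the original primary database can no longer write.

-- Cluster status before the upgrade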
[omm@opengauss-db1 dn]$ gs_om -t status --detail --all
[  CMServer State   ]

node             node_ip         instance                               state
-------------------------------------------------------------------------------
1  opengauss-db1 10.110.3.155    1    /opt/gaussdb/install/cm/cm_server Primary
2  opengauss-db2 10.110.3.156    2    /opt/gaussdb/install/cm/cm_server Standby

[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
balanced        : No
current_az      : AZ_ALL

[  Datanode State   ]

node             node_ip         instance                          state            
------------------------------------------------------------------------------------
1  opengauss-db1 10.110.3.155    6001 /opt/gaussdb/install/data/dn P Standby Normal
2  opengauss-db2 10.110.3.156    6002 /opt/gaussdb/install/data/dn S Primary Normal

-- Cluster status after the upgrade
[omm@opengauss-db1 ~]$ gs_om -t status --detail --all
[  CMServer State   ]

node             node_ip         instance                               state
-------------------------------------------------------------------------------
1  opengauss-db1 10.110.3.155    1    /opt/gaussdb/install/cm/cm_server Primary
2  opengauss-db2 10.110.3.156    2    /opt/gaussdb/install/cm/cm_server Standby

[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
balanced        : No
current_az      : AZ_ALL

[  Datanode State   ]

node             node_ip         instance                          state            
------------------------------------------------------------------------------------
1  opengauss-db1 10.110.3.155    6001 /opt/gaussdb/install/data/dn P Standby Normal
2  opengauss-db2 10.110.3.156    6002 /opt/gaussdb/install/data/dn S Primary Normal

-- Connecting to the original primary: creating a database fails
[omm@opengauss-db1 ~]$ gsql -d postgres -p 26000
gsql ((openGauss 5.0.0 build a07d57c3) compiled at 2023-03-29 03:07:56 commit 0 last mr  )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.

openGauss=# CREATE DATABASE gaussdb WITH ENCODING 'UTF8' template = template0;
ERROR:  cannot execute CREATE DATABASE in a read-only transaction

-- Connecting to the new primary node: the database is created normally
[omm@opengauss-db2 ~]$ gsql -d postgres -p 26000
gsql ((openGauss 5.0.0 build a07d57c3) compiled at 2023-03-29 03:07:56 commit 0 last mr  )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.

openGauss=# CREATE DATABASE gaussdb WITH ENCODING 'UTF8' template = template0;
CREATE DATABASE
openGauss=# \l
                          List of databases
   Name    | Owner | Encoding  | Collate | Ctype | Access privileges 
-----------+-------+-----------+---------+-------+-------------------
 gaussdb   | omm   | UTF8      | C       | C     | 
 postgres  | omm   | SQL_ASCII | C       | C     | 
 template0 | omm   | SQL_ASCII | C       | C     | =c/omm           +
           |       |           |         |       | omm=CTc/omm
 template1 | omm   | SQL_ASCII | C       | C     | =c/omm           +
           |       |           |         |       | omm=CTc/omm
(4 rows)
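
Because the primary role can move during an upgrade, it helps to confirm which node is the primary before running write statements. The sketch below parses the `[ Datanode State ]` section of the status output shown above and prints the node reported as Primary. The function name `primary_datanode` is hypothetical, and the parsing assumes the column layout printed in this article.

```shell
#!/usr/bin/env bash

# Read `gs_om -t status --detail` output on stdin and print the name of
# the datanode whose state column contains "Primary".
primary_datanode() {
    awk '
        /Datanode State/     { in_dn = 1; next }   # start of the datanode section
        in_dn && / Primary / { print $2; exit }    # second column is the node name
    '
}

# Example usage (run manually as the omm user):
#   gs_om -t status --detail | primary_datanode
```

Only lines after the `Datanode State` header are considered, so the CM Server primary listed earlier in the output is not mistaken for the database primary.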
[root@opengauss-db1 ~]# python3 /opt/software/openGauss/script/gs_preinstall -U omm -G dbgrp -X /opt/software/openGauss/cluster_config.xml
Parsing the configuration file.
Successfully parsed the configuration file.
Installing the tools on the local node.
Successfully installed the tools on the local node.
Are you sure you want to create trust for root (yes/no)?no
Setting host ip env
[GAUSS-51400] : Failed to execute the command: sed -i '/^export[ ]*HOST_IP=/d' /etc/profile. Result:{'opengauss-db1': 'Success', 'opengauss-db2': 'Failure'}.
Error:
[SUCCESS] opengauss-db1:
[FAILURE] opengauss-db2:


[omm@opengauss-db1 ~]$ gs_om -t status --detail --all
[  CMServer State   ]

node             node_ip         instance                               state
-------------------------------------------------------------------------------
1  opengauss-db1 10.110.3.155    1    /opt/gaussdb/install/cm/cm_server Down
2  opengauss-db2 10.110.3.156    2    /opt/gaussdb/install/cm/cm_server Down

cm_ctl: can't connect to cm_server.
Maybe cm_server is not running, or timeout expired. Please try again.
[omm@opengauss-db1 ~]$ cm_ctl switchover -a
cm_ctl: send switchover msg to cm_server, connect fail node_id:0, data_path:.
[omm@opengauss-db1 ~]$ cm_ctl query -Cv
[  CMServer State   ]

node             instance state
---------------------------------
1  opengauss-db1 1        Primary
2  opengauss-db2 2        Standby

[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
balanced        : Yes
current_az      : AZ_ALL

[  Datanode State   ]

node             instance state            | node             instance state
------------------------------------------------------------------------------------------
1  opengauss-db1 6001     P Primary Normal | 2  opengauss-db2 6002     S Standby Normal
[omm@opengauss-db1 ~]$ gs_om -t status --detail --all
[  CMServer State   ]

node             node_ip         instance                               state
-------------------------------------------------------------------------------
1  opengauss-db1 10.110.3.155    1    /opt/gaussdb/install/cm/cm_server Primary
2  opengauss-db2 10.110.3.156    2    /opt/gaussdb/install/cm/cm_server Standby

[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
balanced        : Yes
current_az      : AZ_ALL

[  Datanode State   ]

node             node_ip         instance                          state            
------------------------------------------------------------------------------------
1  opengauss-db1 10.110.3.155    6001 /opt/gaussdb/install/data/dn P Primary Normal
2  opengauss-db2 10.110.3.156    6002 /opt/gaussdb/install/data/dn S Standby Normal
![image20230721172148201.png](https://oss-emcsprod-public.modb.pro/image/editor/20230722-c4e42479-422f-4161-979d-2fba202f7337.png)

From: https://blog.51cto.com/u_16191492/7060810
