26-mysql高可用-MAH

标签：info 10.0 26 17 Thu 2022 mysql MAH Nov

MHA Master High Availability

MHA 工作原理和架构

Master High Availability，对主节点进行监控，可实现自动故障转移至其它从节点；通过提升某一从节点为新的主节点，基于主从复制实现，还需要客户端配合实现，目前MHA主要支持一主多从的架构，要搭建MHA,要求一个复制集群中必须最少有三台数据库服务器，一主二从，即一台充当master，一台充当备用master，另外一台充当从库，出于机器成本的考虑，淘宝进行了改造，目前淘宝TMHA已经支持一主一从
图解

26-mysql高可用-MAH_centos

工作原理：

图解

26-mysql高可用-MAH_二进制日志_02

##说明

#机制
1. MHA利用 SELECT 1 As Value 指令判断master服务器的健康性,一旦master 宕机,MHA 从宕机崩溃
的master保存二进制日志事件（binlog events）
2. 识别含有最新更新的slave
3. 应用差异的中继日志（relay log）到其他的slave
4. 应用从master保存的二进制日志事件（binlog events）到所有slave节点
5. 提升一个slave为新的master
6. 使其他的slave连接新的master进行复制
7. 故障服务器自动被剔除集群(masterha_conf_host),将配置信息去掉
8. 旧的Master的 VIP 漂移到新的master上，用户应用就可以访问新的Master
9. MHA是一次性的高可用性解决方案,Manager会自动退出

#选举新的Master
1. 如果设定权重(candidate_master=1),按照权重强制指定新主,但是默认情况下如果一个slave落后master 
二进制日志超过100M的relay logs，即使有权重,也会失效.如果设置check_repl_delay=0,即使落后很多日志,也强制选择其为新主
2. 如果从库数据之间有差异,最接近于Master的slave成为新主
3. 如果所有从库数据都一致,按照配置文件顺序最前面的当新主

#数据恢复
1. 当主服务器的SSH还能连接,从库对比主库position 或者GTID号,将二进制日志保存至各个从节点并且应用
(执行save_binary_logs 实现)
2. 当主服务器的SSH不能连接, 对比从库之间的relaylog的差异(执行apply_diff_relay_logs[实现])

#注意
为了尽可能的减少主库硬件损坏宕机造成的数据丢失，因此在配置MHA的同时建议配置成MySQL的半同步复制

安装

#安装包链接
https://github.com/yoshinorim/mha4mysql-manager/releases
https://github.com/yoshinorim/mha4mysql-node/releases/tag/v0.58

#MHA软件
--MHA软件由两部分组成，Manager工具包和Node工具包
---------------------------------------------
#Manager工具包
masterha_check_ssh 检查MHA的SSH配置状况
masterha_check_repl 检查MySQL复制状况
masterha_manger 启动MHA
masterha_check_status 检测当前MHA运行状态
masterha_master_monitor 检测master是否宕机
masterha_master_switch 故障转移（自动或手动）
masterha_conf_host 添加或删除配置的server信息
masterha_stop --cnotallow=app1.cnf 停止MHA
masterha_secondary_check 两个或多个网络线路检查MySQL主服务器的可用
---------------------------------------------
#Node工具包 --> 通常由MHA Manager的脚本触发，无需人为操作
save_binary_logs #保存和复制master的二进制日志
apply_diff_relay_logs #识别差异的中继日志事件并将其差异的事件应用于其他的slave
filter_mysqlbinlog #去除不必要的ROLLBACK事件（MHA已不再使用此工具）
purge_relay_logs #清除中继日志（不会阻塞SQL线程）

#MHA自定义扩展：
secondary_check_script #通过多条网络路由检测master的可用性
master_ip_ailover_script #更新Application使用的masterip
shutdown_script #强制关闭master节点
report_script #发送报告
init_conf_load_script #加载初始配置参数
master_ip_online_change_script #更新master节点ip地址

#MHA配置文件
global配置，为各application提供默认配置，默认文件路径 /etc/masterha_default.cnf
application配置：为每个主从复制集群

案例（在主从架构的基础上加入MHA(mysql 服务安装参考19章)）

26-mysql高可用-MAH_二进制日志_03

#安装说明 --> MHA的管理节点必须是CentOS7
mha4mysql-manager-0.58-0.el7.centos.noarch.rpm 只支持CentOS7上安装,不支持在CentOS8安装,
支持MySQL5.7和MySQL8.0 ,但和CentOS8版本上的Mariadb-10.3.17不兼容
mha4mysql-manager-0.56-0.el6.noarch.rpm 不支持CentOS 8，只支持CentOS7及以下版本

节点密钥互通 ssh-key

[root@mha-manager ~]#ssh-keygen
[root@mha-manager ~]#ssh-copy-id 127.0.0.1
[root@mha-manager ~]#rsync -av .ssh 10.0.0.8:/root/
[root@mha-manager ~]#rsync -av .ssh 10.0.0.7:/root/
[root@mha-manager ~]#rsync -av .ssh 10.0.0.28:/root/

#记得及时测试
ssh ip

配置监控邮件

[root@mha-server ~]#cat /etc/mail.rc
set from=xxx@xx.com
set smtp=smtp.xx.com
set smtp-auth-userxxx@xx.com
set smtp-auth-password=xxxxxxxxxxxxxx

#记得及时测试 --> 可以做成脚本-加执行权限
echo "MHA is failover!" | mail -s "MHA Warning" xxx@xxx.com

配置VIP虚拟地址-这里暂时用perl脚本

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;

my (
  $command,        $ssh_user,         $orig_master_host,
  $orig_master_ip, $orig_master_port, $new_master_host,
  $new_master_ip,  $new_master_port,  $new_master_user,
  $new_master_password
);
my $vip = '10.0.0.100/24';  --> 这里设置vip
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";

GetOptions(
  'command=s'             => \$command,
  'ssh_user=s'            => \$ssh_user,
  'orig_master_host=s'    => \$orig_master_host,
  'orig_master_ip=s'      => \$orig_master_ip,
  'orig_master_port=i'    => \$orig_master_port,
  'new_master_host=s'     => \$new_master_host,
  'new_master_ip=s'       => \$new_master_ip,
  'new_master_port=i'     => \$new_master_port,
  'new_master_user=s'     => \$new_master_user,
  'new_master_password=s' => \$new_master_password,
);

exit &main();

sub main {
  if ( $command eq "stop" || $command eq "stopssh" ) {

    # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
    # If you manage master ip address at global catalog database,
    # invalidate orig_master_ip here.
    my $exit_code = 1;
    eval {

      # updating global catalog, etc
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
    elsif ( $command eq "start" ) {

        # all arguments are passed.
        # If you manage master ip address at global catalog database,
        # activate new_master_ip here.
        # You can also grant write access (create user, set read_notallow=0, etc) here.
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        `ssh $ssh_user\@$orig_master_host \" $ssh_start_vip \"`;
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}


sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master 
sub stop_vip() {
   `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}


sub usage {
  print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

管理节点配置

#注意: 此文件的行尾不要加空格等符号
------------------------------说明s------------------------------------------
user=mhauser #用于远程连接MySQL所有节点的用户,需要有管理员的权限
password=123456
manager_workdir=/data/mastermha/app1/ #目录会自动生成,无需手动创建
manager_log=/data/mastermha/app1/manager.log
remote_workdir=/data/mastermha/app1/
ssh_user=root #用于实现远程ssh基于KEY的连接,访问二进制日志
repl_user=repluser #主从复制的用户信息
repl_password=123456
ping_interval=1 #健康性检查的时间间隔,检查命令:SELECT 1 As Value
master_ip_failover_script=/usr/local/bin/master_ip_failover #切换VIP的perl脚本,不
支持跨网络,也可用Keepalived实现
report_script=/usr/local/bin/sendmail.sh #当执行报警脚本
check_repl_delay=0 #默认值为1,表示如果slave中从库落后主库relay log超过100M，主库不会选
择这个从库为新的master，因为这个从库进行恢复需要很长的时间.通过设置参数check_repl_delay=0，
mha触发主从切换时会忽略复制的延时，对于设置candidate_master=1的从库非常有用，这样确保这个从库
一定能成为最新的master
master_binlog_dir=/data/mysql/ #指定二进制日志存放的目录,mha4mysql-manager-0.58必须指
定,之前版本不需要指定
------------------------------说明e------------------------------------------

[root@mha-server ~]#mkdir /etc/mastermha/
[root@mha-server ~]#vim /etc/mastermha/app1.cnf
[root@mha-server ~]#cat /etc/mastermha/app1.cnf 

[server default]
user=mhauser  
password=123456
manager_workdir=/data/mastermha/app1/ 
manager_log=/data/mastermha/app1/manager.log  
remote_workdir=/data/mastermha/app1/
ssh_user=root
repl_user=repluser
repl_password=123456
ping_interval=1
master_ip_failover_script=/usr/local/bin/master_ip_failover
report_script=/usr/local/bin/sendmail.sh
check_repl_delay=0
master_binlog_dir=/data/mysql-binarylog/
[server1]
hostname=10.0.0.8
candidate_master=1
[server2]
hostname=10.0.0.7
[server3]
hostname=10.0.0.28
candidate_master=1

#说明: 主库宕机谁来接管新的master
1. 所有从节点日志都是一致的，默认会以配置文件的顺序去选择一个新主
2. 从节点日志不一致，自动选择最接近于主库的从库充当新主
3. 如果对于某节点设定了权重（candidate_master=1），权重节点会优先选择。但是此节点日志量落后主库
超过100M日志的话，也不会被选择。可以配合check_repl_delay=0，关闭日志量的检查，强制选择候选节点

Master 配置（配置说明参考25章）

[root@mysql-master ~]#cat /etc/my.cnf
[mysqld]
server-id=8 
log-bin
datadir=/data/mysql
socket=/data/mysql/mysql.sock                                                                                                   
log-error=/data/mysql/mysql.log
pid-file=/data/mysql/mysql.pid
default_authentication_plugin=mysql_native_password
#default_authentication_plugin=caching_sha2_password
log_bin=/data/mysql-binarylog/mysql-binlog
plugin-load-add = "semisync_master.so" 
rpl_semi_sync_master_enabled=ON
rpl_semi_sync_master_timeout=3000

skip_name_resolve=1 --> 禁止反向解析，主从都要配置
general_log

gtid_mode=ON
enforce_gtid_consistency
[client]
socket=/data/mysql/mysql.sock

#添加MHA用户并授权
mysql> create user mhauser@'10.0.0.%' identified by '123456';
mysql> grant all on *.* to mhauser@'10.0.0.%';

mysql> show grants for mhauser@'10.0.0.%'\G;
*************************** 1. row ***************************
Grants for mhauser@10.0.0.%: GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, RELOAD, SHUTDOWN, PROCESS, FILE, REFERENCES, INDEX, ALTER, SHOW DATABASES, SUPER, CREATE TEMPORARY TABLES, LOCK TABLES, EXECUTE, REPLICATION SLAVE, REPLICATION CLIENT, CREATE VIEW, SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, CREATE USER, EVENT, TRIGGER, CREATE TABLESPACE, CREATE ROLE, DROP ROLE ON *.* TO `mhauser`@`10.0.0.%`
*************************** 2. row ***************************
Grants for mhauser@10.0.0.%: GRANT APPLICATION_PASSWORD_ADMIN,AUDIT_ABORT_EXEMPT,AUDIT_ADMIN,AUTHENTICATION_POLICY_ADMIN,BACKUP_ADMIN,BINLOG_ADMIN,BINLOG_ENCRYPTION_ADMIN,CLONE_ADMIN,CONNECTION_ADMIN,ENCRYPTION_KEY_ADMIN,FIREWALL_EXEMPT,FLUSH_OPTIMIZER_COSTS,FLUSH_STATUS,FLUSH_TABLES,FLUSH_USER_RESOURCES,GROUP_REPLICATION_ADMIN,GROUP_REPLICATION_STREAM,INNODB_REDO_LOG_ARCHIVE,INNODB_REDO_LOG_ENABLE,PASSWORDLESS_USER_ADMIN,PERSIST_RO_VARIABLES_ADMIN,REPLICATION_APPLIER,REPLICATION_SLAVE_ADMIN,RESOURCE_GROUP_ADMIN,RESOURCE_GROUP_USER,ROLE_ADMIN,SENSITIVE_VARIABLES_OBSERVER,SERVICE_CONNECTION_ADMIN,SESSION_VARIABLES_ADMIN,SET_USER_ID,SHOW_ROUTINE,SYSTEM_USER,SYSTEM_VARIABLES_ADMIN,TABLE_ENCRYPTION_ADMIN,XA_RECOVER_ADMIN ON *.* TO `mhauser`@`10.0.0.%`
2 rows in set (0.00 sec)

ERROR: 
No query specified

---------------------------------------------------------------------
mysql> show processlist; --> 确认自己主线程
+----+-----------------+-----------------+------+------------------+------+-----------------------------------------------------------------+------------------+
| Id | User            | Host            | db   | Command          | Time | State                                                           | Info             |
+----+-----------------+-----------------+------+------------------+------+-----------------------------------------------------------------+------------------+
|  5 | event_scheduler | localhost       | NULL | Daemon           | 1437 | Waiting on empty queue                                          | NULL             |
|  8 | repluser        | 10.0.0.28:46112 | NULL | Binlog Dump GTID | 1427 | Source has sent all binlog to replica; waiting for more updates | NULL             |
|  9 | root            | localhost       | NULL | Query            |    0 | init                                                            | show processlist |
| 17 | repluser        | 10.0.0.7:37780  | NULL | Binlog Dump GTID |   32 | Source has sent all binlog to replica; waiting for more updates | NULL             |
+----+-----------------+-----------------+------+------------------+------+-----------------------------------------------------------------+------------------+
4 rows in set (0.00 sec)

slave 配置

[root@mysql-slave02 ~]#cat /etc/my.cnf
[mysqld]
server-id=28 
log-bin
datadir=/data/mysql
socket=/data/mysql/mysql.sock                                                                                                   
log-error=/data/mysql/mysql.log
pid-file=/data/mysql/mysql.pid
default_authentication_plugin=mysql_native_password
#default_authentication_plugin=caching_sha2_password
log_bin=/data/mysql-binarylog/mysql-binlog

plugin-load-add = "semisync_slave.so"
rpl_semi_sync_slave_enabled=ON

gtid_mode=ON
enforce_gtid_consistency

read_only
relay_log_purge=0
skip_name_resolve=1 --> 禁止反向解析
general_log
[client]
socket=/data/mysql/mysql.sock

-------------------------------------------------------------------------
#确认从属关系
10.0.0.7
mysql> show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for source to send event
                  Master_Host: 10.0.0.8
                  Master_User: repluser
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-binlog.000023
          Read_Master_Log_Pos: 197
               Relay_Log_File: mysql-slave01-relay-bin.000002
                Relay_Log_Pos: 379
        Relay_Master_Log_File: mysql-binlog.000023
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 197
              Relay_Log_Space: 597
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 8
                  Master_UUID: d07146ee-630f-11ed-a2db-000c2942b4d3
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Replica has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: d07146ee-630f-11ed-a2db-000c2942b4d3:1-15
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
       Master_public_key_path: 
        Get_master_public_key: 0
            Network_Namespace: 
1 row in set, 1 warning (0.00 sec)

ERROR: 
No query specified

10.0.0.28
mysql> show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for source to send event
                  Master_Host: 10.0.0.8
                  Master_User: repluser
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-binlog.000022
          Read_Master_Log_Pos: 197
               Relay_Log_File: mysql-slave02-relay-bin.000002
                Relay_Log_Pos: 379
        Relay_Master_Log_File: mysql-binlog.000022
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 197
              Relay_Log_Space: 597
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 8
                  Master_UUID: d07146ee-630f-11ed-a2db-000c2942b4d3
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Replica has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: d07146ee-630f-11ed-a2db-000c2942b4d3:1-15
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
       Master_public_key_path: 
        Get_master_public_key: 0
            Network_Namespace: 
1 row in set, 1 warning (0.00 sec)

ERROR: 
No query specified

各节点mysql服务配置完成后重启，检查MHA状态[有问题看提示和日志]

#检查环境
[root@mha-manager ~]#masterha_check_ssh --cnotallow=/etc/mastermha/app1.cnf
[root@mha-manager ~]#masterha_check_repl --cnotallow=/etc/mastermha/app1.cnf
#查看状态
[root@mha-manager ~]#masterha_check_status --cnotallow=/etc/mastermha/app1.cnf

----------------------------------------------------------------------------------
[root@mha-server ~]#masterha_check_ssh --cnotallow=/etc/mastermha/app1.cnf
Thu Nov 17 18:25:46 2022 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Nov 17 18:25:46 2022 - [info] Reading application default configuration from /etc/mastermha/app1.cnf..
Thu Nov 17 18:25:46 2022 - [info] Reading server configuration from /etc/mastermha/app1.cnf..
Thu Nov 17 18:25:46 2022 - [info] Starting SSH connection tests..
Thu Nov 17 18:25:47 2022 - [debug] 
Thu Nov 17 18:25:46 2022 - [debug]  Connecting via SSH from root@10.0.0.8(10.0.0.8:22) to root@10.0.0.7(10.0.0.7:22)..
Thu Nov 17 18:25:46 2022 - [debug]   ok.
Thu Nov 17 18:25:46 2022 - [debug]  Connecting via SSH from root@10.0.0.8(10.0.0.8:22) to root@10.0.0.28(10.0.0.28:22)..
Thu Nov 17 18:25:47 2022 - [debug]   ok.
Thu Nov 17 18:25:48 2022 - [debug] 
Thu Nov 17 18:25:47 2022 - [debug]  Connecting via SSH from root@10.0.0.28(10.0.0.28:22) to root@10.0.0.8(10.0.0.8:22)..
Thu Nov 17 18:25:48 2022 - [debug]   ok.
Thu Nov 17 18:25:48 2022 - [debug]  Connecting via SSH from root@10.0.0.28(10.0.0.28:22) to root@10.0.0.7(10.0.0.7:22)..
Thu Nov 17 18:25:48 2022 - [debug]   ok.
Thu Nov 17 18:25:48 2022 - [debug] 
Thu Nov 17 18:25:46 2022 - [debug]  Connecting via SSH from root@10.0.0.7(10.0.0.7:22) to root@10.0.0.8(10.0.0.8:22)..
Thu Nov 17 18:25:47 2022 - [debug]   ok.
Thu Nov 17 18:25:47 2022 - [debug]  Connecting via SSH from root@10.0.0.7(10.0.0.7:22) to root@10.0.0.28(10.0.0.28:22)..
Thu Nov 17 18:25:48 2022 - [debug]   ok.
Thu Nov 17 18:25:48 2022 - [info] All SSH connection tests passed successfully.

----------------------------------------------------------------------------------
[root@mha-server ~]#masterha_check_repl --cnotallow=/etc/mastermha/app1.cnf
Thu Nov 17 18:26:11 2022 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Nov 17 18:26:11 2022 - [info] Reading application default configuration from /etc/mastermha/app1.cnf..
Thu Nov 17 18:26:11 2022 - [info] Reading server configuration from /etc/mastermha/app1.cnf..
Thu Nov 17 18:26:11 2022 - [info] MHA::MasterMonitor version 0.58.
Thu Nov 17 18:26:13 2022 - [info] GTID failover mode = 1
Thu Nov 17 18:26:13 2022 - [info] Dead Servers:
Thu Nov 17 18:26:13 2022 - [info] Alive Servers:
Thu Nov 17 18:26:13 2022 - [info]   10.0.0.8(10.0.0.8:3306)
Thu Nov 17 18:26:13 2022 - [info]   10.0.0.7(10.0.0.7:3306)
Thu Nov 17 18:26:13 2022 - [info]   10.0.0.28(10.0.0.28:3306)
Thu Nov 17 18:26:13 2022 - [info] Alive Slaves:
Thu Nov 17 18:26:13 2022 - [info]   10.0.0.7(10.0.0.7:3306)  Versinotallow=8.0.31 (oldest major version between slaves) log-bin:enabled
Thu Nov 17 18:26:13 2022 - [info]     GTID ON
Thu Nov 17 18:26:13 2022 - [info]     Replicating from 10.0.0.8(10.0.0.8:3306)
Thu Nov 17 18:26:13 2022 - [info]   10.0.0.28(10.0.0.28:3306)  Versinotallow=8.0.31 (oldest major version between slaves) log-bin:enabled
Thu Nov 17 18:26:13 2022 - [info]     GTID ON
Thu Nov 17 18:26:13 2022 - [info]     Replicating from 10.0.0.8(10.0.0.8:3306)
Thu Nov 17 18:26:13 2022 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Nov 17 18:26:13 2022 - [info] Current Alive Master: 10.0.0.8(10.0.0.8:3306)
Thu Nov 17 18:26:13 2022 - [info] Checking slave configurations..
Thu Nov 17 18:26:13 2022 - [info] Checking replication filtering settings..
Thu Nov 17 18:26:13 2022 - [info]  binlog_do_db= , binlog_ignore_db= 
Thu Nov 17 18:26:13 2022 - [info]  Replication filtering check ok.
Thu Nov 17 18:26:13 2022 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Thu Nov 17 18:26:13 2022 - [info] Checking SSH publickey authentication settings on the current master..
Thu Nov 17 18:26:13 2022 - [info] HealthCheck: SSH to 10.0.0.8 is reachable.
Thu Nov 17 18:26:13 2022 - [info] 
10.0.0.8(10.0.0.8:3306) (current master)
 +--10.0.0.7(10.0.0.7:3306)
 +--10.0.0.28(10.0.0.28:3306)

Thu Nov 17 18:26:13 2022 - [info] Checking replication health on 10.0.0.7..
Thu Nov 17 18:26:13 2022 - [info]  ok.
Thu Nov 17 18:26:13 2022 - [info] Checking replication health on 10.0.0.28..
Thu Nov 17 18:26:13 2022 - [info]  ok.
Thu Nov 17 18:26:13 2022 - [info] Checking master_ip_failover_script status:
Thu Nov 17 18:26:13 2022 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.0.0.8 --orig_master_ip=10.0.0.8 --orig_master_port=3306 
Checking the Status of the script.. OK 
Thu Nov 17 18:26:13 2022 - [info]  OK.
Thu Nov 17 18:26:13 2022 - [warning] shutdown_script is not defined.
Thu Nov 17 18:26:13 2022 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
----------------------------------------------------------------------------------
[root@mha-server ~]#masterha_check_status --cnotallow=/etc/mastermha/app1.cnf
app1 (pid:27558) is running(0:PING_OK), master:10.0.0.8

启动MHA，监控执行日志

#开启MHA,默认是前台运行,生产环境一般为后台执行
[root@mha-manager ~]#nohup masterha_manager --cnotallow=/etc/mastermha/app1.cnf --remove_dead_master_conf 
--ignore_last_failover < /dev/null > /var/log/mha_manager.log 2>&1 &
#选项说明
--remove_dead_master_conf：该参数代表当发生主从切换后，老的主库的 ip 将会从配置文件中移除。
--manger_log：日志存放位置。
--ignore_last_failover：在缺省情况下，如果 MHA 检测到连续发生宕机，且两次宕机间隔不足 8 小
时的话，则不会进行 Failover， 之所以这样限制是为了避免 ping-pong 效应。该参数代表忽略上次
MHA 触发切换产生的文件，默认情况下，MHA 发生切换后会在日志记目录，也就是上面设置的日志
app1.failover.complete文件，下次再次切换的时候如果发现该目录下存在该文件将不允许触发切换，除
非在第一次切换后收到删除该文件，为了方便，这里设置为--ignore_last_failover
-------------------------------------------------------------------------------------

#测试环境
[root@mha-server ~]#masterha_manager --cnotallow=/etc/mastermha/app1.cnf --remove_dead_master_conf --ignore_last_failover
Thu Nov 17 17:27:18 2022 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Nov 17 17:27:18 2022 - [info] Reading application default configuration from /etc/mastermha/app1.cnf..
Thu Nov 17 17:27:18 2022 - [info] Reading server configuration from /etc/mastermha/app1.cnf..

[root@mha-server ~]#cat /data/mastermha/app1/manager.log 
Thu Nov 17 17:26:43 2022 - [info] MHA::MasterMonitor version 0.58.
Thu Nov 17 17:26:44 2022 - [info] GTID failover mode = 1
Thu Nov 17 17:26:44 2022 - [info] Dead Servers:
Thu Nov 17 17:26:44 2022 - [info] Alive Servers:
Thu Nov 17 17:26:44 2022 - [info]   10.0.0.8(10.0.0.8:3306)
Thu Nov 17 17:26:44 2022 - [info]   10.0.0.7(10.0.0.7:3306)
Thu Nov 17 17:26:44 2022 - [info]   10.0.0.28(10.0.0.28:3306)
Thu Nov 17 17:26:44 2022 - [info] Alive Slaves:
Thu Nov 17 17:26:44 2022 - [info]   10.0.0.7(10.0.0.7:3306)  Versinotallow=8.0.31 (oldest major version between slaves) log-bin:enabled
Thu Nov 17 17:26:44 2022 - [info]     GTID ON
Thu Nov 17 17:26:44 2022 - [info]     Replicating from 10.0.0.8(10.0.0.8:3306)
Thu Nov 17 17:26:44 2022 - [info]   10.0.0.28(10.0.0.28:3306)  Versinotallow=8.0.31 (oldest major version between slaves) log-bin:enabled
Thu Nov 17 17:26:44 2022 - [info]     GTID ON
Thu Nov 17 17:26:44 2022 - [info]     Replicating from 10.0.0.8(10.0.0.8:3306)
Thu Nov 17 17:26:44 2022 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Nov 17 17:26:44 2022 - [info] Current Alive Master: 10.0.0.8(10.0.0.8:3306)
Thu Nov 17 17:26:44 2022 - [info] Checking slave configurations..
Thu Nov 17 17:26:44 2022 - [info] Checking replication filtering settings..
Thu Nov 17 17:26:44 2022 - [info]  binlog_do_db= , binlog_ignore_db= 
Thu Nov 17 17:26:44 2022 - [info]  Replication filtering check ok.
Thu Nov 17 17:26:44 2022 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Thu Nov 17 17:26:44 2022 - [info] Checking SSH publickey authentication settings on the current master..
Thu Nov 17 17:26:44 2022 - [info] HealthCheck: SSH to 10.0.0.8 is reachable.
Thu Nov 17 17:26:44 2022 - [info] 
10.0.0.8(10.0.0.8:3306) (current master)
 +--10.0.0.7(10.0.0.7:3306)
 +--10.0.0.28(10.0.0.28:3306)

Thu Nov 17 17:26:44 2022 - [info] Checking master_ip_failover_script status:
Thu Nov 17 17:26:44 2022 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.0.0.8 --orig_master_ip=10.0.0.8 --orig_master_port=3306 
Checking the Status of the script.. OK 
Thu Nov 17 17:26:45 2022 - [info]  OK.
Thu Nov 17 17:26:45 2022 - [warning] shutdown_script is not defined.
Thu Nov 17 17:26:45 2022 - [info] Set master ping interval 1 seconds.
Thu Nov 17 17:26:45 2022 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Thu Nov 17 17:26:45 2022 - [info] Starting ping health check on 10.0.0.8(10.0.0.8:3306)..
Thu Nov 17 17:26:45 2022 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Thu Nov 17 17:26:52 2022 - [info] Got terminate signal. Exit.
Thu Nov 17 17:27:18 2022 - [info] MHA::MasterMonitor version 0.58.
Thu Nov 17 17:27:19 2022 - [info] GTID failover mode = 1
Thu Nov 17 17:27:19 2022 - [info] Dead Servers:
Thu Nov 17 17:27:19 2022 - [info] Alive Servers:
Thu Nov 17 17:27:19 2022 - [info]   10.0.0.8(10.0.0.8:3306)
Thu Nov 17 17:27:19 2022 - [info]   10.0.0.7(10.0.0.7:3306)
Thu Nov 17 17:27:19 2022 - [info]   10.0.0.28(10.0.0.28:3306)
Thu Nov 17 17:27:19 2022 - [info] Alive Slaves:
Thu Nov 17 17:27:19 2022 - [info]   10.0.0.7(10.0.0.7:3306)  Versinotallow=8.0.31 (oldest major version between slaves) log-bin:enabled
Thu Nov 17 17:27:19 2022 - [info]     GTID ON
Thu Nov 17 17:27:19 2022 - [info]     Replicating from 10.0.0.8(10.0.0.8:3306)
Thu Nov 17 17:27:19 2022 - [info]   10.0.0.28(10.0.0.28:3306)  Versinotallow=8.0.31 (oldest major version between slaves) log-bin:enabled
Thu Nov 17 17:27:19 2022 - [info]     GTID ON
Thu Nov 17 17:27:19 2022 - [info]     Replicating from 10.0.0.8(10.0.0.8:3306)
Thu Nov 17 17:27:19 2022 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Nov 17 17:27:19 2022 - [info] Current Alive Master: 10.0.0.8(10.0.0.8:3306)
Thu Nov 17 17:27:19 2022 - [info] Checking slave configurations..
Thu Nov 17 17:27:19 2022 - [info] Checking replication filtering settings..
Thu Nov 17 17:27:19 2022 - [info]  binlog_do_db= , binlog_ignore_db= 
Thu Nov 17 17:27:19 2022 - [info]  Replication filtering check ok.
Thu Nov 17 17:27:19 2022 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Thu Nov 17 17:27:19 2022 - [info] Checking SSH publickey authentication settings on the current master..
Thu Nov 17 17:27:20 2022 - [info] HealthCheck: SSH to 10.0.0.8 is reachable.
Thu Nov 17 17:27:20 2022 - [info] 
10.0.0.8(10.0.0.8:3306) (current master)
 +--10.0.0.7(10.0.0.7:3306)
 +--10.0.0.28(10.0.0.28:3306)

Thu Nov 17 17:27:20 2022 - [info] Checking master_ip_failover_script status:
Thu Nov 17 17:27:20 2022 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.0.0.8 --orig_master_ip=10.0.0.8 --orig_master_port=3306 
Checking the Status of the script.. OK 
Thu Nov 17 17:27:20 2022 - [info]  OK.
Thu Nov 17 17:27:20 2022 - [warning] shutdown_script is not defined.
Thu Nov 17 17:27:20 2022 - [info] Set master ping interval 1 seconds.
Thu Nov 17 17:27:20 2022 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Thu Nov 17 17:27:20 2022 - [info] Starting ping health check on 10.0.0.8(10.0.0.8:3306)..
Thu Nov 17 17:27:20 2022 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

#查看状态
[root@mha-server ~]#masterha_check_status --cnotallow=/etc/mastermha/app1.cnf
app1 (pid:27558) is running(0:PING_OK), master:10.0.0.8

模拟故障

#当 master down机后,mha管理程序自动退出
[root@mysql-master ~]#systemctl stop mysqld
[root@mysql-master ~]#systemctl status mysql
● mysqld.service - LSB: start and stop MySQL
   Loaded: loaded (/etc/rc.d/init.d/mysqld; generated)
   Active: inactive (dead) since Thu 2022-11-17 18:34:38 CST; 13s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 35734 ExecStop=/etc/rc.d/init.d/mysqld stop (code=exited, status=0/SUCCESS)
  Process: 34835 ExecStart=/etc/rc.d/init.d/mysqld start (code=exited, status=0/SUCCESS)

Nov 17 16:00:05 mysql-master systemd[1]: Starting LSB: start and stop MySQL...
Nov 17 16:00:11 mysql-master mysqld[34835]: Starting MySQL....
Nov 17 16:00:11 mysql-master systemd[1]: Started LSB: start and stop MySQL.
Nov 17 18:34:25 mysql-master systemd[1]: Stopping LSB: start and stop MySQL...
Nov 17 18:34:37 mysql-master mysqld[35175]: Starting M
Nov 17 18:34:38 mysql-master mysqld[35734]: Shutting down MySQL............
Nov 17 18:34:38 mysql-master mysqld[35798]: Shutting d
Nov 17 18:34:38 mysql-master systemd[1]: mysqld.service: Succeeded.
Nov 17 18:34:38 mysql-master systemd[1]: Stopped LSB: start and stop MySQL.
------------------------------------------------------------------------------------
[root@mha-server ~]#cat /data/mastermha/app1/manager.log
...
Thu Nov 17 17:27:20 2022 - [warning] shutdown_script is not defined.
Thu Nov 17 17:27:20 2022 - [info] Set master ping interval 1 seconds.
Thu Nov 17 17:27:20 2022 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Thu Nov 17 17:27:20 2022 - [info] Starting ping health check on 10.0.0.8(10.0.0.8:3306)..
Thu Nov 17 17:27:20 2022 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Thu Nov 17 18:34:26 2022 - [warning] Got error on MySQL select ping: 1053 (Server shutdown in progress)
Thu Nov 17 18:34:26 2022 - [info] Executing SSH check script: exit 0
Thu Nov 17 18:34:26 2022 - [info] HealthCheck: SSH to 10.0.0.8 is reachable.
Thu Nov 17 18:34:27 2022 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.0.0.8' (111))
Thu Nov 17 18:34:27 2022 - [warning] Connection failed 2 time(s)..
Thu Nov 17 18:34:28 2022 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.0.0.8' (111))
Thu Nov 17 18:34:28 2022 - [warning] Connection failed 3 time(s)..
Thu Nov 17 18:34:29 2022 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.0.0.8' (111))
Thu Nov 17 18:34:29 2022 - [warning] Connection failed 4 time(s)..
Thu Nov 17 18:34:29 2022 - [warning] Master is not reachable from health checker!
Thu Nov 17 18:34:29 2022 - [warning] Master 10.0.0.8(10.0.0.8:3306) is not reachable!
Thu Nov 17 18:34:29 2022 - [warning] SSH is reachable.
Thu Nov 17 18:34:29 2022 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mastermha/app1.cnf again, and trying to connect to all servers to check server status..
Thu Nov 17 18:34:29 2022 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Nov 17 18:34:29 2022 - [info] Reading application default configuration from /etc/mastermha/app1.cnf..
Thu Nov 17 18:34:29 2022 - [info] Reading server configuration from /etc/mastermha/app1.cnf..
Thu Nov 17 18:34:30 2022 - [info] GTID failover mode = 1
Thu Nov 17 18:34:30 2022 - [info] Dead Servers:
Thu Nov 17 18:34:30 2022 - [info]   10.0.0.8(10.0.0.8:3306)
Thu Nov 17 18:34:30 2022 - [info] Alive Servers:
Thu Nov 17 18:34:30 2022 - [info]   10.0.0.7(10.0.0.7:3306)
Thu Nov 17 18:34:30 2022 - [info]   10.0.0.28(10.0.0.28:3306)
Thu Nov 17 18:34:30 2022 - [info] Alive Slaves:
Thu Nov 17 18:34:30 2022 - [info]   10.0.0.7(10.0.0.7:3306)  Versinotallow=8.0.31 (oldest major version between slaves) log-bin:enabled
Thu Nov 17 18:34:30 2022 - [info]     GTID ON
Thu Nov 17 18:34:30 2022 - [info]     Replicating from 10.0.0.8(10.0.0.8:3306)
Thu Nov 17 18:34:30 2022 - [info]   10.0.0.28(10.0.0.28:3306)  Versinotallow=8.0.31 (oldest major version between slaves) log-bin:enabled
Thu Nov 17 18:34:30 2022 - [info]     GTID ON
Thu Nov 17 18:34:30 2022 - [info]     Replicating from 10.0.0.8(10.0.0.8:3306)
Thu Nov 17 18:34:30 2022 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Nov 17 18:34:30 2022 - [info] Checking slave configurations..
Thu Nov 17 18:34:30 2022 - [info] Checking replication filtering settings..
Thu Nov 17 18:34:30 2022 - [info]  Replication filtering check ok.
Thu Nov 17 18:34:30 2022 - [info] Master is down!
Thu Nov 17 18:34:30 2022 - [info] Terminating monitoring script.
Thu Nov 17 18:34:30 2022 - [info] Got exit code 20 (Master dead).
Thu Nov 17 18:34:30 2022 - [info] MHA::MasterFailover version 0.58.
Thu Nov 17 18:34:30 2022 - [info] Starting master failover.
Thu Nov 17 18:34:30 2022 - [info] 
Thu Nov 17 18:34:30 2022 - [info] * Phase 1: Configuration Check Phase..
Thu Nov 17 18:34:30 2022 - [info] 
Thu Nov 17 18:34:31 2022 - [info] GTID failover mode = 1
Thu Nov 17 18:34:31 2022 - [info] Dead Servers:
Thu Nov 17 18:34:31 2022 - [info]   10.0.0.8(10.0.0.8:3306)
Thu Nov 17 18:34:31 2022 - [info] Checking master reachability via MySQL(double check)...
Thu Nov 17 18:34:31 2022 - [info]  ok.
Thu Nov 17 18:34:31 2022 - [info] Alive Servers:
Thu Nov 17 18:34:31 2022 - [info]   10.0.0.7(10.0.0.7:3306)
Thu Nov 17 18:34:31 2022 - [info]   10.0.0.28(10.0.0.28:3306)
Thu Nov 17 18:34:31 2022 - [info] Alive Slaves:
Thu Nov 17 18:34:31 2022 - [info]   10.0.0.7(10.0.0.7:3306)  Versinotallow=8.0.31 (oldest major version between slaves) log-bin:enabled
Thu Nov 17 18:34:31 2022 - [info]     GTID ON
Thu Nov 17 18:34:31 2022 - [info]     Replicating from 10.0.0.8(10.0.0.8:3306)
Thu Nov 17 18:34:31 2022 - [info]   10.0.0.28(10.0.0.28:3306)  Versinotallow=8.0.31 (oldest major version between slaves) log-bin:enabled
Thu Nov 17 18:34:31 2022 - [info]     GTID ON
Thu Nov 17 18:34:31 2022 - [info]     Replicating from 10.0.0.8(10.0.0.8:3306)
Thu Nov 17 18:34:31 2022 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Nov 17 18:34:31 2022 - [info] Starting GTID based failover.
Thu Nov 17 18:34:31 2022 - [info] 
Thu Nov 17 18:34:31 2022 - [info] ** Phase 1: Configuration Check Phase completed.
Thu Nov 17 18:34:31 2022 - [info] 
Thu Nov 17 18:34:31 2022 - [info] * Phase 2: Dead Master Shutdown Phase..
Thu Nov 17 18:34:31 2022 - [info] 
Thu Nov 17 18:34:31 2022 - [info] Forcing shutdown so that applications never connect to the current master..
Thu Nov 17 18:34:31 2022 - [info] Executing master IP deactivation script:
Thu Nov 17 18:34:31 2022 - [info]   /usr/local/bin/master_ip_failover --orig_master_host=10.0.0.8 --orig_master_ip=10.0.0.8 --orig_master_port=3306 --command=stopssh --ssh_user=root  
Thu Nov 17 18:34:31 2022 - [info]  done.
Thu Nov 17 18:34:31 2022 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Thu Nov 17 18:34:32 2022 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Thu Nov 17 18:34:32 2022 - [info] 
Thu Nov 17 18:34:32 2022 - [info] * Phase 3: Master Recovery Phase..
Thu Nov 17 18:34:32 2022 - [info] 
Thu Nov 17 18:34:32 2022 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Thu Nov 17 18:34:32 2022 - [info] 
Thu Nov 17 18:34:32 2022 - [info] The latest binary log file/position on all slaves is mysql-binlog.000020:1029
Thu Nov 17 18:34:32 2022 - [info] Retrieved Gtid Set: d07146ee-630f-11ed-a2db-000c2942b4d3:13-15
Thu Nov 17 18:34:32 2022 - [info] Latest slaves (Slaves that received relay log files to the latest):
Thu Nov 17 18:34:32 2022 - [info]   10.0.0.7(10.0.0.7:3306)  Versinotallow=8.0.31 (oldest major version between slaves) log-bin:enabled
Thu Nov 17 18:34:32 2022 - [info]     GTID ON
Thu Nov 17 18:34:32 2022 - [info]     Replicating from 10.0.0.8(10.0.0.8:3306)
Thu Nov 17 18:34:32 2022 - [info]   10.0.0.28(10.0.0.28:3306)  Versinotallow=8.0.31 (oldest major version between slaves) log-bin:enabled
Thu Nov 17 18:34:32 2022 - [info]     GTID ON
Thu Nov 17 18:34:32 2022 - [info]     Replicating from 10.0.0.8(10.0.0.8:3306)
Thu Nov 17 18:34:32 2022 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Nov 17 18:34:32 2022 - [info] The oldest binary log file/position on all slaves is mysql-binlog.000020:1029
Thu Nov 17 18:34:32 2022 - [info] Retrieved Gtid Set: d07146ee-630f-11ed-a2db-000c2942b4d3:13-15
Thu Nov 17 18:34:32 2022 - [info] Oldest slaves:
Thu Nov 17 18:34:32 2022 - [info]   10.0.0.7(10.0.0.7:3306)  Versinotallow=8.0.31 (oldest major version between slaves) log-bin:enabled
Thu Nov 17 18:34:32 2022 - [info]     GTID ON
Thu Nov 17 18:34:32 2022 - [info]     Replicating from 10.0.0.8(10.0.0.8:3306)
Thu Nov 17 18:34:32 2022 - [info]   10.0.0.28(10.0.0.28:3306)  Versinotallow=8.0.31 (oldest major version between slaves) log-bin:enabled
Thu Nov 17 18:34:32 2022 - [info]     GTID ON
Thu Nov 17 18:34:32 2022 - [info]     Replicating from 10.0.0.8(10.0.0.8:3306)
Thu Nov 17 18:34:32 2022 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Nov 17 18:34:32 2022 - [info] 
Thu Nov 17 18:34:32 2022 - [info] * Phase 3.3: Determining New Master Phase..
Thu Nov 17 18:34:32 2022 - [info] 
Thu Nov 17 18:34:32 2022 - [info] Searching new master from slaves..
Thu Nov 17 18:34:32 2022 - [info]  Candidate masters from the configuration file:
Thu Nov 17 18:34:32 2022 - [info]   10.0.0.28(10.0.0.28:3306)  Versinotallow=8.0.31 (oldest major version between slaves) log-bin:enabled
Thu Nov 17 18:34:32 2022 - [info]     GTID ON
Thu Nov 17 18:34:32 2022 - [info]     Replicating from 10.0.0.8(10.0.0.8:3306)
Thu Nov 17 18:34:32 2022 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Nov 17 18:34:32 2022 - [info]  Non-candidate masters:
Thu Nov 17 18:34:32 2022 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Thu Nov 17 18:34:32 2022 - [info] New master is 10.0.0.28(10.0.0.28:3306)
Thu Nov 17 18:34:32 2022 - [info] Starting master failover..
Thu Nov 17 18:34:32 2022 - [info] 
From:
10.0.0.8(10.0.0.8:3306) (current master)
 +--10.0.0.7(10.0.0.7:3306)
 +--10.0.0.28(10.0.0.28:3306)

To:
10.0.0.28(10.0.0.28:3306) (new master) --> 切换成功
 +--10.0.0.7(10.0.0.7:3306)
Thu Nov 17 18:34:32 2022 - [info] 
Thu Nov 17 18:34:32 2022 - [info] * Phase 3.3: New Master Recovery Phase..
Thu Nov 17 18:34:32 2022 - [info] 
Thu Nov 17 18:34:32 2022 - [info]  Waiting all logs to be applied.. 
Thu Nov 17 18:34:32 2022 - [info]   done.
Thu Nov 17 18:34:32 2022 - [info]  Replicating from the latest slave 10.0.0.7(10.0.0.7:3306) and waiting to apply..
Thu Nov 17 18:34:32 2022 - [info]  Waiting all logs to be applied on the latest slave.. 
Thu Nov 17 18:34:32 2022 - [info]  Resetting slave 10.0.0.28(10.0.0.28:3306) and starting replication from the new master 10.0.0.7(10.0.0.7:3306)..
Thu Nov 17 18:34:32 2022 - [info]  Executed CHANGE MASTER.
Thu Nov 17 18:34:32 2022 - [info]  Slave started.
Thu Nov 17 18:34:32 2022 - [info]  Waiting to execute all relay logs on 10.0.0.28(10.0.0.28:3306)..

------------------------------------------------------------------------------------
[root@mha-server ~]#masterha_manager --cnotallow=/etc/mastermha/app1.cnf --remove_dead_master_conf --ignore_last_failover
Thu Nov 17 19:50:02 2022 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Nov 17 19:50:02 2022 - [info] Reading application default configuration from /etc/mastermha/app1.cnf..
Thu Nov 17 19:50:02 2022 - [info] Reading server configuration from /etc/mastermha/app1.cnf..
Thu Nov 17 19:51:00 2022 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Nov 17 19:51:00 2022 - [info] Reading application default configuration from /etc/mastermha/app1.cnf..
Thu Nov 17 19:51:00 2022 - [info] Reading server configuration from /etc/mastermha/app1.cnf..

MHA停止服务
------------------------------------------------------------------------------------
#验证VIP漂移至新的Master上 同时 slave线程取消
[root@mysql-slave02 ~]#ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:0c:29:d3:0d:20 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.28/24 brd 10.0.0.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet 10.0.0.100/24 brd 10.0.0.255 scope global secondary eth0:1
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fed3:d20/64 scope link 
       valid_lft forever preferred_lft forever

mysql> show slave status\G;
Empty set, 1 warning (0.00 sec)

------------------------------------------------------------------------------------
#验证10.0.0.7从服务器的master节点迁移至新master

mysql> show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for source to send event
                  Master_Host: 10.0.0.28
                  Master_User: repluser
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-binlog.000002
          Read_Master_Log_Pos: 197
               Relay_Log_File: mysql-slave01-relay-bin.000002
                Relay_Log_Pos: 379
        Relay_Master_Log_File: mysql-binlog.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 197
              Relay_Log_Space: 597
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 28
                  Master_UUID: 44647e52-641f-11ed-94b2-000c29d30d20
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Replica has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: d07146ee-630f-11ed-a2db-000c2942b4d3:1-15
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
       Master_public_key_path: 
        Get_master_public_key: 0
            Network_Namespace: 
1 row in set, 1 warning (0.00 sec)

ERROR: 
No query specified
------------------------------------------------------------------------------------
#自动修改manager节点上的配置文件,将master剔除
[root@mha-server ~]#cat /etc/mastermha/app1.cnf 
[server default]
check_repl_delay=0
manager_log=/data/mastermha/app1/manager.log
manager_workdir=/data/mastermha/app1/
master_binlog_dir=/data/mysql-binarylog/
master_ip_failover_script=/usr/local/bin/master_ip_failover
password=123456
ping_interval=1
remote_workdir=/data/mastermha/app1/
repl_password=123456
repl_user=repluser
report_script=/usr/local/bin/sendmail.sh
ssh_user=root
user=mhauser

[server2]
hostname=10.0.0.7
          --> 自动删除
[server3]
candidate_master=1
hostname=10.0.0.28

------------------------------------------------------------------------------------
#确认查收报警邮件

如果出错,需要删除下面文件再执行MHA

[root@mha-manager ~]#rm -rf /data/mastermha/app1/ #mha_master --> 自己的工作路径

#修复主从
.修复故障的主库,保证数据同步
.修复主从,手工新故障库加入新的主,设为为从库
.修复manager的配置文件
.清理相关目录
.检查ssh互信和replication的复制是否成功
.检查VIP,如果有问题,重新配置VIP
.重新运行MHA,查询MHA状态,确保运行正常

我是moore 大家一起加油！【有问题私信我就好】

标签：info,10.0,26,17,Thu,2022,mysql,MAH,Nov
From： https://blog.51cto.com/mooreyxia/5866143

相关文章

赞助商

阅读排行