配置主从同步: 创建用于同步的用户: create user repl@'%' identified by 'Admin@123'; grant REPLICATION CLIENT,REPLICATION SLAVE on *.* to rep@'%'; 创建用户mha连接的用户: CREATE USER 'mha'@'%' IDENTIFIED WITH mysql_native_password BY 'Admin@123'; GRANT REPLICATION SLAVE ON *.* TO 'mha'@'%'; grant all on *.* to mha; 主库全备恢复到备库环境: mysqldump -uroot -p"Admin@123" --single-transaction --master-data=2 testdb --flush-logs -R -E | sed -e 's/DEFINE.*PROCEDURE/PROCEDURE/;s/DEFINE.*FUNCTION/FUNCTION/;s/DEFINER.*DEFINER//' |gzip> /tmp/testdb.sql.gz 恢复后同步: CHANGE MASTER TO MASTER_HOST='10.43.20.100', MASTER_USER='repl', MASTER_PASSWORD='Admin@123', MASTER_PORT=3306, MASTER_LOG_FILE='mysqld-bin.000003', MASTER_LOG_POS=157; 主备同步创建完成后,进行mha的安装部署: 1.配置两条服务器免密: 可使用文末免密脚本进行配置。 2.第一步安装依赖:(以下步骤两节点均需操作,特定情况会说明) 将mysql环境变量写入profile文件 export PATH=/usr/local/mysql/bin:$PATH 配置yum源: wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo yum install -y cpan perl-Config-Tiny epel-release perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes perl-DBD-MySQL python3-devel gcc 查看依赖安装情况: rpm -qa cpan \ perl-Config-Tiny \ epel-release \ perl-Log-Dispatch \ perl-Parallel-ForkManager \ perl-Time-HiRes\ perl-DBD-MySQL\ python3-devel\ 3.安装python3依赖包 pip3 install psutil requests pymysql -i https://pypi.tuna.tsinghua.edu.cn/simple 4.解压软件到/data目录下(两节点均需操作) tar -zxvf mha20250113.tar.gz -C /data/
软件下载地址: https://elanjie.lanzout.com/iB7dA2kvpqvg
5.修改配置文件 目录格式如下: [root@node02 mha]# ls -rtl total 4 drwxr-xr-x 8 root root 4096 Jan 13 19:00 mha_soft drwxr-xr-x 2 root root 41 Jan 13 19:04 conf drwxr-xr-x 3 root root 230 Jan 13 19:43 ha_monitor drwxr-xr-x 3 root root 252 Jan 13 23:25 scripts drwxr-xr-x 2 root root 6 Jan 13 23:28 masterha drwxr-xr-x 2 root root 54 Jan 14 10:03 manager 5.1 conf文件夹保持为空,利用命令生成配置文件 进入scripts目录下,执行如下命令: python3 setup_mysql_mhavip.py -V 10.43.20.102 -I 10.43.20.100,10.43.20.101 -P 3306 -S 22 -N 'ens33' 说明:-V 为虚拟ip地址,-I分别为两台服务器地址,主库写前边,备库写后边,-P为数据库端口,-S为ssh端口,-N为网卡名称 脚本生成依赖于,/data/mha/mha_soft下的masterha.cnf和app.cnf文件,如果ssh端口有改变,需修改masterha.cnf文件,将虚拟ip地址,主备服务器地址,ssh端口,网卡名称修改为对应值. 状态输出: [root@node02 scripts]# python3 setup_mysql_mhavip.py -V 10.43.20.102 -I 10.43.20.100,10.43.20.101 -P 3306 -S 22 -N 'ens33' 2025-01-14 10:15:51 - install mha - INFO - yum install Perl dependencies, perl-Module-Install ...loaded. 2025-01-14 10:15:51 - install mha - INFO - yum install Perl dependencies, perl-Module-Build ...loaded. 2025-01-14 10:15:53 - install mha - INFO - yum install mysql drive, perl-Log-Dispatch ...loaded. 2025-01-14 10:15:53 - install mha - INFO - yum install mysql drive, perl-Parallel-ForkManager ...loaded. 2025-01-14 10:15:53 - install mha - INFO - yum install mysql drive, perl-Config-Tiny ...loaded. 2025-01-14 10:16:03 - install mha - INFO - yum install mysql driver,perl-DBD-MySQL ...loaded. 2025-01-14 10:16:03 - system check - WARNING - /data/mha/masterha tmp already exists 2025-01-14 10:16:03 - install mha - WARNING - mha app node already exists. 2025-01-14 10:16:03 - system check - WARNING - /data/mha/scripts tmp already exists 2025-01-14 10:16:03 - system check - WARNING - /data/mha/conf tmp already exists 2025-01-14 10:16:04 - system check - WARNING - /data/mha/manager tmp already exists 2025-01-14 10:16:04 - install mha - WARNING - mha manager already exists. ['[server default]\n', '#workdir on the management server\n', 'manager_workdir=/data/mha/manager/\n', 'manager_log=/data/mha/manager/manager.log\n', '\n', '#workdir on the node for mysql server\n', 'master_binlog_dir=/data/mysqldata/data\n', '\n', '#检测master可用性\n', 'secondary_check_script=/data/mha/scripts/masterha_secondary_check -s 10.43.20.100 -s 10.43.20.101 --ssh_port=22\n', '#自动故障切换master脚本\n', 'master_ip_failover_script=/data/mha/scripts/master_ip_failover_vip --vip=10.43.20.102 --ssh_port=22\n', '#手动切换master脚本\n', 'master_ip_online_change_script=/data/mha/scripts/master_ip_online_change_vip --vip=10.43.20.102 --ssh_port=22\n', '\n', '[server1]\n', 'hostname=10.43.20.100\n', 'port=3306\n', 'candidate_master=1\n', 'check_repl_delay=0\n', '\n', '[server2]\n', 'hostname=10.43.20.101\n', 'port=3306\n', 'candidate_master=1\n', 'check_repl_delay=0\n'] 2025-01-14 10:16:04 - install mha - INFO - complete ok! 提示配置完成,查看conf文件夹下的app.cnf文件,内容如下: [root@node02 conf]# ls -rtl total 8 -rw-r--r-- 1 root root 383 Jan 14 10:16 masterha.cnf -rw-r--r-- 1 root root 772 Jan 14 10:16 app.cnf 检查文件是否正确。 4.2测试联通性: [root@node02 mha]# masterha_check_ssh --conf=/data/mha/conf/app.cnf Tue Jan 14 10:45:28 2025 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Tue Jan 14 10:45:28 2025 - [info] Reading application default configuration from /data/mha/conf/app.cnf.. Tue Jan 14 10:45:28 2025 - [info] Reading server configuration from /data/mha/conf/app.cnf.. Tue Jan 14 10:45:28 2025 - [info] Starting SSH connection tests.. Tue Jan 14 10:45:30 2025 - [debug] Tue Jan 14 10:45:29 2025 - [debug] Connecting via SSH from root@10.43.20.101(10.43.20.101:22) to root@10.43.20.100(10.43.20.100:22).. Tue Jan 14 10:45:30 2025 - [debug] ok. Tue Jan 14 10:45:30 2025 - [debug] Tue Jan 14 10:45:28 2025 - [debug] Connecting via SSH from root@10.43.20.100(10.43.20.100:22) to root@10.43.20.101(10.43.20.101:22).. Tue Jan 14 10:45:30 2025 - [debug] ok. Tue Jan 14 10:45:30 2025 - [info] All SSH connection tests passed successfully. [root@node02 mha]# 连通性测试通过。 4.3修改ha_monitor配置 修改:parameter.py,配合HA_Monitor.py,进行mha的监控 [root@node02 mha]# cat ha_monitor/parameter.py par_var = {"vip": "10.43.20.102", "local_ip1": "10.43.20.100", "local_ip2": "10.43.20.101", "mysql_port": "3306", "netcard": "ens33", "sleeptime": 60} 4.4【备节点】启动manager和monitor [root@node02 scripts]# cd /data/mha/scripts/ [root@node02 scripts]# ./start.sh [root@node02 scripts]# [root@node02 scripts]# 查看日志输出: [root@node02 manager]# tail -10f manager.log Tue Jan 14 10:51:59 2025 - [info] Reading default configuration from /data/mha/conf/masterha.cnf.. Tue Jan 14 10:51:59 2025 - [info] Reading application default configuration from /data/mha/conf/app.cnf.. Tue Jan 14 10:51:59 2025 - [info] Reading server configuration from /data/mha/conf/app.cnf.. Tue Jan 14 10:51:59 2025 - [info] MHA::MasterMonitor version 0.58. Tue Jan 14 10:52:01 2025 - [info] GTID failover mode = 0 Tue Jan 14 10:52:01 2025 - [info] Dead Servers: Tue Jan 14 10:52:01 2025 - [info] Alive Servers: Tue Jan 14 10:52:01 2025 - [info] 10.43.20.100(10.43.20.100:3306) Tue Jan 14 10:52:01 2025 - [info] 10.43.20.101(10.43.20.101:3306) Tue Jan 14 10:52:01 2025 - [info] Alive Slaves: Tue Jan 14 10:52:01 2025 - [info] 10.43.20.101(10.43.20.101:3306) Version=8.0.28 (oldest major version between slaves) log-bin:enabled Tue Jan 14 10:52:01 2025 - [info] Replicating from 10.43.20.100(10.43.20.100:3306) Tue Jan 14 10:52:01 2025 - [info] Primary candidate for the new Master (candidate_master is set) Tue Jan 14 10:52:01 2025 - [info] Current Alive Master: 10.43.20.100(10.43.20.100:3306) Tue Jan 14 10:52:01 2025 - [info] Checking slave configurations.. Tue Jan 14 10:52:01 2025 - [info] Checking replication filtering settings.. Tue Jan 14 10:52:01 2025 - [info] binlog_do_db= , binlog_ignore_db= Tue Jan 14 10:52:01 2025 - [info] Replication filtering check ok. Tue Jan 14 10:52:01 2025 - [info] GTID (with auto-pos) is not supported Tue Jan 14 10:52:01 2025 - [info] Starting SSH connection tests.. Tue Jan 14 10:52:03 2025 - [info] All SSH connection tests passed successfully. Tue Jan 14 10:52:03 2025 - [info] Checking MHA Node version.. Tue Jan 14 10:52:03 2025 - [info] Version check ok. Tue Jan 14 10:52:03 2025 - [info] Checking SSH publickey authentication settings on the current master.. Tue Jan 14 10:52:04 2025 - [info] HealthCheck: SSH to 10.43.20.100 is reachable. Tue Jan 14 10:52:04 2025 - [info] Master MHA Node version is 0.58. Tue Jan 14 10:52:04 2025 - [info] Checking recovery script configurations on 10.43.20.100(10.43.20.100:3306).. Tue Jan 14 10:52:04 2025 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysqldata/data --output_file=/data/mha/masterha/save_binary_logs_test --manager_version=0.58 --start_file=mysqld-bin.000005 Tue Jan 14 10:52:04 2025 - [info] Connecting to root@10.43.20.100(10.43.20.100:22).. Creating /data/mha/masterha if not exists.. ok. Checking output directory is accessible or not.. ok. Binlog found at /data/mysqldata/data, up to mysqld-bin.000005 Tue Jan 14 10:52:05 2025 - [info] Binlog setting check done. Tue Jan 14 10:52:05 2025 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers.. Tue Jan 14 10:52:05 2025 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=10.43.20.101 --slave_ip=10.43.20.101 --slave_port=3306 --workdir=/data/mha/masterha --target_version=8.0.28 --manager_version=0.58 --relay_dir=/data/mysqldata/data --current_relay_log=relay.000002 --slave_pass=xxx Tue Jan 14 10:52:05 2025 - [info] Connecting to root@10.43.20.101(10.43.20.101:22).. Checking slave recovery environment settings.. Relay log found at /data/mysqldata/data, up to relay.000002 Temporary relay log file is /data/mysqldata/data/relay.000002 Checking if super_read_only is defined and turned on..Disabling super_read_only, enabling read_only, so that the applying can be done on the slave Testing mysql connection and privileges.. done. Enabling super_read_only again..Enabling super_read_only again after applying Testing mysqlbinlog output.. done. Cleaning up test file(s).. done. Tue Jan 14 10:52:09 2025 - [info] Slaves settings check done. Tue Jan 14 10:52:09 2025 - [info] 10.43.20.100(10.43.20.100:3306) (current master) +--10.43.20.101(10.43.20.101:3306) Tue Jan 14 10:52:09 2025 - [info] Checking master_ip_failover_script status: Tue Jan 14 10:52:09 2025 - [info] /data/mha/scripts/master_ip_failover_vip --vip=10.43.20.102 --ssh_port=22 --command=status --ssh_user=root --orig_master_host=10.43.20.100 --orig_master_ip=10.43.20.100 --orig_master_port=3306 Tue Jan 14 10:52:09 2025 - [info] OK. Tue Jan 14 10:52:09 2025 - [warning] shutdown_script is not defined. Tue Jan 14 10:52:09 2025 - [info] Set master ping interval 2 seconds. Tue Jan 14 10:52:09 2025 - [info] Set secondary check script: /data/mha/scripts/masterha_secondary_check -s 10.43.20.100 -s 10.43.20.101 --ssh_port=22 Tue Jan 14 10:52:09 2025 - [info] Starting ping health check on 10.43.20.100(10.43.20.100:3306).. Tue Jan 14 10:52:09 2025 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond.. [root@node02 ha_monitor]# tail -10f monitor.log 2025-01-14 10:13:15 check manager WARNING 从库的manager进程不存在,正在启动... 2025-01-14 10:13:15 - check manager - WARNING - 从库的manager进程不存在,正在启动... 2025-01-14 10:13:15 check vip INFO current host is slave,vip(10.43.20.102) does not exist,keep it. 2025-01-14 10:13:15 - check vip - INFO - current host is slave,vip(10.43.20.102) does not exist,keep it. 2025-01-14 10:14:15 check_mysql INFO mysqld is run. 2025-01-14 10:14:15 - check_mysql - INFO - mysqld is run. 2025-01-14 10:14:15 read only INFO it's a slave,read only mode is on, keep it. 2025-01-14 10:14:15 - read only - INFO - it's a slave,read only mode is on, keep it. 2025-01-14 10:14:15 ha_monitor INFO manager is not run. 2025-01-14 10:14:15 - ha_monitor - INFO - manager is not run. 2025-01-14 10:14:16 check manager WARNING 从库的manager进程不存在,正在启动... 2025-01-14 10:14:16 - check manager - WARNING - 从库的manager进程不存在,正在启动... 2025-01-14 10:14:16 check vip INFO current host is slave,vip(10.43.20.102) does not exist,keep it. 2025-01-14 10:14:16 - check vip - INFO - current host is slave,vip(10.43.20.102) does not exist,keep it. 2025-01-14 10:51:59 check_mysql INFO mysqld is run. 2025-01-14 10:51:59 - check_mysql - INFO - mysqld is run. 2025-01-14 10:51:59 read only INFO it's a slave,read only mode is on, keep it. 2025-01-14 10:51:59 - read only - INFO - it's a slave,read only mode is on, keep it. 2025-01-14 10:51:59 ha_monitor INFO manager is run. 2025-01-14 10:51:59 - ha_monitor - INFO - manager is run. 2025-01-14 10:51:59 check vip INFO current host is slave,vip(10.43.20.102) does not exist,keep it. 2025-01-14 10:51:59 - check vip - INFO - current host is slave,vip(10.43.20.102) does not exist,keep it. 2025-01-14 10:52:59 check_mysql INFO mysqld is run. 2025-01-14 10:52:59 - check_mysql - INFO - mysqld is run. 2025-01-14 10:53:00 read only INFO it's a slave,read only mode is on, keep it. 2025-01-14 10:53:00 - read only - INFO - it's a slave,read only mode is on, keep it. 2025-01-14 10:53:00 ha_monitor INFO manager is run. 2025-01-14 10:53:00 - ha_monitor - INFO - manager is run. 2025-01-14 10:53:00 check vip INFO current host is slave,vip(10.43.20.102) does not exist,keep it. 2025-01-14 10:53:00 - check vip - INFO - current host is slave,vip(10.43.20.102) does not exist,keep it. 4.5【主节点】启动monitor [root@node01 scripts]# cd /data/mha/scripts/ [root@node01 scripts]# ./start.sh [root@node01 scripts]# ./start.sh [root@node01 scripts]# ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 link/ether 00:0c:29:77:5d:9b brd ff:ff:ff:ff:ff:ff inet 10.43.20.100/24 brd 10.43.20.255 scope global noprefixroute ens33 valid_lft forever preferred_lft forever inet 10.43.20.102/24 scope global secondary ens33 valid_lft forever preferred_lft forever inet6 fe80::6e7:5185:9e7b:c840/64 scope link tentative noprefixroute dadfailed valid_lft forever preferred_lft forever inet6 fe80::79c0:be13:f42:d85e/64 scope link noprefixroute valid_lft forever preferred_lft forever 主节点vip已自动绑定。 4.6两节点添加crontab监控 * * * * * if [[ ` ps -ef|grep -i ha_monitor.py|grep -v grep|wc -l ` -ne 1 ]];then /bin/nohup /bin/python3 /data/mha/ha_monitor/HA_Monitor.py >> /data/mha/ha_monitor/monitor.log 2>&1 & fi 4.7测试主备切换 备节点执行: masterha_master_switch --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf --master_state=alive --new_master_host=10.43.20.101 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=1 完整日志输出如下: [root@node02 scripts]# masterha_master_switch --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf --master_state=alive --new_master_host=10.43.20.101 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=1 Tue Jan 14 10:57:18 2025 - [info] MHA::MasterRotate version 0.58. Tue Jan 14 10:57:18 2025 - [info] Starting online master switch.. Tue Jan 14 10:57:18 2025 - [info] Tue Jan 14 10:57:18 2025 - [info] * Phase 1: Configuration Check Phase.. Tue Jan 14 10:57:18 2025 - [info] Tue Jan 14 10:57:18 2025 - [info] Reading default configuration from /data/mha/conf/masterha.cnf.. Tue Jan 14 10:57:18 2025 - [info] Reading application default configuration from /data/mha/conf/app.cnf.. Tue Jan 14 10:57:18 2025 - [info] Reading server configuration from /data/mha/conf/app.cnf.. Tue Jan 14 10:57:20 2025 - [info] GTID failover mode = 0 Tue Jan 14 10:57:20 2025 - [info] Current Alive Master: 10.43.20.100(10.43.20.100:3306) Tue Jan 14 10:57:20 2025 - [info] Alive Slaves: Tue Jan 14 10:57:20 2025 - [info] 10.43.20.101(10.43.20.101:3306) Version=8.0.28 (oldest major version between slaves) log-bin:enabled Tue Jan 14 10:57:20 2025 - [info] Replicating from 10.43.20.100(10.43.20.100:3306) Tue Jan 14 10:57:20 2025 - [info] Primary candidate for the new Master (candidate_master is set) It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 10.43.20.100(10.43.20.100:3306)? (YES/no): yes Tue Jan 14 10:57:22 2025 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time.. Tue Jan 14 10:57:22 2025 - [info] ok. Tue Jan 14 10:57:22 2025 - [info] Checking MHA is not monitoring or doing failover.. Tue Jan 14 10:57:22 2025 - [info] Checking replication health on 10.43.20.101.. Tue Jan 14 10:57:22 2025 - [info] ok. Tue Jan 14 10:57:22 2025 - [info] 10.43.20.101 can be new master. Tue Jan 14 10:57:22 2025 - [info] From: 10.43.20.100(10.43.20.100:3306) (current master) +--10.43.20.101(10.43.20.101:3306) To: 10.43.20.101(10.43.20.101:3306) (new master) +--10.43.20.100(10.43.20.100:3306) Starting master switch from 10.43.20.100(10.43.20.100:3306) to 10.43.20.101(10.43.20.101:3306)? (yes/NO): yes Tue Jan 14 10:57:23 2025 - [info] Checking whether 10.43.20.101(10.43.20.101:3306) is ok for the new master.. Tue Jan 14 10:57:23 2025 - [info] ok. Tue Jan 14 10:57:23 2025 - [info] 10.43.20.100(10.43.20.100:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host. Tue Jan 14 10:57:24 2025 - [info] 10.43.20.100(10.43.20.100:3306): Resetting slave pointing to the dummy host. Tue Jan 14 10:57:24 2025 - [info] ** Phase 1: Configuration Check Phase completed. Tue Jan 14 10:57:24 2025 - [info] Tue Jan 14 10:57:24 2025 - [info] * Phase 2: Rejecting updates Phase.. Tue Jan 14 10:57:24 2025 - [info] Tue Jan 14 10:57:24 2025 - [info] Executing master ip online change script to disable write on the current master: Tue Jan 14 10:57:24 2025 - [info] /data/mha/scripts/master_ip_online_change_vip --vip=10.43.20.102 --ssh_port=22 --command=stop --orig_master_host=10.43.20.100 --orig_master_ip=10.43.20.100 --orig_master_port=3306 --orig_master_user='mha' --new_master_host=10.43.20.101 --new_master_ip=10.43.20.101 --new_master_port=3306 --new_master_user='mha' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx Tue Jan 14 10:57:24 2025 165131 Set read_only on the new master.. ok. Tue Jan 14 10:57:24 2025 172860 Drpping app user on the orig master.. Tue Jan 14 10:57:24 2025 173679 Waiting all running 1 threads are disconnected.. (max 1500 milliseconds) {'Time' => '55381','db' => undef,'Id' => '10','User' => 'event_scheduler','State' => 'Waiting on empty queue','Command' => 'Daemon','Info' => undef,'Host' => 'localhost'} Tue Jan 14 10:57:24 2025 676107 Waiting all running 1 threads are disconnected.. (max 1000 milliseconds) {'Time' => '55381','db' => undef,'Id' => '10','User' => 'event_scheduler','State' => 'Waiting on empty queue','Command' => 'Daemon','Info' => undef,'Host' => 'localhost'} Tue Jan 14 10:57:25 2025 175901 Waiting all running 1 threads are disconnected.. (max 500 milliseconds) {'Time' => '55382','db' => undef,'Id' => '10','User' => 'event_scheduler','State' => 'Waiting on empty queue','Command' => 'Daemon','Info' => undef,'Host' => 'localhost'} Tue Jan 14 10:57:25 2025 676904 Set read_only=1 on the orig master.. ok. Tue Jan 14 10:57:25 2025 679971 Waiting all running 1 queries are disconnected.. (max 500 milliseconds) {'Time' => '55382','db' => undef,'Id' => '10','User' => 'event_scheduler','State' => 'Waiting on empty queue','Command' => 'Daemon','Info' => undef,'Host' => 'localhost'} Tue Jan 14 10:57:26 2025 178121 Killing all application threads.. Tue Jan 14 10:57:26 2025 179579 done. Disabling the VIP on old master: 10.43.20.100 Tue Jan 14 10:57:26 2025 - [info] ok. Tue Jan 14 10:57:26 2025 - [info] Locking all tables on the orig master to reject updates from everybody (including root): Tue Jan 14 10:57:26 2025 - [info] Executing FLUSH TABLES WITH READ LOCK.. Tue Jan 14 10:57:26 2025 - [info] ok. Tue Jan 14 10:57:26 2025 - [info] Orig master binlog:pos is mysqld-bin.000005:14752736. Tue Jan 14 10:57:26 2025 - [info] Waiting to execute all relay logs on 10.43.20.101(10.43.20.101:3306).. Tue Jan 14 10:57:26 2025 - [info] master_pos_wait(mysqld-bin.000005:14752736) completed on 10.43.20.101(10.43.20.101:3306). Executed 0 events. Tue Jan 14 10:57:26 2025 - [info] done. Tue Jan 14 10:57:26 2025 - [info] Getting new master's binlog name and position.. Tue Jan 14 10:57:26 2025 - [info] mysqld-bin.000003:14752781 Tue Jan 14 10:57:26 2025 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.43.20.101', MASTER_PORT=3306, MASTER_LOG_FILE='mysqld-bin.000003', MASTER_LOG_POS=14752781, MASTER_USER='repl', MASTER_PASSWORD='xxx'; Tue Jan 14 10:57:26 2025 - [info] Executing master ip online change script to allow write on the new master: Tue Jan 14 10:57:26 2025 - [info] /data/mha/scripts/master_ip_online_change_vip --vip=10.43.20.102 --ssh_port=22 --command=start --orig_master_host=10.43.20.100 --orig_master_ip=10.43.20.100 --orig_master_port=3306 --orig_master_user='mha' --new_master_host=10.43.20.101 --new_master_ip=10.43.20.101 --new_master_port=3306 --new_master_user='mha' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx Enabling the VIP - 10.43.20.102 on the new master - 10.43.20.101 Tue Jan 14 10:57:27 2025 053683 Set read_only=0 on the new master. Tue Jan 14 10:57:27 2025 054082 Creating app user on the new master.. Tue Jan 14 10:57:27 2025 - [info] ok. Tue Jan 14 10:57:27 2025 - [info] Tue Jan 14 10:57:27 2025 - [info] * Switching slaves in parallel.. Tue Jan 14 10:57:27 2025 - [info] Tue Jan 14 10:57:27 2025 - [info] Unlocking all tables on the orig master: Tue Jan 14 10:57:27 2025 - [info] Executing UNLOCK TABLES.. Tue Jan 14 10:57:27 2025 - [info] ok. Tue Jan 14 10:57:27 2025 - [info] Starting orig master as a new slave.. Tue Jan 14 10:57:27 2025 - [info] Resetting slave 10.43.20.100(10.43.20.100:3306) and starting replication from the new master 10.43.20.101(10.43.20.101:3306).. Tue Jan 14 10:57:27 2025 - [info] Executed CHANGE MASTER. Tue Jan 14 10:57:28 2025 - [info] Slave started. Tue Jan 14 10:57:28 2025 - [info] All new slave servers switched successfully. Tue Jan 14 10:57:28 2025 - [info] Tue Jan 14 10:57:28 2025 - [info] * Phase 5: New master cleanup phase.. Tue Jan 14 10:57:28 2025 - [info] Tue Jan 14 10:57:28 2025 - [info] 10.43.20.101: Resetting slave info succeeded. Tue Jan 14 10:57:28 2025 - [info] Switching master to 10.43.20.101(10.43.20.101:3306) completed successfully. 查看备节点ip: [root@node02 scripts]# ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 link/ether 00:0c:29:eb:6b:8f brd ff:ff:ff:ff:ff:ff inet 10.43.20.101/24 brd 10.43.20.255 scope global noprefixroute ens33 valid_lft forever preferred_lft forever inet 10.43.20.102/24 scope global secondary ens33 valid_lft forever preferred_lft forever inet6 fe80::6e7:5185:9e7b:c840/64 scope link noprefixroute valid_lft forever preferred_lft forever VIP已绑定,主备切换成功。 [root@localhost][(none)]> show slave status\G Empty set, 1 warning (0.00 sec) 备节点已没有slave状态。 [root@localhost][(none)]> show slave hosts; +-----------+------+------+-----------+--------------------------------------+ | Server_id | Host | Port | Master_id | Slave_UUID | +-----------+------+------+-----------+--------------------------------------+ | 100 | | 3306 | 101 | 283a32ac-d194-11ef-91ab-000c29775d9b | +-----------+------+------+-----------+--------------------------------------+ 1 row in set, 1 warning (0.03 sec) 100原主库已成为备节点。 4.8模拟101主库宕机 [root@node01 scripts]# pkill -u mysql [root@node01 scripts]# ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 link/ether 00:0c:29:77:5d:9b brd ff:ff:ff:ff:ff:ff inet 10.43.20.100/24 brd 10.43.20.255 scope global noprefixroute ens33 valid_lft forever preferred_lft forever inet6 fe80::6e7:5185:9e7b:c840/64 scope link tentative noprefixroute dadfailed valid_lft forever preferred_lft forever inet6 fe80::79c0:be13:f42:d85e/64 scope link noprefixroute valid_lft forever preferred_lft forever manager日志输出: Tue Jan 14 11:36:36 2025 - [info] Master is not reachable from all other monitoring servers. Failover should start. Tue Jan 14 11:36:36 2025 - [info] HealthCheck: SSH to 10.43.20.100 is reachable. Tue Jan 14 11:36:38 2025 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.43.20.100' (111)) Tue Jan 14 11:36:38 2025 - [warning] Connection failed 2 time(s).. Tue Jan 14 11:36:40 2025 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.43.20.100' (111)) Tue Jan 14 11:36:40 2025 - [warning] Connection failed 3 time(s).. Tue Jan 14 11:36:42 2025 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.43.20.100' (111)) Tue Jan 14 11:36:42 2025 - [warning] Connection failed 4 time(s).. Tue Jan 14 11:36:42 2025 - [warning] Master is not reachable from health checker! Tue Jan 14 11:36:42 2025 - [warning] Master 10.43.20.100(10.43.20.100:3306) is not reachable! Tue Jan 14 11:36:42 2025 - [warning] SSH is reachable. Tue Jan 14 11:36:42 2025 - [info] Connecting to a master server failed. Reading configuration file /data/mha/conf/masterha.cnf and /data/mha/conf/app.cnf again, and trying to connect to all servers to check server status.. Tue Jan 14 11:36:42 2025 - [info] Reading default configuration from /data/mha/conf/masterha.cnf.. Tue Jan 14 11:36:42 2025 - [info] Reading application default configuration from /data/mha/conf/app.cnf.. Tue Jan 14 11:36:42 2025 - [info] Reading server configuration from /data/mha/conf/app.cnf.. Tue Jan 14 11:36:43 2025 - [info] GTID failover mode = 0 Tue Jan 14 11:36:43 2025 - [info] Dead Servers: Tue Jan 14 11:36:43 2025 - [info] 10.43.20.100(10.43.20.100:3306) Tue Jan 14 11:36:43 2025 - [info] Alive Servers: Tue Jan 14 11:36:43 2025 - [info] 10.43.20.101(10.43.20.101:3306) Tue Jan 14 11:36:43 2025 - [info] Alive Slaves: Tue Jan 14 11:36:43 2025 - [info] 10.43.20.101(10.43.20.101:3306) Version=8.0.28 (oldest major version between slaves) log-bin:enabled Tue Jan 14 11:36:43 2025 - [info] Replicating from 10.43.20.100(10.43.20.100:3306) Tue Jan 14 11:36:43 2025 - [info] Primary candidate for the new Master (candidate_master is set) Tue Jan 14 11:36:43 2025 - [info] Checking slave configurations.. Tue Jan 14 11:36:43 2025 - [info] Checking replication filtering settings.. Tue Jan 14 11:36:43 2025 - [info] Replication filtering check ok. Tue Jan 14 11:36:43 2025 - [info] Master is down! Tue Jan 14 11:36:43 2025 - [info] Terminating monitoring script. Tue Jan 14 11:36:43 2025 - [info] Got exit code 20 (Master dead). Tue Jan 14 11:36:43 2025 - [info] MHA::MasterFailover version 0.58. Tue Jan 14 11:36:43 2025 - [info] Starting master failover. Tue Jan 14 11:36:43 2025 - [info] Tue Jan 14 11:36:43 2025 - [info] * Phase 1: Configuration Check Phase.. Tue Jan 14 11:36:43 2025 - [info] Tue Jan 14 11:36:44 2025 - [info] GTID failover mode = 0 Tue Jan 14 11:36:44 2025 - [info] Dead Servers: Tue Jan 14 11:36:44 2025 - [info] 10.43.20.100(10.43.20.100:3306) Tue Jan 14 11:36:44 2025 - [info] Checking master reachability via MySQL(double check)... Tue Jan 14 11:36:44 2025 - [info] ok. Tue Jan 14 11:36:44 2025 - [info] Alive Servers: Tue Jan 14 11:36:44 2025 - [info] 10.43.20.101(10.43.20.101:3306) Tue Jan 14 11:36:44 2025 - [info] Alive Slaves: Tue Jan 14 11:36:44 2025 - [info] 10.43.20.101(10.43.20.101:3306) Version=8.0.28 (oldest major version between slaves) log-bin:enabled Tue Jan 14 11:36:44 2025 - [info] Replicating from 10.43.20.100(10.43.20.100:3306) Tue Jan 14 11:36:44 2025 - [info] Primary candidate for the new Master (candidate_master is set) Tue Jan 14 11:36:44 2025 - [info] Starting Non-GTID based failover. Tue Jan 14 11:36:44 2025 - [info] Tue Jan 14 11:36:44 2025 - [info] ** Phase 1: Configuration Check Phase completed. Tue Jan 14 11:36:44 2025 - [info] Tue Jan 14 11:36:44 2025 - [info] * Phase 2: Dead Master Shutdown Phase.. Tue Jan 14 11:36:44 2025 - [info] Tue Jan 14 11:36:44 2025 - [info] Forcing shutdown so that applications never connect to the current master.. Tue Jan 14 11:36:44 2025 - [info] Executing master IP deactivation script: Tue Jan 14 11:36:44 2025 - [info] /data/mha/scripts/master_ip_failover_vip --vip=10.43.20.102 --ssh_port=22 --orig_master_host=10.43.20.100 --orig_master_ip=10.43.20.100 --orig_master_port=3306 --command=stopssh --ssh_user=root Disabling the VIP on old master: 10.43.20.100 Tue Jan 14 11:36:45 2025 - [info] done. Tue Jan 14 11:36:45 2025 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master. Tue Jan 14 11:36:45 2025 - [info] * Phase 2: Dead Master Shutdown Phase completed. Tue Jan 14 11:36:45 2025 - [info] Tue Jan 14 11:36:45 2025 - [info] * Phase 3: Master Recovery Phase.. Tue Jan 14 11:36:45 2025 - [info] Tue Jan 14 11:36:45 2025 - [info] * Phase 3.1: Getting Latest Slaves Phase.. Tue Jan 14 11:36:45 2025 - [info] Tue Jan 14 11:36:45 2025 - [info] The latest binary log file/position on all slaves is mysqld-bin.000005:14752736 Tue Jan 14 11:36:45 2025 - [info] Latest slaves (Slaves that received relay log files to the latest): Tue Jan 14 11:36:45 2025 - [info] 10.43.20.101(10.43.20.101:3306) Version=8.0.28 (oldest major version between slaves) log-bin:enabled Tue Jan 14 11:36:45 2025 - [info] Replicating from 10.43.20.100(10.43.20.100:3306) Tue Jan 14 11:36:45 2025 - [info] Primary candidate for the new Master (candidate_master is set) Tue Jan 14 11:36:45 2025 - [info] The oldest binary log file/position on all slaves is mysqld-bin.000005:14752736 Tue Jan 14 11:36:45 2025 - [info] Oldest slaves: Tue Jan 14 11:36:45 2025 - [info] 10.43.20.101(10.43.20.101:3306) Version=8.0.28 (oldest major version between slaves) log-bin:enabled Tue Jan 14 11:36:45 2025 - [info] Replicating from 10.43.20.100(10.43.20.100:3306) Tue Jan 14 11:36:45 2025 - [info] Primary candidate for the new Master (candidate_master is set) Tue Jan 14 11:36:45 2025 - [info] Tue Jan 14 11:36:45 2025 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase.. Tue Jan 14 11:36:45 2025 - [info] Tue Jan 14 11:36:45 2025 - [info] Fetching dead master's binary logs.. Tue Jan 14 11:36:45 2025 - [info] Executing command on the dead master 10.43.20.100(10.43.20.100:3306): save_binary_logs --command=save --start_file=mysqld-bin.000005 --start_pos=14752736 --binlog_dir=/data/mysqldata/data --output_file=/data/mha/masterha/saved_master_binlog_from_10.43.20.100_3306_20250114113643.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.58 Creating /data/mha/masterha if not exists.. ok. Concat binary/relay logs from mysqld-bin.000005 pos 14752736 to mysqld-bin.000005 EOF into /data/mha/masterha/saved_master_binlog_from_10.43.20.100_3306_20250114113643.binlog .. Binlog Checksum enabled Dumping binlog format description event, from position 0 to 157.. ok. No need to dump effective binlog data from /data/mysqldata/data/mysqld-bin.000005 (pos starts 14752736, filesize 14752736). Skipping. ERROR: Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 31, event_type: 35 Binlog Checksum enabled /data/mha/masterha/saved_master_binlog_from_10.43.20.100_3306_20250114113643.binlog has no effective data events. Event not exists. Tue Jan 14 11:36:45 2025 - [info] Additional events were not found from the orig master. No need to save. Tue Jan 14 11:36:45 2025 - [info] Tue Jan 14 11:36:45 2025 - [info] * Phase 3.3: Determining New Master Phase.. Tue Jan 14 11:36:45 2025 - [info] Tue Jan 14 11:36:45 2025 - [info] Finding the latest slave that has all relay logs for recovering other slaves.. Tue Jan 14 11:36:45 2025 - [info] All slaves received relay logs to the same position. No need to resync each other. Tue Jan 14 11:36:45 2025 - [info] Searching new master from slaves.. Tue Jan 14 11:36:45 2025 - [info] Candidate masters from the configuration file: Tue Jan 14 11:36:45 2025 - [info] 10.43.20.101(10.43.20.101:3306) Version=8.0.28 (oldest major version between slaves) log-bin:enabled Tue Jan 14 11:36:45 2025 - [info] Replicating from 10.43.20.100(10.43.20.100:3306) Tue Jan 14 11:36:45 2025 - [info] Primary candidate for the new Master (candidate_master is set) Tue Jan 14 11:36:45 2025 - [info] Non-candidate masters: Tue Jan 14 11:36:45 2025 - [info] Searching from candidate_master slaves which have received the latest relay log events.. Tue Jan 14 11:36:45 2025 - [info] New master is 10.43.20.101(10.43.20.101:3306) Tue Jan 14 11:36:45 2025 - [info] Starting master failover.. Tue Jan 14 11:36:45 2025 - [info] From: 10.43.20.100(10.43.20.100:3306) (current master) +--10.43.20.101(10.43.20.101:3306) To: 10.43.20.101(10.43.20.101:3306) (new master) Tue Jan 14 11:36:45 2025 - [info] Tue Jan 14 11:36:45 2025 - [info] * Phase 3.4: New Master Diff Log Generation Phase.. Tue Jan 14 11:36:45 2025 - [info] Tue Jan 14 11:36:45 2025 - [info] This server has all relay logs. No need to generate diff files from the latest slave. Tue Jan 14 11:36:45 2025 - [info] Tue Jan 14 11:36:45 2025 - [info] * Phase 3.5: Master Log Apply Phase.. Tue Jan 14 11:36:45 2025 - [info] Tue Jan 14 11:36:45 2025 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed. Tue Jan 14 11:36:45 2025 - [info] Starting recovery on 10.43.20.101(10.43.20.101:3306).. Tue Jan 14 11:36:45 2025 - [info] This server has all relay logs. Waiting all logs to be applied.. Tue Jan 14 11:36:45 2025 - [info] done. Tue Jan 14 11:36:45 2025 - [info] All relay logs were successfully applied. Tue Jan 14 11:36:45 2025 - [info] Getting new master's binlog name and position.. Tue Jan 14 11:36:45 2025 - [info] mysqld-bin.000007:157 Tue Jan 14 11:36:45 2025 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.43.20.101', MASTER_PORT=3306, MASTER_LOG_FILE='mysqld-bin.000007', MASTER_LOG_POS=157, MASTER_USER='repl', MASTER_PASSWORD='xxx'; Tue Jan 14 11:36:45 2025 - [info] Executing master IP activate script: Tue Jan 14 11:36:45 2025 - [info] /data/mha/scripts/master_ip_failover_vip --vip=10.43.20.102 --ssh_port=22 --command=start --ssh_user=root --orig_master_host=10.43.20.100 --orig_master_ip=10.43.20.100 --orig_master_port=3306 --new_master_host=10.43.20.101 --new_master_ip=10.43.20.101 --new_master_port=3306 --new_master_user='mha' --new_master_password=xxx Enabling the VIP - 10.43.20.102 on the new master - 10.43.20.101 Set read_only=0 on the new master. Creating app user on the new master.. Tue Jan 14 11:36:46 2025 - [info] OK. Tue Jan 14 11:36:46 2025 - [info] ** Finished master recovery successfully. Tue Jan 14 11:36:46 2025 - [info] * Phase 3: Master Recovery Phase completed. Tue Jan 14 11:36:46 2025 - [info] Tue Jan 14 11:36:46 2025 - [info] * Phase 4: Slaves Recovery Phase.. Tue Jan 14 11:36:46 2025 - [info] Tue Jan 14 11:36:46 2025 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase.. Tue Jan 14 11:36:46 2025 - [info] Tue Jan 14 11:36:46 2025 - [info] Generating relay diff files from the latest slave succeeded. Tue Jan 14 11:36:46 2025 - [info] Tue Jan 14 11:36:46 2025 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase.. Tue Jan 14 11:36:46 2025 - [info] Tue Jan 14 11:36:46 2025 - [info] All new slave servers recovered successfully. Tue Jan 14 11:36:46 2025 - [info] Tue Jan 14 11:36:46 2025 - [info] * Phase 5: New master cleanup phase.. Tue Jan 14 11:36:46 2025 - [info] Tue Jan 14 11:36:46 2025 - [info] Resetting slave info on the new master.. Tue Jan 14 11:36:46 2025 - [info] 10.43.20.101: Resetting slave info succeeded. Tue Jan 14 11:36:46 2025 - [info] Master failover to 10.43.20.101(10.43.20.101:3306) completed successfully. Tue Jan 14 11:36:46 2025 - [info] ----- Failover Report ----- app: MySQL Master failover 10.43.20.100(10.43.20.100:3306) to 10.43.20.101(10.43.20.101:3306) succeeded Master 10.43.20.100(10.43.20.100:3306) is down! Check MHA Manager logs at node02:/data/mha/manager/manager.log for details. Started automated(non-interactive) failover. Invalidated master IP address on 10.43.20.100(10.43.20.100:3306) The latest slave 10.43.20.101(10.43.20.101:3306) has all relay logs for recovery. Selected 10.43.20.101(10.43.20.101:3306) as a new master. 10.43.20.101(10.43.20.101:3306): OK: Applying all logs succeeded. 10.43.20.101(10.43.20.101:3306): OK: Activated master IP address. Generating relay diff files from the latest slave succeeded. 10.43.20.101(10.43.20.101:3306): Resetting slave info succeeded. Master failover to 10.43.20.101(10.43.20.101:3306) completed successfully. 免密脚本: #!/usr/bin/env bash ## Config to do nodes="node11,node12,node13,node14,node15" # 可以在这里添加其他节点,用逗号隔开 user=test ## Please Don't edit content below ssh-keygen -q -P "" -f $HOME/.ssh/id_rsa > /dev/null # 将节点列表转换为数组 nodes_array=(${nodes//,/ }) # 定义一个函数来处理每个节点 setup_ssh_trust() { local node=$1 if [ "`hostname`" == "$node" ]; then ssh-copy-id -o StrictHostKeyChecking=no $user@$node > /dev/null else ssh-copy-id -o StrictHostKeyChecking=no $user@$node > /dev/null ssh $node 'ssh-keygen -q -P "" -f $HOME/.ssh/id_rsa' > /dev/null scp -rp $node:$HOME/.ssh/id_rsa.pub ./auth.$node > /dev/null fi } # 处理所有节点 for node in "${nodes_array[@]}" do setup_ssh_trust $node done # 合并所有公钥到 authorized_keys cat ./auth.* >> $HOME/.ssh/authorized_keys rm -rf ./auth.* echo "Transfer authorized_keys" # 将 authorized_keys 和 known_hosts 传输到所有节点 for node in "${nodes_array[@]}" do if [ "`hostname`" != "$node" ]; then scp -rp $HOME/.ssh/authorized_keys $user@$node:$HOME/.ssh/authorized_keys scp -rp $HOME/.ssh/known_hosts $user@$node:$HOME/.ssh/known_hosts fi done exit 0
标签:info,Jan,14,mha,centos7,2025,10.43,Tue,节点 From: https://www.cnblogs.com/elanjie/p/18670480