MySQL MHA高可用
什么是MHA
MHA(Master High Availability)是用于提高数据库集群可用性的方案,基于虚拟IP技术,可以实现在Master故障时,在30s内实现故障转移并确保数据一致性。可以理解为MHA就是作用于数据库集群的KeepAlived。
MHA工作原理
MHA的组成
MHA由MHA Manager和MHA Node组成,其中Manager相当于服务端,Node相当于客户端;
Manager可以部署在Master或Slave机器上,也可以单独的部署在一台机器上;一般建议部署在一台单独的机器上,因为Manager是用于探测数据库实例是否在线和进行故障转移的核心程序,如果部署在Master上,一旦Master机器报销,整个数据库集群的不再可以;另外,如果部署在Slave机器上,那么这台机器就不能够被提升为Master。
Node需要部署在所有MySQL数据库实例的机器上。

MHA故障转移流程
- Manager会每隔一段时间(可以由用户在配置文件中指定)对主库进行一次探测;
- 如果Manager检测到Master故障,会执行:
- 尝试从其他Node发起ssh链接;
- 尝试从其他Node发起mysql链接。
- 如果两种链接皆失败,则开始进行故障转移:
- 通过对比relay log,找到拥有最新数据的从库;
- 将最新的从库中的新数据同步到其他从库中;
- 提升从库为新的主库(如果没有预先指定,则提升拥有最新数据的从库为新主库);
- 通过原主库的binlog补全新主库的数据;
- 其他从库CHANGE MASTER TO新的主库,并进行数据同步。
部署MHA
环境准备
使用三台机器,其中MySQL一主一从,Manager单独布置在一台管理机器上。
| 身份 | 主机名 | 公网ip | 内网ip | 开启binlog | 开启GTID |
|---|---|---|---|---|---|
| Master | db-51 | 10.0.0.51 | 172.16.1.51 | yes | yes |
| Slave | db-52 | 10.0.0.52 | 172.16.1.52 | yes | yes |
| Manager | master-61 | 10.0.0.61 | 172.16.1.61 | - | - |
配置文件
[mysqld]port=3306user=mysqlbasedir=/opt/mysqldatadir=/mysql_data/mysql_3306socket=/tmp/mysql.sock
server_id=51log_bin=/mysql_data/binlog/mysql-bin
autocommit=0binlog_format=rowgtid-mode=onenforce-gtid-consistency=truelog-slave-updates=1
[mysql]socket=/tmp/mysql.sock除了server_id其他都一样
配置主从复制
创建复制账号
--在master机器上执行--grant replication slave on*.* to repl@'172.16.1.%' identified by '123456';从库进行连接
change master to master_host='172.16.1.51', master_user='repl', master_password='123456' , MASTER_AUTO_POSITION=1;
start slave;使三个节点之间互相免密登录:
ssh-keygenssh-copy-id root@172.16.1.51
ssh-keygenssh-copy-id root@172.16.1.52
ssh-keygenssh-copy-id root@172.16.1.61安装MHA
为所有节点安装环境依赖
yum install -y perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker perl-CPAN perl-Time-HiRes三个节点都安装MHA Node
# 获得软件源wget https://github.com/yoshinorim/mha4mysql-node/releases/download/v0.58/mha4mysql-node-0.58-0.el7.centos.noarch.rpm
# 安装yum localinstall -y mha4mysql-node-0.58-0.el7.centos.noarch.rpm安装MHA Manager
MHA Manager Github仓库
这里由于只用了两台机器,因此Manager必须装在另外单独的一台主机上,否则对于一主一从配置来说,装在主库上,挂掉后会导致MHA失效;装在从库上,会导致无法提升为主库;
# 获取软件源wget https://github.com/yoshinorim/mha4mysql-manager/releases/download/v0.58/mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
# 安装yum localinstall -y mha4mysql-manager-0.58-0.el7.centos.noarch.rpm配置MHA
为所有节点创建MHA用户
因为目前已经配置好了主从复制,所以在主库上创建就可以了,会自动同步到从库。
grant all privileges on *.* to mha@'%' identified by '123456';创建MHA Manager配置文件
创建目录
mkdir -p /etc/mhamkdir -p /var/log/mha/app1配置文件
[server default]manager_log=/var/log/mha/app1/manager.logmanager_workdir=/var/log/mha/app1.logmaster_binlog_dir=/mysql_binlog/
# 这里涉及到vip偏移脚本,在后面# master_ip_failover_script=/usr/local/bin/master_ip_failover
user=mhapassword=123456ping_interval=5repl_user=replrepl_password=123456ssh_user=root
[server1]hostname=172.16.1.51port=3306
[server2]hostname=172.16.1.52port=3306检查环境配置
# 检查ssh免密登录masterha_check_ssh --conf=/etc/mha/app1.cnf
# 检查主从复制masterha_check_repl --conf=/etc/mha/app1.cnf开发Virtual IP漂移脚本
脚本路径:/usr/local/bin/master_ip_failover
#!/usr/bin/env perl
use strict;use warnings FATAL => 'all';
use Getopt::Long;
my ( $command, $ssh_user, $orig_master_host, $orig_master_ip, $orig_master_port, $new_master_host, $new_master_ip, $new_master_port);
my $vip = '172.16.1.55/24';my $key = '1';my $ssh_start_vip = "/sbin/ifconfig eth1:$key $vip";my $ssh_stop_vip = "/sbin/ifconfig eth1:$key down";my $ssh_Bcast_arp="/sbin/arping -I eth1 -c 3 -A 10.0.0.55";
GetOptions( 'command=s' => \$command, 'ssh_user=s' => \$ssh_user, 'orig_master_host=s' => \$orig_master_host, 'orig_master_ip=s' => \$orig_master_ip, 'orig_master_port=i' => \$orig_master_port, 'new_master_host=s' => \$new_master_host, 'new_master_ip=s' => \$new_master_ip, 'new_master_port=i' => \$new_master_port,);
exit &main();
sub main {
print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
my $exit_code = 1; eval { print "Disabling the VIP on old master: $orig_master_host \n"; &stop_vip(); $exit_code = 0; }; if ($@) { warn "Got Error: $@\n"; exit $exit_code; } exit $exit_code; } elsif ( $command eq "start" ) {
my $exit_code = 10; eval { print "Enabling the VIP - $vip on the new master - $new_master_host \n"; &start_vip(); $exit_code = 0; }; if ($@) { warn $@; exit $exit_code; } exit $exit_code; } elsif ( $command eq "status" ) { print "Checking the Status of the script.. OK \n"; exit 0; } else { &usage(); exit 1; }}
sub start_vip() { `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;}sub stop_vip() { return 0 unless ($ssh_user); `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;}
sub usage { print "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";}增加可执行权限:
chmod +x /usr/local/bin/master_ip_failover将脚本路径加入到配置文件中:
master_ip_failover_script=/usr/local/bin/master_ip_failover
配置虚拟IP
在db-51机器上创建虚拟IP
# 这里的虚拟IP必须是和perl脚本上的一致,并且虚拟ip必须是网段内未被占用的IPifconfig eth1:1 172.16.1.55/24重启MHA
# 停止MHAmasterha_stop --conf=/etc/mha/app1.cnf
# 启动MHA(后台运行)nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover /var/log/mha/app1/manager.log 2>&1 &
# 检查MHA运行状态masterha_check_status --conf=/etc/mha/app1.cnfapp1 (pid:2220) is running(0:PING_OK), master:172.16.1.51此时MHA就已经部署完毕了。
模拟故障
实时查看manager日志:
tail -f /var/log/mha/app1/manager.log停止当前mysql主库
systemctl stop mysqld此时查看日志,可以发现manager尝试多次连接mysql主库失败,manager进行了主库身份转移。
Tue Aug 22 15:56:07 2023 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)Tue Aug 22 15:56:07 2023 - [info] Executing SSH check script: exit 0Tue Aug 22 15:56:07 2023 - [info] HealthCheck: SSH to 172.16.1.51 is reachable.Tue Aug 22 15:56:12 2023 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.1.51' (111))Tue Aug 22 15:56:12 2023 - [warning] Connection failed 2 time(s)..Tue Aug 22 15:56:17 2023 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.1.51' (111))Tue Aug 22 15:56:17 2023 - [warning] Connection failed 3 time(s)..Tue Aug 22 15:56:22 2023 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.1.51' (111))Tue Aug 22 15:56:22 2023 - [warning] Connection failed 4 time(s)..Tue Aug 22 15:56:22 2023 - [warning] Master is not reachable from health checker!Tue Aug 22 15:56:22 2023 - [warning] Master 172.16.1.51(172.16.1.51:3306) is not reachable!Tue Aug 22 15:56:22 2023 - [warning] SSH is reachable.Tue Aug 22 15:56:22 2023 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mha/app1.cnf again, and trying to connect to all servers to check server status..Tue Aug 22 15:56:22 2023 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.Tue Aug 22 15:56:22 2023 - [info] Reading application default configuration from /etc/mha/app1.cnf..Tue Aug 22 15:56:22 2023 - [info] Reading server configuration from /etc/mha/app1.cnf..Tue Aug 22 15:56:23 2023 - [info] GTID failover mode = 1Tue Aug 22 15:56:23 2023 - [info] Dead Servers:Tue Aug 22 15:56:23 2023 - [info] 172.16.1.51(172.16.1.51:3306)Tue Aug 22 15:56:23 2023 - [info] Alive Servers:Tue Aug 22 15:56:23 2023 - [info] 172.16.1.52(172.16.1.52:3306)Tue Aug 22 15:56:23 2023 - [info] Alive Slaves:Tue Aug 22 15:56:23 2023 - [info] 172.16.1.52(172.16.1.52:3306) Version=5.7.28-log (oldest major version between slaves) log-bin:enabledTue Aug 22 15:56:23 2023 - [info] GTID ONTue Aug 22 15:56:23 2023 - [info] Replicating from 172.16.1.51(172.16.1.51:3306)Tue Aug 22 15:56:23 2023 - [info] Checking slave configurations..Tue Aug 22 15:56:23 2023 - [info] read_only=1 is not set on slave 172.16.1.52(172.16.1.52:3306).Tue Aug 22 15:56:23 2023 - [info] Checking replication filtering settings..Tue Aug 22 15:56:23 2023 - [info] Replication filtering check ok.Tue Aug 22 15:56:23 2023 - [info] Master is down!Tue Aug 22 15:56:23 2023 - [info] Terminating monitoring script.Tue Aug 22 15:56:23 2023 - [info] Got exit code 20 (Master dead).Tue Aug 22 15:56:23 2023 - [info] MHA::MasterFailover version 0.58.Tue Aug 22 15:56:23 2023 - [info] Starting master failover.Tue Aug 22 15:56:23 2023 - [info]Tue Aug 22 15:56:23 2023 - [info] * Phase 1: Configuration Check Phase..Tue Aug 22 15:56:23 2023 - [info]Tue Aug 22 15:56:24 2023 - [info] GTID failover mode = 1Tue Aug 22 15:56:24 2023 - [info] Dead Servers:Tue Aug 22 15:56:24 2023 - [info] 172.16.1.51(172.16.1.51:3306)Tue Aug 22 15:56:24 2023 - [info] Checking master reachability via MySQL(double check)...Tue Aug 22 15:56:24 2023 - [info] ok.Tue Aug 22 15:56:24 2023 - [info] Alive Servers:Tue Aug 22 15:56:24 2023 - [info] 172.16.1.52(172.16.1.52:3306)Tue Aug 22 15:56:24 2023 - [info] Alive Slaves:Tue Aug 22 15:56:24 2023 - [info] 172.16.1.52(172.16.1.52:3306) Version=5.7.28-log (oldest major version between slaves) log-bin:enabledTue Aug 22 15:56:24 2023 - [info] GTID ONTue Aug 22 15:56:24 2023 - [info] Replicating from 172.16.1.51(172.16.1.51:3306)Tue Aug 22 15:56:24 2023 - [info] Starting GTID based failover.Tue Aug 22 15:56:24 2023 - [info]Tue Aug 22 15:56:24 2023 - [info] ** Phase 1: Configuration Check Phase completed.Tue Aug 22 15:56:24 2023 - [info]Tue Aug 22 15:56:24 2023 - [info] * Phase 2: Dead Master Shutdown Phase..Tue Aug 22 15:56:24 2023 - [info]Tue Aug 22 15:56:24 2023 - [info] Forcing shutdown so that applications never connect to the current master..Tue Aug 22 15:56:24 2023 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master IP address.Tue Aug 22 15:56:24 2023 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.Tue Aug 22 15:56:25 2023 - [info] * Phase 2: Dead Master Shutdown Phase completed.Tue Aug 22 15:56:25 2023 - [info]Tue Aug 22 15:56:25 2023 - [info] * Phase 3: Master Recovery Phase..Tue Aug 22 15:56:25 2023 - [info]Tue Aug 22 15:56:25 2023 - [info] * Phase 3.1: Getting Latest Slaves Phase..Tue Aug 22 15:56:25 2023 - [info]Tue Aug 22 15:56:25 2023 - [info] The latest binary log file/position on all slaves is mysql-bin.000002:725Tue Aug 22 15:56:25 2023 - [info] Retrieved Gtid Set: 06fdc979-3cb2-11ee-8345-000c29948844:1-2Tue Aug 22 15:56:25 2023 - [info] Latest slaves (Slaves that received relay log files to the latest):Tue Aug 22 15:56:25 2023 - [info] 172.16.1.52(172.16.1.52:3306) Version=5.7.28-log (oldest major version between slaves) log-bin:enabledTue Aug 22 15:56:25 2023 - [info] GTID ONTue Aug 22 15:56:25 2023 - [info] Replicating from 172.16.1.51(172.16.1.51:3306)Tue Aug 22 15:56:25 2023 - [info] The oldest binary log file/position on all slaves is mysql-bin.000002:725Tue Aug 22 15:56:25 2023 - [info] Retrieved Gtid Set: 06fdc979-3cb2-11ee-8345-000c29948844:1-2Tue Aug 22 15:56:25 2023 - [info] Oldest slaves:Tue Aug 22 15:56:25 2023 - [info] 172.16.1.52(172.16.1.52:3306) Version=5.7.28-log (oldest major version between slaves) log-bin:enabledTue Aug 22 15:56:25 2023 - [info] GTID ONTue Aug 22 15:56:25 2023 - [info] Replicating from 172.16.1.51(172.16.1.51:3306)Tue Aug 22 15:56:25 2023 - [info]Tue Aug 22 15:56:25 2023 - [info] * Phase 3.3: Determining New Master Phase..Tue Aug 22 15:56:25 2023 - [info]Tue Aug 22 15:56:25 2023 - [info] Searching new master from slaves..Tue Aug 22 15:56:25 2023 - [info] Candidate masters from the configuration file:Tue Aug 22 15:56:25 2023 - [info] Non-candidate masters:Tue Aug 22 15:56:25 2023 - [info] New master is 172.16.1.52(172.16.1.52:3306)Tue Aug 22 15:56:25 2023 - [info] Starting master failover..Tue Aug 22 15:56:25 2023 - [info]From:172.16.1.51(172.16.1.51:3306) (current master) +--172.16.1.52(172.16.1.52:3306)
To:172.16.1.52(172.16.1.52:3306) (new master)Tue Aug 22 15:56:25 2023 - [info]Tue Aug 22 15:56:25 2023 - [info] * Phase 3.3: New Master Recovery Phase..Tue Aug 22 15:56:25 2023 - [info]Tue Aug 22 15:56:25 2023 - [info] Waiting all logs to be applied..Tue Aug 22 15:56:25 2023 - [info] done.Tue Aug 22 15:56:25 2023 - [info] Getting new master's binlog name and position..Tue Aug 22 15:56:25 2023 - [info] mysql-bin.000001:725Tue Aug 22 15:56:25 2023 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.16.1.52', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';Tue Aug 22 15:56:25 2023 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000001, 725, 06fdc979-3cb2-11ee-8345-000c29948844:1-2Tue Aug 22 15:56:25 2023 - [warning] master_ip_failover_script is not set. Skipping taking over new master IP address.Tue Aug 22 15:56:25 2023 - [info] ** Finished master recovery successfully.Tue Aug 22 15:56:25 2023 - [info] * Phase 3: Master Recovery Phase completed.Tue Aug 22 15:56:25 2023 - [info]Tue Aug 22 15:56:25 2023 - [info] * Phase 4: Slaves Recovery Phase..Tue Aug 22 15:56:25 2023 - [info]Tue Aug 22 15:56:25 2023 - [info]Tue Aug 22 15:56:25 2023 - [info] * Phase 4.1: Starting Slaves in parallel..Tue Aug 22 15:56:25 2023 - [info]Tue Aug 22 15:56:25 2023 - [info] All new slave servers recovered successfully.Tue Aug 22 15:56:25 2023 - [info]Tue Aug 22 15:56:25 2023 - [info] * Phase 5: New master cleanup phase..Tue Aug 22 15:56:25 2023 - [info]Tue Aug 22 15:56:25 2023 - [info] Resetting slave info on the new master..Tue Aug 22 15:56:25 2023 - [info] 172.16.1.52: Resetting slave info succeeded.Tue Aug 22 15:56:25 2023 - [info] Master failover to 172.16.1.52(172.16.1.52:3306) completed successfully.Tue Aug 22 15:56:25 2023 - [info] Deleted server1 entry from /etc/mha/app1.cnf .Tue Aug 22 15:56:25 2023 - [info]结果报告:
----- Failover Report -----
app1: MySQL Master failover 172.16.1.51(172.16.1.51:3306) to 172.16.1.52(172.16.1.52:3306) succeeded
Master 172.16.1.51(172.16.1.51:3306) is down!
Check MHA Manager logs at master-61:/var/log/mha/app1/manager.log for details.
Started automated(non-interactive) failover.Selected 172.16.1.52(172.16.1.52:3306) as a new master.172.16.1.52(172.16.1.52:3306): OK: Applying all logs succeeded.172.16.1.52(172.16.1.52:3306): Resetting slave info succeeded.Master failover to 172.16.1.52(172.16.1.52:3306) completed successfully.在完成故障转移后,可以看到现在的master是db-52;同时,manager还修改了自己的配置文件,挂掉的原master被删除
[server default]manager_log=/var/log/mha/app1/manager.logmanager_workdir=/var/log/mha/app1.logmaster_binlog_dir=/mysql_binlog/password=123456ping_interval=5repl_password=123456repl_user=replssh_user=rootuser=mha
[server2]hostname=172.16.1.52port=3306除此之外,通过masterha_check_status命令还可以知道,一旦mha完成主从切换,manager管理进程就会自动终止。
[root@master-61 ~]#masterha_check_status --conf=/etc/mha/app1.cnfapp1 is stopped(2:NOT_RUNNING).MHA故障修复
方法比较简单
- 重启挂掉的mysql数据库
- 通过
CHANGE MASTER TO使其成为新的主库的从库。 - 修复MHA Manager的配置文件
db-51重新加入主从复制
systemctl start mysqld
change master to master_host='172.16.1.52', master_user='repl', master_password='123456' , MASTER_AUTO_POSITION=1;
start slave;可以在db-52上进行确认:
mysql> show slave hosts;+-----------+------+------+-----------+--------------------------------------+| Server_id | Host | Port | Master_id | Slave_UUID |+-----------+------+------+-----------+--------------------------------------+| 51 | | 3306 | 52 | 06fdc979-3cb2-11ee-8345-000c29948844 |+-----------+------+------+-----------+--------------------------------------+1 row in set (0.00 sec)修复MHA Manager配置文件
[server default]manager_log=/var/log/mha/app1/manager.logmanager_workdir=/var/log/mha/app1.logmaster_binlog_dir=/mysql_binlog/password=123456ping_interval=5repl_password=123456repl_user=replssh_user=rootuser=mha
[server1]hostname=172.16.1.51port=3306
[server2]hostname=172.16.1.52port=3306再次确认ssh链接和repl主从复制状态后,重新启动MHA就可以了。
支持与分享
如果这篇文章对你有帮助,欢迎分享给更多人或赞助支持!
部分内容可能已过时