Home > Database > Mysql Tutorial > Detailed explanation of MySQL high availability solution MMM

Detailed explanation of MySQL high availability solution MMM

黄舟
Release: 2017-10-04 09:26:27
Original
1954 people have browsed it

MySQL itself does not provide a replication failover solution. Server failover can be achieved through the MMM solution, thereby achieving high availability of mysql. MMM not only provides the function of floating IP, but also if the current master server hangs up, your back-end slave server will automatically be transferred to the new master server for synchronous replication without having to manually change the synchronization configuration

一, MMM introduction:

MMM is Multi-Master Replication Manager for MySQL: mysql multi-master replication manager, based on perl implementation, a set of monitoring, failover and management of mysql master-master replication configuration Scalable script suite (only one node can be written to at any time), MMM can also read load balance the slave servers, so it can be used to start virtual IPs on a group of servers for replication, among other things , it also has scripts to implement data backup and resynchronization functions between nodes. MySQL itself does not provide a replication failover solution. Server failover can be achieved through the MMM solution, thereby achieving high availability of MySQL. MMM not only provides the function of floating IP, but also if the current master server hangs up, your back-end slave server will automatically be transferred to the new master server for synchronous replication without having to manually change the synchronization configuration. This solution is currently a relatively mature solution. For details, please see the official website: http://mysql-mmm.org

##Advantages: High availability, good scalability, automatic switching in case of failure, for Master-master synchronization only provides one database write operation at the same time to ensure data consistency. When the master server hangs up, another master immediately takes over, and other slave servers can automatically switch without manual intervention.

Disadvantages: The monitor node is a single point, but you can also combine this with keepalived or haertbeat to make it highly available; there are at least three nodes, which requires the number of hosts and needs to be read and written. Separation requires writing a read-write separation program on the front end. The performance is not very stable in business systems that are very busy in reading and writing, and problems such as replication delays and switching failures may occur. The MMM solution is not very suitable for environments with high data security requirements and busy reading and writing.

Applicable scenarios:

The applicable scenarios for MMM are scenarios where the database access is large and reading and writing can be separated.

The main functions of Mmm are provided by the following three scripts:
mmm_mond is the monitoring daemon process responsible for all monitoring work and determines the removal of nodes (mmm_mond process performs regular heartbeat detection, and if it fails, the write ip will be floated to another master ) etc.
mmm_agentd is an agent daemon running on the mysql server, and is provided to the monitoring node through a simple remote service set
mmm_control manages the mmm_mond process through the command line
During the entire monitoring process, you need to add it to mysql Relevant authorized users include an mmm_monitor user and an mmm_agent user. If you want to use mmm's backup tool, you must also add an mmm_tools user.

2. Deployment and implementation

1. Environment introduction

OS: centos7.2 (64-bit) Database system: mysql5.7.13

Close selinux

Configure ntp, synchronize time

##roleIPhostnameServer-id##Master1##master2 2##Slave1192.168.31.250192.168.31.4slave24monitormonitor1

##Write vip

Read vip

192.168.31.83

master1

1

##192.168.31.2


Master2(backup)
192.168.31.141

#192.168.31.3


slave1

##3


Slave2

##192.168.31.225

##192.168.31.5


192.168.31.106

none

2. Configure the /etc/hosts file on all hosts and add the following content:

192.168.31.83 master1
192.168.31.141 master2
192.168.31.250 slave1
192.168.31.225 slave2
192.168.31.106 monitor1

Install perl, perl-develperl-CPAN libart_lgpl.x86_64 rrdtool.x86_64 rrdtool-perl.x86_64 package on all hosts
#yum -y install perl-* libart_lgpl .x86_64 rrdtool.x86_64 rrdtool-perl.x86_64

Note: Use centos7 online yum source installation

Install perl related libraries

#cpan - i Algorithm::Diff Class::Singleton DBI DBD::mysql Log::Dispatch Log::Log4perl Mail::Send Net::Ping Proc::Daemon Time::HiRes Params::Validate Net::ARP

3. Install mysql5.7 and configure replication on the master1, master2, slave1, and slave2 hosts.

master1 and master2 are mutually master and slave, and slave1 and slave2 are the slaves of master1.
In each mysql Add the following content to the configuration file /etc/my.cnf. Note that the server_id cannot be repeated.

master1 host:


log-bin = mysql-bin
binlog_format = mixed
server-id = 1
relay-log = relay-bin
relay-log-index = slave-relay-bin.index
log-slave-updates = 1
auto-increment-increment = 2
auto-increment-offset = 1
master2主机:
log-bin = mysql-bin
binlog_format = mixed
server-id = 2
relay-log = relay-bin
relay-log-index = slave-relay-bin.index
log-slave-updates = 1
auto-increment-increment = 2
auto-increment-offset = 2
slave1主机:
server-id = 3
relay-log = relay-bin
relay-log-index = slave-relay-bin.index
read_only  = 1
slave2主机:
server-id = 4
relay-log = relay-bin
relay-log-index = slave-relay-bin.index
read_only  = 1
Copy after login

After completing the modification to my.cnf, restart the mysql service through systemctl restart mysqld

If you want to enable the firewall on the 4 database hosts, you must either turn off the firewall or create access rules:

firewall-cmd --permanent --add-port=3306/tcp
firewall-cmd --reload
Master-slave configuration (master1 and master2 are configured as master-master, slave1 and slave2 are configured as slaves of master1):
Authorization on master1:

mysql> grant replication slave on *.* to rep@'192.168.31.%' identified by '123456';
Copy after login
Copy after login


Authorization on master2:

mysql> grant replication slave on *.* to rep@'192.168.31.%' identified by '123456';
Copy after login
Copy after login


Configure master2, slave1 and slave2 as slave libraries of master1:
Execute show master status on master1; obtain the binlog file and Position point

mysql> show master status;
+------------------+----------+--------------+------------------+--------------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+---------------------------------------------------+
| mysql-bin.000001 | 452 | | | |
+------------------+----------+--------------+------------------+-----------------------------------------------------+
Copy after login

Execute on master2, slave1 and slave2

mysql> change master to master_host='192.168.31.83',master_port=3306,master_user='rep',master_password='123456',master_log_file='mysql-bin.000001',master_log_pos=452;
mysql>slave start;
Copy after login

Verify master-slave replication:
master2 host:

mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.31.83
Master_User: rep
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 452
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 320
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Copy after login
Copy after login
Copy after login

slave1 host:

mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.31.83
Master_User: rep
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 452
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 320
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Copy after login
Copy after login
Copy after login

slave2 host:

mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.31.83
Master_User: rep
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 452
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 320
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Copy after login
Copy after login
Copy after login

If Slave_IO_Running and Slave_SQL_Running are both yes, then the master-slave configuration is OK
Configure master1 as the slave library of master2:
Execute on master2 show master status; Get binlog file and Position point

mysql> show master status;
+------------------+----------+--------------+------------------+--------------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+---------------------------------------------------+
| mysql-bin.000001 | 452 | | | |
+------------------+----------+--------------+------------------+----------------------------------------------------+
Copy after login

Execute on master1:

mysql> change master to master_host='192.168.31.141',master_port=3306,master_user='rep',master_password='123456',master_log_file='mysql-bin.000001',master_log_pos=452;
mysql> start slave;
Copy after login

Verify master-slave replication:
master1 host:

mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.31.141
Master_User: rep
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 452
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 320
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Copy after login

If Slave_IO_Running and Slave_SQL_Running are both yes, then the master-slave configuration is OK
4. Mysql-mmm configuration:
Create users on 4 mysql nodes
Create agent account:

mysql> grant super,replicationclient,process on *.* to 'mmm_agent'@'192.168.31.%' identified by '123456';
Copy after login


Create a monitoring account:

mysql> grant replication client on *.* to 'mmm_monitor'@'192.168.31.%' identified by '123456';
Copy after login


Note 1: Because the previous master-slave replication and the master-slave are already ok, I It will be ok if executed on the master1 server.
Check whether monitoring and proxy accounts exist on master2, slave1, and slave2 db

mysql> select user,host from mysql.user where user in ('mmm_monitor','mmm_agent');
+-------------+----------------------------+
| user | host |
+-------------+----------------------------+
| mmm_agent | 192.168.31.% |
| mmm_monitor | 192.168.31.% |
+-------------+------------------------------+
Copy after login

or

mysql> show grants for 'mmm_agent'@'192.168.31.%';
+-----------------------------------------------------------------------------------------------------------------------------+
| Grants for mmm_agent@192.168.31.% |
+-----------------------------------------------------------------------------------------------------------------------------+
| GRANT PROCESS, SUPER, REPLICATION CLIENT ON *.* TO 'mmm_agent'@'192.168.31.%' |
+-----------------------------------------------------------------------------------------------------------------------------+
mysql> show grants for 'mmm_monitor'@'192.168.31.%';
+-----------------------------------------------------------------------------------------------------------------------------+
| Grants for mmm_monitor@192.168.31.% |
+-----------------------------------------------------------------------------------------------------------------------------+
| GRANT REPLICATION CLIENT ON *.* TO 'mmm_monitor'@'192.168.31.%' |
Copy after login

Note 2:
mmm_monitor user: mmm monitoring is used to check the health of the mysql server process
mmm_agent user: mmm agent is used to change the read-only mode, replicated main server, etc.
5. Mysql-mmm installation
On the monitor host (192.168.31.106) Install the monitoring program on the database server

cd /tmp
wget http://pkgs.fedoraproject.org/repo/pkgs/mysql-mmm/mysql-mmm-2.2.1.tar.gz/f5f8b48bdf89251d3183328f0249461e/mysql-mmm-2.2.1.tar.gz
tar -zxf mysql-mmm-2.2.1.tar.gz
cd mysql-mmm-2.2.1
make install
Copy after login
Copy after login

Install the agent on the database server (master1, master2, slave1, slave2)

cd /tmp
wget http://pkgs.fedoraproject.org/repo/pkgs/mysql-mmm/mysql-mmm-2.2.1.tar.gz/f5f8b48bdf89251d3183328f0249461e/mysql-mmm-2.2.1.tar.gz
tar -zxf mysql-mmm-2.2.1.tar.gz
cd mysql-mmm-2.2.1
make install
Copy after login
Copy after login

6. Configure mmm

Writing configuration files, the five hosts must be consistent:
After completing the installation, all configuration files are placed under /etc/mysql-mmm/. The management server and database server must contain a common file mmm_common.conf, the content is as follows:
active_master_rolewriter#Active master role indicator, all db servers must enable the read_only parameter, and the writer server monitoring agent will automatically set read_only Property is closed.

<host default>
cluster_interfaceeno16777736#群集的网络接口
pid_path /var/run/mmm_agentd.pid#pid路径
bin_path /usr/lib/mysql-mmm/#可执行文件路径
replication_user rep#复制用户
replication_password 123456#复制用户密码
agent_usermmm_agent#代理用户
agent_password 123456#代理用户密码
</host>
<host master1>#master1的host名
ip 192.168.31.83#master1的ip
mode master#角色属性,master代表是主
peer master2#与master1对等的服务器的host名,也就是master2的服务器host名
</host>
<host master2>#和master的概念一样
ip 192.168.31.141
mode master
peer master1
</host>
<host slave1>#从库的host名,如果存在多个从库可以重复一样的配置
ip 192.168.31.250#从的ip
mode slave#slave的角色属性代表当前host是从
</host>
<host slave2>#和slave的概念一样
ip 192.168.31.225
mode slave
</host>
<role writer>#writer角色配置
Copy after login

hosts master1,master2#能进行写操作的服务器的host名,如果不想切换写操作这里可以只配置master,这样也可以避免因为网络延时而进行write的切换,但是一旦master出现故障那么当前的MMM就没有writer了只有对外的read操作。
ips 192.168.31.2#对外提供的写操作的虚拟IP
mode exclusive#exclusive代表只允许存在一个主,也就是只能提供一个写的IP

#read角色配置
hosts master2,slave1,slave2#对外提供读操作的服务器的host名,当然这里也可以把master加进来
ips 192.168.31.3, 192.168.31.4, 192.168.31.5#对外提供读操作的虚拟ip,这三个ip和host不是一一对应的,并且ips也hosts的数目也可以不相同,如果这样配置的话其中一个hosts会分配两个ip
mode balanced#balanced代表负载均衡

同时将这个文件拷贝到其它的服务器,配置不变
#for host in master1 master2 slave1 slave2 ; do scp /etc/mysql-mmm/mmm_common.conf $host:/etc/mysql-mmm/ ; done
代理文件配置
编辑 4台mysql节点机上的/etc/mysql-mmm/mmm_agent.conf
在数据库服务器上,还有一个mmm_agent.conf需要修改,其内容是:
includemmm_common.conf
this master1
注意:这个配置只配置db服务器,监控服务器不需要配置,this后面的host名改成当前服务器的主机名。
启动代理进程
在 /etc/init.d/mysql-mmm-agent的脚本文件的#!/bin/sh下面,加入如下内容
source /root/.bash_profile
添加成系统服务并设置为自启动

#chkconfig --add mysql-mmm-agent
#chkconfigmysql-mmm-agent on
#/etc/init.d/mysql-mmm-agent start
Copy after login

注:添加source /root/.bash_profile目的是为了mysql-mmm-agent服务能启机自启。
自动启动和手动启动的唯一区别,就是激活一个console 。那么说明在作为服务启动的时候,可能是由于缺少环境变量
服务启动失败,报错信息如下:

Daemon bin: &#39;/usr/sbin/mmm_agentd&#39;
Daemon pid: &#39;/var/run/mmm_agentd.pid&#39;
Starting MMM Agent daemon... Can&#39;t locate Proc/Daemon.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /usr/sbin/mmm_agentd line 7.
BEGIN failed--compilation aborted at /usr/sbin/mmm_agentd line 7.
failed
Copy after login

解决方法:

# cpanProc::Daemon
# cpan Log::Log4perl
# /etc/init.d/mysql-mmm-agent start
Daemon bin: &#39;/usr/sbin/mmm_agentd&#39;
Daemon pid: &#39;/var/run/mmm_agentd.pid&#39;
Starting MMM Agent daemon... Ok
# netstat -antp | grep mmm_agentd
tcp 0 0 192.168.31.83:9989 0.0.0.0:* LISTEN 9693/mmm_agentd
配置防火墙
firewall-cmd --permanent --add-port=9989/tcp
firewall-cmd --reload
编辑 monitor主机上的/etc/mysql-mmm/mmm_mon.conf 
includemmm_common.conf
Copy after login
<monitor>
ip 127.0.0.1##为了安全性,设置只在本机监听,mmm_mond默认监听9988
pid_path /var/run/mmm_mond.pid
bin_path /usr/lib/mysql-mmm/
status_path/var/lib/misc/mmm_mond.status
ping_ips192.168.31.83,192.168.31.141,192.168.31.250,192.168.31.225#用于测试网络可用性 IP 地址列表,只要其中有一个地址 ping 通,就代表网络正常,这里不要写入本机地址
auto_set_online 0#设置自动online的时间,默认是超过60s就将它设置为online,默认是60s,这里将其设为0就是立即online
</monitor>
Copy after login
<check default>
check_period 5
trap_period 10
timeout 2
#restart_after 10000
max_backlog 86400
</check>
check_period
Copy after login

描述:检查周期默认为5s
默认值:5s
trap_period
描述:一个节点被检测不成功的时间持续trap_period秒,就慎重的认为这个节点失败了。
默认值:10s
timeout
描述:检查超时的时间
默认值:2s
restart_after
描述:在完成restart_after次检查后,重启checker进程
默认值:10000
max_backlog
描述:记录检查rep_backlog日志的最大次数
默认值:60

<host default>
monitor_usermmm_monitor#监控db服务器的用户
monitor_password 123456#监控db服务器的密码
</host>
debug 0#debug 0正常模式,1为debug模式
启动监控进程:
在 /etc/init.d/mysql-mmm-agent的脚本文件的#!/bin/sh下面,加入如下内容 
source /root/.bash_profile 
添加成系统服务并设置为自启动
#chkconfig --add mysql-mmm-monitor
#chkconfigmysql-mmm-monitor on
#/etc/init.d/mysql-mmm-monitor start
Copy after login

启动报错:

Starting MMM Monitor daemon: Can not locate Proc/Daemon.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /usr/sbin/mmm_mond line 11.
BEGIN failed--compilation aborted at /usr/sbin/mmm_mond line 11.
failed
Copy after login

解决方法:安装下列perl的库

#cpanProc::Daemon
#cpan Log::Log4perl
[root@monitor1 ~]# /etc/init.d/mysql-mmm-monitor start
Daemon bin: &#39;/usr/sbin/mmm_mond&#39;
Daemon pid: &#39;/var/run/mmm_mond.pid&#39;
Starting MMM Monitor daemon: Ok
[root@monitor1 ~]# netstat -anpt | grep 9988
tcp 0 0 127.0.0.1:9988 0.0.0.0:* LISTEN 8546/mmm_mond
Copy after login

注1:无论是在db端还是在监控端如果有对配置文件进行修改操作都需要重启代理进程和监控进程。
注2:MMM启动顺序:先启动monitor,再启动 agent

检查集群状态:

[root@monitor1 ~]# mmm_control show
master1(192.168.31.83) master/ONLINE. Roles: writer(192.168.31.2)
master2(192.168.31.141) master/ONLINE. Roles: reader(192.168.31.5)
slave1(192.168.31.250) slave/ONLINE. Roles: reader(192.168.31.4)
slave2(192.168.31.225) slave/ONLINE. Roles: reader(192.168.31.3)
Copy after login
Copy after login

如果服务器状态不是ONLINE,可以用如下命令将服务器上线,例如:

#mmm_controlset_online主机名

例如:[root@monitor1 ~]#mmm_controlset_onlinemaster1
从上面的显示可以看到,写请求的VIP在master1上,所有从节点也都把master1当做主节点。
查看是否启用vip

[root@master1 ~]# ipaddr show dev eno16777736
eno16777736: <BROADCAST,MULTICAST,UP,LOWER_UP>mtu 1500 qdiscpfifo_fast state UP qlen 1000
link/ether 00:0c:29:6d:2f:82 brdff:ff:ff:ff:ff:ff
inet 192.168.31.83/24 brd 192.168.31.255 scope global eno16777736
valid_lft forever preferred_lft forever
inet 192.168.31.2/32 scope global eno16777736
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe6d:2f82/64 scope link
valid_lft forever preferred_lft forever
[root@master2 ~]# ipaddr show dev eno16777736
eno16777736: <BROADCAST,MULTICAST,UP,LOWER_UP>mtu 1500 qdiscpfifo_fast state UP qlen 1000
link/ether 00:0c:29:75:1a:9c brdff:ff:ff:ff:ff:ff
inet 192.168.31.141/24 brd 192.168.31.255 scope global dynamic eno16777736
valid_lft 35850sec preferred_lft 35850sec
inet 192.168.31.5/32 scope global eno16777736
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe75:1a9c/64 scope link
valid_lft forever preferred_lft forever
[root@slave1 ~]# ipaddr show dev eno16777736
eno16777736: <BROADCAST,MULTICAST,UP,LOWER_UP>mtu 1500 qdiscpfifo_fast state UP qlen 1000
link/ether 00:0c:29:02:21:19 brdff:ff:ff:ff:ff:ff
inet 192.168.31.250/24 brd 192.168.31.255 scope global dynamic eno16777736
valid_lft 35719sec preferred_lft 35719sec
inet 192.168.31.4/32 scope global eno16777736
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe02:2119/64 scope link
valid_lft forever preferred_lft forever
[root@slave2 ~]# ipaddr show dev eno16777736
eno16777736: <BROADCAST,MULTICAST,UP,LOWER_UP>mtu 1500 qdiscpfifo_fast state UP qlen 1000
link/ether 00:0c:29:e2:c7:fa brdff:ff:ff:ff:ff:ff
inet 192.168.31.225/24 brd 192.168.31.255 scope global dynamic eno16777736
valid_lft 35930sec preferred_lft 35930sec
inet 192.168.31.3/32 scope global eno16777736
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fee2:c7fa/64 scope link
valid_lft forever preferred_lft forever
在master2,slave1,slave2主机上查看主mysql的指向
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.31.83
Master_User: rep
Master_Port: 3306
Connect_Retry: 60
Copy after login

MMM高可用性测试:

服务器读写采有VIP地址进行读写,出现故障时VIP会漂移到其它节点,由其它节点提供服务。
首先查看整个集群的状态,可以看到整个集群状态正常

[root@monitor1 ~]# mmm_control show
master1(192.168.31.83) master/ONLINE. Roles: writer(192.168.31.2)
master2(192.168.31.141) master/ONLINE. Roles: reader(192.168.31.5)
slave1(192.168.31.250) slave/ONLINE. Roles: reader(192.168.31.4)
slave2(192.168.31.225) slave/ONLINE. Roles: reader(192.168.31.3)
Copy after login
Copy after login

模拟master1宕机,手动停止mysql服务,观察monitor日志,master1的日志如下:

[root@monitor1 ~]# tail -f /var/log/mysql-mmm/mmm_mond.log
2017/01/09 22:02:55 WARN Check &#39;rep_threads&#39; on &#39;master1&#39; is in unknown state! Message: UNKNOWN: Connect error (host = 192.168.31.83:3306, user = mmm_monitor)! Can&#39;t connect to MySQL server on &#39;192.168.31.83&#39; (111)
2017/01/09 22:02:55 WARN Check &#39;rep_backlog&#39; on &#39;master1&#39; is in unknown state! Message: UNKNOWN: Connect error (host = 192.168.31.83:3306, user = mmm_monitor)! Can&#39;t connect to MySQL server on &#39;192.168.31.83&#39; (111)
2017/01/09 22:03:05 ERROR Check &#39;mysql&#39; on &#39;master1&#39; has failed for 10 seconds! Message: ERROR: Connect error (host = 192.168.31.83:3306, user = mmm_monitor)! Can&#39;t connect to MySQL server on &#39;192.168.31.83&#39; (111)
2017/01/09 22:03:07 FATAL State of host &#39;master1&#39; changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK)
2017/01/09 22:03:07 INFO Removing all roles from host &#39;master1&#39;:
2017/01/09 22:03:07 INFO Removed role &#39;writer(192.168.31.2)&#39; from host &#39;master1&#39;
2017/01/09 22:03:07 INFO Orphaned role &#39;writer(192.168.31.2)&#39; has been assigned to &#39;master2&#39;
Copy after login

查看群集的最新状态

[root@monitor1 ~]# mmm_control show
master1(192.168.31.83) master/HARD_OFFLINE. Roles:
master2(192.168.31.141) master/ONLINE. Roles: reader(192.168.31.5), writer(192.168.31.2)
slave1(192.168.31.250) slave/ONLINE. Roles: reader(192.168.31.4)
slave2(192.168.31.225) slave/ONLINE. Roles: reader(192.168.31.3)
Copy after login

从显示结果可以看出master1的状态有ONLINE转换为HARD_OFFLINE,写VIP转移到了master2主机上。
检查所有的db服务器群集状态

[root@monitor1 ~]# mmm_control checks all
master1 ping [last change: 2017/01/09 21:31:47] OK
master1 mysql [last change: 2017/01/09 22:03:07] ERROR: Connect error (host = 192.168.31.83:3306, user = mmm_monitor)! Can&#39;t connect to MySQL server on &#39;192.168.31.83&#39; (111)
master1 rep_threads [last change: 2017/01/09 21:31:47] OK
master1 rep_backlog [last change: 2017/01/09 21:31:47] OK: Backlog is null
slave1 ping [last change: 2017/01/09 21:31:47] OK
slave1mysql [last change: 2017/01/09 21:31:47] OK
slave1 rep_threads [last change: 2017/01/09 21:31:47] OK
slave1 rep_backlog [last change: 2017/01/09 21:31:47] OK: Backlog is null
master2 ping [last change: 2017/01/09 21:31:47] OK
master2 mysql [last change: 2017/01/09 21:57:32] OK
master2 rep_threads [last change: 2017/01/09 21:31:47] OK
master2 rep_backlog [last change: 2017/01/09 21:31:47] OK: Backlog is null
slave2 ping [last change: 2017/01/09 21:31:47] OK
slave2mysql [last change: 2017/01/09 21:31:47] OK
slave2 rep_threads [last change: 2017/01/09 21:31:47] OK
slave2 rep_backlog [last change: 2017/01/09 21:31:47] OK: Backlog is null
Copy after login

从上面可以看到master1能ping通,说明只是服务死掉了。

查看master2主机的ip地址:

[root@master2 ~]# ipaddr show dev eno16777736
eno16777736: <BROADCAST,MULTICAST,UP,LOWER_UP>mtu 1500 qdiscpfifo_fast state UP qlen 1000
link/ether 00:0c:29:75:1a:9c brdff:ff:ff:ff:ff:ff
inet 192.168.31.141/24 brd 192.168.31.255 scope global dynamic eno16777736
valid_lft 35519sec preferred_lft 35519sec
inet 192.168.31.5/32 scope global eno16777736
valid_lft forever preferred_lft forever
inet 192.168.31.2/32 scope global eno16777736
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe75:1a9c/64 scope link
valid_lft forever preferred_lft forever
Copy after login

slave1主机:

mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.31.141
Master_User: rep
Master_Port: 3306
Copy after login
Copy after login

slave2主机:

mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.31.141
Master_User: rep
Master_Port: 3306
Copy after login
Copy after login

启动master1主机的mysql服务,观察monitor日志,master1的日志如下:

[root@monitor1 ~]# tail -f /var/log/mysql-mmm/mmm_mond.log
2017/01/09 22:16:56 INFO Check &#39;mysql&#39; on &#39;master1&#39; is ok!
2017/01/09 22:16:56 INFO Check &#39;rep_backlog&#39; on &#39;master1&#39; is ok!
2017/01/09 22:16:56 INFO Check &#39;rep_threads&#39; on &#39;master1&#39; is ok!
2017/01/09 22:16:59 FATAL State of host &#39;master1&#39; changed from HARD_OFFLINE to AWAITING_RECOVERY
Copy after login

从上面可以看到master1的状态由hard_offline改变为awaiting_recovery状态
用如下命令将服务器上线:

[root@monitor1 ~]#mmm_controlset_onlinemaster1
Copy after login


查看群集最新状态

[root@monitor1 ~]# mmm_control show
master1(192.168.31.83) master/ONLINE. Roles:
master2(192.168.31.141) master/ONLINE. Roles: reader(192.168.31.5), writer(192.168.31.2)
slave1(192.168.31.250) slave/ONLINE. Roles: reader(192.168.31.4)
slave2(192.168.31.225) slave/ONLINE. Roles: reader(192.168.31.3)
Copy after login

可以看到主库启动不会接管主,只到现有的主再次宕机。
总结
(1)master2备选主节点宕机不影响集群的状态,就是移除了master2备选节点的读状态。
(2)master1主节点宕机,由master2备选主节点接管写角色,slave1,slave2指向新master2主库进行复制,slave1,slave2会自动change master到master2.
(3)如果master1主库宕机,master2复制应用又落后于master1时就变成了主可写状态,这时的数据主无法保证一致性。
如果master2,slave1,slave2延迟于master1主,这个时master1宕机,slave1,slave2将会等待数据追上db1后,再重新指向新的主node2进行复制操作,这时的数据也无法保证同步的一致性。
(4)如果采用MMM高可用架构,主,主备选节点机器配置一样,而且开启半同步进一步提高安全性或采用MariaDB/mysql5.7进行多线程从复制,提高复制的性能。

附:

1、日志文件:
日志文件往往是分析错误的关键,所以要善于利用日志文件进行问题分析。
db端:/var/log/mysql-mmm/mmm_agentd.log
监控端:/var/log/mysql-mmm/mmm_mond.log
2、命令文件:
mmm_agentd:db代理进程的启动文件
mmm_mond:监控进程的启动文件
mmm_backup:备份文件
mmm_restore:还原文件
mmm_control:监控操作命令文件
db服务器端只有mmm_agentd程序,其它的都是在monitor服务器端。
3、mmm_control用法
mmm_control程序可以用于监控群集状态、切换writer、设置online\offline操作等。
Valid commands are:
help - show this message #帮助信息
ping - ping monitor #ping当前的群集是否正常
show - show status #群集在线状态检查
checks [|all [|all]] - show checks status#执行监控检查操作
set_online - set host online #将host设置为online
set_offline - set host offline #将host设置为offline
mode - print current mode. #打印输出当前的mode
set_active - switch into active mode.

set_manual - switch into manual mode.
set_passive - switch into passive mode.
move_role [--force] - move exclusive role to host #移除writer服务器为指定的host服务器(Only use --force if you know what you are doing!)
set_ip - set role with ip to host
检查所有的db服务器群集状态:
[root@monitor1 ~]# mmm_control checks all
检查项包括:ping、mysql是否正常运行、复制线程是否正常等
检查群集环境在线状况:
[root@monitor1 ~]# mmm_control show
对指定的host执行offline操作:
[root@monitor1 ~]# mmm_controlset_offline slave2
对指定的host执行onine操作:
[root@monitor1 ~]# mmm_controlset_online slave2
执行write切换(手动切换):
查看当前的slave对应的master
[root@slave2 ~]# mysql -uroot -p123456 -e 'show slave status\G;'
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.31.141
writer切换,要确保mmm_common.conf文件中的writer属性有配置对应的host,否则无法切换
[root@monitor1 ~]# mmm_controlmove_role writer master1
OK: Role 'writer' has been moved from 'master2' to 'master1'. Now you can wait some time and check new roles info!
[root@monitor1 ~]# mmm_control show
master1(192.168.31.83) master/ONLINE. Roles: writer(192.168.31.2)
master2(192.168.31.141) master/ONLINE. Roles: reader(192.168.31.5)
slave1(192.168.31.250) slave/ONLINE. Roles: reader(192.168.31.4)
slave2(192.168.31.225) slave/ONLINE. Roles: reader(192.168.31.3)
save从库自动切换到了新的master
[root@slave2 ~]# mysql -uroot -p123456 -e 'show slave status\G;'
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.31.83

4、其它处理问题

If you do not want the writer to switch from master to backup (including the master-slave delay that will also cause the switch of writing VIP), you can remove when configuring /etc/mysql-mmm/mmm_common.conf backup
#writer role configuration
hosts master1 #Only one Hosts is configured here
ips 192.168.31.2#Virtual IP for external write operations
mode exclusive #exclusive representative Only one master is allowed to exist, that is, only one write IP

In this case, when master1 fails, the writer write operation will not switch to the master2 server, and the slave will not point to the new server. master, at this time the current MMM has previously provided external write services.

5. Summary

1. The virtual IP that provides external reading and writing is controlled by the monitor program. If the monitor is not started, the db server will not be assigned a virtual IP. However, if the virtual IP has been assigned, when the monitor program closes the originally assigned virtual IP, the external program will not be closed immediately and the external program can still be connected and accessed (as long as the network is not restarted). The advantage of this is that the reliability requirements for the monitor will be lower. However, if one of the DB servers fails at this time, it will not be able to handle the switch. That is, the original virtual IP will remain unchanged, and the DB that failed will not be able to handle the switch. The virtual IP will become inaccessible.

2. The agent program is controlled by the monitor program to handle operations such as write switching and slave library switching. If the monitor process is closed, the agent process will not play any role, and it cannot handle faults by itself.

3. The monitor program is responsible for monitoring the status of the db server, including the Mysql database, whether the server is running, whether the replication thread is normal, master-slave delay, etc.; it is also used to control the agent program to handle failures.

4. The monitor will monitor the status of the db server every few seconds. If the db server has changed from fault to normal, the monitor will automatically set it to online status after 60s (the default is 60s) Set to other values), determined by the configuration file parameter "auto_set_online" of the monitoring end. There are three statuses of the cluster server: HARD_OFFLINE→AWAITING_RECOVERY→online
5. By default, the monitor will control mmm_agent and modify the writer db server read_only to OFF, other db servers read_only is changed to ON, so for the sake of rigor, read_only=1 can be added to the my.cnf file of all servers to be controlled by the monitor to control the writer and read. The root user and replication user are not affected by the read_only parameter.

The above is the detailed content of Detailed explanation of MySQL high availability solution MMM. For more information, please follow other related articles on the PHP Chinese website!

Related labels:
source:php.cn
Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
Popular Tutorials
More>
Latest Downloads
More>
Web Effects
Website Source Code
Website Materials
Front End Template