> 데이터 베이스 > MySQL 튜토리얼 > MMM结合Semisync机制实现Mysql Master-Master高可用_MySQL

MMM结合Semisync机制实现Mysql Master-Master高可用_MySQL

WBOY
풀어 주다: 2016-06-01 13:11:34
원래의
1190명이 탐색했습니다.

架构:

两个Master(主备模式),一个或多个Slave(也可以没有Slave,只有主备Master):

1、Monitor运行MMM Daemon程序,实现所有Mysql服务器的监控和故障切换工作;

2、Master1和Master2互为主备,同时只有一个主可用于写操作(也可同时分担读操作),另一个作为备用,可以分担读操作,读写分离需要应用程序实现;

3、Slave机器与当前active Master同步,如当前active Master故障后,Master将切换到passive Master,同时MMM修改Slave与新的Master同步;

4、Application通过write和read ip进行读写操作;

MMM结合Semisync机制实现Mysql Master-Master高可用_MySQL<br>

<br>

环境:

主机名 服务器IP地址 Write IP Read IP 备注
mysql01 10.0.60.100 10.0.60.160 10.0.60.161 默认为active Master,运行mmm agent
mysql02 10.0.60.101   10.0.60.162 默认为passive  Master,运行mmm agent
mysql03 10.0.60.102   10.0.60.163 Slave,由MMM维护,运行mmm agent
mysql04 10.0.60.103     监控机,运行MMM  Daemon程序

<br>

软件信息:

Mysql:5.6.17-log MySQL Community Server (GPL)

MMM:mysql-mmm-2.2.1

OS:CentOS release 6.4 (Final),kernel 2.6.32-358.el6.x86_64

<br>

一、配置复制环境

这里是全新配置,如果是已经存在了单master和slave环境,将配置不一样,可以结合xtrabackup工具实现数据的备份和恢复,配置主备master环境。

前提要求:

1、所有mysql实例开启read_only=1;

2、主备master需要开启log_bin;

3、所有mysql实例配置不同的server_id以及不同的二进制日志、relay日志文件名;

<br>

参考配置参数:

mysql01的特殊配置参数:

mysql01>/! grep -E "log_bin|server_id|read_only" my.cnf<br>log_bin = mysql01-bin<br>server_id = 1<br><strong>read</strong>_<strong>only</strong><br>mysql01>

<br>

mysql02的特殊配置参数:

mysql02>/! grep -E "log_bin|server_id|read_only" my.cnf<br>log_bin = mysql02-bin<br>server_id = 2<br><strong>read</strong>_<strong>only</strong><br>mysql02>

<br>

mysql03的特殊配置参数:

mysql03>/! grep -E "log_bin|server_id|read_only" my.cnf<br>log_bin = mysql_bin<br>server_id = 3<br>read_only

<br>

配置主从:

1、配置mysql01和mysql02互为主从:

在mysql01和mysql02上创建同样的复制账号:

<strong>grant</strong> replication slave <strong>on</strong> *.* <strong>to</strong> 'repl'@'10.0.60.%' identified <strong>by</strong> 'repl';

<br>

查看master状态:

<strong>show</strong> master status;

MMM结合Semisync机制实现Mysql Master-Master高可用_MySQL

<br>

MMM结合Semisync机制实现Mysql Master-Master高可用_MySQL

<br>

在每个节点执行CHANGE MASTER TO语句:

mysql01> change master <strong>to</strong> master_host = '10.0.60.101', <br>master_user='repl', <br>master_password='repl', <br>master_log_file='mysql02-bin.000001', <br>master_log_pos=545;

<br>

mysql02> change master <strong>to</strong> master_host = '10.0.60.100', <br>master_user='repl', <br>master_password='repl', <br>master_log_file='mysql01-bin.000001', <br>master_log_pos=545;

<br>

在两个节点开启slave:

<strong>start</strong> slave;

<br>

查看slave状态是否正常:

mysql02><strong>show</strong> slave status/<strong>G</strong>;<br>             Slave_IO_Running: Yes<br>            Slave_SQL_Running: Yes

<br>

2、配置mysql03为mysql01的从服务器

mysql03> change master <strong>to</strong> master_host = '10.0.60.100', <br>master_user='repl', <br>master_password='repl', <br>master_log_file='mysql01-bin.000001', <br>master_log_pos=545;

<br>

验证slave状态正常后,开始下面的步骤。

<br>

二、配置半同步

使用半同步机制,可以确保至少一台slave收到master的二进制日志,在一定程度上保证了数据的一致性,减少了当master当机时,造成数据丢失。

半同步机制由google贡献,从mysql 5.5开始原生支持该特性。

<br>

前提要求:

1、主备master都要安装并开启semisync master和slave,因mmm不能进行semisync配置和管理;

2、slave需要安装并开启semisync slave;

<br>

配置步骤:

1、mysql01和mysql02安装semisync master和slave插件:

mysql01>INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';<br>Query OK, 0 <strong>rows</strong> affected (0.01 sec)<br><br>mysql01>INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';<br>Query OK, 0 <strong>rows</strong> affected (0.00 sec)

<br>

mysql02>INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';<br>Query OK, 0 <strong>rows</strong> affected (0.05 sec)<br><br>mysql02>INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';<br>Query OK, 0 <strong>rows</strong> affected (0.01 sec)

<br>

2、mysql01和mysql02开启semisync master和slave:

mysql01><strong>SET</strong> <strong>GLOBAL</strong> rpl_semi_sync_master_enabled = 1;<br>Query OK, 0 <strong>rows</strong> affected (0.00 sec)<br><br>mysql01><strong>SET</strong> <strong>GLOBAL</strong> rpl_semi_sync_slave_enabled = 1;<br>Query OK, 0 <strong>rows</strong> affected (0.00 sec)

<br>

mysql02><strong>SET</strong> <strong>GLOBAL</strong> rpl_semi_sync_master_enabled = 1;<br>Query OK, 0 <strong>rows</strong> affected (0.00 sec)<br><br>mysql02><strong>SET</strong> <strong>GLOBAL</strong> rpl_semi_sync_slave_enabled = 1;<br>Query OK, 0 <strong>rows</strong> affected (0.00 sec)

<br>

同时将参数写入到配置文件,以mysql实例开启时自动开启半同步:

mysql02>/! cat my.cnf|grep semi<br>rpl_semi_sync_master_enabled = 1<br>rpl_semi_sync_slave_enabled = 1

<br>

3、mysql03安装并开启semisync slave插件:

mysql03>INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';<br>Query OK, 0 <strong>rows</strong> affected (0.01 sec)<br><br>mysql03><strong>SET</strong> <strong>GLOBAL</strong> rpl_semi_sync_slave_enabled = 1;<br>Query OK, 0 <strong>rows</strong> affected (0.00 sec)

<br>

同时将参数写入到配置文件,以mysql实例开启时自动开启半同步:

mysql03>/! cat my.cnf|grep semi<br>rpl_semi_sync_slave_enabled = 1<br>mysql03>

<br>

4、所有mysql实例停止slave并开启slave,使半同步机制生效:

stop slave;<strong>start</strong> slave;

<br>

5、查看semisync状态

mysql01><strong>show</strong> status <strong>like</strong> '%emi%';<br>+--------------------------------------------+-------+<br>| Variable_name                              | <strong>Value</strong> |<br>+--------------------------------------------+-------+<br>| Rpl_semi_sync_master_clients               | 2     |<br>| Rpl_semi_sync_master_net_avg_wait_time     | 0     |<br>| Rpl_semi_sync_master_net_wait_time         | 0     |<br>| Rpl_semi_sync_master_net_waits             | 0     |<br>| Rpl_semi_sync_master_no_times              | 0     |<br>| Rpl_semi_sync_master_no_tx                 | 0     |<br>| Rpl_semi_sync_master_status                | <strong>ON</strong>    |<br>| Rpl_semi_sync_master_timefunc_failures     | 0     |<br>| Rpl_semi_sync_master_tx_avg_wait_time      | 0     |<br>| Rpl_semi_sync_master_tx_wait_time          | 0     |<br>| Rpl_semi_sync_master_tx_waits              | 0     |<br>| Rpl_semi_sync_master_wait_pos_backtraverse | 0     |<br>| Rpl_semi_sync_master_wait_sessions         | 0     |<br>| Rpl_semi_sync_master_yes_tx                | 0     |<br>| Rpl_semi_sync_slave_status                 | <strong>ON</strong>    |<br>+--------------------------------------------+-------+<br>15 <strong>rows</strong> <strong>in</strong> <strong>set</strong> (0.00 sec)

<br>

三、配置MMM

Multi Master Replication Manager for Mysql(MMM)是一套开源的perl脚本,对Mysql Master-Master复制环境(在任何时刻只有一个节点可写)进行监控、故障恢复以及管理。同时能根据复制的延时情况管理读负载均衡,通过迁移read虚拟IP地址。同时也能用于数据备份,以及节点之间重同步。

主要由三个脚本组成:

1、mmm_mod:监控daemon程序,进行监控工作,并决定读、写角色的迁移;最好运行在专用的监控服务器上,可以管理多套Master-Slave集群。

2、mmm_agentd:客户端daemon程序,运行在所有mysql实例服务器,与监控节点进行简单的远程通信。

3、mmm_control:用于管理mmm_mond进程的命令行脚本。

<br>

前提需求:

1、支持环境:

两个节点的Master-Master环境,MMM需要5个IP地址(每个节点一个固定IP地址,一个write IP地址,两个read IP地址,write和read IP依据节点的可用性进行自动的迁移),正常情况下,active master有一个write IP和一个read IP地址,standby master有一个read IP地址,如果当前active master故障,write和read IP地址将迁移到standby master;

两个节点的Master-Master,以及一个或多个slave的环境,同时也是大多数企业使用的方案(可以更好的扩展读,同时有冗余的Slave可用于备份等工作,防止阻塞正常的事务)。

2、n+1个主机:n个运行mysql实例的服务器,一个机器用于运行MMM监控daemon程序;

3、2*(n+1) IP地址:每个主机一个固定IP地址,同时每台mysql实例一个read IP地址以及一个write IP地址;

4、monitor数据库用户:需要REPLICATION CLIENT权限,用于MMM监控(mmm_mond);

5、agent数据库用户:需要SUPER、REPLICATION CLIENT、PROCESS权限,用于MMM 客户端(mmm_agentd),可以只针对本机IP进行授权;

6、relication数据库用户:需要REPLICATION SLAVE权限,用于mysql复制;

7、tools数据库用户:需要SUPER、REPLICATION CLIENT、RELOAD权限,用于MMM tools(如mmm_backup、mmm_clone、mmm_restore)

<br>

1、在mysql实例服务器安装依赖包和mmm

安装依赖包:

yum -y install perl iproute perl-Algorithm-Diff perl-DBI perl-Class-Singleton perl-DBD-MySQL perl-Log-Log4perl perl-Log-Dispatch perl-Proc-Daemon perl-MailTools perl-Time-HiRes perl-Mail-Sendmail perl-Mail-Sender perl-Email-Date-Format perl-MIME-Lite perl-Net-ARP

<br>

如果在标准软件仓库和EPEL软件仓库没有,需要单独下载,可以去以下网址下载:

http://rpm.pbone.net

http://search.cpan.org/

http://www.rpmfind.net/

<br>

安装mmm:

tar xvf mysql-mmm-2.2.1.tar.gz<br>cd mysql-mmm-2.2.1<br>make install

<br>

2、在mysql实例服务器配置mmm agent

mmm_agentd使用mmm_agent.conf配置文件:

# cat /etc/mysql-mmm/mmm_agent.conf   <br>include mmm_common.conf #包含这个公用配置文件<br>#Description: name of this host,可以不是主机名,每台mysql实例的host不同(如mysql01设置为db1,mysql02设置为db2,mysql03设置为db3)<br>this db1<br>#Description: Enable debug mode,设置1,打印日志到前台,按ctrl+c将结束进程<br>debug 0<br>#Description: Maximum number of retries when killing threads to prevent further<br>#writes during the removal of the active_master_role.<br>max_kill_retries 10

<br>

公用配置文件:mmm_common.conf,每个实例一样,并要拷贝到监控服务器供mmm_mond使用,进行网卡接口的定义,每个主机的描述,复制和mmm agent的用户名和密码配置,以及读写规则等

# cat /etc/mysql-mmm/mmm_common.conf <br>#Description: name of the role for which identifies the active master,定义活动master为可写<br>active_master_role                              writer<br><br><host default> #默认段<br>        #Description: network interface on which the IPs of the roles should be configured,用于绑定ip的网络接口<br>        cluster_interface                       eth0<br><br>        pid_path                                /var/run/mmm_agentd.pid<br>        bin_path                                /usr/lib/mysql-mmm/<br>        #Description: Port on which mmm agentd listens<br>        agent_port                              9989<br>        #Description: Port on which mysqld is listening<br>        mysql_port                              3306<br><br>        #Description: mysql user used for replication<br>        replication_user                        repl<br>        #Description: mysql password used for replication<br>        replication_password                    repl<br>        #Description: mysql user for MMM Agent<br>        agent_user                              mmm_agent<br>        #Description: mysql password for MMM Agent<br>        agent_password                          mmm_agent<br></host><br><br><host db1> #命名段,指定每个mysql实例主机<br>        #Description: IP of host<br>        ip                                      10.0.60.100<br>        #Description: Mode of host. Either master or slave.<br>        mode                                    master<br>        #Description: Name of peer host (if mode is master)<br>        peer                                    db2<br></host><br><br><host db2><br>        ip                                      10.0.60.101<br>        mode                                    master<br>        peer                                    db1<br></host><br><br><host db3><br>        ip                                      10.0.60.102<br>        mode                                    slave<br></host><br><br><br><role writer> #定义write角色<br>        #Description: Hosts which may take over the role<br>        hosts                                   db1, db2<br>        #Description: One or multiple IPs associated with the role,指定浮动write IP地址<br>        ips                                     10.0.60.160<br>        #Description: Mode of role. Either balanced or exclusive<br>        mode                                    exclusive<br>        #Description: The preferred host for this role. Only allowed for exclusive roles.<br>        #prefer                                 -<br></role><br><br><role reader>  #定义read角色<br>        hosts                                   db1, db2, db3<br>        ips                                     10.0.60.161,10.0.60.162,10.0.60.163 #浮动read IP地址<br>        mode                                    balanced<br></role>

<br>

3、启动mmm agent服务

/etc/init.d/mysql-mmm-agent start<br>chkconfig --level 2345 mysql-mmm-agent on

<br>

4、在监控服务器(mysql04)安装依赖包和mmm

安装依赖包:

yum -y install perl iproute perl-Algorithm-Diff perl-DBI perl-Class-Singleton perl-DBD-MySQL perl-Log-Log4perl perl-Log-Dispatch perl-Proc-Daemon perl-MailTools perl-Time-HiRes perl-Mail-Sendmail perl-Mail-Sender perl-Email-Date-Format perl-MIME-Lite perl-Net-Ping

<br>

安装mmm:

tar xvf mysql-mmm-2.2.1.tar.gz<br>cd mysql-mmm-2.2.1<br>make install

<br>

5、配置mmm 监控配置文件

mmm_mond和mmm_control使用mmm_mon.conf或mmm_mon_CLUSTER.conf配置文件

mmm_mon.conf配置文件参考:

# cat /etc/mysql-mmm/mmm_mon.conf <br>include mmm_common.conf<br><br>#The monitor section is required by mmm_mond and mmm_control<br><br>        #Description: IP on which mmm_mond listens<br>        ip                                              127.0.0.1<br>        #Description: Port on which mmm mond listens<br>        port                                    9988<br>        #Description: Location of pid-file<br>        pid_path                                /var/run/mmm_mond.pid<br>        #Description: Path to directory containing MMM binaries<br>        bin_path                                /usr/lib/mysql-mmm/<br>        #Description: Location of of status file<br>        status_path                             /var/lib/misc/mmm_mond.status<br>        #Description: Break between network checks<br>        ping_interval                           1<br>        #Description: IPs used for network checks,指定所有mysql服务器IP,write和read IP地址,用于进行ping检查<br>        ping_ips                                10.0.60.100, 10.0.60.101, 10.0.60.102, 10.0.60.160, 10.0.60.161, 10.0.60.162, 10.0.60.163<br>        #Description: Duration in seconds for flap detection. See flap_count<br>        flap_duration                           3600<br>        #Description: Maximum number of downtimes within flap_duration seconds after<br>        #which a host is considered to be flapping.<br>        flap_count                              3<br>        #Description: How many seconds to wait before switching node status from<br>        #AWAITING_RECOVERY to ONLINE. 0 = disabled.<br>        auto_set_online                         0<br>        #Description: Binary used to kill hosts if roles couldn’t be removed because the agent<br>        #was not reachable. You have to provide a custom binary for this which<br>        #takes the hostname as first argument and the state of check ping (1 -ok; 0 - not ok) as second argument.<br>        kill_host_bin                           /usr/lib/mysql-mmm/monitor/kill_host<br>        #Description: Startup carefully i.e. switch into passive mode when writer role is<br>        #configured on multiple hosts<br>        careful_startup                         0<br>        #Description: Default mode of monitor.<br>        mode                                    active<br>        #Description: How many seconds to wait for other master to become ONLINE before<br>        #switching from mode WAIT to mode ACTIVE. 0 = infinite.<br>        wait_for_other_master                   120<br><br><br><br>        #Description: mysql user for MMM Monitor<br>        monitor_user                    mmm_agent<br>        #Description: mysql password for MMM Monitor<br>        monitor_password                mmm_agent<br><br><br> #check段,mmm执行ping、mysql、rep_threads、rep_backlog四种检查,可以分别进行检查间隔等参数配置。<br>        #Description: Perform check every 5 seconds<br>        check_period                    5<br>        #Description: Check is considered as failed if it doesn’t succeed for at least<br>        #trap period seconds.<br>        trap_period                     10<br>        #Description: Check times out after timeout seconds<br>        timeout                         2<br>        #Description: Restart checker process after restart after checks<br>        restart_after                   10000<br>        #Description: Maximum backlog for check rep_backlog.<br>        max_backlog                     60<br><br><br>#设置为1,开启调试模式,打印日志到前台,ctrl+c将结束进程,对于调试有帮助<br>debug 0

<br>

6、开启mmm monitor监控

/etc/init.d/mysql-mmm-monitor start<br>chkconfig --level 2345 mysql-mmm-monitor on

<br>

7、使用mmm_control查看状态

# mmm_control show<br>  db1(10.0.60.100) master/ONLINE. Roles: reader(10.0.60.163), writer(10.0.60.160)<br>  db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.161)<br>  db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162)

<br>

注:

当节点第一次开启,状态为等待恢复。

设置节点online:

# mmm_control set_online db1

<br>

MMM如何工作:

当故障发生时,mmm迅速的迁移故障节点的IP地址从一个节点到另一个节点,并使用Net::ARP Perl模块更新ARP表。

处理过程:

在故障active master节点:

1、mysql 设置为read_only(set global read_only=1),防止写事务;

2、中断活动连接;

3、移除写ip;

<br>

在新master节点:

1、运行在passive master的mmm进程被通知即将成为active write;

2、slave将尝试从master的二进制日志抓取任何剩余事务;

3、关闭read_only(set global read_only=0);

4、绑定write ip,并发生arp通告;

<br>

四、测试

1、测试mysql01 mysql实例故障

手动关闭mysql01服务器上的mysql实例,期望master将迁移到mysql02

停止mysql01的mysqld进程:也可以使用"killall -15 mysqld"结束mysqld进程

mysql01>/! sh stop.sh

<br>

查看mmm_mond的日志:总共经过10s时间完成迁移

# tail -f /var/log/mysql-mmm/mmm_mond.log <br>2014/05/27 14:19:13  WARN Check 'rep_backlog' on 'db1' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.60.100:3306, user = mmm_agent)! Lost connection to MySQL server at 'reading initial communication packet', system error: 111<br>2014/05/27 14:19:13  WARN Check 'rep_threads' on 'db1' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.60.100:3306, user = mmm_agent)! Lost connection to MySQL server at 'reading initial communication packet', system error: 111<br>2014/05/27 14:19:22 ERROR Check 'mysql' on 'db1' has failed for 10 seconds! Message: ERROR: Connect error (host = 10.0.60.100:3306, user = mmm_agent)! Lost connection to MySQL server at 'reading initial communication packet', system error: 111<br>2014/05/27 14:19:23 FATAL State of host 'db1' changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK)<br>2014/05/27 14:19:23  INFO Removing all roles from host 'db1':<br>2014/05/27 14:19:23  INFO     Removed role 'reader(10.0.60.163)' from host 'db1'<br>2014/05/27 14:19:23  INFO     Removed role 'writer(10.0.60.160)' from host 'db1'<br>2014/05/27 14:19:23  INFO Orphaned role 'writer(10.0.60.160)' has been assigned to 'db2' #可以看到写IP已经迁移到mysql02<br>2014/05/27 14:19:23  INFO Orphaned role 'reader(10.0.60.163)' has been assigned to 'db3'

<br>

使用mmm_control命令查看状态:

[root@mysql04 ~]# mmm_control show<br>  db1(10.0.60.100) master/HARD_OFFLINE. Roles: #mysql01状态已经变为HARD_OFFLINE,意外着ping错误或mysql故障<br>  db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.161), writer(10.0.60.160)<br>  db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162), reader(10.0.60.163)

<br>

检查mysql02的read_only变量是否改变:

mysql02>show global variables like 'read_only'; #默认在passive master时,read_only为ON<br>+---------------+-------+<br>| Variable_name | Value |<br>+---------------+-------+<br>| read_only     | OFF   |<br>+---------------+-------+<br>1 row in set (0.00 sec)<br><br>mysql02>

<br>

检查mysql03是否已经将mysql02作为主:

mysql03>show slave status/G;<br>*************************** 1. row ***************************<br>               Slave_IO_State: Waiting for master to send event<br>                  Master_Host: 10.0.60.101 #已经从Mysql02同步<br>                  Master_User: repl<br>                  Master_Port: 3306<br>                Connect_Retry: 10<br>              Master_Log_File: mysql02-bin.000014<br>          Read_Master_Log_Pos: 120<br>               Relay_Log_File: mysql03-relay-bin.000002<br>                Relay_Log_Pos: 285<br>        Relay_Master_Log_File: mysql02-bin.000014<br>             Slave_IO_Running: Yes<br>            Slave_SQL_Running: Yes

<br>

当再次启动mysql01的mysql实例,db1的状态将由HARD_OFFLINE改变为AWAITING_RECOVERY:

[root@mysql04 ~]# mmm_control show<br>  db1(10.0.60.100) master/AWAITING_RECOVERY. Roles: <br>  db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.161), writer(10.0.60.160)<br>  db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162), reader(10.0.60.163)

<br>

需要手动设置为online,mmm才会分配read ip给mysql01,并与mysql02同步:

[root@mysql04 ~]# mmm_control set_online db1<br>OK: State of 'db1' changed to ONLINE. Now you can wait some time and check its new roles!

<br>

mysql01>show slave status/G;<br>*************************** 1. row ***************************<br>               Slave_IO_State: Waiting for master to send event<br>                  Master_Host: 10.0.60.101 #mysql01已经从Mysql02同步<br>                  Master_User: repl<br>                  Master_Port: 3306<br>                Connect_Retry: 10<br>              Master_Log_File: mysql02-bin.000015<br>          Read_Master_Log_Pos: 120<br>               Relay_Log_File: mysql01-relay.000027<br>                Relay_Log_Pos: 285<br>        Relay_Master_Log_File: mysql02-bin.000015<br>             Slave_IO_Running: Yes<br>            Slave_SQL_Running: Yes

<br>

查看mysql02的semisync状态:

mysql02>show status like 'Rpl_semi%';<br>+--------------------------------------------+-------+<br>| Variable_name                              | Value |<br>+--------------------------------------------+-------+<br>| Rpl_semi_sync_master_clients               | 2     |<br>| Rpl_semi_sync_master_net_avg_wait_time     | 1053  |<br>| Rpl_semi_sync_master_net_wait_time         | 2106  |<br>| Rpl_semi_sync_master_net_waits             | 2     |<br>| Rpl_semi_sync_master_no_times              | 0     |<br>| Rpl_semi_sync_master_no_tx                 | 0     |<br>| Rpl_semi_sync_master_status                | ON    |<br>| Rpl_semi_sync_master_timefunc_failures     | 0     |<br>| Rpl_semi_sync_master_tx_avg_wait_time      | 1015  |<br>| Rpl_semi_sync_master_tx_wait_time          | 1015  |<br>| Rpl_semi_sync_master_tx_waits              | 1     |<br>| Rpl_semi_sync_master_wait_pos_backtraverse | 0     |<br>| Rpl_semi_sync_master_wait_sessions         | 0     |<br>| Rpl_semi_sync_master_yes_tx                | 1     |<br>| Rpl_semi_sync_slave_status                 | ON    |<br>+--------------------------------------------+-------+<br>15 rows in set (0.00 sec)<br>

<br>

2、模拟mysql02(Active Master服务器) kernel panic,期望进行迁移

执行上面的测试后,当前active master为mysql02,使用下面命令模拟kernel panic:

mysql02>/! <strong>echo</strong> "c" > /proc/sysrq-trigger 

<br>

查看mmm_mond日志:总共经过了20s的时间完成迁移

# tail -f /var/log/mysql-mmm/mmm_mond.log<br>2014/05/27 14:44:42  WARN Check 'rep_threads' on 'db2' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.60.101:3306, user = mmm_agent)! Can't connect to MySQL server on '10.0.60.101' (4)<br>2014/05/27 14:44:42  WARN Check 'rep_backlog' on 'db2' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.60.101:3306, user = mmm_agent)! Can't connect to MySQL server on '10.0.60.101' (4)<br>2014/05/27 14:44:46 FATAL Can't reach agent on host 'db2'<br>2014/05/27 14:44:49 ERROR Check 'ping' on 'db2' has failed for 11 seconds! Message: ERROR: Could not ping 10.0.60.101 #ping检查错误<br>2014/05/27 14:44:55 ERROR Check 'mysql' on 'db2' has failed for 14 seconds! Message: ERROR: Connect error (host = 10.0.60.101:3306, user = mmm_agent)! Can't connect to MySQL server on '10.0.60.101' (4) #mysql检查错误,不能连接<br>2014/05/27 14:44:59  INFO Check 'ping' on 'db2' is ok!<br>2014/05/27 14:45:02 FATAL State of host 'db2' changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK) #改变mysql02的状态为HARD_OFFLINE<br>2014/05/27 14:45:02  INFO Removing all roles from host 'db2':  #移除mysql02的角色<br>2014/05/27 14:45:02  INFO     Removed role 'reader(10.0.60.161)' from host 'db2'<br>2014/05/27 14:45:02  INFO     Removed role 'writer(10.0.60.160)' from host 'db2'<br>2014/05/27 14:45:02 FATAL Agent on host 'db2' is reachable again<br>2014/05/27 14:45:02  INFO Orphaned role 'writer(10.0.60.160)' has been assigned to 'db1' #分配角色到其他机器,write IP分配到mysql01,永远不会分配到mysql03<br>2014/05/27 14:45:02  INFO Orphaned role 'reader(10.0.60.161)' has been assigned to 'db3'

<br>

使用mmm_control查看状态:

[root@mysql04 ~]# mmm_control show<br>  db1(10.0.60.100) master/ONLINE. Roles: reader(10.0.60.163)<br>  db2(10.0.60.101) master/HARD_OFFLINE. Roles: <br>  db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162)

<br>

3、模拟active master服务器网络不通,期望进行迁移,但是网络恢复后,将不会重启slave;

当前active master为mysql01,在mysql01上禁用网卡:

mysql01>/! cat down_net.sh<br>ifdown eth0<br>sleep 600<br>ifup eth0<br>mysql01>/! sh down_net.sh

<br>

查看mmm_mond日志:总共经过了9s完成迁移

2014/05/27 15:02:35  WARN Check 'rep_threads' on 'db1' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.60.100:3306, user = mmm_agent)! Can't connect to MySQL server on '10.0.60.100' (4)<br>2014/05/27 15:02:35  WARN Check 'rep_backlog' on 'db1' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.60.100:3306, user = mmm_agent)! Can't connect to MySQL server on '10.0.60.100' (4)<br>2014/05/27 15:02:41 FATAL Can't reach agent on host 'db1'<br>2014/05/27 15:02:41 ERROR Check 'ping' on 'db1' has failed for 11 seconds! Message: ERROR: Could not ping 10.0.60.100  #ping检查错误<br>2014/05/27 15:02:44 FATAL State of host 'db1' changed from ONLINE to HARD_OFFLINE (ping: not OK, mysql: OK) #改变状态<br>2014/05/27 15:02:44  INFO Removing all roles from host 'db1': #移除角色<br>2014/05/27 15:02:44  INFO     Removed role 'reader(10.0.60.163)' from host 'db1'  <br>2014/05/27 15:02:44  INFO     Removed role 'writer(10.0.60.160)' from host 'db1'<br>2014/05/27 15:02:44 ERROR Can't send offline status notification to 'db1' - killing it!<br>2014/05/27 15:02:44 FATAL Could not kill host 'db1' - there may be some duplicate ips now! (There's no binary configured for killing hosts.)<br>2014/05/27 15:02:44  INFO Orphaned role 'writer(10.0.60.160)' has been assigned to 'db2'<br>2014/05/27 15:02:44  INFO Orphaned role 'reader(10.0.60.163)' has been assigned to 'db3'<br>2014/05/27 15:02:48 ERROR Check 'mysql' on 'db1' has failed for 14 seconds! Message: ERROR: Connect error (host = 10.0.60.100:3306, user = mmm_agent)! Can't connect to MySQL server on '10.0.60.100' (4)

<br>

使用mmm_control查看状态:

[root@mysql04 ~]# mmm_control show<br># Warning: agent on host db1 is not reachable<br>  db1(10.0.60.100) master/HARD_OFFLINE. Roles: <br>  db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.161), writer(10.0.60.160)<br>  db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162), reader(10.0.60.163)

<br>

当网络恢复后,mmm会修改mysql01的slave配置,修改主为mysql02,但是没有重启slave,造成不能进行数据同步,需要手工重新开启slave。

使用mmm_control检查状态:

[root@mysql04 ~]# mmm_control show<br>  db1(10.0.60.100) master/AWAITING_RECOVERY. Roles: <br>  db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.161), writer(10.0.60.160)<br>  db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162), reader(10.0.60.163)<br><br>[root@mysql04 ~]# mmm_control set_online db1  #网络恢复后,手动设置为online<br>OK: State of 'db1' changed to ONLINE. Now you can wait some time and check its new roles!<br>[root@mysql04 ~]# mmm_control show<br>  db1(10.0.60.100) master/ONLINE. Roles: reader(10.0.60.163)<br>  db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.161), writer(10.0.60.160)<br>  db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162)

<br>

检查mysql01的slave状态:看上去正常的

mysql01>show slave status/G;<br>*************************** 1. row ***************************<br>               Slave_IO_State: Waiting for master to send event<br>                  Master_Host: 10.0.60.101<br>                  Master_User: repl<br>                  Master_Port: 3306<br>                Connect_Retry: 10<br>              Master_Log_File: mysql02-bin.000015<br>          Read_Master_Log_Pos: 328<br>               Relay_Log_File: mysql01-relay.000027<br>                Relay_Log_Pos: 493<br>        Relay_Master_Log_File: mysql02-bin.000015<br>             Slave_IO_Running: Yes<br>            Slave_SQL_Running: Yes

<br>

但是在mysql02(当前active master)插入数据,mysql01不能从mysql02同步:

mysql02> insert into t1 values(2);<br>Query OK, 1 row affected (0.02 sec)<br>mysql02>select * from t1;<br>+----+<br>| id |<br>+----+<br>|  1 |<br>|  2 |<br>+----+<br>2 rows in set (0.00 sec)

<br>

mysql03已经同步了数据:

mysql03>select * from t1;<br>+----+<br>| id |<br>+----+<br>|  1 |<br>|  2 |<br>+----+<br>2 rows in set (0.00 sec)

<br>

而mysql01没有同步数据:

mysql01>select * from t1;<br>+----+<br>| id |<br>+----+<br>|  1 |<br>+----+<br>1 row in set (0.00 sec)

<br>

解决方法:先停止slave,然后启动slave;

mysql01>stop slave;<strong>start</strong> slave; <br>Query OK, 0 rows affected (0.00 sec)<br><br>Query OK, 0 rows affected (0.00 sec)<br><br>mysql01>select * from t1;<br>+----+<br>| id |<br>+----+<br>|  1 |<br>|  2 |<br>+----+<br>2 rows in set (0.00 sec)

<br>

截图:

MMM结合Semisync机制实现Mysql Master-Master高可用_MySQL

<br>

4、模拟slave线程故障,不管是io或sql线程故障,期望进行迁移,恢复时如果在flap_duration时间内超过了flap_count次数的故障,将不会自动恢复,状态由REPLICATION_FAIL改为 AWAITING_RECOVERY (because it's flapping)

当前active master为mysql02。

停止active master(mysql02)的slave,不会造成迁移:

mmm_mond的日志:已经检测到db2(mysql02)复制线程错误

2014/05/27 15:39:02 ERROR Check 'rep_threads' on 'db2' has failed for 10 seconds! Message: ERROR: Replication is broken

<br>

使用mmm_control查看状态:

[root@mysql04 ~]# mmm_control show<br>  db1(10.0.60.100) master/ONLINE. Roles: reader(10.0.60.161)<br>  db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.162), writer(10.0.60.160)<br>  db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.163)

<br>

如果当其他slave(mysql01、mysql03)的线程(不管是io还是sql线程)故障将会发生迁移:

手工停止io线程:

mysql01>stop slave io_thread;<br>Query OK, 0 rows affected (0.01 sec)

<br>

查看mmm_mond日志:

2014/05/27 15:43:28 ERROR Check 'rep_threads' on 'db1' has failed for 10 seconds! Message: ERROR: Replication is broken<br>2014/05/27 15:43:31 FATAL State of host 'db1' changed from ONLINE to REPLICATION_FAIL<br>2014/05/27 15:43:31  INFO Removing all roles from host 'db1':<br>2014/05/27 15:43:31  INFO     Removed role 'reader(10.0.60.161)' from host 'db1'  #移除角色<br>2014/05/27 15:43:31  INFO Orphaned role 'reader(10.0.60.161)' has been assigned to 'db3'

<br>

使用mmm_control查看状态:

[root@mysql04 ~]# mmm_control show<br>  db1(10.0.60.100) master/REPLICATION_FAIL. Roles: <br>  db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.162), writer(10.0.60.160)<br>  db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.161), reader(10.0.60.163)

<br>

当重新开启io线程后,mmm将自动恢复db1,并重新迁移read IP到db1(mysql01)上,如果故障超过:

重新开启线程:

mysql01>start slave io_thread;  <br>Query OK, 0 rows affected (0.00 sec)

<br>

查看mmm_mond日志:

2014/05/27 15:45:23  INFO Check 'rep_threads' on 'db1' is ok!<br>2014/05/27 15:45:25 FATAL State of host 'db1' changed from REPLICATION_FAIL to ONLINE<br>2014/05/27 15:45:25  INFO Moving role 'reader(10.0.60.163)' from host 'db3' to host 'db1'

<br>

使用mmm_control查看状态:

[root@mysql04 ~]# mmm_control show<br>  db1(10.0.60.100) master/ONLINE. Roles: reader(10.0.60.163)<br>  db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.162), writer(10.0.60.160)<br>  db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.161)

<br>

5、复制延时

延时检查有max_backlog控制,默认为60;

复制延时或错误,如果故障时间少于60s,状态为ONLINE,单会迁移,故障恢复后,mmm自动恢复read IP。如果rep_backlog和rel_threads同时错误,状态将为REPLICATION_FAIL。

<br>

6、mmm agent或monitor故障

不会迁移角色,如果此时有master或slave故障,也将不能迁移角色

<br>

参考:

MMM官网:http://mysql-mmm.org/

MMM博客:http://blog.mysql-mmm.org/

<br>

<br>

<br><br>

来自为知笔记(Wiz)

<br><br>

원천:php.cn
본 웹사이트의 성명
본 글의 내용은 네티즌들의 자발적인 기여로 작성되었으며, 저작권은 원저작자에게 있습니다. 본 사이트는 이에 상응하는 법적 책임을 지지 않습니다. 표절이나 침해가 의심되는 콘텐츠를 발견한 경우 admin@php.cn으로 문의하세요.
인기 튜토리얼
더>
최신 다운로드
더>
웹 효과
웹사이트 소스 코드
웹사이트 자료
프론트엔드 템플릿