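For a website or a business, the most important asset is undoubtedly its data, so the safety of the database matters even more, and we must keep its data intact. This article describes how to use heartbeat to build a two-node, highly available MySQL setup.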
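When the MySQL database or the MySQL database server fails, we want a standby to take over from the primary MySQL automatically and carry on the current work. When the primary MySQL server recovers, the standby should return to its standby role and wait for the next failure. Here we achieve this with HA failure detection.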
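HA periodically sends heartbeat packets to check the health of the primary and standby servers. When the primary fails, the vip is moved to the standby automatically and the standby carries out the primary's work. For MySQL to work this way, the data on the two servers must stay consistent, which is what MySQL master-slave replication provides.

Environment used in this article:
System: CentOS 5.5 32-bit
Primary MySQL: ip 192.168.3.101/24, hostname: master.org
Standby MySQL: ip 192.168.3.102/24, hostname: slave.org
vip: 192.168.3.103/24
MySQL: mysql-5.0.95.tar.gz
heartbeat: Heartbeat-3-0-7e3a82377fa8.tar.bz2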
yum -y install ncurses-devel openssl-devel
wget http://dev.mysql.com/get/Downloads/MySQL-5.0/mysql-5.0.95.tar.gz/from/http://mysql.cdpa.nsysu.edu.tw/
useradd -M -s /sbin/nologin mysql
tar -zxvf mysql-5.0.95.tar.gz
cd mysql-5.0.95
./configure --prefix=/usr/local/mysql \
--without-debug \
--with-extra-charsets=utf8,gbk \
--enable-assembler \
--with-mysqld-ldflags=-all-static \
--with-client-ldflags=-all-static \
--with-unix-socket-path=/tmp/mysql.sock \
--with-ssl
make && make install
cp support-files/my-medium.cnf /etc/my.cnf # create the configuration file
cp support-files/mysql.server /etc/init.d/mysqld # create the startup script
chmod +x /etc/init.d/mysqld
echo '/usr/local/mysql/lib/mysql/' >> /etc/ld.so.conf
ldconfig
/usr/local/mysql/bin/mysql_install_db --user=mysql # initialize the database
chown -R root.mysql /usr/local/mysql/
chown -R mysql.mysql /usr/local/mysql/var/
ln -s /usr/local/mysql/bin/* /usr/local/bin/ # symlink the binaries
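Configure MySQL master-slave replication to keep the data in sync by editing my.cnf on both the master and the slave. (These are freshly installed databases. If you are only adding a slave to an existing master, you need to back up the master's data and import it into the slave first, which is not covered here.)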
vi /etc/my.cnf
# under [mysqld], change:
log_bin = /var/log/mysql/mysql-bin.log # enable the binary log
server-id = 1921683101 # set the server id
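The server-id used here looks like the host's IP address with the dots removed. A minimal sketch of that convention, assuming this is how the value was chosen; ip_to_server_id is a hypothetical helper name, and the 4294967295 limit is the 32-bit unsigned maximum that server-id must fit in:

```shell
# Hypothetical helper: derive a server-id by removing the dots from an
# IPv4 address given as $1. Runs in a subshell so variables stay local.
ip_to_server_id() (
    id=$(printf '%s' "$1" | tr -d '.')
    case "$id" in
        ''|*[!0-9]*) echo "not a numeric address: $1" >&2; exit 1 ;;
    esac
    # Reject values that do not fit in 32 bits (more than 10 digits,
    # or 10 digits above 4294967295).
    if [ "${#id}" -gt 10 ] || { [ "${#id}" -eq 10 ] && [ "$id" -gt 4294967295 ]; }; then
        echo "server-id $id exceeds 4294967295" >&2
        exit 1
    fi
    printf '%s\n' "$id"
)
```

Calling ip_to_server_id 192.168.3.101 prints 1921683101. The convention is only a convenience: longer addresses such as 192.168.100.101 are rejected because they overflow, and different addresses can collapse to the same digits, so any scheme that gives every server a distinct id works just as well.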
Start the master:
service mysqld start
On the master, create a user for the slave and grant it replication rights. The user is backup and the password is backup:
mysql> grant replication slave on *.* to 'backup'@'192.168.3.102' identified by 'backup';
Query OK, 0 rows affected (0.16 sec)
Check the master's status:
mysql> show master status;
+------------------+-----------+--------------+------------------+
| File             | Position  | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+-----------+--------------+------------------+
| mysql-bin.000003 |       236 |              |                  |
+------------------+-----------+--------------+------------------+
1 row in set (0.00 sec)
Edit the slave's configuration file:
server-id = 1921683102 # the server id must be unique
log_bin = /var/log/mysql/mysql-bin.log # enable the binary log
master-host = 192.168.3.101 # master ip
master-user = backup # account
master-password = backup # password
master-port = 3306 # port used to connect to the master
master-connect-retry=60 # seconds to wait before retrying after a failed connection
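Since replication depends on the two server ids being different, it can be worth checking the two files before starting the slave. A small sketch, assuming a copy of each my.cnf is available locally; get_server_id and server_ids_differ are hypothetical helper names:

```shell
# Print the first server-id (or server_id) value found in the file $1.
get_server_id() {
    sed -n 's/^[[:space:]]*server[-_]id[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Succeed only when both files define a server-id and the values differ.
server_ids_differ() (
    a=$(get_server_id "$1")
    b=$(get_server_id "$2")
    [ -n "$a" ] && [ -n "$b" ] && [ "$a" != "$b" ]
)
```

For example, after copying the master's file to /tmp/master.cnf (a hypothetical path), server_ids_differ /tmp/master.cnf /etc/my.cnf returns success only if the ids are distinct.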
Start the slave and check its status:
service mysqld start
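Run the following on the slave and check that the master's binary log file name and position match the values that show master status; just reported on the master: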
mysql> show slave status \G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.3.101
Master_User: backup
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000003
Read_Master_Log_Pos: 236
Relay_Log_File: cfhost-relay-bin.000002
Relay_Log_Pos: 235
Relay_Master_Log_File: mysql-bin.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 236
Relay_Log_Space: 235
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
1 row in set (0.00 sec)
ERROR: No query specified
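The two fields that matter in this output are Slave_IO_Running and Slave_SQL_Running, and both should say Yes. A minimal sketch that checks them from a script, assuming the \G-style output is piped in on stdin; slave_is_healthy is a hypothetical helper name:

```shell
# Read `show slave status \G` output on stdin and succeed only if both
# replication threads report "Yes".
slave_is_healthy() {
    awk '
        $1 == "Slave_IO_Running:"  { io  = $2 }
        $1 == "Slave_SQL_Running:" { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }'
}
```

It could be used as mysql -e 'show slave status \G' | slave_is_healthy || echo "replication is not running", assuming the mysql client can log in without prompting.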
If show slave status \G; reports Slave_SQL_Running: No, run the commands below on the slave (take both parameter values from the output of show master status; on the master):
mysql> stop slave;
Query OK, 0 rows affected (0.00 sec)
mysql> change master to master_log_file='mysql-bin.000003',master_log_pos=236;
Query OK, 0 rows affected (0.01 sec)
mysql> start slave;
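Copying the file name and position by hand is easy to get wrong. A minimal sketch that builds the statement from the master's output, assuming the tabular result of show master status; is piped in on stdin; parse_master_status and build_change_master are hypothetical helpers, not part of MySQL:

```shell
# Read the tabular output of `show master status;` on stdin and print
# "<File> <Position>" from the first data row.
parse_master_status() {
    awk -F'|' '
        # data rows start with "|"; skip the header row that names the columns
        /^[|]/ && $2 !~ /File/ {
            gsub(/ /, "", $2); gsub(/ /, "", $3)
            print $2, $3
            exit
        }'
}

# $1 = binary log file name, $2 = position
build_change_master() {
    printf "change master to master_log_file='%s',master_log_pos=%s;\n" "$1" "$2"
}
```

With the output shown earlier, parse_master_status prints mysql-bin.000003 236, and passing those two words to build_change_master yields the same statement as the one typed above.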
Create a database on the master and check whether it shows up on the slave.
yum -y install pkgconfig glib2-devel python-devel pam-devel gnutls-devel swig
Install libnet
wget http://download.fedora.redhat.com/pub/epel/5/i386/libnet-1.1.5-1.el5.i386.rpm
rpm -ivh libnet-1.1.5-1.el5.i386.rpm
wget http://download.fedora.redhat.com/pub/epel/5/i386/libnet-devel-1.1.5-1.el5.i386.rpm
rpm -ivh libnet-devel-1.1.5-1.el5.i386.rpm
useradd -M -s /sbin/nologin hacluster
useradd -M -s /sbin/nologin haclient
wget http://www.ultramonkey.org/download/heartbeat/2.0.8/heartbeat-2.0.8.tar.gz
tar -zxvf heartbeat-2.0.8.tar.gz
cd heartbeat-2.0.8
./configure --sysconfdir=/etc
make && make install
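Create the configuration files:
After installation, three files need to be configured (create them by hand if they are missing): ha.cf, haresources, and authkeys. They must live in the /etc/ha.d directory, but none of them exists by default. You can download them from the official site or take them from the source package, where they sit in the doc subdirectory of the source tree.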
cat /usr/local/share/doc/heartbeat-2.0.8/ha.cf | egrep -v '^#\W' | grep -v '^#$' >> /etc/ha.d/ha.cf
cat /usr/local/share/doc/heartbeat-2.0.8/haresources | egrep -v '^#\W' | grep -v '^#$' >> /etc/ha.d/haresources
cat /usr/local/share/doc/heartbeat-2.0.8/authkeys | egrep -v '^#\W' | grep -v '^#$' > /etc/ha.d/authkeys
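The two filters in each pipeline drop the explanatory comment lines of the sample files while keeping commented-out directives. A minimal sketch of the same filtering as a single expression, written without the \W shorthand so that it does not rely on a grep extension; strip_doc_comments is a hypothetical name:

```shell
# Drop explanatory comment lines ("# text" or a bare "#") but keep
# commented-out directives such as "#baud 19200" and all active lines.
strip_doc_comments() {
    grep -v -E '^#([^[:alnum:]_]|$)'
}
```

For example, strip_doc_comments < /usr/local/share/doc/heartbeat-2.0.8/ha.cf prints only the directive lines, which makes it easier to see what the template actually sets before writing it to /etc/ha.d.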
Edit the configuration files:
Edit ha.cf. This file tells Heartbeat which media paths to use and how to configure them.
vi /etc/ha.d/ha.cf
debugfile /var/log/ha-debug # heartbeat debug messages
logfile /var/log/ha-log # heartbeat log messages
logfacility local0
keepalive 2 # heartbeat interval
deadtime 30 # declare a node dead after 30 seconds
warntime 10 # seconds to wait before logging a "late heartbeat" warning
initdead 120 # time allowed for the network to come up
udpport 694 # udp port used for broadcast/unicast communication
#baud 19200
#serial /dev/ttyS0 # use a serial line for heartbeat
bcast eth0 # use the network for heartbeat, broadcasting on eth0
auto_failback on # switch back to the primary automatically once it recovers
watchdog /dev/watchdog # watchdog timer: the node is rebooted if it has no heartbeat for one minute
node master.org # host names of the cluster machines, identical to the output of "uname -n"
node slave.org
ping 192.168.3.254 # ping the gateway to check that the link is up
respawn hacluster /usr/local/lib/heartbeat/ipfail # respawn runs /usr/local/lib/heartbeat/ipfail to initiate failover
apiauth ipfail gid=haclient uid=hacluster # user and group that run ipfail
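Because each node name must match what uname -n prints on that machine, a mismatch is a common reason for a node never joining. A small sketch of a check, assuming ha.cf uses one or more node directives as above; node_is_listed is a hypothetical helper name:

```shell
# Succeed if the ha.cf file ($1) has a "node" directive naming the host ($2).
node_is_listed() {
    awk -v host="$2" '
        $1 == "node" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break      # stop at a trailing comment
                if ($i == host) found = 1
            }
        }
        END { exit !found }' "$1"
}
```

On each machine, node_is_listed /etc/ha.d/ha.cf "$(uname -n)" || echo "this host is not listed in ha.cf" gives a quick answer before heartbeat is started.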
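Configure haresources. This file lists the services provided by all nodes and the default owner of each service. The file must be identical on every node.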
vi /etc/ha.d/haresources
master.org IPaddr::192.168.3.103 mysql # vip
Note!
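The last field in haresources is the service that heartbeat manages, mysql in this case. If the master and the slave share the same disk array or a distributed file system, this must be the real startup script (the one under /etc/init.d). With master-slave replication, however, be sure NOT to enter the real startup script: as long as the master's heartbeat is alive, heartbeat automatically stops mysql on the slave, so replication cannot run and failing over when the master breaks becomes pointless.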
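To make the layout of a haresources line explicit (owner first, then the resources it holds), here is a minimal sketch that prints the fields of each line. It assumes that everything after a # is a comment, as in the line above; parse_haresources is a hypothetical helper name:

```shell
# Print the owner node and each resource for every line of the file $1.
parse_haresources() {
    sed 's/#.*//' "$1" | awk 'NF >= 2 {
        printf "owner=%s", $1
        for (i = 2; i <= NF; i++) printf " resource=%s", $i
        printf "\n"
    }'
}
```

For the line above it prints owner=master.org resource=IPaddr::192.168.3.103 resource=mysql. Running it on both nodes and comparing the output is a quick way to confirm that the two files agree.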
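Configure authkeys. authkeys determines your authentication key. There are three authentication methods: crc, md5, and sha1. If your Heartbeat runs over a secure network, such as the crossover cable in this example, you can use crc, which is the cheapest method in terms of resources. If the network is not secure but you still want to keep CPU usage down, use md5. Finally, if you want the strongest authentication regardless of CPU usage, use sha1, which is the hardest of the three to crack.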
vi /etc/ha.d/authkeys
auth 1
1 crc
chmod 600 /etc/ha.d/authkeys
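The index on the auth line has to match the first field of one of the key lines that follow it. A small sketch that checks this, assuming the simple two-part layout shown above; authkeys_index_defined is a hypothetical helper name:

```shell
# Succeed if the index named on the "auth" line of the file $1 also
# appears as the first field of a later key line.
authkeys_index_defined() {
    awk '
        $1 == "auth"             { want = $2; next }
        want != "" && $1 == want { found = 1 }
        END { exit !found }' "$1"
}
```

For the file above, authkeys_index_defined /etc/ha.d/authkeys succeeds. It would fail for a file that says auth 4 but only defines key 1.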
Whatever index you specify after the auth keyword must appear again below as a key. If you specify "auth 4", a later line must begin with "4 ".
Configure the slave:
scp root@192.168.3.101:/etc/ha.d/ha.cf /etc/ha.d/
scp root@192.168.3.101:/etc/ha.d/authkeys /etc/ha.d/
scp root@192.168.3.101:/etc/ha.d/haresources /etc/ha.d/
vi /etc/ha.d/ha.cf
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694
bcast eth0
auto_failback on
node master.org
node slave.org
ping 192.168.3.254
respawn hacluster /usr/local/lib/heartbeat/ipfail # respawn runs /usr/local/lib/heartbeat/ipfail to initiate failover
apiauth ipfail gid=haclient uid=hacluster
Start heartbeat on the master:
service heartbeat start
Check the log:
cat /var/log/ha-log
heartbeat[32239]: 2012/02/19_13:45:29 info: Link 192.168.3.254:192.168.3.254 up.
heartbeat[32239]: 2012/02/19_13:45:29 info: Status update for node 192.168.3.254: status ping
heartbeat[32239]: 2012/02/19_13:45:29 info: Link master.org:eth0 up.
heartbeat[32239]: 2012/02/19_13:45:41 WARN: node slave.org: is dead
heartbeat[32239]: 2012/02/19_13:45:41 info: Comm_now_up(): updating status to active
heartbeat[32239]: 2012/02/19_13:45:41 info: Local status now set to: 'active'
heartbeat[32239]: 2012/02/19_13:45:41 info: Starting child client "/usr/local/lib/heartbeat/ipfail" (503,503)
heartbeat[32239]: 2012/02/19_13:45:41 WARN: No STONITH device configured.
heartbeat[32239]: 2012/02/19_13:45:41 WARN: Shared disks are not protected.
heartbeat[32239]: 2012/02/19_13:45:41 info: Resources being acquired from slave.org.
heartbeat[32247]: 2012/02/19_13:45:41 info: Starting "/usr/local/lib/heartbeat/ipfail" as uid 503 gid 503 (pid 32247)
harc[32248]: 2012/02/19_13:45:42 info: Running /etc/ha.d/rc.d/status status
mach_down[32275]: 2012/02/19_13:45:42 info: /usr/local/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[32275]: 2012/02/19_13:45:42 info: mach_down takeover complete for node slave.org.
heartbeat[32239]: 2012/02/19_13:45:42 info: mach_down takeover complete.
heartbeat[32239]: 2012/02/19_13:45:42 info: Initial resource acquisition complete (mach_down)
IPaddr[32300]: 2012/02/19_13:45:42 INFO: Resource is stopped
heartbeat[32249]: 2012/02/19_13:45:42 info: Local Resource acquisition completed.
harc[32338]: 2012/02/19_13:45:42 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp[32338]: 2012/02/19_13:45:42 received ip-request-resp IPaddr::192.168.3.103 OK yes
ResourceManager[32353]: 2012/02/19_13:45:42 info: Acquiring resource group: master.org IPaddr::192.168.3.103 mysqld
IPaddr[32377]: 2012/02/19_13:45:42 INFO: Resource is stopped
ResourceManager[32353]: 2012/02/19_13:45:42 info: Running /etc/ha.d/resource.d/IPaddr 192.168.3.103 start
IPaddr[32429]: 2012/02/19_13:45:42 INFO: Using calculated nic for 192.168.3.103: eth0
IPaddr[32429]: 2012/02/19_13:45:42 DEBUG: Using calculated netmask for 192.168.3.103: 255.255.255.0
IPaddr[32429]: 2012/02/19_13:45:42 DEBUG: Using calculated broadcast for 192.168.3.103: 192.168.3.255
IPaddr[32429]: 2012/02/19_13:45:42 INFO: eval /sbin/ifconfig eth0:0 192.168.3.103 netmask 255.255.255.0 broadcast 192.168.3.255
IPaddr[32429]: 2012/02/19_13:45:43 DEBUG: Sending Gratuitous Arp for 192.168.3.103 on eth0:0 [eth0]
IPaddr[32420]: 2012/02/19_13:45:43 INFO: Success
ResourceManager[32353]: 2012/02/19_13:45:43 info: Running /etc/init.d/mysqld start
heartbeat[32239]: 2012/02/19_13:45:56 info: Local Resource acquisition completed. (none)
heartbeat[32239]: 2012/02/19_13:45:56 info: local resource transition completed.
The log shows that slave.org has not come up and is treated as dead, and that the vip 192.168.3.103 has been added on the master.
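Besides reading the log, the interface listing on a node shows directly whether it currently holds the vip. A minimal sketch, assuming the listing text (for example the output of /sbin/ifconfig) is piped in on stdin; has_vip is a hypothetical helper name:

```shell
# Read an interface listing on stdin and succeed if the address in $1
# appears in it as a whole word.
has_vip() {
    grep -q -F -w -e "$1"
}
```

Running /sbin/ifconfig | has_vip 192.168.3.103 && echo "this node holds the vip" on each machine shows where the address lives at any moment.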
Start heartbeat on the slave:
service heartbeat start
After it starts, check the log:
Feb 19 13:50:22 slave heartbeat: [29159]: info: Local status now set to: 'up'
Feb 19 13:50:23 slave heartbeat: [29159]: info: Link master.org:eth0 up.
Feb 19 13:50:23 slave heartbeat: [29159]: info: Status update for node master.org: status active
Feb 19 13:50:23 slave heartbeat: [29159]: info: Link 192.168.3.254:192.168.3.254 up.
Feb 19 13:50:23 slave heartbeat: [29159]: info: Status update for node 192.168.3.254: status ping
Feb 19 13:50:23 slave heartbeat: [29159]: info: Link slave.org:eth0 up.
Feb 19 13:50:23 slave harc[29171]: info: Running /etc/ha.d/rc.d/status status
Feb 19 13:50:24 slave heartbeat: [29159]: info: Comm_now_up(): updating status to active
Feb 19 13:50:24 slave heartbeat: [29159]: info: Local status now set to: 'active'
Feb 19 13:50:24 slave heartbeat: [29159]: info: Starting child client "/usr/local/lib/heartbeat/ipfail" (501,501)
Feb 19 13:50:24 slave heartbeat: [29159]: WARN: G_CH_dispatch_int: Dispatch function for read child took too long to execute: 140 ms (> 50 ms) (GSource: 0x9b98448)
Feb 19 13:50:24 slave heartbeat: [29182]: info: Starting "/usr/local/lib/heartbeat/ipfail" as uid 501 gid 501 (pid 29182)
Feb 19 13:50:24 slave heartbeat: [29159]: info: remote resource transition completed.
Feb 19 13:50:24 slave heartbeat: [29159]: info: remote resource transition completed.
Feb 19 13:50:24 slave heartbeat: [29159]: info: Local Resource acquisition completed. (none)
Feb 19 13:50:25 slave heartbeat: [29159]: info: master.org wants to go standby [foreign]
Feb 19 13:50:26 slave heartbeat: [29159]: info: standby: acquire [foreign] resources from master.org
Feb 19 13:50:26 slave heartbeat: [29183]: info: acquire local HA resources (standby).
Feb 19 13:50:26 slave heartbeat: [29183]: info: local HA resource acquisition completed (standby).
Feb 19 13:50:26 slave heartbeat: [29159]: info: Standby resource acquisition done [foreign].
Feb 19 13:50:26 slave heartbeat: [29159]: info: Initial resource acquisition complete (auto_failback)
Feb 19 13:50:27 slave heartbeat: [29159]: info: remote resource transition completed.
Feb 19 13:50:36 slave ipfail: [29182]: info: Ping node count is balanced.
Feb 19 13:50:37 slave ipfail: [29182]: info: Giving up foreign resources (auto_failback).
Feb 19 13:50:37 slave ipfail: [29182]: info: Delayed giveup in 4 seconds.
Feb 19 13:50:42 slave ipfail: [29182]: info: giveup() called (timeout worked)
Feb 19 13:50:42 slave heartbeat: [29159]: info: slave.org wants to go standby [foreign]
Feb 19 13:50:43 slave heartbeat: [29159]: info: standby: master.org can take our foreign resources
Feb 19 13:50:43 slave heartbeat: [29194]: info: give up foreign HA resources (standby).
Feb 19 13:50:43 slave ResourceManager[29204]: info: Releasing resource group: master.org IPaddr::192.168.3.103 mysqld
Feb 19 13:50:43 slave ResourceManager[29204]: info: Running /etc/init.d/mysqld stop
Feb 19 13:50:45 slave ResourceManager[29204]: info: Running /etc/ha.d/resource.d/IPaddr 192.168.3.103 stop
Feb 19 13:50:45 slave IPaddr[29279]: INFO: Success
Feb 19 13:50:45 slave heartbeat: [29194]: info: foreign HA resource release completed (standby).
Feb 19 13:50:45 slave heartbeat: [29159]: info: Local standby process completed [foreign].
Feb 19 13:50:46 slave heartbeat: [29159]: WARN: 1 lost packet(s) for [master.org] [162:164]
Feb 19 13:50:46 slave heartbeat: [29159]: info: remote resource transition completed.
Feb 19 13:50:46 slave heartbeat: [29159]: info: No pkts missing from master.org!
Feb 19 13:50:46 slave heartbeat: [29159]: info: Other node completed standby takeover of foreign resources.
Now try stopping the MySQL service on the master:
pkill mysqld
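The log does not change, so we conclude that heartbeat only checks the heartbeat, that is, whether the machine is down. It does not check the MySQL service. We therefore also need a script that checks the MySQL service: if mysql is down it tries to start it, and if the start fails it kills the heartbeat process to trigger failover (the same principle as in the previous nginx+keepalived article). The script is as follows: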
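#!/bin/bash
# filename: mysqlsc.sh
# Check mysqld every 5 seconds. If it is down, try to start it, and if
# that fails, kill heartbeat so that the standby takes over.
while true
do
    ps aux | grep mysqld | grep -v grep > /dev/null 2>&1 # look for a mysqld process
    if [[ $? -ne 0 ]] # non-zero means no mysqld process was found
    then
        /etc/init.d/mysqld start # try to start mysql
        ps aux | grep mysqld | grep -v grep > /dev/null 2>&1
        if [[ $? -ne 0 ]] # still not running: trigger failover
        then
            pkill heartbeat
        fi
    fi
    sleep 5 # wait before the next check
done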
Make the script executable and run it in the background:
chmod +x mysqlsc.sh
nohup sh mysqlsc.sh & # run in the background
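A process listing only proves that a mysqld process exists, not that it answers. As a hedged variation on the same idea, the sketch below separates the check, the restart, and the failover into three commands passed as arguments, so the check can be swapped for something like mysqladmin ping. check_once is a hypothetical function name and is not part of heartbeat or MySQL:

```shell
# One monitoring pass.
# $1 = command that succeeds while the service is up
# $2 = command that tries to start the service
# $3 = command that triggers failover
check_once() {
    if sh -c "$1"; then
        return 0            # service is healthy, nothing to do
    fi
    sh -c "$2"              # first try a local restart
    if sh -c "$1"; then
        return 0            # the restart worked
    fi
    sh -c "$3"              # still down: hand over to the standby
    return 1
}
```

It could be called in a loop as check_once 'mysqladmin ping > /dev/null 2>&1' '/etc/init.d/mysqld start' 'pkill heartbeat'; sleep 5. Depending on how long mysqld takes to start, a short pause between the restart and the second check may be needed.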
Next, try stopping heartbeat on the master:
service heartbeat stop
Check the log on the slave:
heartbeat[29159]: 2012/02/19_14:03:05 info: Received shutdown notice from 'master.org'.
heartbeat[29159]: 2012/02/19_14:03:05 info: Resources being acquired from master.org.
heartbeat[29308]: 2012/02/19_14:03:05 info: acquire local HA resources (standby).
heartbeat[29308]: 2012/02/19_14:03:05 info: local HA resource acquisition completed (standby).
heartbeat[29159]: 2012/02/19_14:03:05 info: Standby resource acquisition done [foreign].
heartbeat[29309]: 2012/02/19_14:03:05 info: No local resources [/usr/local/lib/heartbeat/ResourceManager listkeys slave.org] to acquire.
harc[29328]: 2012/02/19_14:03:05 info: Running /etc/ha.d/rc.d/status status
mach_down[29338]: 2012/02/19_14:03:05 info: Taking over resource group IPaddr::192.168.3.103
ResourceManager[29358]: 2012/02/19_14:03:05 info: Acquiring resource group: master.org IPaddr::192.168.3.103 mysqld
IPaddr[29382]: 2012/02/19_14:03:05 INFO: Resource is stopped
ResourceManager[29358]: 2012/02/19_14:03:06 info: Running /etc/ha.d/resource.d/IPaddr 192.168.3.103 start
IPaddr[29434]: 2012/02/19_14:03:06 INFO: Using calculated nic for 192.168.3.103: eth0
IPaddr[29434]: 2012/02/19_14:03:06 DEBUG: Using calculated netmask for 192.168.3.103: 255.255.255.0
IPaddr[29434]: 2012/02/19_14:03:06 DEBUG: Using calculated broadcast for 192.168.3.103: 192.168.3.255
IPaddr[29434]: 2012/02/19_14:03:06 INFO: eval /sbin/ifconfig eth0:0 192.168.3.103 netmask 255.255.255.0 broadcast 192.168.3.255
IPaddr[29434]: 2012/02/19_14:03:06 DEBUG: Sending Gratuitous Arp for 192.168.3.103 on eth0:0 [eth0]
IPaddr[29425]: 2012/02/19_14:03:06 INFO: Success
ResourceManager[29358]: 2012/02/19_14:03:06 info: Running /etc/init.d/mysqld start
mach_down[29338]: 2012/02/19_14:03:07 info: /usr/local/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[29338]: 2012/02/19_14:03:07 info: mach_down takeover complete for node master.org.
heartbeat[29159]: 2012/02/19_14:03:07 info: mach_down takeover complete.
heartbeat[29159]: 2012/02/19_14:03:17 WARN: node master.org: is dead
heartbeat[29159]: 2012/02/19_14:03:17 info: Dead node master.org gave up resources.
heartbeat[29159]: 2012/02/19_14:03:17 info: Link master.org:eth0 dead.
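The line to look for after a failover is the WARN entry saying that a node is dead. A minimal sketch that pulls those node names out of a log, assuming the message format shown in the logs above; dead_nodes is a hypothetical helper name:

```shell
# Print the name of each node that a heartbeat log (stdin) reports as dead.
dead_nodes() {
    sed -n 's/.*WARN: node \([^:]*\): is dead.*/\1/p' | sort -u
}
```

For the log above, dead_nodes < /var/log/ha-log run on the slave prints master.org.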
Original article: heartbeat实现MySQL双机高可用. Thanks to the original author for sharing.