아래 편집기에서 온라인 기사를 가져올 수 있습니다MYSQL 동기화 오류 보고 실패 요약 처리 방법(필독). 편집자가 꽤 좋다고 생각해서 지금 공유하고 참고하겠습니다. >
Failover 후 흔히 발생하는 문제는 데이터베이스 용량이 매우 작은 경우에는 덤핑한 후 import해서 처리하기 쉽습니다. 하지만 온라인 데이터베이스는 이렇게 간단하게 사용하면 150G~200G 정도 됩니다. 방법은 비용이 너무 높기 때문에 오랜 탐색 끝에 여러 처리 방법을 요약했습니다.생산 환경 아키텍처 다이어그램현재 네트워크 아키텍처는 데이터 복사본을 두 개 저장하고 생성합니다. 비동기 복제를 통한 고가용성 클러스터 오류가 발생하면 두 시스템이 슬레이브로 전환되어 마스터가 됩니다. 오류를 처리할 때 발생하는 가장 일반적인 오류는 마스터입니다. 다음은 제가 수집한 오류 메시지입니다. >
가장 일반적인 세 가지 상황 이 세 가지 상황은 HA 전환 중 발생합니다. =0, 소수의 binlog가 완전히 수신되지 않아 동기화 오류가 발생했습니다.첫 번째 유형: 마스터에서 레코드를 삭제했지만 찾을 수 없습니다. >두 번째 유형: 슬레이브에 이미 레코드가 있고 동일한 레코드가 마스터에 삽입됩니다. 세 번째 유형: 마스터의 레코드를 업데이트하지만 찾을 수 없습니다.
Last_SQL_Error: Could not execute Delete_rows event on table hcy.t1;
Can't find record in 't1',
Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND;
the event's master log mysql-bin.000006, end_log_pos 254
비동기 복제간단히 말하면, 슬레이브가 수신을 완료했는지, 실행을 완료했는지에 관계없이 마스터가 binlog를 전송하면 이 작업이 끝납니다.
반동기 복제간단히 말하면 마스터가 binlog를 보내고, 슬레이브는 이를 수신했음을 확인하지만, 완료 여부와 상관없이 마스터에게 수신 신호를 보낸다. (Google에서 작성한 코드, 5.5에서 공식적으로 적용됨)
비동기의 단점마스터에 대한 쓰기 작업이 바쁜 경우 현재 POS는 예를 들어 , 포인트는 10이고 슬레이브의 IO_THREAD 스레드는 3을 수신합니다. 이때 마스터가 다운되어 슬레이브에 전송되지 않고 데이터가 손실되는 7 포인트의 차이가 발생합니다. 특수 상황
슬레이브의 릴레이 로그 릴레이 빈이 손상되었습니다. Last_SQL_Error: Could not execute Write_rows event on table hcy.t1;
Duplicate entry '2' for key 'PRIMARY',
Error_code: 1062;
handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.000006, end_log_pos 924
사람의 실수에 주의하세요. 여러 슬레이브에 중복된 서버 ID가 있습니다. 이 경우 동기화가 지연되고 동기화가 완료되지 않습니다. 오류 로그. 해결책은 서버 ID를 일관되지 않게 변경하는 것입니다.
Last_SQL_Error: Could not execute Update_rows event on table hcy.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000010, end_log_pos 263
문제 처리
삭제 실패
마스터 삭제 슬레이브에서 찾을 수 없는 기록입니다. Last_SQL_Error: Error initializing relay log position: I/O error reading the header from the binary log
Last_SQL_Error: Error initializing relay log position: Binlog has bad magic number;
It's not a binary log file that can be used by this version of MySQL
Slave: received end packet from server, apparent master shutdown: Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000012' at postion 106
슬레이브에 이미 레코드가 존재하고, 마스터에도 같은 레코드가 삽입되어 있습니다. Last_SQL_Error: Could not execute Delete_rows event on table hcy.t1;
Can't find record in 't1',
Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND;
the event's master log mysql-bin.000006, end_log_pos 254
해결책: 슬레이브에서 desc hcy.t1을 사용하고 먼저 테이블 구조를 살펴보세요.
stop slave; set global sql_slave_skip_counter=1; start slave;
중복 기본 키 삭제
Last_SQL_Error: Could not execute Write_rows event on table hcy.t1;
Duplicate entry '2' for key 'PRIMARY',
Error_code: 1062;
handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.000006, end_log_pos 924
업데이트 손실
마스터에서는 레코드가 업데이트되지만 슬레이브에서는 레코드를 찾을 수 없으며 데이터가 손실됩니다. mysql> desc hcy.t1;
+-------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| id | int(11) | NO | PRI | 0 | |
| name | char(4) | YES | | NULL | |
+-------+---------+------+-----+---------+-------+
해결책:
마스터에서 mysqlbinlog를 사용하여 잘못된 binlog 로그가 수행하는 작업을 분석합니다. mysql> delete from t1 where id=2;
Query OK, 1 row affected (0.00 sec)
mysql> start slave;
Query OK, 0 rows affected (0.00 sec)
mysql> show slave status\G;
……
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
……
mysql> select * from t1 where id=2;
Last_SQL_Error: Could not execute Update_rows event on table hcy.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000010, end_log_pos 794
그런 다음 마스터로 이동하여
/usr/local/mysql/bin/mysqlbinlog --no-defaults -v -v --base64-output=DECODE-ROWS mysql-bin.000010 | grep -A '10' 794 #120302 12:08:36 server id 22 end_log_pos 794 Update_rows: table id 33 flags: STMT_END_F ### UPDATE hcy.t1 ### WHERE ### @1=2 /* INT meta=0 nullable=0 is_null=0 */ ### @2='bbc' /* STRING(4) meta=65028 nullable=1 is_null=0 */ ### SET ### @1=2 /* INT meta=0 nullable=0 is_null=0 */ ### @2='BTV' /* STRING(4) meta=65028 nullable=1 is_null=0 */ # at 794 #120302 12:08:36 server id 22 end_log_pos 821 Xid = 60 COMMIT/*!*/; DELIMITER ; # End of log file ROLLBACK /* added by mysqlbinlog */; /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
를 확인하고 슬레이브에서 손실된 데이터를 입력한 후 오류 보고를 건너뜁니다. mysql> select * from t1 where id=2;
Empty set (0.00 sec)
아아아아수제수리
解决方法:找到同步的binlog和POS点,然后重新做同步,这样就可以有新的中继日值了。
例子:
mysql> show slave status\G; *************************** 1. row *************************** Master_Log_File: mysql-bin.000010 Read_Master_Log_Pos: 1191 Relay_Log_File: vm02-relay-bin.000005 Relay_Log_Pos: 253 Relay_Master_Log_File: mysql-bin.000010 Slave_IO_Running: Yes Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 1593 Last_Error: Error initializing relay log position: I/O error reading the header from the binary log Skip_Counter: 1 Exec_Master_Log_Pos: 821
Slave_IO_Running :接收master的binlog信息
Master_Log_File
Read_Master_Log_Pos
Slave_SQL_Running:执行写操作
Relay_Master_Log_File
Exec_Master_Log_Pos
以执行写的binlog和POS点为准。
Relay_Master_Log_File: mysql-bin.000010 Exec_Master_Log_Pos: 821 mysql> stop slave; Query OK, 0 rows affected (0.01 sec) mysql> CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000010',MASTER_LOG_POS=821; Query OK, 0 rows affected (0.01 sec) mysql> start slave; Query OK, 0 rows affected (0.00 sec) mysql> show slave status\G; *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.8.22 Master_User: repl Master_Port: 3306 Connect_Retry: 10 Master_Log_File: mysql-bin.000010 Read_Master_Log_Pos: 1191 Relay_Log_File: vm02-relay-bin.000002 Relay_Log_Pos: 623 Relay_Master_Log_File: mysql-bin.000010 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 1191 Relay_Log_Space: 778 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Ibbackup
各种大招都用上了,无奈slave数据丢失过多,ibbackup(需要银子)该你登场了。
Ibbackup热备份工具,是付费的。xtrabackup是免费的,功能上一样。
Ibbackup备份期间不锁表,备份时开启一个事务(相当于做一个快照),然后会记录一个点,之后数据的更改保存在ibbackup_logfile文件里,恢复时把ibbackup_logfile 变化的数据再写入到ibdata里。
Ibbackup 只备份数据( ibdata、.ibd ),表结构.frm不备份。
下面一个演示例子:
备份:ibbackup /bak/etc/my_local.cnf /bak/etc/my_bak.cnf
恢复:ibbackup --apply-log /bak/etc/my_bak.cnf
[root@vm01 etc]# more my_local.cnf datadir =/usr/local/mysql/data innodb_data_home_dir = /usr/local/mysql/data innodb_data_file_path = ibdata1:10M:autoextend innodb_log_group_home_dir = /usr/local/mysql/data innodb_buffer_pool_size = 100M innodb_log_file_size = 5M innodb_log_files_in_group=2 [root@vm01 etc]# ibbackup /bak/etc/my_local.cnf /bak/etc/my_bak.cnf InnoDB Hot Backup version 3.0.0; Copyright 2002-2005 Innobase Oy License A21488 is granted to vm01 (chunyang_he@126.com) (--apply-log works in any computer regardless of the hostname) Licensed for use in a computer whose hostname is 'vm01' Expires 2012-5-1 (year-month-day) at 00:00 See http://www.innodb.com for further information Type ibbackup --license for detailed license terms, --help for help Contents of /bak/etc/my_local.cnf: innodb_data_home_dir got value /usr/local/mysql/data innodb_data_file_path got value ibdata1:10M:autoextend datadir got value /usr/local/mysql/data innodb_log_group_home_dir got value /usr/local/mysql/data innodb_log_files_in_group got value 2 innodb_log_file_size got value 5242880 Contents of /bak/etc/my_bak.cnf: innodb_data_home_dir got value /bak/data innodb_data_file_path got value ibdata1:10M:autoextend datadir got value /bak/data innodb_log_group_home_dir got value /bak/data innodb_log_files_in_group got value 2 innodb_log_file_size got value 5242880 ibbackup: Found checkpoint at lsn 0 1636898 ibbackup: Starting log scan from lsn 0 1636864 120302 16:47:43 ibbackup: Copying log... 120302 16:47:43 ibbackup: Log copied, lsn 0 1636898 ibbackup: We wait 1 second before starting copying the data files... 120302 16:47:44 ibbackup: Copying /usr/local/mysql/data/ibdata1 ibbackup: A copied database page was modified at 0 1636898 ibbackup: Scanned log up to lsn 0 1636898 ibbackup: Was able to parse the log up to lsn 0 1636898 ibbackup: Maximum page number for a log record 0 120302 16:47:46 ibbackup: Full backup completed! [root@vm01 etc]# [root@vm01 etc]# cd /bak/data/ [root@vm01 data]# ls ibbackup_logfile ibdata1 [root@vm01 data]# ibbackup --apply-log /bak/etc/my_bak.cnf InnoDB Hot Backup version 3.0.0; Copyright 2002-2005 Innobase Oy License A21488 is granted to vm01 (chunyang_he@126.com) (--apply-log works in any computer regardless of the hostname) Licensed for use in a computer whose hostname is 'vm01' Expires 2012-5-1 (year-month-day) at 00:00 See http://www.innodb.com for further information Type ibbackup --license for detailed license terms, --help for help Contents of /bak/etc/my_bak.cnf: innodb_data_home_dir got value /bak/data innodb_data_file_path got value ibdata1:10M:autoextend datadir got value /bak/data innodb_log_group_home_dir got value /bak/data innodb_log_files_in_group got value 2 innodb_log_file_size got value 5242880 120302 16:48:38 ibbackup: ibbackup_logfile's creation parameters: ibbackup: start lsn 0 1636864, end lsn 0 1636898, ibbackup: start checkpoint 0 1636898 ibbackup: start checkpoint 0 1636898 InnoDB: Doing recovery: scanned up to log sequence number 0 1636898 InnoDB: Starting an apply batch of log records to the database... InnoDB: Progress in percents: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 .....99 Setting log file size to 0 5242880 ibbackup: We were able to parse ibbackup_logfile up to ibbackup: lsn 0 1636898 ibbackup: Last MySQL binlog file position 0 1191, file name ./mysql-bin.000010 ibbackup: The first data file is '/bak/data/ibdata1' ibbackup: and the new created log files are at '/bak/data/' 120302 16:48:38 ibbackup: Full backup prepared for recovery successfully! [root@vm01 data]# ls ibbackup_logfile ibdata1 ib_logfile0 ib_logfile1
把ibdata1 ib_logfile0 ib_logfile1拷贝到从,把.frm也拷贝过去,启动MySQL后,做同步,那个点就是上面输出的:
ibbackup: Last MySQL binlog file position 0 1191, file name ./mysql-bin.000010 CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000010',MASTER_LOG_POS=1191;
Maatkit工具包
简介
maatkit是一个开源的工具包,为mysql日常管理提供了帮助。目前,已被Percona公司收购并维护。其中:
mk-table-checksum是用来检测master和slave上的表结构和数据是否一致。
mk-table-sync是发生主从数据不一致时,来修复的。
这两个工具包,没有在现网实际操作的经验,这里仅仅是新技术探讨和学术交流,下面展示下如何使用。
[root@vm02]# mk-table-checksum h=vm01,u=admin,p=123456 h=vm02,u=admin,p=123456 -d hcy -t t1 Cannot connect to MySQL because the Perl DBI module is not installed or not found. Run 'perl -MDBI' to see the directories that Perl searches for DBI. If DBI is not installed, try: Debian/Ubuntu apt-get install libdbi-perl RHEL/CentOS yum install perl-DBI OpenSolaris pgk install pkg:/SUNWpmdbi
提示缺少perl-DBI模块,那么直接 yum install perl-DBI。
[root@vm02 bin]# mk-table-checksum h=vm01,u=admin,p=123456 h=vm02,u=admin,p=123456 -d hcy -t t1 DATABASE TABLE CHUNK HOST ENGINE COUNT CHECKSUM TIME WAIT STAT LAG hcy t1 0 vm02 InnoDB NULL 1957752020 0 0 NULL NULL hcy t1 0 vm01 InnoDB NULL 1957752020 0 0 NULL NULL
如果表数据不一致,CHECKSUM的值是不相等的。
解释下输出的意思:
DATABASE:数据库名
TABLE:表名
CHUNK:checksum时的近似数值
HOST:MYSQL的地址
ENGINE:表引擎
COUNT:表的行数
CHECKSUM:校验值
TIME:所用时间
WAIT:等待时间
STAT:MASTER_POS_WAIT()返回值
LAG:slave的延时时间
如果你想过滤出不相等的都有哪些表,可以用mk-checksum-filter这个工具,只要在后面加个管道符就行了。
[root@vm02 ~]# mk-table-checksum h=vm01,u=admin,p=123456 h=vm02,u=admin,p=123456 -d hcy | mk-checksum-filter hcy t2 0 vm01 InnoDB NULL 1957752020 0 0 NULL NULL hcy t2 0 vm02 InnoDB NULL 1068689114 0 0 NULL NULL
知道有哪些表不一致,可以用mk-table-sync这个工具来处理。
注:在执行mk-table-checksum时会锁表,表的大小取决于执行的快慢。
MASTER上的t2表数据:
SLAVE上的t2表数据:
mysql> select * from t2; mysql> select * from t2; +----+------+ +----+------+ | id | name | | id | name | +----+------+ +----+------+ | 1 | a | | 1 | a | | 2 | b | | 2 | b | | 3 | ss | | 3 | ss | | 4 | asd | | 4 | asd | | 5 | ss | +----+------+ +----+------+ 4 rows in set (0.00 sec) 5 rows in set (0.00 sec) mysql> \! hostname; mysql> \! hostname; vm02 vm01 [root@vm02 ~]# mk-table-sync --execute --print --no-check-slave --transaction --databases hcy h=vm01,u=admin,p=123456 h=vm02,u=admin,p=123456 INSERT INTO `hcy`.`t2`(`id`, `name`) VALUES ('5', 'ss') /*maatkit src_db:hcy src_tbl:t2 src_dsn:h=vm01,p=...,u=admin dst_db:hcy dst_tbl:t2 dst_dsn:h=vm02,p=...,u=admin lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:3246 user:root host:vm02*/;
它的工作原理是:先一行一行检查主从库的表是否一样,如果哪里不一样,就执行删除,更新,插入等操作,使其达到一致。表的大小决定着执行的快慢。
If C<--transaction> is specified, C<LOCK TABLES> is not used. Instead, lock and unlock are implemented by beginning and committing transactions. The exception is if L<"--lock"> is 3. If C<--no-transaction> is specified, then C<LOCK TABLES> is used for any value of L<"--lock">. See L<"--[no]transaction">. When enabled, either explicitly or implicitly, the transaction isolation level is set C<REPEATABLE READ> and transactions are started C<WITH CONSISTENT SNAPSHOT>
MySQL复制监控
MySQL常见错误类型
1005:创建表失败
1006:创建数据库失败
1007:数据库已存在,创建数据库失败
1008:数据库不存在,删除数据库失败
1009:不能删除数据库文件导致删除数据库失败
1010:不能删除数据目录导致删除数据库失败
1011:删除数据库文件失败
1012:不能读取系统表中的记录
1020:记录已被其他用户修改
1021:硬盘剩余空间不足,请加大硬盘可用空间
1022:关键字重复,更改记录失败
1023:关闭时发生错误
1024:读文件错误
1025:更改名字时发生错误
1026:写文件错误
1032:记录不存在
1036:数据表是只读的,不能对它进行修改
1037:系统内存不足,请重启数据库或重启服务器
1038:用于排序的内存不足,请增大排序缓冲区
1040:已到达数据库的最大连接数,请加大数据库可用连接数
1041:系统内存不足
1042:无效的主机名
1043:无效连接
1044:当前用户没有访问数据库的权限
1045:不能连接数据库,用户名或密码错误
1048:字段不能为空
1049:数据库不存在
1050:数据表已存在
1051:数据表不存在
1054:字段不存在
1065:无效的SQL语句,SQL语句为空
1081:不能建立Socket连接
1114:数据表已满,不能容纳任何记录
1116:打开的数据表太多
1129:数据库出现异常,请重启数据库
1130:连接数据库失败,没有连接数据库的权限
1133:数据库用户不存在
1141:当前用户无权访问数据库
1142:当前用户无权访问数据表
1143:当前用户无权访问数据表中的字段
1146:数据表不存在
1147:未定义用户对数据表的访问权限
1149:SQL语句语法错误
1158:网络错误,出现读错误,请检查网络连接状况
1159:网络错误,读超时,请检查网络连接状况
1160:网络错误,出现写错误,请检查网络连接状况
1161:网络错误,写超时,请检查网络连接状况
1062:字段值重复,入库失败
1169:字段值重复,更新记录失败
1177:打开数据表失败
1180:提交事务失败
1181:回滚事务失败
1203:当前用户和数据库建立的连接已到达数据库的最大连接数,请增大可用的数据库连接数或重启数据库
1205:加锁超时
1211:当前用户没有创建用户的权限
1216:外键约束检查失败,更新子表记录失败
1217:外键约束检查失败,删除或修改主表记录失败
1226:当前用户使用的资源已超过所允许的资源,请重启数据库或重启服务器
1227:权限不足,您无权进行此操作
1235:MySQL版本过低,不具有本功能
复制监控脚本
参考原文修改。
原脚本
#!/bin/bash # #check_mysql_slave_replication_status # # # parasum=2 help_msg(){ cat << help +---------------------+ +Error Cause: +you must input $parasum parameters! +1st : Host_IP +2st : Host_Port help exit } [ $# -ne ${parasum} ] && help_msg #若参数不够打印帮助信息并退出 export HOST_IP=$1 export HOST_PORt=$2 MYUSER="root" MYPASS="123456" MYSQL_CMD="mysql -u$MYUSER -p$MYPASS" MailTitle="" #邮件主题 Mail_Address_MysqlStatus="root@localhost.localdomain" #收件人邮箱 time1=$(date +"%Y%m%d%H%M%S") time2=$(date +"%Y-%m-%d %H:%M:%S") SlaveStatusFile=/tmp/salve_status_${HOST_PORT}.${time1} #邮件内容所在文件 echo "--------------------Begin at: "$time2 > $SlaveStatusFile echo "" >> $SlaveStatusFile #get slave status ${MYSQL_CMD} -e "show slave status\G" >> $SlaveStatusFile #取得salve进程的状态 #get io_thread_status,sql_thread_status,last_errno 取得以下状态值 IOStatus=$(cat $SlaveStatusFile|grep Slave_IO_Running|awk '{print $2}') SQLStatus=$(cat $SlaveStatusFile|grep Slave_SQL_Running |awk '{print $2}') Errno=$(cat $SlaveStatusFile|grep Last_Errno | awk '{print $2}') Behind=$(cat $SlaveStatusFile|grep Seconds_Behind_Master | awk '{print $2}') echo "" >> $SlaveStatusFile if [ "$IOStatus" == "No" ] || [ "$SQLStatus" == "No" ];then #判断错误类型 if [ "$Errno" -eq 0 ];then #可能是salve线程未启动 $MYSQL_CMD -e "start slave io_thread;start slave sql_thread;" echo "Cause slave threads doesnot's running,trying start slsave io_thread;start slave sql_thread;" >> $SlaveStatusFile MailTitle="[Warning] Slave threads stoped on $HOST_IP $HOST_PORT" elif [ "$Errno" -eq 1007 ] || [ "$Errno" -eq 1053 ] || [ "$Errno" -eq 1062 ] || [ "$Errno" -eq 1213 ] || [ "$Errno" -eq 1032 ]\ || [ "Errno" -eq 1158 ] || [ "$Errno" -eq 1159 ] || [ "$Errno" -eq 1008 ];then #忽略此些错误 $MYSQL_CMD -e "stop slave;set global sql_slave_skip_counter=1;start slave;" echo "Cause slave replication catch errors,trying skip counter and restart slave;stop slave ;set global sql_slave_skip_counter=1;slave start;" >> $SlaveStatusFile MailTitle="[Warning] Slave error on $HOST_IP $HOST_PORT! ErrNum: $Errno" else echo "Slave $HOST_IP $HOST_PORT is down!" >> $SlaveStatusFile MailTitle="[ERROR]Slave replication is down on $HOST_IP $HOST_PORT ! ErrNum:$Errno" fi fi if [ -n "$Behind" ];then Behind=0 fi echo "$Behind" >> $SlaveStatusFile #delay behind master 判断延时时间 if [ $Behind -gt 300 ];then echo `date +"%Y-%m%d %H:%M:%S"` "slave is behind master $Bebind seconds!" >> $SlaveStatusFile MailTitle="[Warning]Slave delay $Behind seconds,from $HOST_IP $HOST_PORT" fi if [ -n "$MailTitle" ];then #若出错或者延时时间大于300s则发送邮件 cat ${SlaveStatusFile} | /bin/mail -s "$MailTitle" $Mail_Address_MysqlStatus fi #del tmpfile:SlaveStatusFile > $SlaveStatusFile
修改后脚本
只做了简单的整理,修正了Behind为NULL的判断,但均未测试;
应可考虑增加:
对修复执行结果的判断;多条错误的循环修复、检测、再修复?
取消SlaveStatusFile临时文件。
Errno、Behind两种告警分别发邮件,告警正文增加show slave结果原文。
增加PATH,以便加到crontab中。
考虑crontab中周期执行(加锁避免执行冲突、执行周期选择)
增加执行日志?
#!/bin/sh # check_mysql_slave_replication_status # 参考:http://www.tianfeiyu.com/?p=2062 Usage(){ echo Usage: echo "$0 HOST PORT USER PASS" } [ -z "$1" -o -z "$2" -o -z "$3" -o -z "$4" ] && Usage && exit 1 HOST=$1 PORT=$2 USER=$3 PASS=$4 MYSQL_CMD="mysql -h$HOST -P$PORT -u$USER -p$PASS" MailTitle="" #邮件主题 Mail_Address_MysqlStatus="root@localhost.localdomain" #收件人邮箱 time1=$(date +"%Y%m%d%H%M%S") time2=$(date +"%Y-%m-%d %H:%M:%S") SlaveStatusFile=/tmp/salve_status_${HOST_PORT}.${time1} #邮件内容所在文件 echo "--------------------Begin at: "$time2 > $SlaveStatusFile echo "" >> $SlaveStatusFile #get slave status ${MYSQL_CMD} -e "show slave status\G" >> $SlaveStatusFile #取得salve进程的状态 #get io_thread_status,sql_thread_status,last_errno 取得以下状态值 IOStatus=$(cat $SlaveStatusFile|grep Slave_IO_Running|awk '{print $2}') SQLStatus=$(cat $SlaveStatusFile|grep Slave_SQL_Running |awk '{print $2}') Errno=$(cat $SlaveStatusFile|grep Last_Errno | awk '{print $2}') Behind=$(cat $SlaveStatusFile|grep Seconds_Behind_Master | awk '{print $2}') echo "" >> $SlaveStatusFile if [ "$IOStatus" = "No" -o "$SQLStatus" = "No" ];then case "$Errno" in 0) # 可能是slave未启动 $MYSQL_CMD -e "start slave io_thread;start slave sql_thread;" echo "Cause slave threads doesnot's running,trying start slsave io_thread;start slave sql_thread;" >> $SlaveStatusFile ;; 1007|1053|1062|1213|1032|1158|1159|1008) # 忽略这些错误 $MYSQL_CMD -e "stop slave;set global sql_slave_skip_counter=1;start slave;" echo "Cause slave replication catch errors,trying skip counter and restart slave;stop slave ;set global sql_slave_skip_counter=1;slave start;" >> $SlaveStatusFile MailTitle="[Warning] Slave error on $HOST:$PORT! ErrNum: $Errno" ;; *) echo "Slave $HOST:$PORT is down!" >> $SlaveStatusFile MailTitle="[ERROR]Slave replication is down on $HOST:$PORT! Errno:$Errno" ;; esac fi if [ "$Behind" = "NULL" -o -z "$Behind" ];then Behind=0 fi echo "Behind:$Behind" >> $SlaveStatusFile #delay behind master 判断延时时间 if [ $Behind -gt 300 ];then echo `date +"%Y-%m%d %H:%M:%S"` "slave is behind master $Bebind seconds!" >> $SlaveStatusFile MailTitle="[Warning]Slave delay $Behind seconds,from $HOST $PORT" fi if [ -n "$MailTitle" ];then #若出错或者延时时间大于300s则发送邮件 cat ${SlaveStatusFile} | /bin/mail -s "$MailTitle" $Mail_Address_MysqlStatus fi #del tmpfile:SlaveStatusFile > $SlaveStatusFile
위 내용은 온라인 MYSQL 동기화 오류 처리 방법 코드 요약에 대한 자세한 설명의 상세 내용입니다. 자세한 내용은 PHP 중국어 웹사이트의 기타 관련 기사를 참조하세요!