webgame中Mysql Deadlock ERROR 1213 (40001)错误的排查历程
案例发现: 从我们正在运营的一款webgame的异常日志中看到一些程序执行MYSQL 语句的报错信息。比较多的是 “Deadlock found when trying to get lock; try restarting transaction” ,少部分是“Error number: 1205:Lock wait timeout exceeded; try restarti
案例发现:
从我们正在运营的一款webgame的异常日志中看到一些程序执行MYSQL 语句的报错信息。比较多的是“Deadlock found when trying to get lock; try restarting transaction”,少部分是“Error number: 1205:Lock wait timeout exceeded; try restarting transaction”,如下:
001 --> 2012-11-22 06:05:36 --> ERROR -->system/database/Driver.php--777--log--Debug 002 --> 2012-11-22 06:05:36 --> ERROR -->system/database/Driver.php--295--error--JV_Driver 003 --> 2012-11-22 06:05:36 --> ERROR -->system/database/ActiveRecord.php--947--query--JV_Driver 004 --> 2012-11-22 06:05:36 --> ERROR -->server/models/MRoleMonster.php--84--update--JV_ActiveRecord 005 --> 2012-11-22 06:05:36 --> ERROR -->server/daemon/update.php--392--kill--MRoleMonster 006 --> 2012-11-22 06:05:36 --> ERROR --> DATABASE: xxx_roles_xxx(10.1.1.75) --> Error number: 1205:#####Lock wait timeout exceeded; try restarting transaction##### --> Error Message: #####db_query_error --> Query Error: UPDATE `monster` SET `kills` = kills + 1 WHERE `id` = '30036' AND `role_id` = '19863'.##### --> query elapsed counter: 184293;time 590.4272678 --> Database Connection has be closed:dbwRole 001 --> 2012-11-28 15:59:47 --> ERROR -->system/database/Driver.php--777--log--Debug 002 --> 2012-11-28 15:59:47 --> ERROR -->system/database/Driver.php--295--error--JV_Driver 003 --> 2012-11-28 15:59:47 --> ERROR -->system/database/ActiveRecord.php--948--query--JV_Driver 004 --> 2012-11-28 15:59:47 --> ERROR -->server/models/MRole.php--1143--update--JV_ActiveRecord 005 --> 2012-11-28 15:59:47 --> ERROR -->server/daemon/update_other.php--283--updateRoleState--MRole 006 --> 2012-11-28 15:59:47 --> ERROR --> DATABASE: xxx_roles_xxx(10.1.1.72) --> Error number: 1213:#####Deadlock found when trying to get lock; try restarting transaction##### --> Error Message: #####db_query_error --> Query Error: UPDATE `role_state` SET `state` = 1 WHERE `role_id` = '53016'.##### --> query elapsed counter: 4972;time 4.2417307 --> Database Connection has be closed:dbwRole 007 --> 2012-11-28 15:59:47 --> ERROR -->system/database/Driver.php--616--log--Debug 008 --> 2012-11-28 15:59:47 --> ERROR -->server/daemon/combat_update.php--308--transComplete--JV_Driver 009 --> 2012-11-28 15:59:47 --> ERROR --> DB Transaction Failure.
从报错的英文上理解,大约是发生了“死锁”,以及“事务锁等待超时”两个错误异常。而且,都是我们后台PHP常驻进程遇到的问题。异常的代码对应行数上,大约可理解为执行SQL语句的一个指令,并无特殊的东西。有经验的程序员,很容易看出来,这不是程序的异常,这是MYSQL事务中,锁竞争的异常,客户端(PHP常驻进程)是没有语法上的错误的。那该如何排查呢?
一串疑问:
这是什么问题?如何排查?什么时候发生死锁? 我怎么知道他发生了? 发生之后去哪里排查?如何排查?怎么确定他们对应的事务中的所有SQL语句? 分别在哪几个事务中? 谁先锁的?谁后锁的?谁没锁到?谁报的死锁错误? 死锁是什么?为什么发生了? 如何避免?还有哪些因素影响?
毫无头绪:
程序间数据交互,上strace神器?
跟踪谁? 客户端(php)?你知道哪个客户端会发生这个问题?你知道啥时候会发生?在你开始抓包到抓到死锁的期间,这得是多大的数据量?
跟踪谁? 服务端(Mysql)?玩笑开大了吧?mysql以进程模式来处理客户端请求,每次都是一个新的进程,strace -ff参数的话,想想日志文件得被创建多少个,数据量会小么?
“万军之中取上将首级”这本事我可没…strace排查这种错误,还是算了吧。
这是谁报的错?显然是mysql,那就从根源找起–MYSQL server。
抓获现场:
我们要还原案发现场,有幸的时,我们有监控记录BINLOG以及SHOW ENGINE INNODB STATUS。到对应mysql服务器上,执行“show engine innodb status”获取INNODB引擎当前信息,大约如下:
...... ------------------------ LATEST DETECTED DEADLOCK ------------------------ 121128 15:59:46 *** (1) TRANSACTION: TRANSACTION AC512256, ACTIVE 0 sec starting index read mysql tables in use 1, locked 1 LOCK WAIT 4 lock struct(s), heap size 1248, 2 row lock(s), undo log entries 1 MySQL thread id 122562823, OS thread handle 0x7fa5c4fbe700, query id 7457663621 10.1.1.8 s001_gamedb Updating UPDATE `role_state` SET `state` = 1 WHERE `role_id` = '53016' *** (1) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 477 page no 1386 n bits 128 index `PRIMARY` of table `xxx_roles_xxx`.`role_state` trx id AC512256 lock_mode X locks rec but not gap waiting Record lock, heap no 17 PHYSICAL RECORD: n_fields 80; compact format; info bits 0 0: len 3; hex 00cf18; asc ;; ...... ...... *** (2) TRANSACTION: TRANSACTION AC512250, ACTIVE 0 sec inserting, thread declared inside InnoDB 500 mysql tables in use 1, locked 1 6 lock struct(s), heap size 1248, 3 row lock(s), undo log entries 2 MySQL thread id 122679850, OS thread handle 0x7fac007ff700, query id 7457663711 10.1.1.8 s001_gamedb update REPLACE INTO `role_fight` (`role_id`, `life_max`, `mana_max`, `attack_physical`, `attack_internal`,****) VALUES ('53016', 4967, 3291, 350, 174, ***) *** (2) HOLDS THE LOCK(S): RECORD LOCKS space id 477 page no 1386 n bits 128 index `PRIMARY` of table `xxx_roles_xxx`.`role_state` trx id AC512250 lock_mode X locks rec but not gap Record lock, heap no 17 PHYSICAL RECORD: n_fields 80; compact format; info bits 0 ...... ...... *** (2) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 427 page no 488 n bits 192 index `PRIMARY` of table `xxx_roles_xxx`.`role_fight` trx id AC512250 lock_mode X locks rec but not gap waiting Record lock, heap no 64 PHYSICAL RECORD: n_fields 51; compact format; info bits 0 ...... *** WE ROLL BACK TRANSACTION (1) ......
这是我精简之后的信息,我抓去了LATEST DETECTED DEADLOCK部分的数据,这部分的数据是INNODB的最后一次发生死锁的信息,更详细的说明见MYSQL官方手册对Standard Monitor and Lock Monitor Output返回结果的解释。
OK,发现一场案例,保存这个INNODB的状态数据备用。迅速到程序异常日志中查看相同时间点是否有死锁发生。果然,我们程序异常日志中记录了这起案例(文章开头的日志)。
再到binlog中抓去这个时间段前后10分钟(大约范围)的mysql sql语句执行日志。
案情分析:
engine status中,大约看出MYSQL记录了两个事务之间发生锁竞争时,遗留的数据,
事务1“执行”(注意,这里加了双引号)
UPDATE `role_state` SET `state` = 1 WHERE `role_id` = '53016'
发现被修改资源已经被lock_mode X locks了(详情见:INNODB锁模式),准备等待该资源锁被释放。
事务2执行
REPLACE INTO `role_fight` (`role_id`, `life_max`, `mana_max`, `attack_physical`, `attack_internal`,****) VALUES ('53016', 4967, 3291, 350, 174, ***)
也发现该资源被lock_mode X locks了。
最后部分,mysql给了很重要的一个数据“WE ROLL BACK TRANSACTION (1)” MYSQL回滚了事物1。既然mysql回滚了1,那么肯定是事务1的语句触发了死锁,被mysql回滚了,也就是应该为程序中的异常日志所记录的那部分。同时,MYSQL执行了事务2,那么事务2的SQL语句肯定被记录在BINLOG中了。
抽丝剥茧:
如何确定事务1、事务2执行了哪些SQL语句呢?
根据show engine innodb status的结果,确定事务2被执行的
- SQL语句(业务逻辑的role_id唯一标识): REPLACE INTO `role_fight` (`role_id`, `life_max`, `mana_max`, `attack_physical`, `attack_internal`,****) VALUES (’53016′, 4967, 3291, 350, 174, ***)
- 线程ID(mysql的唯一标识): MySQL thread id 122679850
- 执行时间(时间线):121128 15:59:46
根据这三个标识,以及BINLOG的起始表示“BEGIN、COMMIT”,几乎可以100%确定该事务所包含的SQL语句。
BINLOG信息大约如下:
# at 511750764 #121128 15:59:46 server id 1 end_log_pos 511750843 Query thread_id=122679850 exec_time=0 error_code=0 SET TIMESTAMP=1354089586/*!*/; BEGIN /*!*/; # at 511750843 #121128 15:59:46 server id 1 end_log_pos 511751090 Query thread_id=122679850 exec_time=0 error_code=0 use xxx_roles_xxx/*!*/; SET TIMESTAMP=1354089586/*!*/; UPDATE `role_pet` SET `in_supporting` = 0, `levelup_pause_time` = 1354089587, `auto_feed` = 0, `supporting_pause_time` = 1354089587 WHERE `role_id` = '53016' AND `id` = 9234 /*!*/; # at 511751090 #121128 15:59:46 server id 1 end_log_pos 511751240 Query thread_id=122679850 exec_time=0 error_code=0 SET TIMESTAMP=1354089586/*!*/; UPDATE `role_state` SET `pet` = 0, `pet_level` = 0 WHERE `role_id` = '53016' /*!*/; # at 511751240 #121128 15:59:46 server id 1 end_log_pos 511751885 Query thread_id=122679850 exec_time=0 error_code=0 SET TIMESTAMP=1354089586/*!*/; REPLACE INTO `role_fight` (`role_id`, `life_max`, `mana_max`, `attack_physical`, `attack_internal`, `defend_physical`, `defend_internal`, `dodge_rate`, `critical_rate`, `hit_rate`, `speed`, `defend_physical_plus`, `defend_internal_plus`, `dodge_level`,*****) VALUES ('53016', 4967, 3291, 350, 174, 518, 254, 500, 300, 9500, 913, 668, 668, 261, 700, 97, 133, 40.9, 34, *****) /*!*/; # at 511751885 #121128 15:59:46 server id 1 end_log_pos 511751912 Xid = 7457663579 COMMIT/*!*/;
OK,事务2的SQL语句全部找齐了。那么事务1的呢?如何找?
根据php的异常报错,确定主要包含的语句SQL信息,以及程序跟踪的代码行数,根据代码逻辑去确定该事务的所有SQL语句。再去BINLOG中找到该用户的该业务的类似BINLOG:
# at 511805324 #121128 15:59:53 server id 1 end_log_pos 511805403 Query thread_id=122562823 exec_time=0 error_code=0 SET TIMESTAMP=1354089593/*!*/; BEGIN /*!*/; # at 511805403 #121128 15:59:53 server id 1 end_log_pos 511805560 Query thread_id=122562823 exec_time=0 error_code=0 use xxx_roles_xxx/*!*/; SET TIMESTAMP=1354089593/*!*/; UPDATE `role_fight` SET `last_update_life` = '1354089587' WHERE `role_id` = '53016' /*!*/; # at 511805560 #121128 15:59:53 server id 1 end_log_pos 511805695 Query thread_id=122562823 exec_time=0 error_code=0 SET TIMESTAMP=1354089593/*!*/; UPDATE `role_state` SET `state` = 1 WHERE `role_id` = '53016' /*!*/; # at 511805695 #121128 15:59:53 server id 1 end_log_pos 511805889 Query thread_id=122562823 exec_time=0 error_code=0 use xxx_roles_xxx/*!*/; SET TIMESTAMP=1354089593/*!*/; DELETE FROM `queue_combats_update_roles` WHERE `combat_id` = 'f27d62dad8efcaeb04cd8f5d7c0424db' AND `role_id` = '53016' /*!*/; # at 511805889 #121128 15:59:53 server id 1 end_log_pos 511805916 Xid = 7457670215 COMMIT/*!*/;
(请勿过于纠结上面binlog的thread_id跟show engine innodb status的thread_id一致的问题,这是因为我们程序是常驻进程,mysql连接不断开,不销毁,故一致了。而且,此日志是程序发现死锁之后,被mysql回滚之后,又重新提交的事务,算是不同时间点的相同事务)
案情还原:
根据案发现场的两个MYSQL INNODB事务的全部SQL语句,以及形成MYSQL INNODB 死锁的原因(感谢DBA组大雄哥的纠正),我们大约可以这么还原案情:
事务1:
UPDATE `role_fight` SET `last_update_life` = ’1354089587′ WHERE `role_id` = ’53016′
UPDATE `role_state` SET `state` = 1 WHERE `role_id` = ’53016′
事务2:
UPDATE `role_state` SET `pet` = 0, `pet_level` = 0 WHERE `role_id` = ’53016′
REPLACE INTO `role_fight` (`role_id`, `life_max`, `mana_max`, `attack_physical`, `attack_internal`,****) VALUES (’53016′, 4967, 3291, 350, 174, ***)
这四条语句构成了本次事务死锁的全部原因。
执行顺序肯定如下:
时间点 | 事务1 | 事务2 | 备注 |
1 | begin | ||
2 | begin | ||
3 | UPDATE `role_state` SET `pet` = 0, `pet_level` = 0 WHERE `role_id` = ’53016′ | 事务2 给 role_state表 role_id 53016记录上 X 锁 | |
4 | UPDATE `role_fight` SET `last_update_life` = ’1354089587′ WHERE `role_id` = ’53016′ | 事务1 给 role_fight表 role_id 53016记录上 X 锁 | |
5 | REPLACE INTO `role_fight` (`role_id`, `life_max`, `mana_max`, `attack_physical`, `attack_internal`,****) VALUES (’53016′, 4967, 3291, 350, 174, ***) | 这里是重点,事务2给role_fight表role_id的记录上 X 锁,发现被其他人(事务1)上锁了,开始等待他人提交事务…等待… | |
6 | UPDATE `role_state` SET `state` = 1 WHERE `role_id` = ’53016′ | 事物1打算给role_state表role_id为53016记录上 X 排它锁,发现被其他事务上了,而且此事务居然还在等他提交,这时MYSQL立刻回滚事务1…(php发现MYSQL返回死锁信息,随记录该信息到异常日志…发送回滚指令…mysql已经“帮”他回滚了…) | |
7 | 【执行成功…】 | 事务2发现别人释放锁了,OK,获取X锁,修改成功 | |
8 | commit | PHP程序发现上一条指令执行完毕,且执行无错,即,发送commit指令,提交事务。 |
好像有个参数%^:
innodb_lock_wait_timeout参数是干啥的呢?从mysql官方手册上看,此参数是针对锁等待时,一个限定等待时间的参数。跟死锁并无关系,一旦mysql发现死锁,立刻回滚导致死锁的语句。并不会用到该参数。
规避方式:
- 缩小事务的语句数量
- 调整SQL语句执行顺序,变“死锁”为“锁等待”,等待一会,总比整个事务回滚掉,全部重新再执行这个流程要强.
- 其他.请补充
关于锁等待:
缩小事务间SQL语句的数量,减小规模吧。当然,提高检索速度,提高查询时间也是首要因素,我们就发现我们的SQL语句中,有几个没有用到索引,导致锁表,导致锁等待发生…
备注:
年底了,冲KPI的,各位见笑了.
原文地址:webgame中Mysql Deadlock ERROR 1213 (40001)错误的排查历程, 感谢原作者分享。

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics

Big data structure processing skills: Chunking: Break down the data set and process it in chunks to reduce memory consumption. Generator: Generate data items one by one without loading the entire data set, suitable for unlimited data sets. Streaming: Read files or query results line by line, suitable for large files or remote data. External storage: For very large data sets, store the data in a database or NoSQL.

MySQL query performance can be optimized by building indexes that reduce lookup time from linear complexity to logarithmic complexity. Use PreparedStatements to prevent SQL injection and improve query performance. Limit query results and reduce the amount of data processed by the server. Optimize join queries, including using appropriate join types, creating indexes, and considering using subqueries. Analyze queries to identify bottlenecks; use caching to reduce database load; optimize PHP code to minimize overhead.

Backing up and restoring a MySQL database in PHP can be achieved by following these steps: Back up the database: Use the mysqldump command to dump the database into a SQL file. Restore database: Use the mysql command to restore the database from SQL files.

How to insert data into MySQL table? Connect to the database: Use mysqli to establish a connection to the database. Prepare the SQL query: Write an INSERT statement to specify the columns and values to be inserted. Execute query: Use the query() method to execute the insertion query. If successful, a confirmation message will be output.

One of the major changes introduced in MySQL 8.4 (the latest LTS release as of 2024) is that the "MySQL Native Password" plugin is no longer enabled by default. Further, MySQL 9.0 removes this plugin completely. This change affects PHP and other app

To use MySQL stored procedures in PHP: Use PDO or the MySQLi extension to connect to a MySQL database. Prepare the statement to call the stored procedure. Execute the stored procedure. Process the result set (if the stored procedure returns results). Close the database connection.

Creating a MySQL table using PHP requires the following steps: Connect to the database. Create the database if it does not exist. Select a database. Create table. Execute the query. Close the connection.

Oracle database and MySQL are both databases based on the relational model, but Oracle is superior in terms of compatibility, scalability, data types and security; while MySQL focuses on speed and flexibility and is more suitable for small to medium-sized data sets. . ① Oracle provides a wide range of data types, ② provides advanced security features, ③ is suitable for enterprise-level applications; ① MySQL supports NoSQL data types, ② has fewer security measures, and ③ is suitable for small to medium-sized applications.
