MySQL: Permanent solution to "Waiting for table metadata lock" issue

Question

My MySQL database serves three web applications as a storage backend. However, I recently encountered the error "Waiting for table metadata lock" permanently. This happens almost all the time and I don't understand why. mysql>showprocesslist->;+------+-----------+------------------+------ ------+---------+------+-------------------------- -----

P粉647449444 · Answer

Unfortunately, the accepted solution is wrong. That's absolutely correct

This is indeed (almost certainly; see below) the thing to do. But then it shows,

...And 1398 is not the connection with the lock. how so? 1398 is a connection waiting to be locked. This means it hasn't acquired the lock yet, so killing it has no effect. The process holding the lock will still hold the lock, and the next thread trying to perform some operation will therefore also stop and enter the "wait for metadata lock" in the appropriate order.

You can't guarantee that a process "waiting for a metadata lock" (WFML) won't block as well, but you can be sure that just killing the WFML process will do nothing .

The real reason is that another process is holding the lock, and more importantly, SHOW FULL PROCESSLIST will not directly tell you which process.

One thing you can be sure of is that there are no processes marked "waiting for metadata lock". It can be said that these people are victims.

SHOW FULL PROCESSLIST WILL Tells you if the process is doing something, yes. Usually it will work. Here, the process holding the lock does nothing and is hidden in other threads that also do nothing and report as "sleeping".

If SHOW FULL PROCESSLIST shows you a process running DML, or in a "Send Data" state, then that is almost certainly the culprit. Other processes are waiting for it to release the lock (they can be implicit locks; the process does not need to issue LOCK TABLE at all, which actually locks differently). But a process can hold a lock while not performing any operations and be appropriately marked as "sleeping".

In the OP's case, the culprit is almost certainly process 1396, which was started before process 1398, is now in sleeping state, and has been Lasted 46 seconds. Since 1396 has apparently done everything it needs to do (it turns out it's sleeping now, and has been doing so for 46 seconds, as far as MySQL is concerned), no threads have entered at It can hold the lock and still hold it before sleeping (otherwise 1396 would also stop).

Due to MySQL's "deadlock-free" locking policy, no process can hold a lock, release the lock, and restore the lock again; therefore, lock waits are always performed by a process that still holds the lock and has never held the lock before. caused by the process. This is useful (we'll exploit this fact below) because it guarantees that the lock "queue" is ordered.

IMPORTANT: If you connect to MySQL as a restricted user, SHOW FULL PROCESSLIST will not show all processes. So the lock may be held by a process that you don't see.

So: If

SHOW FULL PROCESSLIST shows you everything and shows a Running process, then that process is probably responsible and you need to wait for it to finish what it's doing Anything (or you can kill it - at your own risk).

The rest of this answer deals with a confusing situation where a process is waiting

for no apparent reason and no one seems to be doing anything.

Better`Display process list`

SELECT ID, TIME, USER, HOST, DB, COMMAND, STATE, INFO
    FROM INFORMATION_SCHEMA.PROCESSLIST WHERE DB IS NOT NULL
    AND (`INFO` NOT LIKE '%INFORMATION_SCHEMA%' OR INFO IS NULL)
    ORDER BY `DB`, `TIME` DESC

The above can be adjusted to only show processes that are in SLEEP state, and it will sort them in descending time order anyway, so it's easier to find hung processes (which are usually due to the ordering in "Waiting for metadata" sleep one immediately before locking"; and it is always one of more sleeps than any pending sleep time.

important things

Keep any "waiting for metadata lock" processes separate .

Quick and dirty solution, not really recommended, but quick

Kill all processes in the "sleeping" state on the same database that are older than the oldest thread in the "waiting for metadata lock" state. This is what Arnaud Amaury would do:

For each database with at least one thread in WaitingForMetadataLock:
- The oldest connection in WFML on this database has existed for Z seconds
- All "sleeping" threads on this database and older than Z must die. Start with the freshest, just in case.
- If there is an old non-sleeping connection on that database, it may be the one holding the lock, but it is doing something . You can of course kill it, but especially if it's an update/insert/delete, do so at your own risk.
- After each KILL, re-evaluate the situation and restart the process accordingly. The waiting processes may be running now, or they may have been running briefly and are now sleeping. They may even now hold new metadata locks.

Ninety-nine times out of a hundred, the thread to be killed is the youngest ## thread that is sleeping and older than the older thread waiting for the metadata lock. #the rout:

TIME     STATUS
319      Sleep
205      Sleep
 19      Sleep

(*) The TIME sequence actually has milliseconds, or I'm told, it just doesn't show them. So while both processes have a time value of 19, the lowest process should be younger.

More focused repair

Run

SHOW ENGINE INNODB STATUS

and look at the "TRANSACTION" section. Among other things, you'll find something like

TRANSACTION 1701, ACTIVE 58 sec;2 lock struct(s), heap size 376, 1 row lock(s), undo log entries 1
MySQL thread id 1396, OS thread handle 0x7fd06d675700, query id 1138 hostname 1.2.3.4 whatever;

Now you use

SHOW FULL PROCESSLIST

to check what thread id 1396 is doing with its #1701 transaction. It is most likely in a "sleeping" state. So: an active transaction with an active lock (#1701), it even made some changes because it has an undo log entry... but is currently idle. This is the thread you need to kill. These changes are lost. Remember that doing nothing in MySQL does not mean doing nothing in general. If you get some records from MySQL and build a CSV for FTP upload, the MySQL connection is idle during the FTP upload.

Actually, if the process using MySQL and the MySQL server are on the same computer, that computer is running Linux, and you have root privileges, there is a way to find out which

process

owns the requested Connect lock. This in turn allows determining (based on CPU usage, or at worst strace -ff -p pid) whether the process is really doing something, to help determine Whether it is safe to kill someone.

Why does this happen?

I've seen this happen with web applications that use "persistent" or "pooled" MySQL connections, now usually with very little time savings: the web application instance terminates, but the connection does not , so it's locked alive... and blocked from others.

Another interesting approach I've found is, in the hypothetical above, to run a query that returns some rows, and retrieve only some of them . If the query is not set to "auto-clean" (but the underlying DBA does), it will keep the connection open and prevent a full lock on the table. I ran into this with a piece of code that was verifying that a row existed by selecting it and verifying if it had an error (it doesn't exist) or not (it must exist), but didn't actually retrieve the row .

PHP and PDO

PDO has persistent connection capabilities. This is how I make sure PDO doesn't pool the connections and close each connection. Very messy.

When opening, set options (the fourth option is new PDO()):

PDO::ATTR_PERSISTENT => false

When disconnected:

// We should have no transactions and no locks. // So we discard them. try { $pdo->exec('ROLLBACK WORK'); $pdo->exec('UNLOCK TABLES'); } catch (Exception $err) { // Send a mail } // No cooperative locks. So this will not hurt a bit. try { $pdo->exec('DO RELEASE_ALL_LOCKS()'); } catch (Exception $err) { // Send a mail } // Ensure the connection withers on the vine, but not too soon. $pdo->exec('SET wait_timeout = 5'); // $pdo->setAttribute(PDO::ATTR_TIMEOUT, 5); // If nothing else works! // try { // $pdo->exec('KILL CONNECTION_ID()'); // } catch (Exception $err) { // // Exception here is expected: "Query execution was interrupted" // } // Invoke the garbage collector $pdo = NULL;

Ask the database

If you have the latest MySQL, but not too new , as this will be deprecated , another way to find the culprit is (again you will need the permissions info mode )

SELECT * FROM INFORMATION_SCHEMA.INNODB_LOCKS WHERE LOCK_TRX_ID IN (SELECT BLOCKING_TRX_ID FROM INFORMATION_SCHEMA.INNODB_LOCK_WAITS);

Actual solution, takes time and work

Problems are often caused by this architecture:

When the web application terminates or a web application lightweight thread instance terminates, the container/connection pool may not . It is the container that keeps the connection open, so obviously the connection is not closed. As expected, MySQL does not consider the operation complete.

If the web application does not clean up after itself (no ROLLBACK or COMMIT of transactions, no UNLOCK TABLES, etc.), then the web application starts executing Any operations will still exist and may still block everyone else.

Then there are two solutions. Even worse is lowering the idle timeout. But guess what happens if you wait too long between queries (literally: "MySQL server has gone away"). You can then use mysql_ping if available (soon to be deprecated. PDO has a workaround . Alternatively you can check for the error if Reopen the connection if this occurs (that's the Python way). So - with a small performance cost - this is doable.

Better, smarter solutions aren’t that simple to implement. Try to let the script clean up after itself, making sure to retrieve all rows or release all query resources, catch all exceptions and handle them correctly, or, skip persistent connections altogether if possible . Let each instance create its own connection or use a smart pool driver (in PHP PDO, use PDO::ATTR_PERSISTENT set explicitly to false).

Alternatively (like in PHP), you can have the destructor and exception handler force the connection to be cleaned up by committing or rolling back the transaction (which should be enough), or maybe even issue an explicit table unlock and RELEASE_ALL_LOCKS(), or submit connection suicide ( KILL CONNECTION_ID()) for good results.

I don't know of a way to query existing result set resources to release them; the only way is to save these resources into a private array.

BetterDisplay process list

important things

Quick and dirty solution, not really recommended, but quick

Run

Why does this happen?

PHP and PDO

Ask the database

Actual solution, takes time and work

Better`Display process list`