There are also many introductions to the MySQL transaction model on the Internet. Before writing this article, I also read a lot of information as a reference, hoping to make my understanding more in-depth and comprehensive. After reading most of the introductory articles, I found that some of them are incomplete. For example, some only introduce the performance of MySQL under several isolation levels, and do not explain it from a technical perspective. Some articles are very comprehensive, but lack some organization and are not easy to understand. This is what the author hopes to bring you something different, to explain it from a technical perspective, and to facilitate understanding.
Transaction atomicity requires that a series of operations in the transaction must be completed completely or no operation must be performed, and cannot be performed only half of the time. Atomicity is easy to implement for atomic operations, just like the atomicity implementation of row-level transactions in HBase is relatively simple. But for a transaction consisting of multiple statements, if an exception occurs during the execution of the transaction, if atomicity needs to be ensured, the only option is to roll back to the state before the transaction started, as if the transaction had never happened at all. How to achieve this?
MySQL's implementation of rollback operations completely relies on undo log. One more thing, undo log is used in MySQL not only to achieve atomicity guarantees, but also to implement MVCC, which will be covered below. Use undo to achieve atomicity. Before operating any data, the data before modification will first be recorded in the undo log, and then the actual modification will be made. If an exception occurs and a rollback is required, the system can use the backup in undo to restore the data to the state before the transaction started. The following figure is the basic data structure representing transactions in MySQL. The fields related to undo are insert_undo and update_undo, which respectively point to the undo log generated by this transaction.
Transaction rollback finds the corresponding undo log based on update_undo (or insert_undo) and performs the reverse operation. For the data that has been marked for deletion, clean up and delete the mark, and directly roll back the update for the updated data; the insertion operation is slightly more complicated, not only need to delete the data, but also need to delete the related clustered index and secondary index records.
undo log is a very important piece of content in the MySQL kernel. It involves a lot of knowledge and is complex, such as:
1. The undo log must be persisted before the data is modified. The persistence of the undo log requires No need to record redo to prevent outage exceptions? If necessary, it will involve downtime recovery...
2. How to implement MVCC through undo log?
3. In what scenarios can those undo logs be recycled and cleaned? How to clean it?
The write concurrency implementation mechanism is no different from that of HBase, which uses a two-phase lock protocol to add row locks to the corresponding records. However, the row lock mechanism in MySQL is relatively complex. There are various locking situations depending on whether the row record is a primary key index, a unique index, a non-unique index, or no index.
1. If the id column is a primary key index, MySQL will only lock clustered index records.
2. If the id column is the only secondary index, MySQL will lock the secondary index leaf nodes and clustered index records.
3. If the id column is a non-unique index, MySQL will lock all secondary index leaf nodes that meet the condition (id = 15) and the corresponding clustered index records.
4. If the id column is unindexed, SQL will perform a clustered index full table scan and load the scan results to the SQL Server layer for filtering. Therefore, InnoDB will first add Lock, if the SQL Server layer filter does not meet the conditions, InnoDB will release the lock. Therefore, InnoDB will lock all the records scanned, which is scary!
Whether it is RC, RR, or Serialization, the above mechanism is used for write concurrency control, so I won’t go into details. Next, we will focus on analyzing the read and write concurrency control mechanisms in the RC and RR isolation levels.
Before introducing RC and RR in detail, it is necessary to introduce the MVCC mechanism in MySQL first, because both RC and RR use the MVCC mechanism to achieve read and write concurrency between transactions. It’s just that there are some differences between the two in implementation details. The specific differences will be discussed next.
MVCC in MySQL
The first four columns are the actual column values recorded in the row. What needs to be focused on are the two hidden columns DB_TRX_ID and DB_ROLL_PTR (invisible to the user). Among them, DB_TRX_ID represents the transaction ID of the transaction that modifies the row, and DB_ROLL_PTR represents the pointer to the rollback segment of the row. All version data recorded on this row are organized in the form of a linked list in undo. This value actually points to the history of the row in undo. Record linked list.
Now assume that there is a transaction trx2 that modifies the row of data, and the row record will change to the following figure. DB_TRX_ID is the transaction ID (trx2) of the transaction that recently modified the row, and DB_ROLL_PTR points to the undo history record linked list:
After understanding the MySQL row records, let’s take a look at the basic structure of the transaction. The following figure is the transaction data structure of MySQL, which we mentioned above. After the transaction is started, a data structure will be created to store transaction-related information, lock information, undo log and very important read_view information.
read_view saves the list of all active transactions in the entire MySQL when the current transaction is started, as shown in the figure below. When the current transaction is started, the active transactions in the system include trx4, trx6, trx7 and trx10. In addition, up_trx_id represents the smallest transaction ID in the current transaction list when the current transaction is started; low_trx_id represents the largest transaction ID in the current transaction list when the current transaction is started.
read_view is a key point in implementing MVCC. It is used to determine which version of the record is visible to the current transaction. If the current transaction wants to read a certain row of records, and the version number (transaction ID) of the row record is trxid, then:
1. If trxid
2. If trxid > low_trx_id, it means that the transaction in which the row is located was opened after the current transaction was created, so the row record is not visible to the current transaction.
3. If up_trx_id
Take the following row record as an example. There are multiple versions of this row record (trx2, trx5, trx7 and trx12), among which trx12 is the latest version. See which version of the row is visible to the current transaction.
1. The latest version recorded in this row is trx12. Comparing it with the current transaction read_view, it is found that trx12 is greater than the largest transaction trx10 in the current active transaction list, which means that trx12 was opened after the current transaction was created, so Invisible.
2. Check that the second latest version of the row record is trx7. Compared with the current transaction read_view, it is found that trx7 is between the minimum transaction ID and the maximum transaction ID of the current active transaction list, indicating that the row record The transaction was active when the current transaction was created. Traversing the active list found that trx7 does exist, indicating that the transaction has not yet been submitted, so it is not visible to the current transaction.
3. Continue to view the third latest version of the record, trx5, which is also between the minimum transaction ID and the maximum transaction ID of the current active transaction list, indicating that the transaction in which the row record is located was in the current transaction when the current transaction was created. Active status, but traversal found that this version is not in the active transaction list, indicating that the transaction corresponding to trx5 has been submitted (Note: There is no correlation between transaction submission time and transaction number. It is possible that transactions with larger transaction numbers are submitted first, and transactions with smaller transaction numbers are submitted later. commit), so the trx5 version row record is visible to the current transaction and is returned directly.
Introduced above The MVCC technology implementation mechanism in MySQL, but to understand the transaction visibility under the RC isolation level, you also need to get a core point: transactions under the RC isolation level will generate a latest read_view to replace the original read_view every time select is executed.
#As shown in the figure above, the left side is transaction No. 1, and the record with id=1 was queried three times at different points in time. On the right is transaction No. 2, which updates the record with id=1. Before the update, there was only one version of the record, but after the update, it became two versions.
Transaction No. 1 will generate a latest read_view every time it executes a select request under the RC isolation level. The global transaction active list generated by the first two queries contains trx2, so the record found according to MVCC regulations is old. Version; the time point of the last query is after the submission of transaction No. 2, so the generated global active transaction list does not include trx2. At this time, the record found according to the MVCC regulations is the latest version record.
Different from RC mode, transactions in RR mode The latest read_view will not be generated every time a select is executed, but the read_view will be generated when the transaction is selected for the first time, and will not be changed again until the current transaction ends. This can effectively avoid non-repeatable reads and make the data read by the current transaction consistent throughout the transaction process. The schematic diagram is as follows:
This is easy to understand. The global active transaction list used in the three queries is the same, and they are all read_view generated for the first time. After that The records found must be consistent with the records found for the first time.
If you don’t know about phantom reading, you can refer to the first article in this series. As shown in the figure below, transaction No. 1 performed three queries on the filter condition for id>1, and transaction No. 2 performed an insertion. The inserted record just met the condition of id>1. It can be seen that the data obtained by the three queries are consistent, which is guaranteed by the MVCC mechanism of the RR isolation level. From this point of view, phantom reading is avoided. However, when the last transaction No. 1 inserts a record at id=2, MySQL will return a Duplicate entry error. It can be seen that avoiding phantom reading is an illusion.
For all RR-level select statements mentioned before, we Called snapshot read, snapshot read can guarantee non-repeatable reads, but it cannot avoid phantom reads. So MySQL proposed the concept of "current read". Common current read statements are:
1. select for update
2. select lock in share mode
3. update / delete
and stipulates that the current read statement at the RR level will add a special lock to the record - Gap lock. Gap lock does not lock a specific record, but locks the interval between records to ensure that no new records will be inserted in this interval. The following figure is a schematic diagram:
In the above figure, transaction No. 1 first executes a currently read select statement. This statement will add all intervals with id > 0. Gap lock, when transaction No. 2 executes the insertion at id = 3, the system will return a Lock wait timeout execcded exception. Of course, other transactions can insert successfully under the condition of id
In fact, the real data writing of MySQL is divided into two writes, one is written to a file called DoubleWrite place, after the writing is successful, the data is actually written to the disk where it is located. Why write it twice? This is because the MySQL data page size is inconsistent with the size of an atomic operation on the disk, and partial writes may occur. For example, the default InnoDB data page size is 16K, and the size of an atomic write on the disk is 512 bytes (sector size) , writing a data page requires multiple IOs, so if an exception occurs in the middle, data will be lost. In addition, it should be noted that DoubleWrite performance will not have a big impact, because writing to DoubleWrite is sequential writing, which does not have a great impact on performance.
2. redolog persistence strategy (innodb_flush_log_at_trx_commit)redolog is the WAL of InnoDB. The data is first written to redolog and dropped to disk, and then updated to the bufferpool. The persistence strategy of redolog is consistent with the persistence strategy of hlog in HBase. The default is 1, which means that the log will be persisted to the disk after each transaction is submitted; the value of 0 means that the asynchronous thread will persist it to disk every 1 second or so. Disk, in this case MySQL may be down and some data may be lost. The value of 2 means that after each transaction is submitted, the log will be flushed to the operating system buffer, and then the operating system will flush it to the disk asynchronously. In this case, MySQL will not lose data if it goes down, but some data may be lost if the machine goes down. data.
3. Binlog persistence strategy (sync_binlog) As a log system at the server layer, binlog mainly records various operations of the database sequentially in the form of events, and can also record the time spent on each operation. The MySQL official documentation mainly introduces the two most basic core functions of Binlog: backup and replication. Therefore, the persistence of Binlog will affect the integrity of data backup and replication to a certain extent. The same as the redo persistence strategy, the possible values are 0, 1, N. The default is 0, which means writing to the operating system buffer and flushing to disk asynchronously. A value of 1 indicates synchronous writes to disk. If N, it means that a refresh operation is performed every N times the operating system buffer is written. To summarize, this article is the third in a series of database transaction articles. It introduces the core of MySQL's single-machine cross-row transaction model, and provides a more detailed explanation of the lock technology and MVCC mechanism involved in isolation. The related features such as transaction atomicity and durability are also briefly analyzed and explained. Next, the author will take you to talk about the distributed transaction model and see how it is different from the stand-alone transaction model. Recommended learning: MySQL tutorial
The above is the detailed content of MySQL cross-row transaction model (detailed graphic explanation). For more information, please follow other related articles on the PHP Chinese website!