|
The above two situations are problems that may arise when multiple transactions operate on a piece of data at the same time. The operation of a certain transaction may be overwritten, resulting in data loss.
LBCC solves data loss
LBCC, Lock Based Concurrency Control.
Using the lock mechanism, when the current transaction needs to modify the data, the current transaction is locked. Only one transaction is allowed to modify the current data at the same time, and other transactions must wait for the lock to be released. operate.
MVCC solves data loss
MVCC, multi-version concurrency control, Multi-Version Concurrency Control.
Use versions to control data problems in concurrency situations. When transaction B starts to modify the account and the transaction is not submitted, when transaction A needs to read the account balance, transaction B will be read at this time. Modify the copy data of the account balance before the operation, but if transaction A needs to modify the account balance data, it must wait for transaction B to commit the transaction.
MVCC enables database reading without locking data, and ordinary SELECT requests without locking, improving the concurrent processing capabilities of the database. With the help of MVCC, the database can implement isolation levels such as READ COMMITTED and REPEATABLE READ. Users can view the previous or previous historical versions of the current data, ensuring the I feature (isolation) in ACID.
InnoDB’s MVCC implementation logic
The MVCC data saved by the InnoDB storage engine
InnoDB’s MVCC saves two records after each row Hidden columns are implemented. A transaction ID (DB_TRX_ID) that saves the row, and a rollback pointer (DB_ROLL_PT) that saves the row . Every time a new transaction is started, a new transaction ID will be automatically incremented. At the beginning of the transaction, the transaction ID will be placed in the row transaction ID affected by the current transaction. When querying, the current transaction ID needs to be compared with the transaction ID recorded in each row.
Let’s take a look at how MVCC operates under the REPEATABLE READ isolation level.
SELECT
InnoDB will check each row of records according to the following two conditions:
InnoDB only looks for versions earlier than the current one The transaction version of the data row (that is, the transaction number of the row is less than or equal to the transaction number of the current transaction), which ensures that the rows read by the transaction either already exist before the transaction starts, or have been inserted or modified by the transaction itself of.
The deleted rows need to be judged by the transaction ID, and the version of the state before the transaction is read. Only records that meet the above two conditions can be returned as query results.
INSERT
InnoDB saves the current transaction number as the row version number for each newly inserted row.
DELETE
InnoDB saves the current transaction number as the row deletion identification for each deleted row.
UPDATE
InnoDB inserts a new row of records, saves the current transaction number as the row version number, and saves the current transaction number to the original row as the row deletion identifier.
Save these two additional transaction numbers so that most read operations can be done without locking. This design makes the data reading operation very simple, the performance is very good, and it also ensures that only rows that meet the standards are read. The disadvantages are that each row of records requires additional storage space, more row checking, and some additional maintenance work.
MVCC only works under two isolation levels: REPEATABLE READ and READ COMMITIED. The other two isolation levels are incompatible with MVCC because READ UNCOMMITIED always reads the latest data row, not the data row that conforms to the current transaction version. SERIALIZABLE will lock all rows read.
The implementation of MVCC in mysql relies on undo log and read view.
undo log
According to different behaviors, undo log is divided into two types: insert undo log and update undo log
# The undo log generated during the insert operation, because the insert operation is recorded only for the current transaction itself. , this record is not visible to other transactions, so insert undo log can be deleted directly after the transaction is committed without performing a purge operation.
The main task of purge is to delete the data that has been marked del in the database. In addition, it will also recycle undo pages in batches
The initial state of the data when inserting the database:
-
update undo log:
The undo log generated during the update or delete operation. Because it will affect existing records, in order to provide the MVCC mechanism, the update undo log cannot be deleted when the transaction is submitted. Instead, it is placed on the history list when the transaction is submitted, waiting for the purge thread to perform the final deletion operation.
When the data is modified for the first time:
When another transaction modifies the current data for the second time:
In order to ensure that no conflicts occur when writing respective undo logs during concurrent transaction operations, InnoDB uses rollback segments to maintain concurrent writing and persistence of undo logs. The rollback segment is actually a way of organizing Undo files.
ReadView
For RU(READ UNCOMMITTED) isolation level, all transactions can directly read the latest value of the database, and SERIALIZABLE isolation level, all requests will be locked and executed synchronously . So in these two cases, there is no need to use version control of Read View.
For RC(READ COMMITTED) and RR(REPEATABLE READ) the isolation level is implemented through the above version control. The core processing logic under the two isolation sectors is to determine which version among all versions is visible to the current transaction. In order to solve this problem, InnoDB added the ReadView design to the design. ReadView mainly contains the active read and write transactions in the current system, and puts their transaction IDs into a list. , we name this list m_ids.
Judgment logic for whether the version chain data is visible during query:
If the trx_id attribute value of the accessed version is less than the smallest transaction id in the m_ids list, it indicates that generation This version of the transaction has been committed before generating ReadView, so this version can be accessed by the current transaction.
If the trx_id attribute value of the accessed version is greater than the largest transaction id in the m_ids list, it indicates that the transaction that generated this version was generated after the ReadView was generated, so this version cannot be used by the current transaction access.
If the trx_id attribute value of the accessed version is between the largest transaction id and the smallest transaction id in the m_ids list, then you need to determine whether the trx_id attribute value is in the m_ids list. If it is, it means that the transaction that generated this version when ReadView was created is still active, and this version cannot be accessed; if not, it means that the transaction that generated this version when ReadView was created has been committed, and this version can be accessed.
For example:
READ COMMITTED ReadView under isolation level
Before each time reading data All generate a ReadView (m_ids list)
Time |
Transaction 777 | Transaction 888 |
Trasaction 999 |
##T1 | begin; |
|
|
T2 |
| begin; | begin; |
T3 | UPDATE user SET name = 'CR7' WHERE id = 1; |
|
|
T4 |
##... |
|
|
T5
UPDATE user SET name = 'Messi' WHERE id = 1; |
|
SELECT * FROM user where id = 1; |
|
T6
#commit; |
|
|
| ##T7
|
UPDATE user SET name = 'Neymar' WHERE id = 1;
|
|
| T8
|
|
SELECT * FROM user where id = 1;
|
| T9
| ##UPDATE user SET name = 'Dybala' WHERE id = 1;
|
|
T10 |
| commit;
|
|
##T11
|
|
SELECT * FROM user where id = 1; |
|
Here is an analysis of the ReadView in the above situation
The SELECT statement at time point T5:
The version chain at the current time point:
This When the SELECT statement is executed, the current data version chain is as above. Because the current transaction 777 and transaction 888 have not been submitted, the list of ReadView of active transactions at this time m_ids: [777, 888], Therefore, the query statement will be based on the largest version data in the current version chain that is less than m_ids, that is, Mbappe is queried.
The SELECT statement at time point T8:
The version chain situation at the current time:
The SELECT statement is executed at this time, and the version of the current data The chain is as above, because the current transaction 777 has been submitted, and the transaction 888 has not been submitted, so the list of ReadView of active transactions at this time m_ids: [888] , so the query statement will be based on the current version of the chain that is less than The largest version data in m_ids, that is, Messi is queried.
SELECT statement at time point T11:
Version chain information at the current time point:
The SELECT statement is executed at this time, and the current data The version chain is as above. Because the current transaction 777 and transaction 888 have been submitted, the ReadView list of active transactions at this time is empty, so the query statement will directly query the latest data of the current database, that is, Dybala is queried.
Summary: A transaction using READ COMMITTED isolation level will generate an independent ReadView at the beginning of each query.
ReadView under REPEATABLE READ isolation level
Generates a ReadView (m_ids list) when reading data for the first time after the transaction starts
Time |
Transaction 777 |
Transaction 888 |
Trasaction 999 |
T1 |
begin; |
|
|
T2 |
|
begin; |
begin; |
T3 |
UPDATE user SET name = 'CR7' WHERE id = 1; |
|
|
##T4 |
| ... |
|
T5 | UPDATE user SET name = 'Messi' WHERE id = 1; |
##SELECT * FROM user where id = 1; |
|
T6
commit; |
|
|
| ##T7
|
UPDATE user SET name = 'Neymar' WHERE id = 1;
|
|
##T8 |
|
| SELECT * FROM user where id = 1;
|
T9 |
| UPDATE user SET name = 'Dybala' WHERE id = 1;
|
|
T10 |
| commit;
|
|
##T11
|
|
## SELECT * FROM user where id = 1;
|
| #SELECT statement at time point T5: | Current version chain:
A ReadView is generated when the select statement is currently executed. At this time, the content of m_ids is: [777,888], so the data queried based on the visible version of ReadView is Mbappe.
SELECT statement at time point T8: Current version chain:
At this time in the current Transaction 999 transaction. Since ReadView has been generated at time point T5, ReadView will only be generated once in the current transaction, so the m_ids at T5 are still used at this time: [777,999] , so the query data at this time is still Mbappe .
SELECT statement at time point T11: Current version chain:
The situation at this time is exactly the same as T8. Since ReadView has been generated at time point T5, ReadView will only be generated once in the current transaction, so the m_ids at T5 are still used at this time: [777,999] , so the query data at this time is still Mbappe .
MVCC summary: The so-called MVCC (Multi-Version Concurrency Control, multi-version concurrency control) refers to the use of READ COMMITTD , REPEATABLE READ These two isolation level transactions access the recorded version chain when performing ordinary SEELCT operations, so that read-write, Write-Read Operations are executed concurrently to improve system performance. Recommended learning: mysql video tutorial
|