While working with a distributed transaction framework recently, I picked up some knowledge of distributed transactions, so this article talks about them. First, let’s review what a transaction is.
Transaction
What is a transaction? For back-end developers, any daily work that interacts with a database will certainly use transactions. Here is an explanation excerpted from the wiki.
A transaction is a logical unit in the execution process of a database management system, consisting of a finite sequence of database operations.
Transaction support is an important characteristic that distinguishes a database system from a file system. In a traditional file system, if a file is being written and the operating system suddenly crashes, the file may be damaged. A database system introduces transactions to ensure that the database moves from one consistent state to another. When you commit your work, you can be sure that either all changes are saved or none are saved.
Usually a transaction consists of multiple read and write operations.
Transactions have four basic characteristics, commonly known as ACID.
A (Atomicity): The transaction is treated as a whole. Either all statements succeed or all fail; there cannot be a situation where some statements succeed and some fail.
C (Consistency): When the database moves from one state to another, its integrity constraints hold both before the transaction starts and after it ends. What does a database integrity constraint mean? For example, if the name field of a table has a unique constraint and the name field becomes non-unique after the transaction is committed or rolled back, the integrity constraint of the database has been broken.
I (Isolation): Multiple concurrent transactions execute without affecting each other.
D (Durability): After a transaction is committed, its modifications are permanently saved in the database. This requires that the database system can recover committed data without loss when it is restored after a crash.
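Atomicity and the unique-constraint example can be tried out with a small sketch. It uses Python's built-in sqlite3 module; the users table and the rename_users helper are made up for illustration and are not part of any framework discussed here.

```python
import sqlite3

def rename_users(conn, renames):
    """Apply all renames in one transaction; roll back all of them on any error."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for user_id, new_name in renames:
                conn.execute(
                    "UPDATE users SET name = ? WHERE id = ?", (new_name, user_id)
                )
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob"), (3, "carol")]
)
conn.commit()

# The second rename collides with the existing name "carol", so the whole
# transaction is rolled back, including the first rename that had succeeded.
ok = rename_users(conn, [(1, "dave"), (2, "carol")])
names = [row[0] for row in conn.execute("SELECT name FROM users ORDER BY id")]
print(ok, names)
```

The first update is undone together with the failing one, so the table still satisfies its unique constraint and no partial result is visible.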
In the early days, our system has only one data source, so we can rely on the transactions of the database system to ensure the correctness of the business.
However, as the business continues to expand, a single business table may hold tens of millions of rows. With a single database instance, there may be performance issues. At this point we consider splitting the data across multiple databases and tables. However, this may lead to a single application connecting to multiple data sources. See the example below.
In the purchase process above, the merchant balance table and the user balance table are in two separate database instances. A local transaction on each instance can guarantee that the change to the merchant balance, or the deduction of the user balance, individually either succeeds or fails. But we cannot guarantee that the two transactions succeed or fail together.
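A minimal sketch of this problem, assuming two in-memory SQLite databases stand in for the two database instances. The table names, the CHECK constraint and the purchase function are illustrative assumptions, not the article's actual schema.

```python
import sqlite3

def open_db(table, balance):
    """Create an in-memory database with a single balance row (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, "
        "balance INTEGER CHECK (balance >= 0))"
    )
    conn.execute(f"INSERT INTO {table} VALUES (1, ?)", (balance,))
    conn.commit()
    return conn

def balance_of(conn, table):
    return conn.execute(f"SELECT balance FROM {table} WHERE id = 1").fetchone()[0]

merchant_db = open_db("merchant_balance", 0)
user_db = open_db("user_balance", 50)

def purchase(amount):
    # Local transaction 1: credit the merchant. It commits on its own.
    with merchant_db:
        merchant_db.execute(
            "UPDATE merchant_balance SET balance = balance + ? WHERE id = 1", (amount,)
        )
    # Local transaction 2: debit the user. The CHECK constraint rejects overdrafts.
    with user_db:
        user_db.execute(
            "UPDATE user_balance SET balance = balance - ? WHERE id = 1", (amount,)
        )

try:
    purchase(80)
except sqlite3.IntegrityError:
    pass  # only the user-side transaction was rolled back

merchant_after = balance_of(merchant_db, "merchant_balance")
user_after = balance_of(user_db, "user_balance")
print(merchant_after, user_after)
```

Each local transaction behaves correctly on its own, yet the merchant has been credited 80 while the user still holds 50. Nothing ties the two commits together.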
There is another situation. As the system grows larger, we choose to split the application into multiple microservices, so that each application operates on only one data source. We then encounter the situation where one business call invokes multiple applications, and each application operates on its own data source independently, as shown below.
In this case we cannot guarantee that all calls will be successful.
We can see from the examples above that, as the business develops, traditional single-node transactions can no longer meet our needs. At this point, we need distributed transactions to guarantee correctness.
Here is an excerpt from the wiki explanation.
A distributed transaction is a database transaction in which two or more network hosts are involved.
Let’s first talk about some theoretical basis for implementing distributed transactions.
CAP theorem. In a distributed system (a collection of nodes that are connected to each other and share data), when it comes to read and write operations, only two of Consistency, Availability, and Partition tolerance can be guaranteed; the third must be sacrificed.
The following is excerpted from Chapter 22 of Geek Time Learning Architecture from 0.
Although the theoretical definition of CAP says that only two of the three elements can be chosen, when we think about a distributed environment we find that P (partition tolerance) must be chosen, because the network itself cannot be 100% reliable and may fail, so partitions are inevitable. If we choose CA and give up P, then when a partition occurs the system must reject writes in order to preserve C. When a write request arrives, the system returns an error (for example, the current system does not allow writing), which conflicts with A, because A requires that no error and no timeout be returned. Therefore a CA architecture is theoretically impossible for a distributed system; only a CP or AP architecture can be chosen.
BASE theory. BASE is an abbreviation of the following three terms.
Basically Available: When a distributed system fails, it is allowed to lose some functionality so that core functions remain available.
Soft state: The system is allowed to have intermediate states that do not affect its availability. This intermediate state corresponds to the inconsistency in CAP.
Eventually consistent: After a period of time, the data on all nodes becomes consistent.
BASE supplements the AP option in CAP: the soft state and eventual consistency in BASE provide delayed consistency. BASE is the opposite of ACID. ACID is a strong consistency model, whereas BASE gives up strong consistency, allowing data to be inconsistent for a short period and consistent eventually.
Next let’s take a look at the implementation options for distributed transactions.
Based on the database resource level
2PC, the two-phase commit protocol
3PC, the three-phase commit protocol
Based on the business level
TCC
For the schemes based on the database resource level, since multiple transactions are involved, we need a role that manages the status of each transaction. We call this role the coordinator, and the parties to the transaction are called participants. Participants and coordinators generally interact through a specific protocol, the best known of which is the XA interface protocol. Building on this coordinator and participant model, 2PC and 3PC were proposed to implement XA distributed transactions.
As the name suggests, 2PC is mainly divided into two steps.
In the first phase, the coordinator (transaction manager) asks the participants to pre-commit the transaction. At this point the database resources start to be locked, and participants write undo and redo records to the transaction log.
In the second phase, each participant (resource manager) either commits the transaction or uses the undo log to roll it back, and then releases its resources.
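The two phases can be sketched as a small in-memory simulation. This is only an illustration of the control flow under simplifying assumptions (no network, no crashes, no real locks or logs); the Participant class and the two_phase_commit function are hypothetical names, not an XA API.

```python
class Participant:
    """In-memory stand-in for a resource manager (hypothetical, not a real XA API)."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.state = "INIT"

    def prepare(self):
        # Phase 1: a real participant would lock resources and write
        # undo/redo logs here before voting.
        if self.can_prepare:
            self.state = "PREPARED"
        return self.can_prepare

    def commit(self):
        # Phase 2 (success): make the change permanent and release locks.
        self.state = "COMMITTED"

    def rollback(self):
        # Phase 2 (failure): undo the change and release locks.
        self.state = "ROLLED_BACK"


def two_phase_commit(participants):
    """Coordinator: commit only if every participant votes yes in phase 1."""
    votes = [p.prepare() for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


ok_run = [Participant("user_db"), Participant("merchant_db")]
bad_run = [Participant("user_db"), Participant("merchant_db", can_prepare=False)]
committed = two_phase_commit(ok_run)
aborted = two_phase_commit(bad_run)
print(committed, [p.state for p in ok_run])
print(aborted, [p.state for p in bad_run])
```

In the first run every participant votes yes and all of them commit. In the second run a single no vote is enough for the coordinator to roll everyone back.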
The whole process is as shown below.
Distributed transaction commit success scenario:
Distributed transaction rollback scenario:
The advantages of this solution are that it is relatively simple to implement, is supported by mainstream databases, and provides strong consistency. MySQL 5.5 and later include an implementation based on the XA protocol.
This solution also has shortcomings:
Single point of failure at the coordinator. If the coordinator goes down during the commit phase while the participants are waiting, their resources stay locked and blocked. Re-electing a coordinator is possible, but it does not solve this problem.
Long synchronous blocking. The transaction blocks for the entire execution, until the commit completes and resources are released. If a network delay occurs during commit or rollback and a participant receives no instruction, it remains blocked.
Data inconsistency. In the second phase, if the coordinator crashes after sending only the first commit message, the first participant commits the transaction, but the second participant cannot commit because it never received the coordinator's message.
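The inconsistency case can be reproduced with a toy simulation. The CoordinatorCrash exception, the commit_phase function and the participant names are assumptions made up for this sketch; a real coordinator would talk to the participants over the network.

```python
class CoordinatorCrash(Exception):
    """Raised to simulate the coordinator process dying (hypothetical)."""


def commit_phase(states, crash_after):
    """Send commit to prepared participants in order; crash after `crash_after` sends."""
    for sent, name in enumerate(states):
        if sent == crash_after:
            raise CoordinatorCrash(f"coordinator died before notifying {name}")
        states[name] = "COMMITTED"


# Both participants have already voted yes in phase 1.
states = {"user_db": "PREPARED", "merchant_db": "PREPARED"}
try:
    commit_phase(states, crash_after=1)
except CoordinatorCrash:
    pass
print(states)
```

After the crash, the first participant has committed while the second is still prepared and keeps holding its locks, which is the blocking and inconsistency described above.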
To address these shortcomings of 2PC, an improved scheme, 3PC, was proposed.
Three-phase commit refines two-phase commit by splitting the process into three stages. The steps are as follows.
CanCommit: the coordinator asks each participant whether the transaction can be committed.
PreCommit: if all participants can commit the transaction, the coordinator issues the PreCommit command, and the participants lock resources and wait for the final command.
All participants return confirmation: the coordinator notifies each participant to execute the transaction, and each participant locks resources and returns its execution status.
Some participant returns a refusal, or the coordinator times out waiting: the coordinator concludes that the transaction cannot proceed normally and issues an abort command, and each participant leaves the prepared state.
DoCommit: if every participant acknowledged in the second phase, the coordinator issues DoCommit to finally commit the transaction. Otherwise it issues an abort command and all participants roll back the transaction.
All participants execute the transaction normally: the coordinator issues the final commit instruction and the locked resources are released.
Some participant fails to execute the transaction, or the coordinator times out waiting: the coordinator issues a rollback command and the locked resources are released.
See the picture below for details.
Compared with two-phase commit, three-phase commit introduces a timeout mechanism to reduce transaction blocking and to mitigate the single point of failure. In the third phase, if a participant does not receive the coordinator's message, then after the timeout expires it commits by default and releases its resources.
Three-phase commit still cannot solve the data consistency problem. Suppose the coordinator issues a rollback command but, because of network problems, a participant does not receive it within the waiting time. That participant then commits by default while the other participants roll back, so the data becomes inconsistent.
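The default-commit rule and the inconsistency it can cause fit in a few lines. This is a simplified model: the participant_final_state function and the message strings are assumptions for illustration, and a lost message is represented by None instead of a real timeout.

```python
def participant_final_state(message):
    """State of a participant that has already acknowledged PreCommit.

    `message` is what arrives from the coordinator before the timeout:
    "commit", "abort", or None if nothing arrives (lost or delayed).
    """
    if message == "abort":
        return "ROLLED_BACK"
    # On "commit", or on timeout with no message, the participant commits by default.
    return "COMMITTED"


# The coordinator decides to abort, but the message to participant B is lost.
delivered = {"A": "abort", "B": None}
final_states = {name: participant_final_state(msg) for name, msg in delivered.items()}
print(final_states)
```

Participant A rolls back while participant B commits after its timeout, so the two data sources end up in different states.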
TCC Transaction
To solve the problem of coarse-grained resource locking while a transaction runs, the industry proposed a new transaction model defined at the business level. The lock granularity is controlled entirely by the business itself. It is essentially a compensation approach: it divides the transaction into two stages, Try and Confirm/Cancel, and the logic of each stage is controlled by business code. In this way, the lock granularity of the transaction can be controlled freely, and the business can achieve higher performance at the expense of isolation.
TCC is the abbreviation of Try, Confirm and Cancel. Unlike 2PC and 3PC, which are based on the database level, TCC is based on the application level.
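The three TCC actions are:
Try:
Complete all business checks (consistency)
Reserve the necessary business resources (quasi-isolation)
Confirm:
Actually execute the business operation
The Confirm operation must be idempotent
Cancel:
Release the business resources reserved in the Try stage
The Cancel operation must be idempotent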
Consider a payment process that calls multiple sub-services. We cannot guarantee that every service call succeeds; for example, the call to the red envelope system to deduct the red envelope may fail. We then face an awkward situation: because the red envelope service failed, the method exits with an exception, and the order is still in its initial status, but the user balance has already been deducted. This is a very poor user experience. Therefore the payment process needs a mechanism that treats it as a whole: all service calls in the process must either succeed together or fail together, forming a single transaction.
At this point we can introduce a TCC transaction and treat the entire order process as a whole. In the flow below, the balance system deduction fails, so we roll back the order system and the red envelope system. The whole process is shown below.
Because the balance system failed, we need to undo all changes made in this process, so we send a cancel notification to the order system and a cancel notification to the red envelope system.
So after the system introduces TCC transactions, we need to transform our calling process.
How to introduce TCC transactions into the system
Following the three steps of a TCC transaction, we must transform each service into three steps: Try, Confirm, and Cancel.
TCC Try:
For the business above, the order system adds a try method that changes the order status to PAYING. The balance system adds a try method that first checks whether the balance is sufficient, then deducts the balance and adds the deducted amount to a frozen amount. The red envelope system does the same as the balance system. As this transformation shows, the TCC try method needs to check each business resource, and the process needs to introduce intermediate states. Let’s look at the entire process in the picture below.
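The try method of the balance system might look like the following sketch. The BalanceAccount class and its available and frozen fields are assumed names for illustration; a real service would persist them in its database and update them inside a local transaction.

```python
from dataclasses import dataclass


@dataclass
class BalanceAccount:
    available: int
    frozen: int = 0

    def try_deduct(self, amount):
        """Try: check the balance, then move the amount from available to frozen."""
        if amount <= 0 or self.available < amount:
            return False  # business check failed, nothing is reserved
        self.available -= amount
        self.frozen += amount
        return True


account = BalanceAccount(available=100)
first = account.try_deduct(30)
second = account.try_deduct(90)  # only 70 is still available, so this is rejected
print(first, second, account)
```

The frozen amount is the intermediate state: the money can no longer be spent elsewhere, but the deduction is not final until confirm runs.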
TCC Confirm:
If all sub-service calls in the first TCC step, Try, succeed, we then need to confirm each service, so we add a confirm method to each service. For example, the confirm method of the balance system sets the frozen amount to 0, and the red envelope system does the same. The order system changes the order status to SUCCESS. The confirm method must be idempotent. For example, before updating an order, the order system must first check that the order status is PAYING. The whole process is shown below.
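A minimal sketch of an idempotent confirm for the order system, using the status check described above. The Order class and the confirm_count counter are illustrative assumptions; the counter exists only to show how many times the update really ran.

```python
class Order:
    def __init__(self):
        self.status = "PAYING"   # set by the Try step
        self.confirm_count = 0   # counts how many times the update really ran

    def confirm(self):
        """Confirm: move PAYING -> SUCCESS; repeated calls change nothing."""
        if self.status != "PAYING":
            return  # already confirmed (or cancelled), so do nothing
        self.status = "SUCCESS"
        self.confirm_count += 1


order = Order()
order.confirm()
order.confirm()  # duplicate delivery, for example a retry by the framework
print(order.status, order.confirm_count)
```

Because the second call sees a status other than PAYING, it leaves the order untouched, so the framework can safely call confirm more than once.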
At this point, a TCC transaction framework is needed to drive each service. After the TCC transaction manager detects that the Try methods have finished, it automatically calls the confirm method provided by each service to move each service to its final state.
TCC Cancel:
If the method that freezes the red envelope fails during TCC Try, we need to undo all previous modifications and restore the initial state. Like the confirm method, the cancel method also needs to be idempotent, as shown below:
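One possible way to make cancel idempotent for the balance system is to remember which transactions have already been cancelled. This is only a sketch: the FrozenBalance class, the tx_id parameter and the in-memory set are assumptions, and a real service would store the cancelled ids in its database.

```python
class FrozenBalance:
    def __init__(self, available, frozen):
        self.available = available
        self.frozen = frozen
        self.cancelled = set()  # ids of transactions already cancelled

    def cancel(self, tx_id, amount):
        """Cancel: return the frozen amount to the available balance, once per tx_id."""
        if tx_id in self.cancelled:
            return  # duplicate cancel, nothing to do
        self.frozen -= amount
        self.available += amount
        self.cancelled.add(tx_id)


# State after a Try that froze 30 out of 100.
balance = FrozenBalance(available=70, frozen=30)
balance.cancel("tx-1", 30)
balance.cancel("tx-1", 30)  # repeated cancel notification
print(balance.available, balance.frozen)
```

The second notification is ignored, so the balance returns to its initial state of 100 available and 0 frozen instead of being refunded twice.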
From this we can see that if TCC Try succeeds, confirm must succeed, and if Try fails, cancel must succeed, because confirm is the key step that updates the system to its final state. But reality is ruthless: in a production system there is always a chance that confirm or cancel fails. In this case, the TCC framework needs to record the result of calling confirm. If the confirm call fails, the TCC framework records it and calls it again at a certain interval.
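The retry behaviour can be sketched as a simple loop. The call_with_retry function, its parameters and the flaky_confirm stand-in are hypothetical; a real TCC framework would more likely persist the pending call and retry it from a scheduled job.

```python
import time


def call_with_retry(action, max_attempts=5, interval_seconds=0.01):
    """Call `action` until it succeeds or attempts run out; return attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            action()
            return attempt
        except Exception:
            if attempt == max_attempts:
                raise  # give up; manual intervention would be needed
            time.sleep(interval_seconds)


calls = {"n": 0}


def flaky_confirm():
    # Hypothetical confirm that fails twice (e.g. network errors), then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("confirm not delivered")


attempts_used = call_with_retry(flaky_confirm)
print(attempts_used)
```

Retrying like this is only safe because confirm and cancel are idempotent: a call that actually succeeded but whose response was lost can be repeated without changing the result.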
Summary and Thoughts
After reading this article, you should have a basic understanding of distributed transactions.
Let’s summarize. Whether and how to use distributed transactions should be decided in light of our actual scenario.
If the business is still at an early stage, we can simply choose database transactions to support rapid iteration.
When the business reaches a certain stage, the system begins to be split and the database is split too. If the business then needs to guarantee consistency, distributed transactions must be used, and we need to choose the approach based on the business.
With a distributed transaction framework built on 2PC or 3PC, the business application layer does not need to be modified, and adoption is relatively simple. However, performance is lower and data resources are locked for a long time, so it is not suitable for high-concurrency scenarios such as Internet businesses.
A TCC-based distributed transaction framework has higher performance than 2PC and can guarantee eventual consistency of the data. But for the application layer, each method must be turned into three methods, and some intermediate states must be introduced into the business. Relatively speaking, the application requires substantial changes.