This article discusses how to handle master-slave replication delay in MySQL. Master-slave replication with read-write separation is a common database architecture at Internet companies. Its most criticized weakness is that, when both data volume and concurrency are high, the delay between master and slave can become severe. I hope you find it helpful.
Why is the master-slave delay so large?
Answer: MySQL uses a single thread to replay the RelayLog.
How should we optimize and shorten the replay time?
Answer: Multi-threaded parallel replay of RelayLog can shorten the time.
What are the problems with multi-threaded parallel replay of RelayLog?
Answer: You need to decide how to split the RelayLog so that multiple slave instances, each replaying with multiple threads, can work in parallel and still end up consistent with the master.
Why are there inconsistencies?
Answer: If the RelayLog is randomly assigned to different replay threads, assume that there are three serial modification records in the RelayLog:
update account set money=100 where uid=58;
update account set money=150 where uid=58;
update account set money=200 where uid=58;
With single-threaded serial replay, every slave database executes the statements in the same order as the master database.
Voiceover: In the end, the money will be 200.
With random assignment to multiple threads, the replay threads execute these three statements concurrently. It is uncertain which one runs last, so the final data on a slave database may differ from the master database.
Voiceover: On different slave databases the money may end up as 100, 150, or 200; there is no way to know in advance.
How should the RelayLog be distributed so that multiple slave databases, each replaying with multiple threads, still get consistent data?
Answer: Write operations on the same database are replayed by the same thread; write operations on different databases can be replayed concurrently by multiple threads.
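The nondeterminism described above can be illustrated with a minimal sketch. It models the three updates as (uid, money) pairs and treats every permutation as a possible outcome of random dispatch; the `replay` helper and the in-memory `account` dict are assumptions made for illustration, not MySQL internals.

```python
from itertools import permutations

# The three serial updates from the relay log, modelled as (uid, new_money).
UPDATES = [(58, 100), (58, 150), (58, 200)]


def replay(updates):
    """Apply updates in the given order and return the final account table."""
    account = {}
    for uid, money in updates:
        # Equivalent to: update account set money=<money> where uid=<uid>
        account[uid] = money
    return account


# Single-threaded replay keeps the master's order, so the result is fixed.
serial_result = replay(UPDATES)[58]

# With random dispatch to several threads, any execution order is possible,
# so we enumerate every permutation of the three statements.
possible_results = sorted({replay(order)[58] for order in permutations(UPDATES)})

print(serial_result)     # 200
print(possible_results)  # [100, 150, 200]
```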
How to do it?
Answer: Design a hash algorithm, hash(db-name) % thread-num. Hashing the database name and taking it modulo the number of threads is enough to guarantee that write operations on the same database are executed serially by the same replay thread.
Voiceover: The playback on different libraries is parallel, which accelerates the playback.
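As a rough sketch of the hash(db-name) % thread-num idea, the snippet below routes events to per-thread queues. CRC32 is only a stand-in for whatever hash is actually used, and `THREAD_NUM`, `pick_thread`, `dispatch`, and the database names are hypothetical names chosen for this example.

```python
import zlib
from collections import defaultdict

THREAD_NUM = 4  # hypothetical number of replay threads


def pick_thread(db_name: str, thread_num: int = THREAD_NUM) -> int:
    """hash(db-name) % thread-num, using CRC32 as a stable stand-in hash."""
    return zlib.crc32(db_name.encode("utf-8")) % thread_num


def dispatch(events, thread_num: int = THREAD_NUM):
    """Assign (db_name, statement) events to per-thread queues.

    Events are appended in relay-log order, so the statements of one
    database keep their original order inside a single queue.
    """
    queues = defaultdict(list)
    for db_name, statement in events:
        queues[pick_thread(db_name, thread_num)].append((db_name, statement))
    return queues


# Hypothetical relay-log events touching two databases.
events = [
    ("db_account", "update account set money=100 where uid=58"),
    ("db_order", "insert into orders values (1)"),
    ("db_account", "update account set money=150 where uid=58"),
    ("db_account", "update account set money=200 where uid=58"),
]
queues = dispatch(events)
```

Because the thread index depends only on the database name, all three updates on db_account land in one queue in their original order, while db_order may be replayed by another thread.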
What are the shortcomings of this plan?
Answer: Many companies run MySQL as a "single database with multiple tables". In that case there is still only one database, every write hashes to the same thread, and RelayLog replay does not get any faster.
Takeaway: Upgrade the "single database, multiple tables" architecture to a "multiple databases, multiple tables" architecture.
Voiceover: In Internet business scenarios with large amounts of data and large concurrency, the "multi-database" model also has many other advantages, such as:
(1) Convenient instance expansion: a DBA can easily move different databases to different instances;
(2) Databases isolated by business: the businesses are decoupled and isolated, which reduces their influence on each other;
(3) Convenient microservice splitting: each service can easily have its own instance;
In the "single database, multiple tables" scenario, how can multi-threaded parallel replay of RelayLog be optimized?
Answer: Even with only one database, transactions are executed concurrently on the master database. If they can run in parallel on the master database, they should be able to run in parallel on the slave database as well.
New idea: Put the transactions that executed in parallel on the master database into one group and number them. These transactions can then be replayed in parallel on the slave database. (If transactions on the master database have all entered the prepare phase, there is no conflict between them; otherwise they could not have been committed.) This is exactly what MySQL does.
Solution: GTID-based parallel replication.
Starting with MySQL 5.7, group commit information is stored in the GTID. The mysqlbinlog tool shows this information:
20181014 23:52 server_id 58 XXX GTID last_committed=0 sequence_number=1
20181014 23:52 server_id 58 XXX GTID last_committed=0 sequence_number=2
20181014 23:52 server_id 58 XXX GTID last_committed=0 sequence_number=3
20181014 23:52 server_id 58 XXX GTID last_committed=0 sequence_number=4
Compared with the original log format, there are two additional fields: last_committed and sequence_number.
What is last_committed?
Answer: It is the number of the most recently committed transaction at the time this transaction was committed. Transactions with the same last_committed belong to the same group and can be replayed concurrently.
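The grouping rule can be sketched as follows. This follows the simplified rule stated here (same last_committed means same group); MySQL's actual applier scheduling may differ in detail. The first four pairs mirror the sample mysqlbinlog output, the last two are hypothetical, and `commit_groups` is a name invented for this example.

```python
from itertools import groupby

# (last_committed, sequence_number) pairs in binlog order.
# The first four mirror the sample output; the last two are hypothetical.
transactions = [(0, 1), (0, 2), (0, 3), (0, 4), (4, 5), (4, 6)]


def commit_groups(txns):
    """Split binlog-ordered transactions into runs sharing last_committed.

    Each inner list holds the sequence numbers of one group; the members
    of a group may be replayed concurrently, and groups run one after another.
    """
    return [
        [seq for _, seq in group]
        for _, group in groupby(txns, key=lambda t: t[0])
    ]


groups = commit_groups(transactions)
print(groups)  # [[1, 2, 3, 4], [5, 6]]
```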
Summary
MySQL parallel replication, as a way to shorten master-slave synchronization delay, embodies the following architectural ideas:
Multi-threading is a common way to shorten execution time;
Voiceover: For example, many crontabs can use multi-threading to split data and execute in parallel.
When multiple threads dispatch tasks concurrently, idempotence must be ensured. MySQL provides two approaches worth learning from: "idempotent by database" and "idempotent by commit_id";
Voiceover: For example, group messages can be made idempotent by group_id, and user messages by user_id.
Specific to the MySQL master-slave synchronization delay:
MySQL 5.5: parallel replication is not supported; upgrade the MySQL version;
MySQL 5.6: parallel replication by database; the "multi-database" architecture is recommended;
MySQL 5.7: parallel replication based on GTID;