Java basics tutorial: ensuring consistency of double writes between the cache and the database
A distributed cache is an indispensable component of many distributed applications. But once you use one, data is typically stored in both the cache and the database and written to both, and as soon as you double write, data consistency problems are inevitable. So how do you solve them?
Cache Aside Pattern
The most classic pattern for reading and writing a cache alongside a database is the Cache Aside Pattern.
When reading, read the cache first. If the cache misses, read the database, put the retrieved value into the cache, and return the response.
When updating, update the database first and then delete the cache.
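A minimal sketch of the pattern, using in-memory maps as stand-ins for a real cache (such as Redis) and a real database; the class and method names are illustrative only:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal cache-aside sketch. The "database" here is a stand-in map; in a real
// system it would be a DAO/repository and the cache a Redis client.
public class CacheAsideDemo {

    final Map<Long, String> cache = new ConcurrentHashMap<>();
    final Map<Long, String> database = new ConcurrentHashMap<>();

    // Read path: try the cache first, fall back to the database, then populate the cache.
    public String get(long id) {
        String value = cache.get(id);
        if (value == null) {
            value = database.get(id);
            if (value != null) {
                cache.put(id, value);
            }
        }
        return value;
    }

    // Update path: write the database first, then delete (not update) the cached entry.
    public void update(long id, String newValue) {
        database.put(id, newValue);
        cache.remove(id);
    }
}
```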
Why delete the cache instead of updating the cache?
The reason is simple. In many complex caching scenarios, the cached value is not just a value copied straight out of the database.
For example, when a field of one table is updated, the corresponding cache may need to query two other tables and run some computation to produce the latest cached value.
Moreover, updating the cache can be expensive. Does every database modification really have to update the corresponding cache? In some scenarios, yes; but for cached values that require complex computation, no. If the tables feeding one cached value are modified frequently, the cache would be updated just as frequently. The real question is: will that cache be read just as often?
For example, if a field of a table involved in a cached value is modified 20 or 100 times within a minute, the cache would be updated 20 or 100 times; yet that cache might be read only once in that minute, so it is mostly cold data. If you simply delete the cache instead, it is recomputed at most once in that minute, and the overhead drops dramatically: the cache is only computed when it is actually used.
In fact, deleting the cache instead of updating it is a form of lazy computation: don't redo an expensive calculation every time regardless of whether the result will be used; recompute only when the value is actually needed. MyBatis and Hibernate apply the same idea with lazy loading. When you query a department, there is no need to also fetch the 1,000 employees inside it every time; in 80% of cases, querying the department only requires the department's own information. Fetch the department first, and only when you actually need to access its employees does the framework go back to the database for those 1,000 employees.
The most basic cache inconsistency problem and solution
Problem: modify the database first, then delete the cache. If deleting the cache fails, the database holds the new data while the cache holds the old data, so the two are inconsistent.
Solution: delete the cache first, then modify the database. If the database modification fails, the database still holds the old data and the cache is simply empty, so there is no inconsistency: the next read misses the cache, reads the old value from the database, and writes it back into the cache.
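Under this basic solution, the update path just reverses the order; a minimal sketch, written as an extra method on the CacheAsideDemo class from the earlier example:

```java
// Basic-solution update path: delete the cached entry first, then write the database.
// If the database write fails, the cache is simply empty; the next read reloads the
// old value from the database, so no stale value is served from the cache.
public void updateDeleteFirst(long id, String newValue) {
    cache.remove(id);
    database.put(id, newValue);
}
```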
Analysis of more complex data inconsistency issues
The data changes: the cache is deleted first, and the database is about to be modified but has not been modified yet. A read request arrives, finds the cache empty, queries the database, gets the old value from before the modification, and puts it into the cache. Then the data-change operation finishes modifying the database.
And that's it: the data in the database and the cache are now different...
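To make the interleaving concrete, here is the same race written against the stand-in maps of the earlier CacheAsideDemo sketch (an illustration of the timing, not real code flow):

```java
// Illustrative interleaving that produces the inconsistency (time flows downward):
//
//   writer thread                      reader thread
//   -------------                      -------------
//   cache.remove(id);
//                                      value = cache.get(id);      // miss: cache was just deleted
//                                      value = database.get(id);   // reads the OLD value
//                                      cache.put(id, value);       // caches the OLD value
//   database.put(id, newValue);        // database now holds the NEW value
//
// From now on the cache serves the old value while the database holds the new one.
```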
Why does this problem appear under high concurrency, with hundreds of millions of requests?
This problem can only occur when the same piece of data is read and written concurrently. If your concurrency is low, especially read concurrency, say 10,000 visits per day, the inconsistent scenario just described will rarely happen. But with hundreds of millions of requests per day and tens of thousands of concurrent reads per second, as long as there are also data-update requests every second, the database-cache inconsistency described above can occur.
The solution is as follows:
When updating data, route the operation by the data's unique identifier and send it to a JVM-internal queue. When reading data, if it is found not to be in the cache, send an operation that re-reads the data and refreshes the cache, routed by the same unique identifier, to the same JVM-internal queue.
Each queue has one worker thread, which takes the queued operations one by one and executes them serially. For a data-change operation, the cache is deleted first and the database is then updated; if a read request arrives before the update completes and sees the empty cache, it can send a cache-refresh request to the queue, where it piles up behind the change, and then wait synchronously for the cache refresh to finish.
One optimization point: it is pointless to queue up several cache-refresh requests for the same data one after another, so they can be filtered. If there is already a pending cache-refresh request for that data in the queue, there is no need to enqueue another one; just wait for the existing refresh to complete.
After the worker thread of that queue finishes the database modification for the previous operation, it performs the next operation, the cache refresh: it reads the latest value from the database and writes it into the cache.
If the read request, while still within its waiting window, finds through polling that the value is now available, it returns it directly; if it waits longer than a certain timeout, it reads the current old value directly from the database.
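A condensed sketch of this scheme, assuming a fixed number of JVM-internal queues, one worker thread per queue, and in-memory maps standing in for the real cache and database; the names, queue count, and timeouts are illustrative, and a production version would need error handling and graceful shutdown:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.*;

public class SerializedCacheUpdater {

    private static final int QUEUE_COUNT = 20;
    private static final long READ_TIMEOUT_MS = 200;
    private static final long POLL_INTERVAL_MS = 5;

    private final Map<Long, String> cache = new ConcurrentHashMap<>();
    private final Map<Long, String> database = new ConcurrentHashMap<>();

    // One FIFO queue and one worker thread per shard; operations for the same id
    // always land in the same queue, so they execute serially.
    private final BlockingQueue<Runnable>[] queues = new BlockingQueue[QUEUE_COUNT];
    // Ids that already have a pending cache-refresh task, to avoid enqueuing
    // duplicate refreshes (the optimization mentioned above).
    private final Set<Long> pendingRefresh = ConcurrentHashMap.newKeySet();

    public SerializedCacheUpdater() {
        for (int i = 0; i < QUEUE_COUNT; i++) {
            queues[i] = new LinkedBlockingQueue<>();
            final BlockingQueue<Runnable> q = queues[i];
            Thread worker = new Thread(() -> {
                while (true) {
                    try {
                        q.take().run();          // execute queued operations one by one
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            });
            worker.setDaemon(true);
            worker.start();
        }
    }

    private BlockingQueue<Runnable> queueFor(long id) {
        return queues[(int) (Math.abs(id) % QUEUE_COUNT)];
    }

    // Write path: enqueue "delete cache, then update database" for serial execution.
    public void update(long id, String newValue) {
        queueFor(id).offer(() -> {
            cache.remove(id);
            database.put(id, newValue);
        });
    }

    // Read path: on a cache miss, enqueue a refresh (unless one is already pending)
    // and poll the cache until it is filled or the timeout expires.
    public String get(long id) throws InterruptedException {
        String value = cache.get(id);
        if (value != null) {
            return value;
        }
        if (pendingRefresh.add(id)) {            // only one pending refresh per id
            queueFor(id).offer(() -> {
                String fresh = database.get(id);
                if (fresh != null) {
                    cache.put(id, fresh);
                }
                pendingRefresh.remove(id);
            });
        }
        long deadline = System.currentTimeMillis() + READ_TIMEOUT_MS;
        while (System.currentTimeMillis() < deadline) {
            value = cache.get(id);
            if (value != null) {
                return value;
            }
            Thread.sleep(POLL_INTERVAL_MS);
        }
        return database.get(id);                 // timed out: fall back to the database
    }
}
```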
Issues to watch out for with this solution in high-concurrency scenarios:
Because read requests become slightly asynchronous (they may wait on the queue), pay close attention to the read timeout: every read request must return within the timeout window.
The biggest risk of this solution is that data may be updated so frequently that a large backlog of update operations builds up in the queue, read requests then time out in large numbers, and finally a large number of requests go straight to the database. Be sure to run tests that simulate real traffic to see how frequently the data is actually updated.
Another point: since one queue may hold a backlog of update operations for several different data items, test against your own business profile; you may need to deploy multiple service instances, each handling a share of the data-update operations. If one in-memory queue actually backs up the inventory-update operations of 100 products, and each inventory update takes about 10 ms, then the read request for the last product may wait 10 * 100 = 1000 ms = 1 s before it can get the data, which blocks read requests for a long time.
Be sure to run stress tests that simulate the production environment against the real behavior of the business system, to see how many update operations the in-memory queue may back up at the busiest times and therefore how long the read request behind the last update operation may hang. If read requests must return within 200 ms and the calculation shows that even at the busiest time the backlog is only 10 update operations with a maximum wait of 200 ms, that is acceptable.
If a single in-memory queue is likely to back up many update operations, add machines so that each deployed service instance handles less data; each in-memory queue then backs up fewer update operations.
In fact, based on previous project experience, data writes are usually infrequent, so the backlog of update operations in the queue should normally be very small. Architectures like this one, built around a read cache for high read concurrency, generally see very few write requests; a write QPS of a few hundred per second is already decent.
Actual rough calculation
With 500 write operations per second, split into five 200 ms time slices, there are 100 write operations every 200 ms. Spread over 20 in-memory queues, each queue may back up about 5 write operations. Performance tests show each write operation completes in roughly 20 ms, so the read request for the data in any given queue hangs only briefly and is certainly returned within 200 ms.
From this simple calculation, a single machine supporting a write QPS in the hundreds is no problem. If the write QPS grows tenfold, scale the machines out tenfold as well, with 20 queues on each machine.
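The same arithmetic written out, so the assumed figures are explicit (all numbers are the article's assumptions, not measurements):

```java
// Back-of-the-envelope estimate of the per-queue backlog and worst-case read hang.
public class BacklogEstimate {
    public static void main(String[] args) {
        int writeQps = 500;      // assumed writes per second on one instance
        int queueCount = 20;     // in-memory queues on this instance
        int windowMs = 200;      // time slice considered for the backlog
        int perWriteMs = 20;     // assumed cost of one queued write operation

        int writesPerWindow = writeQps * windowMs / 1000;      // 100
        int backlogPerQueue = writesPerWindow / queueCount;    // 5
        int maxReadHangMs = backlogPerQueue * perWriteMs;      // 100 ms

        System.out.printf("backlog per queue: %d ops, worst-case read hang: %d ms%n",
                backlogPerQueue, maxReadHangMs);
    }
}
```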
You must also stress-test for another risk here: when the situation above does occur, a surge of read requests may hang on the service with delays of tens of milliseconds while they wait. Check whether the service can absorb that and how many machines are needed to handle the peak of this worst case.
But because not all data is updated at the same moment, the cached entries do not all become invalid at the same time. Only the cache of a small portion of the data is invalid at any given time, and the read requests arriving concurrently for that data should not be particularly numerous.
This service may be deployed as multiple instances, so it must be guaranteed that requests performing data-update operations and cache-refresh operations are routed through the Nginx server to the same service instance.
For example, all read and write requests for the same product should be routed to the same machine. You can do your own hash routing between services based on some request parameter, or use Nginx's hash routing feature, and so on.
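A minimal illustration of application-side hash routing by product id; Nginx's hash-based upstream routing on a request parameter achieves the same effect at the proxy layer:

```java
// Illustrative hash routing: all requests for the same product id map to the same
// service instance (and, within it, to the same in-memory queue).
public final class InstanceRouter {
    private InstanceRouter() {}

    public static int instanceFor(String productId, int instanceCount) {
        // Mask the sign bit so negative hash codes still yield a valid index.
        return (productId.hashCode() & Integer.MAX_VALUE) % instanceCount;
    }
}
```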
If the read and write requests for a particular product are especially heavy and all land on the same queue of the same machine, that machine may come under excessive pressure. That said, since the cache is only cleared when the product data is updated, and read/write concurrency only matters at that moment, it really depends on the business: if the update frequency is not too high, the impact of this problem is not particularly large, although the load on some machines may indeed be higher.