As we all know, one of the main bottlenecks of a computer system is I/O. To bridge the speed gap between memory and disk, caches are used to keep hot data in memory where it can be read at any time, reducing the number of requests that hit the database and preventing the database from being overwhelmed. Note that breakdown, as well as the penetration and avalanche discussed later, all assume high concurrency, for example the moment a hot key in the cache becomes invalid.
There are two main reasons:
1. The key expired;
2. The key was evicted by the memory eviction policy (much like page replacement).
For the first reason: in Redis, a key can have an expiration time. If a key expires at a particular moment, say a mall runs a promotion that starts at midnight and the key for a hot product expires right at midnight, then from that moment all product queries are pushed straight onto the database, which can bring the database down.
For the second reason: memory is limited, new data keeps being cached, and old data has to be evicted, so under any eviction strategy (conceptually similar to the common page replacement algorithms) some data will be removed. If nobody looked at certain products before the event, their keys will certainly have been evicted.
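As a quick illustration of both causes, here is a minimal sketch using the Python redis-py client (the key name, TTL, and memory limit are made-up example values):

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Cause 1: the key simply expires. Here the product entry lives for one hour.
r.set("product:1001", '{"name": "book", "price": 42}', ex=3600)
print(r.ttl("product:1001"))  # seconds remaining before Redis drops the key

# Cause 2: eviction. When maxmemory is reached, Redis removes keys according
# to its eviction policy (e.g. allkeys-lru), so cold keys disappear first.
r.config_set("maxmemory", "256mb")
r.config_set("maxmemory-policy", "allkeys-lru")
```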
A normal request is processed as shown in the figure: check the cache first; on a miss, query the database and write the result back into the cache.
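A minimal cache-aside read path corresponding to that flow might look like the sketch below (redis-py assumed; `query_db` is a placeholder for whatever data access layer the application actually uses):

```python
import json
import redis

r = redis.Redis(decode_responses=True)

def query_db(product_id):
    """Placeholder for the real database query."""
    return {"id": product_id, "name": "book"}

def get_product(product_id):
    key = f"product:{product_id}"
    cached = r.get(key)                    # 1. try the cache first
    if cached is not None:
        return json.loads(cached)          # cache hit: no database round trip
    data = query_db(product_id)            # 2. cache miss: fall through to the DB
    r.set(key, json.dumps(data), ex=3600)  # 3. write back with a TTL
    return data
```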
Since key expiration is unavoidable, when heavy traffic hits Redis, its single-threaded nature means commands can be thought of as executing sequentially in a queue. When a request reaches Redis and finds that the key has expired, an extra operation is performed: setting a lock.
The process is roughly as follows:
1. A request reaches Redis and finds that the key has expired; it checks whether a lock exists. If a lock is already held by another request, it goes back to the end of the queue and retries later.
2. Otherwise it sets the lock. Note that this must be setnx(), not set(), because another thread may have set the lock in the meantime.
3. The request that acquires the lock goes to the database to fetch the data, writes it back to the cache, and releases the lock once the request returns.
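A sketch of those steps with redis-py (the lock key name, retry count, delay, and `query_db` stub are illustrative assumptions, not part of the original article):

```python
import json
import time
import redis

r = redis.Redis(decode_responses=True)

def query_db(key):
    """Placeholder for the real database query."""
    return {"key": key, "name": "book"}

def get_with_mutex(key, lock_key="lock:product:1001", retries=50):
    for _ in range(retries):
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)        # another request rebuilt it already
        # setnx: only one request wins the lock; the others loop and retry,
        # which plays the role of "going back to the end of the queue".
        if r.setnx(lock_key, "1"):
            try:
                data = query_db(key)         # only the lock holder hits the database
                r.set(key, json.dumps(data), ex=3600)
                return data
            finally:
                r.delete(lock_key)           # release the lock
        time.sleep(0.05)                     # brief wait before trying again
    raise TimeoutError("cache rebuild did not finish in time")
```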
But this raises a new question: what if the request that is fetching the data hangs after acquiring the lock? The lock is never released, and every other process keeps waiting for it. The solution:
Give the lock an expiration time: if it has not been released by then, it is released automatically. But then another problem appears. A crashed lock holder is easy to deal with, but what about a lock that simply times out, i.e. the data has not been fetched within the allotted time yet the lock expires? A common idea is to keep increasing the lock's expiration time, but that is unreliable: if the first request times out and the following requests time out as well, then after several consecutive timeouts the expiration value grows enormous, which has too many drawbacks.
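With redis-py the lock and its expiration can be set atomically in one call, which at least avoids the window where a crash leaves a permanent lock (the 10-second TTL is an arbitrary example value, and the rebuild step is left as a placeholder):

```python
import redis

r = redis.Redis(decode_responses=True)

# nx=True  -> only set if the key does not exist (same semantics as SETNX)
# ex=10    -> Redis deletes the lock automatically after 10 seconds,
#             so a crashed holder cannot block everyone else forever.
got_lock = r.set("lock:product:1001", "1", nx=True, ex=10)
if got_lock:
    try:
        ...  # query the database and refill the cache here, within the TTL
    finally:
        r.delete("lock:product:1001")
```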
Another idea is to start a separate thread to monitor the lock: as long as the thread fetching the data has not died, it keeps extending the lock's expiration time appropriately.
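One way to sketch that watchdog idea in Python: a daemon thread keeps pushing the lock's TTL forward while the data-fetching worker is still alive (the TTL, interval, lock key, and `rebuild_cache` stub are assumptions for the example):

```python
import threading
import redis

r = redis.Redis(decode_responses=True)

def rebuild_cache():
    """Placeholder for the routine that queries the DB and refills the cache."""
    ...

def keep_lock_alive(lock_key, worker, ttl=10, interval=3):
    # Extend the lock every few seconds as long as the data-fetching worker
    # is still alive; once it finishes (or dies) the renewals stop and the
    # lock is released normally or simply expires.
    while worker.is_alive():
        r.expire(lock_key, ttl)          # push the lock's expiration forward
        worker.join(timeout=interval)    # wait a bit, then check again

worker = threading.Thread(target=rebuild_cache)
worker.start()
threading.Thread(target=keep_lock_alive,
                 args=("lock:product:1001", worker),
                 daemon=True).start()
```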
The main cause of penetration is a large number of requests for data that does not exist in the database. For example, a mall that only sells books keeps receiving queries for tea products. Since the Redis cache holds hot data, data that does not exist in the database is never cached, so this abnormal traffic goes straight through to the database and comes back with empty results.
To handle such requests, the solution is to add a filtering layer in front of them, such as a Bloom filter, an enhanced Bloom filter, or a cuckoo filter.
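A toy Bloom filter, just to show the idea; a real deployment would more likely use RedisBloom or an existing library, and the bit-array size and hash count here are arbitrary:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions in a fixed-size bit array.
    False positives are possible, false negatives are not."""

    def __init__(self, size_bits=1 << 20, hashes=4):
        self.size = size_bits
        self.hashes = hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: str):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

# Load every existing product id at startup, then reject unknown ids
# before they ever reach Redis or the database.
bf = BloomFilter()
bf.add("book:1001")
print(bf.might_contain("book:1001"))  # True
print(bf.might_contain("tea:9999"))   # almost certainly False -> short-circuit
```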
Besides Bloom filters, you can also add parameter checks. For example, database IDs are generally auto-incrementing, so a request with a parameter like id = -10 would otherwise bypass Redis entirely; to avoid this, you can also perform operations such as user authenticity verification.
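A cheap first line of defense is plain parameter validation before any cache or database lookup; the exact rules depend on the application, so this is only a sketch:

```python
def is_valid_product_id(raw_id: str) -> bool:
    # Auto-increment ids are positive integers, so anything else
    # (e.g. "-10" or "abc") can be rejected before touching Redis or the DB.
    return raw_id.isdigit() and int(raw_id) > 0
```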
Avalanche is similar to breakdown. The difference is that breakdown is one hot key failing at a given moment, while avalanche is a large number of hot keys failing at nearly the same instant. Many blogs online insist that the fix for avalanche is to randomize expiration times, but that is imprecise. For example, suppose a bank runs a promotion: the interest coefficient was 2% before, and after midnight it changes to 3%, which changes users' interest rates. Can the corresponding keys simply be given random expiration times? If stale values are served in the meantime, that is dirty data.
Obviously not. Otherwise saving money would get strange: by year end you have 3 million in interest while the account next door has only 2 million, and there is no need to fight over it. Just kidding~
The correct approach is to first check whether the keys' expiration is tied to a specific point in time. If it is not time-dependent, random expiration times do solve the problem.
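A sketch of the random-expiration idea with redis-py (the base TTL and jitter range are arbitrary example values):

```python
import json
import random
import redis

r = redis.Redis(decode_responses=True)

def cache_with_jitter(key, value, base_ttl=3600, jitter=600):
    # Each key gets base_ttl plus a random 0-600 second offset, so keys
    # written in the same batch do not all expire at the same moment.
    r.set(key, json.dumps(value), ex=base_ttl + random.randint(0, jitter))
```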
If it is tied to a point in time, as with the bank changing a certain coefficient on a certain day, then you must fall back on the breakdown solution: the strategy is to update all of the affected keys with a separate background thread first.
While the background thread updates the hot keys, the business layer briefly delays incoming requests, for example sleeping for a few milliseconds to a few seconds, to spread out the pressure of the hot-key refresh.
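A rough sketch of that combination: a background thread rewrites the time-dependent keys right after the cut-over, while the request path briefly sleeps and retries so the refresh has a chance to land (the key pattern, coefficient value, retry count, and sleep time are invented for the example):

```python
import time
import threading
import redis

r = redis.Redis(decode_responses=True)

def refresh_rate_keys(new_coefficient):
    # Background thread: rewrite every time-dependent key with the new value
    # as soon as the cut-over moment (e.g. midnight) arrives.
    for key in r.scan_iter(match="rate:*"):
        r.set(key, new_coefficient)

def get_rate(key, retries=100):
    # Request path: if the key is momentarily missing during the refresh,
    # sleep briefly and retry instead of stampeding the database.
    for _ in range(retries):
        value = r.get(key)
        if value is not None:
            return value
        time.sleep(0.05)
    raise TimeoutError("rate key was not refreshed in time")

threading.Thread(target=refresh_rate_keys, args=("0.03",), daemon=True).start()
```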