Redis hybrid storage is a storage product developed in-house by Alibaba Cloud that is fully compatible with the Redis protocol and features.
By storing part of the cold data on disk, it greatly reduces user costs and lifts the memory-imposed limit on the data volume of a single Redis instance, while keeping the performance of most accesses unchanged.
Identifying hot and cold data, and exchanging it between memory and disk, is the key factor in the performance of a hybrid storage product.
In Redis hybrid storage, the user can freely choose the ratio of memory to disk.
A Redis hybrid storage instance treats all Keys as hot data, so that every Key lookup is fast and consistent at the cost of a small amount of memory. For Values, when memory runs short, the instance selects some of them as cold data based on recent access time, access frequency, Value size and other dimensions, and asynchronously writes them to disk in the background until memory usage falls below the specified threshold.
In a Redis hybrid storage instance, we treat all Keys as hot data and keep them in memory for the following two reasons:
Keys are accessed much more frequently than Values.
In a KV database, a normal request first looks up the Key to confirm whether it exists. Confirming that a Key does not exist requires checking the set of all Keys in some form. Keeping every Key in the in-memory data structures ensures that lookups are exactly as fast as in a pure in-memory store.
Keys account for a very small share of the total data size.
In a typical business model, even for an ordinary string type, the Value is generally several times larger than the Key. For collection objects such as Set, List and Hash, the Value formed by all members together is several orders of magnitude larger than the Key.
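The following Python sketch illustrates this layout under stated assumptions: every Key stays in an in-memory table, and each entry is either a hot Value or a small stub pointing at the disk copy. The names `HybridDict`, `HotValue` and `ColdMeta`, and the slot-based "disk", are hypothetical and only stand in for whatever the product uses internally.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class HotValue:
    """A Value held fully in memory."""
    data: bytes


@dataclass
class ColdMeta:
    """Small in-memory stub for a Value stored on disk (hypothetical layout)."""
    slot: int
    size: int


class HybridDict:
    """All Keys stay in memory; each Key maps to a hot Value or a cold stub."""

    def __init__(self) -> None:
        self._table: Dict[str, Union[HotValue, ColdMeta]] = {}
        self._disk: Dict[int, bytes] = {}  # stand-in for the on-disk store
        self._next_slot = 0

    def set(self, key: str, data: bytes) -> None:
        self._table[key] = HotValue(data)

    def exists(self, key: str) -> bool:
        # Answered from memory alone, whether the Value is hot or cold.
        return key in self._table

    def is_hot(self, key: str) -> bool:
        return isinstance(self._table.get(key), HotValue)

    def swap_out(self, key: str) -> None:
        # Move a hot Value to "disk" and leave only the stub in memory.
        entry = self._table[key]
        if isinstance(entry, HotValue):
            slot = self._next_slot
            self._next_slot += 1
            self._disk[slot] = entry.data
            self._table[key] = ColdMeta(slot, len(entry.data))

    def get(self, key: str) -> Optional[bytes]:
        entry = self._table.get(key)
        if entry is None:
            return None  # a miss is decided without touching disk
        if isinstance(entry, ColdMeta):
            data = self._disk.pop(entry.slot)  # load and make the Value hot again
            self._table[key] = HotValue(data)
            return data
        return entry.data
```

In this model, `exists` and misses in `get` never read the disk, which is the property the two reasons above are meant to guarantee.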
Therefore, there are two main applicable scenarios for Redis hybrid storage instances:
Data access is uneven and hotspot data exists;
Memory is not enough to hold all the data, and Values are large relative to Keys.
When there is insufficient memory, the instance will calculate the weight of the value based on recent access time, access frequency, value size and other dimensions, store the value with the lowest weight on the disk and delete it from the memory.
The pseudo code is as follows:
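As an illustration only, here is a minimal Python sketch of that selection step. The `ValueStats` fields, the `weight` formula and `pick_coldest` are assumptions made for this example; the source does not specify how the dimensions are combined. The sketch assumes that a lower weight means a colder value, and that large, idle, rarely used values should score lowest.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ValueStats:
    last_access: float  # timestamp of the most recent access, in seconds
    access_count: int   # number of accesses observed so far
    size: int           # value size in bytes


def weight(stats: ValueStats, now: float) -> float:
    """Hypothetical hotness score: lower means colder."""
    idle = max(now - stats.last_access, 0.0)
    # Frequent, recent, small values score high; idle, large ones score low.
    return stats.access_count / ((1.0 + idle) * (1.0 + stats.size))


def pick_coldest(values: Dict[str, ValueStats], now: float) -> Optional[str]:
    """Return the key whose value has the lowest weight, or None if empty."""
    if not values:
        return None
    return min(values, key=lambda k: weight(values[k], now))
```

Scanning every value like this is exact but costs time proportional to the number of values on each call.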
Ideally, we would always identify exactly the value with the lowest weight. However, how hot or cold a value is changes dynamically with the access pattern, and recalculating the weights of all values every time would take an unacceptable amount of time.
When memory is full, Redis itself evicts data according to the eviction policy set by the user, and writing cold data from memory to disk can also be regarded as a kind of "eviction". Considering performance, accuracy and ease of understanding for users, we identify hot and cold data with an approximate method similar to the one Redis uses. We support multiple policies, reduce CPU and memory consumption by randomly sampling a small part of the data, and use the historical sampling information kept in the eviction pool to improve accuracy.
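The sampling idea can be sketched in Python as follows. This is a simplified model, not the product's code: `EvictionPool` and `choose_victim` are hypothetical names, weights are assumed to be precomputed (lower means colder), and the pool size of 16 and sample count of 5 mirror the defaults of open-source Redis (`EVPOOL_SIZE` and `maxmemory-samples`).

```python
import random
from typing import Dict, List, Optional, Tuple

POOL_SIZE = 16  # open-source Redis keeps 16 candidates in its eviction pool
SAMPLES = 5     # default value of maxmemory-samples in open-source Redis


class EvictionPool:
    """Keeps the coldest candidates seen in earlier sampling rounds."""

    def __init__(self, size: int = POOL_SIZE) -> None:
        self.size = size
        self.entries: List[Tuple[float, str]] = []  # (weight, key), coldest first

    def offer(self, key: str, weight: float) -> None:
        # Replace any older entry for the same key, then keep the coldest `size`.
        self.entries = [(w, k) for (w, k) in self.entries if k != key]
        self.entries.append((weight, key))
        self.entries.sort()
        del self.entries[self.size:]

    def pop_coldest(self, live_keys: Dict[str, float]) -> Optional[str]:
        # Skip candidates that were deleted since they were sampled.
        while self.entries:
            _, key = self.entries.pop(0)
            if key in live_keys:
                return key
        return None


def choose_victim(weights: Dict[str, float], pool: EvictionPool,
                  rng: random.Random, samples: int = SAMPLES) -> Optional[str]:
    """Sample a few keys, merge them into the pool, return the coldest one."""
    if not weights:
        return None
    keys = list(weights)
    for key in rng.sample(keys, min(samples, len(keys))):
        pool.offer(key, weights[key])
    return pool.pop_coldest(weights)
```

Because the pool carries candidates over between rounds, a cold key found in one round can still win a later round even if it is not sampled again. The weights stored in the pool may be slightly stale, which is part of the approximation.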
The figure shows the hit rate of Redis's approximate eviction algorithm under different versions and different numbers of samples. Data points that have been evicted are colored light gray, data points that have not been evicted are gray, and data points added during the test are green.
In Redis hybrid storage, the exchange of hot and cold data is carried out by background I/O threads.
Hot data -> Cold data
Asynchronous method:
When memory usage approaches the maximum, the main thread generates a series of data swap-out tasks;
The background thread executes these swap-out tasks and notifies the main thread on completion;
The main thread releases the value in memory and replaces the value in the in-memory data dictionary with simple meta-information;
Synchronous method:
When write traffic is too heavy, the asynchronous method cannot swap data out in time, and memory usage may exceed the maximum specification. In that case the main thread performs the swap-out tasks directly, which in effect throttles the write traffic.
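The two swap-out paths above can be modeled with a small Python sketch. Everything here is an assumption made for illustration: the `SwapOutEngine` class, the soft and hard limits, the dictionaries standing in for memory and disk, and choosing victims in insertion order rather than by weight. The asynchronous path queues tasks for a background thread; the synchronous path has the main thread take queued tasks and run them itself once the hard limit is exceeded.

```python
import queue
import threading
from typing import Dict


class SwapOutEngine:
    """Toy model: the main thread owns memory/meta, one I/O thread writes disk."""

    def __init__(self, max_memory: int, soft_limit: int) -> None:
        self.max_memory = max_memory         # hard limit in bytes
        self.soft_limit = soft_limit         # start swapping out above this
        self.memory: Dict[str, bytes] = {}   # hot values
        self.meta: Dict[str, int] = {}       # cold stubs: key -> value size
        self.disk: Dict[str, bytes] = {}     # stand-in for the disk store
        self.pending: Dict[str, bytes] = {}  # one in-flight task per key
        self.tasks: queue.Queue = queue.Queue()
        self.done: queue.Queue = queue.Queue()
        threading.Thread(target=self._io_loop, daemon=True).start()

    def used(self) -> int:
        return sum(len(v) for v in self.memory.values())

    def _io_loop(self) -> None:
        # Background thread: persist the value, then notify the main thread.
        while True:
            key, data = self.tasks.get()
            self.disk[key] = data
            self.done.put((key, data))

    def _complete(self, key: str, data: bytes) -> None:
        # Main thread: release the value unless it was overwritten meanwhile.
        self.pending.pop(key, None)
        if self.memory.get(key) is data:
            del self.memory[key]
            self.meta[key] = len(data)

    def poll(self) -> None:
        # Apply every completion the background thread has reported so far.
        while True:
            try:
                key, data = self.done.get_nowait()
            except queue.Empty:
                return
            self._complete(key, data)

    def wait_idle(self) -> None:
        while self.pending:
            self._complete(*self.done.get())

    def _schedule_swaps(self) -> None:
        # Asynchronous path: queue tasks until projected usage fits the soft limit.
        projected = self.used() - sum(
            len(d) for k, d in self.pending.items() if self.memory.get(k) is d)
        for key, data in list(self.memory.items()):
            if projected <= self.soft_limit:
                break
            if key not in self.pending:
                self.pending[key] = data
                self.tasks.put((key, data))
                projected -= len(data)

    def _enforce_hard_limit(self) -> None:
        # Synchronous path: the main thread runs queued tasks itself.
        while self.used() > self.max_memory and self.pending:
            try:
                key, data = self.tasks.get_nowait()
            except queue.Empty:
                key, data = self.done.get()  # wait for the write in flight
            else:
                self.disk[key] = data
            self._complete(key, data)

    def write(self, key: str, data: bytes) -> None:
        self.poll()
        self.memory.pop(key, None)
        self.meta.pop(key, None)
        self.memory[key] = data
        self._schedule_swaps()
        self._enforce_hard_limit()
```

Because the main thread blocks inside `write` while it performs the swap itself, heavy writers are slowed down automatically, which is the throttling effect described above. Allowing only one in-flight task per key keeps the two threads from writing the same disk entry concurrently.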
Cold data -> Hot data
Asynchronous method:
Before executing a command, the main thread first checks whether all the values involved are in memory;
If not, it generates a data loading task and suspends the client, and the main thread continues to process requests from other clients;
The background thread performs the data loading task and notifies the main thread on completion;
The main thread updates the value in the in-memory data dictionary, wakes up the previously suspended client and processes its request.
Synchronous method:
In a Lua script, if a value is found to be on disk during the execution of a specific command, the main thread performs the data loading task directly, so that the semantics of Lua scripts and commands remain unchanged.
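The loading paths can be sketched in the same spirit. The `LoadEngine` class below is a hypothetical Python model: clients are plain strings, the only command is a GET, replies are collected in a list, and a dictionary stands in for the disk. Membership in `disk` plays the role of the in-memory stub that marks a value as cold.

```python
import queue
import threading
from typing import Dict, List, Optional, Tuple


class LoadEngine:
    """Toy model of loading cold values without blocking other clients."""

    def __init__(self, memory: Dict[str, bytes], disk: Dict[str, bytes]) -> None:
        self.memory = dict(memory)  # hot values
        self.disk = dict(disk)      # cold values (stand-in for the disk store)
        self.suspended: Dict[str, List[str]] = {}  # key -> clients waiting for it
        self.replies: List[Tuple[str, Optional[bytes]]] = []
        self.tasks: queue.Queue = queue.Queue()
        self.done: queue.Queue = queue.Queue()
        threading.Thread(target=self._io_loop, daemon=True).start()

    def _io_loop(self) -> None:
        # Background thread: read the value, then notify the main thread.
        while True:
            key = self.tasks.get()
            self.done.put((key, self.disk[key]))

    def handle_get(self, client: str, key: str) -> None:
        # Main thread: check residency before executing the command.
        if key in self.memory:
            self.replies.append((client, self.memory[key]))
        elif key not in self.disk:
            self.replies.append((client, None))  # key does not exist
        elif key in self.suspended:
            self.suspended[key].append(client)   # a load is already in flight
        else:
            self.suspended[key] = [client]       # suspend and schedule a load
            self.tasks.put(key)

    def on_load_done(self, block: bool = False) -> bool:
        # Main thread: install the loaded value and wake the suspended clients.
        try:
            key, data = self.done.get(block=block)
        except queue.Empty:
            return False
        self.memory[key] = data
        for client in self.suspended.pop(key, []):
            self.replies.append((client, data))
        return True

    def script_get(self, key: str) -> Optional[bytes]:
        # Synchronous path: inside a script the main thread loads directly.
        if key not in self.memory and key in self.disk:
            self.memory[key] = self.disk[key]
        return self.memory.get(key)
```

In this model a request for a hot key is answered while another client is still waiting for a cold one, whereas `script_get` never yields, so a script observes every value as if it had been in memory all along.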