Instagram's application for looking up a user's UID from a picture ID (a reverse lookup) has the following requirements:
Queries must be fast
All data must fit in memory, preferably on an EC2 high-memory instance (17GB or 34GB; 68GB would be too wasteful)
Persistence must be supported, so that the server does not need to warm up after a restart
They first ruled out a database storage solution, following the KISS principle (Keep It Simple, Stupid): the application needs none of a database's update, transaction, or join-query features, so there is no reason to choose and maintain a database for features that go unused.
So they chose Redis, an in-memory database that supports persistence. All data is stored in memory (setting aside the VM feature), and the simplest implementation uses Redis's String type as a plain key-value store, like this:
SET media:1155315 939
GET media:1155315
> 939
Here 1155315 is the picture ID and 939 is the user ID: each picture ID is stored as a key with the user's UID as its value. They then tested this layout. 1,000,000 entries used about 70MB of memory, so 300,000,000 photos would use about 21GB. That still exceeds the 17GB budget.
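The extrapolation above is plain proportional scaling. A minimal sketch in Python, assuming memory grows linearly with the number of keys and that 1GB is counted as 1000MB, as the article's figures imply (the function name estimate_total_gb is made up for illustration):

```python
def estimate_total_gb(mb_per_million_keys: float, total_keys: int) -> float:
    """Scale a measured per-million-keys footprint up to the full data set."""
    millions_of_keys = total_keys / 1_000_000
    # Assumption: 1 GB = 1000 MB, which is how the article rounds its numbers.
    return mb_per_million_keys * millions_of_keys / 1000


# Figures taken from the article: 70MB per 1,000,000 keys, 300,000,000 photos.
string_scheme_gb = estimate_total_gb(70, 300_000_000)

# The budget is the smaller EC2 high-memory option mentioned in the requirements.
BUDGET_GB = 17
over_budget = string_scheme_gb > BUDGET_GB

print(f"String scheme: {string_scheme_gb:.0f}GB, over budget: {over_budget}")
```

The same function can be applied to any of the other per-million measurements quoted in this article to compare layouts.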
(NoSQLFan: There is an optimization available here. The "media:" prefix is identical in every key, so we can drop it and store only the number, which shortens the key and reduces its memory overhead. [Note: Redis does not convert a key from a string to a number, so the only saving is the 6 bytes of "media:".] In experiments, memory usage fell to 50MB per 1,000,000 keys, or 15GB in total. That fits the budget, but Instagram's later improvements are still worthwhile.)
So Instagram's developers asked Pieter Noordhuis, one of the Redis developers, for advice, and his suggestion was to use the Hash type. The approach is to split the data into segments and store each segment in its own Hash. Redis stores a Hash in a compact encoding as long as its number of entries stays below a threshold, which saves a lot of memory; plain String keys get no such optimization. The threshold is set by the hash-zipmap-max-entries parameter in the configuration file. In the developers' experiments, setting hash-zipmap-max-entries to 1000 performed well; above 1000, the HSET command consumes a great deal of CPU.
So they changed the plan and stored the data in the following structure:
HSET "mediabucket:1155" "1155315" "939"
HGET "mediabucket:1155" "1155315"
> "939"
The first four digits of the 7-digit picture ID become the key of the Hash. Each Hash therefore covers only the IDs that differ in their last three digits, which is at most 1,000 entries.
A second experiment showed that this layout uses only 16MB of memory per 1,000,000 keys. Total memory usage falls to about 5GB, which meets the application's requirements.
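The bucketing scheme can be sketched in Python as follows. This is only an illustration, not Instagram's actual code: the FakeRedis class is a hypothetical in-memory stand-in for the HSET and HGET commands so that the example runs without a server, and integer division by 1000 is assumed to be equivalent to "taking the first four digits" of a 7-digit ID.

```python
from collections import defaultdict
from typing import Dict, Optional

BUCKET_SIZE = 1000  # same value the article uses for hash-zipmap-max-entries


class FakeRedis:
    """Hypothetical stand-in that mimics HSET/HGET with nested dicts."""

    def __init__(self) -> None:
        self._hashes: Dict[str, Dict[str, str]] = defaultdict(dict)

    def hset(self, key: str, field: str, value: str) -> None:
        self._hashes[key][field] = value

    def hget(self, key: str, field: str) -> Optional[str]:
        # .get() avoids creating an empty hash for a missing key.
        return self._hashes.get(key, {}).get(field)


def bucket_key(media_id: int) -> str:
    """Drop the last three digits of the ID to pick the bucket."""
    return f"mediabucket:{media_id // BUCKET_SIZE}"


def save_owner(store: FakeRedis, media_id: int, user_id: int) -> None:
    store.hset(bucket_key(media_id), str(media_id), str(user_id))


def find_owner(store: FakeRedis, media_id: int) -> Optional[str]:
    return store.hget(bucket_key(media_id), str(media_id))


store = FakeRedis()
save_owner(store, 1155315, 939)
print(bucket_key(1155315), find_owner(store, 1155315))
```

Because the bucket is derived from the ID alone, a lookup needs no extra index: the same arithmetic that chose the bucket on write finds it again on read, and no bucket can ever hold more than 1,000 fields.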
(NoSQLFan: This can be optimized further as well. First, make the key of the Hash a pure number, which shortens the key by 12 bytes. Second, shorten the field inside the Hash to three digits, which saves another 4 bytes, as shown below. In experiments, memory usage fell to 10MB per 1,000,000 keys, or 3GB in total.)
HSET "1155" "315" "939"
HGET "1155" "315"
> "939"
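The byte savings claimed in the note can be checked by comparing string lengths. A small Python sketch, assuming the three-digit field is the zero-padded remainder of the ID modulo 1000 (the padding and the name compact_location are assumptions for illustration, not something the note specifies):

```python
from typing import Tuple


def compact_location(media_id: int) -> Tuple[str, str]:
    """Split an ID into a bare numeric hash key and a three-digit field."""
    bucket, remainder = divmod(media_id, 1000)
    # Assumption: pad to three digits so that every field has the same width.
    return str(bucket), f"{remainder:03d}"


# Key and field from the earlier "mediabucket" layout, for comparison.
old_key, old_field = "mediabucket:1155", "1155315"
new_key, new_field = compact_location(1155315)

key_bytes_saved = len(old_key) - len(new_key)        # the "mediabucket:" prefix
field_bytes_saved = len(old_field) - len(new_field)  # the repeated first four digits

print(new_key, new_field, key_bytes_saved, field_bytes_saved)
```

The key saving applies once per Hash, while the field saving applies to every stored picture ID, so the second change is the one that scales with the number of photos.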