Redis is a database, but unlike traditional databases it keeps its data in memory, so reads and writes are very fast; this is why Redis is widely used as a cache. Memcached is a high-performance distributed memory cache server. It is generally used to improve the speed and scalability of dynamic web applications by caching database query results and reducing the number of database accesses.
Salvatore Sanfilippo, the author of Redis, once compared these two memory-based data storage systems:
Redis supports server-side data operations: Compared with Memcached, Redis has more data structures and supports richer operations on them. In Memcached, you usually have to fetch the data to the client, modify it there, and then set it back, which greatly increases the number of network round trips and the volume of data transferred. In Redis, these complex operations are usually as efficient as a plain GET/SET. Therefore, if you need the cache to support more complex structures and operations, Redis is a good choice.
Comparison of memory usage efficiency: For simple key-value storage, Memcached's memory utilization is higher. If Redis uses a hash structure for key-value storage, however, its memory utilization is higher than Memcached's because of the compact, combined encoding it applies to hashes.
Performance comparison: Redis uses only a single core while Memcached can use multiple cores, so on a per-core basis Redis performs better than Memcached when storing small data. For data larger than 100k, Memcached performs better than Redis. Although Redis has recently been optimized for storing large values, it is still slightly inferior to Memcached in that case.
The reasons behind these conclusions are collected below:
Unlike Memcached, which only supports simple key-value data records, Redis supports much richer data types. The most common ones are String, Hash, List, Set and Sorted Set. Redis uses redisObject objects to represent all keys and values. The most important information in a redisObject is shown in the figure:
type represents the specific data type of a value object, and encoding is the way that data type is stored inside Redis. For example, type=string means the value is an ordinary string, and the corresponding encoding can be raw or int. If it is int, Redis actually stores and represents the string internally as a number. Of course, this is only possible when the string itself can be represented numerically, such as "123" or "456". The vm field only allocates memory when the virtual memory function of Redis is turned on. This function is turned off by default.
1) String
Commonly used commands: set/get/decr/incr/mget, etc.;
Application scenarios: String is the most commonly used data type; ordinary key/value storage falls into this category;
Implementation method: By default, a String is stored inside Redis as a string that is referenced by a redisObject. For operations such as incr and decr, it is converted to a numerical type for the calculation, and the encoding field of the redisObject is then int.
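The choice between the raw and int encodings can be illustrated with a small sketch. This is not Redis source code: choose_encoding and incr are hypothetical helper names, a plain dict stands in for the keyspace, and the 64-bit range check that a real server would need is left out.

```python
def choose_encoding(value: str) -> str:
    """Return 'int' if the string is a canonical integer, otherwise 'raw'."""
    try:
        number = int(value)
    except ValueError:
        return "raw"
    # Only treat the value as an integer if it round-trips exactly,
    # so that strings such as "007" or "+5" keep their original bytes.
    return "int" if str(number) == value else "raw"


def incr(store: dict, key: str, delta: int = 1) -> int:
    """Sketch of incr/decr: interpret the stored string as a number."""
    current = store.get(key, "0")
    if choose_encoding(current) != "int":
        raise ValueError("value is not an integer")
    new_value = int(current) + delta
    store[key] = str(new_value)
    return new_value
```

With store = {"n": "10"}, calling incr(store, "n") returns 11, while a value such as "abc" is reported as raw and cannot be incremented.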
2) Hash
Common commands: hget/hset/hgetall, etc.
Application scenario: Suppose we want to store a user information object that includes user ID, user name, age and birthday, and we want to look up the user's name, age or birthday by user ID;
Implementation method: The value stored internally for a Redis Hash is actually a HashMap, and Redis provides an interface for accessing the members of this Map directly. As shown in the figure, the Key is the user ID and the value is a Map. The key of this Map is the attribute name of the member, and the value is the attribute value. The data can therefore be modified and accessed directly through the key of the internal Map (called a field in Redis); that is, the corresponding attribute can be operated on through the key (user ID) plus the field (attribute label). The HashMap is currently implemented in two ways. When a Hash has relatively few members, Redis saves memory by storing it compactly in a structure similar to a one-dimensional array instead of a real HashMap; the encoding of the value's redisObject is then zipmap. When the number of members grows, it is automatically converted into a real HashMap, and the encoding becomes ht.
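The switch from the compact layout to a real hash table can be sketched as follows. The class name SmallHash and the threshold max_compact_entries are assumptions made for illustration; the actual limits and the byte layout of the compact encoding are not shown here.

```python
class SmallHash:
    """Hash value that starts compact and upgrades to a dict when it grows."""

    def __init__(self, max_compact_entries: int = 4):
        self.max_compact_entries = max_compact_entries  # hypothetical threshold
        self.encoding = "zipmap"
        self._pairs = []     # flat list of (field, value), used while small
        self._table = None   # real hash table, used after conversion

    def hset(self, field, value):
        if self.encoding == "ht":
            self._table[field] = value
            return
        # Compact form: linear scan, cheap while there are few fields.
        for i, (existing, _) in enumerate(self._pairs):
            if existing == field:
                self._pairs[i] = (field, value)
                return
        self._pairs.append((field, value))
        if len(self._pairs) > self.max_compact_entries:
            # Too many members: convert once to a real hash table.
            self._table = dict(self._pairs)
            self._pairs = []
            self.encoding = "ht"

    def hget(self, field):
        if self.encoding == "ht":
            return self._table.get(field)
        for existing, value in self._pairs:
            if existing == field:
                return value
        return None
```

The linear scan is acceptable only because the compact form is limited to a handful of fields; once that limit is exceeded, lookups go through the hash table.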
3) List
Common commands: lpush/rpush/lpop/rpop/lrange etc.;
Application scenarios: The Redis list has many application scenarios and is one of the most important data structures in Redis. For example, Twitter's follow list, fan list, etc. can be implemented with the Redis list structure;
Implementation method: The Redis list is implemented as a doubly linked list, which supports reverse search and traversal and is therefore more convenient to operate on, at the cost of some additional memory overhead. Many internal parts of Redis, including the send buffer queues, also use this data structure.
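As a rough illustration of why a double-ended structure suits these commands, the sketch below uses Python's collections.deque as a stand-in for the doubly linked list. SimpleList is a hypothetical name, and lrange here accepts only non-negative indexes.

```python
from collections import deque
from itertools import islice


class SimpleList:
    """List value with cheap pushes and pops at both ends."""

    def __init__(self):
        self._items = deque()

    def lpush(self, value) -> int:
        self._items.appendleft(value)
        return len(self._items)

    def rpush(self, value) -> int:
        self._items.append(value)
        return len(self._items)

    def lpop(self):
        return self._items.popleft() if self._items else None

    def rpop(self):
        return self._items.pop() if self._items else None

    def lrange(self, start: int, stop: int) -> list:
        # Both bounds are inclusive, as in the lrange command.
        return list(islice(self._items, start, stop + 1))
```

Pushing and popping at either end takes constant time, which is what makes the structure usable as a queue or a stack.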
4) Set
Common commands: sadd/spop/smembers/sunion, etc.;
Application scenarios: The functions that a Redis set exposes are similar to those of a list. What is special is that a set deduplicates automatically. When you need to store a list of data and do not want duplicates, a set is a good choice. A set also provides an important interface for checking whether a member is in the set, which a list cannot provide;
Implementation method: A set is implemented internally as a HashMap whose values are always null. Members are located quickly by computing their hash, which is why a set can check whether a member belongs to it.
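A set built on a hash table whose values are always empty might look like the following sketch, where a Python dict with None values plays the role of the HashMap. SimpleSet and the free function sunion are illustrative names only.

```python
class SimpleSet:
    """Set value stored as a hash table whose values are always None."""

    def __init__(self):
        self._table = {}

    def sadd(self, member) -> int:
        # Return 1 if the member is new, 0 if it was already present.
        if member in self._table:
            return 0
        self._table[member] = None
        return 1

    def sismember(self, member) -> bool:
        # Membership is a single hash lookup.
        return member in self._table

    def smembers(self) -> list:
        return list(self._table)


def sunion(*sets: SimpleSet) -> SimpleSet:
    """Union of several sets; duplicates collapse automatically."""
    result = SimpleSet()
    for current in sets:
        for member in current.smembers():
            result.sadd(member)
    return result
```

Deduplication needs no extra work: adding an existing member simply finds the key already in the table.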
5) Sorted Set
Common commands: zadd/zrange/zrem/zcard, etc.;
Application scenarios: The usage scenarios of the Redis sorted set are similar to those of the set. The difference is that a set is not ordered, whereas a sorted set orders its members by an additional priority (score) parameter supplied by the user and keeps them ordered on insertion, that is, it sorts automatically. When you need an ordered list without duplicates, you can choose the sorted set. For example, Twitter's public timeline can be stored with the publication time as the score, so that it is automatically sorted by time when retrieved.
Implementation method: The Redis sorted set internally uses a HashMap and a skip list (SkipList) to store the data and keep it ordered. The HashMap stores the mapping from members to scores, while the skip list stores all the members, sorted by the scores held in the HashMap. The skip list gives high search efficiency and is relatively simple to implement.
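The pairing of a member-to-score map with an ordered structure can be sketched as below. Note the substitution: a sorted Python list maintained with bisect is used in place of a skip list, so insertions are not as cheap as in a real skip list. SimpleSortedSet is a hypothetical name.

```python
import bisect


class SimpleSortedSet:
    """Sorted set: a dict for member -> score plus an ordered sequence."""

    def __init__(self):
        self._scores = {}    # member -> score
        self._ordered = []   # (score, member) pairs kept in sorted order

    def _remove_entry(self, score, member):
        # The pair is known to exist, so bisect_left lands exactly on it.
        index = bisect.bisect_left(self._ordered, (score, member))
        del self._ordered[index]

    def zadd(self, member, score):
        old = self._scores.get(member)
        if old is not None:
            self._remove_entry(old, member)
        self._scores[member] = score
        bisect.insort(self._ordered, (score, member))

    def zrem(self, member) -> bool:
        old = self._scores.pop(member, None)
        if old is None:
            return False
        self._remove_entry(old, member)
        return True

    def zrange(self, start: int, stop: int) -> list:
        # Inclusive, non-negative indexes; members come back ordered by score.
        return [member for _, member in self._ordered[start:stop + 1]]

    def zcard(self) -> int:
        return len(self._scores)
```

The dict answers "what is this member's score" directly, and that score is exactly what is needed to find the member's position in the ordered structure.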
In Redis, not all data has to stay in memory at all times, which is the biggest difference from Memcached. (This describes the virtual memory feature of early Redis versions, which was deprecated in Redis 2.4 and removed in later releases.) When physical memory runs out, Redis can swap values that have not been used for a long time to disk, while keeping all key information in memory. When Redis finds that memory usage exceeds a configured threshold, it triggers the swap operation: it uses "swappability = age*log(size_in_memory)" to decide which keys' values should be swapped, persists those values to disk, and clears them from memory. This lets Redis hold more data than fits in the machine's physical memory. The machine's memory must still be large enough to hold all keys, since keys are never swapped. While Redis swaps data to disk, the main thread that serves requests and the child thread that performs the swap share that part of memory, so if a value being swapped is updated, Redis blocks the update until the child thread has finished the swap.
When a client reads a key whose value is not in memory, Redis has to load the value from the swap file before returning it to the requester. This is where the I/O thread pool comes in. By default, Redis blocks, that is, it does not respond until the required swap data has been loaded. This strategy suits a small number of clients performing batch operations, but it cannot meet the needs of a large, highly concurrent website. In that case, Redis can be run with an I/O thread pool of a configured size, so that read requests that need data from the swap file are handled concurrently and blocking time is reduced.
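The swappability formula quoted above can be tried out with a few numbers. In this sketch the logarithm is assumed to be the natural logarithm and the age is assumed to be in seconds; the text specifies neither. pick_swap_candidates is a hypothetical helper.

```python
import math


def swappability(age_seconds: float, size_in_memory: int) -> float:
    """swappability = age * log(size_in_memory)."""
    return age_seconds * math.log(size_in_memory)


def pick_swap_candidates(values: dict, count: int) -> list:
    """Return the `count` keys whose values are the best swap candidates.

    `values` maps each key to a tuple (age_seconds, size_in_memory).
    """
    ranked = sorted(values, key=lambda key: swappability(*values[key]), reverse=True)
    return ranked[:count]
```

Because size enters only through its logarithm, age dominates the ranking: a small value that has been idle for an hour outranks a large value touched ten seconds ago.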
For memory-based database systems like Redis and Memcached, the efficiency of memory management is a key factor in system performance. The malloc/free functions of traditional C are the most commonly used way to allocate and release memory, but this approach has major flaws: first, mismatched malloc and free calls easily cause memory leaks; second, frequent calls produce a large number of memory fragments that cannot be recycled and reused, which reduces memory utilization; finally, malloc may have to request memory from the operating system through system calls, whose overhead is much greater than that of ordinary function calls. Therefore, efficient memory management solutions do not use malloc/free directly. Both Redis and Memcached use their own memory management mechanisms, but their implementations are very different. The two mechanisms are introduced separately below.
Memcached uses the Slab Allocation mechanism by default to manage memory. Its main idea is to divide the allocated memory into blocks of specific, predetermined lengths that store key-value data records of the corresponding length, in order to solve the memory fragmentation problem completely. The Slab Allocation mechanism is only designed to store external data: all key-value data is stored in the Slab Allocation system, while Memcached's other memory requests go through ordinary malloc/free, because their number and frequency are too low to affect the performance of the whole system. The principle of Slab Allocation is quite simple. As shown in the figure, it first requests a large block of memory from the operating system, divides it into Chunks of various sizes, and groups Chunks of the same size into a Slab Class. The Chunk is the smallest unit for storing key-value data. The Chunk size of each Slab Class can be controlled by specifying the Growth Factor when Memcached is started. Assume that the Growth Factor in the figure is 1.25: if the Chunks of the first group are 88 bytes, the Chunks of the second group are 112 bytes, and so on.
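The 88-byte and 112-byte figures can be reproduced with a short calculation. The sketch assumes that each chunk size is rounded up to a multiple of 8 bytes, which is what turns 88 * 1.25 = 110 into 112; the function name and its parameters are illustrative.

```python
def slab_chunk_sizes(first_size: int = 88, growth_factor: float = 1.25,
                     classes: int = 6, alignment: int = 8) -> list:
    """Chunk size of each slab class, grown by a fixed factor."""
    sizes = []
    size = first_size
    for _ in range(classes):
        # Round up to the next multiple of `alignment` (assumed to be 8 bytes).
        remainder = size % alignment
        if remainder:
            size += alignment - remainder
        sizes.append(size)
        size = int(size * growth_factor)
    return sizes


# slab_chunk_sizes() returns [88, 112, 144, 184, 232, 296]
```

A smaller growth factor produces more slab classes with finer size steps, which reduces wasted space per item at the cost of more classes to manage.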
When Memcached receives data from a client, it first selects the most suitable Slab Class for the size of the data, and then searches the list of free Chunks that Memcached keeps for that Slab Class to find a Chunk that can store the data. When a data record expires or is discarded, the Chunk it occupied can be reclaimed and put back on the free list.
From the above process, we can see that Memcached's memory management system is highly efficient and will not cause memory fragmentation, but its biggest disadvantage is that it leads to a waste of space. Variable-length data cannot fully utilize the specific length of memory space allocated for each Chunk. As shown in the figure, 100 bytes of data are cached in a 128-byte Chunk, and the remaining 28 bytes are wasted.
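Class selection, the free list, and the wasted space can be put together in one small sketch. Free chunks are represented only by a per-class counter, and the allocator raises an error when a class is exhausted instead of evicting old records; both are simplifications, and SlabAllocator is a hypothetical name.

```python
import bisect


class SlabAllocator:
    """Pick the smallest chunk that fits an item and track free chunks."""

    def __init__(self, chunk_sizes, chunks_per_class: int):
        self.chunk_sizes = sorted(chunk_sizes)
        # One counter per slab class stands in for its free-chunk list.
        self.free = {size: chunks_per_class for size in self.chunk_sizes}

    def pick_class(self, item_size: int) -> int:
        index = bisect.bisect_left(self.chunk_sizes, item_size)
        if index == len(self.chunk_sizes):
            raise ValueError("item is larger than the largest chunk")
        return self.chunk_sizes[index]

    def store(self, item_size: int):
        """Take a free chunk and return (chunk_size, wasted_bytes)."""
        chunk = self.pick_class(item_size)
        if self.free[chunk] == 0:
            raise MemoryError("no free chunk in this slab class")
        self.free[chunk] -= 1
        return chunk, chunk - item_size

    def release(self, chunk_size: int) -> None:
        # An expired or discarded record returns its chunk to the free list.
        self.free[chunk_size] += 1
```

For chunk sizes of 64, 128 and 256 bytes, storing a 100-byte item uses a 128-byte chunk and leaves 28 bytes unused, matching the example in the text.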
The way Redis implements memory management mainly involves the two files zmalloc.h and zmalloc.c in the source code. To make memory management easier, Redis stores the size of each memory block in the block's header after allocating it. real_ptr points to the memory block returned when Redis calls malloc. Redis writes the block size into the header, which occupies a known amount of memory (the length of the size_t type), and then returns ret_ptr. When the memory needs to be released, ret_ptr is passed to the memory manager, which can easily compute real_ptr from it and pass real_ptr to free.
Redis records all memory allocations in an array of length ZMALLOC_MAX_ALLOC_STAT. Each element holds the number of memory blocks currently allocated by the program whose size equals the element's index. In the source code this array is zmalloc_allocations; for example, zmalloc_allocations[16] is the number of allocated memory blocks that are 16 bytes long. A static variable used_memory in zmalloc.c records the total size of currently allocated memory. In general, therefore, Redis uses a wrapped malloc/free, which is much simpler than Memcached's memory management.
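The size-header trick can be simulated without real pointers. In this sketch a bytearray plays the role of the process heap, integer offsets play the role of real_ptr and ret_ptr, and a simple bump pointer stands in for the underlying malloc, so freed space is not actually reused. Counting the header in used_memory is a choice made for this sketch.

```python
import struct

HEADER = struct.calcsize("N")  # bytes needed to store a native size_t


class Heap:
    """Toy heap that prefixes every block with its size."""

    def __init__(self, capacity: int = 1024):
        self.memory = bytearray(capacity)
        self.top = 0            # bump pointer standing in for malloc
        self.used_memory = 0    # total bytes currently allocated

    def zmalloc(self, size: int) -> int:
        real_ptr = self.top
        if real_ptr + HEADER + size > len(self.memory):
            raise MemoryError("toy heap exhausted")
        # Store the block size in the header at real_ptr.
        struct.pack_into("N", self.memory, real_ptr, size)
        self.top += HEADER + size
        self.used_memory += HEADER + size
        return real_ptr + HEADER    # ret_ptr: first byte after the header

    def zfree(self, ret_ptr: int) -> int:
        # Step back over the header to recover real_ptr and the block size.
        real_ptr = ret_ptr - HEADER
        (size,) = struct.unpack_from("N", self.memory, real_ptr)
        self.used_memory -= HEADER + size
        return real_ptr
```

Because the size travels with the block, zfree needs only the pointer it is given in order to keep the used_memory counter accurate.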
Although Redis is a memory-based storage system, it supports persisting in-memory data and provides two main persistence strategies: RDB snapshots and AOF logs. Memcached does not support data persistence.
1) RDB snapshot
RDB snapshots are one of Redis's persistence mechanisms; they let users store a snapshot of the current data as a data file. To snapshot a database that is continuously being written to, Redis relies on the copy-on-write behavior of fork: when generating a snapshot, it forks a child process, which loops through all the data and writes it to the RDB file. The timing of RDB snapshots is configured through Redis's save setting. For example, a snapshot can be generated every 10 minutes, or after 1,000 writes, or several rules can be combined. These rules are defined in the Redis configuration file, and they can also be changed while Redis is running with the CONFIG SET command, without restarting Redis.
The Redis RDB file is never left damaged, because the writing is performed in a new process. When a new RDB file is generated, the child process first writes the data to a temporary file and then renames the temporary file to the RDB file through the atomic rename system call, so that the RDB file is always usable no matter when a failure occurs. RDB files also play an important role in the internal implementation of Redis master-slave synchronization. RDB has a shortcoming: once something goes wrong with the database, the data saved in the RDB file is not fully up to date, and everything written between the last RDB file generation and the Redis shutdown is lost. In some businesses, this is tolerable.
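The write-to-temporary-file-then-rename step can be sketched as follows. The fork is omitted and JSON stands in for the RDB format; save_snapshot and load_snapshot are hypothetical names. os.replace performs the atomic rename.

```python
import json
import os
import tempfile


def save_snapshot(data: dict, path: str) -> None:
    """Write `data` to `path` so that readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the target.
    fd, temp_path = tempfile.mkstemp(dir=directory, prefix="temp-", suffix=".rdb")
    try:
        with os.fdopen(fd, "w") as temp_file:
            json.dump(data, temp_file)   # JSON stands in for the RDB format
            temp_file.flush()
            os.fsync(temp_file.fileno())
        # Atomic rename: the old snapshot is replaced in a single step.
        os.replace(temp_path, path)
    except BaseException:
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise


def load_snapshot(path: str) -> dict:
    with open(path) as snapshot_file:
        return json.load(snapshot_file)
```

If the process dies while the temporary file is being written, the previous snapshot at the target path is untouched and remains loadable.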
2) AOF log
The full name of AOF is "Append Only File"; it is a log file that is continuously appended to. Unlike the binlog of general databases, AOF files are readable plain text whose contents are standard Redis commands, one after another. Only commands that modify data are appended to the AOF file. Since every modifying command generates a log entry, the AOF file keeps growing, so Redis provides another function called AOF rewrite, which regenerates the AOF file. In the new AOF file there is only one operation per record, unlike the old file, which may contain multiple operations on the same value. The rewrite works in a similar way to RDB: Redis forks a process that traverses the data directly and writes it to a new temporary AOF file. While the new file is being written, all write operations are still logged to the original AOF file and are also recorded in a memory buffer. When the traversal is complete, all buffered log entries are written to the temporary file in one batch. Finally, the atomic rename call replaces the old AOF file with the new one.
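The effect of an AOF rewrite can be shown on a toy log. The sketch supports only three commands (SET, INCR, DEL) on string values and represents each log entry as a tuple; a real rewrite walks the in-memory data rather than replaying the old file. replay and rewrite are hypothetical names.

```python
def replay(log: list) -> dict:
    """Rebuild the dataset by applying every logged command in order."""
    data = {}
    for command, key, *args in log:
        if command == "SET":
            data[key] = args[0]
        elif command == "INCR":
            data[key] = str(int(data.get(key, "0")) + 1)
        elif command == "DEL":
            data.pop(key, None)
        else:
            raise ValueError(f"unsupported command: {command}")
    return data


def rewrite(log: list) -> list:
    """Produce a compact log with exactly one SET per surviving key."""
    return [("SET", key, value) for key, value in replay(log).items()]
```

For the log [("SET", "n", "1"), ("INCR", "n"), ("INCR", "n"), ("SET", "x", "a"), ("DEL", "x")], the rewritten log is [("SET", "n", "3")], and replaying either log yields the same dataset.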
Writing the AOF is a file write whose purpose is to get the operation log onto disk, so it goes through the usual file write path: data passed to write first lands in the operating system's buffer and reaches the disk only after fsync. After Redis calls write on the AOF, the appendfsync option controls when fsync is called to flush the data to disk. The three appendfsync settings below are listed in order of increasing durability.
appendfsync no: When appendfsync is set to no, Redis does not actively call fsync to synchronize the AOF log content to disk, so flushing depends entirely on the scheduling of the operating system. On most Linux systems, buffered data is written to disk about every 30 seconds.
appendfsync everysec: When appendfsync is set to everysec, Redis calls fsync once per second by default to write the buffered data to disk. If an fsync call takes longer than 1 second, however, Redis delays the next fsync and waits another second, so that fsync is performed after two seconds. This second fsync is carried out no matter how long it takes, and the current write operation is blocked while it is in progress, because the file descriptor is blocked during fsync. Under normal circumstances, therefore, Redis performs an fsync every second, and in the worst case every two seconds. Most database systems call this operation group commit: the data of multiple write operations is combined and the log is written to disk in one go.
appendfsync always: When appendfsync is set to always, fsync is called once for every write operation. The data is then at its safest, but since fsync is executed every time, performance suffers.
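The three appendfsync policies can be compared with a small append-only log class. This is a simplified sketch: the everysec check runs inline when a command is appended rather than on a timer or in a background thread, and the delayed-fsync behavior described above is not modeled. AppendOnlyLog and its fsync_calls counter are hypothetical.

```python
import os
import time


class AppendOnlyLog:
    """Append-only file whose fsync timing follows an appendfsync policy."""

    def __init__(self, path: str, appendfsync: str = "everysec"):
        if appendfsync not in ("no", "everysec", "always"):
            raise ValueError("appendfsync must be no, everysec or always")
        self.policy = appendfsync
        self.file = open(path, "ab")
        self.last_sync = time.monotonic()
        self.fsync_calls = 0   # counter kept only to observe the policy

    def append(self, command: bytes) -> None:
        self.file.write(command + b"\n")
        self.file.flush()   # hand the data to the operating system
        if self.policy == "always":
            self._sync()
        elif self.policy == "everysec" and time.monotonic() - self.last_sync >= 1.0:
            self._sync()
        # Policy "no": leave flushing entirely to the operating system.

    def _sync(self) -> None:
        os.fsync(self.file.fileno())   # force the data onto the disk
        self.last_sync = time.monotonic()
        self.fsync_calls += 1

    def close(self) -> None:
        self.file.close()
```

With always, every append pays for one fsync; with everysec, many appends share a single fsync, which is the group commit effect mentioned above.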
For general business needs, RDB is the recommended persistence method, because its overhead is much lower than that of the AOF log. For applications that cannot tolerate data loss, the AOF log is recommended.
Memcached is a purely in-memory data caching system. Although Redis supports data persistence, keeping everything in memory is still the essence of its high performance. For a memory-based storage system, the machine's physical memory determines the maximum amount of data the system can hold. When the amount of data to be processed exceeds the physical memory of a single machine, a distributed cluster has to be built to expand storage capacity.
Memcached itself does not support distribution, so Memcached's distributed storage can only be implemented on the client through distributed algorithms such as consistent hashing. The figure below shows the distributed storage implementation architecture of Memcached. Before the client sends data to the Memcached cluster, the target node of the data will first be calculated through the built-in distributed algorithm, and then the data will be sent directly to the node for storage. When the client queries data, it must first calculate the node where the data to be queried is located, and then send a query request to the node to obtain the data.
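Client-side node selection with consistent hashing can be sketched as a hash ring. The use of SHA-256 as the hash function and of 100 virtual points per node are assumptions made for this sketch; actual Memcached clients differ in both. ConsistentHashRing is a hypothetical name.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Map keys to nodes so that adding a node moves only a few keys."""

    def __init__(self, nodes, replicas: int = 100):
        if not nodes:
            raise ValueError("at least one node is required")
        self._ring = []   # sorted list of (point_hash, node)
        for node in nodes:
            # Several virtual points per node even out the key distribution.
            for i in range(replicas):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._hashes = [point for point, _ in self._ring]

    @staticmethod
    def _hash(text: str) -> int:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first point at or after the key's hash,
        # wrapping around to the start of the ring if necessary.
        index = bisect.bisect_right(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[index][1]
```

When a fourth node is added to a three-node ring, the only keys that change owner are those that now map to the new node; keys never move between the existing nodes.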
Whereas Memcached can only implement distributed storage on the client, Redis prefers to build distributed storage on the server side, and recent versions of Redis support it. Redis Cluster is an advanced version of Redis that implements distribution and tolerates single points of failure. It has no central node and scales linearly. The figure below shows the distributed storage architecture of Redis Cluster, in which nodes communicate with each other through a binary protocol, while nodes and clients communicate through the ASCII protocol. In terms of data placement, Redis Cluster divides the entire key space into 16384 hash slots (early designs used 4096), and each node can store one or more hash slots, so the number of slots is also the upper bound on the number of nodes Redis Cluster can support. The distributed algorithm used by Redis Cluster is also very simple: crc16(key) % HASH_SLOTS_NUMBER.
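The slot formula crc16(key) % HASH_SLOTS_NUMBER can be evaluated with Python's standard library. binascii.crc_hqx with an initial value of 0 computes the CRC-CCITT (XMODEM) checksum, which is assumed here to be the crc16 variant meant by the formula. The slot count is a parameter; the default of 16384 is the value used by released Redis Cluster versions. Hash tags (the {...} syntax in keys) are not handled, and key_hash_slot is a hypothetical name.

```python
import binascii


def key_hash_slot(key: bytes, hash_slots: int = 16384) -> int:
    """Return the hash slot of a key: crc16(key) % hash_slots."""
    return binascii.crc_hqx(key, 0) % hash_slots
```

Every client and every node computes the same slot for a given key, so a request can be routed to the owning node without consulting a central directory.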
Redis Cluster introduces Master node and Slave node to ensure that data is still available in the event of a single point of failure. In Redis Cluster, each Master node has two corresponding Slave nodes for redundancy. In this way, in the entire cluster, the downtime of any two nodes will not cause data unavailability. Once the Master node goes offline, the cluster automatically selects a new Master node from the Slave nodes.