This article shares some common interview questions about the Redis cache. It may be a useful reference for anyone preparing for an interview.
Difference:
1. memcached can cache images and videos, while redis supports many more data structures beyond simple k/v;
2. redis can use virtual memory, redis supports persistence and AOF disaster recovery, and redis supports data backup through master-slave replication;
3. redis can be used as a message queue.
Reason: memcached's multi-threaded model introduces cache-consistency and locking problems, and locking brings performance loss.
Master-slave replication implementation: the master node takes a snapshot of the data in its own memory and sends the snapshot to the slave node, which restores the data into its memory. After that, whenever new data is written, the master node forwards the command to the slave node in a binary-log format similar to MySQL's, and the slave replays each command it receives.
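The snapshot-plus-replay model described above can be sketched in a few lines. This is a toy in-memory simulation: real Redis ships an RDB file and streams commands over a replication link, so `full_sync` and `propagate` are illustrative names, not Redis APIs.

```python
# Toy model of master-slave replication: a full snapshot, then command replay.
master_data = {"a": 1, "b": 2}
replica_data = {}

def full_sync():
    # Step 1: the master snapshots its dataset and ships it to the replica.
    replica_data.update(master_data)

def propagate(cmd, key, value=None):
    # Step 2: after the snapshot, each write is forwarded to the replica
    # as a command (similar to MySQL binlog statements) and replayed there.
    if cmd == "set":
        master_data[key] = value
        replica_data[key] = value
    elif cmd == "del":
        master_data.pop(key, None)
        replica_data.pop(key, None)

full_sync()
propagate("set", "c", 3)
```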
Sharding method:
Client-side sharding
Proxy-based sharding
Twemproxy
codis
Query-routing sharding
Redis Cluster itself provides the ability to automatically distribute data across the different nodes of the cluster; which node stores a given subset of the overall data set is transparent to the user.
Redis Cluster sharding principle: the cluster contains 16384 slots (virtual slots), numbered 0-16383. Each master node is responsible for a portion of the slots. When a key maps to a slot that a given master is responsible for, that master serves the key. Which master owns which slots can be specified by the user or generated automatically during initialization; only masters have slot ownership. Each master node maintains a 16384/8-byte bit sequence and uses the bits to mark whether it owns a given slot. For example, for the slot numbered 1, the master only needs to check whether the second bit of the sequence (index starting from 0) is 1. This structure makes it easy to add or remove nodes: to add a new node D, some slots are simply moved from nodes A, B, and C to D.
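The key-to-slot mapping can be illustrated with the CRC16 variant Redis Cluster uses (XMODEM: polynomial 0x1021, initial value 0). This is a sketch: it omits the `{hash tag}` rule real Redis applies before hashing.

```python
def crc16_xmodem(data: bytes) -> int:
    # CRC16/XMODEM: poly 0x1021, init 0, no reflection - the variant Redis Cluster uses.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: bytes) -> int:
    # Which of the 16384 virtual slots a key maps to (hash-tag handling omitted).
    return crc16_xmodem(key) % 16384
```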
redis:
Thread A executes setnx(key, t1), where t1 is the timestamp at which the lock times out; if it returns true, the lock is acquired.
Thread B uses get to obtain t1 and compares it with the current timestamp to determine whether the lock has timed out. If it has not, acquisition fails; if it has timed out, go to step 3;
Calculate a new timeout t2 and call getset(key, t2), which returns t3 (this value may have been modified by another thread in the meantime). If t1 == t3, the lock is acquired; if t1 != t3, the lock has been grabbed by another thread.
After acquiring the lock, process the business logic, then check whether the lock has timed out: if not, delete the lock; if it has, do nothing (to avoid deleting a lock now held by another thread).
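The setnx/getset flow above can be sketched as follows. This is a minimal sketch with an in-memory stand-in for the Redis client: `FakeRedis` and `acquire_lock` are illustrative names, not a real client API.

```python
import time

class FakeRedis:
    """In-memory stand-in for a Redis client, just to make the sketch runnable."""
    def __init__(self):
        self.store = {}
    def setnx(self, key, value):
        if key in self.store:
            return False
        self.store[key] = value
        return True
    def get(self, key):
        return self.store.get(key)
    def getset(self, key, value):
        old = self.store.get(key)
        self.store[key] = value
        return old

def acquire_lock(r, key, ttl=10.0):
    now = time.time()
    # Step 1: setnx with the timeout timestamp; success means we hold the lock.
    if r.setnx(key, now + ttl):
        return True
    # Step 2: read t1 and check whether the current holder has timed out.
    t1 = r.get(key)
    if t1 is None or now < float(t1):
        return False
    # Step 3: getset a new timeout; only the thread that sees the old value wins.
    t3 = r.getset(key, now + ttl)
    return t3 == t1
```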
zk:
When a client wants to lock a method, it creates a unique ephemeral sequential node node1 under the znode directory corresponding to that method on zk;
The client then fetches all child nodes created under that path. If it finds that the sequence number of its own node1 is the smallest, the client has acquired the lock.
If node1 is not the smallest, the client watches the largest node whose sequence number is smaller than its own, and waits.
After acquiring the lock and finishing the logic, delete the node1 you created. Difference: zk performs worse and has higher overhead, but the implementation is simpler.
RDB (Redis DataBase: synchronizes snapshots of the redis data to disk or other media at different points in time): memory-to-disk snapshots, updated periodically. Disadvantages: time-consuming and performance-costly (fork + I/O), and data since the last snapshot is easily lost.
AOF (Append Only File: records every write command executed by redis; on the next restart, redis simply replays the commands): a write log. Disadvantages: large file size, slow recovery.
bgsave does full-image persistence and AOF does incremental persistence. Because bgsave takes a long time and is not real-time enough, a lot of data would be lost on shutdown, so AOF is needed as a complement. When a redis instance restarts, the AOF log is used first to restore the memory state; only if there is no AOF log is the RDB file used. Redis periodically rewrites the AOF to compress the log size. Since Redis 4.0 there is a hybrid persistence mode that combines the full bgsave image with the AOF increment, which gives both fast recovery and data safety. The principle behind bgsave is fork and COW (copy-on-write): redis performs bgsave by forking a child process; after the fork, parent and child share the data pages, the parent continues to serve reads and writes, and the pages the parent dirties are copied, so the parent's data gradually diverges from the child's snapshot.
Expiration strategy:
Timed expiration (one timer per key); lazy expiration: a key is checked for expiry only when it is accessed, and cleared if it has expired; periodic expiration: a compromise between the first two.
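Lazy expiration can be sketched in a few lines. This is an in-memory model; `set_with_ttl` and `get` are illustrative names, not Redis commands.

```python
import time

store = {}  # key -> (value, expire_at or None)

def set_with_ttl(key, value, ttl=None):
    store[key] = (value, time.time() + ttl if ttl else None)

def get(key):
    """Lazy expiration: a key is only checked (and purged) on access."""
    item = store.get(key)
    if item is None:
        return None
    value, expire_at = item
    if expire_at is not None and time.time() >= expire_at:
        del store[key]   # expired: purge on access
        return None
    return value
```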
LRU: new LinkedHashMap
LRU algorithm implementation:
Implemented through a two-way linked list, new data is inserted into the head of the linked list;
Whenever the cache hits (that is, the cached data is accessed), the data is moved to the head of the linked list;
When the linked list is full, the data at the end of the linked list is discarded.
LinkedHashMap: The combination of HashMap and doubly linked list is LinkedHashMap. HashMap is unordered, and LinkedHashMap ensures the iteration order by maintaining an additional doubly linked list. The iteration order can be insertion order (default) or access order.
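The same idea is available out of the box in Python via `collections.OrderedDict`, which plays the role the doubly linked list plays in LinkedHashMap. This is a sketch of the algorithm above, not Redis's actual approximate-LRU implementation.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)   # cache hit: move to the "head"
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # full: discard the least recently used
```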
**Cache penetration:** refers to querying data that definitely does not exist. Since the storage layer cannot find it, it is never written to the cache, so every request for the non-existent data goes to the DB, which may bring the DB down.
Solution:
If the query returns an empty result, cache the empty result anyway, but with a shorter expiration time;
Bloom filter: Hash all possible data into a bitmap that is large enough. Data that must not exist will be intercepted by this bitmap, thus avoiding DB queries.
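A minimal Bloom filter sketch. The bit-array size, the use of md5, and the class name are arbitrary choices for illustration; production systems usually use a library or Redis's own bitmap/module support.

```python
import hashlib

class BloomFilter:
    def __init__(self, size=1 << 20, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = bytearray(size // 8)   # the "large enough" bitmap

    def _positions(self, key):
        # Derive several bit positions per key by salting the hash input.
        for i in range(self.hashes):
            h = hashlib.md5(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        # False means definitely absent; True means possibly present.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))
```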
**Cache breakdown:** for a key with an expiration time set, when the cache expires at some point in time there happen to be a large number of concurrent requests for this key. These requests find the cache expired, all go to the back-end DB to load the data and reset the cache, and the burst of concurrent requests can instantly overwhelm the DB.
Solution:
Use a mutex lock: when the cache misses, do not load from the DB immediately. First use something like Redis's setnx to set a mutex; if the operation succeeds, load from the DB and repopulate the cache; otherwise, retry the cache get.
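The mutex flow can be sketched as follows. This is an in-memory simulation: `setnx`, `load_db`, and the key names stand in for real Redis calls and a real DB read.

```python
import time

cache = {}
locks = {}

def setnx(key, value):
    # Stand-in for Redis SETNX: set only if absent, return whether we won.
    if key in locks:
        return False
    locks[key] = value
    return True

def load_db(key):
    return f"value-for-{key}"   # hypothetical DB read

def get_with_mutex(key, retries=50):
    for _ in range(retries):
        if key in cache:
            return cache[key]
        if setnx("mutex:" + key, 1):      # only one caller rebuilds the cache
            try:
                cache[key] = load_db(key)
                return cache[key]
            finally:
                locks.pop("mutex:" + key, None)   # release the mutex
        time.sleep(0.01)                  # losers back off and retry the cache
    return load_db(key)                   # fallback after exhausting retries
```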
Never expire: the key does not physically expire, but expires logically (a background asynchronous thread refreshes it). Cache avalanche: the same expiration time is used when populating the cache, so a large batch of keys expires at the same moment; all requests are forwarded to the DB, and the instantaneous pressure causes an avalanche. Difference from cache breakdown: an avalanche involves many keys, breakdown is a single key's cache.
Solution:
Spread out the cache expiration times. For example, add a random value to the base expiration time, such as 1-5 minutes at random, so that expiry times rarely coincide and a collective failure event is unlikely.
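Adding jitter to the TTL is a one-liner (`BASE_TTL` and the 1-5 minute range are example values, not prescriptions):

```python
import random

BASE_TTL = 3600  # base expiration: one hour

def jittered_ttl():
    # Add 1-5 minutes of random jitter so keys do not all expire together.
    return BASE_TTL + random.randint(60, 300)
```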
Situations when choosing redis:
Complex data structures: when the value is a hash, list, set, sorted set, etc., choose redis, because memcache cannot satisfy these data structures. The most typical usage scenarios are user order lists, user messages, post comments, and so on.
Need to persist data — but be careful not to use redis as the database itself. Persistence lets a restarted redis quickly reload hot data into memory, avoiding an instantaneous load spike on the database, with no cache warm-up phase. For read-only scenarios with low data-consistency requirements, redis can serve as persistent storage.
High availability. Redis supports clustering and can achieve master replication and read-write separation; to get high availability with memcache, secondary development is required.
The stored values are relatively large: memcache's maximum stored value is 1M.
Scenarios for choosing memcache:
Pure KV, for businesses with very large amounts of data, memcache is more suitable for the following reasons:
Memory allocation: memcache uses a pre-allocated memory pool, which saves allocation time; redis allocates space on demand, which may lead to fragmentation.
Virtual memory: memcache stores all data in physical memory, while redis has its own VM mechanism and can theoretically store more data than physical memory; when the limit is exceeded, swap is triggered and cold data is flushed to disk. From this point of view, with large data volumes memcache is faster.
Network model: memcache uses a non-blocking I/O multiplexing model, and redis also uses a non-blocking I/O multiplexing model; but redis additionally provides sorting, aggregation, and other complex CPU computations beyond KV storage, which block the whole I/O scheduler. From this point of view, because redis provides more functionality, memcache is faster.
Threading model, memcache uses multi-threading, the main thread listens, and the worker sub-thread accepts requests and performs reading and writing. There may be lock conflicts in this process. Although the single thread used by redis has no lock conflicts, it is difficult to use the characteristics of multi-core to improve throughput.
Assume a read-write-separated master-slave database with a cache in front.
If thread A first deletes the cached data and then writes new data to the master, and master-slave synchronization has not yet completed, thread B misses the cache, reads the old data from the slave, and writes it back into the cache — the cache now holds stale data.
The root cause of the inconsistency above is that the master and slave databases are out of sync; adding a cache lengthens the window of master-slave inconsistency.
Handling idea: when the database is updated, update the cache at the same time — i.e., after updating the database, delete the key from the cache to evict any stale data written during that window.
Scenario description: with a read-write-separated master-slave database, if there is a lag in master-slave synchronization, the master and slave data will be inconsistent.
Ignore the inconsistency. For businesses with low data-consistency requirements, real-time consistency may not be necessary.
Force reads from the master: use a highly available master, do all database reads and writes on the master, and add a cache to improve read performance.
Selectively read from the master: add a cache that records which data must be read from the master, using which-library, which-table, which-primary-key as the cache key, and set the cache expiry to the master-slave synchronization lag. If the key is present in this cache, read the master directly; if not, read from the corresponding slave.
master is best not to do persistence work, such as RDB memory snapshot and AOF log file
If the data is important, a slave enables AOF backup, and the policy is set to synchronize once per second
For the speed of master-slave replication and the stability of the connection, it is best for the master and slave to be in a local area network
Try to avoid adding slave libraries to the stressed master library
Do not use a mesh structure for master-slave replication; prefer a linear structure: Master <- Slave1 <- Slave2 ...
volatile-lru: evict the least recently used key from the set of keys with an expiration time set
volatile-ttl: evict the key closest to expiry from the set of keys with an expiration time set
volatile-random: evict a random key from the set of keys with an expiration time set
allkeys-lru: evict the least recently used key from the entire key set
allkeys-random: evict a random key from the entire key set
no-eviction: never evict data
String, dictionary (Hash), List, Set, and ordered set (Sorted Set). If you are an intermediate or advanced Redis user, you should also add the following data structures: HyperLogLog, Geo, and Pub/Sub.
Use the keys command to scan out the key list of the specified mode.
The other party then asked: If this redis is providing services to online businesses, what are the problems with using the keys command?
At this point you should mention one of redis's key characteristics: it is single-threaded. The keys command blocks the thread for a while, and the online service stalls until the command finishes. Instead, use the scan command: scan can extract the key list for a given pattern without blocking, but there is some probability of duplicates — just deduplicate on the client. The overall time is longer than running keys directly, but the service stays responsive.
Use the list type to hold messages: rpush produces messages and lpop consumes them. When lpop finds no message, sleep for a while and check again; if you don't want to sleep, use blpop, which blocks until a message arrives. Redis can implement one producer with multiple consumers via the pub/sub topic-subscription model, with the drawback that messages produced while a consumer is offline are lost.
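The rpush/lpop pattern, modeled with an in-memory deque. In a real deployment these would be the Redis RPUSH/LPOP/BLPOP commands on a shared list key.

```python
from collections import deque

queue = deque()   # stand-in for a Redis list

def rpush(msg):
    # Producer side: append to the tail (Redis RPUSH).
    queue.append(msg)

def lpop():
    # Consumer side: pop from the head (Redis LPOP); None when empty.
    # With a real client, BLPOP would block here instead of returning None.
    return queue.popleft() if queue else None
```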
For a delayed queue, use a sorted set: the timestamp as the score and the message content as the member. Call zadd to produce messages, and consumers use zrangebyscore to poll for data whose timestamp is n seconds in the past.
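The sorted-set delayed queue can be modeled with a heap keyed by delivery timestamp. `zadd` and `poll_due` mirror ZADD and a ZRANGEBYSCORE-plus-remove loop, but are illustrative names, not real client calls.

```python
import heapq
import time

delayed = []  # stand-in for a Redis sorted set (score = deliver-at timestamp)

def zadd(msg, deliver_at):
    # Produce a message scheduled for delivery at `deliver_at`.
    heapq.heappush(delayed, (deliver_at, msg))

def poll_due(now=None):
    """Pop all messages whose score <= now (mimics ZRANGEBYSCORE + removal)."""
    now = time.time() if now is None else now
    due = []
    while delayed and delayed[0][0] <= now:
        due.append(heapq.heappop(delayed)[1])
    return due
```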
Redis is essentially a Key-Value type in-memory database, much like memcached. The entire database is loaded into the memory for operation, and the database data is flushed to the hard disk for storage through asynchronous operations on a regular basis.
Because it operates purely in memory, Redis has excellent performance and can handle more than 100,000 read/write operations per second; it is among the fastest key-value databases known.
The excellence of Redis is not just performance. Its biggest charm is that it supports a variety of data structures. In addition, the maximum size of a single value is 1GB, unlike memcached, which only stores up to 1MB per value; this lets Redis implement many useful features.
For example, use its List to make a FIFO doubly linked list to implement a lightweight high-performance message queue service, and use its Set to make a high-performance tag system, etc.
In addition, Redis can also set the expire time for the stored Key-Value, so it can also be used as an enhanced version of memcached. The main disadvantage of Redis is that the database capacity is limited by physical memory and cannot be used for high-performance reading and writing of massive data. Therefore, the scenarios suitable for Redis are mainly limited to high-performance operations and calculations of smaller amounts of data.
All values in memcached are simple strings, and redis, as its replacement, supports richer data types
Redis is much faster than memcached
redis can persist its data
String, List, Set, Sorted Set, Hash
Memory.
Remote Dictionary Server
noeviction: return an error when the memory limit is reached and the client attempts a command that would use more memory (most write commands, with DEL and a few other exceptions).
allkeys-lru: evict the least recently used (LRU) keys to make room for newly added data.
volatile-lru: evict the least recently used (LRU) keys, but only among keys with an expiration set, to make room for newly added data.
allkeys-random: evict random keys to make room for newly added data.
volatile-random: evict random keys, but only among keys with an expiration set, to make room for newly added data.
volatile-ttl: evict keys with an expiration set, preferring keys with a shorter time to live (TTL), to make room for newly added data.
Because current Linux versions are quite stable and widely used, there is no need to develop a Windows version, which would introduce compatibility and other problems.
512M
In order to achieve the fastest reading and writing speed, Redis reads all the data into the memory and writes the data to the disk asynchronously.
So redis has the characteristics of fast speed and data persistence. If the data is not placed in memory, disk I/O speed will seriously affect the performance of redis.
As memory becomes cheaper and cheaper today, redis will become more and more popular. If the maximum memory used is set, new values cannot be inserted after the number of existing data records reaches the memory limit.
codis.
Currently the most commonly used cluster solution; its effect is basically the same as twemproxy, but it supports migrating old node data to new hash nodes when the number of nodes changes.
The cluster built into redis cluster 3.0. Its distinctive feature is that its distribution algorithm is not consistent hashing but the concept of hash slots, and it natively supports setting up slave nodes. See the official documentation for details.
Implemented at the business-code layer: create several unrelated redis instances, hash the key in the code layer, and then operate on the data at the corresponding redis instance. This method places high demands on the hashing-layer code; considerations include fallback algorithms after node failure, automatic script recovery after data churn, instance monitoring, and so on.
In a cluster with three nodes A, B, and C, without a replication model, if node B fails, the cluster will be missing the slots in the 5501-11000 range and become unavailable.
When the size of the redis memory data set increases to a certain size, the data elimination strategy will be implemented.
Session Cache
One of the most common Redis scenarios is the session cache. The advantage of caching sessions with Redis over other stores (such as Memcached) is that Redis offers persistence. When maintaining a cache whose consistency is not strictly required, most users would still be unhappy if all of their shopping-cart information disappeared.
Fortunately, as Redis has improved over the years, it is easy to find documentation on how to properly use Redis for session caching; even the well-known commercial platform Magento provides a Redis plug-in.
Taking Magento as an example again, Magento provides a plug-in to use Redis as a full-page cache backend.
In addition, for WordPress users, Pantheon has a very good plug-in wp-redis, which can help you load the pages you have browsed as quickly as possible.
If you quickly search for "Redis queues" on Google, you will immediately find a large number of open source projects whose purpose is to build excellent back-end tools on Redis for all kinds of queueing needs. For example, Celery has a backend that uses Redis as a broker; you can view it from there.
Leaderboard/Counter
Redis does in-memory increment and decrement of numbers very well, and Sets and Sorted Sets make these operations very simple; Redis happens to provide exactly these two data structures. To get the top 10 users from a sorted set — call it "user_scores" — we just need to run:
ZRANGE user_scores 0 10
Of course, this assumes the ranking is sorted in ascending order of the users' scores. To return the users together with their scores, execute:
ZRANGE user_scores 0 10 WITHSCORES
AgoraGames is a good example: implemented in Ruby, its leaderboards use Redis to store the data, as can be seen there.
Redisson, Jedis, lettuce, etc., the official recommendation is to use Redisson.
Redisson is an advanced distributed coordination Redis client that can help users easily implement some Java objects (Bloomfilter, BitSet, Set, SetMultimap, ScoredSortedSet, SortedSet, Map, ConcurrentMap, List, ListMultimap, Queue, BlockingQueue, Deque, BlockingDeque, Semaphore, Lock, ReadWriteLock, AtomicLong, CountDownLatch, Publish/Subscribe, HyperLogLog).
Jedis is a client implemented by Redis in Java. Its API provides relatively comprehensive support for Redis commands;
Redisson implements distributed and scalable Java data structures. Compared with Jedis, its command coverage is simpler: it does not support string operations, nor Redis features such as sorting, transactions, pipelines, and partitions. The goal of Redisson is to separate users' concerns from Redis itself, so that users can focus on business logic.
Set password: config set requirepass 123456. Authenticate: auth 123456.
The Redis cluster does not use consistent hashing but introduces the concept of hash slots. The Redis cluster has 16384 hash slots; each key is mapped to a slot by taking its CRC16 checksum modulo 16384, and each node in the cluster is responsible for a portion of the hash slots.
To keep the cluster usable when some nodes fail or most nodes cannot communicate, the cluster uses a master-slave replication model, in which each master node can have N-1 replicas.
Redis does not guarantee strong consistency of data, which means that in practice, the cluster may lose write operations under certain conditions.
Asynchronous replication
16384.
Redis cluster currently cannot select database, and defaults to database 0.
ping
A request/response server can handle new requests even if old requests have not yet been responded to. This makes it possible to send multiple commands to the server without waiting for a reply, and finally read that reply in one step.
This is pipelining, a technique that has been in wide use for decades. For example, many implementations of the POP3 protocol support this feature, which greatly speeds up downloading new emails from the server.
A transaction is a single isolated operation: all commands in the transaction will be serialized and executed in order. During the execution of the transaction, it will not be interrupted by command requests sent by other clients. A transaction is an atomic operation: either all of the commands in the transaction are executed, or none of them are executed.
MULTI, EXEC, DISCARD, WATCH
EXPIRE and PERSIST commands.
Use hashes as much as possible. Small hash tables (i.e., hashes storing only a small number of fields) use very little memory, so you should abstract your data model into hashes where possible. For example, if your web system has a user object, do not set separate keys for the user's name, surname, email, and password; instead, store all the user's information in a single hash.
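The advice above, sketched with an in-memory stand-in for the hash keyspace. `hset`/`hget` mirror Redis's HSET/HGET, and the user fields are made-up examples.

```python
# Instead of one key per field:
#   SET user:1:name "Ada" / SET user:1:email "ada@example.com" / ...
# store the whole object in a single hash: HSET user:1 name "Ada" email ...
hashes = {}   # stand-in for the Redis keyspace of hashes

def hset(key, mapping):
    # Set several fields of one hash at once (Redis HSET).
    hashes.setdefault(key, {}).update(mapping)

def hget(key, field):
    # Read a single field of a hash (Redis HGET).
    return hashes.get(key, {}).get(field)

hset("user:1", {"name": "Ada", "email": "ada@example.com"})
```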
A client ran a new command and added new data.
Redis checks the memory usage; if it exceeds the maxmemory limit, keys are evicted according to the configured policy, and then the new command is executed, and so on.
So execution keeps crossing the memory-limit boundary: constantly reaching it, then constantly evicting back below it.
If the result of a command causes a large amount of memory to be used (such as saving the intersection of a large set to a new key), it will not take long for the memory limit to be exceeded by this memory usage.