Let's talk about why Redis slows down and how to troubleshoot it

This article covers common reasons why Redis slows down and how to troubleshoot each of them. I hope you find it helpful.

Cause 1: The instance memory has reached the upper limit

Troubleshooting ideas

If a memory limit (maxmemory) is set on your Redis instance, reaching that limit can also slow Redis down.

When Redis is used as a pure cache, we usually set a memory limit with maxmemory and choose an eviction policy. Once the instance's memory reaches maxmemory, you may find that every subsequent write of new data has higher latency.

Causes of slowdown

Once memory reaches maxmemory, Redis must evict some data before each write of new data, bringing the instance's memory back below maxmemory so that the new data can be written.

Evicting old data takes time, and how long depends on the eviction policy you have configured:

allkeys-lru: evict the least recently accessed key, regardless of whether the key has an expiration time
volatile-lru: evict the least recently accessed key, only among keys with an expiration time
allkeys-random: evict keys at random, regardless of whether they have an expiration time
volatile-random: evict keys at random, only among keys with an expiration time
volatile-ttl: evict the keys closest to expiring, only among keys with an expiration time
noeviction: evict nothing; once instance memory reaches maxmemory, writes of new data return an error
allkeys-lfu: evict the least frequently accessed key, regardless of whether the key has an expiration time (supported since version 4.0)
volatile-lfu: evict the least frequently accessed key, only among keys with an expiration time (supported since version 4.0)

Which policy to use depends on your business scenario. The most commonly used policies are allkeys-lru and volatile-lru. Each round, they randomly sample a batch of keys from the instance (the batch size is configurable through maxmemory-samples), evict the least recently accessed key in the batch, and keep the remaining keys in a temporary pool. The next round samples another batch, compares it with the keys already in the pool, and again evicts the least recently accessed key. This repeats until the instance's memory drops below maxmemory.
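
The sampling loop described above can be sketched in Python. This is a simplified model for illustration, not the code Redis actually runs: the function names, the dictionaries used to track access times and sizes, and the pool size of 16 are all assumptions made for the example.

```python
import random


def evict_one(last_access, pool, sample_size=5, pool_size=16, rng=random):
    """Evict one key using sampled (approximate) LRU and return it.

    last_access maps key -> last access time (smaller means older) and is
    modified in place. pool carries candidates over from earlier rounds.
    """
    # Forget pooled candidates that no longer exist.
    pool[:] = [key for key in pool if key in last_access]
    # Randomly sample a batch of keys and merge it into the pool.
    batch = rng.sample(list(last_access), min(sample_size, len(last_access)))
    for key in batch:
        if key not in pool:
            pool.append(key)
    # Keep only the oldest candidates, oldest first.
    pool.sort(key=lambda key: last_access[key])
    del pool[pool_size:]
    # Evict the least recently accessed candidate.
    victim = pool.pop(0)
    del last_access[victim]
    return victim


def evict_until_under(last_access, sizes, maxmemory, sample_size=5, rng=random):
    """Evict keys until the summed size is no longer above maxmemory."""
    pool = []
    evicted = []
    used = sum(sizes[key] for key in last_access)
    while used > maxmemory and last_access:
        victim = evict_one(last_access, pool, sample_size, rng=rng)
        used -= sizes[victim]
        evicted.append(victim)
    return evicted
```

Because only a sample is inspected each round, the evicted key is not guaranteed to be the globally oldest one. A larger sample gets closer to exact LRU at the cost of more work per eviction, which is the time cost this section is about.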

Note that eviction works the same way as the deletion of expired keys: it runs before the command itself is executed, so it adds latency to our Redis operations. The higher the write OPS, the more noticeable the latency becomes.

In addition, if your Redis instance stores bigkeys, evicting and deleting a bigkey to free memory also takes a long time.

See? The dangers of bigkeys are everywhere, which is why I reminded you earlier to avoid storing them.

Solution

Avoid storing bigkeys, to reduce the time spent releasing memory
Switch the eviction policy to random eviction, which is much faster than LRU (whether this is acceptable depends on your business)
Split the instance, spreading the eviction pressure across multiple instances
If you are on Redis 4.0 or above, enable the lazy-free mechanism so that the memory release for evicted keys happens in a background thread (configuration: lazyfree-lazy-eviction yes)

Cause 2: Huge pages are enabled

Troubleshooting ideas

As we know, an application requests memory from the operating system in units of memory pages, and the regular page size is 4KB.
Since version 2.6.38, the Linux kernel has supported huge pages, which allow applications to request memory from the operating system in 2MB units.
Each request hands the application a larger unit of memory, but it also means each request takes longer.

Causes of slowdown

When Redis performs a background RDB snapshot or AOF rewrite, it forks a child process to do the work. After the fork, the main process can still accept write requests, and those writes modify memory using copy-on-write.
In other words, when the main process needs to modify data, the existing memory is not modified in place. The affected memory is first copied, and the modification is applied to the new copy. This is "copy-on-write".
Put another way: whoever needs to write copies first, then modifies.
The advantage is that writes by the parent process do not affect the child's persistence work. The child only persists the data as it was at the moment of the fork and does not care about later changes, because all it needs is a memory snapshot to write to disk.
Note, however, that copying memory in the main process means requesting new memory. If huge pages are enabled, then even if a client modifies only 10B of data, the memory is requested in 2MB units. Each request takes longer, which increases the latency of every write request and hurts Redis performance.
Likewise, if the write touches a bigkey, the main process has to copy the bigkey's memory, so more memory is requested at once and it takes even longer. Bigkeys hurt performance here too.
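
The arithmetic behind the 10B example can be sketched as follows. It is a rough model, assuming the worst case in which none of the pages touched by the write has been copied since the fork; the function and constant names are made up for the illustration.

```python
SMALL_PAGE = 4 * 1024          # regular 4KB page
HUGE_PAGE = 2 * 1024 * 1024    # 2MB huge page


def pages_touched(offset, length, page_size):
    """Number of pages overlapped by a write of `length` bytes at `offset`."""
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return last - first + 1


def cow_bytes_copied(offset, length, page_size):
    """Bytes copied if every touched page still has to be copied."""
    return pages_touched(offset, length, page_size) * page_size


# A 10-byte write at the start of a page:
small = cow_bytes_copied(0, 10, SMALL_PAGE)   # 4096 bytes
huge = cow_bytes_copied(0, 10, HUGE_PAGE)     # 2097152 bytes
ratio = huge // small                         # 512
```

In this model the same 10-byte modification copies 512 times more memory with 2MB pages than with 4KB pages.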

Solution

Turn off huge pages.

First, check whether huge pages are enabled on the Redis machine:

$ cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never

If always is the selected option (the one in brackets), huge pages are currently enabled and we need to turn them off (writing to this file requires root privileges):

$ echo never > /sys/kernel/mm/transparent_hugepage/enabled
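
If you want to run the same check from a script, for example across many machines, a minimal Python sketch could look like this. The function names are made up for the example; the only thing taken from the text above is the file path and the bracketed-option format.

```python
import re

THP_ENABLED_PATH = "/sys/kernel/mm/transparent_hugepage/enabled"


def active_thp_mode(text):
    """Return the bracketed (currently selected) mode from the file content."""
    match = re.search(r"\[(\w+)\]", text)
    if match is None:
        raise ValueError("no selected mode found in %r" % text)
    return match.group(1)


def read_thp_mode(path=THP_ENABLED_PATH):
    """Read the file and return the selected mode, e.g. 'always' or 'never'."""
    with open(path) as handle:
        return active_thp_mode(handle.read())
```

A monitoring job could call read_thp_mode() on each Redis machine and flag any result other than never.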

The benefit of huge pages is that they reduce the number of memory requests that certain programs have to make.

But for a database like Redis, which is extremely sensitive to performance and latency, we want each memory request to take as little time as possible, so I do not recommend enabling this mechanism on a Redis machine.

Cause 3: Redis is using swap

Troubleshooting ideas

If Redis suddenly becomes very slow, with each operation taking hundreds of milliseconds or even seconds, check whether Redis is using swap. In that state, Redis is basically unable to provide high-performance service.

Causes of slowdown

What is swap, and why does using it degrade Redis performance?

If you know a little about operating systems, you know that to soften the impact of insufficient memory on applications, the operating system can move part of the data in memory to disk. The disk area that this data is moved to is swap.

The problem is that once data has been moved to disk, Redis has to read it back from disk the next time it accesses it, and disk access is orders of magnitude slower than memory access. For a database like Redis, which has very high performance requirements and is extremely sensitive to latency, this delay is unacceptable.

At this point, check the memory usage of the Redis machine to confirm whether swap is in use. You can check whether the Redis process is using swap as follows:

# First, find the process ID of Redis
$ ps aux | grep redis-server

# Then check swap usage, replacing $pid with that process ID
$ cat /proc/$pid/smaps | grep -E '^(Swap|Size)'

The output looks like this:

Size:               1256 kB
Swap:                  0 kB
Size:                  4 kB
Swap:                  0 kB
Size:                132 kB
Swap:                  0 kB
Size:              63488 kB
Swap:                  0 kB
Size:                132 kB
Swap:                  0 kB
Size:              65404 kB
Swap:                  0 kB
Size:            1921024 kB
Swap:                  0 kB
...

This output lists the memory usage of the Redis process.

Each Size line is the size of one memory region used by Redis, and the Swap line below it shows how much of that region has been swapped to disk. If the two values are equal, the data in that region has been swapped to disk entirely.

If only a small amount of data has been swapped, for example each Swap value is a small fraction of the corresponding Size, the impact is limited. If hundreds of megabytes or even gigabytes have been swapped to disk, be on alert: Redis performance will drop sharply.
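
Reading dozens of Size and Swap pairs by eye is tedious, so here is a small Python sketch that totals them. It assumes the text comes from /proc/$pid/smaps in the "Name: value kB" format shown above; the function names are hypothetical.

```python
def swap_summary(smaps_text):
    """Return (total Size in kB, total Swap in kB) across all memory regions."""
    totals = {"Size": 0, "Swap": 0}
    for line in smaps_text.splitlines():
        parts = line.split()
        # Expected shape: "Swap:    0 kB". Other fields are ignored.
        if len(parts) == 3 and parts[2] == "kB":
            name = parts[0].rstrip(":")
            if name in totals:
                totals[name] += int(parts[1])
    return totals["Size"], totals["Swap"]


def swapped_fraction(smaps_text):
    """Fraction of the mapped memory that has been swapped to disk."""
    size_kb, swap_kb = swap_summary(smaps_text)
    return swap_kb / size_kb if size_kb else 0.0
```

A fraction near zero matches the harmless case described above, while a large total Swap value is the signal to act.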

Solution

Add memory to the machine so that Redis has enough to use
Free up memory on the machine for Redis to use, then release the swap used by Redis so that it goes back to using memory

Releasing the swap used by Redis usually requires restarting the instance. To avoid the impact of a restart on the business, the usual procedure is to perform a master-slave switchover first, release the swap of the old master, restart the old master instance, wait for it to finish synchronizing data as a slave, and then switch master and slave back.

As you can see, once Redis is using swap, its performance basically cannot meet high-performance requirements (think of it as crippled), so you need to guard against this situation in advance.

To do that, monitor the memory and swap usage of the Redis machine, raise an alert when memory is running low or swap is in use, and deal with it promptly.

Cause 4: Network bandwidth overload

Troubleshooting ideas

Suppose you have avoided all the scenarios above and Redis has been running stably for a long time, but from a certain point in time its operations suddenly become slow and stay slow. What could be the reason?

In this case, check whether the network bandwidth of the Redis machine is overloaded, and whether a single instance is saturating the bandwidth of the whole machine.

Causes of slowdown

When network bandwidth is overloaded, the server experiences delayed packet sending and packet loss at the TCP and network layers.

Besides operating on memory, the high performance of Redis depends on network IO. If network IO becomes a bottleneck, Redis performance suffers badly as well.

Solution

Identify promptly which Redis instance is saturating the network bandwidth. If the traffic comes from normal business access, scale out or migrate the instance in time, so that its heavy traffic does not affect the other instances on the machine.
At the operations level, add monitoring for the various metrics of the Redis machine, including network traffic. Alert in advance when traffic reaches a threshold, then confirm and expand capacity promptly.
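
The threshold check mentioned above boils down to simple arithmetic on two readings of an interface byte counter. The sketch below assumes you already have such readings (on Linux they could come from /proc/net/dev, for instance); the function names and the 80% threshold are illustrative choices, and counter wraparound is not handled.

```python
def bandwidth_utilization(bytes_before, bytes_after, interval_seconds,
                          link_bits_per_second):
    """Fraction of link capacity used between two byte-counter readings."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    bits_sent = (bytes_after - bytes_before) * 8
    return bits_sent / interval_seconds / link_bits_per_second


def should_alert(utilization, threshold=0.8):
    """True once utilization reaches the alert threshold."""
    return utilization >= threshold
```

For example, 100,000,000 bytes sent in one second on a 1 Gbit/s link is a utilization of 0.8, which would trigger the alert at the default threshold used here.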

Cause 5: Other causes

1) Frequent short connections

Your business application should use long-lived connections to operate Redis and avoid frequent short-lived connections.

Frequent short-lived connections make Redis spend a lot of time establishing and releasing connections, and TCP's three-way handshake and four-way teardown add to access latency.
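
To get a feel for the difference, here is a deliberately simple cost model in Python. It assumes that each new TCP connection costs one network round trip for the handshake before the first command can be sent, that each command costs one round trip, and that teardown and pipelining can be ignored. The function names are made up for the example.

```python
def round_trips(num_commands, persistent):
    """Round trips needed to run num_commands under the simplified model."""
    if num_commands == 0:
        return 0
    # One handshake in total for a long-lived connection,
    # one handshake per command for short-lived connections.
    connections = 1 if persistent else num_commands
    return connections + num_commands


def total_latency_ms(num_commands, rtt_ms, persistent):
    """Total time spent waiting on the network, in milliseconds."""
    return round_trips(num_commands, persistent) * rtt_ms
```

Under these assumptions, 1000 commands over short-lived connections need 2000 round trips, while a single long-lived connection needs 1001, so short connections roughly double the network wait even before counting the work Redis does to set up and release each connection.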

2) Operation and maintenance monitoring

As I mentioned earlier, if you want to detect a Redis slowdown ahead of time, good monitoring is essential.

Monitoring is really the collection of Redis runtime metrics. The usual approach is for a monitoring program to collect the INFO output of Redis periodically, then display the data and raise alerts based on the status fields it contains.
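
As an illustration of what such a collector does with the data, the sketch below parses the text of an INFO reply and derives a memory pressure value from the used_memory and maxmemory fields. How the text is fetched is left out on purpose (it could come from redis-cli or from a client library over a long-lived connection), and the function names are hypothetical.

```python
def parse_info(text):
    """Parse the 'field:value' lines of an INFO reply into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Memory".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def memory_pressure(fields):
    """Return used_memory / maxmemory, or None when no limit is configured."""
    maxmemory = int(fields.get("maxmemory", 0))
    if maxmemory == 0:
        return None
    return int(fields["used_memory"]) / maxmemory
```

A value close to 1.0, especially together with a growing evicted_keys counter, points to the memory limit scenario from Cause 1.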

What I want to remind you of here is that you cannot be careless when writing monitoring scripts or using open source monitoring components.

When writing monitoring scripts that access Redis, collect status information over long-lived connections and avoid frequent short-lived ones. Also control how often you access Redis, so that monitoring does not affect business requests.

When using open source monitoring components, it is best to understand how they work and to configure them correctly, so that a bug or misconfiguration in the component does not issue a large number of Redis operations in a short time and hurt Redis performance.

This has happened to us: a DBA was using an open source component, and because of configuration and usage problems the monitoring program kept connecting to and disconnecting from Redis, which made Redis respond slowly.

3) Other programs compete for resources

Finally, I want to remind you that the Redis machine is best dedicated to Redis instances. Do not deploy other applications on it. Try to give Redis a relatively "quiet" environment, so that other programs do not take up CPU, memory, and disk resources and leave too little for Redis.
