Problem Description
User numbers grew rapidly and traffic doubled within a short period. Thanks to good early capacity planning, the hardware resources could cope, but a serious problem surfaced in the software system:
40% of requests returned HTTP 500: Internal Server Error
Checking the logs showed that the errors occurred in the PHP <-> Redis connection handling
Debugging Process
The first time
We could not find the root cause at first, so we could only try remedies related to the error, such as:
Increasing the PHP connection count, and raising the timeout from 500 ms to 2.5 s
Disabling default_socket_timeout in the PHP configuration
Disabling SYN cookies on the host system
Checking the file descriptor limits on the Redis servers and web servers
Increasing the host system's mbuffer
Adjusting the TCP backlog size
...
We tried many approaches, but none of them worked
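For reference, the kinds of system-level knobs involved can be inspected and tuned roughly as below. This is a sketch for Linux hosts; the specific values are illustrative, not the ones actually used in this incident:

```shell
# Inspect current settings (read-only, safe to run)
ulimit -n                          # file descriptor limit for this shell
sysctl net.ipv4.tcp_syncookies     # whether SYN cookies are enabled
sysctl net.core.somaxconn          # cap on the TCP accept backlog

# Tuning (requires root; values are illustrative only)
# sysctl -w net.ipv4.tcp_syncookies=0
# sysctl -w net.core.somaxconn=1024
```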
The second time
We tried to reproduce the problem in the staging environment. Unfortunately, that failed too, most likely because the traffic there was not large enough to trigger it.
The third time
Could it be that the Redis connections were not being closed in the code?
Normally, PHP automatically closes resource connections when the script finishes, but older versions had memory leaks. To be safe, we modified the code to close the connections manually.
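A minimal sketch of that manual-close change, assuming the phpredis extension (the host, port, and key are illustrative):

```php
<?php
// Sketch: close the Redis connection explicitly instead of relying on
// PHP's end-of-script resource cleanup. Requires PHP 5.5+ for finally.
$redis = new Redis();
try {
    $redis->connect('1.2.3.4', 6380, 0.5); // 500 ms connect timeout
    $value = $redis->get('some:key');      // ... business logic ...
} finally {
    $redis->close();                       // release the connection explicitly
}
```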
The result: still no effect
The fourth time
Suspected culprit: the phpredis client library
We ran an A/B test: swapping in the predis library and deploying it to 20% of the users in the data center
Thanks to the good code structure, the replacement was completed quickly
The result was still negative, but there was an upside: it proved that phpredis was fine
The fifth time
We checked the Redis version: it was v2.6, while the latest version at the time was v2.8.9
We tried upgrading Redis, but the problem persisted after the upgrade
Never mind; stay optimistic. The effort was not wasted, since it brought the Redis version up to the latest release
The sixth time
After reading through a large amount of documentation, we found a good debugging feature in the official docs: the Redis Software Watchdog. After enabling it, we ran:
```
$ redis-cli --latency -p 6380 -h 1.2.3.4
min: 0, max: 463, avg: 2.03 (19443 samples)
```
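For completeness, the Software Watchdog itself is turned on at runtime with a period in milliseconds; Redis then logs a stack trace whenever a command exceeds that period. The official docs stress that it is a debugging aid, not something to leave enabled in production (the 500 ms value here is illustrative):

```shell
$ redis-cli -h 1.2.3.4 -p 6380 config set watchdog-period 500
```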
Then we checked the Redis log:
```
...
[20398] 22 May 09:20:55.351 * 10000 changes in 60 seconds. Saving...
[20398] 22 May 09:20:55.759 * Background saving started by pid 41941
[41941] 22 May 09:22:48.197 * DB saved on disk
[20398] 22 May 09:22:49.321 * Background saving terminated with success
[20398] 22 May 09:25:23.299 * 10000 changes in 60 seconds. Saving...
[20398] 22 May 09:25:23.644 * Background saving started by pid 42027
...
```
Found the problem:
Redis was saving data to disk every few minutes, and each fork of the background-save process took around 400 ms (visible from the timestamps of the first two log lines above).
At this point we had finally found the root cause: the Redis instance held a large amount of data, so forking the background process for each persistence operation was very time-consuming; and because keys in this business were modified frequently, persistence was triggered constantly, which frequently blocked Redis.
Solution: Use a separate slave for persistence
This slave does not serve real traffic; its only job is persistence, moving the persistence work off the original Redis instance and onto the slave.
The effect was dramatic and the problem was largely solved, though errors were still reported occasionally.
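A sketch of that split, assuming the master at 1.2.3.4:6380 and the dedicated persistence slave at 5.6.7.8:6380 (both addresses and the save threshold are illustrative):

```shell
# On the master: disable RDB snapshots so it never forks for persistence
redis-cli -h 1.2.3.4 -p 6380 config set save ""

# On the dedicated slave: replicate from the master and keep snapshotting there
redis-cli -h 5.6.7.8 -p 6380 slaveof 1.2.3.4 6380
redis-cli -h 5.6.7.8 -p 6380 config set save "60 10000"
```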
The seventh time
We looked for slow queries that could block Redis and found that the KEYS * command was being used somewhere
With ever more data in Redis, this command naturally caused serious blocking
It can be replaced with SCAN
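With phpredis, the blocking KEYS call can be replaced by an incremental SCAN loop along these lines (a sketch; the connection details and the user:* pattern are illustrative):

```php
<?php
// Sketch: replace $redis->keys('user:*') with an incremental SCAN iteration,
// which walks the keyspace in small batches instead of blocking the server.
$redis = new Redis();
$redis->connect('1.2.3.4', 6380);
$redis->setOption(Redis::OPT_SCAN, Redis::SCAN_RETRY); // retry on empty batches

$it = null; // cursor, passed by reference; null starts a new scan
while (($keys = $redis->scan($it, 'user:*', 100)) !== false) {
    foreach ($keys as $key) {
        // ... process $key ...
    }
}
```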
The eighth time
After the previous adjustments the problem was solved, and over the following months the system held up even as traffic kept growing
But a new problem became apparent:
The current approach created a Redis connection for each request, executed a few commands, and then disconnected. Under heavy request volume this wastes a great deal of work: more than half of the commands were spent on connection handling rather than business logic, slowing Redis down.
Solution: introduce a proxy. They chose Twitter's twemproxy, which only needs to be installed on each webserver machine; twemproxy maintains persistent connections to the Redis instances, greatly reducing connection overhead.
Twemproxy also has two other convenient features:
It supports memcached
It can block very time-consuming or dangerous commands, such as KEYS and FLUSHALL
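A minimal twemproxy (nutcracker) configuration sketch for this kind of setup; the pool name, addresses, and values are illustrative, not taken from the article:

```yaml
# nutcracker.yml -- one pool of Redis backends behind a local proxy
php_redis_pool:
  listen: 127.0.0.1:22121   # PHP on this webserver connects here
  redis: true               # speak the Redis protocol (memcached also supported)
  preconnect: true          # open persistent backend connections at startup
  hash: fnv1a_64
  distribution: ketama      # consistent hashing across servers
  servers:
    - 1.2.3.4:6380:1
    - 1.2.3.5:6380:1
```

Commands that cannot be safely forwarded to a sharded pool (such as KEYS or FLUSHALL) are rejected by twemproxy, which is what makes it useful as a guardrail as well as a connection pooler.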
The effect was, naturally, excellent; the earlier connection errors were gone for good
The ninth time
We continued optimizing through data sharding:
Splitting and isolating data from different contexts
Consistent-hash sharding of data within the same context
Effect:
Reduced requests and load on each machine
Improved cache reliability, without fear of single-node failures
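To illustrate the consistent-hash sharding idea, here is a ketama-style ring sketched in PHP. This is purely illustrative; in the setup described above, twemproxy performs this mapping, not application code, and the node addresses and virtual-node count are made up:

```php
<?php
// Sketch: map each key to a node on a hash ring so that adding or removing
// a node only remaps a small fraction of the keys.
function buildRing(array $nodes, int $vnodes = 100): array {
    $ring = [];
    foreach ($nodes as $node) {
        for ($i = 0; $i < $vnodes; $i++) {
            $ring[crc32($node . '#' . $i)] = $node; // virtual node points
        }
    }
    ksort($ring); // sort points clockwise around the ring
    return $ring;
}

function pickNode(array $ring, string $key): string {
    $hash = crc32($key);
    foreach ($ring as $point => $node) {
        if ($point >= $hash) {
            return $node; // first ring point at or after the key's hash
        }
    }
    return reset($ring); // wrap around to the first point on the ring
}

$ring = buildRing(['1.2.3.4:6380', '1.2.3.5:6380', '1.2.3.6:6380']);
$node = pickNode($ring, 'user:42'); // every caller maps user:42 to the same node
```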
That is all for this article; I hope it helps your learning.
The above is the detailed content of How to troubleshoot HTTP 500: Internal Server Error with php+redis in actual projects. For more information, please follow other related articles on the PHP Chinese website!