Sometimes the system load on a Linux server running Nginx and PHP-CGI (php-fpm) web services suddenly spikes. Running top shows many php-cgi processes using close to 100% CPU. After some tracing, I found that this situation is closely related to PHP's file_get_contents() function.
On medium and large websites, API calls over HTTP are commonplace. PHP programmers like to fetch the content of a URL with the simple and convenient file_get_contents("http://example.com/"). However, if http://example.com/ responds slowly, file_get_contents("http://example.com/") just sits there waiting, because the call itself sets no timeout.
php.ini has a max_execution_time parameter that sets the maximum execution time of a PHP script. Under php-cgi (php-fpm), however, this parameter often fails to stop such scripts; the PHP manual notes that, except on Windows, time spent in stream operations is not counted toward max_execution_time. What actually caps a PHP script's execution time is the following parameter, request_terminate_timeout, in the php-fpm.conf configuration file:
The timeout (in seconds) for serving a single request after which the worker process will be terminated
Should be used when 'max_execution_time' ini option does not stop script execution for some reason
'0s' means 'off'
<value name="request_terminate_timeout">0s</value>
The default value is 0 seconds, which means a PHP script is allowed to run indefinitely. So when every php-cgi process is stuck in file_get_contents(), the Nginx + PHP web server can no longer handle new PHP requests, and Nginx returns "502 Bad Gateway" to users. Setting a maximum execution time through this parameter is necessary, but it only treats the symptoms, not the root cause: even with a limit in place, each worker stuck on a slow URL stays occupied until the limit expires, so under enough traffic the pool can still run out of free workers and 502 errors remain hard to avoid.
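To see why a nonzero timeout alone does not fix the problem, it helps to put numbers on it. The sketch below assumes a pool of identical workers in which every request hangs for the full php-fpm request timeout; the function name and the 150-worker, 60-second figures are hypothetical, chosen only for illustration.

```php
<?php
// Illustrative arithmetic only: the worker count and timeout used below are
// hypothetical values, not measurements from the server described above.

/**
 * Upper bound on requests per second a php-fpm pool can complete when every
 * worker is blocked for the full request timeout on each request.
 */
function worst_case_requests_per_second(int $workers, float $timeoutSeconds): float
{
    if ($workers < 0 || $timeoutSeconds <= 0) {
        throw new InvalidArgumentException('workers must be >= 0 and timeout > 0');
    }
    // Each worker can finish at most one request per timeout period.
    return $workers / $timeoutSeconds;
}

// Hypothetical pool: 150 workers, each stuck for 60 seconds per request.
printf("%.1f requests/second\n", worst_case_requests_per_second(150, 60.0)); // prints 2.5 requests/second
```

With those assumed numbers the whole pool completes at most 2.5 requests per second, so requests arriving faster than that pile up no matter how the timeout is tuned.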
For a complete fix, PHP programmers need to break the habit of calling file_get_contents("http://example.com/") directly and make a small change instead: add a timeout and perform HTTP GET requests as follows. If you find this tedious, you can wrap the code below in a function of your own.
<?php
$ctx = stream_context_create(array(
    'http' => array(
        'timeout' => 1 // set a timeout, in seconds
    )
));
$html = file_get_contents("http://example.com/", false, $ctx);
?>
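Following the suggestion to wrap this in a function, one possible shape is sketched below. It assumes PHP 7 or later for the scalar type declarations; the name http_get_with_timeout() and its 1-second default are hypothetical, not part of any library.

```php
<?php
/**
 * Hypothetical helper: perform an HTTP GET with a read timeout using a
 * stream context. Returns the response body as a string, or false if the
 * request fails or times out.
 */
function http_get_with_timeout(string $url, float $timeoutSeconds = 1.0)
{
    $ctx = stream_context_create(array(
        'http' => array(
            'method'  => 'GET',
            'timeout' => $timeoutSeconds, // read timeout in seconds; fractions allowed
        ),
    ));

    // Suppress the warning file_get_contents() emits on failure;
    // callers check for a false return value instead.
    return @file_get_contents($url, false, $ctx);
}
```

A caller would write `$html = http_get_with_timeout("http://example.com/", 1.0);` and treat `false` as a failed or timed-out request. Note that the http context `timeout` bounds each wait for data rather than the total transfer time, so a server that trickles bytes slowly can still hold the call for longer. PHP's `default_socket_timeout` ini setting is a script-wide alternative, but a per-call context keeps the limit visible at each call site.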
Of course, file_get_contents() is not the only thing that can drive php-cgi processes to 100% CPU. So how can you tell whether it is the culprit?
First, use the top command to find the php-cgi processes with high CPU usage:
top - 10:34:18 up 724 days, 21:01, 3 users, load average: 17.86, 11.16, 7.69
Tasks: 561 total, 15 running, 546 sleeping, 0 stopped, 0 zombie
Cpu(s): 5.9%us, 4.2%sy, 0.0%ni, 89.4%id, 0.2%wa, 0.0%hi, 0.2%si, 0.0%st
Mem: 8100996k total, 4320108k used, 3780888k free, 772572k buffers
Swap: 8193108k total, 50776k used, 8142332k free, 412088k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7 www 18 0 360m 22m 12m R 100.6 0.3 0:02.60 php-cgi
09 www 16 0 359m 28m 17m R 96.8 0.4 0:11.34 php-cgi
10745 www 18 0 360m 24m 14m R 94.8 0.3 0:39.51 php-cgi
10707 www 18 0 360m 25m 14m S 77.4 0.3 0:33.48 php-cgi
10782 www 20 0 360m 26m 15m R 75.5 0.3 0:10.93 php-cgi
10708 www 25 0 360m 22m 12m R 69.7 0.3 0:45.16 php-cgi
10683 www 25 0 362m 28m 15m R 54.2 0.4 0:32.65 php-cgi
10711 www 25 0 360m 25m 15m R 52.2 0.3 0:44.25 php-cgi
10688 www 25 0 359m 25m 15m R 38.7 0.3 0:10.44 php-cgi
10719 www 25 0 360m 26m 16m R 7.7 0.3 0:40.59 php-cgi
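Picking out the busy processes by eye works for one screen; when it has to be repeated, a small parser can help. The sketch below assumes batch-mode text (for example from `top -b -n 1`) in the column layout shown above, with PID first, %CPU in the ninth column and the command last; the function name and the 90% threshold are hypothetical choices.

```php
<?php
/**
 * Hypothetical helper: return the PIDs of php-cgi processes whose %CPU is at
 * or above $threshold, given text in this top column layout:
 * PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
 */
function busy_php_cgi_pids(string $topOutput, float $threshold = 90.0): array
{
    $pids = array();
    foreach (preg_split('/\R/', $topOutput) as $line) {
        $fields = preg_split('/\s+/', trim($line));
        // Process rows have 12 columns and start with a numeric PID;
        // header and summary lines are skipped.
        if (count($fields) < 12 || !preg_match('/^\d+$/', $fields[0])) {
            continue;
        }
        if ($fields[11] !== 'php-cgi') {
            continue;
        }
        $cpu = (float) $fields[8]; // %CPU column
        if ($cpu >= $threshold) {
            $pids[] = (int) $fields[0];
        }
    }
    return $pids;
}

$sample = <<<TXT
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
10745 www       18   0  360m  24m  14m R 94.8  0.3   0:39.51 php-cgi
10707 www       18   0  360m  25m  14m S 77.4  0.3   0:33.48 php-cgi
TXT;

echo implode(',', busy_php_cgi_pids($sample)), "\n"; // prints 10745
```

Any PID it returns is a candidate for the strace check that follows.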
Take the PID of one of the php-cgi processes running at 100% CPU and trace it with the following command:
strace -p 10747
If the output looks like this:
select(7, [6], [6], [], {15, 0}) = 1 (out [6], left {15, 0})
poll([{fd=6, events=POLLIN}], 1, 0) = 0 (Timeout)
select(7, [6], [6], [], {15, 0}) = 1 (out [6], left {15, 0})
poll([{fd=6, events=POLLIN}], 1, 0) = 0 (Timeout)
select(7, [6], [6], [], {15, 0}) = 1 (out [6], left {15, 0})
poll([{fd=6, events=POLLIN}], 1, 0) = 0 (Timeout)
select(7, [6], [6], [], {15, 0}) = 1 (out [6], left {15, 0})
poll([{fd=6, events=POLLIN}], 1, 0) = 0 (Timeout)
select(7, [6], [6], [], {15, 0}) = 1 (out [6], left {15, 0})
poll([{fd=6, events=POLLIN}], 1, 0) = 0 (Timeout)
select(7, [6], [6], [], {15, 0}) = 1 (out [6], left {15, 0})
poll([{fd=6, events=POLLIN}], 1, 0) = 0 (Timeout)
select(7, [6], [6], [], {15, 0}) = 1 (out [6], left {15, 0})
poll([{fd=6, events=POLLIN}], 1, 0) = 0 (Timeout)
select(7, [6], [6], [], {15, 0}) = 1 (out [6], left {15, 0})
poll([{fd=6, events=POLLIN}], 1, 0) = 0 (Timeout)
select(7, [6], [6], [], {15, 0}) = 1 (out [6], left {15, 0})
poll([{fd=6, events=POLLIN}], 1, 0) = 0 (Timeout)
select(7, [6], [6], [], {15, 0}) = 1 (out [6], left {15, 0})
poll([{fd=6, events=POLLIN}], 1, 0) = 0 (Timeout)
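Then the process is spinning in a tight loop on socket fd 6: select() reports the socket as writable immediately, and poll() with a zero timeout finds no data to read, over and over. In that case you can be fairly sure the problem is caused by file_get_contents() waiting on a slow remote server.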
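When many traces need checking, the same pattern can be matched mechanically. The sketch below is a rough heuristic, assuming strace output saved as text in the format shown above; the function name and the minimum of 5 select/poll pairs are arbitrary assumptions, and a match only suggests, rather than proves, a stalled network read.

```php
<?php
/**
 * Hypothetical heuristic: report whether strace output shows the busy loop
 * described above, i.e. repeated poll() calls with a zero timeout returning
 * "0 (Timeout)", each immediately preceded by a select() that returns 1.
 */
function looks_like_stream_busy_loop(string $straceOutput, int $minPairs = 5): bool
{
    $pairs = 0;
    $previousWasSelect = false;
    foreach (preg_split('/\R/', $straceOutput) as $line) {
        $line = trim($line);
        // select() returning at once with one writable fd.
        if (preg_match('/^select\(.*\) = 1 \(out \[\d+\]/', $line)) {
            $previousWasSelect = true;
            continue;
        }
        // poll(..., 0) = 0 (Timeout): nonblocking check that found no data.
        if ($previousWasSelect && preg_match('/^poll\(.*, 0\) = 0 \(Timeout\)$/', $line)) {
            $pairs++;
        }
        $previousWasSelect = false;
    }
    return $pairs >= $minPairs;
}

$trace = str_repeat(
    "select(7, [6], [6], [], {15, 0}) = 1 (out [6], left {15, 0})\n" .
    "poll([{fd=6, events=POLLIN}], 1, 0) = 0 (Timeout)\n",
    6
);
var_dump(looks_like_stream_busy_loop($trace)); // prints bool(true)
```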