One of our sub-sites is written in PHP. Requests going through nginx return 502 errors from time to time, more often during peak hours. The php-fpm log showed a large number of "child exited on signal 7 (SIGBUS)" entries whose timestamps matched the 502s in the access log exactly. That ruled out the PHP processes simply being overloaded, and I then ruled out APC as a suspect as well.
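To line the child exits up with the 502s, it can help to count them per minute. The following is a minimal sketch, assuming each php-fpm log line starts with a `[DD-Mon-YYYY HH:MM:SS]` timestamp (adjust the field handling if your log format differs). The function name and the log path in the usage line are hypothetical.

```shell
# Count php-fpm children killed by SIGBUS, grouped by minute.
# Assumes each log line starts with "[DD-Mon-YYYY HH:MM:SS]".
sigbus_per_minute() {
    grep -F 'exited on signal 7 (SIGBUS)' "$1" |
        awk '{ print substr($1, 2) " " substr($2, 1, 5) }' |
        sort | uniq -c
}

# Usage (hypothetical log path):
# sigbus_per_minute /usr/local/php/var/log/php-fpm.log
```

The per-minute counts can then be compared against the 502 timestamps in the nginx access log.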
Since the PHP process dies when it receives the signal, I tried to capture some core dumps for analysis.
First set the path where core dumps are saved. Pick a location with plenty of free space, because there may be many dumps and each can be large (for example, with APC enabled and set to 1G, each dump will be about 1G):
#echo "/tmp/core.%e.%p.%h.%t" > /proc/sys/kernel/core_pattern
Then raise the ulimit to allow core dumps:
#ulimit -c unlimited
Restart php-fpm. Before long, a bunch of core files will show up in /tmp/. Good. Pack them up and pull them offline for analysis. Remember to turn core dumps off again and restart php-fpm:
#ulimit -c 0
gdb is generally enough for analyzing a core dump (if PHP was installed from a binary distribution, install the matching debug symbol package first):
gdb /usr/local/php/sbin/php-fpm core.php-fpm.10375.php.1365314990
Run the bt command to see the backtrace (I forgot to record the exact output). The process had died in the lex_scan function. I looked at several core dumps, and nearly all of them had died in functions belonging to the lexing stage.
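With many core files, opening each one by hand gets tedious. A minimal sketch of a batch run follows, assuming gdb is on the PATH. It uses gdb's `-batch` and `-ex` options to run `bt` non-interactively. The function name `backtrace_all` and the `GDB` override variable are hypothetical helpers for illustration.

```shell
# Print a backtrace for every core file passed after the binary.
# GDB can be overridden; it defaults to "gdb".
backtrace_all() {
    binary=$1
    shift
    for core in "$@"; do
        echo "== $core"
        "${GDB:-gdb}" -batch -ex bt "$binary" "$core"
    done
}

# Usage: show only the file name and the innermost frame of each dump.
# backtrace_all /usr/local/php/sbin/php-fpm /tmp/core.php-fpm.* | grep -E '^(==|#0)'
```

Filtering on frame `#0` makes it quick to see whether the dumps all die in the same place.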
I have not studied the PHP source code much, so I searched Google for "php sigbus lex_scan". The first two results basically gave me the answer:
The first is a bug reported in 2010 that has still not been closed, because it does not appear to be a bug in PHP itself. Read carefully, it contains reproduction cases, and in the end someone found a workaround.
The second is by someone who went through the same analysis as I did and gives a clear cause and solution.
Put simply, lex_scan runs while PHP is lexing (tokenizing) a PHP file. If an included PHP file is rewritten at that moment, the process crashes.
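One common way to avoid rewriting a file in place is to write the new content to a temporary file and rename it over the old one. This is a general technique, not necessarily the exact fix described in the linked pages. The sketch below assumes the temporary file is created in the same directory as the target, so the rename stays on one filesystem. The names `atomic_write` and `generate_cache` are hypothetical.

```shell
# Replace a file atomically: write a temp file next to the target,
# then rename it over the target. A reader sees either the old file
# or the new one, never a half-written or truncated one.
atomic_write() {
    target=$1
    tmp=$(mktemp "$(dirname "$target")/.tmp.XXXXXX") || return 1
    # mktemp creates the file with mode 0600; chmod it here if other
    # users need to read the result.
    cat > "$tmp" && mv -f "$tmp" "$target" || { rm -f "$tmp"; return 1; }
}

# Usage (new content is read from stdin; generate_cache is hypothetical):
# generate_cache | atomic_write /home/www/cache/default.php
```

A process that already has the old file open keeps reading the old, complete content, which presumably is why this sidesteps the crash.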
To confirm this, I used strace to trace the PHP processes and eventually caught one in the act:
11670 lstat("/home", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
11670 stat("/home/www/cache/default.php", {st_mode=S_IFREG|0644, st_size=68579, ...}) = 0
11670 --- SIGBUS (Bus error) @ 0 (0)
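Finding these few lines in a long trace by eye is slow. The sketch below pulls out, for each SIGBUS, the last file path the same process touched. It assumes the trace was saved with something like `strace -f -o FILE -p PID`, so that every line starts with the PID as in the output above. The function name is hypothetical.

```shell
# For each SIGBUS in a strace log, print the PID and the last quoted
# path that PID passed to a system call before the signal arrived.
last_file_before_sigbus() {
    awk '
        # Remember the most recent quoted string seen for each PID.
        match($0, /"[^"]+"/) { last[$1] = substr($0, RSTART + 1, RLENGTH - 2) }
        # On a SIGBUS line, report that PID and its remembered path.
        /--- SIGBUS/ { print $1, last[$1] }
    ' "$1"
}

# Usage (hypothetical trace file):
# last_file_before_sigbus /tmp/php-fpm.strace | sort | uniq -c
```

If the same file shows up again and again, that is the one being rewritten while PHP reads it.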
Source: http://blog.druggo.org/post/2013/05/02/A case of SIGBUS failure in php process