The "digging for foreign goods" project cannot get an ICP filing because it has no company name behind it, so the front-end machine sits on an Alibaba Cloud Hong Kong ECS, with an additional Alibaba Cloud Hangzhou ECS used to run crontab jobs: executing the crawler, saving images to Alibaba Cloud OSS, and so on. Recently I felt the Hangzhou ECS was somewhat redundant (there used to be a Hangzhou RDS as well, but it was moved to a Hong Kong RDS) and planned to retire it, so I moved all the crontab jobs from the Hangzhou ECS back to the Hong Kong ECS. This caused quite a few problems.

What problems? The core issue is that the Hong Kong ECS lives in an international network environment, and network jitter often occurs when it accesses mainland servers, which is hard to solve. To be more specific: when the Hong Kong ECS queries Alibaba Cloud OpenSearch in Hangzhou (OpenSearch has no Hong Kong node ╥﹏╥...), it often reports an error. Another example: the Hong Kong ECS crawls images and uploads them to the Hangzhou OSS (OSS does have a Hong Kong node, but that node has no image processing service. Isn't that a rip-off?). This is slow, and it often hangs for a while before reporting an error, which makes uploads extremely inefficient (would I tell you that this is why thousands of crawled products are backlogged, waiting for their images to upload before they can be listed?).
The OpenSearch problem is easy enough to solve: its SDK provides a timeout setting, and after I raised the limit a little (to 5 seconds), errors basically stopped. The OSS SDK, however, provides no such setting at all, so I decided to dig into the SDK and modify its source code.
The OSS SDK calls the API through php-curl. After some digging, I found that the SDK has a file named requestcore.class.php that defines a RequestCore class, which is evidently responsible for sending requests. In it, prep_request() configures curl, and send_request($parse = false) executes curl (that is, actually sends the request).

First, look at prep_request(), which contains two timeout settings for php-curl, CURLOPT_TIMEOUT and CURLOPT_CONNECTTIMEOUT:

    curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 120);

CURLOPT_TIMEOUT is easy to understand: it is the time limit for the entire curl request (HTTP request and response), in seconds; if set to 0, there is no limit. CURLOPT_CONNECTTIMEOUT is harder to understand. What is certain is that it covers only part of the request process, so it should be set smaller than CURLOPT_TIMEOUT; otherwise the overall limit expires first and CURLOPT_CONNECTTIMEOUT never takes effect. Online sources describe it as the "waiting time before initiating a connection", which is rather vague. I tend to read it as the time allowed for completing the TCP three-way handshake; in other words, the whole handshake must complete within CURLOPT_CONNECTTIMEOUT, or the request times out. If the handshake cannot complete in that time, the server is busy or down, or the network is abnormal, which matches the scenario in this article. Based on this conjecture, I set CURLOPT_CONNECTTIMEOUT to 3 seconds.
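As a rough illustration of the change described above, the sketch below applies a 3-second connect timeout together with an overall limit on a php-curl handle. The helper names `build_timeout_options()` and `fetch_with_timeouts()` and the 30-second overall value are assumptions made for this example; they are not part of the OSS SDK, where the equivalent edit would be made inside prep_request(). Note that libcurl's documentation describes the connection phase as also covering name resolution and protocol handshakes, so the TCP-handshake reading above is best treated as an approximation.

```php
<?php
// Hypothetical helper for illustration; it is not part of the OSS SDK.
function build_timeout_options(int $connectTimeout, int $totalTimeout): array
{
    // CURLOPT_TIMEOUT = 0 means "no limit", so only compare when a limit is set.
    if ($totalTimeout !== 0 && $connectTimeout >= $totalTimeout) {
        throw new InvalidArgumentException(
            'Connect timeout should be smaller than the overall timeout.'
        );
    }
    return [
        CURLOPT_CONNECTTIMEOUT => $connectTimeout, // seconds allowed for connecting
        CURLOPT_TIMEOUT        => $totalTimeout,   // seconds allowed for the whole request
    ];
}

// Hypothetical helper: sends a request and reports whether it failed on a timeout.
function fetch_with_timeouts(string $url, array $timeoutOptions): array
{
    $curl_handle = curl_init($url);
    curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, true);
    curl_setopt_array($curl_handle, $timeoutOptions);
    $body = curl_exec($curl_handle);
    $timedOut = curl_errno($curl_handle) === CURLE_OPERATION_TIMEDOUT;
    return ['body' => $body, 'timed_out' => $timedOut];
}

// 3 s to connect (as in the article), 30 s overall (an assumed value).
$options = build_timeout_options(3, 30);
```

A caller could then run `fetch_with_timeouts($url, $options)` and check `timed_out` to decide whether to report the failure quickly instead of hanging.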
This way, when the network jitters, there is no need to wait 2 minutes (the SDK sets CURLOPT_CONNECTTIMEOUT to 120 seconds) before an error is reported.

PS: If you want a timeout of less than 1 second, you need curl's millisecond-level timeout setting, but according to Brother Niao that setting has a bug. I have not tested it, so keep an eye out: "A "Bug" in Curl's millisecond timeout"
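For reference, php-curl exposes millisecond variants of both options, CURLOPT_CONNECTTIMEOUT_MS and CURLOPT_TIMEOUT_MS. The sketch below shows one way they might be set. Treat it as an assumption rather than a tested fix: the commonly cited workaround for the immediate-timeout behaviour is to set CURLOPT_NOSIGNAL as well, but whether it is needed depends on the libcurl build. The helper name `build_ms_timeout_options()`, the URL, and the values are hypothetical.

```php
<?php
// Hypothetical helper for illustration; it is not part of the OSS SDK.
function build_ms_timeout_options(int $connectMs, int $totalMs): array
{
    return [
        // Assumption: disabling signals avoids the immediate-timeout behaviour
        // reported for sub-second values on some libcurl builds.
        CURLOPT_NOSIGNAL          => 1,
        CURLOPT_CONNECTTIMEOUT_MS => $connectMs, // milliseconds allowed for connecting
        CURLOPT_TIMEOUT_MS        => $totalMs,   // milliseconds allowed overall
    ];
}

// Placeholder URL; creating the handle and setting options sends no request.
$curl_handle = curl_init('https://example.com/');
$msOptions = build_ms_timeout_options(500, 2000);
$applied = curl_setopt_array($curl_handle, $msOptions);
```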