Let me introduce the principles of cURL "multi-threading" (concurrent requests) in PHP, with examples. Please tell me if I get anything wrong.
I believe many people get headaches from the curl_multi family of functions, which the PHP manual explains poorly. There is little documentation, and the examples given are so simple that you cannot learn much from them. I also searched many web pages and did not find a single complete application example.
curl_multi_add_handle
curl_multi_close
curl_multi_exec
curl_multi_getcontent
curl_multi_info_read
curl_multi_init
curl_multi_remove_handle
curl_multi_select
Generally speaking, if you are thinking of using these functions, the purpose is obviously to request multiple URLs at the same time instead of one by one. Otherwise, it is simpler to call curl_exec in a loop yourself.
The steps are summarized as follows:
Step 1: Call curl_multi_init
Step 2: Call curl_multi_add_handle in a loop
Note that in this step the second parameter of curl_multi_add_handle is a sub-handle created by curl_init.
Step 3: Call curl_multi_exec repeatedly until all transfers have finished
Step 4: Call curl_multi_getcontent in a loop to fetch the results as needed
Step 5: Call curl_multi_remove_handle, and then curl_close, for each sub-handle
Step 6: Call curl_multi_close
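The six steps above can be sketched roughly as follows. This is a minimal illustration, not the manual's listing: the function name fetchAll, the timeout value, and the short sleep used when curl_multi_select returns -1 are all assumptions made for the example.

```php
<?php
// Hypothetical helper (not from the original post): fetch several URLs
// concurrently and return their bodies, keyed like the input array.
function fetchAll(array $urls, int $timeout = 10): array
{
    $mh = curl_multi_init();                          // Step 1

    $handles = [];
    foreach ($urls as $key => $url) {                 // Step 2
        $ch = curl_init($url);                        // the sub-handle
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);  // assumed value
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    do {                                              // Step 3
        $status = curl_multi_exec($mh, $running);
        if ($running > 0 && curl_multi_select($mh, 1.0) === -1) {
            // select can return -1 on failure; sleep briefly so the loop does not spin
            usleep(1000);
        }
    } while ($running > 0 && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);  // Step 4
        curl_multi_remove_handle($mh, $ch);           // Step 5
        curl_close($ch);                              // has no effect since PHP 8.0
    }

    curl_multi_close($mh);                            // Step 6
    return $results;
}
```

A call such as fetchAll(['a' => 'http://example.com/']) would return an array with the response body under the key 'a'. Waiting in curl_multi_select between calls to curl_multi_exec is what keeps the loop from burning CPU while transfers are in flight.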
Here is an example from the PHP manual:
Features of this class:
It runs very stably.
Once a concurrency level is set, the class always works at that level; adding tasks through the callback function does not affect it.
CPU usage is extremely low, and most of the CPU time is spent in the user's callback functions.
Memory usage is high when the number of tasks is large (150,000 tasks occupy more than 256 MB of memory). In that case you can add tasks through the callback function instead, in whatever quantity you choose.
It can use the available bandwidth to the fullest extent.
It supports chained tasks: a task that needs to collect data from several different addresses can be completed in one go through callbacks.
It can retry on cURL errors, with a configurable number of attempts (cURL errors are likely at the start because of the high concurrency, and may also be caused by network conditions or an unstable remote server).
The callback function is quite flexible, and several types of task can run at the same time (for example, downloading files, crawling web pages, and checking for 404s can all run simultaneously in one PHP process).
Task types are easy to customize, for example checking for 404s or getting the final URL of a redirect.
You can enable a cache.
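Several of the features above (fixed concurrency, adding tasks from the callback, retrying on cURL errors) can be illustrated with a rolling window. The sketch below is not the class being described, whose source is not shown here. The name runPool, the callback signature, the 30-second timeout, and the use of CURLOPT_PRIVATE to tag handles are all assumptions chosen for the example.

```php
<?php
// Hypothetical sketch: keep at most $concurrency transfers active, retry
// failed transfers up to $maxTries times, and let the callback add tasks.
function runPool(array $urls, int $concurrency, callable $onDone, int $maxTries = 3): void
{
    $concurrency = max(1, $concurrency);
    $mh = curl_multi_init();

    $queue = [];                       // tasks waiting for a free slot
    foreach ($urls as $url) {
        $queue[] = ['url' => $url, 'tries' => 0];
    }
    // Handed to the callback so that it can append follow-up (chained) tasks.
    $enqueue = function (string $url) use (&$queue): void {
        $queue[] = ['url' => $url, 'tries' => 0];
    };

    $active = [];                      // id => task currently in flight
    $nextId = 0;

    do {
        // Top the window back up to the configured concurrency.
        while (count($active) < $concurrency && $queue) {
            $task = array_shift($queue);
            $id = (string) $nextId++;
            $ch = curl_init($task['url']);
            curl_setopt_array($ch, [
                CURLOPT_RETURNTRANSFER => true,
                CURLOPT_TIMEOUT        => 30,   // assumed value
                CURLOPT_PRIVATE        => $id,  // tag used to find the task again
            ]);
            curl_multi_add_handle($mh, $ch);
            $active[$id] = $task;
        }

        curl_multi_exec($mh, $running);
        if ($running > 0 && curl_multi_select($mh, 1.0) === -1) {
            usleep(1000);              // avoid a busy loop if select fails
        }

        // Collect every transfer that has finished so far.
        while ($done = curl_multi_info_read($mh)) {
            $ch = $done['handle'];
            $id = curl_getinfo($ch, CURLINFO_PRIVATE);
            $task = $active[$id];
            unset($active[$id]);
            $task['tries']++;

            if ($done['result'] !== CURLE_OK && $task['tries'] < $maxTries) {
                $queue[] = $task;      // cURL error: queue the task for another attempt
            } else {
                $onDone($task['url'], curl_multi_getcontent($ch),
                        curl_getinfo($ch), $done['result'], $enqueue);
            }
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);           // has no effect since PHP 8.0
        }
    } while ($active || $queue);

    curl_multi_close($mh);
}
```

Because finished handles are removed and new ones added inside the same loop, the number of transfers in flight stays at the configured level until the queue runs dry. Only the window's worth of handles exists at any moment, which is also why feeding tasks in through the callback keeps memory lower than queuing everything up front.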
Disadvantages:
It cannot make full use of a multi-core CPU (this can be worked around by running multiple processes, but you have to handle logic such as dividing up the tasks yourself).
The maximum concurrency is 500 (or possibly 512). In testing, this appeared to be an internal limit of cURL; exceeding it always resulted in failure.
There is currently no support for resuming interrupted downloads.
Each task is currently atomic: a large file cannot be divided into several parts that are downloaded by separate threads.
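To give an idea of what splitting a large file would involve, here is a small sketch that computes byte ranges and attaches one range to an easy handle through CURLOPT_RANGE. It is not part of the class described above. The names byteRanges and rangeHandle are hypothetical, the file size is assumed to be known in advance, and the server is assumed to honor range requests.

```php
<?php
// Hypothetical helper: split $size bytes into at most $parts inclusive
// "start-end" ranges in the format that CURLOPT_RANGE expects.
function byteRanges(int $size, int $parts): array
{
    if ($size < 1 || $parts < 1) {
        return [];
    }
    $parts = min($parts, $size);       // never more parts than bytes
    $chunk = intdiv($size, $parts);
    $ranges = [];
    for ($i = 0; $i < $parts; $i++) {
        $start = $i * $chunk;
        // The last part absorbs the remainder of the division.
        $end = ($i === $parts - 1) ? $size - 1 : $start + $chunk - 1;
        $ranges[] = $start . '-' . $end;
    }
    return $ranges;
}

// Hypothetical helper: an easy handle that requests only one range of $url.
// Each such handle could then be passed to curl_multi_add_handle.
function rangeHandle(string $url, string $range)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_RANGE, $range);
    return $ch;
}
```

For a 10-byte file split three ways, byteRanges(10, 3) gives the ranges 0-2, 3-5 and 6-9. The parts would still have to be written back in order once every range has been fetched.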