1. What is cURL?
cURL is a tool that transfers files and data using URL syntax. It supports many protocols, including HTTP, FTP, and TELNET. PHP also supports the cURL library, which makes it straightforward to fetch web pages from a script: run the script, parse the pages it retrieved, and extract the data you need programmatically. Whether you want to pull part of the data from a link, import an XML file into a database, or simply retrieve the content of a web page, cURL is a capable PHP library for the job.
2. The cURL function library
curl_close — Close a cURL session
curl_copy_handle — Copy a cURL handle along with all of its options
curl_errno — Return the number of the last error in the current session
curl_error — Return a string describing the last error in the current session
curl_exec — Execute a cURL session
curl_getinfo — Get information about a transfer on a cURL handle
curl_init — Initialize a cURL session
curl_multi_add_handle — Add a normal cURL handle to a cURL multi handle
curl_multi_close — Close a cURL multi handle
curl_multi_exec — Run the transfers of the handles attached to a cURL multi handle
curl_multi_getcontent — Return the content of a cURL handle if CURLOPT_RETURNTRANSFER is set
curl_multi_info_read — Get information about the current transfers
curl_multi_init — Initialize a cURL multi handle
curl_multi_remove_handle — Remove a handle from a cURL multi handle
curl_multi_select — Wait for activity on any connection of a cURL multi handle
curl_setopt_array — Set multiple options for a cURL session from an array
curl_setopt — Set an option for a cURL session
curl_version — Get cURL version information
curl_init() initializes a cURL session. Its only parameter is optional and specifies the URL to request.
curl_exec() executes a cURL session. Its only parameter is the handle returned by curl_init().
curl_close() closes a cURL session. Its only parameter is the handle returned by curl_init().
3. Basic steps for setting up a CURL request in PHP
①: Initialization
curl_init()
②: Setting attributes
curl_setopt(). cURL has a long list of options that control the details of the request.
③: Execute and get the results
curl_exec()
④: Release the handle
curl_close()
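The four steps above can be sketched as one small function. This is a minimal sketch rather than a definitive implementation: the function name fetch_url() and the 10-second timeout are assumptions made for illustration, and curl_setopt_array() stands in for repeated curl_setopt() calls.

```php
<?php
// Hypothetical helper (not from the original article): runs the four steps
// and returns the response body, or null when the transfer fails.
function fetch_url(string $url): ?string
{
    // Step 1: initialize the session.
    $curl = curl_init();

    // Step 2: set the options in one call.
    curl_setopt_array($curl, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_TIMEOUT        => 10,   // assumed limit, in seconds
    ]);

    // Step 3: execute and collect the result (false on failure).
    $body = curl_exec($curl);

    // Step 4: release the handle.
    curl_close($curl);

    return $body === false ? null : $body;
}
```

Returning null on failure keeps a failed transfer distinguishable from an empty response body, which curl_exec() reports as an empty string.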
4. Implementing GET and POST with cURL
①: Implementing a GET request
<?php
// Initialize the session
$curl = curl_init();
// Set the URL to fetch
curl_setopt($curl, CURLOPT_URL, 'http://www.baidu.com');
// Include the response headers in the output
curl_setopt($curl, CURLOPT_HEADER, 1);
// Return the response as a string instead of printing it directly
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
// Execute the request
$data = curl_exec($curl);
// Close the cURL session
curl_close($curl);
// Display the data that was retrieved
print_r($data);
?>
②: Implementing a POST request
<?php
// Initialize the session
$curl = curl_init();
// Set the URL to fetch
curl_setopt($curl, CURLOPT_URL, 'http://www.baidu.com');
// Include the response headers in the output
curl_setopt($curl, CURLOPT_HEADER, 1);
// Return the response as a string instead of printing it directly
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
// Submit the request with the POST method
curl_setopt($curl, CURLOPT_POST, 1);
// Set the POST data
$post_data = array(
    "username" => "coder",
    "password" => "12345"
);
curl_setopt($curl, CURLOPT_POSTFIELDS, $post_data);
// Execute the request
$data = curl_exec($curl);
// Close the cURL session
curl_close($curl);
// Display the data that was retrieved
print_r($data);
?>
③: If the response is in JSON format, use the json_decode() function to parse it into an array.
$output_array = json_decode($output, true);
Here $output is the string returned by curl_exec(). Calling json_decode($output) without the second argument returns an object instead of an array.
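The difference between the two call forms can be seen in a short sketch. The JSON string below is made-up sample data standing in for a real response body.

```php
<?php
// Sample JSON standing in for a response body (assumed data for illustration).
$output = '{"name": "coder", "tags": ["php", "curl"]}';

// With true as the second argument, JSON objects become associative arrays.
$output_array = json_decode($output, true);
echo $output_array['name'], "\n";   // array access

// Without it, JSON objects become stdClass objects.
$output_object = json_decode($output);
echo $output_object->name, "\n";    // property access

// json_decode() returns null for invalid JSON; json_last_error() tells why.
$broken = json_decode('{not valid json', true);
if ($broken === null && json_last_error() !== JSON_ERROR_NONE) {
    echo 'Decode failed: ', json_last_error_msg(), "\n";
}
```

Checking json_last_error() matters because a response body that is not valid JSON (an HTML error page, for instance) decodes to null without raising an error.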
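5. A wrapper function of my own
// Parameter 1: the URL to request
// Parameter 2: POST data (leave empty to send a GET request)
// Parameter 3: cookies to send
// Parameter 4: whether to return the response cookies
function curl_request($url, $post = '', $cookie = '', $returnCookie = 0) {
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)');
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($curl, CURLOPT_AUTOREFERER, 1);
    curl_setopt($curl, CURLOPT_REFERER, "http://XXX");
    if ($post) {
        curl_setopt($curl, CURLOPT_POST, 1);
        // http_build_query() only accepts arrays, so pass strings through unchanged
        curl_setopt($curl, CURLOPT_POSTFIELDS, is_array($post) ? http_build_query($post) : $post);
    }
    if ($cookie) {
        curl_setopt($curl, CURLOPT_COOKIE, $cookie);
    }
    curl_setopt($curl, CURLOPT_HEADER, $returnCookie);
    curl_setopt($curl, CURLOPT_TIMEOUT, 10);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    $data = curl_exec($curl);
    if (curl_errno($curl)) {
        // On failure, release the handle and return the error message
        $error = curl_error($curl);
        curl_close($curl);
        return $error;
    }
    // Total size of all header blocks, including those of followed redirects
    $headerSize = curl_getinfo($curl, CURLINFO_HEADER_SIZE);
    curl_close($curl);
    if ($returnCookie) {
        // Split the headers from the body
        $header = substr($data, 0, $headerSize);
        $body = substr($data, $headerSize);
        // Capture the first "name=value" pair from the Set-Cookie headers
        preg_match_all("/Set\-Cookie:([^;]*);/", $header, $matches);
        $info['cookie'] = isset($matches[1][0]) ? substr($matches[1][0], 1) : '';
        $info['content'] = $body;
        return $info;
    } else {
        return $data;
    }
}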
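The function above keeps only the first Set-Cookie value. If a response sets several cookies, a helper along the following lines could collect all of them. This is a sketch under assumptions: the name parse_set_cookies() and the sample header text are invented for illustration and are not part of the original function.

```php
<?php
// Hypothetical helper: collects every "name=value" pair from the Set-Cookie
// lines of a raw header block and joins them into a string that can be
// passed back through CURLOPT_COOKIE.
function parse_set_cookies(string $header): string
{
    // Match the part of each Set-Cookie line before the first ";" (or the
    // end of the line when the cookie has no attributes).
    preg_match_all('/^Set-Cookie:\s*([^;\r\n]+)/mi', $header, $matches);

    return implode('; ', array_map('trim', $matches[1]));
}

// Sample header block (made-up values).
$header = "HTTP/1.1 200 OK\r\n"
        . "Set-Cookie: sid=abc123; Path=/; HttpOnly\r\n"
        . "Set-Cookie: lang=en\r\n"
        . "Content-Type: text/html\r\n";

echo parse_set_cookies($header), "\n"; // sid=abc123; lang=en
```

The result uses the "semicolon followed by a space" separator, so it can be supplied as the $cookie argument of a later request.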
Appendix: descriptions of the available options
First category:
For the following options, value should be a bool. Each entry lists the option name, then its description and any remarks.
CURLOPT_AUTOREFERER
When enabled, the Referer: header is set automatically on requests that follow a Location: redirect.
CURLOPT_BINARYTRANSFER
When enabled together with CURLOPT_RETURNTRANSFER, return the raw output.
CURLOPT_COOKIESESSION
When enabled, this transfer is marked as a new cookie "session": cURL ignores any session cookies loaded from a previous session. By default, cURL loads and stores all cookies, whether or not they are session cookies. Session cookies are cookies without an expiry date, meant to live only for the current session.
CURLOPT_CRLF
When enabled, convert Unix newlines to CRLF newlines on transfers.
CURLOPT_DNS_USE_GLOBAL_CACHE
When enabled, a global DNS cache is used. This option is not thread-safe and is enabled by default.
CURLOPT_FAILONERROR
When enabled, fail verbosely if the HTTP status code returned is greater than or equal to 400. The default behavior is to return the page normally and ignore the code.
CURLOPT_FILETIME
When enabled, try to retrieve the modification time of the remote document. The value can be read with the CURLINFO_FILETIME option of curl_getinfo().
CURLOPT_FOLLOWLOCATION
When enabled, follow any "Location:" header that the server sends in its response. This is recursive: cURL keeps following "Location:" headers until none is sent, unless CURLOPT_MAXREDIRS limits the number of redirects.
CURLOPT_FORBID_REUSE
When enabled, the connection is closed explicitly once the transfer finishes and is not kept for reuse.
CURLOPT_FRESH_CONNECT
When enabled, force a new connection instead of using a cached one.
CURLOPT_FTP_USE_EPRT
When enabled, use EPRT (and LPRT) for active FTP downloads. Set to FALSE to disable EPRT and LPRT and use PORT only.
CURLOPT_FTP_USE_EPSV
When enabled, try the EPSV command first for FTP transfers before falling back to PASV. Set to FALSE to disable EPSV.
CURLOPT_FTPAPPEND
When enabled, append to the remote file instead of overwriting it.
CURLOPT_FTPASCII
An alias for CURLOPT_TRANSFERTEXT.
CURLOPT_FTPLISTONLY
When enabled, list only the names in an FTP directory.
CURLOPT_HEADER
When enabled, the response headers are included in the output.
CURLINFO_HEADER_OUT
When enabled, track the request string of the handle.
Available starting from PHP 5.1.3. The CURLINFO_ prefix is intentional.
CURLOPT_HTTPGET
When enabled, reset the HTTP request method to GET. Because GET is the default, this is only needed if the method has been changed.
CURLOPT_HTTPPROXYTUNNEL
When enabled, tunnel the transfer through the given HTTP proxy.
CURLOPT_MUTE
When enabled, the cURL functions are completely silent.
CURLOPT_NETRC
When enabled, scan the ~/.netrc file for the username and password of the remote site that the connection is being established with.
CURLOPT_NOBODY
When enabled, the response body is excluded from the output, and the request method is set to HEAD.
CURLOPT_NOPROGRESS
When enabled, turn off the progress meter for cURL transfers.
Note:
PHP automatically sets this option to TRUE; it should only be changed for debugging purposes.
CURLOPT_NOSIGNAL
When enabled, ignore any cURL function that causes a signal to be sent to the PHP process. This is on by default in multi-threaded SAPIs so that timeout options can still be used.
Added in cURL 7.10.
CURLOPT_POST
When enabled, send a regular HTTP POST of type application/x-www-form-urlencoded, the kind most commonly used by HTML forms.
CURLOPT_PUT
When enabled, upload a file with HTTP PUT. The file must be set with CURLOPT_INFILE and CURLOPT_INFILESIZE.
CURLOPT_RETURNTRANSFER
When enabled, curl_exec() returns the transfer as a string instead of outputting it directly.
CURLOPT_SSL_VERIFYPEER
When disabled, cURL stops verifying the peer's certificate. Alternative certificates to verify against can be set with the CURLOPT_CAINFO option, or a certificate directory with the CURLOPT_CAPATH option. If CURLOPT_SSL_VERIFYPEER is disabled, CURLOPT_SSL_VERIFYHOST (default 2) may also need to be changed.
The default is TRUE since cURL 7.10, and a default certificate bundle is installed since cURL 7.10.
CURLOPT_TRANSFERTEXT
When enabled, use ASCII mode for FTP transfers. For LDAP, data is retrieved as plain text rather than HTML. On Windows systems, STDOUT is not set to binary mode.
CURLOPT_UNRESTRICTED_AUTH
When enabled, keep sending the username and password when following redirects (with CURLOPT_FOLLOWLOCATION), even if the host name has changed.
CURLOPT_UPLOAD
When enabled, prepare for an upload.
CURLOPT_VERBOSE
When enabled, output verbose information to STDERR, or to the file given by CURLOPT_STDERR.
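As an illustration of how several of the boolean options above combine, the following sketch checks only the status code of a URL. The function name http_status() and the particular selection of options are assumptions made for this example.

```php
<?php
// Hypothetical helper: returns the HTTP status code for a URL without
// downloading the body. Returns 0 when no HTTP response was received.
function http_status(string $url): int
{
    $curl = curl_init($url);
    curl_setopt_array($curl, [
        CURLOPT_NOBODY         => true, // send HEAD and skip the body
        CURLOPT_FOLLOWLOCATION => true, // follow "Location:" redirects
        CURLOPT_AUTOREFERER    => true, // set "Referer:" on each redirect
        CURLOPT_RETURNTRANSFER => true, // do not print anything
        CURLOPT_SSL_VERIFYPEER => true, // keep certificate checks on
    ]);
    curl_exec($curl);

    // Status of the last response in the redirect chain.
    $status = (int) curl_getinfo($curl, CURLINFO_HTTP_CODE);
    curl_close($curl);

    return $status;
}
```

A call such as http_status('http://www.baidu.com') would report the final status after redirects; a return value of 0 usually points to a connection or DNS failure rather than an HTTP error.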
Second category:
For the following options, value should be an integer. Each entry lists the option name, then its description and any remarks.
CURLOPT_BUFFERSIZE
The size of the buffer to use for each read. There is no guarantee that this request will be fulfilled.
Added in cURL 7.10.
CURLOPT_CLOSEPOLICY
Either CURLCLOSEPOLICY_LEAST_RECENTLY_USED or CURLCLOSEPOLICY_OLDEST. There are three other CURLCLOSEPOLICY_ constants, but cURL does not support them yet.
CURLOPT_CONNECTTIMEOUT
The number of seconds to wait while trying to connect. If set to 0, wait indefinitely.
CURLOPT_CONNECTTIMEOUT_MS
The number of milliseconds to wait while trying to connect. If set to 0, wait indefinitely.
Added in cURL 7.16.2. Available starting with PHP 5.2.3.
CURLOPT_DNS_CACHE_TIMEOUT
The number of seconds to keep DNS entries in memory. The default is 120 seconds.
CURLOPT_FTPSSLAUTH
FTP authentication method: CURLFTPAUTH_SSL (try SSL first), CURLFTPAUTH_TLS (try TLS first) or CURLFTPAUTH_DEFAULT (let cURL decide automatically).
Added in cURL 7.12.2.
CURLOPT_HTTP_VERSION
CURL_HTTP_VERSION_NONE (default value, let cURL decide which version to use), CURL_HTTP_VERSION_1_0 (force to use HTTP/1.0) or CURL_HTTP_VERSION_1_1 (force to use HTTP/1.1).
CURLOPT_HTTPAUTH
The HTTP authentication method(s) to use. The options are: CURLAUTH_BASIC, CURLAUTH_DIGEST, CURLAUTH_GSSNEGOTIATE, CURLAUTH_NTLM, CURLAUTH_ANY and CURLAUTH_ANYSAFE.
Multiple methods can be combined with the bitwise | (or) operator. cURL then asks the server which methods it supports and picks the best one.
CURLAUTH_ANY is equivalent to CURLAUTH_BASIC | CURLAUTH_DIGEST | CURLAUTH_GSSNEGOTIATE | CURLAUTH_NTLM.
CURLAUTH_ANYSAFE is equivalent to CURLAUTH_DIGEST | CURLAUTH_GSSNEGOTIATE | CURLAUTH_NTLM.
CURLOPT_INFILESIZE
The expected size, in bytes, of the file being uploaded.
CURLOPT_LOW_SPEED_LIMIT
The transfer speed, in bytes per second, that the transfer must stay below for CURLOPT_LOW_SPEED_TIME seconds before PHP considers it too slow and aborts it.
CURLOPT_LOW_SPEED_TIME
The number of seconds the transfer speed must stay below CURLOPT_LOW_SPEED_LIMIT before PHP considers the transfer too slow and aborts it.
CURLOPT_MAXCONNECTS
The maximum number of persistent connections allowed. When the limit is reached, CURLOPT_CLOSEPOLICY determines which connection to close.
CURLOPT_MAXREDIRS
The maximum number of HTTP redirects to follow. Use this option together with CURLOPT_FOLLOWLOCATION.
CURLOPT_PORT
An alternative port number to connect to.
CURLOPT_PROTOCOLS
A bitmask of CURLPROTO_* values. If set, the bitmask limits which protocols libcurl may use in the transfer. This lets you build libcurl with support for a wide range of protocols while restricting specific transfers to a subset of them. By default, libcurl accepts all protocols it supports. See also CURLOPT_REDIR_PROTOCOLS.
The available protocol options are: CURLPROTO_HTTP, CURLPROTO_HTTPS, CURLPROTO_FTP, CURLPROTO_FTPS, CURLPROTO_SCP, CURLPROTO_SFTP, CURLPROTO_TELNET, CURLPROTO_LDAP, CURLPROTO_LDAPS, CURLPROTO_DICT, CURLPROTO_FILE, CURLPROTO_TFTP, CURLPROTO_ALL
Added in cURL 7.19.4.
CURLOPT_PROXYAUTH
The HTTP authentication method(s) to use for the proxy connection. Use the same bitmasks as described for CURLOPT_HTTPAUTH. For proxy authentication, only CURLAUTH_BASIC and CURLAUTH_NTLM are currently supported.
Added in cURL 7.10.7.
CURLOPT_PROXYPORT
The port of the proxy server. The port can also be set in CURLOPT_PROXY.
CURLOPT_PROXYTYPE
Either CURLPROXY_HTTP (default) or CURLPROXY_SOCKS5.
Added in cURL 7.10.
CURLOPT_REDIR_PROTOCOLS
A bitmask of CURLPROTO_* values. If set, the bitmask limits which protocols libcurl may use when it follows a redirect with CURLOPT_FOLLOWLOCATION enabled. This lets you restrict redirects to a subset of protocols. By default, libcurl allows all protocols except FILE and SCP. This differs from versions before 7.19.4, which unconditionally followed redirects to every supported protocol. For the protocol constants, see CURLOPT_PROTOCOLS.
Added in cURL 7.19.4.
CURLOPT_RESUME_FROM
The offset, in bytes, from which to resume a transfer.
CURLOPT_SSL_VERIFYHOST
1 checks that a common name exists in the server's SSL certificate. 2 checks that the common name exists and also matches the host name provided. Translator's note: the common name is generally the domain or subdomain for which the SSL certificate was requested.
CURLOPT_SSLVERSION
The SSL version to use (2 or 3). By default PHP tries to determine this itself, although in some cases it must be set manually.
CURLOPT_TIMECONDITION
How CURLOPT_TIMEVALUE is treated. Use CURL_TIMECOND_IFMODSINCE to return the page only if it has been modified since the time given in CURLOPT_TIMEVALUE. If it has not been modified and CURLOPT_HEADER is true, a "304 Not Modified" header is returned. Use CURL_TIMECOND_IFUNMODSINCE for the reverse effect. The default is CURL_TIMECOND_IFMODSINCE.
CURLOPT_TIMEOUT
The maximum number of seconds cURL functions are allowed to execute.
CURLOPT_TIMEOUT_MS
The maximum number of milliseconds cURL functions are allowed to execute.
Added in cURL 7.16.2. Available from PHP 5.2.3 onwards.
CURLOPT_TIMEVALUE
The timestamp, in seconds since January 1, 1970, used by CURLOPT_TIMECONDITION. By default, CURL_TIMECOND_IFMODSINCE is used.
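The integer options are most often used to put limits on a transfer. The sketch below combines the timeout, redirect, and low-speed options described above; the function name fetch_with_limits() and every numeric value in it are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper: fetches a URL under explicit time and redirect limits
// and reports the outcome instead of printing it.
function fetch_with_limits(string $url): array
{
    $curl = curl_init($url);
    curl_setopt_array($curl, [
        CURLOPT_RETURNTRANSFER  => true,
        CURLOPT_FOLLOWLOCATION  => true,
        CURLOPT_MAXREDIRS       => 5,    // give up after 5 redirects
        CURLOPT_CONNECTTIMEOUT  => 5,    // seconds allowed for connecting
        CURLOPT_TIMEOUT         => 15,   // seconds allowed for the whole transfer
        CURLOPT_LOW_SPEED_LIMIT => 1024, // abort below this many bytes/sec ...
        CURLOPT_LOW_SPEED_TIME  => 10,   // ... sustained for this many seconds
    ]);

    $body  = curl_exec($curl);
    $errno = curl_errno($curl);          // 0 when the transfer succeeded
    $error = curl_error($curl);          // empty string when it succeeded
    curl_close($curl);

    return ['body' => $body, 'errno' => $errno, 'error' => $error];
}
```

Reading curl_errno() and curl_error() before closing the handle lets the caller tell a timeout apart from other failures.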
Third category:
For the following options, value should be a string. Each entry lists the option name, then its description and any remarks.
CURLOPT_CAINFO
The name of a file holding one or more certificates to verify the peer with. This option only makes sense when used together with CURLOPT_SSL_VERIFYPEER.
CURLOPT_CAPATH
A directory that holds multiple CA certificates. Use this option together with CURLOPT_SSL_VERIFYPEER.
CURLOPT_COOKIE
The contents of the "Cookie:" header in the HTTP request. Multiple cookies are separated by a semicolon followed by a space (for example, "fruit=apple; color=red").
CURLOPT_COOKIEFILE
The name of the file containing cookie data. The cookie file can be in Netscape format, or contain plain HTTP-style headers dumped into a file.
CURLOPT_COOKIEJAR
The name of a file in which all internal cookies are saved when the handle is closed.
CURLOPT_CUSTOMREQUEST
A custom request method to use instead of "GET" or "HEAD" for the HTTP request. This is useful for "DELETE" or other, more obscure HTTP requests. Valid values are things like "GET", "POST", "CONNECT", and so on; do not enter a whole HTTP request line here. For example, entering "GET /index.html HTTP/1.0\r\n\r\n" would be incorrect.
Note:
Do not use a custom request method until you are sure that the server supports it.
CURLOPT_EGDSOCKET
Like CURLOPT_RANDOM_FILE, except that it takes the file name of an Entropy Gathering Daemon socket.
CURLOPT_ENCODING
The contents of the "Accept-Encoding:" header in the HTTP request. Supported encodings are "identity", "deflate", and "gzip". If set to the empty string "", a header containing all supported encoding types is sent.
Added in cURL 7.10.
CURLOPT_FTPPORT
The value used to obtain the IP address for the FTP "PORT" command. The "PORT" command tells the remote server to connect to the IP address we specify. The string can be a plain IP address, a host name, a network interface name (under UNIX), or just '-' to use the default IP address.
CURLOPT_INTERFACE
The name of the outgoing network interface to use. This can be an interface name, an IP address, or a host name.
CURLOPT_KRB4LEVEL
The KRB4 (Kerberos 4) security level. Any of the following values is valid (in order from lowest to highest): "clear", "safe", "confidential", "private". If the string matches none of these, "private" is used. Setting this option to NULL disables KRB4 security. Currently KRB4 security only works with FTP transfers.
CURLOPT_POSTFIELDS
The full data to send in an HTTP "POST" operation. To send a file, prefix the file name with @ and use the full path; in PHP 5.5 and later, CURLFile is the preferred way to send a file. The data can be passed either as a urlencoded string such as 'para1=val1&para2=val2&...' or as an array with the field name as the key and the field data as the value. If value is an array, the Content-Type header is set to multipart/form-data.
CURLOPT_PROXY
The HTTP proxy to tunnel requests through.
CURLOPT_PROXYUSERPWD
A string in the format of "[username]:[password]" used to connect to the proxy.
CURLOPT_RANDOM_FILE
A file name used to seed the random number generator for SSL.
CURLOPT_RANGE
The range(s) of data to retrieve, in bytes, in the form "X-Y", where either X or Y may be omitted. HTTP transfers also support several ranges separated by commas, in the form "X-Y,N-M".
CURLOPT_REFERER
The contents of the "Referer:" header in the HTTP request.
CURLOPT_SSL_CIPHER_LIST
A list of ciphers to use for SSL. For example, RC4-SHA and TLSv1 are valid cipher lists.
CURLOPT_SSLCERT
A file name containing a certificate in PEM format.
CURLOPT_SSLCERTPASSWD
The password required to use the CURLOPT_SSLCERT certificate.
CURLOPT_SSLCERTTYPE
Type of certificate. Supported formats are "PEM" (default), "DER" and "ENG".
Added in cURL 7.9.3.
CURLOPT_SSLENGINE
The identifier of the crypto engine for the private SSL key specified in CURLOPT_SSLKEY.
CURLOPT_SSLENGINE_DEFAULT
The identifier of the crypto engine used for asymmetric crypto operations.
CURLOPT_SSLKEY
The name of a file containing the private SSL key.
CURLOPT_SSLKEYPASSWD
The password required to use the private SSL key specified in CURLOPT_SSLKEY.
Note:
Because this option contains a sensitive password, remember to keep the PHP script it appears in safe.
CURLOPT_SSLKEYTYPE
The key type of the private SSL key specified in CURLOPT_SSLKEY. Supported key types are "PEM" (default), "DER", and "ENG".
CURLOPT_URL
The URL to fetch. It can also be set when initializing the session with curl_init().
CURLOPT_USERAGENT
The contents of the "User-Agent:" header in the HTTP request.
CURLOPT_USERPWD
The username and password to use for the connection, in the format "[username]:[password]".
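Several of the string options above map directly onto request headers. The sketch below sets a few of them and builds the CURLOPT_COOKIE value from an array. The helper name cookie_header(), the URLs, and the user agent string are all placeholders invented for this example.

```php
<?php
// Hypothetical helper: turns ['fruit' => 'apple', 'color' => 'red'] into the
// "fruit=apple; color=red" form that CURLOPT_COOKIE expects.
function cookie_header(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}

$curl = curl_init();
curl_setopt_array($curl, [
    CURLOPT_URL            => 'http://example.com/',      // placeholder URL
    CURLOPT_USERAGENT      => 'MyFetcher/1.0',            // made-up agent string
    CURLOPT_REFERER        => 'http://example.com/start', // placeholder referer
    CURLOPT_COOKIE         => cookie_header(['fruit' => 'apple', 'color' => 'red']),
    CURLOPT_ENCODING       => '',  // advertise every encoding this build supports
    CURLOPT_RETURNTRANSFER => true,
]);
// curl_exec($curl) would send the request; it is left out so that the
// sketch runs without network access.
curl_close($curl);
```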
Fourth category:
For the following options, value should be an array. Each entry lists the option name, then its description and any remarks.
CURLOPT_HTTP200ALIASES
An array of HTTP 200 responses that are treated as valid responses rather than errors.
Added in cURL 7.10.3.
CURLOPT_HTTPHEADER
An array of HTTP header fields to set, in the form array('Content-type: text/plain', 'Content-length: 100')
CURLOPT_POSTQUOTE
An array of FTP commands to execute on the server after the FTP request has been performed.
CURLOPT_QUOTE
An array of FTP commands to execute on the server before the FTP request.
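CURLOPT_HTTPHEADER is probably the most commonly used of the array options, for example when posting JSON. The following is a minimal sketch: the helper header_lines(), the URL, and the payload are assumptions made for illustration.

```php
<?php
// Hypothetical helper: converts ['Name' => 'value'] pairs into the
// "Name: value" strings that CURLOPT_HTTPHEADER expects.
function header_lines(array $headers): array
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    return $lines;
}

// JSON body for a POST request (made-up payload).
$payload = json_encode(['username' => 'coder']);

$curl = curl_init('http://example.com/api'); // placeholder URL
curl_setopt_array($curl, [
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => $payload,      // a string, so it is sent as-is
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPHEADER     => header_lines([
        'Content-Type' => 'application/json',
        'Accept'       => 'application/json',
    ]),
]);
// curl_exec($curl) would send the request; it is omitted here so that the
// sketch runs without network access.
curl_close($curl);
```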
For the following options, value should be a stream resource (for example, one returned by fopen()). Each entry lists the option name, then its description.
CURLOPT_FILE
The file that the transfer is written to. The default is STDOUT (the browser window).
CURLOPT_INFILE
The file that the transfer is read from when uploading.
CURLOPT_STDERR
An alternative location to output errors to, instead of STDERR.
CURLOPT_WRITEHEADER
The file that the header part of the transfer is written to.
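CURLOPT_FILE is useful for saving a download without holding the whole body in memory. Here is a minimal sketch; the function name download_to_file() and the 30-second timeout are assumptions made for this example.

```php
<?php
// Hypothetical helper: streams a URL straight into a local file.
// Returns true when the transfer succeeded.
function download_to_file(string $url, string $path): bool
{
    $file = fopen($path, 'wb');
    if ($file === false) {
        return false;
    }

    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_FILE, $file);  // write the body to $file
    curl_setopt($curl, CURLOPT_TIMEOUT, 30);  // assumed limit, in seconds

    // With CURLOPT_FILE set, curl_exec() returns true on success.
    $ok = curl_exec($curl);
    curl_close($curl);
    fclose($file);

    return $ok === true;
}
```

CURLOPT_RETURNTRANSFER is deliberately not set here, because the body goes to the file rather than to the return value of curl_exec().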
For the following options, value should be the name of a callback function. Each entry lists the option name, then its description.
CURLOPT_HEADERFUNCTION
A callback that takes two parameters. The first is the cURL resource handle, and the second is a string with the header data to be written. The header data must be written by this callback, which returns the number of bytes written.
CURLOPT_PASSWDFUNCTION
A callback that takes three parameters. The first is the cURL resource handle, the second is a password prompt, and the third is the maximum password length. It returns the string containing the password.
CURLOPT_PROGRESSFUNCTION
A callback that takes five parameters in PHP 5.5 and later: the cURL resource handle, the total number of bytes expected to be downloaded, the number of bytes downloaded so far, the total number of bytes expected to be uploaded, and the number of bytes uploaded so far. Returning a non-zero value aborts the transfer. The callback is only called when CURLOPT_NOPROGRESS is set to FALSE.
CURLOPT_READFUNCTION
A callback that takes three parameters. The first is the cURL resource handle, the second is the stream resource provided to cURL through CURLOPT_INFILE, and the third is the maximum amount of data to read. The callback must return a string no longer than the amount requested, typically read from the stream resource. Returning an empty string signals EOF.
CURLOPT_WRITEFUNCTION
A callback that takes two parameters. The first is the cURL resource handle, and the second is a string with the data to be written. The data must be saved by this callback. It must return the exact number of bytes written, or the transfer is aborted with an error.
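The header and write callbacks can be sketched with closures. In the example below, the function name fetch_with_callbacks() and the shape of the returned array are assumptions made for illustration; each callback returns the number of bytes it handled, as the descriptions above require.

```php
<?php
// Hypothetical helper: fetches a URL while collecting headers and body
// through callbacks instead of CURLOPT_RETURNTRANSFER.
function fetch_with_callbacks(string $url): array
{
    $headers = [];
    $body    = '';

    $curl = curl_init($url);

    // Called once per header line; must return the number of bytes handled.
    curl_setopt($curl, CURLOPT_HEADERFUNCTION,
        function ($handle, string $line) use (&$headers): int {
            $trimmed = trim($line);
            if ($trimmed !== '') {
                $headers[] = $trimmed;
            }
            return strlen($line);
        });

    // Called for each chunk of the body; must return the chunk length.
    curl_setopt($curl, CURLOPT_WRITEFUNCTION,
        function ($handle, string $chunk) use (&$body): int {
            $body .= $chunk;
            return strlen($chunk);
        });

    $ok = curl_exec($curl); // true on success when a write callback is set
    curl_close($curl);

    return ['ok' => $ok, 'headers' => $headers, 'body' => $body];
}
```

Returning anything other than the exact length from the write callback makes cURL abort the transfer, which is one way to stop a download early.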