PHP's cURL library makes scraping web pages easy and effective: a script fetches a page, parses what it crawled, and extracts the data you want programmatically. Whether you want to pull part of the data from a link, fetch an XML file and import it into a database, or simply retrieve the content of a web page, cURL is a powerful PHP library.
cURL functions in PHP (Client URL Library)
curl_close - Close a cURL session
curl_copy_handle - Copy a cURL handle along with all of its options
curl_errno - Return the error number of the last error for the current session
curl_error - Return a string describing the last error for the current session
curl_exec - Execute a cURL session
curl_getinfo - Get information about a cURL transfer
curl_init - Initialize a cURL session
curl_multi_add_handle - Add an individual cURL handle to a cURL multi (batch) handle
curl_multi_close - Close a cURL multi handle
curl_multi_exec - Run the sub-connections of a cURL multi handle
curl_multi_getcontent - Return the content of a cURL handle if CURLOPT_RETURNTRANSFER is set
curl_multi_info_read - Get information about the transfers in a multi handle that have finished
curl_multi_init - Initialize a cURL multi handle
curl_multi_remove_handle - Remove an individual cURL handle from a cURL multi handle
curl_multi_select - Wait for activity on any connection in a cURL multi handle
curl_setopt_array - Set multiple options for a cURL session from an array
curl_setopt - Set an option for a cURL session
curl_version - Get cURL version information
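The curl_multi_* functions in the list above let one script drive several transfers at the same time. The following is a minimal sketch, not a production client: the name fetch_all() and the shape of its return value are hypothetical, and it assumes every URL should be fetched with CURLOPT_RETURNTRANSFER and no other options.

```php
<?php
// Minimal sketch: fetch several URLs in parallel with the curl_multi_* API.
// fetch_all() is a hypothetical helper, not part of the cURL extension.
function fetch_all(array $urls): array
{
    $mh = curl_multi_init();
    $handles = [];

    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    $results = [];
    do {
        // Let libcurl make progress on every transfer in the batch.
        $status = curl_multi_exec($mh, $running);

        // Collect each transfer as soon as it reports that it is done.
        while (($info = curl_multi_info_read($mh)) !== false) {
            $key = array_search($info['handle'], $handles, true);
            $results[$key] = [
                'result' => $info['result'], // 0 (CURLE_OK) on success
                'body'   => curl_multi_getcontent($info['handle']),
            ];
        }

        // Wait (up to 1 second) for socket activity before the next round.
        if ($running > 0) {
            curl_multi_select($mh, 1.0);
        }
    } while ($running > 0 && $status === CURLM_OK);

    foreach ($handles as $ch) {
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);

    return $results;
}
```

Reading results through curl_multi_info_read() rather than looping over the handles afterwards has one extra benefit: it is also what records each handle's error code, so curl_errno() on a finished handle reports the real result.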
curl_init() initializes a cURL session and returns a handle for it. Its only parameter is optional and specifies the URL to fetch.
curl_exec() executes a cURL session. Its only parameter is the handle returned by curl_init(). Depending on CURLOPT_RETURNTRANSFER, it either returns the fetched content as a string or prints it directly; it returns false on failure.
curl_close() closes a cURL session. Its only parameter is the handle returned by curl_init().
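Because curl_exec() returns false on failure, a script should check for that before using the result. Below is a minimal sketch of one complete init/exec/close cycle with error reporting; the helper name fetch_url(), the shape of the returned array, and the 10-second connect timeout are illustrative assumptions, not part of the cURL extension.

```php
<?php
// Hypothetical helper: one complete curl_init() / curl_exec() / curl_close() cycle.
function fetch_url(string $url): array
{
    $ch = curl_init($url);              // URL passed as the optional parameter
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => 10,   // seconds; an arbitrary illustrative value
    ]);

    $body = curl_exec($ch);             // false on failure
    $result = [
        'ok'        => $body !== false,
        'body'      => $body === false ? '' : $body,
        'errno'     => curl_errno($ch), // 0 when there was no error
        'error'     => curl_error($ch), // '' when there was no error
        'http_code' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
    ];

    curl_close($ch);
    return $result;
}
```

curl_errno(), curl_error() and curl_getinfo() all read state from the handle, so they have to be called before curl_close().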
Example 1: Basic example
<?php
// Initialize a cURL session
$curl = curl_init();
// Set the URL to fetch
curl_setopt($curl, CURLOPT_URL, 'http://www.cmx8.cn');
// Include the response headers in the output
curl_setopt($curl, CURLOPT_HEADER, 1);
// Return the response as a string instead of printing it directly
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
// Run cURL to request the web page
$data = curl_exec($curl);
// curl_exec() returns false on failure
if ($data === false) {
    echo 'cURL error: ' . curl_error($curl);
}
// Close the cURL session
curl_close($curl);
// Display the fetched data
var_dump($data);
?>
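With CURLOPT_HEADER set to 1, $data holds the raw response headers followed by the body. One way to separate them, sketched below, is to read the header length with curl_getinfo($curl, CURLINFO_HEADER_SIZE) before calling curl_close() and split the string at that offset. The function split_response() is a hypothetical helper, and the sample response is made up for illustration.

```php
<?php
// Hypothetical helper: split a raw response (headers + body) after $headerSize bytes.
// In a real script, $headerSize would come from curl_getinfo($curl, CURLINFO_HEADER_SIZE),
// which must be called before curl_close($curl).
function split_response(string $raw, int $headerSize): array
{
    return [
        'headers' => substr($raw, 0, $headerSize),
        'body'    => substr($raw, $headerSize),
    ];
}

// Made-up response used only to demonstrate the split.
$raw = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>";
// Stands in for CURLINFO_HEADER_SIZE: everything up to and including the blank line.
$headerSize = strpos($raw, "\r\n\r\n") + 4;
$parts = split_response($raw, $headerSize);
```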
Example 2: POST data
This example posts to sendSMS.php, a script that accepts two form fields: the phone number and the text of the message.
<?php
$phoneNumber = '13812345678';
$message = 'This message was generated by curl and php';
// Build the URL-encoded request body
$curlPost = 'pNUMBER=' . urlencode($phoneNumber) . '&MESSAGE=' . urlencode($message) . '&SUBMIT=Send';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.lxvoip.com/sendSMS.php');
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Send the request as a POST with the body built above
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $curlPost);
$data = curl_exec($ch);
curl_close($ch);
?>
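Building the POST body by hand with urlencode() works, but PHP's http_build_query() produces the same application/x-www-form-urlencoded string from an array, which is less error-prone as the number of fields grows. A short sketch using the field names from the example above:

```php
<?php
$phoneNumber = '13812345678';
$message = 'This message was generated by curl and php';

// Manual encoding, as in Example 2.
$manual = 'pNUMBER=' . urlencode($phoneNumber) . '&MESSAGE=' . urlencode($message) . '&SUBMIT=Send';

// The same string built from an array; like urlencode(), http_build_query()
// encodes spaces as '+' by default.
$built = http_build_query([
    'pNUMBER' => $phoneNumber,
    'MESSAGE' => $message,
    'SUBMIT'  => 'Send',
]);
```

Note the difference from passing the array itself to CURLOPT_POSTFIELDS: cURL then sends the fields as multipart/form-data rather than as a URL-encoded string.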
Example 3: Using a proxy server
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.cmx8.cn');
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Tunnel the request through the proxy (HTTP CONNECT)
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1);
// Proxy host:port and the proxy's username:password
curl_setopt($ch, CURLOPT_PROXY, 'proxy.lxvoip.com:1080');
curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'user:password');
$data = curl_exec($ch);
curl_close($ch);
?>
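Port 1080 is the conventional SOCKS port. If the proxy in this example is really a SOCKS5 proxy (an assumption; the example only gives a host and port), cURL has to be told so with CURLOPT_PROXYTYPE, since by default it treats CURLOPT_PROXY as an HTTP proxy. A sketch of that option set, using curl_setopt_array() and the placeholder host and credentials from the example:

```php
<?php
$ch = curl_init('http://www.cmx8.cn');

// Assumption: proxy.lxvoip.com:1080 speaks SOCKS5 rather than HTTP.
$ok = curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_PROXY          => 'proxy.lxvoip.com:1080',
    CURLOPT_PROXYTYPE      => CURLPROXY_SOCKS5,
    CURLOPT_PROXYUSERPWD   => 'user:password',
]);

// $data = curl_exec($ch); // requires the proxy to be reachable
curl_close($ch);
```

curl_setopt_array() returns true only if every option was accepted, so checking its result catches a misspelled or unsupported option early.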
Example 4: Simulated login
This script uses cURL to log in to a Discuz! forum. It targets Discuz! 7.0; just set the username and password fields to your own credentials.
<?php
/**
 * Log in to a Discuz! forum with cURL
 * Logging in when the forum requires a verification code (CAPTCHA) is not implemented yet
 */
!extension_loaded('curl') && die('The curl extension is not loaded.');
$discuz_url = 'http://www.lxvoip.com'; // Forum address
$login_url = $discuz_url . '/logging.php?action=login'; // Login page address
$get_url = $discuz_url . '/my.php?item=threads'; // "My threads" page (requires login)
$post_fields = array();
//The following two items do not need to be modified
$post_fields['loginfield'] = 'username';
$post_fields['loginsubmit'] = 'true';
//Username and password, must be filled in
$post_fields['username'] = 'lxvoip';
$post_fields['password'] = '88888888';
// Security question and answer
$post_fields['questionid'] = 0;
$post_fields['answer'] = '';
// @todo verification code
$post_fields['seccodeverify'] = '';
// Fetch the login page and extract the form's formhash token
$ch = curl_init($login_url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$contents = curl_exec($ch);
curl_close($ch);
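// Assumes the form contains <input type="hidden" name="formhash" value="..." />
preg_match('/<input[^>]*name="formhash"[^>]*value="([^"]*)"/i', $contents, $matches);
if (!empty($matches)) {
    $formhash = $matches[1];
    // Discuz! checks this token when the login form is submitted
    $post_fields['formhash'] = $formhash;
} else {
    die('formhash not found.');
}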
// POST the login form and save the session cookies to a file
$cookie_file = dirname(__FILE__) . '/cookie.txt';
// $cookie_file = tempnam(sys_get_temp_dir(), 'cookie');
$ch = curl_init($login_url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_fields);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);
curl_exec($ch);
curl_close($ch);
// Use the saved cookies to fetch a page that is only visible after logging in
$ch = curl_init($get_url);
curl_setopt($ch, CURLOPT_HEADER, 0);
// Return the page as a string so var_dump() shows it (0 would print it and return true)
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
$contents = curl_exec($ch);
curl_close($ch);
var_dump($contents);
?>
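The script above locates the formhash token with a regular expression, which breaks if the page lists the input's attributes in a different order. Below is a sketch of a more tolerant alternative that swaps in PHP's DOM extension (DOMDocument and DOMXPath) instead of preg_match(); extract_formhash() is a hypothetical name, and the sample markup is made up in the style of a Discuz! login form.

```php
<?php
// Hypothetical helper: read the value of <input name="formhash"> from a page,
// regardless of attribute order. Requires PHP's DOM extension.
function extract_formhash(string $html): ?string
{
    $previous = libxml_use_internal_errors(true); // tolerate messy real-world HTML
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }
    $nodes = (new DOMXPath($doc))->query('//input[@name="formhash"]/@value');
    return ($nodes !== false && $nodes->length > 0) ? $nodes->item(0)->nodeValue : null;
}

// Made-up markup with "value" before "name", which the regular expression would miss.
$html = '<form><input type="hidden" value="a1b2c3d4" name="formhash" /></form>';
$formhash = extract_formhash($html);
```

In the login script, the result would replace $matches[1], with a null return taking the place of the die() branch.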
Excerpted from: Fantasy Spring Continent