cURL is a powerful library for transferring data over HTTP and other protocols, and PHP exposes it through the cURL extension. With it, you can fetch web pages and collect their content simply and efficiently, and you can set cookies to simulate logging in to a site. The extension provides a wealth of functions, all of which are documented in the PHP manual. This article uses a simulated login to OSChina (Open Source China) as an example of how to use cURL.
PHP's cURL functions are relatively efficient for crawling web pages and support concurrent transfers through the curl_multi_* functions, while file_get_contents() is generally considered slightly less efficient. Note that the curl extension must be enabled before these functions can be used.
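Since everything that follows depends on the extension being enabled, it can help to check for it up front. The snippet below is a minimal sketch; the helper name curl_available() is an assumption made for illustration and is not part of PHP or of the original article.

```php
<?php
// Hypothetical helper (not a PHP built-in): reports whether the cURL
// extension is loaded and its entry point function is callable.
function curl_available()
{
    return extension_loaded('curl') && function_exists('curl_init');
}

if (!curl_available()) {
    // Typically fixed by enabling "extension=curl" in php.ini.
    echo "The cURL extension is not enabled.\n";
}
```

Running this check before the login code gives a clear message instead of a fatal "undefined function" error.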
Hands-on code
First let’s look at the login code:
// Simulate login
function login_post($url, $cookie, $post) {
    $curl = curl_init(); // Initialize a cURL handle
    curl_setopt($curl, CURLOPT_URL, $url); // Address the login form is submitted to
    curl_setopt($curl, CURLOPT_HEADER, 0); // Do not include response headers in the output
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 0); // 0: print the response directly instead of returning it
    curl_setopt($curl, CURLOPT_COOKIEJAR, $cookie); // Save received cookies to the specified file
    curl_setopt($curl, CURLOPT_POST, 1); // Submit with the POST method
    curl_setopt($curl, CURLOPT_POSTFIELDS, http_build_query($post)); // Data to be submitted
    curl_exec($curl); // Execute the request
    curl_close($curl); // Close the cURL handle and release its resources
}
The login_post() function first calls curl_init() to initialize a cURL handle, then uses curl_setopt() to set the relevant options: the URL to submit to, the file in which cookies are saved, the POST data (user name and password), and whether the response is returned or printed. curl_exec() then executes the request, and curl_close() releases the resources. Note that PHP's built-in http_build_query() converts an array into a URL-encoded query string.
Next, once the login succeeds, we need to fetch a page that is only available after logging in.
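To make the role of http_build_query() concrete, here is a small sketch of what it produces for a POST array. The credentials are made-up placeholder values, not real account data.

```php
<?php
// Placeholder form fields, for illustration only.
$post = array(
    'email' => 'user@example.com',
    'pwd'   => 'p@ss word',
);

// Builds a URL-encoded string suitable for CURLOPT_POSTFIELDS.
$body = http_build_query($post);

echo $body, "\n"; // prints: email=user%40example.com&pwd=p%40ss+word
```

Special characters such as "@" and spaces are encoded, so the string can be sent safely as an application/x-www-form-urlencoded body.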
// Get data after a successful login
function get_content($url, $cookie) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie); // Read cookies from the saved file
    $rs = curl_exec($ch); // Execute the request and capture the page content
    curl_close($ch);
    return $rs;
}
The get_content() function likewise initializes cURL, sets the relevant options, executes the request, and releases the resources. Here CURLOPT_RETURNTRANSFER is set to 1 so that curl_exec() returns the response as a string instead of printing it, and CURLOPT_COOKIEFILE reads the cookies saved during login. The function finally returns the page content.
Our ultimate goal is to obtain information that is only available after a successful login. Next, we log in to the mobile version of OSChina as an example to see how to capture such information.
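Note that curl_exec() returns false when the request fails, which get_content() would pass straight back to the caller. The following is one possible sketch of a variant that reports failures. The function name get_content_checked(), the timeout values, and the choice to treat HTTP status 400 and above as an error are all assumptions for illustration, not part of the original code.

```php
<?php
// Hypothetical variant of get_content() that throws on failure
// instead of silently returning false.
function get_content_checked($url, $cookie)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie); // Read cookies saved at login
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);   // Assumed value, in seconds
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);         // Assumed value, in seconds

    $rs = curl_exec($ch);
    if ($rs === false) {
        // Transport-level failure: DNS, connection refused, timeout, etc.
        $message = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException('cURL request failed: ' . $message);
    }

    // The request completed; inspect the HTTP status code.
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    if ($status >= 400) {
        throw new RuntimeException('Unexpected HTTP status: ' . $status);
    }

    return $rs;
}
```

A caller can wrap the call in try/catch and decide whether to retry, log the error, or log in again.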
// Set the POST data
$post = array(
    'email' => 'oschina account',
    'pwd' => 'oschina password',
    'goto_page' => '/my',
    'error_page' => '/login',
    'save_login' => '1',
    'submit' => 'Log in now'
);
// Login address
$url = "http://m.oschina.net/action/user/login";
// Set the cookie storage path
$cookie = dirname(__FILE__) . '/cookie_oschina.txt';
// The address to fetch after logging in
$url2 = "http://m.oschina.net/my";
// Simulate login
login_post($url, $cookie, $post);
// Get the page content after logging in
$content = get_content($url2, $cookie);
// Delete the cookie file
@unlink($cookie);
//Matching page information
$preg = "/
After running the above code, we obtain the avatar image of the logged-in user.
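The matching pattern used by the article is not reproduced here. Purely as a hypothetical illustration of the matching step, the sketch below pulls an image address out of a made-up HTML fragment with preg_match(). Both the sample markup and the pattern are assumptions and do not reflect the actual structure of the OSChina page.

```php
<?php
// Assumed sample markup; the real page structure is not shown in the article.
$content = '<div class="user"><img class="avatar" src="/img/portrait.gif" alt="me"></div>';

// Hypothetical pattern: capture the src attribute of the first <img> tag.
$pattern = '/<img[^>]*\ssrc="([^"]*)"/i';

if (preg_match($pattern, $content, $matches)) {
    echo $matches[1], "\n"; // prints: /img/portrait.gif
}
```

For anything beyond a quick extraction, an HTML parser such as PHP's DOMDocument is usually more robust than a regular expression.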
Usage summary
1. Initialize cURL with curl_init();
2. Use curl_setopt() to set the target URL and other options;
3. Execute the request with curl_exec();
4. After execution, close the handle with curl_close();
5. Output the data.
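The five steps above can be sketched as a single function. This is a minimal illustration rather than a complete implementation: the name fetch_page() and the timeout values are assumptions, and curl_setopt_array() is used here only as a compact alternative to repeated curl_setopt() calls.

```php
<?php
// Hypothetical helper that follows the five steps in order.
function fetch_page($url)
{
    // 1. Initialize cURL.
    $ch = curl_init();

    // 2. Set the target URL and other options in one call.
    curl_setopt_array($ch, array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true, // Return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => 5,    // Assumed value, in seconds
        CURLOPT_TIMEOUT        => 10,   // Assumed value, in seconds
    ));

    // 3. Execute the request; the result is false on failure.
    $body = curl_exec($ch);

    // 4. Close the handle.
    curl_close($ch);

    // 5. Hand the data back to the caller for output.
    return $body;
}
```

A caller would check the result against false before echoing it, for example: $html = fetch_page($url); if ($html !== false) { echo $html; }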