cURL is a powerful library that PHP exposes through its curl extension. With it you can fetch web pages and collect their content simply and efficiently, and by setting cookies you can simulate logging in to a site. cURL offers a rich set of functions; the cURL section of the PHP manual describes them in detail. This article uses a simulated login to Open Source China (oschina) as an example to show how cURL is used.
PHP's cURL functions are relatively efficient at fetching web pages and can run several transfers concurrently through the curl_multi_* functions (this is concurrency within one process, not multi-threading), whereas file_get_contents() is generally slightly less efficient. Note that the curl extension must be enabled before these functions can be used.
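The concurrent transfers mentioned above can be sketched with the curl_multi_* functions. This is a minimal illustration rather than a tuned crawler: the helper name fetch_all() and the 10-second timeout are assumptions made for this example, not part of the original article.

```php
<?php
// Hypothetical helper: fetch several URLs concurrently with the curl_multi_* API.
// Returns an array of response bodies keyed like the input array.
function fetch_all(array $urls): array
{
    if (!extension_loaded('curl')) {
        throw new RuntimeException('The curl extension is not enabled.');
    }

    $multi = curl_multi_init();
    $handles = [];

    foreach ($urls as $key => $url) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // collect the body instead of printing it
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // assumed timeout, in seconds
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none of them is still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0 && curl_multi_select($multi, 1.0) === -1) {
            usleep(1000); // select failed; back off briefly to avoid a busy loop
        }
    } while ($running > 0 && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);

    return $results;
}
```

Calling fetch_all(array('a' => $urlA, 'b' => $urlB)) would return the two bodies under the keys 'a' and 'b'; a failed transfer leaves an empty or null entry, so real code should check each result.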
Code in practice
Let's look at the login code first:
// Simulate the login
function login_post($url, $cookie, $post)
{
    $curl = curl_init(); // initialize a cURL handle
    curl_setopt($curl, CURLOPT_URL, $url); // URL the login form is submitted to
    curl_setopt($curl, CURLOPT_HEADER, 0); // do not include response headers in the output
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 0); // 0: print the response directly instead of returning it
    curl_setopt($curl, CURLOPT_COOKIEJAR, $cookie); // save the received cookies to this file
    curl_setopt($curl, CURLOPT_POST, 1); // submit with POST
    curl_setopt($curl, CURLOPT_POSTFIELDS, http_build_query($post)); // form data to submit
    curl_exec($curl); // execute the request
    curl_close($curl); // close the handle and release its resources
}
The function login_post() first creates a handle with curl_init(), then uses curl_setopt() to set the relevant options: the URL to submit to, the file in which cookies are saved, the POST data (user name, password and so on), whether the response is returned or printed, and so on. curl_exec() then executes the request, and finally curl_close() releases the resources. Note that PHP's built-in http_build_query() converts an array into a URL-encoded query string.
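To make the role of http_build_query() concrete, here is a small offline sketch. The field values are placeholders invented for this example, not real credentials.

```php
<?php
// Hypothetical form fields; the values are placeholders, not real credentials.
$post = [
    'email' => 'user@example.com',
    'pwd'   => 'p@ss word',
];

// Passing '&' explicitly avoids depending on the arg_separator.output ini setting.
$body = http_build_query($post, '', '&');

echo $body, PHP_EOL; // email=user%40example.com&pwd=p%40ss+word
```

Special characters such as "@" and spaces are percent-encoded, which is why the array is passed through http_build_query() before being handed to CURLOPT_POSTFIELDS.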
Next, once the login has succeeded, we need to fetch a page that is only available after logging in.
// Fetch data after a successful login
function get_content($url, $cookie)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the response as a string
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie); // read the cookies saved at login
    $rs = curl_exec($ch); // execute the request and fetch the page content
    curl_close($ch);
    return $rs;
}
The function get_content() follows the same pattern: it initializes a handle, sets the relevant options, executes the request, and releases the resources. Here CURLOPT_RETURNTRANSFER is set to 1 so that curl_exec() returns the response as a string instead of printing it, and CURLOPT_COOKIEFILE makes cURL send the cookies that were saved during login. The function finally returns the page content.
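Because curl_exec() returns false when a transfer fails, it can help to check the result before using it. The variant below is only a sketch: the name get_content_checked(), the 10-second timeout, and the choice to treat HTTP status codes of 400 and above as errors are assumptions made for this example.

```php
<?php
// Hypothetical variant of get_content() that reports failures instead of hiding them.
function get_content_checked(string $url, string $cookie): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);   // return the body as a string
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie); // send the cookies saved at login
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);         // assumed timeout, in seconds

    $body = curl_exec($ch);
    if ($body === false) {
        // Read the error message before the handle is closed.
        $message = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException('cURL request failed: ' . $message);
    }

    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    if ($status >= 400) {
        throw new RuntimeException('Unexpected HTTP status: ' . $status);
    }

    return $body;
}
```

A caller can wrap the function in try/catch and decide whether to retry, log in again, or give up.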
Our ultimate goal is to obtain information that is normally available only after a successful login. Next, we take the mobile version of Open Source China as an example to see how to fetch such information once the simulated login has succeeded.
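// Data to POST
$post = array(
    'email'      => 'your oschina account',
    'pwd'        => 'your oschina password',
    'goto_page'  => '/my',
    'error_page' => '/login',
    'save_login' => '1',
    'submit'     => '现在登录' // the form's submit value ("Log in now"), sent as-is
);

// Login URL
$url = "http://m.oschina.net/action/user/login";
// Path of the file in which the cookies are saved
$cookie = dirname(__FILE__) . '/cookie_oschina.txt';
// URL of the page to fetch after logging in
$url2 = "http://m.oschina.net/my";

// Simulate the login
login_post($url, $cookie, $post);
// Fetch the page that requires a login
$content = get_content($url2, $cookie);
// Delete the cookie file
@unlink($cookie);

// Match the part of the page we want (non-greedy, so the match stops at the first </td>)
$preg = "/<td class='portrait'>(.*?)<\/td>/is";
preg_match_all($preg, $content, $arr);
// Fall back to an empty string when nothing matched (for example, if the login failed)
$str = isset($arr[1][0]) ? $arr[1][0] : '';
// Output the result
echo $str;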
After running the code above, the script outputs the avatar image of the logged-in user.
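The extraction step can be tried offline on a sample string. The sketch below uses made-up markup in place of the fetched page, and the helper name extract_portrait() is hypothetical. It uses a non-greedy group so the match stops at the first closing td tag.

```php
<?php
// Hypothetical helper: return the inner HTML of the first portrait cell, or null.
function extract_portrait(string $html): ?string
{
    // Non-greedy (.*?) stops at the first </td>; the "s" flag lets "." span newlines.
    if (preg_match("/<td class='portrait'>(.*?)<\/td>/is", $html, $m) === 1) {
        return $m[1];
    }
    return null;
}

// Made-up sample markup, standing in for the fetched page.
$sample = "<tr><td class='portrait'><img src='/avatar.png'></td><td>name</td></tr>";

echo extract_portrait($sample), PHP_EOL;       // <img src='/avatar.png'>
var_dump(extract_portrait('<p>no match</p>')); // NULL
```

Returning null when nothing matches lets the caller distinguish "login failed or page layout changed" from an empty cell.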
Usage summary
1. Initialize a handle with curl_init;
2. Use curl_setopt to set the target URL and other options;
3. Execute the request with curl_exec;
4. After execution, close the handle with curl_close;
5. Output the data.
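The five steps above can be sketched in one short function. The helper name fetch_page() and the timeout value are assumptions for illustration, and curl_setopt_array() is used here simply as a compact alternative to repeated curl_setopt() calls.

```php
<?php
// Hypothetical helper that walks through the five steps in one place.
// Returns the response body, or null if the transfer failed.
function fetch_page(string $url): ?string
{
    $ch = curl_init();                     // 1. initialize
    curl_setopt_array($ch, [               // 2. set the URL and other options
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,    // return the body instead of printing it
        CURLOPT_TIMEOUT        => 10,      // assumed timeout, in seconds
    ]);
    $body = curl_exec($ch);                // 3. execute
    curl_close($ch);                       // 4. close (has no effect since PHP 8.0, where handles are freed automatically)
    return $body === false ? null : $body; // 5. hand the data back to the caller
}
```

The caller then outputs or processes the result, for example with echo after checking that it is not null.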
References
"Introduction to curl and curl in php", author unknown, http://www.2cto.com/kf/201208/147091.html
"POST data using PHP CURL", author: Veda, http://www.nowamagic.net/librarys/veda/detail/124
"php uses curl to simulate logging in to discuz and simulate posting", author: tianxin, http://www.cnblogs.com/tianxin2001x/archive/2009/10/28/1591311.html