In everyday development we often need to fetch the content of a page, but some pages require a login before they can be accessed. Forums are the most common case. In that situation we can use curl to simulate the login. The general idea: first send a login request and save the cookies it returns, then send a second request with the saved cookies to get the page content. Let's go straight to the code:
<?php
/**
 * @Brief Simulate a login with curl in PHP: obtain the cookie, then send requests with it
 * @Date: 2016/7/2
 * @Time: 9:41
 */

// Where to store the cookies (note the leading slash, so the file lands inside this directory)
$cookieFile = dirname(__FILE__) . '/cookie.curl.tmp';

// Step one: log in and obtain the cookie
$url = 'http://www.pythontab.com';
$data = array(
    'username' => 'pythontab',
    'password' => 'pythontab',
);
// Initialize curl
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
// Send a POST request
curl_setopt($ch, CURLOPT_POST, true);
// Do not include the response headers in the output
curl_setopt($ch, CURLOPT_HEADER, 0);
// POST data
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
// File the cookies are written to
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
// Return the response as a string instead of printing it directly
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Execute the request
$ret = curl_exec($ch);
// Close the connection
curl_close($ch);

// Step two: request the login-protected page, sending the saved cookie
$url = 'http://www.pythontab.com';
// Initialize curl
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
// A plain GET request is enough here, since there is no form data to send
// Do not include the response headers in the output
curl_setopt($ch, CURLOPT_HEADER, 0);
// File the cookies are read from. Unlike step one, which writes the cookies, this reads them
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);
// Return the response as a string instead of printing it directly
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Execute the request
$ret = curl_exec($ch);
// Close the connection
curl_close($ch);

// Print the fetched content
var_dump($ret);
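Before sending the second request, it can help to check that the first step really saved a cookie. The sketch below is one possible way to do that, not part of the original code: `readCookieJar` is a hypothetical helper name, and it assumes the file is in the Netscape cookie format that libcurl writes for `CURLOPT_COOKIEJAR` (seven tab-separated fields per line, with HttpOnly cookies prefixed by `#HttpOnly_`).

```php
<?php
/**
 * Hypothetical helper: read a Netscape-format cookie jar into a
 * name => value array. Returns an empty array if the file is missing.
 */
function readCookieJar(string $cookieFile): array
{
    $cookies = array();
    if (!is_readable($cookieFile)) {
        return $cookies;
    }
    foreach (file($cookieFile, FILE_IGNORE_NEW_LINES) as $line) {
        $line = rtrim($line, "\r");
        if ($line === '') {
            continue;
        }
        if (strpos($line, '#HttpOnly_') === 0) {
            // HttpOnly cookies are stored behind this prefix; strip it
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line[0] === '#') {
            // Any other line starting with "#" is a comment
            continue;
        }
        // Fields: domain, subdomain flag, path, secure, expiry, name, value
        $fields = explode("\t", $line);
        if (count($fields) === 7) {
            $cookies[$fields[5]] = $fields[6];
        }
    }
    return $cookies;
}
```

Calling `readCookieJar($cookieFile)` after step one and getting an empty array back usually means the login request did not set any cookie, so the second request would most likely be treated as not logged in. Which cookie name marks a valid session depends on the site.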
With the two steps above, we can fetch content from pages that can only be accessed after logging in. Note that the address above is just an example; you need to replace it with the address of the page you want to crawl. We can do a lot of things this way, but don't do bad things!
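As a variation, both steps can share a single curl handle, because libcurl keeps the cookies in memory once its cookie engine is enabled. The following is only a minimal sketch under some assumptions: `fetchAfterLogin` is a hypothetical function name, the login form is assumed to accept ordinary URL-encoded fields, and the login URL and target URL are passed separately because on many sites they differ. It also checks the return value of `curl_exec`, which is `false` when the request fails.

```php
<?php
/**
 * Hypothetical helper: log in and fetch a protected page on one curl handle.
 * Throws RuntimeException if either request fails at the transport level.
 */
function fetchAfterLogin(string $loginUrl, array $credentials, string $targetUrl, string $cookieFile): string
{
    $ch = curl_init();
    // Read existing cookies from the file and write them back when the handle is freed
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
    // Return responses as strings instead of printing them
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);

    // Step one: POST the credentials; cookies from the response stay on the handle
    curl_setopt($ch, CURLOPT_URL, $loginUrl);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($credentials));
    if (curl_exec($ch) === false) {
        throw new RuntimeException('Login request failed: ' . curl_error($ch));
    }

    // Step two: switch back to GET and request the protected page
    curl_setopt($ch, CURLOPT_URL, $targetUrl);
    curl_setopt($ch, CURLOPT_HTTPGET, true);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Page request failed: ' . curl_error($ch));
    }
    // The handle is released when the function returns
    return $body;
}
```

A successful transfer does not prove that the login itself worked: a site may answer a wrong password with an ordinary page, so the returned content still needs to be checked for whatever marks a logged-in session on that site.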