


Using the single-page collection function get_html (curl data collection series)
This is a series that I cannot finish in a day or two, so I will publish it one part at a time.
General outline:
1. curl data collection series: single-page collection function get_html
2. curl data collection series: multi-page parallel collection function get_htmls
3. curl data collection series: regular expression processing function get_matches
4. curl data collection series: code separation
5. curl data collection series: parallel logic control function web_spider
Single-page collection is the most commonly used operation in data collection. When the server restricts access, it is sometimes the only method available. It is slow but easy to control, so it is well worth writing a reusable curl function for it.
We are all familiar with Baidu and NetEase, so we will use the homepages of these two websites as examples.
The simplest version:
$url = 'http://www.baidu.com';
$ch = curl_init($url);
// Return the response as a string instead of printing it directly
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Give up after 5 seconds
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
$html = curl_exec($ch);
if ($html !== false) {
    echo $html;
}
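When curl_exec returns false, the snippet above prints nothing, which makes failures hard to tell apart. A minimal sketch of how the reason could be reported with curl_errno and curl_error follows; the function name fetch_with_error and the shape of its return array are assumptions made for illustration, not part of the original series.

```php
<?php
// Hypothetical helper (not from the original article): fetch a page and
// return the curl error state alongside the result.
function fetch_with_error($url, $timeout = 5)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    $html = curl_exec($ch);
    // Read the error state before releasing the handle.
    $errno = curl_errno($ch);
    $error = curl_error($ch);
    curl_close($ch);
    return array('html' => $html, 'errno' => $errno, 'error' => $error);
}
```

A caller can check whether the 'html' entry is false and, if so, print 'errno' and 'error' to see whether the request timed out, could not connect, and so on.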
Because this is used so often, you can wrap it in a function with curl_setopt_array:
function get_html($url, $options = array()) {
    // These two options are always set, whatever the caller passes
    $options[CURLOPT_RETURNTRANSFER] = true;
    $options[CURLOPT_TIMEOUT] = 5;
    $ch = curl_init($url);
    curl_setopt_array($ch, $options);
    $html = curl_exec($ch);
    curl_close($ch);
    // curl_exec returns false on failure, otherwise the page content
    return $html;
}
$url = 'http://www.baidu.com';
echo get_html($url);
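Note that get_html assigns CURLOPT_RETURNTRANSFER and CURLOPT_TIMEOUT after receiving $options, so a timeout passed by the caller is overwritten. If you would rather treat those two values as defaults that the caller can override, one possible approach is the array union operator. The sketch below is only an illustration; merge_curl_options is a hypothetical name that does not appear in the original series.

```php
<?php
// Hypothetical helper: combine caller options with the article's defaults.
function merge_curl_options($options = array())
{
    $defaults = array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 5,
    );
    // The + operator keeps the left-hand value when a key exists on both
    // sides and preserves the integer CURLOPT_* keys. array_merge() would
    // renumber integer keys, so it is not suitable here.
    return $options + $defaults;
}

$merged = merge_curl_options(array(CURLOPT_TIMEOUT => 10));
var_dump($merged[CURLOPT_TIMEOUT]);        // int(10)
var_dump($merged[CURLOPT_RETURNTRANSFER]); // bool(true)
```

The trade-off is that a caller could now also switch off CURLOPT_RETURNTRANSFER, so the fixed assignments in get_html are the safer choice when the function must always return a string.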
Sometimes you need to pass some specific parameters to get the correct page. For example, now you want to get the NetEase page:
$url = 'http://www.163.com';
echo get_html($url);
You will see a blank page with nothing on it. Write a function around curl_getinfo to see what happened:
function get_info($url, $options = array()) {
    $options[CURLOPT_RETURNTRANSFER] = true;
    $options[CURLOPT_TIMEOUT] = 5;
    $ch = curl_init($url);
    curl_setopt_array($ch, $options);
    $html = curl_exec($ch);
    // Collect details about the transfer (status code, timings, URLs)
    $info = curl_getinfo($ch);
    curl_close($ch);
    return $info;
}
$url = 'http://www.163.com';
var_dump(get_info($url));
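The array returned by curl_getinfo has many entries, and the interesting ones are easy to miss in a var_dump. As a rough sketch, the few fields that matter for diagnosing a redirect can be reduced to one line. The helper name summarize_info and the sample values are assumptions for illustration; the keys 'url', 'http_code' and 'redirect_url' are entries of the curl_getinfo array.

```php
<?php
// Hypothetical helper: reduce the curl_getinfo() array to one readable line.
function summarize_info($info)
{
    $line = $info['http_code'] . ' ' . $info['url'];
    // redirect_url is filled in when the server answered with a redirect
    // that curl did not follow.
    if (!empty($info['redirect_url'])) {
        $line .= ' -> ' . $info['redirect_url'];
    }
    return $line;
}

// Sample values for illustration only, not real output from www.163.com.
$sample = array(
    'url'          => 'http://example.com/',
    'http_code'    => 302,
    'redirect_url' => 'http://example.com/index.html',
);
echo summarize_info($sample), "\n";
```

With real data, the same helper can be given the result of get_info($url) directly.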
You can see that http_code is 302, meaning the request was redirected. In this case you need to pass an extra option:
$url = 'http://www.163.com';
$options[CURLOPT_FOLLOWLOCATION] = true;
echo get_html($url,$options);
You will notice that this page differs from the one we see when visiting from our own computer. Why?
It seems the options are still not enough for the server to determine what kind of device our client is, so it returns a plain version.
It looks like we need to send a USERAGENT as well:
$url = 'http://www.163.com';
$options[CURLOPT_FOLLOWLOCATION] = true;
$options[CURLOPT_USERAGENT] = 'Mozilla/5.0 (Windows NT 6.1; rv:19.0) Gecko/20100101 Firefox/19.0';
echo get_html($url,$options);
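Since the redirect and user agent options tend to be needed together, they could be bundled into a preset. The sketch below is one possible arrangement, not the author's method: browser_options is a hypothetical name, and the CURLOPT_MAXREDIRS limit of 5 is an assumed value added so that a redirect loop ends quickly.

```php
<?php
// Hypothetical preset: options that make the request look like a desktop
// browser and follow a bounded number of redirects.
function browser_options($maxRedirects = 5)
{
    return array(
        CURLOPT_FOLLOWLOCATION => true,
        // Assumed limit, so that a redirect loop ends quickly.
        CURLOPT_MAXREDIRS      => $maxRedirects,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 6.1; rv:19.0) Gecko/20100101 Firefox/19.0',
    );
}
```

The preset can then be passed straight to the function from this article, for example get_html($url, browser_options()).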
OK, now the page comes out. Extended with options in this way, the get_html function can cover most needs.
Of course, there are other ways to achieve this. When you know the exact address of the NetEase page, you can simply collect it directly:
$url = 'http://www.163.com/index.html';
echo get_html($url);
This also collects the page normally.
