
Using the single-page collection function get_html (curl data collection series)

Jul 21, 2016 03:11 PM

This is a series that will take more than a day or two to write, so I will publish it one part at a time.

Outline:

1. curl data collection series: single-page collection function get_html

2. curl data collection series: multi-page parallel collection function get_htmls

3. curl data collection series: regular expression processing function get_matches

4. curl data collection series: code separation

5. curl data collection series: parallel logic control function web_spider


Single-page collection is the most commonly used function in data collection. When a server restricts access, it is sometimes the only approach that works. It is slow but easy to control, so it is worth writing a reusable curl function for it.

Baidu and NetEase are familiar sites, so the examples below collect the homepages of these two websites.


The simplest version:


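$url = 'http://www.baidu.com';
$ch = curl_init($url);
// Return the response body as a string instead of printing it directly
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Give up if the request takes longer than 5 seconds
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
$html = curl_exec($ch);
curl_close($ch);
// curl_exec() returns false on failure
if ($html !== false) {
    echo $html;
}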
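If curl_exec() fails, the snippet above simply prints nothing, so a failure looks the same as an empty page. A minimal sketch of one way to see the reason, using curl_errno() and curl_error(); the function name fetch_with_error and the shape of its return array are illustrative assumptions, not part of this series:

```php
<?php
// Hypothetical helper (not from the series): fetch a page and, on failure,
// report why. Returns ['html' => string|false, 'error' => string], where
// 'error' is '' when the request succeeded.
function fetch_with_error($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    $html = curl_exec($ch);
    $error = '';
    if ($html === false) {
        // curl_errno() gives the numeric libcurl code, curl_error() a readable message
        $error = curl_errno($ch) . ': ' . curl_error($ch);
    }
    curl_close($ch);
    return array('html' => $html, 'error' => $error);
}

// Usage (needs network access):
// $result = fetch_with_error('http://www.baidu.com');
// echo $result['error'] !== '' ? $result['error'] : $result['html'];
```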

Because this is used so often, it can be wrapped in a function, with curl_setopt_array applying all the options in one call:

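function get_html($url, $options = array()) {
    // Always return the body as a string and cap each request at 5 seconds;
    // these two settings overwrite any values the caller passed in $options
    $options[CURLOPT_RETURNTRANSFER] = true;
    $options[CURLOPT_TIMEOUT] = 5;
    $ch = curl_init($url);
    // Apply every option in one call
    curl_setopt_array($ch, $options);
    $html = curl_exec($ch);
    curl_close($ch);
    // curl_exec() returns false on failure
    if ($html === false) {
        return false;
    }
    return $html;
}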

Usage:

$url = 'http://www.baidu.com';
echo get_html($url);
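In get_html, the two assignments run after the caller's options arrive, so a CURLOPT_TIMEOUT passed in $options is silently replaced by 5. A minimal sketch of one alternative, assuming you want callers to be able to override the defaults: PHP's array union operator `+` keeps the keys of the left-hand array, so putting $options on the left lets the caller's values win. The name get_html_defaults is hypothetical:

```php
<?php
// Hypothetical variant of get_html: the defaults apply only when the caller
// has not already set the same option.
function get_html_defaults($url, $options = array()) {
    $defaults = array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT => 5,
    );
    // Array union: keys already present in $options take precedence
    $options = $options + $defaults;
    $ch = curl_init($url);
    curl_setopt_array($ch, $options);
    $html = curl_exec($ch);
    curl_close($ch);
    // false on failure, otherwise the page body
    return $html;
}

// Usage (needs network access):
// echo get_html_defaults('http://www.163.com', array(CURLOPT_TIMEOUT => 10));
```

The trade-off is that a caller can now also switch CURLOPT_RETURNTRANSFER off, which the original function prevents; whether that flexibility is wanted is a design choice.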

Sometimes you need to pass specific options to get the correct page. For example, suppose you now want to collect the NetEase homepage:

$url = 'http://www.163.com';
echo get_html($url);

The output is blank. To find out why, write a function around curl_getinfo that returns the details of the transfer:

function get_info($url, $options = array()) {
    $options[CURLOPT_RETURNTRANSFER] = true;
    $options[CURLOPT_TIMEOUT] = 5;
    $ch = curl_init($url);
    curl_setopt_array($ch, $options);
    curl_exec($ch);
    // Details of the transfer: final URL, HTTP status code, timings, etc.
    $info = curl_getinfo($ch);
    curl_close($ch);
    return $info;
}
$url = 'http://www.163.com';
var_dump(get_info($url));
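get_info discards the page body, so diagnosing a page and then collecting it costs two requests. A sketch that returns both from a single curl_exec() call; the function name get_html_info and the 'html'/'info' keys of its return array are hypothetical:

```php
<?php
// Hypothetical helper: one request, returning both the body and the transfer info.
function get_html_info($url, $options = array()) {
    $options[CURLOPT_RETURNTRANSFER] = true;
    $options[CURLOPT_TIMEOUT] = 5;
    $ch = curl_init($url);
    curl_setopt_array($ch, $options);
    $html = curl_exec($ch);
    // Without a second argument, curl_getinfo() returns an associative array
    // with keys such as 'url', 'http_code' and 'redirect_count'
    $info = curl_getinfo($ch);
    curl_close($ch);
    return array('html' => $html, 'info' => $info);
}

// Usage (needs network access):
// $result = get_html_info('http://www.163.com');
// if ($result['info']['http_code'] == 302) { echo 'redirected'; }
```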

The http_code field is 302, which means the server answered with a redirect. To follow it, pass CURLOPT_FOLLOWLOCATION:


$url = 'http://www.163.com';
$options[CURLOPT_FOLLOWLOCATION] = true;
echo get_html($url,$options);
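With CURLOPT_FOLLOWLOCATION on, libcurl keeps following Location headers. A sketch that caps the number of hops with CURLOPT_MAXREDIRS and reports where the request ended up via CURLINFO_EFFECTIVE_URL and CURLINFO_REDIRECT_COUNT; the helper name get_final_url and the default limit of 5 are assumptions for illustration:

```php
<?php
// Hypothetical helper: follow redirects (at most $max_redirects hops) and
// return the final address and hop count along with the body.
function get_final_url($url, $max_redirects = 5) {
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT => 5,
        CURLOPT_FOLLOWLOCATION => true,
        // Stop after this many hops instead of following a redirect loop forever
        CURLOPT_MAXREDIRS => $max_redirects,
    ));
    $html = curl_exec($ch);
    $final_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    $redirects = curl_getinfo($ch, CURLINFO_REDIRECT_COUNT);
    curl_close($ch);
    return array('html' => $html, 'final_url' => $final_url, 'redirects' => $redirects);
}

// Usage (needs network access):
// $r = get_final_url('http://www.163.com');
// echo $r['redirects'], ' redirect(s), ended at ', $r['final_url'];
```

If the limit is exceeded, curl_exec() returns false, so 'html' is false in that case.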

But the page that comes back is different from the one a browser on our computer receives. Why?

It seems the request still does not give the server enough information to tell what kind of device the client is, so it returns a generic version.

The missing piece appears to be the User-Agent, which can be set with CURLOPT_USERAGENT:


$url = 'http://www.163.com';
$options[CURLOPT_FOLLOWLOCATION] = true;
$options[CURLOPT_USERAGENT] = 'Mozilla/5.0 (Windows NT 6.1; rv:19.0) Gecko/20100101 Firefox/19.0';
echo get_html($url, $options);
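Since the same redirect and User-Agent options are needed whenever a site tailors its output to the client, it can help to keep them in one place. A minimal sketch; the function name desktop_browser_options is hypothetical, and the User-Agent string is the Firefox 19 string from the example above, which a given site may or may not still treat as a desktop browser:

```php
<?php
// Hypothetical preset: options that make a request look like a desktop browser.
// Options passed in $extra are merged in, and keys set there take precedence.
function desktop_browser_options($extra = array()) {
    $preset = array(
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_USERAGENT => 'Mozilla/5.0 (Windows NT 6.1; rv:19.0) Gecko/20100101 Firefox/19.0',
    );
    return $extra + $preset;
}

// Usage with get_html (needs network access):
// echo get_html('http://www.163.com', desktop_browser_options());
```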

OK, now the expected page comes back. By passing extra options like these, the get_html function can handle such extended cases.

Of course, there are other ways to achieve this. If you already know the exact address of the NetEase page, you can simply collect it directly:


$url = 'http://www.163.com/index.html';
echo get_html($url);

This also collects the page normally.
