
Using the single-page collection function get_html for curl-based data collection (PHP tutorial)

Jul 21, 2016 pm 03:11 PM

This is a series that cannot be finished in a day or two, so I will publish it one part at a time.

General outline:

1. curl data collection series: single-page collection function get_html

2. curl data collection series: multi-page parallel collection function get_htmls

3. curl data collection series: regular expression processing function get_matches

4. curl data collection series: code separation

5. curl data collection series: parallel logic control function web_spider


Single-page collection is the most commonly used operation in data collection. When the target server restricts access, it is sometimes the only method available. It is slow but easy to control, so it is worth writing a reusable curl function for it.

Baidu and NetEase are familiar to everyone, so the homepages of these two sites serve as the examples.


The simplest way to write it:

    $url = 'http://www.baidu.com';
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the page as a string instead of printing it
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);           // give up after 5 seconds
    $html = curl_exec($ch);
    curl_close($ch);
    if ($html !== false) {
        echo $html;
    }

Because this is used so often, it can be wrapped in a function with curl_setopt_array:

    function get_html($url, $options = array()) {
        // These two settings are always applied, even if the caller passes other values
        $options[CURLOPT_RETURNTRANSFER] = true;
        $options[CURLOPT_TIMEOUT] = 5;
        $ch = curl_init($url);
        curl_setopt_array($ch, $options);
        $html = curl_exec($ch);
        curl_close($ch);
        if ($html === false) {
            return false;
        }
        return $html;
    }
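The get_html function above always forces the timeout to 5 seconds, whatever the caller passes. As a minimal sketch of one alternative, the block below lets caller options take precedence over defaults. The names build_curl_options and get_html_with_defaults, and the CURLOPT_CONNECTTIMEOUT value, are assumptions for illustration and are not part of the original series.

```php
<?php
// Hypothetical helper: merge caller options over default options.
// The array union operator (+) keeps the left-hand value when a key exists
// in both arrays and preserves the integer CURLOPT_* keys.
function build_curl_options(array $options = array())
{
    $defaults = array(
        CURLOPT_TIMEOUT        => 5, // total timeout in seconds
        CURLOPT_CONNECTTIMEOUT => 3, // assumed value for the connect phase
    );
    $merged = $options + $defaults;
    // Always return the body as a string so the caller can inspect it.
    $merged[CURLOPT_RETURNTRANSFER] = true;
    return $merged;
}

// Variant of get_html in which the caller may override the defaults.
function get_html_with_defaults($url, array $options = array())
{
    $ch = curl_init($url);
    if ($ch === false) {
        return false;
    }
    curl_setopt_array($ch, build_curl_options($options));
    $html = curl_exec($ch);
    curl_close($ch);
    return $html; // false on failure, the page body otherwise
}
```

With this variant, get_html_with_defaults($url, array(CURLOPT_TIMEOUT => 10)) would wait up to 10 seconds instead of 5.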

Calling it:

    $url = 'http://www.baidu.com';
    echo get_html($url);

Sometimes specific options must be passed to get the correct page. For example, try fetching the NetEase homepage:

    $url = 'http://www.163.com';
    echo get_html($url);

The result is a blank page. To see what happened, write a function around curl_getinfo:

    function get_info($url, $options = array()) {
        $options[CURLOPT_RETURNTRANSFER] = true;
        $options[CURLOPT_TIMEOUT] = 5;
        $ch = curl_init($url);
        curl_setopt_array($ch, $options);
        curl_exec($ch);             // the body is not needed here, only the transfer details
        $info = curl_getinfo($ch);  // array with http_code, url, total_time, and so on
        curl_close($ch);
        return $info;
    }
    $url = 'http://www.163.com';
    var_dump(get_info($url));
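Reading the full var_dump output by eye can be tedious. The sketch below shows one way to turn the array into a one-line summary. The function name summarize_info is hypothetical, and the sample array is hand-written stand-in data, not real output from any site; http_code and redirect_url are keys that curl_getinfo provides.

```php
<?php
// Hypothetical helper: summarise the array returned by curl_getinfo().
function summarize_info(array $info)
{
    $code = isset($info['http_code']) ? (int) $info['http_code'] : 0;
    if ($code === 0) {
        return 'no response'; // the request never got an HTTP status
    }
    if ($code >= 300 && $code < 400) {
        $target = !empty($info['redirect_url']) ? $info['redirect_url'] : 'unknown';
        return "redirect ($code) to $target";
    }
    if ($code >= 200 && $code < 300) {
        return "ok ($code)";
    }
    return "error ($code)";
}

// Hand-written sample data standing in for real curl_getinfo() output.
$sample = array('http_code' => 302, 'redirect_url' => 'http://example.com/new');
echo summarize_info($sample), "\n"; // prints "redirect (302) to http://example.com/new"
```

In practice the array would come from the get_info function, for example summarize_info(get_info($url)).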

The output shows http_code 302: the request was redirected. To follow the redirect, pass an extra option:

    $url = 'http://www.163.com';
    $options = array();
    $options[CURLOPT_FOLLOWLOCATION] = true; // follow the Location header sent with the 302
    echo get_html($url, $options);

The page that comes back differs from the one a browser shows. Why?

The options are apparently still not enough: the server cannot tell what kind of device the client is, so it returns a generic version of the page.

It seems a USERAGENT needs to be sent as well:

    $url = 'http://www.163.com';
    $options = array();
    $options[CURLOPT_FOLLOWLOCATION] = true;
    $options[CURLOPT_USERAGENT] = 'Mozilla/5.0 (Windows NT 6.1; rv:19.0) Gecko/20100101 Firefox/19.0';
    echo get_html($url, $options);

OK, now the page comes out as expected. The get_html function can be extended in this way to cover most needs, simply by passing different options.
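The redirect and user agent settings used for NetEase can be collected into a reusable preset, and a failed request can report why it failed. The block below is only a sketch under stated assumptions: the names browser_like_options and fetch_page and the redirect cap of 5 are illustrative and do not come from the original series. CURLOPT_MAXREDIRS, curl_errno and curl_error are standard parts of PHP's curl extension.

```php
<?php
// Hypothetical preset: options that make the request look like a desktop browser.
function browser_like_options($userAgent = null)
{
    if ($userAgent === null) {
        // Same user agent string as in the NetEase example
        $userAgent = 'Mozilla/5.0 (Windows NT 6.1; rv:19.0) Gecko/20100101 Firefox/19.0';
    }
    return array(
        CURLOPT_FOLLOWLOCATION => true,       // follow 3xx redirects
        CURLOPT_MAXREDIRS      => 5,          // assumed cap to avoid redirect loops
        CURLOPT_USERAGENT      => $userAgent, // sent as the User-Agent header
    );
}

// Hypothetical variant of get_html that also reports status and error details.
function fetch_page($url, array $options = array())
{
    $options[CURLOPT_RETURNTRANSFER] = true;
    $options[CURLOPT_TIMEOUT] = 5;
    $ch = curl_init($url);
    curl_setopt_array($ch, $options);
    $html = curl_exec($ch);
    $result = array(
        'html'      => $html,                                 // false on failure
        'http_code' => curl_getinfo($ch, CURLINFO_HTTP_CODE), // final status code
        'errno'     => curl_errno($ch),                       // 0 when no error occurred
        'error'     => curl_error($ch),                       // empty string when no error
    );
    curl_close($ch);
    return $result;
}
```

A call such as fetch_page('http://www.163.com', browser_like_options()) would then return the page together with the status code and any error message, rather than a bare false.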

Of course, there are other ways to achieve this. If you know the exact address of the NetEase homepage, you can simply collect it directly:

    $url = 'http://www.163.com/index.html';
    echo get_html($url);

This also collects the page normally.
