Data collection with curl: using get_htmls, a function for fetching a single page in parallel

WBOY

Jun 13, 2016 am 11:53 AM

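The get_html() function from the first article handles simple data collection, but it runs requests one after another, so the total transfer time is the sum of every page's download time. If one page takes 1 second, 10 pages take 10 seconds. Fortunately, curl also supports parallel processing.

Before writing a parallel collection function, we need to know what kind of pages we want to collect and what kind of requests they need. Only then can we write a function that is reasonably general.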

Requirements analysis:

What should it return?

An array containing the HTML of each page, naturally.

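What parameters should it take?

When writing get_html(), we saw that an options array can carry additional curl settings. A function that collects several pages at once should keep that feature.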

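What types should the parameters have?

Whether we request a page's HTML or call a web API, GET and POST requests always target the same page or endpoint; only the parameters differ. So the parameter types would be:

get_htmls($url,$options);

$url is a string.

$options is a two-dimensional array, with one options array per request.

That seems to solve the problem. However, I searched the entire curl manual and found no option for passing GET parameters separately (they are simply part of the URL), so for GET requests $url has to be passed as an array of URLs instead, and a method parameter is added.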
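On the GET question above: GET parameters travel in the URL's query string, so one way to vary them per request is to build each URL before handing it to curl. A minimal sketch, assuming a hypothetical helper named build_get_url (not part of the original article) and PHP's built-in http_build_query():

```php
<?php
// Hypothetical helper (not from the original article): append GET
// parameters to a base URL as a query string.
function build_get_url($base, array $params)
{
    // Use '?' for the first parameter block, '&' if the URL already has one.
    $sep = (strpos($base, '?') === false) ? '?' : '&';
    // Pass '&' explicitly so the result does not depend on php.ini settings.
    return $base . $sep . http_build_query($params, '', '&');
}

$url = build_get_url(
    'http://www.baidu.com/s',
    array('wd' => 'shili', 'pn' => 10, 'ie' => 'utf-8')
);
echo $url, "\n"; // http://www.baidu.com/s?wd=shili&pn=10&ie=utf-8
```

Building an array of such URLs gives exactly the kind of $urls argument the GET case needs.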

That settles the function prototype: get_htmls($urls, $options = array(), $method = 'get');

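The listing for get_htmls() did not survive in this copy of the article. What follows is a minimal sketch of how such a function might be written with PHP's curl_multi_* functions; it is not the author's original code. It assumes the calling conventions implied by the usage examples in this article: for 'get', $urls is an array of URLs and $options is one flat options array shared by every request; for 'post', $urls is a single URL string and $options is a two-dimensional array with one options array per request.

```php
<?php
// Sketch only: an assumed implementation of get_htmls(), not the original.
function get_htmls($urls, $options = array(), $method = 'get')
{
    $isPost = (strtolower($method) === 'post');

    // Normalise both calling styles into a list of (url, options) jobs.
    $jobs = array();
    if ($isPost) {
        // Assumption: one URL string, one options array per request.
        foreach ($options as $key => $opt) {
            $jobs[$key] = array($urls, $opt);
        }
    } else {
        // Assumption: many URLs, one shared flat options array.
        foreach ((array) $urls as $key => $url) {
            $jobs[$key] = array($url, $options);
        }
    }

    $mh = curl_multi_init();
    $handles = array();
    foreach ($jobs as $key => $job) {
        $ch = curl_init($job[0]);
        // Return the body as a string instead of printing it.
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        if ($isPost) {
            curl_setopt($ch, CURLOPT_POST, true);
        }
        // Caller-supplied options are applied last so they can override.
        curl_setopt_array($ch, $job[1]);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none are still running.
    $running = 0;
    do {
        $status = curl_multi_exec($mh, $running);
        if ($status === CURLM_OK && $running > 0) {
            // Wait for socket activity rather than spinning the CPU.
            curl_multi_select($mh, 1.0);
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Collect each body under the same key as its request.
    $htmls = array();
    foreach ($handles as $key => $ch) {
        $htmls[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
    }
    // The handles are released when they go out of scope on return.
    return $htmls;
}
```

Because all transfers run at the same time, the total time is roughly that of the slowest request rather than the sum of all of them. This sketch does no error handling; a failed transfer simply yields an empty or null entry in the result.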
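A typical GET request varies by changing the URL parameters. Since our function is meant for data collection, it will usually walk through a category page by page, so the URLs look like this: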

http://www.baidu.com/s?wd=shili&pn=0&ie=utf-8

http://www.baidu.com/s?wd=shili&pn=10&ie=utf-8

http://www.baidu.com/s?wd=shili&pn=20&ie=utf-8

http://www.baidu.com/s?wd=shili&pn=30&ie=utf-8

http://www.baidu.com/s?wd=shili&pn=40&ie=utf-8

These five pages follow a clear pattern: only the value of pn changes.

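$urls = array();
// The loop condition was lost in the original; the bound of 5 is inferred
// from the five URLs listed above.
for ($i = 1; $i <= 5; $i++) {
    $urls[] = 'http://www.baidu.com/s?wd=shili&pn=' . (($i - 1) * 10) . '&ie=utf-8';
}
$option = array();
$option[CURLOPT_USERAGENT] = 'Mozilla/5.0 (Windows NT 6.1; rv:19.0) Gecko/20100101 Firefox/19.0';
$htmls = get_htmls($urls, $option);
foreach ($htmls as $html) {
    echo $html; // the HTML is now available for processing
}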

Simulating a typical POST request:

Create a file named post.php with the following contents:

<?php
// Echo the submitted credentials back, or report a bad request.
if (isset($_POST['username']) && isset($_POST['password'])) {
    echo 'Username: ' . $_POST['username'] . ' Password: ' . $_POST['password'];
} else {
    echo 'Bad request!';
}

Then call it like this:

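$url = 'http://localhost/yourpath/post.php'; // replace with your own path
$options = array();
// The loop condition was lost in the original; the bound of 5 is an assumption.
for ($i = 1; $i <= 5; $i++) {
    $option = array();
    $option[CURLOPT_POSTFIELDS] = 'username=user' . $i . '&password=pass' . $i;
    $options[] = $option;
}
$htmls = get_htmls($url, $options, 'post');
foreach ($htmls as $html) {
    echo $html; // the HTML is now available for processing
}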

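With that, get_htmls() can handle basic data collection tasks.

That is all for today. If anything here is poorly written or unclear, corrections are welcome.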
