Home php教程 php手册 php里常用的远程采集函数

php里常用的远程采集函数

Jun 13, 2016 am 10:10 AM
curl php use function exist operate data use Remotely collection

在php中采集数据最常用的就是使用curl函数来操作,因为curl函数是高性能并且多线程功能,下面我来介绍一个php采集程序,各位同学有需要可进入参考。

函数

 代码如下 复制代码

/**
 * 获取远程url的内容
 * @param string $url
 * @return string
 */
function get_url_content($url) {
  if(function_exists(curl_init)) {
    $ch = curl_init();
    $timeout = 5;
    curl_setopt ($ch, CURLOPT_URL, $url);
    curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt ($ch, CURLOPT_TIMEOUT, $timeout);
     
    $file_contents = curl_exec($ch);
    curl_close($ch);
  } else {
    $file_contents = file_get_contents($url);
  }
 
  return $file_contents;
}

调用方法

 代码如下 复制代码

$url = 'http://www.bKjia.c0m';
$a = get_url_content($url);
echo $a;

上面只是一个简单的实例,如果我们想应用可参考我自己写的采集程序了。

1,获取目标网页数据;
2,截取相关内容;
3,写入数据库/生成HMTL文件;
下面就按照步骤来试试!
获取目标网页数据
1, 确定好,要获取的网页地址甚至形式,这里我们采用的网址是:/index.html?pageconfig=catalog_byproducttype&intProductTypeID=1&strStartChar=A&intResultsPage=1&tr=59
这个页面是有分页的,根据规律,我们找到只需要改变page参数就可以翻页!即:

我 们的网页形式是:/index.html?pageconfig=catalog_byproducttype& amp;intProductTypeID=1&strStartChar=A&intResultsPage= NUMBER &tr=59

红色部分是当前页码对应值!只需要改变该值就可以了!


2,获取页面内容:自然要用到PHP函数了!这里,两个函数都可以!他们分别是:


file_get_contents() 把整个文件读入一个字符串中。和 file() 一样,不同的是file_get_contents() 把文件读入一个字符串。file_get_contents() 函数是用于将文件的内容读入到一个字符串中的首选方法。如果操作系统支持,还会使用内存映射技术来增强性能。语法: file_get_contents( path , include_path , context , start , max_length ) curl() 了解详细,请参阅官网文档:http://cn.php.net/curl fopen()函数打开文件或者 URL。如果打开失败,本函数返回 FALSE。语法: fopen(filename,mode,include_path,context) 当然,我们采用的是第一个!其实,所有的都差不多,有兴趣的童子可以常识常识其他的!

 代码如下 复制代码

$oldcontent = file_get_contents(“http://www.abcam.cn/index.html?pageconfig=catalog_byproducttype&intProductTypeID=1&strStartChar=A&intResultsPage=2&tr=59”);
echo $oldcontent;
?>

运行PHP程序,上面的代码可以显示出整个网页!由于原网页采用的是绝地路径,所以现在显示的效果和原来的是一模一样的!
接下来就是要,截取内容了!截取内容的方法也有很多,今天介绍的一种比较简单:

 代码如下 复制代码
$oldcontent = file_get_contents(“http://www.abcam.cn/index.html?pageconfig=catalog_byproducttype&intProductTypeID=1&strStartChar=A&intResultsPage=2&tr=59″);
$oldcontent;
$pfirst = ‘’;
$plast = ‘Goat polyclonal’;
$b= strpos($oldcontent,$pfirst);
$c= strpos($oldcontent,$plast);
echo substr($oldcontent,$b,$c-1);
?>

Code

输出的,即为所需要的结果!
写入数据库和写入文件都是比较简单的!这里就写入文件了!

 代码如下 复制代码
$oldcontent = file_get_contents(“index.html?pageconfig=catalog_byproducttype&intProductTypeID=1&strStartChar=A&intResultsPage=2&tr=59″);
$oldcontent;
$pfirst = ‘’;
$plast = ‘Goat polyclonal’;
$b= strpos($oldcontent,$pfirst);
$c= strpos($oldcontent,$plast);
$a = substr($oldcontent,$b,$c-1);
$file = date(‘YmdHis’).”.html”;
$fp = fopen($file,”w+”);
if(!is_writable($file)){
die(“File “.$file.” can not be written”);
}
else {
file_put_contents($file, $a);
echo “success”;
}
fclose($fp);
?>

Code

OK,继续上班,今天的截取就到这里,下次就说说正则表达式提取内容

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

Repo: How To Revive Teammates
1 months ago By 尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)
2 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Hello Kitty Island Adventure: How To Get Giant Seeds
1 months ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

CakePHP Project Configuration CakePHP Project Configuration Sep 10, 2024 pm 05:25 PM

In this chapter, we will understand the Environment Variables, General Configuration, Database Configuration and Email Configuration in CakePHP.

PHP 8.4 Installation and Upgrade guide for Ubuntu and Debian PHP 8.4 Installation and Upgrade guide for Ubuntu and Debian Dec 24, 2024 pm 04:42 PM

PHP 8.4 brings several new features, security improvements, and performance improvements with healthy amounts of feature deprecations and removals. This guide explains how to install PHP 8.4 or upgrade to PHP 8.4 on Ubuntu, Debian, or their derivati

CakePHP Date and Time CakePHP Date and Time Sep 10, 2024 pm 05:27 PM

To work with date and time in cakephp4, we are going to make use of the available FrozenTime class.

CakePHP File upload CakePHP File upload Sep 10, 2024 pm 05:27 PM

To work on file upload we are going to use the form helper. Here, is an example for file upload.

CakePHP Routing CakePHP Routing Sep 10, 2024 pm 05:25 PM

In this chapter, we are going to learn the following topics related to routing ?

Discuss CakePHP Discuss CakePHP Sep 10, 2024 pm 05:28 PM

CakePHP is an open-source framework for PHP. It is intended to make developing, deploying and maintaining applications much easier. CakePHP is based on a MVC-like architecture that is both powerful and easy to grasp. Models, Views, and Controllers gu

CakePHP Creating Validators CakePHP Creating Validators Sep 10, 2024 pm 05:26 PM

Validator can be created by adding the following two lines in the controller.

How To Set Up Visual Studio Code (VS Code) for PHP Development How To Set Up Visual Studio Code (VS Code) for PHP Development Dec 20, 2024 am 11:31 AM

Visual Studio Code, also known as VS Code, is a free source code editor — or integrated development environment (IDE) — available for all major operating systems. With a large collection of extensions for many programming languages, VS Code can be c

See all articles