PHP crawler exercise: scraping JD.com product lists and details (October 18, 2019)

Victor's blog

Original


October 18
An exercise in using PHP's fetching functions to get product lists and product details from the JD.com store.

Analysis:

1. JD offers developers an open API platform (宙斯, "Zeus"). Authorized users can retrieve the data they need from the platform's APIs. See: http://open.jd.com/home/home#/doc/common?listId=892

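2. This example does not use authorization. It fetches the product list and related information from a single endpoint only, so the amount of data scraped is small.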
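This example uses only one unauthorized endpoint, so the whole request is a URL plus a query string. The sketch below wraps the example's parameters in a hypothetical helper, `build_search_url()`, so that the keyword and page number can vary. The remaining parameter values are copied from the example unchanged, and it is an assumption that JD still accepts them. The sketch also replaces the fixed `_` value with the current time in milliseconds, assuming that parameter is only a cache-buster.

```php
<?php

/**
 * Hypothetical helper: build the search-x.jd.com request URL for one result page.
 * The fixed parameter values are copied from the example and may have changed since 2019.
 */
function build_search_url(string $keyword, int $page): string
{
    $params = [
        'callback' => 'jQuery6105339',
        'area'     => '5',
        'enc'      => 'utf-8',
        'keyword'  => $keyword,
        'adType'   => '7',
        'page'     => (string) $page,
        'ad_ids'   => '291:19',
        'xtest'    => 'new_search',
        // Assumed cache-buster: current Unix time in milliseconds
        '_'        => (string) (int) round(microtime(true) * 1000),
    ];
    // http_build_query() percent-encodes the values, including the UTF-8 keyword
    return 'https://search-x.jd.com/Search?' . http_build_query($params);
}

// Usage: URL for the second page of results
echo build_search_url('手机自营', 2), PHP_EOL;
```

Only the `page` argument needs to change to walk through further result pages. The example itself stays on page 1.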

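3. Preparation before scraping:

a. On the JD site, find the endpoint that serves a product category. In this example the product keyword is "手机自营" (mobile phones sold directly by JD).
b. Create a database table. To keep things simple, this example just prints the results to the screen instead.
c. Combine the common functions, configuration methods and database methods covered in class into a single scraping class, here CurlSpider.
d. Write the code and start scraping. The code follows (there is too much data, so only the first page is fetched):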

Example
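<?php

include 'spider.class.php'; // provides the CurlSpider class (not shown in this post)

// Build the endpoint URL and its query string
$url = "https://search-x.jd.com/Search";
$params = array(
	"callback" => 'jQuery6105339', // JSONP callback name; the JSON comes back wrapped in it
	"area" => '5',
	"enc" => 'utf-8',
	"keyword" => '手机自营',
	"adType" => '7',
	"page" => '1', // only the first page is fetched
	"ad_ids" => '291:19',
	"xtest" => 'new_search',
	"_" => '1571621477591', // millisecond timestamp, used as a cache-buster
);
$paramstring = http_build_query($params);

// Instantiate the crawler class
$curlObj = new CurlSpider();
// Fetch the endpoint response
$content = $curlObj->curl_data($url, $paramstring);

// Cut out the JSON between the callback's "(" and the final ")"
$start = strpos($content, '(');
$end = strrpos($content, ')');
if ($start === false || $end === false || $end <= $start) {
	exit('Unexpected response format');
}
$result = json_decode(substr($content, $start + 1, $end - $start - 1), true);
if (!is_array($result)) {
	exit('Failed to decode JSON: ' . json_last_error_msg());
}

// Fields to keep from each product record
$flag = ['sku_id', 'ad_title', 'sku_price', 'vender_id', 'comment_num'];

echo 'ID' . "\t" . 'Product name' . "\t" . 'Price' . "\t" . 'Vendor ID' . "\t" . 'Comments' . "<br>";
foreach ($result as $values) {
	if (!is_array($values)) {
		continue;
	}
	// Keep only the wanted fields. strip_tags() removes HTML markup from the values,
	// and a missing field becomes '' instead of reusing the previous product's value
	$row = [];
	foreach ($flag as $key) {
		$row[$key] = isset($values[$key]) ? strip_tags((string) $values[$key]) : '';
	}
	echo $row['sku_id'] . "\t" . $row['ad_title'] . "\t" . '￥' . $row['sku_price'] . "\t" . $row['vender_id'] . "\t" . $row['comment_num'] . "<br>";
}

?>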
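`spider.class.php` is not shown in the post. From the way it is called, `curl_data($url, $paramstring)` appears to take a base URL and an already encoded query string and return the response body. A minimal sketch of such a class using PHP's cURL extension might look like this. The class body, the timeout and the User-Agent value are all assumptions, not the course's actual code.

```php
<?php

/**
 * Hypothetical sketch of the CurlSpider class from spider.class.php.
 * Only the curl_data($url, $query) call shape comes from the example.
 */
class CurlSpider
{
    /** GET $url with $query appended; returns the body string, or false on failure. */
    public function curl_data(string $url, string $query = '')
    {
        $full = $query === '' ? $url : $url . '?' . $query;
        $ch = curl_init($full);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
            CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
            CURLOPT_TIMEOUT        => 10,    // assumed overall timeout in seconds
            CURLOPT_ENCODING       => '',    // accept any compression curl supports
            // Assumed browser-like User-Agent; some sites reject requests without one
            CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; practice-spider)',
        ]);
        $body = curl_exec($ch);
        if ($body === false) {
            error_log('cURL error: ' . curl_error($ch));
        }
        curl_close($ch);
        return $body;
    }
}
```

Returning `false` on failure lets the calling script detect a failed request before it tries to parse the response.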

The raw data fetched is shown in the figure (with one item expanded):
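The raw response is JSONP: the JSON payload wrapped in a call to the function named by the `callback` parameter (`jQuery6105339(...)`). Cutting the wrapper off with fixed character offsets breaks as soon as the callback name or the trailing characters change. A more tolerant approach cuts between the first `(` and the last `)`, sketched here as a hypothetical `strip_jsonp()` helper:

```php
<?php

/**
 * Hypothetical helper: unwrap a JSONP body such as 'cb({...});' into its JSON text.
 * Returns null when no parenthesised payload is found.
 */
function strip_jsonp(string $body): ?string
{
    $start = strpos($body, '(');   // opening parenthesis of the callback call
    $end   = strrpos($body, ')');  // closing parenthesis of the callback call
    if ($start === false || $end === false || $end <= $start) {
        return null;
    }
    return substr($body, $start + 1, $end - $start - 1);
}

// Usage with a made-up response body (not real JD data)
$json = strip_jsonp('jQuery6105339([{"sku_id":"1001","ad_title":"Demo"}]);');
$data = json_decode($json, true);
echo $data[0]['ad_title'], PHP_EOL;  // prints "Demo"
```

Parentheses inside the JSON do not matter here, because the search starts from both ends of the body.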

After some simple processing to extract the fields needed, the output is shown in the figure:
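The processing step keeps five fields per product and removes HTML tags, because the field values can contain markup (which is why the example filters them). The following self-contained sketch does the same selection in a hypothetical `extract_fields()` helper. It uses `strip_tags()` in place of a hand-written regex, and it fills missing fields with an empty string so that an incomplete record never shows values left over from the previous one.

```php
<?php

/**
 * Hypothetical helper: keep only the wanted keys of one product record,
 * with HTML tags stripped and missing keys mapped to ''.
 */
function extract_fields(array $item, array $fields): array
{
    $row = [];
    foreach ($fields as $field) {
        $row[$field] = isset($item[$field]) ? strip_tags((string) $item[$field]) : '';
    }
    return $row;
}

// Usage with a made-up record (field names from the example, values invented)
$flag = ['sku_id', 'ad_title', 'sku_price', 'vender_id', 'comment_num'];
$item = ['sku_id' => '1001', 'ad_title' => '<font color="red">Phone</font> X', 'sku_price' => '999.00', 'extra' => 'ignored'];
$row  = extract_fields($item, $flag);
echo implode("\t", $row), PHP_EOL;
```

Because the keys come from `$fields` rather than from the record, the output columns always appear in the same order.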

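Summary:

This gave me a basic understanding of how a PHP crawler is written, but mastering it will take much more study.

As a beginner, the main things to focus on are analyzing the endpoint up front, designing the request, and processing the returned strings afterwards.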

Correction status: qualified

Teacher's comments: You are the first to hand in a finished scraping assignment. Very good work.
