使用phpQuery轻巧采集网页内容
使用phpQuery轻松采集网页内容
phpQuery是一个基于PHP的服务端开源项目,它可以让PHP开发人员轻松处理DOM文档内容,比如获取某新闻网站的头条信息。更有意思的是,它采用了jQuery的思想,你可以像使用jQuery一样处理页面内容,获取你想要的页面信息。
采集头条
先看一实例,现在我要采集新浪网国内新闻的头条,代码如下:
<span style="color: #0000ff;">include</span> 'phpQuery/phpQuery.php'<span style="color: #000000;">; phpQuery</span>::newDocumentFile('http://news.sina.com.cn/china'<span style="color: #000000;">); </span><span style="color: #0000ff;">echo</span> pq(".blkTop h1:eq(0)")->html();
简单的三行代码,就可以获取头条内容。首先在程序中包含phpQuery.php核心程序,然后调用读取目标网页,最后输出对应标签下的内容。
pq()是一个功能强大的方法,跟jQuery的$()如出一辙,jQuery的选择器基本上都能使用在phpQuery上,只要把“.”变成“->”。如上例中,pq(".blkTop h1:eq(0)")抓取了页面class属性为blkTop的DIV元素,并找到该DIV内部的第一个h1标签,然后用html()方法获取h1标签里的内容(带html标签),也就是我们要获取的头条信息,如果使用text()方法,则只获取头条的文本内容。当然要使用好phpQuery,关键是要找对文档中对应内容的节点。
采集文章列表
下面再来看一个例子,获取helloweba.com网站的blog列表,请看代码:
<span style="color: #0000ff;">include</span> 'phpQuery/phpQuery.php'<span style="color: #000000;">; phpQuery</span>::newDocumentFile('http://www.helloweba.com/blog.html'<span style="color: #000000;">); </span><span style="color: #800080;">$artlist</span> = pq(".blog_li"<span style="color: #000000;">); </span><span style="color: #0000ff;">foreach</span>(<span style="color: #800080;">$artlist</span> <span style="color: #0000ff;">as</span> <span style="color: #800080;">$li</span><span style="color: #000000;">){ </span><span style="color: #0000ff;">echo</span> pq(<span style="color: #800080;">$li</span>)->find('h2')->html().""<span style="color: #000000;">; } </span>
通过循环列表中的DIV,找出文章标题并输出,就是这么简单。
解析XML文档
假设现在有一个这样的test.xml文档:
<?xml version="1.0" encoding="utf-8"?> <root> <contact> <name>张三</name> <age>22</age> </contact> <contact> <name>王五</name> <age>18</age> </contact> </root>
现在我要获取名字为张三的联系人的年龄,代码如下:
<span style="color: #0000ff;">include</span> 'phpQuery/phpQuery.php'<span style="color: #000000;">; phpQuery</span>::newDocumentFile('test.xml'<span style="color: #000000;">); </span><span style="color: #0000ff;">echo</span> pq('contact > age:eq(0)');
结果输出:22
像jQuery一样,精准查找文档节点,输出节点下的内容,解析一个XML文档就是这么简单。现在你不必为采集网站内容而使用那些头疼的正则算法、内容替换等繁琐的代码了,有了phpQuery,一切就变得轻松多了。
项目官网地址:http://code.google.com/p/phpquery/

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics



Detailed explanation of jQuery reference method: Quick start guide jQuery is a popular JavaScript library that is widely used in website development. It simplifies JavaScript programming and provides developers with rich functions and features. This article will introduce jQuery's reference method in detail and provide specific code examples to help readers get started quickly. Introducing jQuery First, we need to introduce the jQuery library into the HTML file. It can be introduced through a CDN link or downloaded

How to use PUT request method in jQuery? In jQuery, the method of sending a PUT request is similar to sending other types of requests, but you need to pay attention to some details and parameter settings. PUT requests are typically used to update resources, such as updating data in a database or updating files on the server. The following is a specific code example using the PUT request method in jQuery. First, make sure you include the jQuery library file, then you can send a PUT request via: $.ajax({u

How to remove the height attribute of an element with jQuery? In front-end development, we often encounter the need to manipulate the height attributes of elements. Sometimes, we may need to dynamically change the height of an element, and sometimes we need to remove the height attribute of an element. This article will introduce how to use jQuery to remove the height attribute of an element and provide specific code examples. Before using jQuery to operate the height attribute, we first need to understand the height attribute in CSS. The height attribute is used to set the height of an element

Title: jQuery Tips: Quickly modify the text of all a tags on the page In web development, we often need to modify and operate elements on the page. When using jQuery, sometimes you need to modify the text content of all a tags in the page at once, which can save time and energy. The following will introduce how to use jQuery to quickly modify the text of all a tags on the page, and give specific code examples. First, we need to introduce the jQuery library file and ensure that the following code is introduced into the page: <

Title: Use jQuery to modify the text content of all a tags. jQuery is a popular JavaScript library that is widely used to handle DOM operations. In web development, we often encounter the need to modify the text content of the link tag (a tag) on the page. This article will explain how to use jQuery to achieve this goal, and provide specific code examples. First, we need to introduce the jQuery library into the page. Add the following code in the HTML file:

jQuery is a popular JavaScript library that is widely used to handle DOM manipulation and event handling in web pages. In jQuery, the eq() method is used to select elements at a specified index position. The specific usage and application scenarios are as follows. In jQuery, the eq() method selects the element at a specified index position. Index positions start counting from 0, i.e. the index of the first element is 0, the index of the second element is 1, and so on. The syntax of the eq() method is as follows: $("s

How to tell if a jQuery element has a specific attribute? When using jQuery to operate DOM elements, you often encounter situations where you need to determine whether an element has a specific attribute. In this case, we can easily implement this function with the help of the methods provided by jQuery. The following will introduce two commonly used methods to determine whether a jQuery element has specific attributes, and attach specific code examples. Method 1: Use the attr() method and typeof operator // to determine whether the element has a specific attribute

jQuery is a popular JavaScript library widely used in web development. During web development, it is often necessary to dynamically add new rows to tables through JavaScript. This article will introduce how to use jQuery to add new rows to a table, and provide specific code examples. First, we need to introduce the jQuery library into the HTML page. The jQuery library can be introduced in the tag through the following code:
