PHP website crawling library
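<?php
header("Content-Type: text/html; charset=UTF-8");
// phpQuery is the DOM engine. The QueryList class used below must also be
// loaded, for example through the package's own include file or an autoloader.
require("phpQuery.php");
// Collect the text of every headline matching ".unit h1".
$hj = QueryList::Query('http://mobile.csdn.net/',array("title"=>array('.unit h1','text')));
//print_r($hj->data);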
// Collect the src of every image on the page.
$data = QueryList::Query('http://cms.querylist.cc/bizhi/453.html',array(
    'image' => array('img','src')
    ))->data;
// Collect the href of every link on the list page.
$data = QueryList::Query('http://cms.querylist.cc/google/list_1.html',array(
    'link' => array('a','href')
    ))->data;
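// Scrape a single article page with per-field rules:
// field => array(selector, attribute, tag filter, callback).
$page = 'http://cms.querylist.cc/news/566.html';
$reg = array(
    'title' => array('h1','text'),
    // Filter out <span> and <a>, then keep only the part before the first space.
    'date' => array('.pt_info','text','-span -a',function($content){
        $arr = explode(' ',$content);
        return $arr[0];
    }),
    // Keep the HTML, then save each image locally and rewrite its src.
    'content' => array('.post_content','html','a -.content_copyright -script',function($content){
        $doc = phpQuery::newDocumentHTML($content);
        $imgs = pq($doc)->find('img');
        foreach ($imgs as $img) {
            $src = 'http://cms.querylist.cc'.pq($img)->attr('src');
            // The "w/" directory must already exist and be writable.
            $localSrc = 'w/'.md5($src).'.jpg';
            $stream = file_get_contents($src);
            // Skip images that fail to download instead of writing an empty file.
            if ($stream === false) {
                continue;
            }
            file_put_contents($localSrc,$stream);
            pq($img)->attr('src',$localSrc);
        }
        return $doc->htmlOuter();
    })
    );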
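// Range selector: the rules above are applied inside each ".content" block.
$rang = '.content';
$ql = QueryList::Query($page,$reg,$rang);
$data = $ql->getData();
// print_r() is built in; dump() is not defined by the files loaded above.
print_r($data);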

This library supports crawling websites and extracting their content. It is a server-side open source project based on PHP that lets PHP developers process DOM document content easily, for example to obtain the headlines of a news website. It borrows the idea of jQuery: you select and process page content with jQuery-style selectors to get the information you want.
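
To make the rule-driven idea concrete without installing anything, here is a minimal sketch that uses only PHP's bundled DOM extension (DOMDocument and DOMXPath) in place of the library. It is not the library's own implementation: the helper `extractFields()`, the sample markup, and the use of XPath expressions instead of jQuery-style selectors are all assumptions made for illustration.

```php
<?php
// Hypothetical helper (not part of QueryList): applies a rule array of the form
// field => [XPath expression, 'text' or an attribute name, optional callback].
function extractFields(string $html, array $rules): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally; real-world HTML is rarely perfect.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = [];
    foreach ($rules as $field => $rule) {
        [$expression, $what] = $rule;
        $callback = $rule[2] ?? null;
        $values = [];
        foreach ($xpath->query($expression) as $node) {
            // 'text' returns the node's text; anything else is read as an attribute.
            $value = ($what === 'text')
                ? trim($node->textContent)
                : $node->getAttribute($what);
            $values[] = $callback ? $callback($value) : $value;
        }
        $result[$field] = $values;
    }
    return $result;
}

// Sample markup invented for this example.
$html = '<html><body><div class="unit"><h1>First headline</h1></div>'
      . '<p class="pt_info">2016-01-01 12:00</p>'
      . '<img src="/a.jpg"><img src="/b.jpg"></body></html>';

$data = extractFields($html, [
    'title' => ['//div[@class="unit"]/h1', 'text'],
    // The callback keeps only the part before the first space.
    'date'  => ['//p[@class="pt_info"]', 'text', function ($content) {
        $parts = explode(' ', $content);
        return $parts[0];
    }],
    'image' => ['//img', 'src'],
]);

print_r($data);
```

Each field comes back as a list because a rule may match several elements. The optional callback plays the same role as the closures in the sample at the top of the page: it post-processes each extracted value before it is stored.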
