Home PHP Framework ThinkPHP How to implement thinkphp automatic collection

How to implement thinkphp automatic collection

Aug 22, 2019 am 09:46 AM
thinkphp

How to implement thinkphp automatic collection

thinkphp实现自动采集功能的三种方法:

方法一:QueryList

个人感觉比较好用,采集详情比较不错的选择,但是采集复杂一点的列表,不好用。具体使用:

How to implement thinkphp automatic collection

控制器示例:

public function index(){
    // 使用采集类
    // 使用手册 :http://www.php.cn/php/php-QueryList3-ThinkPHP.html
    import('Org.QL.QueryList');
    $url = "http://www.zyctd.com/gqqg/";
    $reg = array();
    $reg['title'] = array('.sulist_title','text');
    $reg['shuliang'] = array('.su_li1','html');
    $obj = new \QueryList($url,$reg);
    $data = $obj->jsonArr;
    // foreach($data as $v){
    //     echo "<br>".$v[&#39;title&#39;].&#39;___&#39;.$v[&#39;shuliang&#39;]."<br>";
    // }
    p($data);
}
Copy after login

相关推荐:《ThinkPHP教程

方法二:simple_html_dom

这个方法比较适合采集一点结构简单的页面,HTML标签的类名比较明确的页面,还不错。具体使用:

How to implement thinkphp automatic collection

控制器示例:

public function index(){
    // 参考文档:http://microphp.us/plugins/public/microphp_res/simple_html_dom/manual.htm#section_quickstart
    // 下载地址:https://github.com/samacs/simple_html_dom/edit/master/simple_html_dom.php
    // 使用方法:http://www.thinkphp.cn/topic/21635.html
    import("Org.Util.simple_html_dom", &#39;&#39;, &#39;.php&#39;);
    $html = file_get_html(&#39;http://www.zyctd.com/gqqg/&#39;);
    $ret = $html->find(&#39;.supply_list_box ul&#39;,0)->first_child();
    foreach($ret as $v){
        echo $v;
    };
}
Copy after login

方法三:获取页面HTMl,进行正则匹配采集

举例一个Demo:

采集一个页面:

http://www.zyctd.com/gqqg/

我要获取上面的四个信息:标题,数量,时间,跳转链接。

How to implement thinkphp automatic collection

获取这些信息,通过上面两种方法都采集不到,最后才选用的正则来采集。具体方法:

public function index(){
    $url = "http://www.zyctd.com/gqqg/";
    // http://www.zyctd.com/gqqg-p1.html
    $supplyDB = M(&#39;supply&#39;);    
    $urlList = array();
    $array = array();
    for($x=1; $x<=1; $x++) {
        array_push($urlList,"http://www.zyctd.com/gqqg-p".$x.".html");
    };        
    foreach($urlList as $v){
        $curPageList = $this->getInfo($v);
        array_push($array,$curPageList);
    };
    foreach($array as $v){
        foreach($v as $vv){
            //echo $vv[&#39;title&#39;]."__".$vv[&#39;weight&#39;]."__".$vv[&#39;time&#39;]."<br>";
            $data = array();
            $data[&#39;title&#39;] = $vv[&#39;title&#39;];
            $data[&#39;weight&#39;] = $vv[&#39;weight&#39;];
            $data[&#39;add_time&#39;] = $vv[&#39;add_time&#39;];
            $data[&#39;url&#39;] = $vv[&#39;url&#39;];
            //$res = $supplyDB->add($data);
            //echo $res;
            echo "<p><span style=&#39;display:inline-block; width:110px;&#39;>".$vv[&#39;title&#39;]."</span>
            <span style=&#39;display:inline-block; width:110px;&#39;>".$vv[&#39;weight&#39;]."</span>
            <span style=&#39;display:inline-block; width:110px;&#39;>".$vv[&#39;add_time&#39;]."</span>
            <span style=&#39;display:inline-block; width:110px;&#39;>".$vv[&#39;url&#39;]."</span></p>";
        }
    }
        // 获取信息
        //$curPageList = $this->getInfo($html);
        //p($curPageList);
}
private function getInfo($url){
    $html = $this->getHtml($url);
    $array = array();
    // 匹配所有的标题
    preg_match_all("#<divclass=\"sulist_title\"><i></i><span>(.*?)</span></div>#",$html,$matches);
    $all_title = $matches[1];
    preg_match_all("#<i>发布时间:</i><span>(.*?)</span>#",$html,$matches);
    // 匹配所有的发布时间
    $all_time = $matches[1];
    // 匹配所有的求购数量
    preg_match_all("#<i>求购数量:</i><span>(.*?)</span>#",$html,$matches);
    $all_weight = $matches[1];
    // 匹配跳转链接
    preg_match_all("#<atarget=\"_blank\"href=\"(.*?)\">#",$html,$matches);
    $all_url = $matches[1];
    // 组合
    foreach($all_title as $k => $v){
        $arr = array();
        $arr[&#39;title&#39;] = $v;
        $arr[&#39;weight&#39;] = $all_weight[$k];
        $arr[&#39;add_time&#39;] = $all_time[$k];
        $arr[&#39;url&#39;] = $all_url[$k];
        array_push($array,$arr);
    }
    return $array;
}
private function getHtml($url){
    $html = file_get_contents($url);
    $html = preg_replace("#\n#","",$html);
    $html = preg_replace("#\r#","",$html);
    $html = preg_replace("#\\s#","",$html);
    return $html;
}
Copy after login

The above is the detailed content of How to implement thinkphp automatic collection. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

How to run thinkphp project How to run thinkphp project Apr 09, 2024 pm 05:33 PM

To run the ThinkPHP project, you need to: install Composer; use Composer to create the project; enter the project directory and execute php bin/console serve; visit http://localhost:8000 to view the welcome page.

There are several versions of thinkphp There are several versions of thinkphp Apr 09, 2024 pm 06:09 PM

ThinkPHP has multiple versions designed for different PHP versions. Major versions include 3.2, 5.0, 5.1, and 6.0, while minor versions are used to fix bugs and provide new features. The latest stable version is ThinkPHP 6.0.16. When choosing a version, consider the PHP version, feature requirements, and community support. It is recommended to use the latest stable version for best performance and support.

How to run thinkphp How to run thinkphp Apr 09, 2024 pm 05:39 PM

Steps to run ThinkPHP Framework locally: Download and unzip ThinkPHP Framework to a local directory. Create a virtual host (optional) pointing to the ThinkPHP root directory. Configure database connection parameters. Start the web server. Initialize the ThinkPHP application. Access the ThinkPHP application URL and run it.

Which one is better, laravel or thinkphp? Which one is better, laravel or thinkphp? Apr 09, 2024 pm 03:18 PM

Performance comparison of Laravel and ThinkPHP frameworks: ThinkPHP generally performs better than Laravel, focusing on optimization and caching. Laravel performs well, but for complex applications, ThinkPHP may be a better fit.

Development suggestions: How to use the ThinkPHP framework to implement asynchronous tasks Development suggestions: How to use the ThinkPHP framework to implement asynchronous tasks Nov 22, 2023 pm 12:01 PM

"Development Suggestions: How to Use the ThinkPHP Framework to Implement Asynchronous Tasks" With the rapid development of Internet technology, Web applications have increasingly higher requirements for handling a large number of concurrent requests and complex business logic. In order to improve system performance and user experience, developers often consider using asynchronous tasks to perform some time-consuming operations, such as sending emails, processing file uploads, generating reports, etc. In the field of PHP, the ThinkPHP framework, as a popular development framework, provides some convenient ways to implement asynchronous tasks.

How to install thinkphp How to install thinkphp Apr 09, 2024 pm 05:42 PM

ThinkPHP installation steps: Prepare PHP, Composer, and MySQL environments. Create projects using Composer. Install the ThinkPHP framework and dependencies. Configure database connection. Generate application code. Launch the application and visit http://localhost:8000.

How is the performance of thinkphp? How is the performance of thinkphp? Apr 09, 2024 pm 05:24 PM

ThinkPHP is a high-performance PHP framework with advantages such as caching mechanism, code optimization, parallel processing and database optimization. Official performance tests show that it can handle more than 10,000 requests per second and is widely used in large-scale websites and enterprise systems such as JD.com and Ctrip in actual applications.

Development suggestions: How to use the ThinkPHP framework for API development Development suggestions: How to use the ThinkPHP framework for API development Nov 22, 2023 pm 05:18 PM

Development suggestions: How to use the ThinkPHP framework for API development. With the continuous development of the Internet, the importance of API (Application Programming Interface) has become increasingly prominent. API is a bridge for communication between different applications. It can realize data sharing, function calling and other operations, and provides developers with a relatively simple and fast development method. As an excellent PHP development framework, the ThinkPHP framework is efficient, scalable and easy to use.

See all articles