How can PHP use QueryList to easily collect JS-rendered pages?

青灯夜游
Release: 2023-04-04 06:48:02

This article introduces how PHP can use QueryList to easily collect pages whose content is rendered dynamically by JavaScript. It should be a useful reference for anyone who needs it.

QueryList uses jQuery-style selectors for collection and has a rich set of plug-ins. Below we demonstrate how QueryList uses its PhantomJS plug-in to capture page content that is created dynamically by JS.

1. Installation

Use Composer to install:

1. Install QueryList

composer require jaeger/querylist

GitHub: https://github.com/jae-jae/QueryList

2. Install PhantomJS plug-in

composer require jaeger/querylist-phantomjs

GitHub: https://github.com/jae-jae/QueryList-PhantomJS

2. Download the PhantomJS binary file

Download the PhantomJS binary for your platform from the official PhantomJS website: http://phantomjs.org

3. Plug-in API

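QueryList browser($url, $debug = false, $commandOpt = []): opens the link in the PhantomJS browser and returns a QueryList object. $url is the page URL, or a closure that builds a custom request (see Example 2); $debug turns on debug mode; $commandOpt holds PhantomJS command-line options.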

4. Usage

Take the mobile site of Toutiao ("Today's Headlines") as an example: it is built with the React framework, and its content is rendered entirely by JavaScript.

The following demonstrates the usage of QueryList's PhantomJs plug-in:

1. Install the plug-in

require 'vendor/autoload.php';

use QL\QueryList;
use QL\Ext\PhantomJs;

$ql = QueryList::getInstance();
// The path to the PhantomJS binary must be set when installing the plug-in
$ql->use(PhantomJs::class,'/usr/local/bin/phantomjs');
// or register it under a custom function name
$ql->use(PhantomJs::class,'/usr/local/bin/phantomjs','browser');
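The path passed to use() has to point at the PhantomJS binary downloaded in step 2. As a minimal sketch, a pre-flight check with PHP's built-in is_file() and is_executable() can catch a wrong path early. isUsableBinary() is a hypothetical helper, not part of QueryList, and /usr/local/bin/phantomjs is simply the path used in the example above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of QueryList): true when $path is an existing,
 * executable regular file, i.e. something usable as the PhantomJS binary path.
 */
function isUsableBinary(string $path): bool
{
    return is_file($path) && is_executable($path);
}

// Path from the example above; adjust it for your system.
$phantomPath = '/usr/local/bin/phantomjs';

echo isUsableBinary($phantomPath)
    ? "PhantomJS found at {$phantomPath}\n"
    : "PhantomJS binary not found or not executable: {$phantomPath}\n";
```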

2. Example 1

Get dynamically rendered HTML:

$html = $ql->browser('https://m.toutiao.com')->getHtml();
print_r($html);

Get the text content of all p tags:

$data = $ql->browser('https://m.toutiao.com')->find('p')->texts();
print_r($data->all());

Output:

Array
(
    [0] => 自拍模式开启!国庆假期我和国旗合个影
    [1] => 你旅途已开始 他们仍在自己的岗位上为你的假期保驾护航
    [2] => 喜极而泣,都教授终于回到地球了!
    //....
)
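To see why a browser is needed at all, compare what a plain HTML parser finds before and after the JavaScript has run. The sketch below uses PHP's built-in DOM extension rather than QueryList, and both HTML strings are made-up illustrations, not Toutiao's real markup. paragraphTexts() is a hypothetical helper that returns roughly what find('p')->texts() gives you on the rendered page.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: return the trimmed text of every <p> element in an
 * HTML string, using PHP's DOM extension (not QueryList).
 */
function paragraphTexts(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings (e.g. about HTML5 tags) instead of printing them.
    $prev = libxml_use_internal_errors(true);
    // The XML encoding hint makes libxml read the markup as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $texts = [];
    foreach ($doc->getElementsByTagName('p') as $p) {
        $texts[] = trim($p->textContent);
    }
    return $texts;
}

// Illustrative markup of a client-rendered page as a plain HTTP request sees it:
// an empty mount point that JavaScript fills in later.
$rawHtml = '<html><body><div id="root"></div><script src="app.js"></script></body></html>';
// Illustrative markup of the same page after a browser has run its JavaScript.
$renderedHtml = '<html><body><div id="root"><p>First headline</p><p>Second headline</p></div></body></html>';

print_r(paragraphTexts($rawHtml));      // empty: nothing to collect
print_r(paragraphTexts($renderedHtml)); // the two headlines
```

This is the gap the PhantomJS plug-in closes: it hands QueryList the rendered HTML instead of the empty shell.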

Use an HTTP proxy:

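// For more options, see the PhantomJS documentation:
// http://phantomjs.org/api/command-line.html
// The second argument (true) turns on debug mode
$html = $ql->browser('https://m.toutiao.com',true,[
    // use an HTTP proxy
    '--proxy' => '192.168.1.42:8080',
    '--proxy-type' => 'http'
])->getHtml();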
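The keys of the $commandOpt array are PhantomJS command-line switches, which PhantomJS itself accepts in the form --name=value. A minimal sketch of that correspondence follows; toPhantomArgs() is hypothetical and only illustrates the mapping, it is not taken from the plug-in's source.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: turn a $commandOpt-style array into PhantomJS
 * command-line arguments of the form --name=value.
 */
function toPhantomArgs(array $commandOpt): array
{
    $args = [];
    foreach ($commandOpt as $name => $value) {
        $args[] = $name . '=' . $value;
    }
    return $args;
}

$args = toPhantomArgs([
    '--proxy'      => '192.168.1.42:8080',
    '--proxy-type' => 'http',
]);
echo implode(' ', $args), PHP_EOL; // --proxy=192.168.1.42:8080 --proxy-type=http
```

If you ever assemble a shell command from such values yourself, pass each argument through escapeshellarg() first.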

3. Example 2

Customize a more complex request:

$data = $ql->browser(function (\JonnyW\PhantomJs\Http\RequestInterface $r){
    $r->setMethod('GET');
    $r->setUrl('https://m.toutiao.com');
    $r->setTimeout(10000); // 10 seconds
    $r->setDelay(3); // 3 seconds
    return $r;
})->find('p')->texts();

print_r($data->all());

Turn on debug mode and load cookies from a local file:

$data = $ql->browser(function (\JonnyW\PhantomJs\Http\RequestInterface $r){
    $r->setMethod('GET');
    $r->setUrl('https://m.toutiao.com');
    $r->setTimeout(10000); // 10 seconds
    $r->setDelay(3); // 3 seconds
    return $r;
},true,[
    '--cookies-file' => '/path/to/cookies.txt'
])->rules([
    'title' => ['p','text'],
    'link' => ['a','href']
])->query()->getData();

print_r($data->all());
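The rules() array maps each output field to a [selector, attribute] pair. As an illustration only, the sketch below performs the same extraction with PHP's DOMXPath on a made-up HTML string and pairs the two lists by position. extractByRules() is hypothetical, and the exact shape returned by QueryList's getData() may differ between QueryList versions, so check the documentation of the version you install.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: "title" = text of each <p>, "link" = href of each <a>,
 * paired by position. Uses PHP's DOM extension, not QueryList.
 */
function extractByRules(string $html): array
{
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true);
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);

    // 'title' => ['p', 'text']
    $titles = [];
    foreach ($xpath->query('//p') as $p) {
        $titles[] = trim($p->textContent);
    }

    // 'link' => ['a', 'href'] (getAttribute() returns '' when href is missing)
    $links = [];
    foreach ($xpath->query('//a') as $a) {
        $links[] = $a->getAttribute('href');
    }

    // Pair the i-th title with the i-th link; a missing partner becomes null.
    $rows = [];
    $count = max(count($titles), count($links));
    for ($i = 0; $i < $count; $i++) {
        $rows[] = ['title' => $titles[$i] ?? null, 'link' => $links[$i] ?? null];
    }
    return $rows;
}

$html = '<div><p>Headline A</p><a href="/a">A</a><p>Headline B</p><a href="/b">B</a></div>';
print_r(extractByRules($html));
```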

Source: php.cn