


Crawling JS-rendered web pages from Python with PhantomJS
I recently needed to crawl a website whose pages are all generated by JavaScript after loading. Ordinary crawler frameworks could not handle it, so I decided to put PhantomJS in front of the crawler as a rendering proxy.
There seems to be no ready-made third-party library for calling PhantomJS from Python (if there is one, please let Xiao2 know). After looking around, the only ready-made solution I found was pyspider.
After a brief trial, pyspider felt more like a crawler tool built for beginners: like an old lady, sometimes meticulous, sometimes chatty.
Lightweight tools tend to be more popular. Selfishly, I also wanted to keep using my favorite BeautifulSoup rather than learn PyQuery (which pyspider uses to parse HTML), let alone put up with the bad experience of writing Python in a browser (laughing).
So I spent an afternoon extracting the part of pyspider that implements the PhantomJS proxy and turned it into a small crawler module. I hope you like it (thanks, binux!).
Preparation
You need PhantomJS, obviously! (On Linux it is best to keep it running under supervisord, since PhantomJS must stay up for the whole crawl.)
Start phantomjs_fetcher.js from the project directory: phantomjs phantomjs_fetcher.js [port]
Install the tornado dependency (the module uses tornado's httpclient).
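Because PhantomJS has to be running before any fetch is attempted, it can help to check that the proxy port is actually accepting connections first. The following is a minimal sketch using only the standard library; the helper name `proxy_is_up` is hypothetical and not part of the module, and it only confirms that something is listening on the port, not that it is PhantomJS.

```python
import socket
from urllib.parse import urlsplit


def proxy_is_up(proxy_url, timeout=2.0):
    """Return True if a TCP connection to the proxy's host and port succeeds.

    Hypothetical helper for illustration: it does not speak HTTP, it only
    checks that some process is listening at the given address.
    """
    parts = urlsplit(proxy_url)
    host = parts.hostname or "localhost"
    port = parts.port or 80  # fall back to the default HTTP port
    try:
        # create_connection raises OSError (e.g. connection refused) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `proxy_is_up('http://localhost:12306')` before creating the crawler gives an early, readable failure instead of a timeout deep inside a fetch.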
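Usage is very simple

from tornado_fetcher import Fetcher

# Create a crawler
>>> fetcher = Fetcher(
        user_agent='phantomjs',                    # User-Agent to present, as a browser would
        phantomjs_proxy='http://localhost:12306',  # address of the PhantomJS proxy
        poolsize=10,                               # maximum number of httpclient instances
        async=False                                # synchronous or asynchronous (async is a reserved word from Python 3.7 on, so this keyword needs an older interpreter)
    )

# Fetch through the PhantomJS proxy, which renders the JS
>>> fetcher.phantomjs_fetch(url)

# Run an extra JS script after rendering succeeds (remember to wrap it in a function!)
>>> fetcher.phantomjs_fetch(url, js_script='function(){setTimeout("window.scrollTo(0,100000)", 1000)}')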
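Since the extra script has to be passed as a function expression, a small wrapper can save a few quoting mistakes. This is only a sketch: `wrap_js` is a hypothetical helper, not part of the module, and it does nothing more than add the `function(){...}` shell around a snippet that lacks one.

```python
import re


def wrap_js(snippet):
    """Wrap a JavaScript snippet in an anonymous function expression.

    Hypothetical helper for illustration. Snippets that already start with
    'function(' are returned unchanged (apart from stripped whitespace).
    """
    body = snippet.strip()
    if re.match(r"function\s*\(", body):
        return body
    return "function(){" + body + "}"


# Build the scroll script used in the call above
scroll_script = wrap_js('setTimeout("window.scrollTo(0,100000)", 1000)')
```

The resulting `scroll_script` string can then be passed as the `js_script` argument of `phantomjs_fetch`.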
