比Selenium好用？Python使用playwright获取S站在线游戏排名 -

Home > List of blog posts > 比Selenium好用？Python使用playwright获取S站在线游戏排名

Blogger Information

Blog 9

fans 0

comment 0

visits 7043

Special Recommendation

More>

Related recommendations

Related Tutorials

Popular Recommendations

Latest courses

The latest ThinkPHP 5.1 world premiere video tutorial (60 days to become a PHP expert online training course)

1421822 times of learning
Collection
PHP introductory tutorial one: Learn PHP in one week

4266429 times of learning
Collection
JAVA Beginner's Video Tutorial

2520452 times of learning
Collection

Latest Downloads

More>

Web Effects

Website Source Code

Website Materials

Front End Template

比Selenium好用？Python使用playwright获取S站在线游戏排名

IPIDEA全球HTTP

Original

991 people have browsed it

因为版权不明，以下所有的那个网站用S代替，那个第三方网站用SDB代替。

在之前的文章中爬取了S的热销商品，也说明了因为Cloudflare的浏览器验证导致SDB无法爬取，连selenium也不行。当时我就放弃了。

但是前一段时间，有一个伙计给我讲：【用playwright啊！】

playwright支持多种语法，相比于selenium的Http协议，playwright的Websocket获取浏览器情况会更好一点。

Playwright的使用

安装：需要python3.7+

1. pip install --upgrade pip  
2. pip install playwright  
3. playwright install

一次安装，Playwright就可以通过开发者工具与你安装的浏览器 (chromiun, firefox and webkit)进行交互，不像selenium下载对应浏览器版本的Driver了。

本次只讲一下最基本的页面获取，其他的功能各位自行查看文档吧：

1. from playwright.sync_api import sync_playwright  
2.    
3. with sync_playwright() as p:  
4.     browser = p.webkit.launch()  
5.     page = browser.new_page()  
6.     page.goto("http://whatsmyuseragent.org/")  
7.     page.wait_for_load_state('networkidle')  
8.     html = page.content()  
9.     browser.close()

html就是已经加载好的正文的内容，获取到的东西就可以交给选择器去处理跟筛选了。

SDB实例

SDB，是一个第三方的S的数据库，在线人数，游戏价格等等都能查询的到。

在线人数，在线排名都有展示出来。现在开始获取数据。

1. with sync_playwright() as p:  
2.     try:  
3.         browser = p.chromium.launch(headless=False)  
4.         page = browser.new_page()  
5.         page.goto('https://steamdb.info/graph/')  
6.         page.wait_for_load_state('networkidle')  
7.         html = page.content()  
8.         browser.close()  
9.         return html  
10.     except Exception as e:  
11.         print(e)

不知道为什么开启无头就是通过不了。这样的话本页的html内容就获取下来了：

接下来用选择器进行内容解析：

1. sel = Selector(text=content)  
2. nodes = sel.css('#table-apps .app')  
3. for node in nodes:  
4.     title = node.css('td:nth-child(3) a::text').extract_first()  
5.     current = node.css('td:nth-child(4)::text').extract_first()  
6.     peakToday = node.css('td:nth-child(5)::text').extract_first()  
7.     allPeak = node.css('td:nth-child(6)::text').extract_first()  
8.     print(f"游戏：{title},目前在线：{current},今日最高在线：{peakToday},历史最高在线：{allPeak}")

Playwright的代理配置

Playwright配置代理其实很简单，要在浏览器配置那一行加上proxy参数就可以了：

browser = p.chromium.launch(headless=False,proxy=proxy)

本人目前使用的是ipidea的代理，新用户可以白嫖流量哦。SDB国内访问慢，用上稳定的代理配上好用的playwright，事半功倍！

地址：http://www.ipidea.net/

代码整合

1. from playwright.sync_api import sync_playwright  
2. from parsel import Selector  
3.    
4. def getSteaminfo():  
5.    
6.     proxy = {  
7.         'server': "",  
8.         'username': "",  
9.         'password': ''  
10.     }  
11.    
12.     with sync_playwright() as p:  
13.         try:  
14.             browser = p.chromium.launch(headless=False,proxy=proxy)  
15.             page = browser.new_page()  
16.             page.goto('https://steamdb.info/graph/')  
17.             page.wait_for_load_state('networkidle')  
18.             html = page.content()  
19.             browser.close()  
20.             return html  
21.         except Exception as e:  
22.             print(e)  
23.    
24. def start():  
25.     content = getSteaminfo()  
26.     sel = Selector(text=content)  
27.     nodes = sel.css('#table-apps .app')  
28.     for node in nodes:  
29.         title = node.css('td:nth-child(3) a::text').extract_first()  
30.         current = node.css('td:nth-child(4)::text').extract_first()  
31.         peakToday = node.css('td:nth-child(5)::text').extract_first()  
32.         allPeak = node.css('td:nth-child(6)::text').extract_first()  
33.         print(f"游戏：{title},目前在线：{current},今日最高在线：{peakToday},历史最高在线：{allPeak}")  
34.    
35. if __name__ == '__main__':  
36.     start()

Statement of this Website

The copyright of this blog article belongs to the blogger. Please specify the address when reprinting! If there is any infringement or violation of the law, please contact admin@php.cn Report processing!

All comments Speak rationally on civilized internet, please comply with News Comment Service Agreement

0 comments

Author's latest blog post

Python中Scrapy框架的代理使用

2022-10-08 14:10:02