## Crawler + Visualization | Python Zhihu Hot List / Weibo Hot Search Sequence Chart (Part 1)

This issue belongs to the "Zhihu Hot List / Weibo Hot Search Sequence Chart" series of articles. This first part introduces how to use Python to crawl the Zhihu hot list and Weibo hot search data on a timer and save them to CSV files for later visualization; the sequence chart itself is covered in the next article. I hope you find it helpful.
## 2.1 Web page analysis

Zhihu hot list (desktop page): https://www.zhihu.com/hot

Mobile API: https://api.zhihu.com/topstory/hot-list?limit=10&reverse_order=0

Note: on the desktop page you can see the hot list data directly in the F12 debugger; on the mobile side you need a packet-capture tool to find the interface. Here we use the mobile API, because it returns JSON and is easy to parse.
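For reference, each entry in the API's "data" list carries the fields read by the crawler below. This is a trimmed, hypothetical sample (the real payload has many more fields, and the values here are illustrative only):

```python
# Hypothetical, trimmed item from the API's "data" list (illustrative values only)
sample_item = {
    "detail_text": "1234 万热度",  # heat text; the number comes before the space
    "target": {
        "title": "示例话题标题",
        "answer_count": 1000,
        "follower_count": 2000,
        "url": "https://api.zhihu.com/questions/123456",
    },
}

# The same split the crawler uses to extract the heat value
print(sample_item["detail_text"].split(" ")[0])  # -> 1234
```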
## 2.2 Obtain data

Code:

```python
import json
import time
import requests
import schedule
import pandas as pd
from fake_useragent import UserAgent
# Desktop page: https://www.zhihu.com/hot
# Mobile API endpoint (returns JSON)
zhihu_url = 'https://api.zhihu.com/topstory/hot-list?limit=10&reverse_order=0'
headers = {'User-Agent': UserAgent().random}  # random browser User-Agent
csv_header = True  # write the CSV column header only on the first save

def getzhihudata(url, headers):
    r = requests.get(url, headers=headers)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    # The hot list lives under the top-level 'data' key
    datas = json.loads(r.text)['data']
    allinfo = []
    # Timestamp for this snapshot, e.g. '2024-01-01 12:00'
    time_now = time.strftime("%Y-%m-%d %H:%M", time.localtime())
    print(time_now)
    for indx, item in enumerate(datas):
        title = item['target']['title']
        heat = item['detail_text'].split(' ')[0]  # number before the space
        answer_count = item['target']['answer_count']
        follower_count = item['target']['follower_count']
        href = item['target']['url']
        info = [time_now, indx + 1, title, heat, answer_count, follower_count, href]
        allinfo.append(info)
    # Add the table header only on the first write
    global csv_header
    df = pd.DataFrame(allinfo, columns=['时间', '排名', '标题', '热度(万)', '回答数', '关注数', '链接'])
    print(df.head())
    df.to_csv('zhuhu_hot_datas.csv', mode='a+', index=False, header=csv_header)
    csv_header = False
```

## 2.3 Save data

mode='a+' appends each snapshot to the same file, and the csv_header flag ensures the column names are written only once. Schedule the crawl to run every 1 minute, with the polling interval set to 1 s:

```python
# Run the crawl task once every minute
schedule.every(1).minutes.do(getzhihudata, zhihu_url, headers)

while True:
    schedule.run_pending()
    time.sleep(1)  # check for pending jobs every second
```
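As a quick sanity check on the accumulated file (a sketch; it assumes the filename used above and at least one completed run), you can read the snapshots back:

```python
import pandas as pd

# Each run appended one block of rows; count rows per snapshot timestamp
df = pd.read_csv('zhuhu_hot_datas.csv', parse_dates=['时间'])
print(df.groupby('时间').size())
```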
## 3.1 Web page analysis

Weibo hot search URL: https://s.weibo.com/top/summary

The data is in the <table> tag of the web page, so we can hand the raw HTML straight to pandas.

## 3.2 Obtain data

read_html — web table parsing: pd.read_html extracts every <table> element on a page into a list of DataFrames, so no manual HTML parsing is needed.
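A minimal, self-contained demonstration of the idea (hypothetical table contents, mimicking the 序号/关键词 columns used below):

```python
from io import StringIO
import pandas as pd

html = """
<table>
  <tr><th>序号</th><th>关键词</th></tr>
  <tr><td>1</td><td>示例话题 123456</td></tr>
  <tr><td>2</td><td>另一个话题 654321</td></tr>
</table>
"""

tables = pd.read_html(StringIO(html))  # one DataFrame per <table> found
print(tables[0])
```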
Code:

```python
def getweibodata():
    global csv_header  # same first-write-only header flag as the Zhihu crawler
    url = 'https://s.weibo.com/top/summary'
    r = requests.get(url, timeout=10)
    r.encoding = r.apparent_encoding
    # read_html returns one DataFrame per <table>; the hot list is the first
    df = pd.read_html(r.text)[0]
    df = df.loc[1:, ['序号', '关键词']]  # skip the pinned topic, keep rank and keyword
    df = df[~df['序号'].isin(['•'])]  # drop advertisement rows marked with '•'
    time_now = time.strftime("%Y-%m-%d %H:%M", time.localtime())
    print(time_now)
    df['时间'] = [time_now] * df.shape[0]
    df['排名'] = df['序号'].apply(int)
    # '关键词' holds title and heat separated by a space; split them apart
    df['标题'] = df['关键词'].str.split(' ', expand=True)[0]
    df['热度'] = df['关键词'].str.split(' ', expand=True)[1]
    df = df[['时间', '排名', '标题', '热度']]
    print(df.head())
    df.to_csv('weibo_hot_datas.csv', mode='a+', index=False, header=csv_header)
    csv_header = False
```

## 3.3 Save data

Result: each run appends the current hot search snapshot to weibo_hot_datas.csv, again with a single header row thanks to the csv_header flag.
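To collect Weibo snapshots over time, run this crawler on the same kind of timer as in section 2.3. A minimal sketch, mirroring the Zhihu scheduling loop:

```python
# Run the Weibo crawl task once every minute
schedule.every(1).minutes.do(getweibodata)

while True:
    schedule.run_pending()
    time.sleep(1)  # check for pending jobs every second
```

That completes the data collection; the next article in the series turns these two CSV files into the hot list / hot search sequence chart.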