
Crawler + Visualization | Python Zhihu Hot List/Weibo Hot Search Sequence Chart (Part 1)

Release: 2023-08-10 15:53:10


This is the first article in the <Zhihu Hot List/Weibo Hot Search Sequence Chart> series. It shows how to use Python to crawl the Zhihu hot list and Weibo hot search data on a schedule and save it to a CSV file for later visualization. The sequence chart itself is covered in the next article. I hope you find it helpful.

What's covered:
pandas - data processing
schedule - scheduled tasks
json - data format
read_html - HTML table parsing


1. Preparation

1.1 Import module
import json
import time
import requests
import schedule
import pandas as pd
from fake_useragent import UserAgent

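2. Zhihu hot list data

2.1 Web page analysis
Zhihu hot list desktop page:
https://www.zhihu.com/hot
Zhihu hot list mobile API:
https://api.zhihu.com/topstory/hot-list?limit=10&reverse_order=0

Note: on the desktop page, the hot list data is visible directly in the F12 developer tools, while the mobile API has to be inspected with a packet-capture tool. Here we use the mobile API because it returns JSON, which is easier to parse.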


2.2 Get data

Code:

def getzhihudata(url, headers):
    # Request the hot list API; the timeout keeps a stalled request from blocking the scheduler
    r = requests.get(url, headers=headers, timeout=10)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    datas = json.loads(r.text)['data']
    allinfo = []
    time_mow = time.strftime("%Y-%m-%d %H:%M", time.localtime())
    print(time_mow)
    for indx,item in enumerate(datas):
        title = item['target']['title']
        # Keep the part of detail_text before the first space as the heat value
        heat = item['detail_text'].split(' ')[0]
        answer_count = item['target']['answer_count']
        follower_count = item['target']['follower_count']
        href = item['target']['url']
        info = [time_mow, indx+1, title, heat, answer_count, follower_count, href]
        allinfo.append(info)
    # Write the header only on the first run
    global csv_header
    # Columns: time, rank, title, heat (in units of 10,000), answer count, follower count, link
    df = pd.DataFrame(allinfo,columns=['时间','排名','标题','热度(万)','回答数','关注数','链接'])
    print(df.head())
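
To check the field extraction without hitting the network, the parsing step can be pulled into a pure function. The sketch below assumes the same response shape that getzhihudata reads (a 'data' list whose items carry 'detail_text' and a 'target' dict). The name parse_hot_items and the sample payload are hypothetical and only illustrate the structure; they are not real API output.

```python
import json

def parse_hot_items(payload_text, timestamp):
    """Turn a hot-list JSON string into rows of
    [time, rank, title, heat, answer_count, follower_count, url]."""
    items = json.loads(payload_text)['data']
    rows = []
    for rank, item in enumerate(items, start=1):
        target = item['target']
        # Assumption: detail_text looks like '1234 万热度'; keep the leading number
        heat = item['detail_text'].split(' ')[0]
        rows.append([timestamp, rank, target['title'], heat,
                     target['answer_count'], target['follower_count'],
                     target['url']])
    return rows

# Made-up sample payload with the same keys the article's code reads
sample = json.dumps({'data': [{
    'detail_text': '1234 万热度',
    'target': {'title': 'Example question', 'answer_count': 56,
               'follower_count': 789,
               'url': 'https://api.zhihu.com/questions/1'},
}]})

rows = parse_hot_items(sample, '2023-08-10 15:53')
print(rows)
```

The returned rows have the same column order as the DataFrame built in getzhihudata, so they could be passed to pd.DataFrame with the same column list.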

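Schedule the task. The job runs once per minute, and the loop checks for pending jobs every second:

# Run the crawl task once every minute:
schedule.every(1).minutes.do(getzhihudata,zhihu_url,headers)
while True:
    schedule.run_pending()
    time.sleep(1)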
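
The scheduling snippet relies on zhihu_url, headers, and csv_header, none of which the article defines. A minimal sketch of plausible definitions follows, to be placed before the schedule call. It assumes zhihu_url is the mobile API from section 2.1 and that headers carries a random User-Agent from fake_useragent (imported in 1.1); the fallback string is an arbitrary placeholder, not something the article specifies.

```python
# Assumption: the mobile API from section 2.1 is the crawl target
zhihu_url = 'https://api.zhihu.com/topstory/hot-list?limit=10&reverse_order=0'

# Assumption: the article builds headers from fake_useragent; fall back to a
# fixed placeholder string if the package is missing or fails to load its data
try:
    from fake_useragent import UserAgent
    user_agent = UserAgent().random
except Exception:
    user_agent = 'Mozilla/5.0'
headers = {'User-Agent': user_agent}

# True until the first CSV write, so the header row is written exactly once
csv_header = True
```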

Result: [screenshot of the printed output]

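2.3 Save data

df.to_csv('zhuhu_hot_datas.csv', mode='a+', index=False, header=csv_header)
csv_header = False
Note the csv_header setting: it controls whether the header row is written. It is set to False after the first write so that later appends add data rows only.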
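
A global flag resets whenever the script restarts, so a restarted crawler would write a second header row into the existing file. One alternative, sketched below, is to derive the flag from the file itself. The helper needs_header is hypothetical and is not part of the article's code; the demo uses a temporary file in place of the real CSV.

```python
import os
import tempfile

def needs_header(csv_path):
    """Return True when the CSV file is missing or empty,
    i.e. when a header row still has to be written."""
    return not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0

# Demo with a temporary file standing in for the real CSV
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.csv')
first = needs_header(demo_path)       # file does not exist yet
with open(demo_path, 'a', encoding='utf-8') as f:
    f.write('时间,排名\n')
second = needs_header(demo_path)      # file now has content
print(first, second)
```

With this helper, the save call could pass header=needs_header('zhuhu_hot_datas.csv') to df.to_csv instead of tracking csv_header by hand.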


3. Weibo hot search data

3.1 Web page analysis

Weibo hot search URL:

https://s.weibo.com/top/summary

Press F12 to view the page source: [screenshot of the page source]

The data is in a <table> tag of the web page, which is why pd.read_html can parse it directly.

3.2 Get data

Code:

def getweibodata():
    url = 'https://s.weibo.com/top/summary'
    r = requests.get(url, timeout=10)
    r.encoding = r.apparent_encoding
    # read_html returns a list of DataFrames, one per table; take the first
    df = pd.read_html(r.text)[0]
    # Skip the first row and keep the index ('序号') and keyword ('关键词') columns
    df = df.loc[1:,['序号', '关键词']]
    # Drop rows whose index cell is '•' instead of a rank number
    df = df[~df['序号'].isin(['•'])]
    time_mow = time.strftime("%Y-%m-%d %H:%M", time.localtime())
    print(time_mow)
    df['时间'] = [time_mow] * df.shape[0]
    df['排名'] = df['序号'].apply(int)
    # The keyword cell holds the title and the heat separated by a space
    df['标题'] = df['关键词'].str.split(' ', expand=True)[0]
    df['热度'] = df['关键词'].str.split(' ', expand=True)[1]
    # Columns: time, rank, title, heat
    df = df[['时间','排名','标题','热度']]
    print(df.head())
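
Splitting the keyword cell on every space and taking parts 0 and 1 would cut a title that itself contains a space. A possible variation, shown below as a sketch, splits only at the last space. It assumes each cell is a title followed by one space and a numeric heat value; split_keyword is a hypothetical helper and the sample strings are made up.

```python
def split_keyword(cell):
    """Split a keyword cell such as 'some topic 123456' into (title, heat).

    Returns (cell, None) when the cell does not end in a number.
    """
    title, sep, tail = cell.rpartition(' ')
    if sep and tail.isdigit():
        return title, tail
    return cell, None

simple = split_keyword('话题A 123456')
spaced = split_keyword('Hello World 42')
no_heat = split_keyword('置顶话题')
print(simple, spaced, no_heat)
```

On a DataFrame, the same function could be applied per cell, for example with df['关键词'].apply(split_keyword), and the resulting tuples unpacked into the title and heat columns.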

Result with the scheduled task running: [screenshot of the printed output]

3.3 Save data

df.to_csv('weibo_hot_datas.csv', mode='a+', index=False, header=csv_header)

Result: [screenshot of the saved CSV data]


source:Python当打之年