Python3 implements crawler to capture popular comments analysis of NetEase Cloud Music (picture)-Python Tutorial-php.cn

Home

Backend Development

Python Tutorial

Python3 implements crawler to capture popular comments analysis of NetEase Cloud Music (picture)

黄舟

Oct 09, 2017 am 10:25 AM

python3 crawl NetEase

This article mainly introduces to you the relevant information about Python3 practical crawler to capture NetEase Cloud music hot reviews. The article introduces it in detail through the example code. It has certain reference learning value for everyone's study or work. It needs Friends, please follow the editor to learn together.

Preface

# I just got started with python crawler, and I haven’t written python for about half a month, and I almost forgot about it. So I was going to write a simple crawler to practice. I felt that the best features of NetEase Cloud Music were its accurate song recommendations and unique user reviews, so I wrote this method to capture the hot reviews in NetEase Cloud Music’s hot song list. reptile. I am also just getting started with crawling. If you have any comments or questions, please feel free to raise them. Let’s make progress together.

No more nonsense~ Let’s take a look at the detailed introduction.

Our goal is to crawl the popular comments of all songs in the hot song rankings in NetEase Cloud.

This can not only reduce the workload we need to crawl, but also save high-quality comments.

Implementation Analysis

First, we open the NetEase Cloud web version, as shown in the figure:

Python3 implements crawler to capture popular comments analysis of NetEase Cloud Music (picture)

Click on the ranking list, and then click on the cloud music hot song list on the left, as shown in the picture:

Python3 implements crawler to capture popular comments analysis of NetEase Cloud Music (picture)

Let’s first open a song at random and find out how to grab the specified The method of popular song reviews of songs is as shown in the picture. I chose a song that I like recently as an example:

Python3 implements crawler to capture popular comments analysis of NetEase Cloud Music (picture)

After entering, we will see the song review in Below this page, next we have to find a way to get these comments.

Next open the web console (open the developer tools for chrom, it should be similar for other browsers), press F12 under chrom, as shown in the picture:

Python3 implements crawler to capture popular comments analysis of NetEase Cloud Music (picture)

Select Network, and then we press F5 to refresh. The data obtained after refreshing is as shown below:

Python3 implements crawler to capture popular comments analysis of NetEase Cloud Music (picture)

You can see that the browser sent a lot of information , so which one do we want? Here we can make a preliminary judgment through the status code. The status code marks the status of the server request. The status code here is 200, which means the request is normal, and 304, which means it is abnormal (there are many types of status codes. If you want If you want to know more about it, you can search it yourself. I won’t talk about the specific meaning of 304 here). So we generally only need to look at requests with status code 200. Also, we can roughly observe what information the server returns (or view the response) through the preview in the right column. By combining these two methods, we can quickly find the request we want to analyze. After repeated searches, I finally found a request containing a song review, as shown in the picture:

Python3 implements crawler to capture popular comments analysis of NetEase Cloud Music (picture)

Maybe the screenshot is not very clear on CSDN, we found a request with the name R_SO_4_489998494 A song review containing this song was found in the POST request of ?csrf_token=. We send this block screenshot so that we can see it more clearly:

Request basic information:

Python3 implements crawler to capture popular comments analysis of NetEase Cloud Music (picture)

Request header:

Python3 implements crawler to capture popular comments analysis of NetEase Cloud Music (picture)

Form data in the request:

Python3 implements crawler to capture popular comments analysis of NetEase Cloud Music (picture)

We can see that it contains this The request URL for song reviews is http://music.163.com/weapi/v1/resource/comments/R_SO_4_489998494?csrf_token=. After changing a few songs, we found that the first part of the request is the same. It's just that the string of numbers immediately following R_SO_4_ is different. We can infer that each song has a specified id, and what follows R_SO_4_ is the id of the song.

Let's look at the submitted form data again. We will find that two data need to be filled in the form, named params and encSecKey. What follows is a large string of characters. If you change a few songs, you will find that the params and encSecKey of each song are different. Therefore, these two data may have been encrypted by a specific algorithm.

The data related to comments returned by the server is in json format, which contains very rich information (such as information about the commentator, comment date, number of likes, comment content, etc.), among which hotComments is our There are a total of 15 popular comments we are looking for, as shown in the picture:

Python3 implements crawler to capture popular comments analysis of NetEase Cloud Music (picture)

#At this point, we have determined the direction, that is, we only need to determine the two parameters params and encSecKey. Just value. But these two parameters are encrypted through a specific algorithm, what should we do? I discovered a pattern, http://music.163.com/weapi/v1/resource/comments/R_SO_4_489998494?csrf_token= The number after R_SO_4_ is the id value of this song, and for different songs, the param and encSecKey value, if you pass these two parameter values of a song such as A to song B, then for the same number of pages, this parameter is universal, that is, the two parameter values of the first page of A are passed to song B. For the two parameters of any other song, you can get the comments on the first page of the corresponding song. The same is true for the second page, third page, etc.

In fact, we only need to get the 15 popular comments on the first page, so we just need to find a song and add the params and encSecKey in the request on the first page of the song Copy these two parameter values and you can use them.

Regarding how to decrypt these two parameters, there is actually an answer on the powerful Zhihu. Friends who are interested can go in and take a look (https://www.zhihu.com/question/ 36081767), we only need to use our lazy method to complete the requirements here, xixi.

So far, we have analyzed how to capture the popular comments of NetEase Cloud Music. Let’s analyze how to obtain the information of all songs in the cloud music hot song list.

We need to obtain the song names and corresponding id values of all songs in the cloud music hot song list.

Similar to the above analysis steps, we first enter the URL of the hot song list, as shown in the figure:

Python3 implements crawler to capture popular comments analysis of NetEase Cloud Music (picture)

Press F12 to enter the WEB workbench , as shown in the figure:

Python3 implements crawler to capture popular comments analysis of NetEase Cloud Music (picture)

# We found all the song information of the list in a GET request named toplist?id=3778678.

The information corresponding to the request is as shown in the figure:

Python3 implements crawler to capture popular comments analysis of NetEase Cloud Music (picture)

Let’s preview the result returned by the request, as shown in the figure:

Python3 implements crawler to capture popular comments analysis of NetEase Cloud Music (picture)

We found the code containing song information at line 524 of the code, as shown in the figure:

Python3 implements crawler to capture popular comments analysis of NetEase Cloud Music (picture)

Therefore, we only need to add the request code to , filter out codes containing information.

We use regular expressions for data filtering here.

By observing the characteristics, we can extract the song information we need through two regular expression filters.

For the first regular expression, we extracted the 525th line of code from all the codes returned by the request.

The first regular expression is as follows:

<ul class="f-hide">
<li>
<a href="/song\?id=\d*?" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >.*</a>
</li>
</ul>

Copy after login

The second regular expression we extract the song information we need in line 524, we need the song of the song name and id, the corresponding regular expressions are as follows:

Get the song name:

<li><a href="/song\?id=\d*?" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >(.*?)</a></li>

Copy after login

Get the song id:

<li><a href="/song\?id=(\d*?)" rel="external nofollow" rel="external nofollow" >.*?</a></li>

Copy after login

At this point, our entire process has been completed After the analysis, go to the code to see the specific details~~

##The code is as follows:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import re
import urllib.request
import urllib.error
import urllib.parse
import json
def get_all_hotSong():  #获取热歌榜所有歌曲名称和id
 url=&#39;http://music.163.com/discover/toplist?id=3778678&#39; #网易云云音乐热歌榜url
 html=urllib.request.urlopen(url).read().decode(&#39;utf8&#39;) #打开url
 html=str(html)  #转换成str
 pat1=r&#39;<ul class="f-hide"><li><a href="/song\?id=\d*?">.*</a></li></ul>&#39; #进行第一次筛选的正则表达式
 result=re.compile(pat1).findall(html)  #用正则表达式进行筛选
 result=result[0]  #获取tuple的第一个元素

 pat2=r&#39;<li><a href="/song\?id=\d*?">(.*?)</a></li>&#39; #进行歌名筛选的正则表达式
 pat3=r&#39;<li><a href="/song\?id=(\d*?)">.*?</a></li>&#39; #进行歌ID筛选的正则表达式
 hot_song_name=re.compile(pat2).findall(result) #获取所有热门歌曲名称
 hot_song_id=re.compile(pat3).findall(result) #获取所有热门歌曲对应的Id

 return hot_song_name,hot_song_id

def get_hotComments(hot_song_name,hot_song_id):
 url=&#39;http://music.163.com/weapi/v1/resource/comments/R_SO_4_&#39; + hot_song_id + &#39;?csrf_token=&#39; #歌评url
 header={ #请求头部
 &#39;User-Agent&#39;:&#39;Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36&#39;
}
 #post请求表单数据
 data={&#39;params&#39;:&#39;zC7fzWBKxxsm6TZ3PiRjd056g9iGHtbtc8vjTpBXshKIboaPnUyAXKze+KNi9QiEz/IieyRnZfNztp7yvTFyBXOlVQP/JdYNZw2+GRQDg7grOR2ZjroqoOU2z0TNhy+qDHKSV8ZXOnxUF93w3DA51ADDQHB0IngL+v6N8KthdVZeZBe0d3EsUFS8ZJltNRUJ&#39;,&#39;encSecKey&#39;:&#39;4801507e42c326dfc6b50539395a4fe417594f7cf122cf3d061d1447372ba3aa804541a8ae3b3811c081eb0f2b71827850af59af411a10a1795f7a16a5189d163bc9f67b3d1907f5e6fac652f7ef66e5a1f12d6949be851fcf4f39a0c2379580a040dc53b306d5c807bf313cc0e8f39bf7d35de691c497cda1d436b808549acc&#39;}
 postdata=urllib.parse.urlencode(data).encode(&#39;utf8&#39;) #进行编码
 request=urllib.request.Request(url,headers=header,data=postdata)
 reponse=urllib.request.urlopen(request).read().decode(&#39;utf8&#39;)
 json_dict=json.loads(reponse) #获取json
 hot_commit=json_dict[&#39;hotComments&#39;] #获取json中的热门评论


 num=0
 fhandle=open(&#39;./song_comments&#39;,&#39;a&#39;) #写入文件
 fhandle.write(hot_song_name+&#39;:&#39;+&#39;\n&#39;)

 for item in hot_commit:
  num+=1
  fhandle.write(str(num)+&#39;.&#39;+item[&#39;content&#39;]+&#39;\n&#39;)
 fhandle.write(&#39;\n==============================================\n\n&#39;)
 fhandle.close()




hot_song_name,hot_song_id=get_all_hotSong() #获取热歌榜所有歌曲名称和id

num=0
while num < len(hot_song_name): #保存所有热歌榜中的热评
 print(&#39;正在抓取第%d首歌曲热评...&#39;%(num+1))
 get_hotComments(hot_song_name[num],hot_song_id[num])
 print(&#39;第%d首歌曲热评抓取成功&#39;%(num+1))
 num+=1

Copy after login

The results of running the code are as follows:

Python3 implements crawler to capture popular comments analysis of NetEase Cloud Music (picture)

Compare the song reviews of the song "If I Love You" on the web page with the song reviews we saved:

Python3 implements crawler to capture popular comments analysis of NetEase Cloud Music (picture)

The information is correct~

Summary

The above is the detailed content of Python3 implements crawler to capture popular comments analysis of NetEase Cloud Music (picture). For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Assassin's Creed Shadows: Seashell Riddle Solution

3 weeks ago By DDD

What's New in Windows 11 KB5054979 & How to Fix Update Issues

2 weeks ago By DDD

Where to find the Crane Control Keycard in Atomfall

3 weeks ago By DDD

Assassin's Creed Shadows - How To Find The Blacksmith And Unlock Weapon And Armour Customisation

1 months ago By DDD

Roblox: Dead Rails - How To Complete Every Challenge

3 weeks ago By DDD

Hot Tools

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Where is the login entrance for gmail email?

7609

CakePHP Tutorial

1387

What is the format of the account name of steam

win11 activation key permanent

nyt connections hints and answers

136

Related knowledge

NetEase's first! The 5V5 multi-hero skills shooting mobile game "Operation Apocalypse" pre-order is open! Mar 16, 2024 am 08:01 AM

Since the release of the domestic game version in February, NetEase's mysterious shooting game "Operation Apocalypse" has attracted the curiosity of many players. As we all know, although NetEase also had some shooting games in the early years, it seems that except for "The Day After Tomorrow" and "Knives Out", there are not many playable games. In recent years, the works launched by NetEase have gained momentum and have achieved considerable results in many niche tracks. The fact that "Operation Apocalypse", which has never been exposed before, is placed as the first shot of NetEase in the new year is enough to raise eyebrows. Is this a signal that NetEase is going to launch an attack on the shooting track? Players are eagerly looking forward to new mobile shooting games. , on March 13, this 5V5 multi-hero skills shooting mobile game "Operation Apocalypse" developed by NetEase finally unveiled its mystery and officially released its first live demonstration

NetEase Mar 28, 2024 pm 12:50 PM

On March 27, 2024, Beijing time, NetEase Games and Marvel Games officially announced a new game: the superhero PVP team shooting game "Marvel Rivals". Players can choose their favorite characters from a rich and diverse lineup of superheroes and super villains to form an all-star team, and use their unique superpowers to engage in exciting battles on various breakable maps in the Marvel multiverse. "We are very excited to bring "Marvel Confrontation" to players around the world. We have always loved the Marvel Universe and its characters, and we are excited to develop this game." The main creative team of "Marvel Contest" said, "This is exactly the game we dreamed of creating, and we are extremely proud to be able to turn it from a dream into a reality." "NetEase

NetEase announced the suspension of 'Marvel Super War', which was Marvel's first MOBA game! Apr 18, 2024 am 10:50 AM

NetEase's "Marvel Super War" announced that it will terminate operations and close the game server at 15:00 on June 17, 2024. The download entrance for all platforms has now been closed, and game recharge and new user registration have been stopped. As Marvel's first MOBA mobile game, this game authentically displays the combat characteristics of superheroes and restores the grand world view of the Marvel universe. In the game, you will be able to assemble in the parallel universe with the Avengers, X-Men, Fantastic Four and many superheroes and super villains, and compete with Iron Man, Captain America, Spider-Man, Loki, Thanos, Deadpool Wait for more than 60 classic Marvel characters to fight together!

Game version of Sora? Nishuihan mobile game releases AI video generation tool, supports typing input Feb 26, 2024 pm 08:55 PM

Recently, Nishuihan mobile game officially released a new AI video generation tool, through which players can "produce blockbusters just by typing." According to the official introduction, the functions are implemented based on the Nishuihan game itself and are highly involved in AI. There is no need for any equipment, actors, or special effects. You only need to type in any character image, actions, and lines, and the corresponding content can be generated in real time in the game through AI and filmed. At the same time, players are supported to adjust details, including the character’s clothing, makeup, hairstyle, personality, voice, etc. This function also supports the uploading of pictures/videos, motion capture through AI, and real-time generation of movements and expressions that are “not available in the game” in the game. Officials stated that this feature has the same vision as Sora, which is to “make the creative space endless and the creative threshold close to the limit”.

Blizzard's national server returns to the scene to watch, Valina shows off her snow-skinned skin and long legs, and Dva in her pink skirt is so cute! Apr 11, 2024 pm 04:04 PM

NetEase and Microsoft Blizzard have officially announced the return of Blizzard's national server. NetEase also held a celebration ceremony under the headquarters building. The scene was also very exciting. Let's go to the scene together! The scene was set up early in the morning, and all Blizzard’s games are listed, including World of Warcraft, Hearthstone, Diablo 3, Heroes of the Storm, Overwatch, StarCraft 2, etc. Pay attention to the official announcements from Blizzard and NetEase Heroes of the Storm is not mentioned, but there is no poster that can be seen on site for this game, so players who like Heroes of the Storm don’t have to worry, Blizzard’s entire family will be back. We should also pay special attention to World of Warcraft. The poster displayed is for World of Warcraft 11.0 War for the Center of the Earth. Obviously, this classic version in which Mason personally copied the sword will be played on the Chinese server this summer.

Can NetEase Master re-register after logging out? Mar 07, 2024 pm 03:25 PM

NetEase Master APP is a social platform for gamers. Users can get game consultation here. Many users want to know whether NetEase Master can re-register after logging out. NetEase Master can re-register after logging out. Can NetEase Master re-register after logging out? Answer: Yes. 1. NetEase Master can re-register after logging out. 2. But you must register for at least one week before you can choose this number to register. 3. The account cannot be specified, and previous data will be cleared. 4. After the user logs out, all previous information and related balances have been cleared. Related articles: How does NetEase master change the bound mobile phone number?

Tencent Photon H Studio is hiring in Hangzhou and plans to make a 3A open world RPG Feb 05, 2024 pm 01:45 PM

Recently, Tencent Interactive Entertainment Recruitment released a recruitment information, indicating that Photon H Studio is committed to developing a content-rich, AAA-level open world RPG project. The hot recruitment positions cover multiple fields such as UE5 engineers, backend, level design, action scene design, character modeling, special effects and distribution. The target working location of these positions is in Hangzhou, where NetEase is headquartered.

On the occasion of the re-opening of the national server of World of Warcraft, here are the 4 major version selection guides, the last one is more suitable for casual players Apr 13, 2024 am 09:16 AM

There are currently 4 versions of World of Warcraft. The national server has been closed for more than a year. It is estimated that many players do not know where each version has developed. Let's sort out the current status of each version. 1. At the end of the official server version 10.0, before the closure of the national server, version 10.0 has just started. It is currently in version 10.26. There will be version 10.27 later, and the Age of Dragon expansion pack will be over. Although version 10.0 has received good reviews in foreign servers and has restored some popularity for Blizzard, the core of the game in version 10.0 has not changed at all. It is still mainly about big secrets and raids, and the number of PVP players is very small. With the continuous updates of the official server version, players’ gaming tendencies have also changed from PVE and PVP to collecting.

See all articles