Crawling WeChat articles with Python 3

Jul 21, 2017 pm 01:46 PM
python3 article crawler

Prerequisites:

Python 3.4

Windows

Function: Search for WeChat articles through Sogou's WeChat search interface and export their titles and links to an Excel table.

Note: The xlsxwriter module is required. The program was written on 2017/7/11; if it stops working later, that is most likely because the website has changed. The program is fairly simple: just over 40 lines, not counting comments.

Walkthrough:

Idea: Open the initial URL --> extract the titles and links with a regular expression --> repeat step two for each results page --> export the collected titles and links to Excel
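The four steps can be sketched as one function. This is a minimal, hedged sketch rather than the author's program: `crawl`, `fetch` and `PATTERN` are hypothetical names, the page download is passed in as a callable so the pipeline can be tried offline, title cleanup is left out, and step four (the Excel export) is left to the caller. The regular expression is the article's own and may no longer match Sogou's current markup.

```python
import re
import time
import urllib.parse

# The article's regular expression (group 1 = link, group 2 = title).
PATTERN = re.compile(
    r'<a target="_blank" href="(.*?)".*?uigs="article_title_.*?">(.*?)</a>'
)


def crawl(search, pagenum, fetch, delay=10):
    """Return a list of (title, link) pairs for result pages 1..pagenum.

    fetch is any callable that takes a URL and returns the page HTML as
    a string (hypothetical; injected so the loop can run without network).
    """
    query = urllib.parse.quote(search)
    results = []
    for page in range(1, pagenum + 1):                   # step 3: loop pages
        url = ('http://weixin.sogou.com/weixin?type=2&query='
               + query + '&page=' + str(page))           # step 1: build URL
        for link, title in PATTERN.findall(fetch(url)):  # step 2: regex
            results.append((title, link.replace('amp;', '')))
        time.sleep(delay)                                # pause between pages
    return results
```

Passing `delay=0` and a fake `fetch` makes it easy to check the URL construction and parsing before pointing it at the real site.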

The first step in writing a crawler is to explore the site by hand (a short aside):

Open the search page mentioned above and search for a keyword such as "image recognition". The URL then becomes "". The parts marked in red are the important parameters: type selects what to search (type=1 searches official accounts, which we ignore here; the code below uses type=2), query='search keyword' carries the keyword in URL-encoded form, and there is also a hidden parameter, page=1.

When you go to the second page, you can see "", so the URL for any page can be built as 'http://weixin.sogou.com/weixin?type=2&query='+search+'&page='+str(page).
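The manual step of reading parameters off the address bar can also be done in code. This sketch uses the standard `urllib.parse.urlsplit` and `parse_qs` functions; `query_params` is a hypothetical helper name, and the example URL simply follows the format the article builds.

```python
from urllib.parse import parse_qs, urlsplit


def query_params(url):
    """Return the query-string parameters of url as {name: [values]}."""
    return parse_qs(urlsplit(url).query)


# Example URL in the format the article builds (keyword already encoded).
params = query_params(
    'http://weixin.sogou.com/weixin?type=2&query=image%20recognition&page=2'
)
print(params)  # {'type': ['2'], 'query': ['image recognition'], 'page': ['2']}
```

Note that `parse_qs` decodes the keyword back to plain text, which is a quick way to confirm what a copied URL is actually searching for.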

search is the keyword to search for; it is URL-encoded with quote() before being inserted into the URL:

import urllib.parse
search = urllib.parse.quote(search)
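A quick illustration of what the encoding does. `quote()` is documented in `urllib.parse` (it is also reachable as `urllib.request.quote` because `urllib.request` imports it from there). By default it encodes non-ASCII text as UTF-8 and turns each unsafe byte into `%XX`; spaces become `%20`.

```python
from urllib.parse import quote

# Spaces and non-ASCII characters are not URL-safe, so they get percent-encoded.
print(quote('image recognition'))  # image%20recognition
print(quote('微信'))                # %E5%BE%AE%E4%BF%A1
```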

page is used as the loop variable:
for page in range(1, pagenum + 1):
    url = 'http://weixin.sogou.com/weixin?type=2&query=' + search + '&page=' + str(page)
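As an alternative to string concatenation, the standard `urllib.parse.urlencode()` can build the whole query string and quote the keyword in one go. This is a sketch with a hypothetical `page_urls` helper. One difference to be aware of: `urlencode()` quotes with `quote_plus` by default, so spaces become `+` instead of `%20`. Both normally mean a space in a query string, but that this particular site accepts both is an assumption.

```python
from urllib.parse import urlencode

BASE = 'http://weixin.sogou.com/weixin'


def page_urls(search, pagenum):
    """Build one search URL per results page (pages 1..pagenum)."""
    urls = []
    for page in range(1, pagenum + 1):
        # Pass the raw keyword: urlencode() does the quoting itself.
        query = urlencode({'type': 2, 'query': search, 'page': page})
        urls.append(BASE + '?' + query)
    return urls
```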

With the complete URL in hand, the next step is to request it and read the page (create an opener object and add a User-Agent header):

import urllib.request
header = ('User-Agent', 'Mozilla/5.0')
opener = urllib.request.build_opener()
opener.addheaders = [header]
urllib.request.install_opener(opener)
data = urllib.request.urlopen(url).read().decode()
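`install_opener()` changes the opener for every later `urlopen()` call in the process. If you would rather attach the header to each request only, `urllib.request.Request` accepts a `headers` dict. The sketch below is one way to do that; `build_request` and `fetch` are hypothetical names, the `timeout` is an addition not in the original, and decoding as UTF-8 assumes the page is UTF-8 (which is also what the original `decode()` assumes by default).

```python
import urllib.request


def build_request(url, user_agent='Mozilla/5.0'):
    """Create a Request that carries its own User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def fetch(url, timeout=10):
    """Download url and decode the response body as UTF-8."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read().decode('utf-8')
```

`Request` stores header names in capitalized form, so the header is read back with `get_header('User-agent')`.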
Once we have the page content, a regular expression extracts the relevant data:

import re
finddata = re.compile('<a target="_blank" href="(.*?)".*?uigs="article_title_.*?">(.*?)</a>').findall(data)
# finddata = [('', ''), ('', '')], i.e. a list of (link, title) tuples
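To see what the pattern returns, here it is applied to a small, hypothetical fragment shaped like a 2017 result (the real markup may differ). Because the pattern has two groups, `findall()` returns a list of `(link, title)` tuples, still containing the noise that the next step removes.

```python
import re

# The article's pattern: group 1 = href value, group 2 = anchor text.
PATTERN = re.compile(
    r'<a target="_blank" href="(.*?)".*?uigs="article_title_.*?">(.*?)</a>'
)

# Hypothetical sample; not captured from the live site.
sample = (
    '<a target="_blank" href="http://mp.weixin.qq.com/s?src=3&amp;ver=1" '
    'id="sogou_vr_0" uigs="article_title_0">'
    '<em><!--red_beg-->图像识别<!--red_end--></em>入门</a>'
)

for link, title in PATTERN.findall(sample):
    print(link)   # http://mp.weixin.qq.com/s?src=3&amp;ver=1
    print(title)  # <em><!--red_beg-->图像识别<!--red_end--></em>入门
```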
The matches contain noise: links contain 'amp;' (left over from the HTML entity &amp;), and titles contain extra markup such as '<em><!--red_beg-->' around the keyword. Both are removed with replace():

title = title.replace('<em><!--red_beg-->', '')
title = title.replace('<!--red_end--></em>', '')
link = link.replace('amp;', '')
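A more general cleanup, swapped in here as an alternative to the article's `replace()` calls, is to strip any tag or comment with `re.sub()` and decode entities with the standard `html.unescape()`. `clean_title` and `clean_link` are hypothetical names. One behavioral difference: `html.unescape()` turns `&ldquo;`/`&rdquo;` into curly quotes, while the complete program below replaces them with a straight `"`.

```python
import html
import re


def clean_title(title):
    """Remove tags and comments (e.g. the red highlight markers), then decode entities."""
    title = re.sub(r'<[^>]+>', '', title)  # matches <em>, </em> and <!--...-->
    return html.unescape(title)


def clean_link(link):
    """Decode HTML entities such as &amp; inside an href value."""
    return html.unescape(link)
```

Unlike `replace('amp;', '')`, `html.unescape()` only touches real entities, so a literal "amp;" elsewhere in a URL would be left alone.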

Save the processed links and titles in the list (link first, then title):
title_link.append(link)
title_link.append(title)
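The flat list alternates link, title, link, title, so later code has to use index arithmetic (`i` and `i + 1`). Storing `(title, link)` tuples avoids that. This small sketch, with the hypothetical helper `to_pairs`, converts the article's flat list into pairs using slicing and `zip()`.

```python
def to_pairs(flat):
    """Turn [link1, title1, link2, title2, ...] into [(title1, link1), ...]."""
    links = flat[0::2]   # even positions hold links
    titles = flat[1::2]  # odd positions hold titles
    return list(zip(titles, links))
```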
This gives us the titles and links from the search results. Next, export them to Excel.

Create the Excel workbook first:
import xlsxwriter
# Create it before search is URL-encoded, so the file name keeps the raw keyword
workbook = xlsxwriter.Workbook(search + '.xlsx')
worksheet = workbook.add_worksheet('微信')

Write the data in title_link to Excel:

for i in range(0, len(title_link), 2):
    worksheet.write('A' + str(i + 1), title_link[i + 1])  # title in column A
    worksheet.write('C' + str(i + 1), title_link[i])      # link in column C
workbook.close()
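If data rows start at row 1 (as with `str(i + 1)` when `i` is 0), the first title and link land in A1 and C1 and replace the bold header that the complete program writes there. The sketch below, using a hypothetical `cell_rows` helper, computes cell references that start at row 2 and use consecutive rows; putting data in consecutive rows rather than every other row is a layout choice, not the author's.

```python
def cell_rows(pairs, first_row=2):
    """Yield (title_cell, title, link_cell, link) for each (title, link) pair.

    Row 1 is left for the header; data starts at first_row, with titles in
    column A and links in column C, one pair per row.
    """
    for offset, (title, link) in enumerate(pairs):
        row = first_row + offset
        yield 'A' + str(row), title, 'C' + str(row), link
```

Usage with an xlsxwriter worksheet would be `for a, title, c, link in cell_rows(pairs): worksheet.write(a, title); worksheet.write(c, link)`. xlsxwriter's `write()` also accepts zero-based `(row, col)` integers instead of A1-style strings.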
Complete code:

'''
python3.4 + windows
羽凡 - 2017/7/11
Searches WeChat articles and saves their titles and links to Excel.
Waits 10 seconds after each page to avoid being rate-limited.
'''
import re
import time
import urllib.parse
import urllib.request

import xlsxwriter

search = input('Search WeChat articles: ')
pagenum = int(input('Number of pages to search: '))
# Create the workbook before encoding, so the file name keeps the raw keyword
workbook = xlsxwriter.Workbook(search + '.xlsx')
search = urllib.parse.quote(search)

# Send a browser-like User-Agent with every request (set up once, not per page)
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)

pattern = re.compile('<a target="_blank" href="(.*?)".*?uigs="article_title_.*?">(.*?)</a>')
title_link = []  # flat list: link, title, link, title, ...
for page in range(1, pagenum + 1):
    url = 'http://weixin.sogou.com/weixin?type=2&query=' + search + '&page=' + str(page)
    data = urllib.request.urlopen(url).read().decode()
    # finddata = [(link, title), (link, title), ...]
    finddata = pattern.findall(data)
    for link, title in finddata:
        # Remove the highlight markers around the search keyword
        title = title.replace('<em><!--red_beg-->', '')
        title = title.replace('<!--red_end--></em>', '')
        # Titles may contain quotation marks encoded as HTML entities
        title = title.replace('&ldquo;', '"')
        title = title.replace('&rdquo;', '"')
        # Turn '&amp;' in the link back into '&'
        link = link.replace('amp;', '')
        title_link.append(link)
        title_link.append(title)
    print('Page ' + str(page))
    time.sleep(10)

worksheet = workbook.add_worksheet('微信')
worksheet.set_column('A:A', 70)
worksheet.set_column('C:C', 100)
bold = workbook.add_format({'bold': True})
worksheet.write('A1', 'Title', bold)
worksheet.write('C1', 'Link', bold)
# Data starts at row 2 so the header in row 1 is not overwritten
for i in range(0, len(title_link), 2):
    worksheet.write('A' + str(i + 2), title_link[i + 1])
    worksheet.write('C' + str(i + 2), title_link[i])
workbook.close()
print('Export to Excel finished!')
