


Example code for a Python crawler that captures GIF images from Rampage comics
This article walks through example code for a Python crawler that captures GIF images from Rampage comics. The code is written in Python 3 and uses the urllib.request module and the BeautifulSoup module. Readers who need something similar can use it as a reference.
The crawler introduced here grabs the funny GIF pictures on Rampage comics so they can be viewed offline. It was developed with Python 3.3 and relies mainly on the urllib.request and BeautifulSoup modules.
The urllib module provides a high-level interface for fetching data from the World Wide Web. Opening a URL with urlopen() is similar to opening a file with Python's built-in open(). The difference is that urlopen() takes a URL as its argument and the stream it returns cannot be rewound with seek() (at a low level it is backed by a socket, so seeking is not possible), while open() takes the name of a local file.
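As a rough illustration of the difference described above, the sketch below reads the same bytes once from a URL and once from a local file. It does not touch the real site: it starts a throwaway HTTP server on 127.0.0.1 so that it can run without network access. The names HelloHandler, fetch_and_try_seek and read_local_file_twice are hypothetical and exist only for this demo. The request goes through an explicit opener with proxies disabled, purely to keep the local demo self-contained; urlopen() uses a default opener in the same way.

```python
import http.server
import io
import os
import tempfile
import threading
import urllib.request

BODY = b"<html><body>hello</body></html>"


class HelloHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical demo handler: answers every GET with the same small page."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        pass  # silence the per-request log lines


def fetch_and_try_seek():
    """Return (data, seekable, seek_failed) for a response from a local server."""
    server = http.server.HTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # Proxies are disabled only so the request really goes to 127.0.0.1.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        url = "http://127.0.0.1:%d/" % server.server_address[1]
        with opener.open(url) as resp:
            seekable = resp.seekable()
            try:
                resp.seek(0)  # the response is backed by a socket
                seek_failed = False
            except io.UnsupportedOperation:
                seek_failed = True
            data = resp.read()
    finally:
        server.shutdown()
        server.server_close()
    return data, seekable, seek_failed


def read_local_file_twice():
    """A local file opened with open() can be rewound with seek()."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "page.html")
        with open(path, "wb") as f:
            f.write(BODY)
        with open(path, "rb") as f:
            first = f.read()
            f.seek(0)
            second = f.read()
    return first, second


if __name__ == "__main__":
    print(fetch_and_try_seek())
    print(read_local_file_twice())
```

In practice this means a crawler should call read() once and keep the returned bytes, rather than expecting to re-read the response object.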
Python's BeautifulSoup module helps you parse HTML and XML.
Writing a web crawler usually means fetching the HTML source (and other content) of a page and then analyzing it to extract the parts you want.
For pages with simple content, matching piece by piece with regular expressions from the re module is usually enough to do this analysis.
But for HTML that is large or has complicated content, doing the job with the re module alone becomes difficult or impractical.
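A small sketch of why regular expressions become fragile on less regular markup. The pattern and the two HTML snippets are made up for illustration and are not taken from the real site; the pattern assumes that src comes before alt and that both use double quotes.

```python
import re

# Naive pattern: expects src first, then alt, both in double quotes.
IMG_PATTERN = re.compile(r'<img\s+src="([^"]+)"\s+alt="([^"]+)"')

simple_html = '<img src="a.gif" alt="first">'
# Same information, but different attribute order, an extra attribute
# and single quotes.
messy_html = "<img alt='second' style=\"width:460px\" src='b.gif'>"

simple_matches = IMG_PATTERN.findall(simple_html)
messy_matches = IMG_PATTERN.findall(messy_html)

print(simple_matches)  # [('a.gif', 'first')]
print(messy_matches)   # []
```

The pattern can be extended to cover each variation, but every new variation in the markup needs another adjustment, which is where an HTML parser starts to pay off.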
With the BeautifulSoup module, analyzing HTML source becomes much simpler, and extracting content from it takes far less effort.
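A minimal sketch of the same kind of extraction with BeautifulSoup, assuming the bs4 package is installed. The HTML string is invented for this example; the style filter mirrors the one used in the crawler, and Python's built-in "html.parser" is chosen here so that no extra parser library is needed.

```python
import bs4

# Made-up markup: two comic images plus one unrelated image.
html = """
<div>
  <img alt='second' style="width:460px" src='b.gif'>
  <img src="a.gif" style="width:460px" alt="first">
  <img src="logo.png" alt="site logo">
</div>
"""

soup = bs4.BeautifulSoup(html, "html.parser")

# Attribute order and quoting style no longer matter.
images = soup.find_all("img", attrs={"style": "width:460px"})
pairs = [(img["src"], img["alt"]) for img in images]

print(pairs)  # [('b.gif', 'second'), ('a.gif', 'first')]
```

The parser normalizes the markup first, so the selection logic only has to say which tags and attributes are wanted.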
Note: BeautifulSoup is a third-party library; I use bs4. In Python 3, the functionality of urllib2 has moved into urllib.request and urllib.error. The original text in the documentation is as follows.
Note: The urllib2 module has been split across several modules in Python 3 named urllib.request and urllib.error.
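To show how the two modules divide the work, here is a hedged sketch of a small download helper: urllib.request builds and sends the request, and urllib.error supplies the exception types. The function name fetch and the shortened User-Agent value are assumptions made for this example, not part of the original crawler.

```python
import urllib.error
import urllib.request


def fetch(url, timeout=10):
    """Return the response body as bytes, or None if the request fails.

    Hypothetical helper for illustration only.
    """
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read()
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 404 or 500.
        print("server returned an error status:", exc.code)
    except urllib.error.URLError as exc:
        # The request failed before a usable response was received.
        print("could not open the URL:", exc.reason)
    return None
```

HTTPError is a subclass of URLError, so it has to be caught first. A crawler that loops over many pages could call a helper like this and skip any page for which it returns None.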
The crawler source code is as follows:

    # -*- coding: utf-8 -*-
    import os
    import urllib.request

    import bs4

    page_sum = 1  # number of pages to download

    # Create the output folder in the current working directory
    path = os.path.join(os.getcwd(), '暴走GIF')
    if not os.path.exists(path):
        os.mkdir(path)

    url = "http://baozoumanhua.com/gif/year"  # base URL
    headers = {  # pretend to be a browser
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)'
                      ' Chrome/32.0.1700.76 Safari/537.36'
    }

    for count in range(page_sum):
        req = urllib.request.Request(url=url + str(count + 1), headers=headers)
        print(req.full_url)
        content = urllib.request.urlopen(req).read()

        # Parse the page and pick out the comic images by their style attribute
        soup = bs4.BeautifulSoup(content, 'html.parser')
        img_content = soup.find_all('img', attrs={'style': 'width:460px'})
        url_list = [img['src'] for img in img_content]    # image URLs
        title_list = [img['alt'] for img in img_content]  # image titles, used as file names

        for imgurl, title in zip(url_list, title_list):
            filename = os.path.join(path, title + ".gif")
            print(filename + ":" + imgurl)                # print download info
            urllib.request.urlretrieve(imgurl, filename)  # download the image
To change the number of pages to download, modify the value of page_sum near the top of the source code. Save the code as baozougif.py and run the command python baozougif.py. A folder named 暴走GIF ("Rampage GIF") will be created in the directory where the command is run, and all the pictures will be downloaded into it automatically.
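The crawler uses each image's alt text directly as a file name. If a title happens to contain characters that a file system rejects (for example "/" or "?"), or if two images share a title, saving could fail or overwrite an earlier file. Whether that occurs on the real site is an assumption; the sketch below shows one possible way to guard against it. The helper names safe_filename and unique_path are hypothetical.

```python
import os
import re


def safe_filename(title, ext=".gif", max_len=80):
    """Turn an image title into a file name that common file systems accept."""
    # Replace characters that Windows, macOS or Linux do not allow in names.
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]', "_", title).strip(" .")
    if not cleaned:
        cleaned = "untitled"
    return cleaned[:max_len] + ext


def unique_path(folder, filename, taken):
    """Add a counter when a name was already used; `taken` is a set of used names."""
    stem, ext = os.path.splitext(filename)
    candidate, n = filename, 1
    while candidate in taken:
        candidate = "%s_%d%s" % (stem, n, ext)
        n += 1
    taken.add(candidate)
    return os.path.join(folder, candidate)


if __name__ == "__main__":
    used = set()
    print(unique_path("out", safe_filename("a/b?c"), used))
    print(unique_path("out", safe_filename("a/b?c"), used))
```

In the crawler, the line that builds filename could be replaced by a call such as unique_path(path, safe_filename(title), used), with used created once before the page loop.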