


Python Crawler Multi-threading: Detailed Explanation and Example Code
Python supports multi-threading mainly through two modules, thread and threading. The thread module is relatively low-level (in Python 3 it was renamed _thread), while threading wraps it in a higher-level interface that is more convenient to use.
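To make the difference concrete, here is a minimal sketch in Python 3 (where the low-level module is named _thread) that runs one trivial task with each module. The list names and messages are illustrative only. With _thread you get no thread handle and no join(), so completion has to be signalled by hand. threading.Thread gives you join() for free.

```python
import _thread
import threading

# Low-level _thread API: start_new_thread() returns no handle and offers no
# join(), so completion has to be signalled by hand, here with a lock.
low_level_out = []
finished = _thread.allocate_lock()
finished.acquire()  # held until the worker releases it


def low_level_task():
    low_level_out.append("fetched by _thread")
    finished.release()  # signal the main thread that the task is done


_thread.start_new_thread(low_level_task, ())
finished.acquire()  # blocks until low_level_task() has released the lock

# High-level threading API: the Thread object carries its own join().
high_level_out = []
worker = threading.Thread(target=high_level_out.append,
                          args=("fetched by threading",))
worker.start()
worker.join()

print(low_level_out, high_level_out)
```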
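Python's multi-threading is limited by the GIL (Global Interpreter Lock), which lets only one thread execute Python bytecode at a time, so it is not true parallel multi-threading. It can still improve efficiency significantly for I/O-bound work such as crawlers, because a thread releases the GIL while it waits on the network.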
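The effect described above can be seen without touching the network. In this small sketch, time.sleep() stands in for waiting on an HTTP response (an assumption made so the example runs offline); like real blocking I/O, it releases the GIL, so five threads can wait at the same time instead of one after another.

```python
import threading
import time


def fake_io(results, i, delay=0.2):
    # time.sleep() is a stand-in for waiting on a network response;
    # like real blocking I/O, it releases the GIL while waiting.
    time.sleep(delay)
    results[i] = i


N = 5

# Sequential: the waits add up (about N * delay).
results = [None] * N
start = time.perf_counter()
for i in range(N):
    fake_io(results, i)
sequential = time.perf_counter() - start

# Threaded: the waits overlap (about one delay in total).
results_t = [None] * N
start = time.perf_counter()
threads = [threading.Thread(target=fake_io, args=(results_t, i)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print("sequential: %.2f s, threaded: %.2f s" % (sequential, threaded))
```

A CPU-bound function in place of fake_io would not speed up this way, because the threads would then contend for the GIL.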
The example below verifies the efficiency gain from multi-threading. It targets Python 3 and only downloads the pages without parsing them.

```python
import threading
import time
import urllib.request


class MyThread(threading.Thread):
    def __init__(self, func, args):
        threading.Thread.__init__(self)
        self.args = args
        self.func = func

    def run(self):
        # apply() no longer exists in Python 3; unpack the arguments instead
        self.func(*self.args)


def open_url(url):
    request = urllib.request.Request(url)
    html = urllib.request.urlopen(request).read()
    print(len(html))
    return html


if __name__ == '__main__':
    # Build the URL list (p = 1 to 9)
    urlList = []
    for p in range(1, 10):
        urlList.append('http://s.wanfangdata.com.cn/Paper.aspx?q=%E5%8C%BB%E5%AD%A6&p=' + str(p))

    # Sequential (normal) way
    n_start = time.time()
    for each in urlList:
        open_url(each)
    n_end = time.time()
    print('the normal way take %s s' % (n_end - n_start))

    # Multi-threaded way
    t_start = time.time()
    threadList = [MyThread(open_url, (url,)) for url in urlList]
    for t in threadList:
        t.daemon = True
        t.start()
    for i in threadList:
        i.join()
    t_end = time.time()
    print('the thread way take %s s' % (t_end - t_start))
```
Fetching these relatively slow-loading pages (p = 1 to 9, i.e. nine pages) with both methods, the sequential approach took about 50 seconds and the multi-threaded approach about 10 seconds.
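The speed-up above comes from overlapping the waits on each page. For comparison, the standard library's concurrent.futures.ThreadPoolExecutor expresses the same fetch-many-URLs pattern with a bounded pool of threads and no custom Thread subclass. This is a sketch, not the author's method: fetch_stub() is a hypothetical stand-in for open_url() that sleeps instead of downloading, the example.com URLs are placeholders, and max_workers=4 is an arbitrary choice.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_stub(url):
    # Hypothetical stand-in for open_url(): sleeps instead of downloading
    # and returns a fake "page length".
    time.sleep(0.2)
    return len(url)


# Placeholder URLs; nothing is actually requested.
urls = ["http://example.com/page/%d" % p for p in range(1, 10)]

start = time.perf_counter()
# max_workers bounds how many requests run at once.
with ThreadPoolExecutor(max_workers=4) as pool:
    lengths = list(pool.map(fetch_stub, urls))  # results come back in input order
elapsed = time.perf_counter() - start

print(lengths, "%.2f s" % elapsed)
```

Bounding the pool is usually kinder to the target site than starting one thread per URL. Also, pool.map() returns results in input order, so no extra bookkeeping is needed to match pages to URLs.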
Explanation of the multi-threaded code:
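```python
import threading


# Create a thread class that inherits from threading.Thread
class MyThread(threading.Thread):
    def __init__(self, func, args):
        threading.Thread.__init__(self)  # call the parent class constructor
        self.args = args
        self.func = func

    def run(self):  # the thread's activity; start() invokes it in the new thread
        self.func(*self.args)  # Python 3 replacement for apply(self.func, self.args)
```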
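Because MyThread.run() only calls func(*args), the same behaviour is available without subclassing: threading.Thread accepts target and args, and its default run() calls target(*args). In this sketch, open_url_stub() and the page names are hypothetical stand-ins so that it runs offline.

```python
import threading


def open_url_stub(url, out):
    # Hypothetical stand-in for open_url(): records the URL instead of fetching it.
    out.append(url)


fetched = []
urls = ["page-1", "page-2", "page-3"]

# Equivalent to MyThread(open_url_stub, (url, fetched)):
# Thread's default run() calls target(*args).
threads = [threading.Thread(target=open_url_stub, args=(url, fetched)) for url in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(fetched))
```

Subclassing is still useful when a thread needs its own state or extra methods. For a plain function call, target= is shorter.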
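```python
# Instantiate one thread per URL, giving a list of threads
threadList = [MyThread(open_url, (url,)) for url in urlList]
for t in threadList:
    # Mark as a daemon thread. The interpreter does NOT wait for daemon threads
    # at exit; it is the join() calls below that make the main thread wait.
    t.daemon = True
    t.start()  # start the thread (run() executes in the new thread)
for i in threadList:
    i.join()  # block until this thread finishes; main continues after all workers are done
```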
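Whether the interpreter waits for a thread at exit depends on its daemon flag. A daemon thread is stopped abruptly when the main thread ends, so in the crawler it is the join() calls, not the daemon setting, that make the main thread wait. The sketch below demonstrates this by running a tiny child program in a fresh interpreter, once without join() and once with it. The child script and the mode names "no-join"/"join" are illustrative.

```python
import subprocess
import sys
import textwrap

# Child program: one daemon thread that sleeps briefly, then prints.
CHILD = textwrap.dedent("""
    import sys, threading, time

    def late():
        time.sleep(0.5)
        print("worker finished")

    t = threading.Thread(target=late, daemon=True)
    t.start()
    if sys.argv[1] == "join":
        t.join()  # the main thread waits for the worker
    print("main exiting")
""")


def run(mode):
    # Run the child in a fresh interpreter and return its stdout lines.
    proc = subprocess.run([sys.executable, "-c", CHILD, mode],
                          capture_output=True, text=True, check=True)
    return proc.stdout.splitlines()


print(run("no-join"))  # the daemon worker is killed before it can print
print(run("join"))     # join() lets the worker finish first
```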