Python implementation of multi-threaded HTTP downloader example

高洛峰

Released: 2017-02-13 13:40:01

This article shows how to write a multi-threaded HTTP downloader in Python and how to package it as an .exe executable file.

Environment: Windows/Linux + Python 2.7.x

Single thread

Before moving on to multi-threading, we start with a single-threaded version. The steps are:

1. Parse the URL;

2. Connect to the web server;

3. Construct the HTTP request packet;

4. Download the file.

Each step is explained below with code.

Parse the URL

Parse the URL entered by the user. If the parsed path is empty, it is set to '/'; if no port is given, it defaults to 80. The name of the downloaded file can be changed if the user wishes (enter 'y' to change it; any other input keeps the parsed name).

The parsing functions are listed below:

import urllib

# Parse the host and path
def analyHostAndPath(totalUrl):
  protocol, s1 = urllib.splittype(totalUrl)
  host, path = urllib.splithost(s1)
  if path == '':
    path = '/'
  return host, path

# Parse the port (defaults to 80)
def analysisPort(host):
  host, port = urllib.splitport(host)
  if port is None:
    return 80
  return int(port)  # splitport returns the port as a string

# Parse the file name (None if the last path segment has no extension)
def analysisFilename(path):
  filename = path.split('/')[-1]
  if '.' not in filename:
    return None
  return filename

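The urllib.splittype, urllib.splithost, and urllib.splitport helpers used above exist only in Python 2. As a minimal sketch for readers on Python 3 (an assumption, since this article targets Python 2.7), roughly the same parsing can be done with urllib.parse.urlsplit. The function name parse_url and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit

def parse_url(total_url):
    """Return (host, port, path, filename) for an http URL.

    Hypothetical helper mirroring the three functions above:
    the path defaults to '/', the port to 80, and the file name
    is None when the last path segment has no extension.
    """
    parts = urlsplit(total_url)
    host = parts.hostname
    port = parts.port if parts.port is not None else 80
    path = parts.path if parts.path else '/'
    # The file name comes from the path only, not from the query string
    filename = path.split('/')[-1]
    if '.' not in filename:
        filename = None
    # Keep the query string in the request path, as splithost does
    if parts.query:
        path = path + '?' + parts.query
    return host, port, path, filename

print(parse_url('http://example.com:8080/files/demo.zip'))
print(parse_url('http://example.com'))
```

The port comes back as an int in both cases, so it can be passed straight to socket.connect().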

Connect to the web server

Using the socket module, connect to the web server with the host and port obtained from parsing the URL. The code is as follows:

import socket
from analysisUrl import port,host

ip = socket.gethostbyname(host)
s = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
s.connect((ip, port))

print "Successfully connected to the web server!"


Construct the HTTP request packet

Construct an HTTP request packet from the path and host obtained by parsing the URL.

from analysisUrl import path, host, port

packet = 'GET ' + path + ' HTTP/1.1\r\nHost: ' + host + '\r\n\r\n'

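A slightly fuller sketch of the request packet, written for Python 3, where sockets send bytes rather than str. The helper name build_request is hypothetical, and the "Connection: close" header is an addition that is not in the original packet: it asks the server to close the connection after the response, so a read loop can also stop at end-of-stream.

```python
def build_request(host, path, port=80):
    """Build a minimal HTTP/1.1 GET request as bytes (hypothetical helper)."""
    # The Host header carries the port only when it is not the default
    host_header = host if port == 80 else '%s:%d' % (host, port)
    lines = [
        'GET %s HTTP/1.1' % path,
        'Host: %s' % host_header,
        'Connection: close',   # assumption: not part of the original packet
        '',                    # the two empty strings produce the blank
        '',                    # line that terminates the header section
    ]
    return '\r\n'.join(lines).encode('ascii')

packet = build_request('example.com', '/index.html')
print(packet)
```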

Download the file

Send the constructed HTTP request packet to the server and capture the "Content-Length" field from the header of the response.

import re

def getLength(self):
    s.send(packet)
    print "send success!"
    buf = s.recv(1024)
    print buf
    # Extract the body size from the response header
    p = re.compile(r'Content-Length: (\d+)')
    length = int(p.findall(buf)[0])
    return length, buf

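The regular expression above assumes the header name is spelled exactly "Content-Length" and that the whole header arrives within the first 1024 bytes. Below is a minimal sketch of a more tolerant parser, in Python 3 with bytes. The name split_response is hypothetical, and the sample response is made up for illustration.

```python
def split_response(raw):
    """Split raw response bytes into (content_length, body_so_far).

    Hypothetical helper. Returns None if the header section is
    incomplete (no blank line yet) or has no Content-Length field.
    """
    head, sep, body = raw.partition(b'\r\n\r\n')
    if not sep:
        return None  # header not fully received yet
    for line in head.split(b'\r\n')[1:]:   # skip the status line
        name, _, value = line.partition(b':')
        # Header field names are case-insensitive
        if name.strip().lower() == b'content-length':
            return int(value.strip()), body
    return None

sample = (b'HTTP/1.1 200 OK\r\n'
          b'content-length: 11\r\n'
          b'Content-Type: text/plain\r\n'
          b'\r\n'
          b'hello')
print(split_response(sample))
```

A caller that gets None because the header is incomplete can receive more data, append it to the buffer, and try again.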

Download the file and calculate the time it takes to download.

import time

def download(self):
    file = open(self.filename, 'wb')
    length, buf = self.getLength()
    # Skip the response header; the body starts after the blank line
    packetIndex = buf.index('\r\n\r\n')
    buf = buf[packetIndex+4:]
    file.write(buf)
    received = len(buf)
    # Keep reading until the whole body has arrived
    while received < length:
      buf = s.recv(1024)
      if not buf:
        break  # connection closed early
      file.write(buf)
      received = received + len(buf)
    file.close()
    print "Success!!"

if __name__ == "__main__":
  start = time.time()
  down = downloader()
  down.download()
  end = time.time()
  print "The time spent on this program is %f s"%(end - start)


Multi-threading

Read the "Content-Length" field from the response header and, together with the number of threads, split the file into segments that are downloaded in parallel, with a lock protecting writes to the file. Unlike the single-threaded version, all of the code lives in one file here, and more of Python's built-in modules are used.

Get "Content-Length":

def getLength(self):
    opener = urllib2.build_opener()
    req = opener.open(self.url)
    meta = req.info()
    length = int(meta.getheaders("Content-Length")[0])
    return length


Divide the obtained length into byte ranges, one per thread:

def get_range(self):
    ranges = []
    length = self.getLength()
    offset = int(int(length) / self.threadNum)
    for i in range(self.threadNum):
      if i == (self.threadNum - 1):
        # Last thread: open-ended range up to the end of the file
        ranges.append((i*offset,''))
      else:
        # HTTP byte ranges are inclusive, so stop one byte before the next start
        ranges.append((i*offset,(i+1)*offset - 1))
    return ranges

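To make the arithmetic concrete, here is a standalone sketch of the range split with inclusive end offsets (an HTTP byte range includes both of its endpoints). The function name split_ranges and the sizes are illustrative assumptions, the last range gets an explicit end rather than an empty string, and the sketch assumes the length is at least the number of threads.

```python
def split_ranges(length, thread_num):
    """Split bytes 0..length-1 into thread_num inclusive (start, end) ranges."""
    offset = length // thread_num
    ranges = []
    for i in range(thread_num):
        start = i * offset
        # The last range absorbs the remainder left by integer division
        end = length - 1 if i == thread_num - 1 else (i + 1) * offset - 1
        ranges.append((start, end))
    return ranges

print(split_ranges(10, 3))   # [(0, 2), (3, 5), (6, 9)]
```

Each pair maps directly onto a request header of the form "Range: bytes=start-end", and no byte is requested twice.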

Implement the multi-threaded download. A lock is held while writing to the file, using "with lock" instead of lock.acquire()...lock.release(), and file.seek() sets the file offset before each write so that every block lands at the correct position.

import threading
import urllib2

lock = threading.Lock()  # assumed module-level lock shared by all download threads

def downloadThread(self,start,end):
    req = urllib2.Request(self.url)
    req.headers['Range'] = 'bytes=%s-%s' % (start, end)
    f = urllib2.urlopen(req)
    offset = start
    buffer = 1024
    while 1:
      block = f.read(buffer)
      if not block:
        break
      # Seek and write under the lock so threads cannot interleave
      with lock:
        self.file.seek(offset)
        self.file.write(block)
        offset = offset + len(block)

def download(self):
    filename = self.getFilename()
    self.file = open(filename, 'wb')
    thread_list = []
    n = 1
    for ran in self.get_range():
      start, end = ran
      print 'starting:%d thread '% n
      n += 1
      thread = threading.Thread(target=self.downloadThread,args=(start,end))
      thread.start()
      thread_list.append(thread)

    # Wait for every segment to finish before closing the file
    for i in thread_list:
      i.join()
    self.file.close()
    print 'Download %s Success!'%(filename)

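The seek-then-write pattern can be tried without a network. The sketch below (Python 3) replaces the HTTP response with slices of an in-memory bytes object, an assumption made purely so the example is self-contained; the names SOURCE, write_segment, and assemble are hypothetical. Each thread writes its own segment at its own offset, and the lock keeps one thread's seek and write from interleaving with another's.

```python
import os
import tempfile
import threading

SOURCE = bytes(range(256)) * 40          # stand-in for the remote file (10240 bytes)
lock = threading.Lock()

def write_segment(out, start, end, block_size=1024):
    """Copy SOURCE[start:end+1] into out at the same offsets, block by block."""
    offset = start
    while offset <= end:
        block = SOURCE[offset:min(offset + block_size, end + 1)]
        with lock:                       # seek + write must not interleave
            out.seek(offset)
            out.write(block)
        offset += len(block)

def assemble(thread_num=4):
    """Write SOURCE to a temporary file using thread_num threads; return its bytes."""
    size = len(SOURCE)
    step = size // thread_num
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'wb') as out:
            threads = []
            for i in range(thread_num):
                start = i * step
                end = size - 1 if i == thread_num - 1 else (i + 1) * step - 1
                t = threading.Thread(target=write_segment, args=(out, start, end))
                t.start()
                threads.append(t)
            for t in threads:
                t.join()                 # wait before the file is closed
        with open(path, 'rb') as f:
            return f.read()
    finally:
        os.remove(path)

print(assemble() == SOURCE)
```

Because all threads share a single file object, the file position is shared too; without the lock, one thread could seek between another thread's seek and write and the block would land at the wrong offset.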

Running results:

[Screenshot of the running results]

Convert the .py file into an .exe executable file

After writing a tool, how do you let people who do not have Python installed use this tool? This requires converting the .py file into an .exe file.

The py2exe module is used here. Since this is my first time using it, here is a brief introduction:

py2exe is a tool that converts a Python script into a standalone executable file (*.exe) on Windows, so that the program can run on Windows without Python installed.

Next, in the same directory as multiThreadDownload.py, create the mysetup.py file and write:

from distutils.core import setup
import py2exe

setup(console=["multiThreadDownload.py"])


Then execute the command: python mysetup.py py2exe

This generates a dist folder containing multiThreadDownload.exe; double-click it to run:

[Screenshot of the executable running]

demo download address: HttpFileDownload_jb51.rar

That is all for this article. I hope it is helpful for your learning.