This article shows how to write a multi-threaded HTTP downloader in Python and package it as an .exe executable file.
Environment: Windows/Linux + Python 2.7.x
Single thread
Before introducing multi-threading, we first look at the single-threaded version. The steps for a single-threaded downloader are:
1. Parse the URL;
2. Connect to the web server;
3. Construct an HTTP request packet;
4. Download the file.
Each step is explained below through the code.
Parse the URL
Parse the URL entered by the user. If the parsed path is empty, it is set to '/'; if the port number is empty, it is set to 80. The user may also rename the downloaded file (enter 'y' to change the name; any other input keeps the original name).
Listed below are several parsing functions:
import urllib

# Parse host and path
def analyHostAndPath(totalUrl):
    protocol, s1 = urllib.splittype(totalUrl)
    host, path = urllib.splithost(s1)
    if path == '':
        path = '/'
    return host, path

# Parse port (defaults to 80)
def analysisPort(host):
    host, port = urllib.splitport(host)
    if port is None:
        return 80
    # splitport returns the port as a string; socket.connect needs an int
    return int(port)

# Parse filename
def analysisFilename(path):
    filename = path.split('/')[-1]
    if '.' not in filename:
        return None
    return filename
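For comparison, the same parsing rules can be sketched with urlsplit, which is available in both Python 2.7 (urlparse module) and Python 3 (urllib.parse). This is only an illustration: the function name parse_url and the example URL are assumptions, not part of the original downloader.

```python
try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit      # Python 2.7


def parse_url(total_url):
    """Return (host, port, path, filename) for an http:// URL."""
    parts = urlsplit(total_url)
    host = parts.hostname
    port = parts.port or 80        # default HTTP port when none is given
    path = parts.path or '/'       # an empty path means the site root
    if parts.query:
        path += '?' + parts.query  # keep the query string in the request path
    name = parts.path.split('/')[-1]
    filename = name if '.' in name else None
    return host, port, path, filename


# Hypothetical URL, used only to show the shape of the result.
print(parse_url('http://example.com:8080/files/demo.zip'))
```

Returning the port as an integer means it can be passed directly to socket.connect().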
Connect to the web server
Using the socket module, connect to the web server with the host and port obtained from parsing the URL. The code is as follows:
import socket
from analysisUrl import port, host

ip = socket.gethostbyname(host)
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((ip, port))
print "Successfully connected to the web server!"
Construct the HTTP request packet
Construct an HTTP request packet from the path, host, and port obtained by parsing the URL.
from analysisUrl import path, host, port

packet = 'GET ' + path + ' HTTP/1.1\r\nHost: ' + host + '\r\n\r\n'
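As a rough sketch of the same idea, the request can be assembled by a small helper. The name build_request and the extra "Connection: close" header are assumptions added for illustration; the original code sends only the request line and the Host header.

```python
def build_request(host, path='/', port=80):
    """Build a minimal HTTP/1.1 GET request as bytes."""
    # Include the port in the Host header only when it is not the default.
    host_header = host if port == 80 else '%s:%d' % (host, port)
    lines = [
        'GET %s HTTP/1.1' % path,
        'Host: %s' % host_header,
        'Connection: close',  # assumption: ask the server to close when done
        '',                   # the two empty items produce the blank line
        '',                   # that terminates the header section
    ]
    return '\r\n'.join(lines).encode('ascii')


print(build_request('example.com', '/files/demo.zip'))
```

Every header line ends with \r\n, and the header section ends with an empty line, which is why the packet finishes with \r\n\r\n.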
Download file
Send the constructed HTTP request packet to the server and capture the "Content-Length" field from the header of the response message.
import re

def getLength(self):
    s.send(packet)
    print "send success!"
    buf = s.recv(1024)
    print buf
    # Extract the Content-Length value from the response header
    p = re.compile(r'Content-Length: (\d+)')
    length = int(p.findall(buf)[0])
    return length, buf
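The header handling can also be sketched as a standalone function that works on raw bytes. The name split_response and the sample response are hypothetical; the sketch assumes the whole header has already arrived in the first read and that the server sends a Content-Length header (it does not handle chunked transfer encoding).

```python
import re


def split_response(data):
    """Split raw response bytes into (content_length, body_received_so_far)."""
    head, sep, body = data.partition(b'\r\n\r\n')
    if not sep:
        raise ValueError('header is incomplete; read more data first')
    # Header names are case-insensitive, so match without regard to case.
    match = re.search(br'^Content-Length:\s*(\d+)', head,
                      re.IGNORECASE | re.MULTILINE)
    if match is None:
        raise ValueError('no Content-Length header in the response')
    return int(match.group(1)), body


# Hypothetical response, used only to exercise the function.
sample = (b'HTTP/1.1 200 OK\r\n'
          b'content-length: 12\r\n'
          b'Content-Type: text/plain\r\n'
          b'\r\n'
          b'hello')
print(split_response(sample))
```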
Download the file and calculate the time it takes to download.
import time

def download(self):
    file = open(self.filename, 'wb')
    length, buf = self.getLength()
    # Skip the response header; the body starts after the blank line
    packetIndex = buf.index('\r\n\r\n')
    buf = buf[packetIndex+4:]
    file.write(buf)
    sum = len(buf)
    while sum < length:
        buf = s.recv(1024)
        if not buf:  # the server closed the connection
            break
        file.write(buf)
        sum = sum + len(buf)
    file.close()
    print "Success!!"

if __name__ == "__main__":
    start = time.time()
    down = downloader()
    down.download()
    end = time.time()
    print "The time spent on this program is %f s" % (end - start)
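The receive loop is the part most likely to hang, so here is a minimal sketch of it in isolation. The helper name recv_body is an assumption, and a local socket pair stands in for the web server so that the example runs without network access.

```python
import socket


def recv_body(sock, length, first_chunk=b'', bufsize=1024):
    """Read until `length` body bytes have arrived or the peer closes."""
    chunks = [first_chunk]
    received = len(first_chunk)
    while received < length:
        buf = sock.recv(bufsize)
        if not buf:  # an empty result means the peer closed the connection
            break
        chunks.append(buf)
        received += len(buf)
    return b''.join(chunks)


# A connected pair of local sockets plays the role of client and server.
left, right = socket.socketpair()
left.sendall(b'x' * 3000)
left.close()
body = recv_body(right, 3000)
right.close()
print(len(body))
```

Checking for an empty recv() result keeps the loop from spinning forever when the server closes the connection before the announced length has arrived.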
Multi-threading
Capture the "Content-Length" field in the header of the response message and, based on the number of threads, download the file in sections, taking a lock when writing. Unlike the single-threaded version, all the code here is kept in one file, and more of Python's built-in modules are used.
Get "Content-Length":
import urllib2

def getLength(self):
    opener = urllib2.build_opener()
    req = opener.open(self.url)
    meta = req.info()
    length = int(meta.getheaders("Content-Length")[0])
    return length
Split the obtained length into ranges based on the number of threads:
def get_range(self):
    ranges = []
    length = self.getLength()
    offset = int(int(length) / self.threadNum)
    for i in range(self.threadNum):
        if i == (self.threadNum - 1):
            # Last thread: open-ended range up to the end of the file
            ranges.append((i*offset, ''))
        else:
            # HTTP byte ranges are inclusive, so stop one byte before the next start
            ranges.append((i*offset, (i+1)*offset - 1))
    return ranges
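To make the arithmetic concrete, here is a small standalone sketch of the range split. The function name split_ranges is an assumption; unlike get_range, it returns an explicit end for the last range instead of an empty string. Because the end of an HTTP byte range is inclusive, each range stops one byte before the next one starts.

```python
def split_ranges(length, thread_num):
    """Split bytes 0..length-1 into inclusive (start, end) ranges."""
    if length <= 0 or thread_num <= 0:
        raise ValueError('length and thread_num must be positive')
    thread_num = min(thread_num, length)  # never more ranges than bytes
    offset = length // thread_num
    ranges = []
    for i in range(thread_num):
        start = i * offset
        if i == thread_num - 1:
            end = length - 1              # last range absorbs the remainder
        else:
            end = (i + 1) * offset - 1
        ranges.append((start, end))
    return ranges


print(split_ranges(10, 3))  # [(0, 2), (3, 5), (6, 9)]
```

For a 10-byte file and 3 threads the offset is 3, so the first two threads fetch 3 bytes each and the last thread fetches the remaining 4.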
Implement multi-threaded downloading. Each thread takes the lock while writing to the file, using with lock instead of lock.acquire()...lock.release(), and calls file.seek() to set the file offset so that every block is written at the correct position.
import threading

lock = threading.Lock()

def downloadThread(self, start, end):
    req = urllib2.Request(self.url)
    req.headers['Range'] = 'bytes=%s-%s' % (start, end)
    f = urllib2.urlopen(req)
    offset = start
    buffer = 1024
    while 1:
        block = f.read(buffer)
        if not block:
            break
        # seek and write must happen together, so hold the lock for both
        with lock:
            self.file.seek(offset)
            self.file.write(block)
            offset = offset + len(block)

def download(self):
    filename = self.getFilename()
    self.file = open(filename, 'wb')
    thread_list = []
    n = 1
    for ran in self.get_range():
        start, end = ran
        print 'starting:%d thread ' % n
        n += 1
        thread = threading.Thread(target=self.downloadThread, args=(start, end))
        thread.start()
        thread_list.append(thread)
    # Wait for all threads to finish before closing the file
    for i in thread_list:
        i.join()
    print 'Download %s Success!' % filename
    self.file.close()
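The locking pattern can be tried without a network connection. In the sketch below an in-memory byte string (SOURCE, a hypothetical stand-in for the remote file) is copied into a temporary file by three threads, each holding the lock around seek() and write(); the names write_range and copy_in_threads are assumptions.

```python
import os
import tempfile
import threading

SOURCE = bytes(bytearray(range(256))) * 40  # 10240 bytes of stand-in data
lock = threading.Lock()


def write_range(out, start, end, block_size=1024):
    """Copy SOURCE[start..end] (inclusive) into `out` at the same offsets."""
    offset = start
    while offset <= end:
        block = SOURCE[offset:min(offset + block_size, end + 1)]
        with lock:  # seek + write must not interleave with other threads
            out.seek(offset)
            out.write(block)
        offset += len(block)


def copy_in_threads(path, ranges):
    with open(path, 'wb') as out:
        threads = [threading.Thread(target=write_range, args=(out, s, e))
                   for s, e in ranges]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # wait for every range before the file is closed


fd, tmp_path = tempfile.mkstemp()
os.close(fd)
copy_in_threads(tmp_path, [(0, 3999), (4000, 7999), (8000, len(SOURCE) - 1)])
with open(tmp_path, 'rb') as f:
    result = f.read()
os.remove(tmp_path)
print(result == SOURCE)
```

Because all threads share one file object, they also share one file position. Without the lock, one thread could call seek() between another thread's seek() and write(), and a block would land at the wrong offset.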
Convert a .py file into an .exe executable file
After writing a tool, how do you let people who do not have Python installed use this tool? This requires converting the .py file into an .exe file.
Python's py2exe module is used here. This is my first time using it, so here is a brief introduction:
py2exe is a tool that converts a Python script into a standalone Windows executable (*.exe), so that the program can run on Windows without Python installed.
Next, in the same directory as multiThreadDownload.py, create the mysetup.py file and write:
from distutils.core import setup
import py2exe

setup(console=["multiThreadDownload.py"])
Then run the command: python mysetup.py py2exe
This generates the dist folder, which contains multiThreadDownload.exe. Double-click the file to run it.
demo download address: HttpFileDownload_jb51.rar
That is all for this article. I hope it is helpful for your learning.