Optimizing HTTP Request Dispatch in Python
Dispatching a large number of HTTP requests can be a challenge in Python, especially when the task involves many thousands of URLs. This article presents a solution for dispatching 100,000 HTTP requests in Python 2.6 that uses a pool of threads fed by a queue to issue the requests concurrently.
Twistedless Solution:
The following code snippet provides a fast and effective method for sending HTTP requests concurrently:
from urlparse import urlparse
from threading import Thread
import httplib, sys
from Queue import Queue

concurrent = 200  # number of worker threads

def doWork():
    # Worker loop: take a URL, check it, report the result.
    while True:
        url = q.get()
        status, url = getStatus(url)
        doSomethingWithResult(status, url)
        q.task_done()

def getStatus(ourl):
    # Send a HEAD request; any failure is reported as "error".
    try:
        url = urlparse(ourl)
        conn = httplib.HTTPConnection(url.netloc)
        conn.request("HEAD", url.path)
        res = conn.getresponse()
        return res.status, ourl
    except:
        return "error", ourl

def doSomethingWithResult(status, url):
    print status, url

# Bounded queue: q.put() blocks once concurrent * 2 URLs are waiting.
q = Queue(concurrent * 2)
for i in range(concurrent):
    t = Thread(target=doWork)
    t.daemon = True
    t.start()
try:
    for url in open('urllist.txt'):
        q.put(url.strip())
    q.join()  # wait until every queued URL has been processed
except KeyboardInterrupt:
    sys.exit(1)
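Explanation:
The script starts 200 worker threads (the value of concurrent). Each worker loops forever: it takes a URL from the queue, sends a HEAD request through httplib, passes the status code and the URL to doSomethingWithResult, and marks the item as done. Any exception raised during the request is reported as the status "error". The queue is capped at concurrent * 2 entries, so the main thread blocks while reading urllist.txt whenever the workers fall behind, instead of loading every URL into memory up front. Because the workers are daemon threads, the program exits as soon as q.join() returns, or with exit status 1 if it is interrupted from the keyboard. This approach is reported to be faster than a Twisted-based solution while using less CPU, which makes it a practical way to dispatch large batches of requests in Python 2.6.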
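For readers on Python 3, the same worker-pool pattern might be sketched as below. This is an illustrative port, not part of the original solution. The names run, fetch and handle_result, the 10-second timeout, the fallback to "/" for an empty path, and the None sentinel used to stop the workers are all assumptions added for the example.

```python
# Python 3 sketch of the same worker-pool pattern (the article targets 2.6).
import http.client
import queue
import threading
from urllib.parse import urlparse

CONCURRENT = 200  # same worker count as the article's snippet


def get_status(ourl, timeout=10):
    """Send a HEAD request and return (status, original_url)."""
    try:
        url = urlparse(ourl)
        conn = http.client.HTTPConnection(url.netloc, timeout=timeout)
        try:
            # Assumption: fall back to "/" when the URL has no path.
            conn.request("HEAD", url.path or "/")
            return conn.getresponse().status, ourl
        finally:
            conn.close()
    except Exception:
        return "error", ourl


def run(urls, handle_result, fetch=get_status, workers=CONCURRENT):
    """Feed urls to a pool of threads; call handle_result(status, url) for each."""
    q = queue.Queue(workers * 2)  # bounded, as in the original

    def do_work():
        while True:
            url = q.get()
            try:
                if url is None:  # sentinel (assumption): no more work
                    return
                handle_result(*fetch(url))
            finally:
                q.task_done()

    threads = [threading.Thread(target=do_work, daemon=True)
               for _ in range(workers)]
    for t in threads:
        t.start()
    for url in urls:
        q.put(url.strip())
    q.join()            # wait for all real work to finish
    for _ in threads:   # then ask each worker to exit
        q.put(None)
    for t in threads:
        t.join()
```

A call such as run(open("urllist.txt"), print) would print each status and URL, much like the original script. Passing fetch as a parameter is a design choice made here so the pool can be exercised without network access.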