How can I optimize HTTP request dispatch for 100,000 URLs in Python 2.6?

Susan Sarandon
Release: 2024-11-17 16:27:02

Optimizing HTTP Request Dispatch in Python

Sending HTTP requests to a very large number of URLs is slow in Python when done one at a time, because each request spends most of its time waiting on the network. This article shows a way to dispatch 100,000 HTTP requests in Python 2.6 using a pool of threads fed from a queue, so that many requests are in flight at once.

A Solution Without Twisted:

The following code sends HTTP HEAD requests concurrently from a pool of worker threads and prints the status code returned for each URL:

from urlparse import urlparse
from threading import Thread
import httplib, sys
from Queue import Queue

concurrent = 200

def doWork():
    while True:
        url = q.get()
        status, url = getStatus(url)
        doSomethingWithResult(status, url)
        q.task_done()

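def getStatus(ourl):
    # Send a HEAD request and return (status, original URL).
    conn = None
    try:
        url = urlparse(ourl)
        # The timeout (10 s, an arbitrary choice) stops a stalled server
        # from blocking a worker thread forever.
        conn = httplib.HTTPConnection(url.netloc, timeout=10)
        # An empty path is not a valid request target; fall back to "/".
        conn.request("HEAD", url.path or "/")
        res = conn.getresponse()
        return res.status, ourl
    except Exception:
        # Unlike a bare except, this lets SystemExit propagate.
        return "error", ourl
    finally:
        if conn is not None:
            conn.close()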

def doSomethingWithResult(status, url):
    print status, url

q = Queue(concurrent * 2)
for i in range(concurrent):
    t = Thread(target=doWork)
    t.daemon = True
    t.start()
try:
    for url in open('urllist.txt'):
        url = url.strip()
        if url:  # skip blank lines instead of queueing empty URLs
            q.put(url)
    q.join()
except KeyboardInterrupt:
    sys.exit(1)

Explanation:

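  • A pool of worker threads is started; its size is set by the concurrent variable (200 in this case). The threads are daemon threads, so they do not keep the process alive once the main thread exits.
  • Each thread runs the doWork function, which repeatedly takes a URL from the queue and calls getStatus to send an HTTP HEAD request and read the status code.
  • Each result is passed to the doSomethingWithResult function, which prints it here and can be customized to log or perform other operations based on the response.
  • The queue is bounded at twice the number of threads, so the main thread blocks in q.put() when the workers fall behind rather than loading the whole URL list into memory. Each worker takes the next URL as soon as it is free, and q.join() waits until every queued URL has been processed.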
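
The pattern described in the list above can be tried without touching the network. The following is a minimal sketch, not the article's code: fake_status, run_pool, and the example.com URLs are hypothetical stand-ins, and the guarded import is only there so the sketch runs on both Python 2.6+ and Python 3. It replaces printing with a lock-protected tally to show one way doSomethingWithResult might be customized.

```python
from threading import Thread, Lock
try:
    from queue import Queue      # Python 3
except ImportError:
    from Queue import Queue      # Python 2

def fake_status(url):
    # Hypothetical stand-in for getStatus: no network access.
    return ("error" if "bad" in url else 200), url

def run_pool(urls, concurrent=4):
    q = Queue(concurrent * 2)    # bounded: put() blocks when workers fall behind
    tally = {}
    lock = Lock()

    def worker():
        while True:
            url = q.get()
            try:
                status, _ = fake_status(url)
                with lock:       # workers share the dict, so guard updates
                    tally[status] = tally.get(status, 0) + 1
            finally:
                q.task_done()    # always mark the item done so join() can return

    for _ in range(concurrent):
        t = Thread(target=worker)
        t.daemon = True          # daemon threads do not keep the process alive
        t.start()

    for url in urls:
        q.put(url)
    q.join()                     # wait until every queued URL has been processed
    return tally

urls = ["http://example.com/%d" % i for i in range(50)] + ["http://bad.example.com/"]
result = run_pool(urls)
```

Calling task_done() in a finally clause is a deliberate choice in this sketch: if the result handler raises, the item is still counted as finished, so q.join() cannot hang on it.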

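This approach is reported to be faster than a Twisted-based solution while also using less CPU, although the actual gain will depend on network conditions and on the servers being contacted. Threads work well here because the task is I/O-bound: each thread spends most of its time waiting for a response, and Python releases its global interpreter lock during blocking network calls, so other threads can proceed in the meantime.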
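
To illustrate why threads help with waiting-dominated work, here is a small sketch that simulates network latency with time.sleep instead of real requests. The names slow_call, run_sequential, and run_threaded and the delay values are assumptions made for illustration; the timings it produces say nothing about real HTTP throughput.

```python
import time
from threading import Thread

def slow_call(delay):
    # Hypothetical stand-in for one HTTP round trip: it only waits.
    time.sleep(delay)

def run_sequential(n, delay):
    # Perform n simulated requests one after another.
    start = time.time()
    for _ in range(n):
        slow_call(delay)
    return time.time() - start

def run_threaded(n, delay):
    # Perform n simulated requests at once, one thread each.
    start = time.time()
    threads = [Thread(target=slow_call, args=(delay,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start

sequential = run_sequential(10, 0.05)
threaded = run_threaded(10, 0.05)
```

The sequential run has to wait for each delay in turn, while the threaded run overlaps them, so its elapsed time is expected to be close to a single delay rather than the sum of all ten.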


source:php.cn