I want to crawl a website's content with multiple threads. Assume the site has 105 pages, but due to machine limitations only ten threads can be used for crawling. How do I make the first thread responsible for crawling pages 1-10, the second thread for pages 11-20, and so on, until the tenth thread is responsible for grabbing pages 91-105? How should this idea be written in Python code?
Python 3. Instead of hard-coding a page range for each thread, put all 105 page numbers into a shared `queue.Queue` and let each of the ten worker threads keep pulling the next page until the queue is empty; this also balances the load when some pages are slower than others:
```python
import queue
import threading


def download(task_queue, lock):
    """Worker: keeps pulling page numbers until the queue is empty, then exits."""
    while True:
        try:
            # get_nowait() avoids the race between a separate empty() check and get()
            pg = task_queue.get_nowait()
        except queue.Empty:
            break
        # Write the page-crawling code here...
        # ...then write the fetched content to a file.
        with lock:  # serialize printing so output lines don't interleave
            print('Page %d done' % pg)
        task_queue.task_done()


def main():
    """Main thread: fill the queue, start the workers, wait for completion."""
    print('Start downloading...')
    lock = threading.Lock()
    q = queue.Queue()
    for pg in range(1, 106):  # the site has 105 pages
        q.put(pg)
    for _ in range(10):  # ten threads
        t = threading.Thread(target=download, args=(q, lock))
        t.start()
    q.join()  # wait for all tasks to finish
    print('Done')


if __name__ == '__main__':
    main()
```
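If you specifically want the fixed split from the question (thread 1 handles pages 1-10, thread 2 handles 11-20, ..., thread 10 handles 91-105), you can partition the range up front instead. A minimal sketch, where `fetch_page` is a hypothetical placeholder for your actual crawling code:

```python
import threading


def fetch_page(pg):
    # Placeholder for the real crawling code (e.g. urllib.request or requests).
    print('Page %d done' % pg)


def crawl_range(first, last):
    """Worker: crawls every page in [first, last]."""
    for pg in range(first, last + 1):
        fetch_page(pg)


def main():
    total_pages, num_threads = 105, 10
    chunk = total_pages // num_threads  # 10 pages per thread
    threads = []
    for i in range(num_threads):
        first = i * chunk + 1
        # The last thread absorbs the remainder, i.e. pages 91-105.
        last = total_pages if i == num_threads - 1 else (i + 1) * chunk
        t = threading.Thread(target=crawl_range, args=(first, last))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()  # wait for every worker to finish


if __name__ == '__main__':
    main()
```

The queue version above is usually preferable, though: if some pages are slow, idle threads keep pulling new work instead of sitting on a fixed share.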