This is the code for sequential execution in a single process:
import requests, time, os, random

def img_down(url):
    # Prefix a random number so files with the same basename don't overwrite each other
    with open(str(random.random()) + os.path.basename(url), "wb") as fob:
        fob.write(requests.get(url).content)

urllist = []
with open("urllist.txt", "r") as u:
    for line in u.readlines():
        urllist.append(line.strip())

s = time.perf_counter()  # time.clock() was removed in Python 3.8
for url in urllist:
    img_down(url)
e = time.perf_counter()
print("time: {:.2f}".format(e - s))  # %d would truncate the elapsed time to an integer
This is the multi-process code:
from multiprocessing import Pool
import requests, os, time, random

def img_down(url):
    with open(str(random.random()) + os.path.basename(url), "wb") as fob:
        fob.write(requests.get(url).content)

if __name__ == "__main__":
    urllist = []
    with open("urllist.txt", "r") as urlfob:
        for line in urlfob.readlines():  # don't reuse s, it holds the start time below
            urllist.append(line.strip())

    s = time.perf_counter()
    p = Pool()
    for url in urllist:
        p.apply_async(img_down, args=(url,))
    p.close()
    p.join()
    e = time.perf_counter()
    print("time: {:.2f}".format(e - s))
But the single-process and multi-process versions take almost the same time. I suspect the problem is that requests blocks on I/O. Is that understanding correct? How should I modify the code so that multiprocessing actually speeds things up?
Thanks!
The bottleneck when writing files is disk I/O, not CPU, so parallelism has little effect there. You could try skipping the file writes and compare the times again.
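As a rough way to run that comparison without depending on the network, here is a sketch (my own code, not from the question) that uses random in-memory bytes in place of downloaded images and times the pure in-memory pass against the pass that writes files to disk; the payload count and sizes are arbitrary assumptions:

```python
import os, time, tempfile

# Simulate downloaded payloads with in-memory data so the comparison
# isolates disk I/O from network I/O (hypothetical sizes, for illustration)
payloads = [os.urandom(1024 * 1024) for _ in range(20)]  # 20 "images" of 1 MiB each

# Pass 1: keep everything in memory
s = time.perf_counter()
total = sum(len(p) for p in payloads)
mem_time = time.perf_counter() - s

# Pass 2: write each payload to a temporary file
with tempfile.TemporaryDirectory() as d:
    s = time.perf_counter()
    for i, p in enumerate(payloads):
        with open(os.path.join(d, "img{}.bin".format(i)), "wb") as f:
            f.write(p)
    disk_time = time.perf_counter() - s

print("in-memory: {:.4f}s, with file writes: {:.4f}s".format(mem_time, disk_time))
```

If the two numbers are close on your machine, disk I/O is not the bottleneck and the slowdown is elsewhere (most likely the network requests themselves).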
Pool() without arguments uses os.cpu_count() or 1 as the number of worker processes. On a single-core CPU, or when the CPU count cannot be determined, you get only one process, so the "multi-process" version degenerates into the sequential one. That may be the reason.