I am writing a crawler and want to save the crawled data to a database. Each page contains many entries (for example, one person may have many visitors), so I insert them in a loop:
try:
    sql_visitor='INSERT INTO visitor (ownername,owneruid,visitorname,visitoruid,visittime) VALUE ("%s",%d,"%s",%d,"%s")'%(ownername,owneruid,visitorname,visitoruid,visitortime)
    print(sql_visitor)
    self.cursor.execute(sql_visitor)
    self.connect.commit()
except Exception as e:
    print(e)
Each page is handled by one thread. That was too slow for me, so I ran 5 threads at a time:
max_threads=5
while uid < 8000000 or threadlist:
    # iterate over a copy so that removing finished threads is safe
    for thread1 in threadlist[:]:
        if not thread1.is_alive():
            threadlist.remove(thread1)
    # top the pool back up to max_threads
    while len(threadlist) < max_threads and uid < 8000000:
        uid+=1
        thread2=threading.Thread(target=run,args=(uid,))
        thread2.daemon=True
        thread2.start()
        threadlist.append(thread2)
    time.sleep(5)
This ran very smoothly:
INSERT INTO visitor (ownername,owneruid,visitorname,visitoruid,visittime) VALUE ("huosai7",4893,"Liang2017",7252799,"2017-5-22 21:06")
INSERT INTO personalinfo (ownername,owneruid,jifen,huajiao,xiaomijiao,jinbi,haoyou,zhuti,rizhi,xiangce,fenxiang,kongjianfangwenliang,youxiangyanzheng,shipinrenzheng,juzhudi,chushengdi,shangcifabiaoshijian,shangcihuodongshijian,zuihoufangwen,zhuceshijian,zaixianshijian,shengri,xingbie) VALUE("huosai7",4893,0,0,0,0,0,0,0,0,0,0,0,0,"","","2100-01-01 12:00","2100-01-01 12:00","2100-01-01 12:00","2004-1-3 19:28",0,"2100-01-01 12:00",0)
INSERT INTO visitor (ownername,owneruid,visitorname,visitoruid,visittime) VALUE ("龙乐",4894,"Liang2017",7252799,"2017-5-22 21:06")
(1062, "Duplicate entry '4894-7252799-2017-05-22 21:06:00' for key 'PRIMARY'")
INSERT INTO personalinfo (ownername,owneruid,jifen,huajiao,xiaomijiao,jinbi,haoyou,zhuti,rizhi,xiangce,fenxiang,kongjianfangwenliang,youxiangyanzheng,shipinrenzheng,juzhudi,chushengdi,shangcifabiaoshijian,shangcihuodongshijian,zuihoufangwen,zhuceshijian,zaixianshijian,shengri,xingbie) VALUE("龙乐",4894,0,0,0,0,0,0,0,0,0,0,0,0,"","","2100-01-01 12:00","2100-01-01 12:00","2100-01-01 12:00","2004-1-3 20:21",0,"2100-01-01 12:00",0)
..........
Then I set max_threads to 10, and the results were as follows:
INSERT INTO visitor (ownername,owneruid,visitorname,visitoruid,visittime) VALUE ("xiao61",4889,"Liang2017",7252799,"2017-5-22 21:06")
(2006, 'MySQL server has gone away')
INSERT INTO personalinfo (ownername,owneruid,jifen,huajiao,xiaomijiao,jinbi,haoyou,zhuti,rizhi,xiangce,fenxiang,kongjianfangwenliang,youxiangyanzheng,shipinrenzheng,juzhudi,chushengdi,shangcifabiaoshijian,shangcihuodongshijian,zuihoufangwen,zhuceshijian,zaixianshijian,shengri,xingbie) VALUE("xiao61",4889,0,0,0,0,0,0,0,0,0,0,0,0,"","","2100-01-01 12:00","2100-01-01 12:00","2100-01-01 12:00","2004-1-3 15:56",0,"2100-01-01 12:00",0)
(2006, 'MySQL server has gone away')
INSERT INTO visitor (ownername,owneruid,visitorname,visitoruid,visittime) VALUE ("confused cool bear",4897,"Liang2017",7252799,"2017-5-22 21:06")
(2006, 'MySQL server has gone away')
INSERT INTO personalinfo (ownername,owneruid,jifen,huajiao,xiaomijiao,jinbi,haoyou,zhuti,rizhi,xiangce,fenxiang,kongjianfangwenliang,youxiangyanzheng,shipinrenzheng,juzhudi,chushengdi,shangcifabiaoshijian,shangcihuodongshijian,zuihoufangwen,zhuceshijian,zaixianshijian,shengri,xingbie) VALUE("confused cool bear",4897,611,0,1655,0,0,2,0,0,0,34,0,0,"","","2007-3-27 00:37","2007-3-27 00:37","2007-3-27 00:37","2004-1-3 21:08",0,"2100-01-01 12:00",1)
(2006, 'MySQL server has gone away')
..........
As you can see, error 2006 appeared. Then I set max_threads to 30, and the results are as follows:
That's everything. Is this detailed enough? If not, just tell me what else you need!
I'm guessing you are using pymysql. Its threadsafety level is 1, which PEP 249 describes as follows:
"Threads may share the module, but not connections." In other words, each thread should create its own connection.
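To illustrate the connection-per-thread idea, here is a minimal sketch rather than a drop-in fix. It assumes Python 3 and uses the standard-library sqlite3 module purely as a stand-in so it runs without a MySQL server; with pymysql, each worker would call pymysql.connect(...) with your own settings instead, and the placeholders would be %s rather than ?. The names DB_PATH, init_db, crawl and count_rows, the cut-down visitor table, and the hard-coded visitor values are made up for the example.

```python
import os
import sqlite3
import tempfile
import threading

# Stand-in database file (assumption); with MySQL this would be server settings.
DB_PATH = os.path.join(tempfile.mkdtemp(), "crawl_demo.db")


def init_db():
    """Create the demo table once, before any worker thread starts."""
    conn = sqlite3.connect(DB_PATH)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS visitor ("
        "owneruid INTEGER, visitoruid INTEGER, visittime TEXT, "
        "PRIMARY KEY (owneruid, visitoruid, visittime))"
    )
    conn.commit()
    conn.close()


def run(uid):
    """Worker: open a private connection, insert one row, commit, close."""
    conn = sqlite3.connect(DB_PATH, timeout=30)  # never shared with other threads
    try:
        # Parameterized query: the driver handles quoting of the values.
        conn.execute(
            "INSERT INTO visitor (owneruid, visitoruid, visittime) VALUES (?, ?, ?)",
            (uid, 7252799, "2017-05-22 21:06"),  # placeholder data for the demo
        )
        conn.commit()
    finally:
        conn.close()


def crawl(first_uid, last_uid, max_threads=5):
    """Run one worker per uid, keeping at most max_threads alive at a time."""
    threadlist = []
    uid = first_uid
    while uid <= last_uid or threadlist:
        # Drop finished threads, then top the pool back up.
        threadlist = [t for t in threadlist if t.is_alive()]
        while len(threadlist) < max_threads and uid <= last_uid:
            t = threading.Thread(target=run, args=(uid,))
            t.start()
            threadlist.append(t)
            uid += 1
        for t in threadlist:
            t.join(timeout=0.05)  # short wait instead of a long fixed sleep


def count_rows():
    """Return the number of rows written so far."""
    conn = sqlite3.connect(DB_PATH)
    try:
        return conn.execute("SELECT COUNT(*) FROM visitor").fetchone()[0]
    finally:
        conn.close()
```

Calling init_db() and then crawl(1, 20) starts one short-lived worker per uid, and no connection or cursor object is ever touched by two threads. If opening a connection per page turns out to be too expensive, keeping one connection per long-lived worker thread is a common variation on the same rule.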
Well, why not use an ORM for this?