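I have been doing PHP development and now want to learn crawler development, but I'm not sure where to start. Could you suggest a learning path? I'm the type who wants to study this in depth. Most of the examples I find online look like simple content scraping, and there is very little material on crawling outward by following URLs or on the analysis side. Please recommend learning paths, reference material, videos and so on. Tips on pitfalls to avoid would be even better.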
Having done web development, I think writing a crawler is fairly simple. As long as you understand the HTTP protocol, you will be fine.
Briefly, here are the main points:
Crawling speed (a trade-off between throttling and throughput)
  Multithreading
  Multiprocessing
  Message queue
Web page analysis
  Interface discovery -> make good use of the Network tab in the browser's F12 developer tools
  Parsing libraries such as XPath and re (regular expressions)
  Structured data
Persistence -> database connection pool -> cap the number of open database connections
Anti-crawler measures
  IP bans -> proxy pool -> how to use proxies more sensibly
  CAPTCHAs -> OCR
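To make the speed-versus-control point above concrete, here is a minimal Python sketch of a thread pool whose workers share one rate limiter. The names `RateLimiter`, `crawl_all` and the `fetch` callable are made up for illustration, and the interval and worker count are arbitrary. Treat it as one possible arrangement, not a recommended design.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimiter:
    """Space out calls so they start at least `min_interval` seconds apart."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self):
        # Reserve the next free time slot under the lock, then sleep outside it
        # so other threads can reserve their own slots in the meantime.
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_allowed)
            self._next_allowed = slot + self.min_interval
        delay = slot - now
        if delay > 0:
            time.sleep(delay)


def crawl_all(urls, fetch, max_workers=4, min_interval=0.5):
    """Fetch every URL with a thread pool, throttled by a shared limiter.

    `fetch` is any callable that takes a URL and returns the page body
    (hypothetical; plug in your own HTTP client).
    """
    limiter = RateLimiter(min_interval)

    def task(url):
        limiter.wait()
        return url, fetch(url)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(task, urls))
```

Raising `max_workers` adds parallelism, while `min_interval` caps how hard the target site is hit. Tuning the two against each other is the trade-off mentioned in the list.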
You can start by implementing the crawler in PHP to understand the principles. curl is enough for that; the language is just a tool.
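For the principles themselves, the core loop is small: take a URL from a queue, download it, extract its links, and enqueue the ones not seen before. Below is a minimal sketch of that loop, written in Python with only the standard library; the same structure should carry over to PHP with curl. The names `LinkParser`, `http_fetch` and `crawl`, the same-host restriction, and the `max_pages` limit are all assumptions made for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def http_fetch(url):
    """Download a page over HTTP and decode it as UTF-8 (an assumption)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(seed, fetch=http_fetch, max_pages=50):
    """Breadth-first crawl from `seed`, staying on the seed's host."""
    host = urlparse(seed).netloc
    frontier = deque([seed])  # URLs waiting to be fetched
    seen = {seed}             # URLs already queued, to avoid repeats
    pages = {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop the #fragment part.
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return pages
```

Passing `fetch` as a parameter keeps the loop independent of the HTTP client, so throttling, proxies, or retries can be added later without touching the traversal logic.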
Read a book called "Python Web Crawler".