python3.x - Advice on learning Python web crawling: what does a beginner need to prepare?

Question

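I have always done PHP development, and now I want to learn crawler development, but I'm confused and don't know where to start. Could everyone suggest a learning path? I want to study the subject in depth. Most examples I find online feel like simple scraping, and there is very little material on URL discovery (following links outward from page to page) or on the analysis side... Please recommend a learning path...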

PHP中文网 · Answer

If you have done web development, writing a crawler is quite simple: as long as you understand the HTTP protocol, you will be fine.
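To illustrate that a crawler's fetch step really is just an HTTP request, here is a minimal sketch using only the standard library's `urllib.request`. The User-Agent string is a hypothetical placeholder; some sites treat requests without a browser-like User-Agent differently.

```python
import urllib.request

# Hypothetical User-Agent string; replace it with one that identifies your crawler.
USER_AGENT = "Mozilla/5.0 (compatible; example-crawler/0.1)"


def build_request(url, user_agent=USER_AGENT):
    """Build a plain HTTP GET request, the same kind a browser or curl sends."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=10):
    """Send the GET request and decode the body using the charset the server declares."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.status, resp.read().decode(charset, errors="replace")
```

Calling `fetch("http://example.com/")` returns the status code and the decoded HTML; everything else in a crawler (scheduling, parsing, storage) is built around this one step.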

Here are a few key points:

Crawling speed (trade-off between throttling and raw speed)
- Multithreading
- Multiprocessing
  - Message queues
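The speed points can be sketched with the standard library's `concurrent.futures`. This is a minimal sketch, assuming `fetch` is any function that takes a URL and returns the page (a hypothetical parameter, passed in so the example runs without network access); `max_workers` and `delay` are the knobs for the speed-versus-throttling trade-off.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def crawl_concurrently(urls, fetch, max_workers=4, delay=0.0):
    """Fetch `urls` in parallel threads and return {url: result}.

    max_workers caps how many requests are in flight at once;
    delay adds a pause (in seconds) before each request in every worker.
    """
    def task(url):
        if delay:
            time.sleep(delay)
        try:
            return url, fetch(url)
        except Exception as exc:  # keep one failing URL from aborting the whole batch
            return url, exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(task, urls))
```

Threads suit crawling because most of the time is spent waiting on the network. `concurrent.futures.ProcessPoolExecutor` offers the same interface for CPU-heavy parsing, although the submitted function must then be picklable (defined at module level, unlike the nested `task` above). When a crawl is spread across several machines, an external message queue typically takes the place of the in-memory URL list.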
Page analysis
- API discovery -> make good use of the Network panel in the browser's developer tools (F12)
- XPath, re (regular expressions), and other parsing libraries
- Structured data
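Page analysis and URL discovery can be sketched as turning raw HTML into a structured record that holds the links to crawl next. To stay within the standard library, this sketch swaps in `html.parser` for XPath (XPath support usually comes from a third-party library such as lxml) and uses `re` for the title; the `parse_page` function and the record layout are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect every <a href> on a page, resolved to an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


# A regex is fine for a simple field like <title>, but fragile for nested markup.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def parse_page(base_url, html):
    """Return a structured record: the page URL, its title, and outgoing links."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    match = TITLE_RE.search(html)
    return {
        "url": base_url,
        "title": match.group(1).strip() if match else None,
        "links": parser.links,
    }
```

The `links` list is what drives URL discovery: each new link goes back into the crawl queue, usually after checking it against a set of already-visited URLs.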
Persistence -> database connection pool -> keep the number of open database connections within a fixed limit
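A connection pool caps how many connections exist and reuses them instead of opening one per saved page. Python's standard library has no general-purpose pool (libraries such as SQLAlchemy provide one), so this is a minimal hand-rolled sketch built on `queue.Queue` and `sqlite3`; `ConnectionPool`, `make_sqlite_pool`, `save_page`, and the `pages` table are hypothetical names.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: exactly `size` connections are opened up front and reused."""

    def __init__(self, factory, size=5):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout=None):
        # Blocks (or raises queue.Empty after `timeout` seconds) when every connection is busy.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand the connection back instead of closing it


def make_sqlite_pool(path, size=5):
    # check_same_thread=False lets crawler threads share pooled SQLite connections.
    return ConnectionPool(lambda: sqlite3.connect(path, check_same_thread=False), size)


def save_page(pool, url, title):
    """Upsert one crawled page into a `pages` table."""
    with pool.connection() as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT)")
        conn.execute("INSERT OR REPLACE INTO pages (url, title) VALUES (?, ?)", (url, title))
        conn.commit()
```

Because the queue holds at most `size` connections, extra worker threads simply wait for a free one, which is what keeps the database from being flooded.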
Anti-crawler measures
- IP bans -> proxy pool -> how to rotate proxies sensibly
- CAPTCHAs -> OCR
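One way to use a proxy pool sensibly is to rotate through the proxies and retire any proxy that keeps failing. This is a minimal sketch of that policy only (it does not cover CAPTCHAs or OCR); the retirement rule, the `ProxyPool` class, and the proxy addresses are assumptions, while `ProxyHandler` and `build_opener` are the standard `urllib.request` calls for routing requests through a proxy.

```python
import urllib.request
from collections import deque


class ProxyPool:
    """Round-robin proxy rotation; a proxy is dropped after `max_failures` consecutive failures."""

    def __init__(self, proxies, max_failures=3):
        self._proxies = deque(proxies)
        self._failures = {p: 0 for p in proxies}
        self.max_failures = max_failures

    def next_proxy(self):
        if not self._proxies:
            raise RuntimeError("proxy pool is empty")
        proxy = self._proxies[0]
        self._proxies.rotate(-1)  # move the chosen proxy to the back of the line
        return proxy

    def report(self, proxy, ok):
        """Record the outcome of a request made through `proxy`."""
        if ok:
            self._failures[proxy] = 0
            return
        self._failures[proxy] += 1
        if self._failures[proxy] >= self.max_failures and proxy in self._proxies:
            self._proxies.remove(proxy)

    def opener_for(self, proxy):
        """Build a urllib opener that sends HTTP and HTTPS traffic through `proxy`."""
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        return urllib.request.build_opener(handler)
```

Rotating spreads requests across addresses so no single IP stands out, and retiring dead proxies avoids wasting retries on them.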

迷茫 · Answer

You can start by implementing a crawler in PHP to understand the principles; curl can do the job too. The language is just a tool.

天蓬老师 · Answer

Read a book called "Python Web Crawler".