I am about to start my sophomore year. I have taught myself Python and know the basic syntax. I want to learn web crawling, but it seems to involve a lot of background knowledge. Could anyone with experience summarize what they know, or explain how to go about learning to write crawlers in Python?
When learning to write crawlers, start from a concrete need. There are plenty of beginner crawlers on the Internet that scrape jokes, pictures of attractive women, and so on; you can write a simple crawler like that within three days.
Going deeper is much harder, though, and touches on many different areas.
Getting started is not difficult. You can read this:
How to learn Python crawler [Introduction] https://zhuanlan.zhihu.com/p/...
In principle, a crawler just sends HTTP requests. One step further you have to deal with sessions and cookies, and one step beyond that with CAPTCHA (verification code) recognition.
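To make the request/session/cookie idea concrete, here is a minimal sketch using only the standard library so it runs offline. The toy server, the cookie name `sid`, and the function `fetch_twice` are made up for illustration; a real crawler would talk to a real site, and with the requests library a `requests.Session()` object plays the role of the cookie-aware opener.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Toy server (hypothetical): sets a cookie on the first visit and
    answers differently once the client sends it back."""

    def do_GET(self):
        has_cookie = "sid=abc123" in self.headers.get("Cookie", "")
        body = b"welcome back" if has_cookie else b"first visit"
        self.send_response(200)
        if not has_cookie:
            self.send_header("Set-Cookie", "sid=abc123; Path=/")
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch_twice():
    # Start the toy server on a free local port in a background thread.
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_port

    # The cookie jar is what turns separate requests into a "session":
    # it stores Set-Cookie values and sends them back on later requests.
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),  # ignore any system proxy settings
        urllib.request.HTTPCookieProcessor(jar),
    )
    try:
        with opener.open(url) as resp:
            first = resp.read().decode()
        with opener.open(url) as resp:
            second = resp.read().decode()
    finally:
        server.shutdown()
        server.server_close()
    return first, second, [cookie.name for cookie in jar]


if __name__ == "__main__":
    print(fetch_twice())
```

The second request gets a different answer only because the opener replays the cookie it received on the first one; without the cookie jar, every request would look like a first visit.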
As for tools, you can send requests with urllib2 (urllib.request in Python 3) or, better, the requests library. Once the response comes back and needs to be parsed, use BeautifulSoup.
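As a rough illustration of the parsing step, the sketch below pulls link text and URLs out of a page with BeautifulSoup (installed as `beautifulsoup4`). The HTML is an inline sample invented for this example, and `extract_links` is a hypothetical helper name; in a real crawler the HTML would typically come from something like `requests.get(url).text`.

```python
from bs4 import BeautifulSoup

# Made-up sample page standing in for a downloaded response body.
SAMPLE_HTML = """
<html><body>
  <h1>Jokes</h1>
  <ul>
    <li><a href="/joke/1">First joke</a></li>
    <li><a href="/joke/2">Second joke</a></li>
  </ul>
</body></html>
"""


def extract_links(html):
    """Return (link text, href) pairs for every <a> tag that has an href."""
    # "html.parser" is Python's built-in parser, so no extra parser is needed.
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", href=True)
    ]


if __name__ == "__main__":
    for text, href in extract_links(SAMPLE_HTML):
        print(text, href)
```

Most simple crawlers are a loop around these two steps: fetch a page, extract the data and the next links, then repeat.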
Python basic tutorial | Novice tutorial http://www.runoob.com/python/...
Beautiful Soup 4.2.0 documentation https://www.crummy.com/softwa...
Crawler performance: NodeJs VS Python - QueenKing - SegmentFault /a/11...
Use KNN for verification code recognition - QueenKing - SegmentFault /a/11...
You can also look at Scrapy, a crawler framework for Python, which has a Chinese-language manual.