Many friends have asked me recently: I am teaching myself to write crawlers, so how much do I need to learn before I can find a job?
This article draws on my own experience with crawlers and job hunting. It is for reference only.
What level do you need to reach
Let's take a junior crawler engineer position as the target and briefly list the requirements:
(necessary parts)
Language: know at least one of Python, Java, or Golang
Familiar with multi-threaded programming, network programming, and the HTTP protocol
Have developed a complete crawler project (preferably with full-site crawling experience; more on this below)
Anti-crawling countermeasures: cookies, IP pools, CAPTCHAs, etc.
Proficient with distributed systems
Understand message queues, such as RabbitMQ, Kafka, Redis, etc.
Have experience in data mining, natural language processing, information retrieval, or machine learning
Familiar with app data collection and man-in-the-middle proxies
Big data processing (Hive/MR/Spark/Storm)
Databases: MySQL, Redis, MongoDB
Familiar with Git and with development in a Linux environment
Able to read JS code; this is really important
How to improve
The tutorials on Zhihu are enough to get started. For Python, knowing requests alone is of course not enough. You also need to know frameworks such as Scrapy and pyspider, and to understand the principles behind scrapy_redis:
how to build a distributed system, and how to solve the problems of memory and speed.
Reference: What is the difference between scrapy-redis and scrapy?
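To make the distributed principle concrete, here is a minimal sketch of the idea behind a shared scheduler: every worker takes URLs from one common queue, and a common set of request fingerprints keeps any URL from being crawled twice. This is not scrapy_redis's actual code. The names SharedFrontier, fingerprint, and run_workers are hypothetical, and plain in-memory structures stand in for the queue and set that would normally live in Redis so that several machines can share them.

```python
import hashlib
from collections import deque


def fingerprint(method: str, url: str) -> str:
    """Hash a request so every worker recognises the same URL the same way."""
    return hashlib.sha1(f"{method.upper()} {url}".encode("utf-8")).hexdigest()


class SharedFrontier:
    """In-memory stand-in (assumption) for a queue and seen-set kept in Redis."""

    def __init__(self):
        self.queue = deque()  # in a real deployment: a Redis list or sorted set
        self.seen = set()     # in a real deployment: a Redis set of fingerprints

    def push(self, url: str, method: str = "GET") -> bool:
        """Enqueue a URL unless some worker has already scheduled it."""
        fp = fingerprint(method, url)
        if fp in self.seen:
            return False
        self.seen.add(fp)
        self.queue.append(url)
        return True

    def pop(self):
        """Return the next URL, or None when the queue is empty."""
        return self.queue.popleft() if self.queue else None


def run_workers(frontier, fetch_links, worker_count=2):
    """Simulate several workers taking turns on the same frontier."""
    crawled = {i: [] for i in range(worker_count)}
    turn = 0
    while True:
        url = frontier.pop()
        if url is None:
            break
        crawled[turn % worker_count].append(url)
        for link in fetch_links(url):
            frontier.push(link)  # duplicates are dropped here
        turn += 1
    return crawled
```

Storing fixed-size fingerprints instead of full requests is one way to keep the memory cost of deduplication down, and adding workers that read from the same queue is how the speed problem is usually approached.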
What is full-site crawling?
The simplest example: search for a keyword on Lagou (a job site) and you get 30 pages of results. Don't assume that crawling those 30 pages gets you everything. For a full-site crawl, you have to find a way to pull down all the data.
How? Narrow the scope with the search filters and work through the results piece by piece.
In addition, each position page lists recommended positions, so write a crawler to collect those recommendations as well.
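The two ideas above can be sketched roughly as follows. The first function keeps adding filters until each query returns fewer results than the site's cap, so nothing is hidden behind the page limit. The second follows recommended positions breadth-first. Everything here is an assumption for illustration: search and get_recommended are hypothetical callables that you would implement against the real site, and the filter names are made up.

```python
from collections import deque


def crawl_partitions(search, filter_options, cap, chosen=None):
    """Add filters recursively until each query returns fewer than `cap` hits.

    `search(filters)` is a hypothetical function returning a list of dicts
    with an "id" key, truncated by the site to at most `cap` items.
    """
    chosen = dict(chosen or {})
    hits = search(chosen)
    remaining = [name for name in filter_options if name not in chosen]
    # Fewer hits than the cap means the result was not truncated.
    if len(hits) < cap or not remaining:
        return {item["id"]: item for item in hits}
    name = remaining[0]
    collected = {}
    for value in filter_options[name]:
        narrowed = {**chosen, name: value}
        collected.update(crawl_partitions(search, filter_options, cap, narrowed))
    return collected


def follow_recommendations(seed_ids, get_recommended):
    """Breadth-first walk over recommended positions, visiting each id once."""
    seen = set(seed_ids)
    queue = deque(seed_ids)
    while queue:
        current = queue.popleft()
        for rec in get_recommended(current):
            if rec not in seen:
                seen.add(rec)
                queue.append(rec)
    return seen
```

If a partition still hits the cap after every filter has been applied, this sketch simply keeps what it got; a real crawler would need another dimension to split on, such as a different keyword.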