I recommend https://github.com/alsotang/n...
Crawler tool chain:
Use superagent to send the HTTP request. Once you have the response, load the HTML into cheerio, and you can then query and manipulate the DOM with jQuery-like syntax.
Use MongoDB for storage, with mongoose as the corresponding object-modeling (ORM-style) layer.
Code sample: https://github.com/zhanyouwei...
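For anyone who'd rather stay in Python, here is a rough sketch of the same three steps (request → parse → store) using only the standard library. This is NOT the superagent/cheerio/mongoose API: urllib.request stands in for superagent, html.parser.HTMLParser for cheerio (much less convenient than jQuery-style selectors), and sqlite3 for MongoDB. The names LinkParser, fetch, parse_links, store_links, crawl, the `links` table, and the User-Agent string are all hypothetical, chosen just for illustration.

```python
import sqlite3
import urllib.request
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> tag.

    Roughly what iterating over $('a') would give you in cheerio,
    but done by hand with the stdlib event-based parser.
    """

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> we are currently inside, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def fetch(url, timeout=10):
    """Step 1: send the HTTP request (the superagent role).

    The User-Agent value is an arbitrary placeholder.
    """
    req = urllib.request.Request(url, headers={"User-Agent": "toy-crawler/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def parse_links(html):
    """Step 2: parse the HTML (the cheerio role)."""
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


def store_links(conn, links):
    """Step 3: persist the results (the MongoDB/mongoose role, here SQLite)."""
    conn.execute("CREATE TABLE IF NOT EXISTS links (href TEXT, text TEXT)")
    conn.executemany("INSERT INTO links VALUES (?, ?)", links)
    conn.commit()


def crawl(url, db_path="crawl.db"):
    """Run the whole pipeline for a single page."""
    conn = sqlite3.connect(db_path)
    try:
        store_links(conn, parse_links(fetch(url)))
    finally:
        conn.close()
```

Usage would be something like `crawl("https://example.com")`, which needs network access and a site that allows crawling. Keeping fetch, parse, and store as separate functions makes it easy to swap any one of them (e.g. requests for urllib, a real database for SQLite) without touching the others.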
Isn't Python known for having the lowest learning curve? Getting started with Python crawlers is easy, and there are plenty of tutorials online.
Scrapy is definitely number one!!!
Just go with Node. To get started, you can refer to my beginner's notes: https://github.com/hanzichi/f...
Python isn't hard to learn. Here's my experience:
At first I used urllib (or urllib2), and I was thrilled to find I could actually crawl data.
Then I ran into a web page where the connection was closed ("Connection is Closed"), and that's how I learned about httplib2.
Then I discovered requests.
Now I'm looking into scrapy.
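A "Connection is Closed" error usually means the server dropped the connection before sending a full response. Here is a minimal sketch, assuming plain Python 3 (where urllib2 became urllib.request) rather than httplib2 or requests, of retrying a fetch when that happens. The helper names with_retries and fetch, the default retry count, the linear backoff, and the User-Agent value are my own choices for illustration, not anything from those libraries.

```python
import http.client
import time
import urllib.error
import urllib.request

# Errors that typically mean the connection dropped or stalled.
# (http.client.RemoteDisconnected is what you get when the server
# closes the connection without sending a response.)
RETRYABLE = (
    http.client.RemoteDisconnected,
    ConnectionResetError,
    TimeoutError,
    urllib.error.URLError,
)


def with_retries(func, attempts=3, backoff=1.0):
    """Call func(); if the connection drops, wait and try again."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except urllib.error.HTTPError:
            # A real HTTP status (404, 500, ...) is not a dropped
            # connection, so don't retry it. HTTPError subclasses
            # URLError, so this clause must come first.
            raise
        except RETRYABLE:
            if attempt == attempts:
                raise
            time.sleep(backoff * attempt)  # linear backoff: 1s, 2s, ...


def fetch(url, timeout=10):
    """Plain urllib GET; the User-Agent value is an arbitrary placeholder."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Usage: `html = with_retries(lambda: fetch("https://example.com"))`. Keeping the retry logic in a separate wrapper means the same helper can later wrap a requests or httplib2 call unchanged.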
Crawling is something of a black art. I think what the asker really wants is to track price trends. Still, I'd suggest setting a smaller goal first, such as crawling Baidu Encyclopedia (Baidu Baike)~
Node crawler only costs 20
nodejs: superagent + cheerio
nodejs: request + cheerio is also good.