Parsing a page with a browser or a browser-like engine is far slower than parsing it with regular expressions. And if you want to use selectors, you still have to build the tooling around them yourself, so that is not a labor-saving route either.
However, the biggest problem with regex parsing is that once the site owner redesigns the page, your patterns break, and you may find that selector-based code would have been easier to update.
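To make the trade-off concrete, here is a minimal sketch using only the Python standard library. The two HTML snippets, the `title` class, and the function names are made up for illustration, and `html.parser` stands in here for a real selector library. It shows a regex tied to the old markup silently returning nothing after a hypothetical redesign, while a parser-based extractor keeps working.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup before and after a site redesign (illustrative only).
OLD_PAGE = '<a class="title" href="/post/1">First post</a>'
NEW_PAGE = '<a href="/post/1" data-id="1" class="title">First post</a>'

# Regex tied to the exact attribute order of the old markup.
TITLE_LINK_RE = re.compile(r'<a class="title" href="([^"]+)">')


def links_by_regex(html):
    """Return hrefs matched by the hard-coded pattern."""
    return TITLE_LINK_RE.findall(html)


class TitleLinkParser(HTMLParser):
    """Collects href values of <a> tags whose class is exactly "title"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)  # attribute order no longer matters
        if tag == "a" and attr_map.get("class") == "title" and "href" in attr_map:
            self.links.append(attr_map["href"])


def links_by_parser(html):
    """Return hrefs found by walking the parsed tags."""
    parser = TitleLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

The regex version is shorter and faster, but it encodes the page layout in the pattern itself. The parser version costs more code up front and tolerates changes such as reordered or added attributes.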
The language itself is not the issue. What matters for the actual work is the libraries: you need a good HTTP library, a good concurrency library, a good job-scheduling library, and a good markup parsing library. If those are all available and the language performs well, then beyond having a cleaner syntax, the question is whether most people at your company are comfortable with the language. Broadly speaking, Python, Java, Ruby, Node.js, and C# all meet these conditions. Which one to choose comes down to those conditions.
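As a rough illustration of how the HTTP and concurrency pieces listed above fit together, here is a small sketch in Python using only the standard library (`urllib.request` and `concurrent.futures`). The function names `fetch` and `crawl`, the `fetcher` parameter, and the default worker count are assumptions made for this example, not anything from a specific crawler framework.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one page and return its body as text."""
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def crawl(urls, fetcher=fetch, max_workers=8):
    """Fetch all urls concurrently.

    Returns a dict mapping each url to its body, or to the exception
    raised while fetching it, so one bad page does not stop the batch.
    """
    urls = list(urls)

    def safe_fetch(url):
        try:
            return fetcher(url)
        except Exception as exc:  # keep the error alongside the url
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input urls.
        return dict(zip(urls, pool.map(safe_fetch, urls)))
```

Passing the fetch function in as a parameter keeps the concurrency logic separate from the HTTP client, so the same `crawl` could wrap a different HTTP library. Job scheduling and markup parsing would sit on either side of this step.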
scrapy +1
It is very convenient to use, feature-rich, and the documentation is very clear:
scrapy official website
The asker already tagged the question with python themselves, so why still ask about the language...
The company I work for uses Java.
nodejs +1
I know Python well, but occasionally I use Java
I have used Nokogiri when writing Ruby, but for getting things done efficiently, Python is more convenient
node +1
We wrote it in ruby