While using Selenium to crawl 12306, I found that PhantomJS does not work but chromedriver does. Presumably the site detects PhantomJS and blocks it. chromedriver, however, opens a browser window, and crawling with it is slow.
I have two questions. I have searched Google for a long time without finding a solution that works.
1. How can I disguise PhantomJS as well as possible so that the site does not detect it?
2. How can I set up chromedriver so that it does not display a window, or are there other ways to improve crawling efficiency?
Thanks in advance!
You can do this with PyVirtualDisplay, which runs the browser inside a virtual display so that no window appears on screen.
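The original snippet for this answer is not available, so here is a minimal sketch of the PyVirtualDisplay approach. It assumes a Linux machine with Xvfb installed, the `pyvirtualdisplay` and `selenium` packages, and chromedriver on the PATH. The function name `fetch_title` and the display size are made up for illustration.

```python
def fetch_title(url, size=(1280, 800)):
    """Open url in Chrome inside a virtual display and return the page title.

    Hypothetical helper: assumes Xvfb, pyvirtualdisplay, selenium and
    chromedriver are all installed.
    """
    # Imported here so the module can be loaded without the packages present.
    from pyvirtualdisplay import Display
    from selenium import webdriver

    # visible=False makes PyVirtualDisplay use an off-screen Xvfb display.
    display = Display(visible=False, size=size)
    display.start()
    try:
        driver = webdriver.Chrome()
        try:
            driver.get(url)
            return driver.title
        finally:
            driver.quit()  # always close the browser
    finally:
        display.stop()  # always tear down the virtual display
```

Calling `fetch_title("https://www.12306.cn/")` should then run Chrome without showing a window. The browser still renders every page, so this hides the interface but probably does little for speed.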
I don't know whether you have changed the header information that PhantomJS sends. Its default User-Agent contains the string "PhantomJS", which makes it easy for a site to detect, so overriding the headers is worth trying.
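The method this answer refers to is not shown, so the following is only one possible way to do it: a sketch that sets the User-Agent and extra request headers through GhostDriver's `phantomjs.page.settings.*` and `phantomjs.page.customHeaders.*` capabilities. It assumes Selenium 3.x (PhantomJS support was removed in Selenium 4) and a `phantomjs` binary on the PATH. The helper names and the header values are illustrative.

```python
def build_phantomjs_capabilities(user_agent, extra_headers=None):
    """Return a capabilities dict that overrides PhantomJS's default headers."""
    caps = {
        "browserName": "phantomjs",
        "javascriptEnabled": True,
        # Replaces the default User-Agent, which contains "PhantomJS".
        "phantomjs.page.settings.userAgent": user_agent,
    }
    # Each extra header becomes its own customHeaders capability.
    for name, value in (extra_headers or {}).items():
        caps["phantomjs.page.customHeaders." + name] = value
    return caps


def make_phantomjs_driver(caps):
    """Start PhantomJS with the given capabilities (assumes Selenium 3.x)."""
    from selenium import webdriver  # imported lazily; needs selenium installed
    return webdriver.PhantomJS(desired_capabilities=caps)


# Example values only; copy a User-Agent string from a real browser in practice.
caps = build_phantomjs_capabilities(
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
    {"Accept-Language": "zh-CN,zh;q=0.8"},
)
```

Passing `caps` to `make_phantomjs_driver` starts the browser with these headers. A site can still detect PhantomJS in other ways (JavaScript properties, for example), so changing headers may not be enough on its own.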
You can refer to my article on running Selenium in headless mode.
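The article itself is not linked here, so this is not necessarily what it describes, but a common way to run Selenium without a window is Chrome's own `--headless` flag. The sketch below assumes Chrome 59 or later, a matching chromedriver on the PATH, and a Selenium version that accepts the `options=` keyword (older 3.x releases use `chrome_options=` instead). The helper names and window size are hypothetical.

```python
def headless_chrome_arguments(width=1280, height=800):
    """Return the command-line flags used to start Chrome without a window."""
    return [
        "--headless",     # do not open a browser window
        "--disable-gpu",  # historically recommended alongside --headless
        "--window-size={},{}".format(width, height),
    ]


def make_headless_chrome():
    """Start a headless Chrome driver (assumes chromedriver is on the PATH)."""
    # Imported lazily so the helper above works without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for argument in headless_chrome_arguments():
        options.add_argument(argument)
    return webdriver.Chrome(options=options)
```

With this, `driver = make_headless_chrome()` followed by `driver.get(...)` behaves like normal chromedriver, just without the interface, and unlike PyVirtualDisplay it does not need Xvfb.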