python - Scrapy returns duplicate content when crawling CNKI pages
黄舟 2017-06-30 09:55:07

I loop over the paginated URLs and request each page in turn:

# Inside a spider callback; Request is scrapy.Request
for i in range(3):
    yield Request("http:xx/page/%s" % i, callback=self.parse_page)

Every request succeeds, but the response content is the same each time: it is always the content of the first request. Requesting the same paginated URLs one at a time in Postman does not show this problem. Have I been banned? It never behaved like this before.

All replies (3)
刘奇

You need to compare the headers sent by Postman or the browser with the headers sent by Scrapy and see where they differ.
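
One way to make that comparison concrete is to copy both sets of headers into dictionaries and diff them. The sketch below is only an illustration: the function name diff_headers and all header values are hypothetical, not taken from the question. In a Scrapy callback, the headers that were actually sent should be available as response.request.headers.

```python
def diff_headers(reference, candidate):
    """Return headers that are missing or different between two requests.

    Header names are compared case-insensitively. The result maps each
    differing header name to a (reference_value, candidate_value) tuple,
    with None standing for "header not sent".
    """
    ref = {name.lower(): value for name, value in reference.items()}
    cand = {name.lower(): value for name, value in candidate.items()}
    report = {}
    for name in sorted(set(ref) | set(cand)):
        if ref.get(name) != cand.get(name):
            report[name] = (ref.get(name), cand.get(name))
    return report


# Hypothetical values, for illustration only.
postman_headers = {
    "User-Agent": "PostmanRuntime/x.y",
    "Cookie": "session=abc",
    "Accept": "*/*",
}
scrapy_headers = {
    "user-agent": "Scrapy/x.y",
    "Accept": "*/*",
}

differences = diff_headers(postman_headers, scrapy_headers)
for name, (expected, actual) in differences.items():
    print("%s: postman=%r scrapy=%r" % (name, expected, actual))
```

A header that appears on only one side, such as a cookie or a Referer, is usually the first thing worth copying into the Scrapy request to see whether the pages start to differ.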

三叔

You have been detected by the site's anti-crawling measures.

洪涛

Check the log printed to the console to see whether the next page was actually crawled:
2017-06-29 09:26:13 [scrapy] DEBUG: Scraped from <200 http:xx/page/x>
Pay attention to whether the trailing x in http:xx/page/x changes from one request to the next.
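
If the URLs in the log do change but the content still looks identical, a quick check is to fingerprint each response body and see which URLs share one. The following is a minimal sketch under that assumption: group_by_body is a hypothetical helper and the page bodies are made up. In a real spider the pairs would come from response.url and response.body.

```python
import hashlib
from collections import defaultdict


def group_by_body(pages):
    """Group URLs whose response bodies are byte-for-byte identical.

    pages is an iterable of (url, body_bytes) pairs. Only groups with
    more than one URL are returned, since those are the duplicates.
    """
    groups = defaultdict(list)
    for url, body in pages:
        groups[hashlib.sha1(body).hexdigest()].append(url)
    return [urls for urls in groups.values() if len(urls) > 1]


# Made-up bodies, for illustration only.
pages = [
    ("http:xx/page/0", b"<html>first page</html>"),
    ("http:xx/page/1", b"<html>first page</html>"),
    ("http:xx/page/2", b"<html>third page</html>"),
]

duplicates = group_by_body(pages)
print(duplicates)
```

Distinct URLs with identical bodies would suggest the server is ignoring the page number for this client, which points back to headers, cookies, or anti-crawling rather than to the loop itself.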
