Requests for PC pages come back as mobile addresses#python#scrapy
淡淡烟草味
淡淡烟草味 2017-06-30 09:55:13

I am using Scrapy to crawl Ximalaya (ximalaya.com) through its PC (desktop) URLs. The response for the entry link is fine, but the response for the follow-up request shows a mobile address.

The spider code is as follows:

import scrapy
from scrapy import Request

# XimalayaItem comes from the project's items module (its import is not shown in the original post).


class SpxmlySpider(scrapy.Spider):
    name = 'ximalaya'
    allowed_domains = ["ximalaya.com"]
    # Listing-page URLs to start from (limited to page 2 for now as a test)
    start_urls = ['http://www.ximalaya.com/dq/all/{}'.format(num) for num in range(2, 3)]

    def parse(self, response):
        # Extract the album links from the listing page
        print(response)
        mainurls = response.xpath('//p[@class="albumfaceOutter"]/a/@href').extract()
        # for url in mainurls:
        #     yield Request(url=url, callback=self.parse_details)
        print(mainurls[0])
        yield Request(url=mainurls[0], dont_filter=True, callback=self.parse_details)

    # TODO: why does a request for the PC URL come back as a mobile address?
    def parse_details(self, response):
        item = XimalayaItem()
        print(response)
        # ...... rest omitted

Console output: (screenshot not preserved)

I have written a RotateUserAgentMiddleware in middlewares.py. It is taking effect, as the output also shows.
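The RotateUserAgentMiddleware itself is not shown in the post. One possibility worth ruling out is that its pool contains mobile browser strings, in which case the site could serve mobile pages for some requests. Below is a minimal sketch of a rotating middleware limited to desktop strings. The class name `DesktopUserAgentMiddleware` and the contents of `DESKTOP_USER_AGENTS` are assumptions for illustration, not the author's code.

```python
import random

# Hypothetical pool: desktop browser strings only (illustrative values).
DESKTOP_USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/603.2.4 "
    "(KHTML, like Gecko) Version/10.1.1 Safari/603.2.4",
    "Mozilla/5.0 (X11; Linux x86_64; rv:54.0) Gecko/20100101 Firefox/54.0",
]


class DesktopUserAgentMiddleware:
    """Downloader middleware that sets a random desktop User-Agent."""

    def __init__(self, user_agents=None):
        # Fall back to the desktop-only pool when no list is supplied.
        self.user_agents = list(user_agents or DESKTOP_USER_AGENTS)

    def process_request(self, request, spider):
        # Overwrite whatever User-Agent the request currently carries.
        request.headers["User-Agent"] = random.choice(self.user_agents)
        # Returning None lets Scrapy keep processing the request normally.
        return None
```

To try it, the class would be registered under `DOWNLOADER_MIDDLEWARES` in settings.py, with Scrapy's built-in `scrapy.downloadermiddlewares.useragent.UserAgentMiddleware` set to `None` so the two do not compete.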

Is this triggering some anti-crawling mechanism?
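One way to narrow this down is to check whether the follow-up request was redirected to a different host. The helper below is a rough sketch that compares the requested URL with the final one. The function name `switched_to_mobile` and the assumption that the mobile site lives on an `m.` subdomain are mine, not confirmed by the post.

```python
from urllib.parse import urlparse


def switched_to_mobile(requested_url, final_url, mobile_prefix="m."):
    """Return True if the final host looks mobile but the requested host did not.

    Assumes the mobile site is served from a host starting with `mobile_prefix`.
    """
    requested_host = urlparse(requested_url).hostname or ""
    final_host = urlparse(final_url).hostname or ""
    return final_host.startswith(mobile_prefix) and not requested_host.startswith(mobile_prefix)
```

Inside `parse_details`, the requested URL can be recovered from `response.meta.get('redirect_urls')`, which Scrapy's RedirectMiddleware fills in when a redirect happened, and the final URL is `response.url`. If that key is present, the server redirected the request rather than the spider extracting a mobile link.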


All replies (2)
小葫芦

It is probably because your request headers do not set a User-Agent.

学霸

Set up the request headers carefully. Sites usually decide whether a client is a mobile device from its User-Agent.
The fact that you can fetch the data without any special setup also suggests the target site does not put much effort into anti-hotlinking protection.
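To illustrate the kind of User-Agent check described above, here is a rough heuristic that flags strings a server would likely treat as mobile. It can be run over a rotation pool to spot entries that might trigger mobile pages. The token list and the name `looks_mobile` are assumptions, and the target site's real detection logic is unknown.

```python
import re

# Tokens that commonly appear in mobile browser User-Agent strings.
_MOBILE_TOKENS = re.compile(
    r"Mobile|Android|iPhone|iPad|iPod|Windows Phone", re.IGNORECASE
)


def looks_mobile(user_agent):
    """Return True if the User-Agent contains a common mobile token."""
    return bool(_MOBILE_TOKENS.search(user_agent))


# Example strings (illustrative values only).
desktop_ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
)
iphone_ua = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 10_3 like Mac OS X) AppleWebKit/603.1.30 "
    "(KHTML, like Gecko) Version/10.0 Mobile/14E277 Safari/602.1"
)

# Keep only the entries that do not look mobile.
desktop_only = [ua for ua in (desktop_ua, iphone_ua) if not looks_mobile(ua)]
```

Filtering the pool this way keeps the rotation behavior while making it less likely that a request is classified as coming from a phone.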
