Scrapy: how to get the original start_url
代言
代言 2017-06-28 09:23:41

When crawling with Scrapy, the URL of a response can differ from the original start_url because of redirects or other reasons. How can I get the original start_url?

def start_requests(self):
    start_url = 'your_scrapy_start_url'
    yield Request(start_url, self.parse)
    
def parse(self, response):
    item = YourItem()
    item['start_url'] = ...  # the start_url of the original request (how do I get it here?)
    yield item

All replies (1)
为情所困

Reference article: Summary of common problems with Scrapy crawlers

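Use the meta parameter of Request to pass the original URL through to the callback. The meta dict stays attached to the request, so it is still available on the response even if the URL has changed: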

def start_requests(self):
    start_url = 'your_scrapy_start_url'
    yield Request(start_url, self.parse, meta={'start_url': start_url})
    
def parse(self, response):
    item = YourItem()
    item['start_url'] = response.meta['start_url']
    yield item