Scrapy
When crawling, the URL of a response may differ from the original start_url because of redirects or other reasons. How can I get the original start_url?
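from scrapy import Request

def start_requests(self):
    start_url = 'your_scrapy_start_url'
    # Attach the original URL to the request so it travels with it
    yield Request(start_url, self.parse, meta={'start_url': start_url})

def parse(self, response):
    item = YourItem()
    # response.url may have changed after a redirect; meta still holds the original
    item['start_url'] = response.meta['start_url']
    yield item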
Reference article: Summary of common problems with Scrapy crawlers
Use the meta parameter of Request to pass the original URL along to the callback.