```python
import scrapy
from movie.items import MovieItem


class MeijuSpider(scrapy.Spider):
    name = "meiju"
    allowed_domains = ["alexa.cn"]
    # Scrapy rejects URLs without a scheme, so the http:// prefix is required.
    start_urls = ['http://www.alexa.cn/siterank']

    def parse(self, response):
        movies = response.xpath('//ul[@class="siterank-sitelist"]/li')
        for each_movie in movies:
            item = MovieItem()
            # extract_first() returns None instead of raising IndexError when nothing matches.
            item['name'] = each_movie.xpath('.//p[@class="infos"]').extract_first()
            yield item
```
That is my code. I want to crawl these pages in a loop:
www.alexa.cn/siterank/2
www.alexa.cn/siterank/3
www.alexa.cn/siterank/4
...
I think the loop should be something like `for i in range(2, 10): yield scrapy.Request('http://www.alexa.cn/siterank/%d' % i)`, but I don't know where to put it. Any help?
If you know the page range in advance, it is simpler to list every page in start_urls.
The official documentation has an example of this. To follow the next page it uses recursion: `parse()` yields a request for the next page whose callback is `parse()` itself.
I wrote a Tieba crawler with Scrapy and used the same recursive approach to get the next page. Here is the code: