When scraping data with Scrapy, I use the ItemLoader class. The value the selector extracts goes into scrapy.Field(), which calls filter(). When the selector value is not empty, it does return "has value". But if the selector extracts [] or "", the value does not come out as "no value" after passing through filter():
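```python
import scrapy
from itemloaders.processors import MapCompose  # older Scrapy: scrapy.loader.processors

def filter(value):
    if value:
        return "has value"
    else:
        return "no value"

# The rest is shorthand; anyone familiar with Scrapy should follow it.
# Spelled out, presumably something like (item and field names hypothetical):
class MyItem(scrapy.Item):
    title = scrapy.Field(input_processor=MapCompose(filter))
```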
Is there any way to catch the empty value and turn it into "no value" via filter()?
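A likely explanation, hedged since I have not run your spider: with MapCompose, the processor maps filter() over the *list* of values the selector extracted. If the XPath matches nothing, that list is empty, so filter() is never called; as far as I know, ItemLoader then stores nothing for the field, and it is simply missing from load_item(). The plain-Python sketch below imitates that behaviour (map_compose is a simplified stand-in, not Scrapy's code; filter_value and mark_empty are hypothetical names) and contrasts it with a processor that looks at the whole list.

```python
from typing import Callable, Iterable, List


def map_compose(func: Callable[[str], str]) -> Callable[[Iterable[str]], List[str]]:
    """Simplified stand-in for MapCompose: call func once per extracted value."""
    def process(values: Iterable[str]) -> List[str]:
        # With an empty list the loop body never runs, so func is never called.
        return [func(v) for v in values]
    return process


def filter_value(value: str) -> str:
    # Same logic as the question's filter(), renamed to avoid shadowing the built-in.
    return "has value" if value else "no value"


def mark_empty(values: Iterable[str]) -> List[str]:
    """Whole-list processor: it receives the list itself, so it can react to []."""
    if any(values):
        return ["has value"]
    return ["no value"]
```

In Scrapy, a plain function given as `input_processor` without MapCompose around it should receive the whole extracted list in this way, e.g. `title = scrapy.Field(input_processor=mark_empty)`; please verify that against your Scrapy version.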
Thanks for the invitation~
I don't know much about Scrapy, so I can't say much about it specifically. The general approach of the crawler I wrote myself in PHP is:
1. First, use regular expressions and some loops to put the pages to be collected into queues, grouped by category: for example, one queue for the paginated list pages and another for the content pages linked from those lists.
2. Then use XPath to extract data from the content pages. While extracting, process any data that needs it according to your requirements.
3. Assemble the data and save it in whatever format you need.
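The three steps above can be sketched in Python (rather than the PHP mentioned, to stay in the language of the question). Everything site-specific is hypothetical: the URLs, the /article/<id> link pattern, the h1 and span.price markup, and the output schema. The standard library's ElementTree supports only a subset of XPath and needs well-formed markup; a real crawler would more likely use lxml or Scrapy's own selectors.

```python
import json
import re
import xml.etree.ElementTree as ET
from collections import deque
from typing import Deque, Dict, Iterable

# Hypothetical site layout, for illustration only.
BASE = "https://example.com"
LIST_URL = BASE + "/list?page={page}"
ARTICLE_RE = re.compile(r'href="(/article/\d+)"')
NO_VALUE = "no value"
SCHEMA = ("url", "title", "price")  # hypothetical output fields


# Step 1: one queue per category of page.
def build_queues(last_page: int) -> Dict[str, Deque[str]]:
    pages = deque(LIST_URL.format(page=p) for p in range(1, last_page + 1))
    return {"list": pages, "content": deque()}


def enqueue_articles(list_html: str, queues: Dict[str, Deque[str]]) -> None:
    # Content-page links found on a list page go into the "content" queue.
    for path in ARTICLE_RE.findall(list_html):
        queues["content"].append(BASE + path)


# Step 2: extract fields with ElementTree's limited XPath support.
def extract(page: str) -> Dict[str, str]:
    root = ET.fromstring(page)  # assumes well-formed XHTML
    # findtext gives None when nothing matches and "" when the element is empty.
    title = (root.findtext(".//h1") or "").strip()
    price = (root.findtext(".//span[@class='price']") or "").strip()
    return {"title": title, "price": price}


# Step 3: assemble into a fixed schema, filling empties, and serialise as JSON Lines.
def assemble(url: str, fields: Dict[str, str]) -> Dict[str, str]:
    record = {"url": url, **fields}
    return {key: record.get(key) or NO_VALUE for key in SCHEMA}


def to_jsonl(records: Iterable[Dict[str, str]]) -> str:
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)
```

Here the empty value becomes "no value" in assemble(); the same `or NO_VALUE` could just as well go in extract(), which corresponds to handling it at the crawling step instead.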
That's roughly it. Most crawler frameworks are probably built on this idea; they just add ways around anti-crawling measures, multithreading, multiprocessing, incremental crawling and other features on top. So, in your framework, you can handle the empty value either where the data is crawled or where the data is assembled.
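For the "handle it where the data is assembled" option, in Scrapy that would usually mean after `loader.load_item()` returns, or in an item pipeline. Below is a minimal sketch of a pipeline-style class; FillEmptyPipeline and the field names are hypothetical, and the `process_item(self, item, spider)` signature follows the classic Scrapy item-pipeline interface (check your version's docs). It does not import Scrapy so it runs standalone, and it only uses dict-style access, which scrapy.Item also supports.

```python
from typing import Any, MutableMapping

NO_VALUE = "no value"


class FillEmptyPipeline:
    """Item-pipeline-style hook: runs after the loader has built the item."""

    # Hypothetical field names; they must be declared on your Item,
    # because scrapy.Item rejects keys it does not declare.
    fields = ("title", "price")

    def process_item(self, item: MutableMapping[str, Any], spider: Any) -> MutableMapping[str, Any]:
        for name in self.fields:
            # A field the loader skipped is absent; an empty one may be "" or [].
            if not item.get(name):
                item[name] = NO_VALUE
        return item
```

Enabling it would look like `ITEM_PIPELINES = {"myproject.pipelines.FillEmptyPipeline": 300}` in settings.py, where the module path is hypothetical.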