python - When using Request in Scrapy, how do I convert the encoding of the fetched content to utf-8?
PHPz
PHPz 2017-04-18 09:06:14

With the third-party library requests, the encoding can be set like this:

import requests

html = requests.get('http://example.com')
html.encoding = 'utf-8'

Question:
When using Request in Scrapy, how do I convert the encoding of the fetched content to utf-8?

demo:

import scrapy


class StackOverflowSpider(scrapy.Spider):
    name = 'stackoverflow'
    start_urls = ['http://stackoverflow.com/questions?sort=votes']

    def parse(self, response):
        for href in response.css('.question-summary h3 a::attr(href)'):
            full_url = response.urljoin(href.extract())
            yield scrapy.Request(full_url, callback=self.parse_question)

    def parse_question(self, response):
        yield {
            'title': response.css('h1 a::text').extract_first(),
            'votes': response.css('.question .vote-count-post::text').extract_first(),
            'body': response.css('.question .post-text').extract_first(),
            'tags': response.css('.question .post-tag::text').extract(),
            'link': response.url,
        }

All replies (2)
大家讲道理

Let me try to answer. I think your understanding of how encodings work in Python is slightly off.
1. Both requests and Scrapy's Request are just implementations of the HTTP protocol.
The encoding of the response body is decided by the website you visit, which declares it in the Content-Type header of the HTTP response.
For example:
import requests

r = requests.get('http://www.baidu.com')
print(r.headers['Content-Type'])  # the declared charset, if any, is part of this header
Output:
text/html;charset=UTF-8
This shows that the response body is encoded as UTF-8.
Scrapy's Request works the same way.
2. If the response declares a different charset, for example charset=gb2312, you can decide whether to transcode it to the encoding you need, depending on what your code requires.
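import requests

r = requests.get('http://www.baidu.com')
# Decode the whole body first; slicing raw bytes could cut a multi-byte character in half.
text = r.content.decode('utf-8')
print(text[:1000])
# Re-encode to GBK; characters GBK cannot represent are replaced with '?'.
print(text[:1000].encode('gbk', errors='replace'))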
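
The two points above can be combined into one small helper. This is only a sketch using the standard library: the function names `charset_from_content_type` and `to_utf8` are made up for illustration, and the bytes and header value are simulated rather than fetched from a real site.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset named in a Content-Type header value, or default."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


def to_utf8(body, content_type):
    """Decode raw bytes with the declared charset, then re-encode as UTF-8."""
    charset = charset_from_content_type(content_type)
    text = body.decode(charset)   # bytes -> str, using the server's charset
    return text.encode("utf-8")   # str -> UTF-8 bytes


# Simulated response (assumption): GBK bytes plus the header a server might send.
raw = "中文".encode("gbk")
utf8_bytes = to_utf8(raw, "text/html; charset=gbk")
print(utf8_bytes.decode("utf-8"))
```

In a Scrapy callback, response.body holds the raw bytes and response.text the already-decoded string, so response.text.encode('utf-8') should give UTF-8 bytes without a helper like this.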

洪涛

Just use decode and encode, regardless of whether it’s scrapy or not.
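
When the page does not declare its charset, decode can be tried with a few candidate encodings in turn. The sketch below assumes the content is either UTF-8 or a GBK-family encoding; `decode_with_fallback` and the candidate list are hypothetical and not part of requests or Scrapy.

```python
def decode_with_fallback(raw, candidates=("utf-8", "gb18030")):
    """Try each candidate encoding in order; return (text, encoding_used)."""
    for name in candidates:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    # Last resort: decode with the first candidate, replacing undecodable bytes.
    return raw.decode(candidates[0], errors="replace"), candidates[0]


# GBK bytes are not valid UTF-8 here, so the second candidate is used.
text, used = decode_with_fallback("中文".encode("gbk"))
utf8_bytes = text.encode("utf-8")
```

Putting utf-8 first is a deliberate choice: it rejects most non-UTF-8 input outright, whereas gb18030 accepts many byte sequences and would hide a wrong guess.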
