Strings - Python encoding problem?
PHPz
PHPz 2017-04-18 10:33:39

I'm using the requests library in Python 3 to fetch JSON data from an API, and then trying to print it:


    import requests

    res = requests.get("http://aaa.com/bbb.php")
    res.encoding = 'utf-8'
    # No encoding argument is needed: json.loads rejects the encoding keyword on Python 3.9+
    name = res.json()["name"]
    print(name)

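I also tried the following:

    # encode followed by decode yields the same str, and the result is discarded here
    name.encode('utf8').decode("utf8")
    print(name)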

My name string may contain Chinese, digits, English, or Arabic, or only one of these.
Printing sometimes succeeds, and sometimes fails with the following error:

  File "demo.py", line 53, in play_one
    print(json.loads(result_str)["name"])
UnicodeEncodeError: 'gbk' codec can't encode character '\u062f' in position 0: illegal multibyte sequence

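How should I handle this string? Could a single string mix different encodings, or does the string I receive come in a different encoding each time? How do I correctly output a string whose content I can't predict?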


All replies (2)
大家讲道理

Standard JSON is Unicode text, so you do not need to specify an encoding when parsing it.

You are using the Simplified Chinese version of Windows, whose console encodes output as GBK. Your character "U+062F د ARABIC LETTER DAL" has no counterpart in GBK, so it cannot be printed.

You can write the output to a file, install the Arabic version of Windows, or use another operating system or terminal with better Unicode support.
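A minimal sketch of two such workarounds, assuming the console encoding is GBK as in the traceback. The helper names `console_safe` and `save_utf8`, the sample string, and the output file name are hypothetical and only serve the illustration:

```python
import os
import sys
import tempfile


def console_safe(text, encoding=None):
    """Return text with characters missing from the encoding shown as backslash escapes."""
    encoding = encoding or sys.stdout.encoding or "utf-8"
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


def save_utf8(text, path):
    """Write text to a UTF-8 file, which can hold any Unicode character."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


name = "\u062f中文abc123"  # Arabic, Chinese, English, digits (made-up sample)

# Simulate a GBK console: the Arabic letter becomes an escape instead of raising an error.
gbk_view = console_safe(name, "gbk")
print(console_safe(gbk_view))

# Hypothetical output file in the temp directory; UTF-8 keeps every character intact.
out_path = os.path.join(tempfile.gettempdir(), "name_utf8.txt")
save_utf8(name, out_path)
```

On Python 3.7 and later, calling `sys.stdout.reconfigure(errors="backslashreplace")` once at startup should have a similar effect for every later `print` call.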

小葫芦
  1. First, you have to understand why requests has this problem

requests takes the character set from the Content-Type response header returned by the server. If the Content-Type has a charset field, requests can identify the encoding correctly. Otherwise, for text content it falls back to the default ISO-8859-1. For details, see the blog article "Code analysis of Chinese encoding issues in the Python requests library".

Several methods are mentioned in the article, but it seems that 3.x has fixed this problem.
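To make the fallback concrete, here is a small standard-library sketch (no network access; the payload is a made-up sample). It shows what happens when UTF-8 bytes are decoded as ISO-8859-1, and that `json.loads` given the raw bytes detects UTF-8 on its own:

```python
import json

# Made-up response body: UTF-8 encoded JSON, as most APIs send it.
payload = json.dumps({"name": "د中文abc"}, ensure_ascii=False).encode("utf-8")

# Decoding with the wrong charset does not fail, it silently produces mojibake.
wrong_name = json.loads(payload.decode("iso-8859-1"))["name"]

# json.loads accepts bytes and detects UTF-8, UTF-16 or UTF-32 itself.
right_name = json.loads(payload)["name"]

# The mojibake can be undone by reversing the wrong decoding step.
repaired_name = wrong_name.encode("iso-8859-1").decode("utf-8")
```

Note that a wrong charset produces garbled text rather than an exception, which is a different symptom from the `UnicodeEncodeError` raised by `print` in the question.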

  2. My suggestion
    First check manually which charset the page declares in its header. Suppose it is GBK:

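    import json
    import requests

    # item_info_url is the API address, defined elsewhere
    resp = requests.get(item_info_url)
    resp.encoding = 'GBK'  # decode the body with the charset the server actually uses
    html = resp.text
    name = json.loads(html)['name']

    # or
    # I don't use the res.json method much

    res = requests.get("http://aaa.com/bbb.php")
    res.encoding = 'GBK'
    name = res.json()["name"]
    print(name)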
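
Instead of checking the header by hand, the declared charset can also be read in code. The sketch below uses only the standard library; the helper name `charset_from_content_type` is hypothetical, and with requests the header string would presumably come from `resp.headers.get("Content-Type")`:

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the lowercased charset declared in a Content-Type header, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


# Sample header values (made up for illustration).
declared = charset_from_content_type("application/json; charset=GBK")
missing = charset_from_content_type("application/json")
```

When the result is `None`, the server declared no charset, and setting `resp.encoding` explicitly as shown above is one way to avoid relying on a fallback.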