I am using the requests library in Python 3 to fetch JSON data from an API, and then trying to print it:
res = requests.get("http://aaa.com/bbb.php")
res.encoding='utf-8'
name = res.json(encoding = "utf8")["name"]
print(name)
I also tried the following:
name.encode('utf8').decode("utf8")
print(name)
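The name string may contain Chinese, digits, English, and possibly Arabic, or only one of these.
When I print it, it sometimes succeeds and sometimes fails with the following error: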
File "demo.py", line 53, in play_one
print(json.loads(result_str)["name"])
UnicodeEncodeError: 'gbk' codec can't encode character '\u062f' in position 0: illegal multibyte sequence
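How should I handle this string? Could a single string contain a mix of encodings, or is the string I receive encoded differently each time? How do I correctly output a string whose content I cannot predict?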
Standard JSON does not require you to specify an encoding.
You are using the Simplified Chinese version of Windows. The system console outputs characters in GBK encoding, but your character "U+062F د ARABIC LETTER DAL" has no representation in GBK, so it cannot be output. The string itself is fine; the failure happens only when print encodes it for the console.
You can write the output to a file instead, install the Arabic version of Windows, or use another operating system or terminal with better Unicode support.
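To see the point above in isolation, here is a minimal sketch that checks whether a string can be encoded in a given codec. The helper name `can_encode` is made up for illustration; it only wraps `str.encode`.

```python
def can_encode(text, encoding):
    """Return True if every character in text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Chinese, English and digits all exist in GBK
print(can_encode("中文abc123", "gbk"))
# U+062F ARABIC LETTER DAL does not exist in GBK
print(can_encode("\u062f", "gbk"))
# UTF-8 can represent any Unicode character
print(can_encode("\u062f", "utf-8"))
```

This is why the same print call works for some names and fails for others: it depends on whether the name happens to contain a character outside GBK.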
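If switching systems is not practical, one possible workaround is sketched below: replace characters the console cannot show before printing, and write the full text to a UTF-8 file. The helper names `to_console_safe` and `write_utf8` are hypothetical, not part of requests or the standard library.

```python
import sys

def to_console_safe(text, encoding):
    """Replace characters that encoding cannot represent with '?'."""
    # Round-trip through the target encoding; unencodable characters become '?'
    return text.encode(encoding, errors="replace").decode(encoding)

def write_utf8(path, text):
    """Write text to a file as UTF-8, which can hold any Unicode character."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

# Simulate a GBK console: the Arabic letter is replaced instead of raising
print(to_console_safe("name: \u062f", "gbk"))

# In the real script this would look roughly like:
#   print(to_console_safe(name, sys.stdout.encoding or "utf-8"))
#   write_utf8("name.txt", name)
```

The replacement loses the Arabic characters on screen, so the file is the place to look when the exact text matters.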
First, you have to understand why requests has this problem.
Several methods are mentioned in the article, but it seems that 3.x has fixed this problem.
My suggestion:
First, open the page manually and check which encoding is declared as the charset in its header. Assume it is GBK.
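As a rough sketch of that suggestion, assuming the page really does declare GBK: decode the raw bytes with the declared charset before parsing the JSON. The function `decode_json_body` and the sample body are invented for illustration, and the requests calls are shown only in comments so that the example runs without a network connection.

```python
import json

def decode_json_body(raw, charset):
    """Decode raw response bytes with the declared charset, then parse JSON."""
    text = raw.decode(charset)
    return json.loads(text)

# Simulated response body, as a server declaring charset=GBK might send it
raw_body = '{"name": "中文abc"}'.encode("gbk")
data = decode_json_body(raw_body, "gbk")
print(data["name"] == "中文abc")

# With requests, the equivalent step would be roughly:
#   res = requests.get(url)
#   res.encoding = "gbk"          # override with the charset the page declares
#   data = json.loads(res.text)
```

Decoding with the right charset only ensures the string is correct in memory. Printing it to a GBK console can still fail for characters that GBK cannot represent.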