python - How do I fix garbled text when pandas reads Chinese?

Question

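I downloaded a Sina Weibo dataset, but the raw data is a CSV file that I can't open directly on a Mac, and reading it also fails with {code...} After googling, I found that read_csv('file', encoding = "ISO-8859-1") reads the file without an error, but reading...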

伊谢尔伦 · Answer

Please post your code and the original data.

A short runnable snippet plus a small representative sample is enough; don't upload a few gigabytes of data~

Who would read through all of that?

大家讲道理 · Answer

I'm in the same situation as you. I tried a lot of encodings and none of them worked. If the data is treated as UTF-8, though, some of it converts successfully. So the only approach I can think of for now is to read the file line by line with open and discard whatever fails to convert. The amount of data is actually quite large.
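The line-by-line approach described above could be sketched roughly as follows. This is only an illustration, assuming the file is mostly UTF-8 with some undecodable lines; the helper name read_decodable_lines is hypothetical and not part of pandas.

```python
def read_decodable_lines(path, encoding="utf-8"):
    """Return (good_lines, skipped_count) for a file read line by line.

    Hypothetical helper: lines that cannot be decoded with `encoding`
    are dropped and counted instead of aborting the whole read.
    """
    good_lines = []
    skipped = 0
    # Open in binary mode so a single bad byte cannot fail the entire file.
    with open(path, "rb") as handle:
        for raw_line in handle:
            try:
                good_lines.append(raw_line.decode(encoding))
            except UnicodeDecodeError:
                skipped += 1  # discard the line, but remember how many were lost
    return good_lines, skipped
```

Checking the skipped count against the number of kept lines shows how much data the workaround costs. The surviving lines can then be joined and passed to pd.read_csv through io.StringIO.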

高洛峰 · Answer

You can also try cp1252. The best approach is to first use the chardet package (https://pypi.python.org/pypi/...) to find out which encoding the file actually uses.
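If installing chardet is not an option, a cruder standard-library check is to try a few candidate encodings in strict mode and see which one decodes without error. The sketch below is not chardet and does no statistical detection. The function name guess_encoding and the candidate list are assumptions chosen for Chinese text, and the first encoding that decodes cleanly is only a guess, not proof.

```python
def guess_encoding(raw, candidates=("utf-8", "gb18030", "big5", "cp1252")):
    """Return the first candidate that decodes `raw` without error, else None.

    Hypothetical helper: strict multi-byte encodings are tried first because
    single-byte encodings accept almost any input and would hide the problem.
    """
    for name in candidates:
        try:
            raw.decode(name)  # strict decoding raises on invalid byte sequences
        except UnicodeDecodeError:
            continue
        return name
    return None


if __name__ == "__main__":
    sample = "中文".encode("gb18030")  # stand-in for the first bytes of the CSV
    print(guess_encoding(sample))
```

ISO-8859-1 is deliberately left out of the candidates. It maps every possible byte value to a character, so decoding with it never raises, which is why read_csv with encoding="ISO-8859-1" reads the file without an error while the Chinese text still comes out garbled.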

天蓬老师 · Answer

I tried it and had no problem. My guess is that the issue is your environment's encoding. You can try the following code:

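# coding=utf-8

import sys

import pandas as pd

# Python 2 only: make UTF-8 the interpreter's default encoding.
# Python 3 already defaults to UTF-8 and has no sys.setdefaultencoding.
if sys.version_info[0] == 2:
    reload(sys)
    sys.setdefaultencoding("utf-8")

# Read only the first 10 rows, decoding the file as UTF-8.
df = pd.read_csv('week1.csv', encoding='utf-8', nrows=10)

print(df)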