Encoding problems when running Python on Linux
PHP中文网 2017-04-18 10:05:49

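The program extracts data from local Excel spreadsheets. The problem is that with the default Linux locale (LANG=en_US), reading the sheet names from an Excel file raises an error (the sheet names contain digits and Chinese characters).

The file already starts with # -*- coding: utf-8 -*-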

def get_standard_template_infos():
    # list holding the Excel file paths
    excel_files = get_excel_files(config.get_standard_template_files())
    standard_template_infos = {}
    for file in excel_files:
        wb = xlrd.open_workbook(file)
        sheet_names = wb.sheet_names()
        for sheet_name in sheet_names:
            # calls the function below
            standard_template_id = get_standard_template_id(sheet_name)

def get_standard_template_id(sheet_name):
    pattern = u'^(\d{5})'
    match = re.match(pattern, sheet_name)
    if match is not None:
        code = sheet_name[0:5]
        return code
    else:
        print sheet_name  # the error is raised here
    return None

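The error raised is:
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 4-6: ordinal not in range(256)

However, after changing the Linux console locale to LANG=zh_CN.UTF-8, a different error is raised while collecting the Excel files with os.walk (the directory names are in English; the Excel file names are in Chinese and also contain a -).

Code: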

def get_excel_files(dir):
    files = []
    if not os.path.exists(dir):
        return files
    for item in os.walk(dir):
        file_names = item[2]
        if file_names is None or len(file_names) == 0:
            continue

        dir_path = item[0]
        for file_name in file_names:
            # skip hidden files and temporary files
            if file_name[0] == '.' or file_name[0] == '~':
                continue
            if file_name[-5:] == '.xlsx' or file_name[-4:] == '.xls':
                files.append(os.path.join(dir_path, file_name))

    return files

Error:

  File "/home/users/zhangzhida/o_platform/import_to_hdp/check_data/share_function.py", line 27, in get_excel_files
    for item in os.walk(dir):
  File "/home/users/zhangzhida/.jumbo/lib/python2.7/os.py", line 284, in walk
    if isdir(join(top, name)):
  File "/home/users/zhangzhida/.jumbo/lib/python2.7/posixpath.py", line 71, in join
    path += '/' + b
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb0 in position 3: invalid start byte

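On Windows the same program runs normally without any error. Why is that? Is it caused by the encoding of my Excel file names? How can I fix it?

Note: the Python version on Linux is 2.7.3.
On Windows it is 2.7.13.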


All replies (3)
Peter_Zhu

Change to print sheet_name.encode('utf-8') and try it.
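
To illustrate why this suggestion helps: in Python 2, print on a unicode object encodes it with the terminal's encoding, and Latin-1 has no room for Chinese characters. The sketch below reproduces the failing positions without touching the terminal. The sheet name is hypothetical (four digits plus three Chinese characters, chosen so the failing span matches the reported 4-6), and encode_error_span is a helper made up for this example. It should run on both Python 2 and 3.

```python
# Hypothetical sheet name: four digits followed by three Chinese characters.
sheet_name = u'1234\u6a21\u677f\u8868'


def encode_error_span(text, encoding):
    """Return (start, end) of the first unencodable run, or None if encoding works."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return (exc.start, exc.end)
    return None


# Latin-1 cannot represent the characters at index 4, 5 and 6.
print(encode_error_span(sheet_name, 'latin-1'))
# UTF-8 can represent every character, so there is no error span.
print(encode_error_span(sheet_name, 'utf-8'))

# Encoding explicitly yields bytes that can be written to any terminal.
utf8_bytes = sheet_name.encode('utf-8')
```

Encoding by hand only avoids the exception. Whether the characters display correctly still depends on the terminal actually interpreting the bytes as UTF-8.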

左手右手慢动作

Try changing LANG to GBK
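
Before changing LANG it may help to see which encodings Python actually derives from the environment. A locale such as en_US with no charset suffix usually maps to ISO-8859-1, which would match the 'latin-1' in the first error. This is a minimal diagnostic sketch using only standard-library calls; encoding_report is a name made up for this example.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python derives from the current environment."""
    return {
        # Encoding used when printing unicode text; may be None when piped.
        'stdout': getattr(sys.stdout, 'encoding', None),
        # Encoding used to convert unicode file names to and from bytes.
        'filesystem': sys.getfilesystemencoding(),
        # Encoding the locale settings suggest for text data.
        'preferred': locale.getpreferredencoding(),
    }


for key, value in sorted(encoding_report().items()):
    print('%s: %s' % (key, value))
```

Running it once under each LANG value shows what changes. Note that LANG only affects how Python interprets bytes; it does not re-encode file names that are already on disk.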

小葫芦

I found the reason. The encoding format of the file name is wrong. Thank you everyone, hehe
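
For anyone hitting the same traceback: on Linux a file name is stored as raw bytes, and 0xb0 is a typical lead byte of GBK-encoded Chinese text but can never start a UTF-8 sequence. If renaming the files is not an option, one possible workaround is to walk the tree with a byte-string path, so that no implicit decoding happens, and decode names only for display. The sketch below assumes the names are either UTF-8 or GBK and that the top directory path itself is plain ASCII or UTF-8; decode_name and find_excel_files are hypothetical helpers, not part of the original program.

```python
import os


def decode_name(raw, encodings=('utf-8', 'gbk')):
    """Decode a raw file-name byte string, trying each encoding in turn."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            pass
    # Last resort: keep going and mark undecodable bytes.
    return raw.decode('utf-8', 'replace')


def find_excel_files(top):
    """Walk `top` as bytes so that os.walk never decodes file names."""
    if not isinstance(top, bytes):
        top = top.encode('utf-8')  # assumption: the directory path is UTF-8
    found = []
    for dir_path, _dir_names, file_names in os.walk(top):
        for name in file_names:
            # Skip hidden files and temporary files.
            if name[:1] in (b'.', b'~'):
                continue
            if name.endswith((b'.xlsx', b'.xls')):
                found.append(os.path.join(dir_path, name))
    return found


# A made-up GBK-encoded name: byte 0xb0 sits at position 3, as in the traceback.
print(repr(decode_name(b'abc\xb0\xa1.xls')))
```

The returned paths stay as bytes and can be passed straight to open(); decode_name is only needed when a name has to be printed or logged. UTF-8 is tried first because GBK accepts many byte sequences that are really UTF-8 and would silently produce wrong characters.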
