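The program extracts data from local Excel spreadsheets. The problem: with the default Linux locale (LANG=en_US), reading the sheet names from an Excel file raises an error (the sheet names contain digits and Chinese characters).
The file already starts with # -*- coding: utf-8 -*-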
import re
import xlrd

def get_standard_template_infos():
    excel_files = get_excel_files(config.get_standard_template_files())  # list of Excel file paths
    standard_template_infos = {}
    for file in excel_files:
        wb = xlrd.open_workbook(file)
        sheet_names = wb.sheet_names()
        for sheet_name in sheet_names:
            standard_template_id = get_standard_template_id(sheet_name)  # calls the function below

def get_standard_template_id(sheet_name):
    pattern = u'^(\d{5})'
    match = re.match(pattern, sheet_name)
    if match is not None:
        code = sheet_name[0:5]
        return code
    else:
        print sheet_name  # the error is raised here
        return None
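It fails with the following error:
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 4-6: ordinal not in range(256)
However, after switching the Linux console locale to LANG=zh_CN.UTF-8, a different error is raised while collecting the Excel files with os.walk (the directory names are in English; the Excel file names are in Chinese and also contain a "-").
Code: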
import os

def get_excel_files(dir):
    files = []
    if not os.path.exists(dir):
        return files
    for item in os.walk(dir):
        file_names = item[2]
        if file_names is None or len(file_names) == 0:
            continue
        dir_path = item[0]
        for file_name in file_names:
            # skip hidden files and Excel lock/temp files
            if file_name[0] == '.' or file_name[0] == '~':
                continue
            if file_name[-5:] == '.xlsx' or file_name[-4:] == '.xls':
                files.append(os.path.join(dir_path, file_name))
    return files
Error:
  File "/home/users/zhangzhida/o_platform/import_to_hdp/check_data/share_function.py", line 27, in get_excel_files
    for item in os.walk(dir):
  File "/home/users/zhangzhida/.jumbo/lib/python2.7/os.py", line 284, in walk
    if isdir(join(top, name)):
  File "/home/users/zhangzhida/.jumbo/lib/python2.7/posixpath.py", line 71, in join
    path += '/' + b
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb0 in position 3: invalid start byte
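On Windows the same code runs fine, with no errors. Why is that? Is it caused by the encoding of my Excel file names? How can I fix it?
Note: the Python version on Linux is 2.7.3;
on Windows it is 2.7.13.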
Change to print sheet_name.encode('utf-8') and try it.
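A minimal sketch of that suggestion, assuming the traceback comes from print implicitly encoding the unicode sheet name with the stdout codec that LANG=en_US selects (latin-1 in the error above). The helper names can_encode and safe_print and the sample sheet name are made up for illustration; the code is written so that it should run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import io
import sys


def can_encode(text, encoding):
    """Return True if the unicode string `text` fits into `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def safe_print(text, stream=None):
    """Write `text` as UTF-8 bytes, so the locale's codec is never consulted.

    `stream` must accept bytes; by default the raw stdout stream is used
    (sys.stdout.buffer on Python 3, sys.stdout itself on Python 2).
    """
    if stream is None:
        stream = getattr(sys.stdout, 'buffer', sys.stdout)
    stream.write(text.encode('utf-8') + b'\n')


# Hypothetical sheet name: five digits followed by Chinese characters.
sheet_name = u'12345\u6d4b\u8bd5\u8868'

# latin-1 has no code points for the Chinese characters, UTF-8 does.
latin1_ok = can_encode(sheet_name, 'latin-1')
utf8_ok = can_encode(sheet_name, 'utf-8')

# Demonstrate safe_print on an in-memory byte stream.
buf = io.BytesIO()
safe_print(sheet_name, buf)
```

In the original function this would amount to replacing print sheet_name with safe_print(sheet_name). Note that the terminal still has to interpret the bytes as UTF-8 for the characters to display correctly.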
Try setting LANG to a GBK locale (for example zh_CN.GBK).
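Before changing LANG it may help to check which encodings Python actually derives from the current locale, since the two errors in the question involve different ones (the stdout codec for print, and the codecs used for file names). The following is only a diagnostic sketch; encoding_report is a hypothetical helper name, while the sys and locale calls are standard library functions.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python is using in the current environment."""
    return {
        # Codec used when printing unicode; may be None when stdout is a pipe.
        'stdout': getattr(sys.stdout, 'encoding', None),
        # Codec used to convert between unicode file names and OS bytes.
        'filesystem': sys.getfilesystemencoding(),
        # Encoding suggested by the locale settings (LANG, LC_ALL, ...).
        'preferred': locale.getpreferredencoding(),
        # Codec used for implicit str/unicode conversions on Python 2.
        'default': sys.getdefaultencoding(),
    }


report = encoding_report()
for key in sorted(report):
    print('%-10s %s' % (key, report[key]))
```

Running this once under each LANG value and comparing the output should show which setting changes between the two failure modes.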
I found the cause: the file names were stored in the wrong encoding. Thanks, everyone!
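Renaming the files so that their names are valid UTF-8 is the cleanest fix. Where that is not possible, one possible workaround is to walk the directory with a byte-string path, so that os.walk never has to decode the names, and to decode them explicitly only for display. This is a sketch under the assumption that the names were written in GBK (byte 0xb0 from the traceback is a valid GBK lead byte but never a valid UTF-8 start byte); decode_name and get_excel_files_bytes are hypothetical helpers, not part of the original program.

```python
# -*- coding: utf-8 -*-
import os


def decode_name(raw, encodings=('utf-8', 'gbk')):
    """Decode a raw file-name byte string, trying each encoding in order."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: mark undecodable bytes with U+FFFD instead of failing.
    return raw.decode('utf-8', 'replace')


def get_excel_files_bytes(top):
    """Walk `top` (a byte-string path) and return Excel paths as raw bytes.

    With a byte-string argument, os.walk yields byte strings, so no
    implicit decoding of file names takes place.
    """
    found = []
    for dir_path, _dirs, names in os.walk(top):
        for name in names:
            # Skip hidden files and Excel lock/temp files.
            if name[:1] in (b'.', b'~'):
                continue
            if name.endswith((b'.xlsx', b'.xls')):
                found.append(os.path.join(dir_path, name))
    return found


# A GBK-encoded name is not valid UTF-8, so the GBK fallback is used.
gbk_name = decode_name(b'abc\xb0\xa1.xls')
# A correctly encoded UTF-8 name is decoded on the first attempt.
utf8_name = decode_name(u'\u6d4b\u8bd5.xlsx'.encode('utf-8'))
```

The byte-string paths returned by get_excel_files_bytes can be passed on as they are on Python 2; decode_name is only needed when a name has to be shown or logged.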