1. The xml.etree.ElementTree library in Python's standard library reliably parses and generates standard UTF-8, but it cannot directly parse XML in multi-byte Chinese encodings such as GBK or GB2312.
2. Chinese-encoded XML files, commonly GBK or GB2312, exist so that older systems can record Chinese characters in XML.
3. An XML file begins with a header (the XML declaration). The header specifies the encoding that a program should use when processing the file.
4. To change the encoding, you must both re-encode the entire file and update the encoding value in the identification header.
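To make points 3 and 4 concrete, here is a minimal sketch that reads the encoding named in the header. The helper name `declared_encoding` is hypothetical, and the sketch assumes an ASCII-compatible encoding (such as GBK, GB2312 or UTF-8) with no byte order mark in front of the header.

```python
import re
from typing import Optional

# Matches the encoding value inside an XML header such as
# <?xml version="1.0" encoding="GBK"?>
_DECL_RE = re.compile(
    rb'^\s*<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._\-]*)["\']'
)


def declared_encoding(raw: bytes) -> Optional[str]:
    """Return the encoding named in the XML header, or None if there is none."""
    # The header sits at the very start of the file, so a short prefix is enough
    match = _DECL_RE.match(raw[:200])
    return match.group(1).decode("ascii") if match else None


sample = '<?xml version="1.0" encoding="GBK"?><root>中文</root>'.encode("gbk")
print(declared_encoding(sample))  # prints GBK
```

The value found this way is the one that has to be changed in the header when the file itself is re-encoded.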
1. Reading & decoding:
Open the XML file in binary mode and read its contents as a byte stream.
Use the .decode() method with the original file's encoding to turn the byte stream into a string.
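2. Process the identification header: use the .replace() method to replace the encoding="xxx" part of the string with the new encoding.
3. Encoding & saving: use the .encode() method to convert the string to the new encoding, then write the result to the file in binary mode.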
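The steps above can be tried on an in-memory document before touching any files. This is only a sketch: the sample XML and the variable names are made up for illustration, and the result is parsed with xml.etree.ElementTree to check that the converted bytes are readable.

```python
import xml.etree.ElementTree as ET

# A small GBK-encoded document standing in for the bytes read from a file
original = (
    '<?xml version="1.0" encoding="GBK"?>'
    '<root><name>中文</name></root>'
).encode("gbk")

# Step 1: decode the bytes using the original encoding
text = original.decode("gbk")

# Step 2: update the encoding named in the header
text = text.replace('encoding="GBK"', 'encoding="utf-8"')

# Re-encode in the new encoding so the header and the bytes agree
converted = text.encode("utf-8")

# The converted bytes can now be parsed by ElementTree
root = ET.fromstring(converted)
print(root.find("name").text)  # prints 中文
```

If only the header were changed and the bytes left in GBK, the header and the content would disagree, which is why both changes are made together.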
```python
import os


# filepath     -- path of the original file
# savefilepath -- path where the converted file is saved (None = overwrite the original file)
# oldencoding  -- encoding of the original file
# newencoding  -- encoding of the converted file
def convert_xml_encoding(filepath, savefilepath, oldencoding, newencoding):
    # Fall back to overwriting the original file when no save path is given
    if savefilepath is None:
        savefilepath = filepath

    # Read the XML file as raw bytes
    with open(filepath, 'rb') as file:
        content = file.read()

    # Decode the content from the old encoding
    # errors='ignore' silently drops bytes that cannot be decoded
    decoded_content = content.decode(oldencoding, errors='ignore')
    # decoded_content = content.decode('GBK')

    # Update the encoding in the XML header
    # (str.replace is case-sensitive, so oldencoding must match the header's spelling)
    updated_content = decoded_content.replace(
        'encoding="{}"'.format(oldencoding),
        'encoding="{}"'.format(newencoding))

    # Encode the content to the new encoding
    # errors='ignore' silently drops characters the new encoding cannot represent
    encoded_content = updated_content.encode(newencoding, errors='ignore')

    # Write the updated content to the file
    with open(savefilepath, 'wb') as file:
        file.write(encoded_content)

    # Result output
    print(f"XML file '{os.path.basename(filepath)}'({oldencoding}) --> '{os.path.basename(savefilepath)}'({newencoding})")


# ---------------------- Usage examples ---------------------
# filepath and savefilepath2 are paths you define; use the call that matches your file
# GBK --> utf-8
convert_xml_encoding(filepath, savefilepath2, 'GBK', 'utf-8')
# utf-8 --> gb2312
convert_xml_encoding(filepath, savefilepath2, 'utf-8', 'gb2312')
# GBK --> gb2312
convert_xml_encoding(filepath, savefilepath2, 'GBK', 'gb2312')
```
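The .replace() call in the function only matches when the header spells the encoding exactly as oldencoding does and uses double quotes. As a possible alternative, not part of the original method, a regular expression can rewrite whatever encoding value the header holds. The helper name `replace_declared_encoding` is hypothetical.

```python
import re

# Captures everything up to the encoding value, plus the quote character used,
# so the same quote style is kept in the rewritten header
_ENCODING_ATTR = re.compile(
    r'(<\?xml[^>]*?encoding\s*=\s*)(["\'])[^"\']*\2',
    re.IGNORECASE,
)


def replace_declared_encoding(text: str, newencoding: str) -> str:
    """Return text with the encoding value in the XML header set to newencoding."""
    return _ENCODING_ATTR.sub(
        lambda m: f"{m.group(1)}{m.group(2)}{newencoding}{m.group(2)}",
        text,
        count=1,  # only the header at the top of the document is rewritten
    )


print(replace_declared_encoding('<?xml version="1.0" encoding="gbk"?><a/>', 'utf-8'))
```

A document without an encoding value in its header is returned unchanged, so the sketch does not add a header where none exists.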