Use Byte Order Mark (BOM) to accurately identify file encoding
The StreamReader.CurrentEncoding property does not always report a file's encoding reliably: its value can change after the first read, because the reader does not inspect the stream until then. Inspecting the file's byte order mark (BOM) directly is a simple and efficient alternative. When a BOM is present, its byte sequence identifies the encoding.
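To make the CurrentEncoding caveat concrete, here is a minimal sketch that reports the property before and after the first read. The class name `CurrentEncodingDemo` and the helper `Probe` are made up for this illustration. The sketch assumes the file was written with a UTF-16LE BOM and that the reader was opened with ASCII as its initial guess.

```csharp
using System;
using System.IO;
using System.Text;

public static class CurrentEncodingDemo
{
    // Hypothetical helper: returns what StreamReader reports before and after its first read.
    public static (Encoding Before, Encoding After) Probe(string path)
    {
        // Start from ASCII and let the reader look for a BOM once it reads.
        using (var reader = new StreamReader(path, Encoding.ASCII, true))
        {
            Encoding before = reader.CurrentEncoding; // still the constructor argument
            reader.Read();                            // first read triggers BOM detection
            Encoding after = reader.CurrentEncoding;  // updated if a BOM was found
            return (before, after);
        }
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            // Encoding.Unicode is UTF-16LE and writes its BOM (FF FE) at the start of the file.
            File.WriteAllText(path, "hello", Encoding.Unicode);
            var result = Probe(path);
            Console.WriteLine($"before: {result.Before.WebName}, after: {result.After.WebName}");
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

If the property is consulted before anything has been read, it only echoes the encoding passed to the constructor, which is one reason to inspect the BOM bytes directly.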
The following code snippet defines a method called GetEncoding that determines the encoding of a text file from the file's BOM. If no known BOM is found, it defaults to ASCII:
<code class="language-csharp">using System.IO;
using System.Text;

public static Encoding GetEncoding(string filename)
{
    // Read up to the first four bytes; for shorter files the remaining slots stay zero
    var bom = new byte[4];
    int read;
    using (var file = new FileStream(filename, FileMode.Open, FileAccess.Read))
    {
        read = file.Read(bom, 0, 4);
    }

    // Compare against known BOMs; UTF-32LE must be tested before UTF-16LE,
    // because both start with FF FE
    if (bom[0] == 0x2b && bom[1] == 0x2f && bom[2] == 0x76) return Encoding.UTF7; // obsolete since .NET 5
    if (bom[0] == 0xef && bom[1] == 0xbb && bom[2] == 0xbf) return Encoding.UTF8;
    if (read >= 4 && bom[0] == 0xff && bom[1] == 0xfe && bom[2] == 0 && bom[3] == 0) return Encoding.UTF32; // UTF-32LE
    if (bom[0] == 0xff && bom[1] == 0xfe) return Encoding.Unicode; // UTF-16LE
    if (bom[0] == 0xfe && bom[1] == 0xff) return Encoding.BigEndianUnicode; // UTF-16BE
    if (bom[0] == 0 && bom[1] == 0 && bom[2] == 0xfe && bom[3] == 0xff) return new UTF32Encoding(true, true); // UTF-32BE

    // No known BOM found: fall back to ASCII
    return Encoding.ASCII;
}</code>
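With this method, you can identify the encoding of any file that starts with a BOM and then decode its text accordingly. Files without a BOM (common for UTF-8) fall through to the ASCII default, so treat that result as a guess rather than a detection.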
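The same idea can also be written without hard-coded byte values, by comparing the leading bytes against what `Encoding.GetPreamble()` returns for each candidate encoding. The following is a sketch under stated assumptions, not a replacement for the method above: the class name `PreambleSniffer`, the method `Detect`, and the choice of candidates are made up for this illustration, and UTF-7 is left out because it is obsolete in current .NET.

```csharp
using System;
using System.Linq;
using System.Text;

public static class PreambleSniffer
{
    // Hypothetical candidate list, ordered longest preamble first so that
    // UTF-32LE (FF FE 00 00) is tested before UTF-16LE (FF FE).
    private static readonly Encoding[] Candidates =
    {
        new UTF32Encoding(false, true),  // UTF-32LE, emits BOM
        new UTF32Encoding(true, true),   // UTF-32BE, emits BOM
        new UTF8Encoding(true),          // UTF-8, emits BOM
        new UnicodeEncoding(false, true),// UTF-16LE, emits BOM
        new UnicodeEncoding(true, true), // UTF-16BE, emits BOM
    };

    // Returns the first candidate whose preamble prefixes `head`, else `fallback`.
    public static Encoding Detect(byte[] head, Encoding fallback)
    {
        foreach (Encoding candidate in Candidates)
        {
            byte[] preamble = candidate.GetPreamble();
            if (head.Length >= preamble.Length &&
                head.Take(preamble.Length).SequenceEqual(preamble))
            {
                return candidate;
            }
        }
        return fallback;
    }

    public static void Main()
    {
        byte[] head = { 0xEF, 0xBB, 0xBF, 0x68, 0x69 }; // UTF-8 BOM followed by "hi"
        Console.WriteLine(Detect(head, Encoding.ASCII).WebName);
    }
}
```

Passing the fallback as a parameter leaves the caller to decide what a missing BOM should mean, instead of fixing the default to ASCII.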