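When reading UTF-16 text files in Go, you may find the text garbled or interleaved with NUL bytes, because the bytes are being treated as if they were ASCII or UTF-8. The cause is that bufio.NewReader works on raw bytes and performs no character-set conversion, while Go strings are conventionally UTF-8.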
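To make the symptom concrete, here is a minimal sketch that uses only the standard library. The helpers encodeUTF16LE and misreadAsUTF8 are hypothetical names introduced for illustration; the first builds little-endian UTF-16 bytes, the second does what a byte-oriented reader effectively does with them.

```go
package main

import (
	"encoding/binary"
	"unicode/utf16"
)

// encodeUTF16LE is a hypothetical helper that converts a Go string to
// little-endian UTF-16 bytes without a BOM.
func encodeUTF16LE(s string) []byte {
	units := utf16.Encode([]rune(s))
	out := make([]byte, 2*len(units))
	for i, u := range units {
		binary.LittleEndian.PutUint16(out[2*i:], u)
	}
	return out
}

// misreadAsUTF8 treats UTF-16 bytes as if they were already UTF-8,
// which is what happens when no decoder sits between file and string.
func misreadAsUTF8(raw []byte) string {
	return string(raw)
}
```

For the input "Hi", the encoded form is the four bytes 48 00 69 00, so the misread string has length 4 and contains a NUL byte after each letter.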
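The package "golang.org/x/text/encoding/unicode" provides unicode.BOMOverride, which wraps a fallback decoder and switches to the encoding indicated by a byte order mark (BOM) at the start of the input, if there is one. Combined with transform.NewReader from "golang.org/x/text/transform", it converts UTF-16 input to UTF-8. Here's an example, ReadFileUTF16():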
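    import (
        "bytes"
        "io"
        "os"

        "golang.org/x/text/encoding/unicode"
        "golang.org/x/text/transform"
    )

    // ReadFileUTF16 reads filename and returns its contents converted to UTF-8.
    func ReadFileUTF16(filename string) ([]byte, error) {
        raw, err := os.ReadFile(filename)
        if err != nil {
            return nil, err
        }

        // Fallback decoder: big-endian UTF-16, used only when no BOM is found.
        win16be := unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM)
        // BOMOverride switches to the encoding named by a leading BOM, if any.
        utf16bom := unicode.BOMOverride(win16be.NewDecoder())

        unicodeReader := transform.NewReader(bytes.NewReader(raw), utf16bom)
        return io.ReadAll(unicodeReader)
    }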
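This function decodes UTF-16 files that start with a BOM, using the byte order the BOM indicates, and returns the text as UTF-8. If there is no BOM, the fallback decoder applies, so the input is treated as big-endian UTF-16.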
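The BOM-with-fallback idea can be sketched with the standard library alone. This is not how golang.org/x/text implements it (the real transformer works on streams and also recognizes a UTF-8 BOM); detectUTF16BOM and the "LE"/"BE" labels are hypothetical names chosen for illustration.

```go
package main

import "bytes"

// Byte order marks as they appear at the start of a UTF-16 file.
var (
	bomLE = []byte{0xFF, 0xFE} // little-endian
	bomBE = []byte{0xFE, 0xFF} // big-endian
)

// detectUTF16BOM is a hypothetical helper. It reports the byte order named
// by a leading BOM ("LE" or "BE") and returns the input with the BOM removed.
// If there is no BOM it returns the caller's fallback and the input unchanged.
func detectUTF16BOM(raw []byte, fallback string) (order string, payload []byte) {
	switch {
	case bytes.HasPrefix(raw, bomLE):
		return "LE", raw[2:]
	case bytes.HasPrefix(raw, bomBE):
		return "BE", raw[2:]
	default:
		return fallback, raw
	}
}
```

The fallback argument plays the same role as the big-endian decoder passed to unicode.BOMOverride: it only matters when the input carries no BOM.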
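If you only want to strip a leading BOM and keep the remaining bytes as they are, you can use the following code:
    // Requires the imports "bufio", "io" and "os".
    func ReadFileUTF16WithoutBOM(filename string) ([]byte, error) {
        f, err := os.Open(filename)
        if err != nil {
            return nil, err
        }
        defer f.Close()

        r := bufio.NewReader(f)

        // Skip a UTF-16 BOM (FF FE or FE FF) if present. Peek returns fewer
        // than 2 bytes for a short file, so check the length before indexing.
        b, err := r.Peek(2)
        if err != nil && err != io.EOF {
            return nil, err
        }
        if len(b) == 2 && ((b[0] == 0xFF && b[1] == 0xFE) || (b[0] == 0xFE && b[1] == 0xFF)) {
            if _, err := r.Discard(2); err != nil {
                return nil, err
            }
        }

        // Return the remaining bytes unchanged; they are still UTF-16 encoded.
        return io.ReadAll(r)
    }
This function skips a BOM, if present, and returns the rest of the file unchanged. It does not decode anything: the result is still UTF-16. For a file without a BOM you need to know the byte order in order to decode it, for example with unicode.UTF16(unicode.LittleEndian, unicode.IgnoreBOM).NewDecoder().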
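When a file has no BOM, the byte order has to come from somewhere else, such as the file's documented format. As a dependency-free alternative to the golang.org/x/text decoders, the sketch below converts BOM-less UTF-16 bytes to a Go string with the standard library only. The names decodeUTF16, decodeUTF16LE and decodeUTF16BE are hypothetical, and rejecting odd-length input is an assumption about the desired behavior.

```go
package main

import (
	"encoding/binary"
	"errors"
	"unicode/utf16"
)

// decodeUTF16 is a hypothetical helper that converts BOM-less UTF-16 bytes
// to a Go (UTF-8) string using the byte order supplied by the caller.
func decodeUTF16(payload []byte, order binary.ByteOrder) (string, error) {
	if len(payload)%2 != 0 {
		return "", errors.New("utf16: odd number of bytes")
	}
	units := make([]uint16, len(payload)/2)
	for i := range units {
		units[i] = order.Uint16(payload[2*i:])
	}
	// utf16.Decode combines surrogate pairs and substitutes U+FFFD for
	// invalid ones.
	return string(utf16.Decode(units)), nil
}

// decodeUTF16LE decodes little-endian input (typical for Windows tools).
func decodeUTF16LE(payload []byte) (string, error) {
	return decodeUTF16(payload, binary.LittleEndian)
}

// decodeUTF16BE decodes big-endian input.
func decodeUTF16BE(payload []byte) (string, error) {
	return decodeUTF16(payload, binary.BigEndian)
}
```

This reads the whole payload into memory, which is fine for small files; for large inputs a streaming decoder such as the one from golang.org/x/text wrapped in transform.NewReader is the better fit.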
ReadFileUTF16() handles UTF-16 files with a BOM and, through its fallback decoder, big-endian files without one. ReadFileUTF16WithoutBOM() only removes a BOM; its output still has to be decoded with a byte order you supply.