In Go, source code is UTF-8 and the standard library's text-handling functions generally assume that strings hold UTF-8. Text files stored in other character sets must therefore be converted before they can be processed as ordinary Go strings. This article explains how to read non-UTF-8 text files in Go using the golang.org/x/text/encoding package.
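To make the problem concrete, here is a minimal sketch that uses only the standard unicode/utf8 package to check whether a byte slice is valid UTF-8. The helper name needsDecoding is hypothetical, and the check is only a heuristic: it flags bytes that cannot be UTF-8, but it cannot tell which encoding they actually use.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// needsDecoding is a hypothetical helper: it reports whether b is not valid
// UTF-8 and therefore probably uses some other encoding.
func needsDecoding(b []byte) bool {
	return !utf8.Valid(b)
}

func main() {
	gbk := []byte{0xC4, 0xE3, 0xBA, 0xC3} // "你好" encoded as GBK
	utf8Text := []byte("你好")              // the same text as UTF-8

	fmt.Println(needsDecoding(gbk))      // true
	fmt.Println(needsDecoding(utf8Text)) // false
}
```

Plain ASCII text is valid in both UTF-8 and GBK, so a false result does not prove the file was written as UTF-8.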
The golang.org/x/text/encoding package defines an interface for character encodings that can convert to and from UTF-8. For example, the golang.org/x/text/encoding/simplifiedchinese sub-package provides the GB18030, GBK, and HZ-GB2312 encodings.
Example: Reading a GBK Encoded File
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"

	"golang.org/x/text/encoding/simplifiedchinese"
	"golang.org/x/text/transform"
)

func main() {
	const filename = "example_GBK_file"

	// Read UTF-8 from a GBK encoded file
	f, err := os.Open(filename)
	if err != nil {
		log.Fatal(err)
	}
	r := transform.NewReader(f, simplifiedchinese.GBK.NewDecoder())

	// Read converted UTF-8 from `r` as needed
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		fmt.Printf("Read line: %s\n", sc.Bytes())
	}
	if err := sc.Err(); err != nil {
		log.Fatal(err)
	}

	if err = f.Close(); err != nil {
		log.Fatal(err)
	}
}
This example uses transform.NewReader to wrap the *os.File and decode GBK to UTF-8 on the fly, so the scanner only ever sees UTF-8 text.