How to Read Non-UTF-8 Encoded Text Files in Go?

Mary-Kate Olsen
Release: 2024-12-01 03:29:13

Reading Non-UTF-8 Text Files in Go

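In Go, strings are UTF-8 by convention, and the standard library's text-handling routines (ranging over a string, the unicode/utf8 package, and so on) assume UTF-8. File I/O itself only returns raw bytes, so a file stored in another character set must be decoded explicitly before it is treated as text. This article explains how to do that with the golang.org/x/text/encoding package.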

The golang.org/x/text/encoding package defines the Encoding interface for character encodings that can convert to and from UTF-8. Concrete encodings live in sub-packages. For example, golang.org/x/text/encoding/simplifiedchinese provides GB18030, GBK, and HZGB2312, each of which can create both a decoder (to UTF-8) and an encoder (from UTF-8).

Example: Reading a GBK Encoded File

package main

import (
    "bufio"
    "fmt"
    "log"
    "os"

    "golang.org/x/text/encoding/simplifiedchinese"
    "golang.org/x/text/transform"
)

func main() {
    const filename = "example_GBK_file"

    // Open the GBK-encoded file; the decoder below converts its bytes to UTF-8
    f, err := os.Open(filename)
    if err != nil {
        log.Fatal(err)
    }
    r := transform.NewReader(f, simplifiedchinese.GBK.NewDecoder())

    // Read converted UTF-8 from `r` as needed
    sc := bufio.NewScanner(r)
    for sc.Scan() {
        fmt.Printf("Read line: %s\n", sc.Bytes())
    }
    if err := sc.Err(); err != nil {
        log.Fatal(err)
    }
    if err = f.Close(); err != nil {
        log.Fatal(err)
    }
}
This example uses transform.NewReader to wrap the *os.File and decode GBK to UTF-8 on the fly, so the bufio.Scanner reading from r only ever sees UTF-8 bytes.

Additional Notes:

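  • This approach uses only packages maintained by the Go authors, so it needs no third-party libraries or cgo. Note that golang.org/x/text is a separate module rather than part of the standard library, so it must be added to your module (for example with go get golang.org/x/text).
  • You can swap in a different encoding to support other character sets, such as traditionalchinese.Big5, charmap.Windows1252, or korean.EUCKR, each from the correspondingly named sub-package of golang.org/x/text/encoding.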
  • Refer to the golang.org/x/text/encoding and golang.org/x/text/encoding/simplifiedchinese packages for more details.
