
golang csv parsing garbled characters

May 15, 2023 am 09:13 AM

When using Golang to parse CSV files, you may sometimes run into garbled characters. The problem is common and quite troublesome. So how do you solve it?

First, we need to understand that CSV is a plain-text format in which fields are separated by commas (","). Garbled characters appear when the file contains non-ASCII text, such as Chinese, and the encoding the file was written in differs from the encoding assumed when it is parsed. A typical case is a file saved as GBK but read as UTF-8.

In Go, the commonly used CSV library is the standard library's encoding/csv. It assumes its input is UTF-8 and does not convert encodings itself, so CSV files in other encodings (such as GBK) need extra handling.
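Before choosing a fix, it can help to check whether a file is valid UTF-8 at all. The sketch below uses only the standard library's utf8.Valid; the helper name needsConversion is hypothetical, and the check is a heuristic: valid UTF-8 is a strong hint, not proof, of the file's encoding. The GBK bytes D6 D0 CE C4 encode the two characters "中文".

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// needsConversion reports whether data is not valid UTF-8 and should
// therefore be decoded before it is handed to encoding/csv.
func needsConversion(data []byte) bool {
	return !utf8.Valid(data)
}

func main() {
	utf8Text := []byte("中文,abc\n")
	// The same characters "中文" encoded as GBK (bytes D6 D0 CE C4).
	gbkText := []byte{0xD6, 0xD0, 0xCE, 0xC4, ',', 'a', 'b', 'c', '\n'}

	fmt.Println(needsConversion(utf8Text)) // false
	fmt.Println(needsConversion(gbkText))  // true
}
```

In practice you would pass the result of os.ReadFile, or the first few kilobytes of the file, to needsConversion.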

There are several ways to solve the problem. We introduce them one by one below:

Method 1. Manual conversion of encoding format

Before parsing, we can manually convert the CSV file's encoding to UTF-8. The easiest way is to open the file in a text editor such as Notepad and save it in UTF-8 format.

Converting by hand quickly becomes tedious, especially when there are many CSV files, so let's try the second method.

Method 2. Use a third-party library

Go's built-in CSV library is encoding/csv. To process CSV files in other encodings, we can use a third-party library to decode the input before it is parsed. For example, the golang.org/x/text module (maintained by the Go project, but not part of the standard library) provides a GBK decoder in its simplifiedchinese package.

Installation:

$ go get golang.org/x/text

Next, you can decode and parse a GBK-encoded CSV file like this:

package main

import (
    "encoding/csv"
    "fmt"
    "os"

    "golang.org/x/text/encoding/simplifiedchinese"
    "golang.org/x/text/transform"
)

func main() {
    file, err := os.Open("example.csv")
    if err != nil {
        fmt.Println("Error:", err)
        return
    }
    defer file.Close()

    // Wrap the file in a GBK decoder so encoding/csv receives UTF-8 text
    decoded := transform.NewReader(file, simplifiedchinese.GBK.NewDecoder())

    reader := csv.NewReader(decoded)
    reader.Comma = ','

    lines, err := reader.ReadAll()
    if err != nil {
        fmt.Println("Error:", err)
        return
    }

    for i, line := range lines {
        fmt.Printf("Line %d: %v\n", i+1, line)
    }
}

In the above code, we wrap the file in transform.NewReader with simplifiedchinese.GBK.NewDecoder(), so the bytes reach encoding/csv already converted to UTF-8. We then set the delimiter to "," (which is also the default), use ReadAll to read every line of the file, and print the result.

This method works, but it adds a third-party dependency to the project. If you want finer control over how the file is read, there is a third method: parse the file yourself and convert each field. Note that the conversion step itself still needs a decoder, such as the GBK decoder from golang.org/x/text.

Method 3. Manual parsing

The process of manual parsing may be cumbersome, but it is also an effective solution. The key is to understand the format of the csv file.

A CSV file usually has a header in its first line that contains the name of each field. The header is part of the file and can be obtained by parsing the first line. Each data row consists of multiple fields separated by ",". If the file is already UTF-8, encoding/csv can parse it directly. If the text comes out garbled, we need to parse each field ourselves and convert it to UTF-8.

The manual parsing code is as follows:

package main

import (
    "bufio"
    "encoding/csv"
    "fmt"
    "io"
    "os"
    "strings"

    "golang.org/x/text/encoding/simplifiedchinese"
    "golang.org/x/text/transform"
)

func main() {
    file, err := os.Open("example.csv")
    if err != nil {
        fmt.Println("Error:", err)
        return
    }
    defer file.Close()

    reader := bufio.NewReader(file)
    var lines [][]string

    for {
        line, err := reader.ReadString('\n')
        if err != nil && err != io.EOF {
            fmt.Println("Error:", err)
            return
        }

        // Remove the line terminator ("\n" or "\r\n")
        line = strings.TrimRight(line, "\r\n")

        if line != "" {
            r := csv.NewReader(strings.NewReader(line))
            r.Comma = ','

            fields, perr := r.Read()
            if perr != nil {
                fmt.Println("Error:", perr)
                return
            }

            // Convert every field to UTF-8
            for i, s := range fields {
                fields[i] = toUTF8(s)
            }

            lines = append(lines, fields)
        }

        // ReadString returns io.EOF together with the last line, so stop only after processing it
        if err == io.EOF {
            break
        }
    }

    for i, line := range lines {
        fmt.Printf("Line %d: %v\n", i+1, line)
    }
}

// toUTF8 converts a single GBK-encoded field to UTF-8 (returns s unchanged on error)
func toUTF8(s string) string {
    data, err := io.ReadAll(transform.NewReader(strings.NewReader(s), simplifiedchinese.GBK.NewDecoder()))
    if err != nil {
        return s
    }
    return string(data)
}

In the above code, we first read the CSV file line by line with bufio, strip the line terminator, and then use encoding/csv to parse the fields of each line. To fix the garbled text, the toUTF8() function converts every field to UTF-8. Because the file is split on line breaks by hand, this version does not support quoted fields that themselves contain line breaks.

This function takes a string, wraps it in a Reader, decodes it with the decoder created by simplifiedchinese.GBK.NewDecoder(), and finally uses ReadAll() to collect the decoded UTF-8 string.

In this way, we can manually parse the csv file and convert it to UTF-8 encoding format.

Summary:

These are three ways to fix garbled characters when parsing CSV in Golang. If your CSV file is UTF-8 encoded, Go's built-in encoding/csv parses it with no extra work. Otherwise, depending on your needs, convert the file beforehand, decode it with a library while parsing, or parse it manually and convert each field. With the right approach, garbled characters stop being a problem.
