
How to fix garbled text in a golang crawler

angryTom
Release: 2020-02-15 09:52:40


What to do when a golang crawler returns garbled text

When writing a crawler in golang, you may encounter pages that are encoded as gb2312.

The page's meta tag shows that its character encoding is gb2312:

<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />

Golang assumes UTF-8 by default, so text scraped directly from such a page comes out as garbled characters.
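To see why the scraped text is garbled, it helps to check that gb2312/GBK bytes are not well-formed UTF-8. The following is a minimal sketch using only the standard library. The helper name looksLikeUTF8 is hypothetical, and the byte values are the gb2312/GBK encoding of the two characters 中文.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// looksLikeUTF8 reports whether b is a well-formed UTF-8 byte sequence.
// (Hypothetical helper name, used only for this illustration.)
func looksLikeUTF8(b []byte) bool {
	return utf8.Valid(b)
}

func main() {
	gbk := []byte{0xD6, 0xD0, 0xCE, 0xC4} // "中文" encoded as gb2312/GBK
	utf := []byte("中文")                   // the same text as UTF-8 (Go source files are UTF-8)

	fmt.Println(looksLikeUTF8(gbk)) // false
	fmt.Println(looksLikeUTF8(utf)) // true

	// string(gbk) keeps the raw bytes unchanged. A UTF-8 terminal cannot
	// render them, which is the garbled output seen when scraping.
	fmt.Println(string(gbk))
}
```

A check like this can serve as a rough signal that a response body needs converting, although it cannot tell which encoding the bytes are actually in.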

Solution:

The github.com/axgle/mahonia package can perform the encoding conversion.

1. Download the package by running:

go get github.com/axgle/mahonia

Once the command finishes, the directory github.com\axgle\mahonia is created under %gopath%/src.

2. How to use the code

1) Import package

import "github.com/axgle/mahonia"

2) Conversion function

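// ConvertToString decodes src from the srcCode encoding.
// With tagCode "utf-8" the returned string is UTF-8.
func ConvertToString(src string, srcCode string, tagCode string) string {
    // Decode the source bytes into UTF-8.
    srcCoder := mahonia.NewDecoder(srcCode)
    srcResult := srcCoder.ConvertString(src)
    // Pass the decoded text through a decoder for the target encoding.
    tagCoder := mahonia.NewDecoder(tagCode)
    _, cdata, _ := tagCoder.Translate([]byte(srcResult), true)
    return string(cdata)
}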

3) Call this function wherever a string needs its encoding converted. GBK is a superset of gb2312, so "gbk" can be used as the source encoding for a gb2312 page.

result = ConvertToString(html, "gbk", "utf-8")
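The steps above leave open where the conversion sits in the crawler itself. One possible arrangement is sketched below using only the standard library: read the response body, look at the declared charset, and convert only when it is not UTF-8. The names declaredCharset and fetchText are hypothetical, and the convert parameter is a stand-in for a function such as ConvertToString, so the sketch does not depend on any particular conversion library.

```go
package main

import (
	"fmt"
	"io"
	"mime"
	"net/http"
	"strings"
)

// declaredCharset extracts the charset parameter from a Content-Type value
// such as "text/html; charset=gb2312". It returns "" when none is declared.
// (Hypothetical helper, not part of any library.)
func declaredCharset(contentType string) string {
	_, params, err := mime.ParseMediaType(contentType)
	if err != nil {
		return ""
	}
	return strings.ToLower(params["charset"])
}

// fetchText downloads url and, when the response header declares a
// non-UTF-8 charset, passes the body through convert. The convert argument
// is an assumption of this sketch: a wrapper around a function such as
// ConvertToString(body, srcCode, "utf-8").
func fetchText(url string, convert func(body, srcCode string) string) (string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}

	cs := declaredCharset(resp.Header.Get("Content-Type"))
	if cs == "" || cs == "utf-8" {
		// Nothing declared, or already UTF-8: return the body unchanged.
		return string(raw), nil
	}
	return convert(string(raw), cs), nil
}

func main() {
	fmt.Println(declaredCharset("text/html; charset=gb2312")) // gb2312
}
```

This sketch only inspects the HTTP Content-Type header. A page that declares its charset solely in a meta tag, like the one shown earlier, would need the content attribute of that tag passed to declaredCharset instead, since it uses the same "text/html; charset=..." form.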

source:php.cn