What to do about garbled text when writing a crawler in golang
When writing a crawler program in golang, you may encounter a page encoded as gb2312. You can see from the page source that its character encoding is gb2312:
<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />
golang handles strings as UTF-8 by default, so crawling such a page directly produces garbled characters.
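To illustrate the problem, here is a minimal, self-contained sketch; the byte values are a hand-encoded GBK sample rather than bytes fetched from a real page. Treating GBK-encoded bytes as a Go string and printing them yields garbled output.

package main

import "fmt"

func main() {
    // The bytes D6 D0 CE C4 are "中文" encoded in GBK.
    // Go treats strings as UTF-8, so printing them directly produces garbled characters.
    gbkBytes := []byte{0xD6, 0xD0, 0xCE, 0xC4}
    fmt.Println(string(gbkBytes)) // garbled output, not "中文"
}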
Solution:
Use the github.com/axgle/mahonia package to perform the encoding conversion.
1. Run go get github.com/axgle/mahonia to download the package; it is placed in the github.com\axgle\mahonia directory.
2. How to use the code
1) Import the package
import "github.com/axgle/mahonia"
2) Conversion function
func ConvertToString(src string, srcCode string, tagCode string) string {
    // Decode src from the source encoding (e.g. gbk) into a Go string.
    srcCoder := mahonia.NewDecoder(srcCode)
    srcResult := srcCoder.ConvertString(src)
    // Run the intermediate result through a decoder for the target encoding.
    tagCoder := mahonia.NewDecoder(tagCode)
    _, cdata, _ := tagCoder.Translate([]byte(srcResult), true)
    result := string(cdata)
    return result
}
3) Call this function wherever a string's encoding needs to be converted
result = ConvertToString(html, "gbk", "utf-8")
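Putting it together, here is a minimal end-to-end sketch that fetches a page and converts its body with the function above. The URL is a placeholder, error handling is kept deliberately terse, and io.ReadAll assumes Go 1.16 or later.

package main

import (
    "fmt"
    "io"
    "net/http"

    "github.com/axgle/mahonia"
)

func ConvertToString(src string, srcCode string, tagCode string) string {
    srcCoder := mahonia.NewDecoder(srcCode)
    srcResult := srcCoder.ConvertString(src)
    tagCoder := mahonia.NewDecoder(tagCode)
    _, cdata, _ := tagCoder.Translate([]byte(srcResult), true)
    return string(cdata)
}

func main() {
    // Placeholder URL for a page served as gb2312/gbk.
    resp, err := http.Get("http://example.com/gb2312-page.html")
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    body, err := io.ReadAll(resp.Body)
    if err != nil {
        panic(err)
    }

    // The raw bytes are GBK-encoded; convert them to UTF-8 before use.
    html := string(body)
    result := ConvertToString(html, "gbk", "utf-8")
    fmt.Println(result)
}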