golang.org/x/net/html, github.com/PuerkitoBio/goquery, etc. These tools provide methods and data structures for parsing, traversing, and modifying HTML documents.
<p>2.1 Use golang.org/x/net/html
<p>golang.org/x/net/html is a supplementary package maintained by the Go team. It is not part of the standard library, so it has to be added to your module as a dependency. It provides a rich API for parsing HTML documents. Next, we'll demonstrate how to use the library to query node data in an HTML document.
<p>The following is a simple HTML document:
<!DOCTYPE html>
<html>
<head>
<title>A Simple HTML Document</title>
</head>
<body>
<h1>This is a heading</h1>
<p>This is a paragraph.</p>
<p>This is another paragraph.</p>
</body>
</html>
<p>Now we want to query all paragraph nodes (the <p> tags) in this document. First, we need to parse the HTML document into a DOM tree structure, and then query the node data by recursively traversing the DOM tree.
package main

import (
	"fmt"
	"strings"

	"golang.org/x/net/html"
)

var htmlString = `
<!DOCTYPE html>
<html>
<head>
<title>A Simple HTML Document</title>
</head>
<body>
<h1>This is a heading</h1>
<p>This is a paragraph.</p>
<p>This is another paragraph.</p>
</body>
</html>
`

func main() {
	reader := strings.NewReader(htmlString)
	doc, err := html.Parse(reader)
	if err != nil {
		fmt.Println("Failed to parse HTML string:", err)
		return
	}

	// find walks the tree depth-first and prints the text of each <p> element.
	var find func(*html.Node)
	find = func(n *html.Node) {
		if n.Type == html.ElementNode && n.Data == "p" {
			// Guard against empty paragraphs and paragraphs
			// whose first child is not a text node.
			if c := n.FirstChild; c != nil && c.Type == html.TextNode {
				fmt.Println(c.Data)
			}
			return
		}
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			find(c)
		}
	}
	find(doc)
}
In this example, we use strings.NewReader() to wrap the string in an io.Reader and pass it to html.Parse(), which parses the HTML document into a tree of *html.Node values. Then we define a recursive function named find() to traverse the DOM tree and find nodes that meet the condition. When a paragraph node is encountered, we print its text content; otherwise we recurse into its child nodes. Finally, we call find() on the document root to query and output the text content of all paragraph nodes.
<p>2.2 Use github.com/PuerkitoBio/goquery
<p>github.com/PuerkitoBio/goquery is a popular third-party Go library that is built on top of golang.org/x/net/html and offers jQuery-style querying with CSS selectors. With goquery we can traverse and query HTML documents without walking the DOM tree by hand.
<p>The following is the same sample HTML document:
<!DOCTYPE html>
<html>
<head>
<title>A Simple HTML Document</title>
</head>
<body>
<h1>This is a heading</h1>
<p>This is a paragraph.</p>
<p>This is another paragraph.</p>
</body>
</html>
<p>Now we query all paragraph nodes in this document using goquery:
package main

import (
	"fmt"
	"strings"

	"github.com/PuerkitoBio/goquery"
)

var htmlString = `
<!DOCTYPE html>
<html>
<head>
<title>A Simple HTML Document</title>
</head>
<body>
<h1>This is a heading</h1>
<p>This is a paragraph.</p>
<p>This is another paragraph.</p>
</body>
</html>
`

func main() {
	reader := strings.NewReader(htmlString)
	doc, err := goquery.NewDocumentFromReader(reader)
	if err != nil {
		fmt.Println("Failed to parse HTML string:", err)
		return
	}

	// Select every <p> element and print its text content.
	doc.Find("p").Each(func(i int, s *goquery.Selection) {
		fmt.Println(s.Text())
	})
}
In this example, we use strings.NewReader() to wrap the string in an io.Reader and pass it to goquery.NewDocumentFromReader() to parse the HTML document. Then we use doc.Find("p") to select all paragraph nodes, and Each() calls our function once per match so that we can print each node's text content with s.Text().
<p>3. Summary
<p>This article introduced how to query the content of HTML documents in Go. We explored two approaches: golang.org/x/net/html, which exposes the parsed node tree for manual traversal, and github.com/PuerkitoBio/goquery, which adds selector-based queries on top of it. Both can parse HTML documents and provide an API for traversing and manipulating the DOM tree, so either one can be used to extract data from HTML documents.