
Colly - How to get the value of a child property?

WBOY
Release: 2024-02-11 09:36:08


Colly is a simple, flexible web scraping framework written in Go. It handles requests and responses, selects HTML elements, and extracts data from them. When scraping, you often need a value from a child of the matched element, such as the href attribute of a link or the text inside a heading. This question covers how to get such a value in Colly.

Question content

This is a sample page I have been working on: https://www.lazada.vn/-i1701980654-s7563711492.html

This is the element I want to get (the product title):

...
<div>
    <img src="https://lzd-img-global.slatic.net/g/tps/imgextra/i1/o1cn01juoyif22n3uu7jx4r_!!6000000007107-2-tps-162-48.png" class="pdp-mod-product-badge" alt="lazmall">
    <h1 class="pdp-mod-product-badge-title">
        yierku 【free shipping miễn phí vận chuyển】giày nam mùa thu và mùa đông giày thường xu hướng nam thể thao tất cả các trận đấu giày da tăng chiều cao giày nam
    </h1>
</div>
...

I want to get the text inside the <h1> element, that is: yierku [Free shipping miễn phí vận chuyển] giày n....

Here's what I've tried so far:

c := colly.NewCollector()
c.OnError(func(_ *colly.Response, err error) {
    log.Println("Something went wrong:", err)
})
c.OnXML("/html/body", func(e *colly.XMLElement) {
    // ChildAttrs returns the named attribute of every element matching the XPath.
    child := e.ChildAttrs("div[4]/div/div[3]/div[2]/div/div[1]/div[3]/div/div/h1", "class")
    fmt.Println(child)
})

This prints pdp-mod-product-badge-title, the value of the class attribute.

When I change the call to

child := e.ChildAttrs("div[4]/div/div[3]/div[2]/div/div[1]/div[3]/div/div/h1", "text")

I get no results.

Workaround

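Use func (*XMLElement) ChildText instead. ChildAttrs looks up attributes by name, and the <h1> has no attribute called text; the title is the element's content, which ChildText returns.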

package main

import (
    "fmt"

    "github.com/gocolly/colly/v2"
)

func main() {
    c := colly.NewCollector()
    c.OnError(func(_ *colly.Response, err error) {
        fmt.Println("Something went wrong:", err)
    })
    c.OnXML("/html/body", func(e *colly.XMLElement) {
        child := e.ChildText("div[4]/div/div[3]/div[2]/div/div[1]/div[3]/div/div/h1")
        fmt.Println(child)
    })
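    // Visit returns an error if the request cannot be started, so check it.
    if err := c.Visit("https://www.lazada.vn/-i1701980654-s7563711492.html"); err != nil {
        fmt.Println("Visit failed:", err)
    }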
    // Output:
    // Yierku 【Free Shipping Miễn phí vận chuyển】Giày nam mùa thu và mùa đông giày thường xu hướng nam thể thao tất cả các trận đấu giày da tăng chiều cao giày nam
}
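
The difference between an element's attributes and its text content can be seen without Colly or a network request. The following is a minimal sketch that uses only Go's standard encoding/xml package, not Colly's own implementation. It runs on a simplified stand-in for the product markup: as assumptions for illustration, the <img> is self-closed so the snippet is well-formed XML, the image path and title are shortened, and inspect is a hypothetical helper name.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// sample is a simplified, well-formed stand-in for the product markup
// (assumption: the <img> is self-closed and the title is shortened).
const sample = `<div>
  <img src="badge.png" class="pdp-mod-product-badge" alt="lazmall"/>
  <h1 class="pdp-mod-product-badge-title">
    yierku giày nam
  </h1>
</div>`

// inspect is a hypothetical helper: it returns the attributes and the
// trimmed text content of the first element named tag in doc.
func inspect(doc, tag string) (map[string]string, string, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	attrs := map[string]string{}
	var text strings.Builder
	depth := 0 // greater than zero while inside the target element

	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return nil, "", fmt.Errorf("element %q not found", tag)
		}
		if err != nil {
			return nil, "", err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if depth > 0 {
				depth++ // nested element inside the target
			} else if t.Name.Local == tag {
				depth = 1
				// Attributes live on the start tag itself.
				for _, a := range t.Attr {
					attrs[a.Name.Local] = a.Value
				}
			}
		case xml.EndElement:
			if depth > 0 {
				depth--
				if depth == 0 {
					return attrs, strings.TrimSpace(text.String()), nil
				}
			}
		case xml.CharData:
			// Text arrives as separate character data, not as an attribute.
			if depth > 0 {
				text.Write(t)
			}
		}
	}
}

func main() {
	attrs, text, err := inspect(sample, "h1")
	if err != nil {
		fmt.Println("inspect failed:", err)
		return
	}
	_, hasText := attrs["text"]
	fmt.Println("class attribute:", attrs["class"])
	fmt.Println("has a text attribute:", hasText)
	fmt.Println("text content:", text)
}
```

Here the <h1> carries a single attribute, class, so a lookup for an attribute named text finds nothing, while the title itself is only reachable as the element's content. That mirrors why ChildAttrs with "text" came back empty and ChildText returned the title.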

source:stackoverflow.com