I am using go-rod for web scraping. I want to access a link inside a dynamically generated <a>.
To make this <a> visible, I have to fill in a searcher, which is an <input> in the following format (without a submit element):
<form> <input> <!-- this is the searcher --> </form>
So, when I'm done, the <a> I want to access appears.
Up to here, everything is fine. This is the code I use to complete the searcher:
// Open the page.
page := rod.New().MustConnect().MustPage("https://www.sofascore.com/")
// Accept the cookies alert.
page.MustElement("cookiesAlertSelector...").MustClick()
// Fill in the searcher.
el := page.MustElement(`searcherSelector...`)
el.MustInput("Lionel Messi")
Now the problem arises when I want to click on the <a> that appears after completing the search.
I tried this:
divIWant := page.MustElement("aSelector...")
divIWant.MustClick()
and this:
divIWant := page.MustElement("aSelector...").MustWaitVisible()
divIWant.MustClick()
However, both attempts fail with the same error:
panic: {-32000 node is detached from document }
goroutine 1 [running]:
github.com/go-rod/rod/lib/utils.glob..func2({0x100742dc0?, 0x140002bad50?})
    /users/lucastomicbenitez/go/pkg/mod/github.com/go-rod/[email protected]/lib/utils/utils.go:65 +0x24
github.com/go-rod/rod.gene.func1({0x14000281ca0?, 0x1003a98b7?, 0x4?})
    /users/lucastomicbenitez/go/pkg/mod/github.com/go-rod/[email protected]/must.go:36 +0x64
github.com/go-rod/rod.(*element).mustclick(0x14000289320)
    /users/lucastomicbenitez/go/pkg/mod/github.com/go-rod/[email protected]/must.go:729 +0x9c
main.main()
    /users/lucastomicbenitez/development/golang/evolutionaryalgorithm/main/main.go:22 +0x9c
exit status 2
While looking for solutions, I found this GitHub issue and tried to get the link this way:
link := page.MustEval(`() => document.querySelector('aSelector...').href`)
But it returns this:
panic: eval js error: TypeError: Cannot read properties of null (reading 'href')
However, I'm pretty sure the selector is correct. What did I do wrong?
As @hymns for disco said in the comments, I just had to wait a moment after filling in the searcher, so that the results have time to render before I query for the link.
el.MustInput("Lionel Messi")
time.Sleep(time.Second)
link := page.MustEval(`() => document.querySelector('aSelector...').href`)