How to access dynamic HTML elements via web scraping?

王林
Release: 2024-02-09 09:51:17

This article covers how to access dynamic HTML elements when scraping with go-rod, a browser automation library for Go. Dynamically generated content may not exist in the DOM at the moment the scraper looks for it, so the scraper has to wait until the page has actually rendered the element before reading or clicking it.

Question content

I am using go-rod for web scraping. I want to access the link inside a dynamically rendered <a> element. To make this <a> appear, I have to fill in a search box, which is an input with the following structure (there is no submit button):

<form>
    <input> <!-- this is the search box -->
</form>

Once the search text has been entered, the <a> I want to access appears.

Up to this point, everything works. This is the code I use to fill in the search box:

// Open the page.
page := rod.New().MustConnect().MustPage("https://www.sofascore.com/")

// Accept the cookies alert.
page.MustElement("cookiesalertselector...").MustClick()

// Fill in the search box.
el := page.MustElement(`searcherselector...`)
el.MustInput("Lionel Messi")

The problem arises when I try to click the <a> that appears after the search text is entered.

I tried this:

diviwant := page.MustElement("aselector...")
diviwant.MustClick()

and this:

diviwant := page.MustElement("aselector...").MustWaitVisible()
diviwant.MustClick()

However, both attempts fail with the same error:

panic: {-32000 node is detached from document }

goroutine 1 [running]:
github.com/go-rod/rod/lib/utils.glob..func2({0x100742dc0?, 0x140002bad50?})
    /users/lucastomicbenitez/go/pkg/mod/github.com/go-rod/[email protected]/lib/utils/utils.go:65 +0x24
github.com/go-rod/rod.gene.func1({0x14000281ca0?, 0x1003a98b7?, 0x4?})
    /users/lucastomicbenitez/go/pkg/mod/github.com/go-rod/[email protected]/must.go:36 +0x64
github.com/go-rod/rod.(*Element).MustClick(0x14000289320)
    /users/lucastomicbenitez/go/pkg/mod/github.com/go-rod/[email protected]/must.go:729 +0x9c
main.main()
    /users/lucastomicbenitez/development/golang/evolutionaryalgorithm/main/main.go:22 +0x9c
exit status 2

While looking for solutions, I found a GitHub issue and tried to get the link this way:

link := page.MustEval(`() => document.querySelector('aselector...').href`)

But it returns this:

panic: eval js error: TypeError: Cannot read properties of null (reading 'href')

However, I'm pretty sure the selector is correct. What did I do wrong?

Workaround

As @hymns for disco said in the comments, I just had to wait briefly after filling in the search box, so that the results have time to render before the link is read:

el.MustInput("Lionel Messi")

time.Sleep(time.Second)

link := page.MustEval(`()=> document.querySelector('aSelector...').href`)

source:stackoverflow.com