How to Extract Hidden Information from #shadow-roots Using Selenium Python?

Patricia Arquette
Release: 2024-10-19 06:44:01

Extracting Information from a #shadow-root using Selenium Python

In web scraping, data inside a #shadow-root is hard to reach because Selenium's usual locators only search the main document. This article shows how to extract such data with Selenium in Python.

Problem:

Consider the search results page https://www.tiendasjumbo.co/buscar?q=mani of an online store. To extract product labels and other fields from this page, a user tried the following approach:

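<code class="python">from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service

# Selenium 4 takes the driver path through a Service object;
# the old executable_path argument of webdriver.Firefox() was removed.
service = Service(executable_path=r"C:\Program Files (x86)\geckodriver.exe")
driver = webdriver.Firefox(service=service)
driver.implicitly_wait(10)

url = "https://www.tiendasjumbo.co/buscar?q=mani"
driver.maximize_window()
driver.get(url)
# Raises NoSuchElementException after the implicit wait: the h1 element
# lives inside a shadow root, which XPath cannot search
driver.find_element(By.XPATH, '//h1[@class="impulse-title"]')</code>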

However, this lookup failed, and switching to iframes did not help either.

Solution:

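The products on this page are rendered inside a #shadow-root attached to the impulse-search custom element. XPath and the other standard locators search only the main document tree and cannot cross into a shadow tree, which is why the lookup above finds nothing. shadowRoot.querySelector() is not a Selenium method but a DOM API that runs in the browser; Selenium can call it through driver.execute_script(), which executes JavaScript in the page and returns the matched node as a WebElement. The product label can be extracted as follows: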

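<code class="python">driver.get('https://www.tiendasjumbo.co/buscar?q=mani')
# Find the impulse-search host in the main document, step into its shadowRoot,
# then run a CSS query inside that shadow tree. If the page has not rendered
# the host yet, querySelector returns null and the script raises an error.
item = driver.execute_script("return document.querySelector('impulse-search').shadowRoot.querySelector('div.group-name-brand h1.impulse-title span.formatted-text')")
print(item.text)</code>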

Running this script prints the label of the first matching product:

<code class="text">La especial mezcla de nueces, maní, almendras y marañones x 450 g</code>
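
The JavaScript in the solution crosses exactly one shadow boundary. When a target sits inside several nested shadow hosts, the same pattern repeats with one .shadowRoot.querySelector() call per boundary. A minimal sketch of a helper that builds that expression from a list of CSS selectors is shown below; the name build_shadow_query is hypothetical, and json.dumps is used here as one way to quote each selector as a valid JavaScript string literal.

<code class="python">import json


def build_shadow_query(selectors):
    """Build a 'return ...' script that walks through nested shadow roots.

    selectors[0] is matched in the main document; each later selector is
    matched inside the shadowRoot of the element found by the previous one.
    (Hypothetical helper, not part of Selenium.)
    """
    if not selectors:
        raise ValueError("at least one CSS selector is required")
    # json.dumps produces a double-quoted, escaped string literal that is
    # also valid JavaScript, so selectors containing quotes stay intact
    script = "document.querySelector(" + json.dumps(selectors[0]) + ")"
    for selector in selectors[1:]:
        script += ".shadowRoot.querySelector(" + json.dumps(selector) + ")"
    return "return " + script


# Usage with the driver from the solution (one shadow boundary):
# js = build_shadow_query([
#     "impulse-search",
#     "div.group-name-brand h1.impulse-title span.formatted-text",
# ])
# item = driver.execute_script(js)
# print(item.text)</code>

Like the original one-liner, the generated script fails in the browser if any host in the chain is missing, so it should run only after the page has rendered.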

References:

For further insights, refer to the following resources:

  • Unable to locate the Sign In element within #shadow-root (open) using Selenium and Python
  • How to locate the First name field within shadow-root (open) within the website https://www.virustotal.com using Selenium and Python

Note:

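Starting with version 96, Google Chrome and Microsoft Edge changed the value Selenium receives when a script returns an element's shadowRoot: the driver now returns a shadow root reference, as defined by the W3C WebDriver specification, rather than a regular element. The snippet above is not affected because it returns the element found inside the shadow tree, not the shadow root itself. Code that returns the shadow root directly may need updating; recent Selenium 4 releases expose it through the WebElement.shadow_root property.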
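
Recent Selenium 4 releases let the lookup from the solution be written without JavaScript by using the WebElement.shadow_root property. The sketch below assumes Selenium 4 with Chrome or Edge 96 or later. read_shadow_text is a hypothetical helper name, and the string "css selector" is the value of Selenium's By.CSS_SELECTOR constant. CSS is used because XPath is generally not supported when searching inside a shadow root.

<code class="python"># Same value as selenium.webdriver.common.by.By.CSS_SELECTOR
CSS = "css selector"


def read_shadow_text(driver, host_css, inner_css):
    """Return the text of an element inside a host's open shadow root.

    Hypothetical helper; assumes Selenium 4 (WebElement.shadow_root) and a
    browser driver that supports shadow roots, e.g. Chrome or Edge 96+.
    """
    host = driver.find_element(CSS, host_css)        # shadow host in the main document
    shadow = host.shadow_root                         # ShadowRoot object, not a WebElement
    return shadow.find_element(CSS, inner_css).text   # CSS query inside the shadow tree


# Usage with the driver from the solution:
# print(read_shadow_text(
#     driver,
#     "impulse-search",
#     "div.group-name-brand h1.impulse-title span.formatted-text",
# ))</code>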

source:php