Web scraping: Missing href attribute - Need to simulate mouse clicks for web scraping?
P粉550823577 2024-04-04 10:32:06

For a fun web scraping project, I want to collect NHL data from https://www.nhl.com/stats/teams.

There is a clickable Excel export link (an <a> tag), and I can find it using selenium and bs4.

Unfortunately, things end here: I can't seem to access the data since there is no href attribute.

I got what I wanted by simulating a mouse click using pynput, but I want to know:

What could I have done differently? This approach feels awkward.

-> The tag with the export icon looks like this:

<a class="styles__ExportIcon-sc-16o6kz0-0 dIDMgQ">

-> This is my code

`import pynput
from pynput.mouse import Button, Controller
import time

from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.Chrome(executable_path=r'somepath\chromedriver.exe')  # raw string so the backslash is kept literally

URL = 'https://www.nhl.com/stats/teams'

driver.get(URL)
html = driver.page_source  # DOM with JavaScript execution complete
soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid a warning
body = soup.find('body')
print(body.prettify())


mouse = Controller()

time.sleep(5) # Sleep for 5 seconds until page is loaded
mouse.position = (1204, 669) # that's where the icon is on my screen
mouse.click(Button.left, 1) # executes download`


All replies (1)
P粉807471604

The <a> has no href attribute because the download is triggered through JavaScript. With selenium, locate the element and call .click() on it to download the file:

driver.find_element(By.CSS_SELECTOR,'h2>a').click()

The CSS selector here matches an <a> that is a direct child of an <h2>.

Alternatively, select the link directly by its class, which starts with styles__ExportIcon:

driver.find_element(By.CSS_SELECTOR,'a[class^="styles__ExportIcon"]').click()
Example

You may need to deal with the OneTrust cookie banner first, so dismiss it and then download the table.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

url = 'https://www.nhl.com/stats/teams'
driver.get(url)
driver.find_element(By.CSS_SELECTOR,'#onetrust-reject-all-handler').click()
driver.find_element(By.CSS_SELECTOR,'h2>a').click()
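The click only starts the download, so a script that goes on to read the file may want to wait until it has actually landed on disk. A rough sketch using only the standard library; the function names are hypothetical, and the suffix list assumes Chrome's habit of naming unfinished downloads with .crdownload (plus .tmp as a precaution):

```python
import time
from pathlib import Path

# Assumption: suffixes the browser uses for downloads still in progress.
PARTIAL_SUFFIXES = {".crdownload", ".tmp"}


def list_files(download_dir):
    """Return the set of regular files currently in download_dir."""
    return {p for p in Path(download_dir).iterdir() if p.is_file()}


def wait_for_new_file(download_dir, before, timeout=30.0, poll=0.5):
    """Poll download_dir until a finished file appears that is not in `before`.

    Returns the newest such file, or raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        finished = [
            p for p in list_files(download_dir) - before
            if p.suffix not in PARTIAL_SUFFIXES
        ]
        if finished:
            return max(finished, key=lambda p: p.stat().st_mtime)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no new file appeared in {download_dir}")
        time.sleep(poll)
```

Usage would be to take a snapshot with before = list_files(folder) just ahead of the .click() call and then call wait_for_new_file(folder, before) afterwards, where folder is the browser's download directory (often Path.home() / "Downloads", though that is an assumption about the local setup).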