How to scrape data from a website without HTML elements?
P粉819533564 2024-03-20 10:55:29

How to scrape data from the following website to find specific case details?

Here are the manual steps to find case details:

  1. Navigate to https://www.claytoncountyga.gov/government/courts/court-case-inquiry/
  2. The page appears to load a form via JavaScript, with buttons that let you drill down to case details. To search cases by last name, click "Name Search".
  3. A new screen then appears in the same element as in step 2. Select a court from the drop-down (e.g. Magistrates Court) and enter a last and first name (e.g. Smith, John) in the free-form text inputs.
  4. Click "Submit" to view all cases.
  5. A table of results is populated in the same element as in the previous steps. Clicking the case number in any row opens the case details. This is the page I want to scrape data from.

Because the inner form appears to be encapsulated (I'm guessing it is implemented in JavaScript), I can't see the HTML elements that are rendered after each input is provided. How can I automate these steps using Python?


All replies (1)
P粉458725040

The form is contained within an iframe with the ID "Clayton County". For Selenium to interact with elements inside it, we first have to switch to that iframe by waiting on the EC.frame_to_be_available_and_switch_to_it expected condition.
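
To check for yourself that a form lives inside an iframe, one option is to list the iframe tags in the page's HTML. The following is only a minimal sketch using Python's standard library; the `find_iframes` helper, the `IframeFinder` class and the sample markup (including its `src` URL) are hypothetical and not taken from the actual site.

```python
from html.parser import HTMLParser


class IframeFinder(HTMLParser):
    """Collect the id and src attributes of every <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.iframes = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            attr_map = dict(attrs)
            self.iframes.append((attr_map.get("id"), attr_map.get("src")))


def find_iframes(html):
    """Return a list of (id, src) tuples, one per iframe found in html."""
    finder = IframeFinder()
    finder.feed(html)
    return finder.iframes


# Hypothetical markup for illustration only.
# With Selenium you could pass driver.page_source instead.
sample_html = (
    '<div><iframe id="Clayton County" '
    'src="https://example.com/inquiry"></iframe></div>'
)
print(find_iframes(sample_html))
```

If an iframe shows up in the list, elements inside it are not reachable until the driver has switched into that frame.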

Then, using Select(), we can choose an option from the drop-down menu.

On the last page we collect the URLs of all the case numbers and save them in case_numbers_urls, so that we can loop through them, load each case, extract the information and move on to the next case.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import Select, WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Placeholder: set this to the location of your local chromedriver binary
chromedriver_path = '/path/to/chromedriver'

driver = webdriver.Chrome(service=Service(chromedriver_path))
driver.get('https://www.claytoncountyga.gov/government/courts/court-case-inquiry/')

# page 1: switch into the iframe that holds the form, then open "Name Search"
wait = WebDriverWait(driver, 9)
wait.until(EC.frame_to_be_available_and_switch_to_it((By.ID, "Clayton County")))
wait.until(EC.element_to_be_clickable((By.XPATH, "//a[contains(.,'Name Search')]"))).click()

# page 2: pick the court ('M' is the value used here for Magistrates Court) and submit the name
dropdown = wait.until(EC.element_to_be_clickable((By.ID, "ctt")))
Select(dropdown).select_by_value('M')
lname = 'Smith'
fname = 'John'
driver.find_element(By.NAME, 'lname').send_keys(lname)
driver.find_element(By.NAME, 'fname').send_keys(fname)
driver.find_element(By.ID, 'btnSrch').click()

# page 3: collect all hrefs first, because navigating away invalidates the link elements
case_numbers_urls = [
    c.get_attribute('href')
    for c in wait.until(EC.visibility_of_all_elements_located(
        (By.CSS_SELECTOR, '#myTable a[href]:not([rel])')))
]
for url in case_numbers_urls:
    driver.get(url)
    # scrape the case details from this page here

driver.quit()
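
What goes inside the loop depends on how the case detail page is laid out, which isn't shown here. As a rough sketch, assuming the details are rendered in an HTML table (an assumption, not something confirmed above), the page source could be turned into rows of text using only the standard library. The `extract_rows` helper, the `TableRows` class and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join the fragments and collapse runs of whitespace
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return a list of rows, each a list of cell strings."""
    parser = TableRows()
    parser.feed(html)
    return parser.rows


# Hypothetical markup for illustration only; the real page may differ.
sample_html = (
    "<table>"
    "<tr><th>Case Number</th><th>Status</th></tr>"
    "<tr><td> 2024CM00001 </td><td>Open</td></tr>"
    "</table>"
)
print(extract_rows(sample_html))
```

Inside the loop, something like rows = extract_rows(driver.page_source) after driver.get(url) would then give the cell text for each case, provided the table assumption holds.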