How Can Python Retrieve Values from Dynamically Generated HTML Content?

Barbara Streisand

Release： 2024-10-19 07:45:02

How to Retrieve Values from Dynamic HTML Content Using Python

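When you try to retrieve data from a website that loads its content dynamically, the usual approach of fetching the page with Python's requests library and parsing it with BeautifulSoup may fail. Neither library executes JavaScript: requests returns the HTML exactly as the server sent it, and BeautifulSoup only parses that markup, so any values that a script inserts in the browser are missing.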

Understanding the Problem

In the example provided, the page in question uses Handlebars templates to create dynamic content. If you look at the raw HTML source (what the server returns, as opposed to the rendered DOM shown in the browser's element inspector), you may find template placeholders like "{{formatPrice median}}" instead of the actual values.
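A quick way to confirm that a fetched page is still an unrendered template is to scan it for leftover placeholders. The following is a minimal sketch using only the standard library; the `raw_html` fragment, its `price` class, and the helper name `find_placeholders` are illustrative assumptions, not taken from the actual page.

```python
import re

# Matches Handlebars-style expressions such as {{formatPrice median}}.
PLACEHOLDER = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")


def find_placeholders(html: str) -> list[str]:
    """Return the inner expression of every {{...}} placeholder found."""
    return PLACEHOLDER.findall(html)


# Hypothetical fragment of unrendered page source (assumed for illustration).
raw_html = '<td><span class="price">{{formatPrice median}}</span></td>'

print(find_placeholders(raw_html))
```

If this returns a non-empty list for HTML fetched without a browser, the values are being filled in client-side and a plain HTTP fetch will not be enough.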

Solutions

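To retrieve the actual values from dynamically generated content, you need either to fetch the data from the same source the page's JavaScript uses, or to use a tool that executes that JavaScript. Consider the following options:

Parse AJAX JSON Directly: If the data is fetched via AJAX requests, you can find the request in the browser's network tools, call the same endpoint yourself, and parse the JSON response.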
Use an Offline JavaScript Interpreter: Install and use tools like SpiderMonkey or Crowbar to run the JavaScript code and generate the DOM elements.
Use a Browser Automation Tool: Utilize drivers like Selenium or Watir to interact with a headless browser, execute JavaScript, and access the rendered HTML.
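As a rough illustration of the first option, the sketch below parses a JSON payload of the kind an AJAX endpoint might return. The payload structure, the `sell` and `median` field names, and the helper `extract_median` are hypothetical; the real response format has to be read from the browser's network tab.

```python
import json


def extract_median(payload: str) -> float:
    """Parse a JSON response body and return the median price.

    The "sell" -> "median" layout is an assumed structure for illustration.
    """
    data = json.loads(payload)
    return float(data["sell"]["median"])


# Hypothetical response body; in practice this text would come from an
# HTTP request to the endpoint the page itself calls.
sample_payload = '{"typeid": 34, "sell": {"median": 5.27, "volume": 1000}}'

print(extract_median(sample_payload))
```

When such an endpoint exists, this approach is usually the lightest of the three, since no browser or JavaScript engine is involved.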

Using Selenium with BeautifulSoup

For the example page (eve-central.com), the following code uses Selenium to load the page and BeautifulSoup to extract the "median" value:

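<code class="python">from bs4 import BeautifulSoup
from selenium import webdriver

# Launch Firefox and load the page so that its JavaScript runs.
driver = webdriver.Firefox()
try:
    driver.get('http://eve-central.com/home/quicklook.html?typeid=34')
    # page_source is the page's HTML as the browser currently sees it.
    html = driver.page_source
finally:
    driver.quit()  # always close the browser, even if loading fails

# Name the parser explicitly to avoid BeautifulSoup's parser warning.
soup = BeautifulSoup(html, 'html.parser')

# Adjust the tag and class to match the element holding the value
# on the page you are scraping.
for tag in soup.find_all('span', class_="a-price-amount"):
    print(tag.text)</code>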

This code uses Selenium to load the page in a real browser, so the JavaScript runs and the templates are filled in, and then hands the rendered HTML to BeautifulSoup. It selects every span tag with the given class name and prints its text content, which includes the desired "median" value.
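One caveat: the snippet above reads page_source right after get() returns, and on some pages the scripts may not have finished filling in the content by then. A possible refinement, sketched below, is to wait explicitly for a target element before reading the source. The function name `fetch_rendered_html`, the 10-second default timeout, and the headless option are assumptions made for this sketch.

```python
def fetch_rendered_html(url, css_selector, timeout=10):
    """Load url in headless Firefox and return the page source once an
    element matching css_selector is present in the DOM."""
    # Imported here so the sketch can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("-headless")  # run Firefox without a visible window
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        # Raises selenium's TimeoutException if the element never appears.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser
```

The returned HTML can be passed to BeautifulSoup exactly as before. Note that the presence of an element does not guarantee its text has been filled in yet, so the selector should target something that only exists after rendering.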