Dynamic Content Scraping with Python
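Extracting text from static HTML is straightforward, but dynamic content is a different story. Content that JavaScript inserts or changes after the page loads is not present in the raw HTML that Python's HTTP libraries download. This applies to urllib2 (urllib.request in Python 3) and requests, because neither of them executes JavaScript.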
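To make this concrete, the sketch below parses a hypothetical page with Python's standard-library html.parser, which, like urllib2 or requests, never runs scripts. The sample page, the TextById class and the static_text helper are illustrative assumptions, not part of the original example. In a browser the script would rewrite the paragraph, but the parser only sees the placeholder text.

```python
from html.parser import HTMLParser
from typing import List, Optional

# Hypothetical sample page: in a browser the script replaces the paragraph's
# text once the DOM is ready, but a client that does not run JavaScript only
# ever sees the placeholder markup.
SAMPLE_PAGE = """
<html>
  <head>
    <script>
      document.addEventListener("DOMContentLoaded", function () {
        document.getElementById("intro-text").textContent = "Yay! Supports javascript";
      });
    </script>
  </head>
  <body>
    <p id="intro-text">No JavaScript support</p>
  </body>
</html>
"""

# Elements that never have a closing tag, so they must not affect nesting depth
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextById(HTMLParser):
    """Collect the text inside the first element with a given id."""

    def __init__(self, element_id: str) -> None:
        super().__init__()
        self.element_id = element_id
        self.depth = 0        # nesting depth inside the target element (0 = outside)
        self.found = False    # True once the target element has been seen
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.element_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1

    def handle_data(self, data):
        # Script bodies also arrive here, but only text inside the target is kept
        if self.depth:
            self.parts.append(data)


def static_text(html: str, element_id: str) -> Optional[str]:
    """Return the element's text as it appears in the raw HTML, or None."""
    parser = TextById(element_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip() if parser.found else None


print(static_text(SAMPLE_PAGE, "intro-text"))  # prints: No JavaScript support
```

The string "Yay! Supports javascript" does appear in the downloaded HTML, but only as part of the script source. The element itself still holds its placeholder until a JavaScript engine runs that script.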
Accessing Dynamic Content
To reach dynamic content, Python can drive external tools that behave like a web browser. These tools load the page, execute its JavaScript, and expose the rendered result.
1. Selenium with PhantomJS:
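- Install PhantomJS (a headless browser) and make sure its executable is on your PATH. PhantomJS is no longer maintained, and recent Selenium releases no longer include a PhantomJS driver. With current versions, headless Chrome or Firefox fills the same role.
- Use Selenium's Python bindings to start PhantomJS as a WebDriver.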
- Navigate to the target page and locate the elements of interest.
2. dryscrape (Python 2 only):
- Install dryscrape using pip.
- Open a dryscrape session and visit the target page.
- Retrieve the rendered page content as a string.
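In the Selenium approach, the step "locate the elements of interest" often has to wait. This is because elements filled in by JavaScript, for example after a later AJAX request, may not exist yet when driver.get() returns. Selenium provides WebDriverWait for this case. It polls a condition until the condition succeeds or a timeout expires. The stdlib sketch below shows only that polling idea, so it runs without a browser. The names wait_until and find_intro_text are hypothetical and are not Selenium APIs. The fake condition simply stands in for a page whose element appears on the third check.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], Optional[T]],
               timeout: float = 10.0,
               interval: float = 0.5) -> T:
    """Call `condition` repeatedly until it returns a truthy value.

    Hypothetical helper illustrating the polling loop behind an explicit
    wait; raises TimeoutError once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


# Fake "page": the element's text only becomes available on the third poll,
# standing in for JavaScript that inserts #intro-text some time after load.
polls = {"count": 0}


def find_intro_text() -> Optional[str]:
    polls["count"] += 1
    return "Yay! Supports javascript" if polls["count"] >= 3 else None


print(wait_until(find_intro_text, timeout=2.0, interval=0.01))
```

With Selenium itself, the equivalent is WebDriverWait(driver, 10).until(...) with a condition such as expected_conditions.presence_of_element_located((By.ID, "intro-text")). A fixed time.sleep() is simpler, but it either wastes time or fails when the page is slower than expected.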
Example
Consider a sample page whose JavaScript replaces the text of the element with id "intro-text" after the page loads.
Without JS support:
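import requests
from bs4 import BeautifulSoup
my_url = "https://example.com/"  # hypothetical placeholder: the page to scrape
response = requests.get(my_url)
response.raise_for_status()  # stop early on HTTP errors
# Name a parser explicitly; Beautiful Soup warns when none is given
soup = BeautifulSoup(response.text, "html.parser")
print(soup.find(id="intro-text"))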
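Output: the element's original static markup. This is because requests only downloads the HTML and never runs the script that rewrites the element.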
With JS support (Selenium):
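from selenium import webdriver
from selenium.webdriver.common.by import By
options = webdriver.ChromeOptions()
options.add_argument("--headless")  # current Selenium has no PhantomJS driver; headless Chrome replaces it
driver = webdriver.Chrome(options=options)
driver.get(my_url)
print(driver.find_element(By.ID, "intro-text").text)
driver.quit()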
Output:
Yay! Supports javascript
The above is the detailed content of How Can Python Scrape Dynamic Website Content?. For more information, please follow other related articles on the PHP Chinese website!