Scraping dynamic content from web pages can pose challenges when using static methods like urllib2.urlopen(request) in Python: such content is generated by JavaScript that runs in the browser after the initial HTML is delivered, so a plain HTTP fetch never sees it.
One approach to tackle this issue is to leverage the Selenium framework with PhantomJS as a web driver. Ensure that PhantomJS is installed and its binary is available on the current path.
Here's an example to illustrate:
```python
import requests
from bs4 import BeautifulSoup

response = requests.get(my_url)
soup = BeautifulSoup(response.text, "html.parser")
soup.find(id="intro-text")
# Result: <p>
```
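To see why the static approach falls short, here is a minimal, self-contained sketch (the HTML snippet and its `intro-text` element are hypothetical, modeled on the example above): the page's JavaScript would rewrite the paragraph in a browser, but a static parser only ever sees the markup as served.

```python
from bs4 import BeautifulSoup

# Hypothetical server response: a browser would run the <script> and
# rewrite the paragraph, but a static parser sees only the raw markup.
html = """
<p id="intro-text">No javascript support</p>
<script>
document.getElementById('intro-text').textContent = 'Yay! Supports javascript';
</script>
"""

soup = BeautifulSoup(html, "html.parser")
print(soup.find(id="intro-text").text)  # prints the pre-JavaScript text
```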
This code will retrieve the page without JavaScript support. To scrape with JS support, use Selenium:
```python
from selenium import webdriver

driver = webdriver.PhantomJS()
driver.get(my_url)
p_element = driver.find_element_by_id(id_='intro-text')
print(p_element.text)
# Result: 'Yay! Supports javascript'
```
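Note that on many dynamic pages the content is rendered some time after `driver.get()` returns, so looking up the element immediately can fail. A sketch using Selenium's explicit waits (assuming the same placeholder `my_url` and `intro-text` id as above, and a working PhantomJS install):

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.PhantomJS()
driver.get(my_url)  # my_url is the placeholder URL from the examples above

# Block for up to 10 seconds until the JS-rendered element appears,
# instead of assuming it exists right after page load.
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "intro-text"))
)
print(element.text)
driver.quit()
```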
Alternatively, you can utilize Python libraries specifically designed for scraping JavaScript-driven websites, such as dryscrape:
```python
import dryscrape
from bs4 import BeautifulSoup

session = dryscrape.Session()
session.visit(my_url)
response = session.body()
soup = BeautifulSoup(response, "html.parser")
soup.find(id="intro-text")
# Result: <p>
```