I am trying to scrape news headlines from the website below. I've tried two methods, but neither gives me the full page source I'm looking for.
Website: "https://www.todayonline.com/"
Here are the two methods I tried; both failed.
Method 1, requests with BeautifulSoup:

    import requests
    from bs4 import BeautifulSoup

    tdy_url = "https://www.todayonline.com/"
    page = requests.get(tdy_url).text
    soup = BeautifulSoup(page, "html.parser")
    soup  # returns HTML that is mostly JavaScript
    soup.find_all('h3')  # returns an empty list: []
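Method 2, Selenium with headless Chrome:

    import time
    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    tdy_url = "https://www.todayonline.com/"
    options = Options()
    options.add_argument("--headless")  # run Chrome without a window
    driver = webdriver.Chrome(options=options)
    driver.get(tdy_url)
    time.sleep(10)  # give the page's JavaScript time to run
    html = driver.page_source
    driver.quit()
    soup = BeautifulSoup(html, "html.parser")
    soup.find_all('h3')  # returns fewer than 1/4 of the 'h3' tags in the original page source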
Please help. I've scraped other news sites before and they were much easier. Thanks.
You can access the data directly through the site's API (look at the Network tab in your browser's developer tools to find the requests the page makes):
For example,
    import requests

    url = "https://www.todayonline.com/api/v3/news_feed/7"
    data = requests.get(url).json()
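Once the JSON is loaded, the headlines still have to be pulled out of it. The layout of this feed isn't shown here, so the following is only a sketch: it assumes the headline text sits under a key named `title` somewhere in the nested structure (a guess, so check the real response in the Network tab first). The helper `iter_values` and the `sample` payload are hypothetical, made up for illustration.

```python
from typing import Any, Iterator


def iter_values(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key`, at any nesting depth."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending: matches may also sit deeper inside the value.
            yield from iter_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from iter_values(item, key)


# Hypothetical payload standing in for the real response; the actual
# field names and nesting of the feed are not known here.
sample = {
    "nodes": [
        {"node": {"title": "Headline A", "id": 1}},
        {"node": {"title": "Headline B", "id": 2}},
    ]
}

headlines = list(iter_values(sample, "title"))
print(headlines)
```

With the real response you would call `list(iter_values(data, "title"))`, swapping `"title"` for whatever key the feed actually uses. Walking the structure recursively avoids hard-coding a path that may differ from what the API returns.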