How to scrape javascript website with Python?

WBOY
Release: 2024-02-10 15:40:04
forward
1127 people have browsed it

如何用 Python 抓取 javascript 网站?

Question content

I am trying to crawl a website. I've tried using both methods, but neither gives me the full website source code I'm looking for. I am trying to scrape news headlines from the website url provided below.

Website: "https://www.todayonline.com/"

Here are the two methods I tried and failed.

Method One: Beautiful Soup

tdy_url = "https://www.todayonline.com/"
page = requests.get(tdy_url).text
soup = beautifulsoup(page)
soup  # returns me a html with javascript text
soup.find_all('h3')

### returns me empty list []
Copy after login

Method 2: selenium beautifulsoup

tdy_url = "https://www.todayonline.com/"

options = Options()
options.headless = True

driver = webdriver.Chrome("chromedriver",options=options)

driver.get(tdy_url)
time.sleep(10)
html = driver.page_source

soup = BeautifulSoup(html)
soup.find_all('h3')

### Returns me only less than 1/4 of the 'h3' tags found in the original page source
Copy after login

please help. I've tried scraping other news sites and this is much easier. Thanks.


Correct answer


You can access the data through the api (look at the Network tab):

For example,

import requests
url = "https://www.todayonline.com/api/v3/news_feed/7"
data = requests.get(url).json()
Copy after login

The above is the detailed content of How to scrape javascript website with Python?. For more information, please follow other related articles on the PHP Chinese website!

Related labels:
source:stackoverflow.com
Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
Popular Tutorials
More>
Latest Downloads
More>
Web Effects
Website Source Code
Website Materials
Front End Template
About us Disclaimer Sitemap
php.cn:Public welfare online PHP training,Help PHP learners grow quickly!