How to Retrieve Data Generated by JavaScript from a Web Page
Web scraping can be challenging when page content is generated dynamically by JavaScript. One such case is http://vtis.vn/index.aspx, where the desired data ("Danh sách chậm") does not appear in the page until a button is clicked.
Solution Using PhantomJS
To retrieve this data programmatically, consider using PhantomJS, a headless WebKit browser that executes JavaScript. PhantomJS lets you script browser interactions, so you can simulate clicking the button and then read the rendered data.
Example Script:
    var page = require('webpage').create();

    page.open('http://vtis.vn/index.aspx', function (status) {
      if (status !== 'success') {
        console.log('Failed to load the page');
        phantom.exit(1);
        return;
      }

      // Click the "Danh sách chậm" button
      page.evaluate(function () {
        document.querySelector('button[onclick="DanhSachCham();"]').click();
      });

      // Wait for the data to load (a fixed 1-second delay)
      setTimeout(function () {
        // Extract the data from the rendered page
        var data = page.evaluate(function () {
          return document.querySelector('div[id="DivDanhSachTTHT"] tbody').innerHTML;
        });
        console.log(data);
        phantom.exit(); // end the script; PhantomJS otherwise keeps running
      }, 1000);
    });
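The fixed one-second delay in the script above can fail if the data takes longer to arrive. A common alternative is to poll until the expected element appears. The helper below is a minimal sketch of that idea, not part of PhantomJS itself: the name waitFor and its parameters are hypothetical, and it is written in plain ES5 so it should run in both PhantomJS and Node.js.

```javascript
// Poll `check` until it returns true, then call `done(null)`.
// If `timeoutMs` elapses first, call `done` with an Error instead.
function waitFor(check, done, timeoutMs, intervalMs) {
  var start = Date.now();
  function poll() {
    if (check()) {
      done(null);
    } else if (Date.now() - start >= timeoutMs) {
      done(new Error('Timed out after ' + timeoutMs + ' ms'));
    } else {
      setTimeout(poll, intervalMs);
    }
  }
  poll(); // the first check happens immediately
}

// Possible use inside the page.open callback, in place of setTimeout
// (assumes the tbody only exists once the data has loaded):
//
//   waitFor(function () {
//     return page.evaluate(function () {
//       return document.querySelector('div[id="DivDanhSachTTHT"] tbody') !== null;
//     });
//   }, function (err) {
//     /* extract the data, or report err, then call phantom.exit() */
//   }, 10000, 250);
```

The timeout keeps the script from hanging forever if the button click never produces the table; the 10-second limit and 250 ms interval in the usage comment are arbitrary starting values.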
Alternative Approach: Using an API
Before scraping, check whether the page makes any Ajax calls to retrieve the data, for example by watching the Network tab of the browser's developer tools while clicking the button. If it does, you may be able to skip scraping and request the data directly from that endpoint. This approach is typically more stable and maintainable than scraping.
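If such an Ajax call exists, it can often be reproduced from a small Node.js script. The sketch below shows one possible shape, under loud assumptions: the endpoint URL, the POST method, the JSON body, and the headers are all placeholders to be replaced with whatever the Network tab actually shows, and the helper names buildAjaxOptions and postAjax are hypothetical. It also assumes a plain http URL, as the page itself uses.

```javascript
// Runs under Node.js, not PhantomJS.
var http = require('http');

// Build options for http.request that mimic a browser Ajax POST.
// Method and headers are assumptions; copy the real ones from the Network tab.
function buildAjaxOptions(endpointUrl, body) {
  var u = new URL(endpointUrl);
  return {
    hostname: u.hostname,
    port: u.port ? Number(u.port) : 80, // assumes plain http
    path: u.pathname + u.search,
    method: 'POST',
    headers: {
      'Content-Type': 'application/json; charset=utf-8',
      'Content-Length': Buffer.byteLength(body),
      'X-Requested-With': 'XMLHttpRequest'
    }
  };
}

// Send the body and pass (error, statusCode, responseText) to the callback.
function postAjax(endpointUrl, body, callback) {
  var req = http.request(buildAjaxOptions(endpointUrl, body), function (res) {
    var chunks = [];
    res.on('data', function (chunk) { chunks.push(chunk); });
    res.on('end', function () {
      callback(null, res.statusCode, Buffer.concat(chunks).toString('utf8'));
    });
  });
  req.on('error', function (err) { callback(err); });
  req.write(body);
  req.end();
}
```

A call would then look like postAjax(endpointFromNetworkTab, '{}', function (err, status, text) { ... }), where the first two arguments come from the observed request rather than from this sketch.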