
How Can I Scrape Dynamic Web Page Content Using Node.js?

Mary-Kate Olsen
Release: 2024-12-18 05:04:14

Scraping Pages with Dynamic Content Using Node.js

Dynamic content is a common obstacle for web scrapers. A typical case is a page whose elements are created by JavaScript after the initial page load. Fetching and parsing the raw HTML is not enough in that situation, because those elements are absent from the server's response.

cheerio in Node.js illustrates the problem: it parses the HTML string it is given and does not execute scripts. The following code tries to scrape links from a list on a page, but the dynamically created elements are not in the HTML when cheerio.load runs:

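var request = require('request');
var cheerio = require('cheerio');
var url = "http://www.bdtong.co.kr/index.php?c_category=C02";

// Fetch the raw HTML; scripts on the page are never executed.
request(url, function (err, res, html) {
    if (err) {
        console.error(err);
        return;
    }
    // Parse the static markup and print the link of every list item.
    var $ = cheerio.load(html);
    $('.listMain > li').each(function () {
        console.log($(this).find('a').attr('href'));
    });
});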

This code often prints nothing, because the .listMain > li elements are not in the HTML that the server returns; they are added later by scripts running in the browser. So how can we retrieve these elements using Node.js?
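Before reaching for a heavier tool, it can help to confirm that missing markup really is the cause. The sketch below uses two hypothetical HTML strings (initialHtml and renderedHtml are invented for illustration and are not taken from the site above) and a hypothetical helper, extractListLinks, to show how the same extraction finds nothing in the static markup but succeeds once the list has been populated. It uses a regular expression only to stay dependency-free; it is a diagnostic, not a replacement for a real parser.

```javascript
// Hypothetical markup, for illustration only: what a server might send
// before any script has run.
const initialHtml = [
  '<ul class="listMain"></ul>',
  '<script>/* fills .listMain after load */</script>'
].join('\n');

// Hypothetical markup after the page's scripts have populated the list.
const renderedHtml =
  '<ul class="listMain">' +
  '<li><a href="/shop/1">A</a></li>' +
  '<li><a href="/shop/2">B</a></li>' +
  '</ul>';

// Collect the href of the first link inside each <li>. A regular expression
// is enough for this quick check; use a parser such as cheerio for real work.
function extractListLinks(html) {
  const links = [];
  const pattern = /<li\b[^>]*>\s*<a[^>]*href="([^"]*)"/g;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    links.push(match[1]);
  }
  return links;
}

console.log(extractListLinks(initialHtml));  // no links in the static markup
console.log(extractListLinks(renderedHtml)); // links appear once the list is filled
```

If the HTML received by request looks like the first string, no parser will find the list items, and the page has to be rendered by something that runs its scripts.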

Solution: Utilizing PhantomJS

To handle dynamic content, we can use PhantomJS, a headless web browser that executes the page's JavaScript. Driving it from Node.js lets us load the page as a browser would and read the elements once they exist. Here's an example using the phantom package with its callback-style API, as used by older releases of that package:

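var phantom = require('phantom');

phantom.create(function (ph) {
  ph.createPage(function (page) {
    var url = "http://www.bdtong.co.kr/index.php?c_category=C02";
    page.open(url, function() {
      // Inject jQuery so the selector below works inside the page.
      page.includeJs("http://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js", function() {
        // This function runs inside the page, so it returns the links
        // instead of relying on the page's console reaching Node.
        page.evaluate(function() {
          var links = [];
          $('.listMain > li').each(function () {
            links.push($(this).find('a').attr('href'));
          });
          return links;
        }, function(links) {
          // Back in Node: print the collected links, then shut PhantomJS down.
          (links || []).forEach(function (href) {
            console.log(href);
          });
          ph.exit();
        });
      });
    });
  });
});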

In this code, we first inject jQuery into the loaded page with page.includeJs so that the same selector can be used there. The function passed to page.evaluate then runs inside the page itself, after its scripts have executed, so it can find the href attributes of the dynamically created elements.
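One caveat is that a page may keep adding elements after it has finished loading, for example when the list is filled by an asynchronous request. A common workaround is to poll until the elements appear. The following is a minimal sketch of that idea in plain Node.js; waitFor, its parameters, and the fakePage stand-in are hypothetical names introduced here, not part of PhantomJS or the phantom package. In the example above, the check function could wrap a page.evaluate call that reports whether the selector matches anything.

```javascript
// Hypothetical helper: run `check` every `intervalMs` until it yields a
// truthy value, or reject once `timeoutMs` has elapsed. `check` may return
// a plain value or a promise.
function waitFor(check, timeoutMs = 5000, intervalMs = 100) {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve, reject) => {
    function poll() {
      Promise.resolve()
        .then(check)
        .then((result) => {
          if (result) {
            resolve(result);
          } else if (Date.now() >= deadline) {
            reject(new Error('Timed out after ' + timeoutMs + ' ms'));
          } else {
            setTimeout(poll, intervalMs);
          }
        })
        .catch(reject);
    }
    poll();
  });
}

// Stand-in for a page whose list is filled in shortly after load.
const fakePage = { links: [] };
setTimeout(() => { fakePage.links = ['/shop/1', '/shop/2']; }, 50);

waitFor(() => (fakePage.links.length > 0 ? fakePage.links : null), 1000, 10)
  .then((links) => console.log(links))
  .catch((err) => console.error(err.message));
```

A timeout is worth keeping even in a quick script, since a selector that never matches would otherwise leave the headless browser running indefinitely.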


source:php.cn