
A JavaScript scraper for the Wikipedia Academy Award List.

Susan Sarandon
Release: 2025-01-24 16:39:12

This tutorial demonstrates web scraping in JavaScript: axios fetches the Wikipedia list of Academy Award-winning films, Cheerio parses the HTML, and the extracted table is saved to a CSV file.

First, install the required packages:

<code class="language-bash">npm install cheerio axios</code>

The Wikipedia page URL is:

<code class="language-javascript">const url = 'https://en.wikipedia.org/wiki/List_of_Academy_Award%E2%80%93winning_films';</code>

The code fetches the page's HTML using axios, then uses Cheerio to parse it:

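<code class="language-javascript">// Top-level await needs an ES module (scraper.mjs, or "type": "module" in package.json).
import fs from 'node:fs';
import axios from 'axios';
import * as cheerio from 'cheerio';

const { data: html } = await axios.get(url);
const $ = cheerio.load(html);

const theadData = []; // header cells, one array per tbody
const tableData = []; // header row followed by the data rows</code>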

The script navigates the DOM, extracting data from table cells:

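<code class="language-javascript">// Collect the text of every header cell (th), grouped by table body.
$('tbody').each((i, tbody) => {
  const headerCells = [];
  $(tbody).find('th').each((j, cell) => {
    // trim() strips all surrounding whitespace; replace('\n', '') only removed the first newline.
    headerCells.push($(cell).text().trim());
  });
  theadData.push(headerCells);
});

// Use the headers of the first table as the first row of the output.
tableData.push(theadData[0]);

// Collect the data cells (td) of every table row, skipping rows without any.
$('table tr').each((i, row) => {
  const rowData = [];
  $(row).find('td').each((j, cell) => {
    rowData.push($(cell).text().trim());
  });
  if (rowData.length) tableData.push(rowData);
});</code>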
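Cell text taken from a Wikipedia table can still carry bracketed footnote markers or internal line breaks after trimming. The following is a minimal sketch of an extra clean-up step, not part of the original script; `cleanCell` is a hypothetical helper name, and the assumption that every bracketed fragment is a footnote marker may be too aggressive for some tables.

```javascript
// Hypothetical helper (not in the original script): normalise one cell's text.
function cleanCell(text) {
  return text
    .replace(/\[[^\]]*\]/g, '') // drop bracketed markers such as [1] or [a] (assumed to be footnotes)
    .replace(/\s+/g, ' ')       // collapse line breaks and repeated spaces into one space
    .trim();                    // remove leading and trailing whitespace
}

// Apply the helper to every cell of a row.
function cleanRow(row) {
  return row.map(cleanCell);
}
```

It could be applied where the rows are collected, for example `rowData.push(cleanCell($(cell).text()))`.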

Finally, the extracted data is formatted and saved to a CSV file using fs.writeFileSync, with semicolons as delimiters:

<code class="language-javascript">const csvContent = tableData.map((row) => row.join(';')).join('\n');
fs.writeFileSync('academy_awards.csv', csvContent, 'utf-8');</code>
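Joining cells with a bare `;` produces a misaligned row whenever a cell itself contains a semicolon, a double quote, or a line break. The sketch below shows one common way to guard against that by quoting such fields in the usual CSV style; `escapeField` and `toCsv` are hypothetical names introduced here for illustration, not part of the original script.

```javascript
// Hypothetical helper: wrap a field in double quotes when it contains the
// delimiter, a double quote, or a line break; embedded quotes are doubled.
function escapeField(value, delimiter = ';') {
  const text = String(value);
  const needsQuoting = text.includes(delimiter) || /["\r\n]/.test(text);
  return needsQuoting ? `"${text.replace(/"/g, '""')}"` : text;
}

// Hypothetical helper: turn an array of rows (arrays of cells) into CSV text.
function toCsv(rows, delimiter = ';') {
  return rows
    .map((row) => row.map((cell) => escapeField(cell, delimiter)).join(delimiter))
    .join('\n');
}
```

With these helpers, `toCsv(tableData)` could stand in for the `map`/`join` expression before the result is passed to `fs.writeFileSync`.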

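Because the script uses top-level await, Node.js must treat it as an ES module (for example by setting "type": "module" in package.json). Then run the script using: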

<code class="language-bash">node scraper.js</code>

The resulting academy_awards.csv file contains the scraped data.


This tutorial builds upon previous scraping tutorials using Go and Python.


source:php.cn