Home Web Front-end JS Tutorial Use cheerio to make a simple web crawler in Node.js (detailed tutorial)

Use cheerio to make a simple web crawler in Node.js (detailed tutorial)

Jun 02, 2018 pm 02:30 PM
cheerio javascript node.js

This article mainly introduces Node.js to use cheerio to create a simple web crawler example. Now I share it with you and give it as a reference.

This article introduces Node.js to use cheerio to create a simple web crawler example, and shares it with everyone. It has the following features:

1. Goal

  1. Completed Obtain the title information of the website

  2. Output the obtained information in a new file

  3. Tool: cheerio, use npm to download npm install cheerio

  4. The API usage method of cheerio is basically the same as the usage method of jQuery

  5. If you are proficient in using jQuery, you will get started with cheerio quickly

2. Code part

Introduction: Get the list title of the segment fault page, get the title list number, and finally output it to the pageTitle.txt file

const https = require('https');
const fs = require('fs');
const cheerio = require('cheerio');
const url = 'https://segmentfault.com/';

https.get(url, (res) => {
  let html = '';
  res.on('data', (data) => {
    html += data;
  });
  res.on('end', () => {
    getPageTitle(html);
  });
}).on('error', () => {
  console.log('获取网页信息错误');
});

function getPageTitle(html) {
  const $ = cheerio.load(html);
  let chapters = $('.news__item-title');
  let data = [];
  let index = 0;
  let fileName = 'pageTitle.txt';
  for (let i = 0; i < chapters.length; i++) {
    let chapterTitle = $(chapters[i]).find(&#39;a&#39;).text().trim();
    index++;
    data.push(`\n${index}, ${chapterTitle}`);
  }
  fs.writeFile(fileName, data, &#39;utf8&#39;, (err) => {
    if (err) {
      console.log(&#39;fs文件系统创建新文件失败&#39;, err);
    }
    console.log(`已成功将获取到的标题放入新文件${fileName}文件中`)
  })
}
Copy after login

The above is what I compiled for everyone. I hope it will be helpful to everyone in the future.

Related articles:

Talk about the use of JS animation library Velocity.js

vue toggle makes a click switching class (explanation with examples )

Vue2.0 How to add styles to Tab tabs and page switching transitions

The above is the detailed content of Use cheerio to make a simple web crawler in Node.js (detailed tutorial). For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

How to implement an online speech recognition system using WebSocket and JavaScript How to implement an online speech recognition system using WebSocket and JavaScript Dec 17, 2023 pm 02:54 PM

How to use WebSocket and JavaScript to implement an online speech recognition system Introduction: With the continuous development of technology, speech recognition technology has become an important part of the field of artificial intelligence. The online speech recognition system based on WebSocket and JavaScript has the characteristics of low latency, real-time and cross-platform, and has become a widely used solution. This article will introduce how to use WebSocket and JavaScript to implement an online speech recognition system.

WebSocket and JavaScript: key technologies for implementing real-time monitoring systems WebSocket and JavaScript: key technologies for implementing real-time monitoring systems Dec 17, 2023 pm 05:30 PM

WebSocket and JavaScript: Key technologies for realizing real-time monitoring systems Introduction: With the rapid development of Internet technology, real-time monitoring systems have been widely used in various fields. One of the key technologies to achieve real-time monitoring is the combination of WebSocket and JavaScript. This article will introduce the application of WebSocket and JavaScript in real-time monitoring systems, give code examples, and explain their implementation principles in detail. 1. WebSocket technology

How to use JavaScript and WebSocket to implement a real-time online ordering system How to use JavaScript and WebSocket to implement a real-time online ordering system Dec 17, 2023 pm 12:09 PM

Introduction to how to use JavaScript and WebSocket to implement a real-time online ordering system: With the popularity of the Internet and the advancement of technology, more and more restaurants have begun to provide online ordering services. In order to implement a real-time online ordering system, we can use JavaScript and WebSocket technology. WebSocket is a full-duplex communication protocol based on the TCP protocol, which can realize real-time two-way communication between the client and the server. In the real-time online ordering system, when the user selects dishes and places an order

How to implement an online reservation system using WebSocket and JavaScript How to implement an online reservation system using WebSocket and JavaScript Dec 17, 2023 am 09:39 AM

How to use WebSocket and JavaScript to implement an online reservation system. In today's digital era, more and more businesses and services need to provide online reservation functions. It is crucial to implement an efficient and real-time online reservation system. This article will introduce how to use WebSocket and JavaScript to implement an online reservation system, and provide specific code examples. 1. What is WebSocket? WebSocket is a full-duplex method on a single TCP connection.

JavaScript and WebSocket: Building an efficient real-time weather forecasting system JavaScript and WebSocket: Building an efficient real-time weather forecasting system Dec 17, 2023 pm 05:13 PM

JavaScript and WebSocket: Building an efficient real-time weather forecast system Introduction: Today, the accuracy of weather forecasts is of great significance to daily life and decision-making. As technology develops, we can provide more accurate and reliable weather forecasts by obtaining weather data in real time. In this article, we will learn how to use JavaScript and WebSocket technology to build an efficient real-time weather forecast system. This article will demonstrate the implementation process through specific code examples. We

Simple JavaScript Tutorial: How to Get HTTP Status Code Simple JavaScript Tutorial: How to Get HTTP Status Code Jan 05, 2024 pm 06:08 PM

JavaScript tutorial: How to get HTTP status code, specific code examples are required. Preface: In web development, data interaction with the server is often involved. When communicating with the server, we often need to obtain the returned HTTP status code to determine whether the operation is successful, and perform corresponding processing based on different status codes. This article will teach you how to use JavaScript to obtain HTTP status codes and provide some practical code examples. Using XMLHttpRequest

How to use insertBefore in javascript How to use insertBefore in javascript Nov 24, 2023 am 11:56 AM

Usage: In JavaScript, the insertBefore() method is used to insert a new node in the DOM tree. This method requires two parameters: the new node to be inserted and the reference node (that is, the node where the new node will be inserted).

JavaScript and WebSocket: Building an efficient real-time image processing system JavaScript and WebSocket: Building an efficient real-time image processing system Dec 17, 2023 am 08:41 AM

JavaScript is a programming language widely used in web development, while WebSocket is a network protocol used for real-time communication. Combining the powerful functions of the two, we can create an efficient real-time image processing system. This article will introduce how to implement this system using JavaScript and WebSocket, and provide specific code examples. First, we need to clarify the requirements and goals of the real-time image processing system. Suppose we have a camera device that can collect real-time image data

See all articles