How to use node to implement crawler function-JS Tutorial-php.cn

Home

Web Front-end

JS Tutorial

How to use node to implement crawler function

php中世界最好的语言

May 23, 2018 pm 03:29 PM

node Function reptile

This time I will show you how to use node to implement the crawler function, and what are the precautions for using node to implement the crawler function. The following is a practical case, let's take a look.

Node is a server-side language, so you can crawl the website like Python. Let’s use node to crawl the blog park and get all the chapter information.

Step one: Create the crawl file and then npm init.

Second step: Create the crawl.js file. A simple code to crawl the entire page is as follows:

var http = require("http");
var url = "http://www.cnblogs.com";
http.get(url, function (res) {
  var html = "";
  res.on("data", function (data) {
    html += data;
  });
  res.on("end", function () {
    console.log(html);
  });
}).on("error", function () {
  console.log("获取课程结果错误！");
});

Copy after login

Introduce the http module, and then use http The get request of the object, that is, once it is run, it is equivalent to the node server sending a get request to request this page, and then returning it through res, where the on binding data event is used to continuously receive data, and at the end we print it out in the background .

This is just a part of the entire page. We can inspect the elements on this page and find that they are indeed the same.

We only need to crawl the chapter title and the information of each section. .

The third step: Introduce the cheerio module, as follows: (Just install it in gitbash, cmd always has problems)

cnpm install cheerio --save-dev

Copy after login

The introduction of this module is for It is convenient for us to operate dom, just like jQuery.

Step 4: Operate dom and obtain useful information.

var http = require("http");
var cheerio = require("cheerio");
var url = "http://www.cnblogs.com";
function filterData(html) {
  var $ = cheerio.load(html); 
  var items = $(".post_item");
  var result = [];
  items.each(function (item) {
    var tit = $(this).find(".titlelnk").text();
    var aut = $(this).find(".lightblue").text();
    var one = {
      title: tit,
      author: aut
    };
    result.push(one);
  });
  return result;
}
function printInfos(allInfos) {
  allInfos.forEach(function (item) {
    console.log("文章题目 " + item["title"] + '\n' + "文章作者 " + item["author"] + '\n'+ '\n');
  });
}
http.get(url, function (res) {
  var html = "";
  res.on("data", function (data) {
    html += data;
  });
  res.on("end", function (data) {
    var allInfos = filterData(html);
    printInfos(allInfos);
  });
}).on("error", function () {
  console.log("爬取博客园首页失败")
});

Copy after login

That is, the above process is crawling the title and author of the blog.

The final background output is as follows:

This is consistent with the content of the blog homepage:

I believe you have mastered the method after reading the case in this article. For more exciting information, please pay attention to other related articles on the php Chinese website!

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

How to fix KB5055612 fails to install in Windows 10?

4 weeks ago By DDD

Roblox: Grow A Garden - Complete Mutation Guide

3 weeks ago By DDD

Roblox: Bubble Gum Simulator Infinity - How To Get And Use Royal Keys

3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Mandragora: Whispers Of The Witch Tree - How To Unlock The Grappling Hook

3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Nordhold: Fusion System, Explained

3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Java Tutorial

1668

CakePHP Tutorial

1428

Laravel Tutorial

1329

PHP Tutorial

1273

C# Tutorial

1256

Related knowledge

What functions does Doubao app have? Mar 01, 2024 pm 10:04 PM

There will be many AI creation functions in the Doubao app, so what functions does the Doubao app have? Users can use this software to create paintings, chat with AI, generate articles for users, help everyone search for songs, etc. This function introduction of the Doubao app can tell you the specific operation method. The specific content is below, so take a look! What functions does the Doubao app have? Answer: You can draw, chat, write articles, and find songs. Function introduction: 1. Question query: You can use AI to find answers to questions faster, and you can ask any kind of questions. 2. Picture generation: AI can be used to create different pictures for everyone. You only need to tell everyone the general requirements. 3. AI chat: can create an AI that can chat for users,

The difference between vivox100s and x100: performance comparison and function analysis Mar 23, 2024 pm 10:27 PM

Both vivox100s and x100 mobile phones are representative models in vivo's mobile phone product line. They respectively represent vivo's high-end technology level in different time periods. Therefore, the two mobile phones have certain differences in design, performance and functions. This article will conduct a detailed comparison between these two mobile phones in terms of performance comparison and function analysis to help consumers better choose the mobile phone that suits them. First, let’s look at the performance comparison between vivox100s and x100. vivox100s is equipped with the latest

What exactly is self-media? What are its main features and functions? Mar 21, 2024 pm 08:21 PM

With the rapid development of the Internet, the concept of self-media has become deeply rooted in people's hearts. So, what exactly is self-media? What are its main features and functions? Next, we will explore these issues one by one. 1. What exactly is self-media? We-media, as the name suggests, means you are the media. It refers to an information carrier through which individuals or teams can independently create, edit, publish and disseminate content through the Internet platform. Different from traditional media, such as newspapers, television, radio, etc., self-media is more interactive and personalized, allowing everyone to become a producer and disseminator of information. 2. What are the main features and functions of self-media? 1. Low threshold: The rise of self-media has lowered the threshold for entering the media industry. Cumbersome equipment and professional teams are no longer needed.

What are the functions of Xiaohongshu account management software? How to operate a Xiaohongshu account? Mar 21, 2024 pm 04:16 PM

As Xiaohongshu becomes popular among young people, more and more people are beginning to use this platform to share various aspects of their experiences and life insights. How to effectively manage multiple Xiaohongshu accounts has become a key issue. In this article, we will discuss some of the features of Xiaohongshu account management software and explore how to better manage your Xiaohongshu account. As social media grows, many people find themselves needing to manage multiple social accounts. This is also a challenge for Xiaohongshu users. Some Xiaohongshu account management software can help users manage multiple accounts more easily, including automatic content publishing, scheduled publishing, data analysis and other functions. Through these tools, users can manage their accounts more efficiently and increase their account exposure and attention. In addition, Xiaohongshu account management software has

Pi Node Teaching: What is a Pi Node? How to install and set up Pi Node? Mar 05, 2025 pm 05:57 PM

Detailed explanation and installation guide for PiNetwork nodes This article will introduce the PiNetwork ecosystem in detail - Pi nodes, a key role in the PiNetwork ecosystem, and provide complete steps for installation and configuration. After the launch of the PiNetwork blockchain test network, Pi nodes have become an important part of many pioneers actively participating in the testing, preparing for the upcoming main network release. If you don’t know PiNetwork yet, please refer to what is Picoin? What is the price for listing? Pi usage, mining and security analysis. What is PiNetwork? The PiNetwork project started in 2019 and owns its exclusive cryptocurrency Pi Coin. The project aims to create a one that everyone can participate

What is Discuz? Definition and function introduction of Discuz Mar 03, 2024 am 10:33 AM

"Exploring Discuz: Definition, Functions and Code Examples" With the rapid development of the Internet, community forums have become an important platform for people to obtain information and exchange opinions. Among the many community forum systems, Discuz, as a well-known open source forum software in China, is favored by the majority of website developers and administrators. So, what is Discuz? What functions does it have, and how can it help our website? This article will introduce Discuz in detail and attach specific code examples to help readers learn more about it.

PHP Tips: Quickly Implement Return to Previous Page Function Mar 09, 2024 am 08:21 AM

PHP Tips: Quickly implement the function of returning to the previous page. In web development, we often encounter the need to implement the function of returning to the previous page. Such operations can improve the user experience and make it easier for users to navigate between web pages. In PHP, we can achieve this function through some simple code. This article will introduce how to quickly implement the function of returning to the previous page and provide specific PHP code examples. In PHP, we can use $_SERVER['HTTP_REFERER'] to get the URL of the previous page

Detailed explanation of the functions and functions of GDM under Linux Mar 01, 2024 pm 04:18 PM

Detailed explanation of the functions and functions of GDM under Linux In the Linux operating system, GDM (GNOMEDisplayManager) is a graphical login manager that provides an interface for users to log in and log out of the system. GDM is usually part of the GNOME desktop environment, but can be used by other desktop environments as well. The role of GDM is not only to provide a login interface, but also includes user session management, screen saver, automatic login and other functions. The functions of GDM mainly include the following aspects:

See all articles