How to operate node to achieve crawler effect-JS Tutorial-php.cn

Home

Web Front-end

JS Tutorial

How to operate node to achieve crawler effect

php中世界最好的语言

Jun 01, 2018 am 11:07 AM

node accomplish reptile

This time I will show you how to operate the node to achieve the crawler effect. What are the precautions for operating the node to achieve the crawler effect? The following is a practical case, let's take a look.

Node is a server-side language, so you can crawl the website like Python. Let’s use node to crawl the blog park and get all the chapter information.

Step one: Create the crawl file and then npm init.

Second step: Create the crawl.js file. A simple code to crawl the entire page is as follows:

var http = require("http");
var url = "http://www.cnblogs.com";
http.get(url, function (res) {
  var html = "";
  res.on("data", function (data) {
    html += data;
  });
  res.on("end", function () {
    console.log(html);
  });
}).on("error", function () {
  console.log("获取课程结果错误！");
});

Copy after login

Introduce the http module, and then use http The get request of the object, that is, once it is run, it is equivalent to the node server sending a get request to request this page, and then returning it through res, where the on binding data event is used to continuously receive data, and at the end we print it out in the background .

This is just a part of the entire page. We can inspect the elements on this page and find that they are indeed the same.

We only need to crawl the chapter title and the information of each section. .

The third step: Introduce the cheerio module, as follows: (Just install it in gitbash, cmd always has problems)

cnpm install cheerio --save-dev

Copy after login

The introduction of this module is for It is convenient for us to operate dom, just like jQuery.

Step 4: Operate dom and obtain useful information.

var http = require("http");
var cheerio = require("cheerio");
var url = "http://www.cnblogs.com";
function filterData(html) {
  var $ = cheerio.load(html); 
  var items = $(".post_item");
  var result = [];
  items.each(function (item) {
    var tit = $(this).find(".titlelnk").text();
    var aut = $(this).find(".lightblue").text();
    var one = {
      title: tit,
      author: aut
    };
    result.push(one);
  });
  return result;
}
function printInfos(allInfos) {
  allInfos.forEach(function (item) {
    console.log("文章题目 " + item["title"] + '\n' + "文章作者 " + item["author"] + '\n'+ '\n');
  });
}
http.get(url, function (res) {
  var html = "";
  res.on("data", function (data) {
    html += data;
  });
  res.on("end", function (data) {
    var allInfos = filterData(html);
    printInfos(allInfos);
  });
}).on("error", function () {
  console.log("爬取博客园首页失败")
});

Copy after login

That is, the above process is crawling the title and author of the blog.

The final background output is as follows:

This is consistent with the content of the blog homepage:

I believe you have mastered the method after reading the case in this article. For more exciting information, please pay attention to other related articles on the php Chinese website!

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

What's New in Windows 11 KB5054979 & How to Fix Update Issues

3 weeks ago By DDD

How to fix KB5055523 fails to install in Windows 11?

2 weeks ago By DDD

InZoi: How To Apply To School And University

4 weeks ago By DDD

How to fix KB5055518 fails to install in Windows 10?

2 weeks ago By DDD

Roblox: Dead Rails – How To Summon And Defeat Nikola Tesla

1 months ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Where is the login entrance for gmail email?

7861

Java Tutorial

1649

CakePHP Tutorial

1403

Laravel Tutorial

1300

PHP Tutorial

1242

Related knowledge

How to implement dual WeChat login on Huawei mobile phones? Mar 24, 2024 am 11:27 AM

How to implement dual WeChat login on Huawei mobile phones? With the rise of social media, WeChat has become one of the indispensable communication tools in people's daily lives. However, many people may encounter a problem: logging into multiple WeChat accounts at the same time on the same mobile phone. For Huawei mobile phone users, it is not difficult to achieve dual WeChat login. This article will introduce how to achieve dual WeChat login on Huawei mobile phones. First of all, the EMUI system that comes with Huawei mobile phones provides a very convenient function - dual application opening. Through the application dual opening function, users can simultaneously

PHP Programming Guide: Methods to Implement Fibonacci Sequence Mar 20, 2024 pm 04:54 PM

The programming language PHP is a powerful tool for web development, capable of supporting a variety of different programming logics and algorithms. Among them, implementing the Fibonacci sequence is a common and classic programming problem. In this article, we will introduce how to use the PHP programming language to implement the Fibonacci sequence, and attach specific code examples. The Fibonacci sequence is a mathematical sequence defined as follows: the first and second elements of the sequence are 1, and starting from the third element, the value of each element is equal to the sum of the previous two elements. The first few elements of the sequence

How to implement the WeChat clone function on Huawei mobile phones Mar 24, 2024 pm 06:03 PM

How to implement the WeChat clone function on Huawei mobile phones With the popularity of social software and people's increasing emphasis on privacy and security, the WeChat clone function has gradually become the focus of people's attention. The WeChat clone function can help users log in to multiple WeChat accounts on the same mobile phone at the same time, making it easier to manage and use. It is not difficult to implement the WeChat clone function on Huawei mobile phones. You only need to follow the following steps. Step 1: Make sure that the mobile phone system version and WeChat version meet the requirements. First, make sure that your Huawei mobile phone system version has been updated to the latest version, as well as the WeChat App.

Pi Node Teaching: What is a Pi Node? How to install and set up Pi Node? Mar 05, 2025 pm 05:57 PM

Detailed explanation and installation guide for PiNetwork nodes This article will introduce the PiNetwork ecosystem in detail - Pi nodes, a key role in the PiNetwork ecosystem, and provide complete steps for installation and configuration. After the launch of the PiNetwork blockchain test network, Pi nodes have become an important part of many pioneers actively participating in the testing, preparing for the upcoming main network release. If you don’t know PiNetwork yet, please refer to what is Picoin? What is the price for listing? Pi usage, mining and security analysis. What is PiNetwork? The PiNetwork project started in 2019 and owns its exclusive cryptocurrency Pi Coin. The project aims to create a one that everyone can participate

Master how Golang enables game development possibilities Mar 16, 2024 pm 12:57 PM

In today's software development field, Golang (Go language), as an efficient, concise and highly concurrency programming language, is increasingly favored by developers. Its rich standard library and efficient concurrency features make it a high-profile choice in the field of game development. This article will explore how to use Golang for game development and demonstrate its powerful possibilities through specific code examples. 1. Golang’s advantages in game development. As a statically typed language, Golang is used in building large-scale game systems.

How to implement exact division operation in Golang Feb 20, 2024 pm 10:51 PM

Implementing exact division operations in Golang is a common need, especially in scenarios involving financial calculations or other scenarios that require high-precision calculations. Golang's built-in division operator "/" is calculated for floating point numbers, and sometimes there is a problem of precision loss. In order to solve this problem, we can use third-party libraries or custom functions to implement exact division operations. A common approach is to use the Rat type from the math/big package, which provides a representation of fractions and can be used to implement exact division operations.

PHP Game Requirements Implementation Guide Mar 11, 2024 am 08:45 AM

PHP Game Requirements Implementation Guide With the popularity and development of the Internet, the web game market is becoming more and more popular. Many developers hope to use the PHP language to develop their own web games, and implementing game requirements is a key step. This article will introduce how to use PHP language to implement common game requirements and provide specific code examples. 1. Create game characters In web games, game characters are a very important element. We need to define the attributes of the game character, such as name, level, experience value, etc., and provide methods to operate these

Using PHP to implement SaaS: a comprehensive analysis Mar 07, 2024 pm 10:18 PM

I'm really sorry that I can't provide real-time programming guidance, but I can provide you with a code example to give you a better understanding of how to use PHP to implement SaaS. The following is an article within 1,500 words, titled "Using PHP to implement SaaS: A comprehensive analysis." In today's information age, SaaS (Software as a Service) has become the mainstream way for enterprises and individuals to use software. It provides a more flexible and convenient way to access software. With SaaS, users don’t need to be on-premises

See all articles