
Node crawl data example: grab the Pokémon illustrated book and generate an Excel file

青灯夜游
Release: 2022-10-10 19:28:44

How do you use Node.js to crawl data from a web page and write it into an Excel file? This article walks through an example: crawling web page data with Node.js and generating an Excel file from it. I hope you find it helpful!


I believe Pokémon is a childhood memory for many people born in the 90s. As a programmer, I have wanted to make a Pokémon game more than once, but before doing that I should first sort out how many Pokémon there are, along with their numbers, names, types, and other information. In this article, we will use Node.js to crawl Pokémon data from a web page, generate an Excel file from that data, and finally read the Excel file back through an API.

Crawling data

Since we are crawling data, let's first find a web page with Pokédex data, as shown below:

[Screenshot: the Pokédex web page]

This website is written in PHP without front-end/back-end separation, so there is no API we can call for the data. Instead, we use the crawler library to fetch the page content and extract the data from its elements. Let me explain up front: the advantage of the crawler library is that it lets you use jQuery-style selectors in a Node environment.

Installation:

yarn add crawler

Implementation:

const Crawler = require("crawler");
const fs = require("fs")
const { resolve } = require("path")

let crawler = new Crawler({
    timeout: 10000,
    jQuery: true,
});

function getPokemon() {
    let uri = "" // Pokédex page URL
    let data = []
    return new Promise((resolve, reject) => {
        crawler.queue({
            uri,
            callback: (err, res, done) => {
                if (err) { done(); return reject(err); }
                let $ = res.$;
                try {
                    // Each table row holds one Pokémon; pick out the cells we need
                    let $tr = $(".roundy.eplist tr");
                    $tr.each((i, el) => {
                        let $td = $(el).find("td");
                        let _code = $td.eq(1).text().split("\n")[0]  // number
                        let _name = $td.eq(3).text().split("\n")[0]  // name
                        let _attr = $td.eq(4).text().split("\n")[0]  // first type
                        let _other = $td.eq(5).text().split("\n")[0] // possible second type
                        // Unless the next cell is the "属性" label, combine the two types
                        _attr = _other.indexOf("属性") != -1 ? _attr : `${_attr}+${_other}`
                        if (_code) {
                            data.push([_code, _name, _attr])
                        }
                    })
                    done();
                    resolve(data)
                } catch (err) {
                    done()
                    reject(err)
                }

            }
        })
    })
}

When creating the Crawler instance, you need to enable jQuery mode; you can then use $ to match elements in the fetched page. The middle part of the code above extracts the data we need from the page's elements, and it works the same as the jQuery API, so I won't go into detail here.

getPokemon().then(data => {
    console.log(data)
})

Finally, we can run it and print the returned data to verify that it was crawled in the expected format and without errors.
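
Each entry is a [number, name, type] row, so the printed array looks roughly like this (the values below are illustrative, based on the first Pokédex entries):

[
    [ '001', '妙蛙种子', '草+毒' ], // Bulbasaur, Grass/Poison
    [ '002', '妙蛙草', '草+毒' ],   // Ivysaur, Grass/Poison
    [ '003', '妙蛙花', '草+毒' ],   // Venusaur, Grass/Poison
    // ...
]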

[Screenshot: console output of the crawled data]

Writing to Excel

Now that we have the crawled data, we will use the node-xlsx library to write the data out and generate an Excel file.

First, a quick introduction: node-xlsx is a simple Excel file parser and generator. Built with TypeScript, it relies on the SheetJS xlsx module to parse and build Excel worksheets, so the two share some parameter configurations.

Installation:

yarn add node-xlsx

Implementation:

const xlsx = require("node-xlsx")

getPokemon().then(data => {
    let title = ["编号", "宝可梦", "属性"] // header row: No., Pokémon, Type
    let list = [{
        name: "关都", // worksheet name: the Kanto region
        data: [
            title,
            ...data
        ]
    }];
    // column widths in characters (wch) for the three columns
    const sheetOptions = { '!cols': [{ wch: 15 }, { wch: 20 }, { wch: 20 }] };
    const buffer = xlsx.build(list, { sheetOptions }) // build is synchronous and returns a Buffer
    try {
        fs.mkdirSync(resolve(__dirname, "data"), { recursive: true }) // make sure the data folder exists
        fs.writeFileSync(resolve(__dirname, "data/pokemon.xlsx"), buffer) // a Buffer needs no encoding
    } catch (error) {
        console.error(error)
    }
})

Here, name is the worksheet name in the Excel file, and data must be a two-dimensional array: each inner array is one row, and its cells are filled into columns A, B, C... in order. You can also set column widths through !cols; the first object, { wch: 15 }, means the first column is 15 characters wide. There are many more parameters you can set; refer to the xlsx library to learn about these configuration items.
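
Since list is an array of worksheets, a workbook with several sheets is just more { name, data } objects pushed into that array. Here is a minimal sketch building on the snippet above (the second region name is my own illustrative addition):

// One { name, data } object per worksheet; 城都 (Johto) is illustrative
const workbook = [
    { name: "关都", data: [title, ...data] },
    { name: "城都", data: [title /* ...rows for a second region... */] },
];
const multiSheetBuffer = xlsx.build(workbook, { sheetOptions });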

Finally, we generate the Buffer with the xlsx.build method and write it out with fs.writeFileSync to create the Excel file. For easy viewing, I stored it in a folder named data. At this point we find a new file called pokemon.xlsx in the data folder; open it and the data is all there. That completes the step of writing the data into Excel.

[Screenshot: pokemon.xlsx opened in Excel]

Reading Excel

Reading the Excel file back is actually very easy; you don't even need fs. Just pass the file path to the xlsx.parse method to read it directly.

xlsx.parse(resolve(__dirname, "data/pokemon.xlsx"));
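
For reference, xlsx.parse returns the same shape we built earlier: an array of { name, data } worksheets, where data is a two-dimensional array of rows. A small sketch of working with that result (the row-to-object mapping is my own illustrative helper, not part of node-xlsx):

const sheets = xlsx.parse(resolve(__dirname, "data/pokemon.xlsx"));

// First worksheet: header row followed by data rows
const [header, ...rows] = sheets[0].data;

// Turn each row into an object keyed by the header cells
const records = rows.map(row =>
    Object.fromEntries(header.map((key, i) => [key, row[i]]))
);
console.log(records[0]); // e.g. { 编号: '001', 宝可梦: '妙蛙种子', 属性: '草+毒' }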

Of course, to verify the data, let's write an API endpoint and see whether we can access it. For convenience, I use the express framework to accomplish this.

Let’s install it first:

yarn add express

Then, create the express service. I use 3000 as the port number here; all we need is a GET route that sends back the data read from the Excel file.

const express = require("express")
const app = express();
const listenPort = 3000;

app.get("/pokemon", (req, res) => {
    let data = xlsx.parse(resolve(__dirname, "data/pokemon.xlsx"));
    res.send(data)
})

app.listen(listenPort, () => {
    console.log(`Server running at http://localhost:${listenPort}/`)
})
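
One note on the design: parsing the workbook inside the route handler re-reads the file on every request, which is fine for a demo. If the data doesn't change while the server runs, a small variation (same assumed file path) parses once at startup and serves the cached result:

// Parse once at startup and reuse the result for every request
const pokemonData = xlsx.parse(resolve(__dirname, "data/pokemon.xlsx"));

app.get("/pokemon", (req, res) => {
    res.send(pokemonData);
});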

Finally, I access the endpoint with Postman, and you can clearly see all the Pokémon data we have handled, from crawling it to storing it in the table.

[Screenshot: Postman response with the Pokémon data]

Conclusion

As you can see, this article used Pokémon as an example to learn how to use Node.js to crawl data from a web page, how to write that data into an Excel file, and how to read the data back out of the Excel file. None of it is difficult to implement, but it can be quite practical. If you're worried about forgetting it, feel free to save this article~

Source: juejin.cn