This article shows how to build a simple web crawler with Node.js and what to watch out for along the way. Let's walk through a practical example.
Node.js is a server-side JavaScript runtime, so it can crawl websites much as Python can. Here we use Node to crawl the cnblogs home page (博客园, "Blog Park") and extract the title and author of every article listed there.
Step 1: Create a crawl directory and run npm init inside it.
Step 2: Create a crawl.js file. A simple script that fetches the entire page looks like this:
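var http = require("http");
var url = "http://www.cnblogs.com";

// Request the page and collect the response body chunk by chunk.
// If the site redirects to HTTPS, switch to require("https") and an https:// URL.
http.get(url, function (res) {
  var html = "";
  // Decode as UTF-8 so multi-byte characters are not split across chunks.
  res.setEncoding("utf8");
  res.on("data", function (data) {
    html += data;
  });
  // Once the whole body has arrived, print the raw HTML.
  res.on("end", function () {
    console.log(html);
  });
}).on("error", function () {
  console.log("Failed to fetch the page!");
});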
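The basic request above does not check the status code and cannot follow a redirect. As a rough sketch, assuming the target site may answer with a 3xx redirect (for example to an HTTPS address), the fetch could be wrapped in a Promise along these lines. The names fetchPage, pickClient and maxRedirects are made up for this example and are not part of Node's API.

```javascript
// Sketch: fetch a page as a Promise, choosing http or https from the URL
// and following a limited number of redirects.
// fetchPage, pickClient and maxRedirects are hypothetical names.
var http = require("http");
var https = require("https");

// Return the core module that matches the URL's protocol.
function pickClient(pageUrl) {
  return new URL(pageUrl).protocol === "https:" ? https : http;
}

function fetchPage(pageUrl, maxRedirects) {
  if (maxRedirects === undefined) {
    maxRedirects = 3; // assumed default, chosen arbitrarily
  }
  return new Promise(function (resolve, reject) {
    pickClient(pageUrl).get(pageUrl, function (res) {
      var status = res.statusCode;
      // 3xx with a Location header: follow it (relative URLs are resolved).
      if (status >= 300 && status < 400 && res.headers.location) {
        res.resume(); // discard the redirect body
        if (maxRedirects <= 0) {
          reject(new Error("Too many redirects"));
          return;
        }
        var next = new URL(res.headers.location, pageUrl).href;
        resolve(fetchPage(next, maxRedirects - 1));
        return;
      }
      if (status !== 200) {
        res.resume();
        reject(new Error("Unexpected status code: " + status));
        return;
      }
      // Decode as UTF-8 and accumulate the body.
      res.setEncoding("utf8");
      var html = "";
      res.on("data", function (chunk) { html += chunk; });
      res.on("end", function () { resolve(html); });
    }).on("error", reject);
  });
}
```

It could then be called as fetchPage(url).then(function (html) { console.log(html); }), with a .catch handler taking the place of the "error" listener.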
Step 3: Install the cheerio module with the command below. (I ran it in Git Bash; cmd kept giving me problems.)
cnpm install cheerio --save-dev
Step 4: Work with the DOM to extract the useful information.
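var http = require("http");
var cheerio = require("cheerio");
var url = "http://www.cnblogs.com";

// Parse the page and collect the title and author of every listed post.
// The selectors match the page markup when this was written; adjust them
// if the site's HTML has changed.
function filterData(html) {
  var $ = cheerio.load(html);
  var items = $(".post_item");
  var result = [];
  // each() passes (index, element); `this` is the current element.
  items.each(function () {
    var tit = $(this).find(".titlelnk").text();
    var aut = $(this).find(".lightblue").text();
    result.push({ title: tit, author: aut });
  });
  return result;
}

// Print one title/author pair per article.
function printInfos(allInfos) {
  allInfos.forEach(function (item) {
    console.log("Article title: " + item.title + "\n" +
      "Article author: " + item.author + "\n\n");
  });
}

http.get(url, function (res) {
  var html = "";
  // Decode as UTF-8 so multi-byte characters are not split across chunks.
  res.setEncoding("utf8");
  res.on("data", function (data) {
    html += data;
  });
  // The "end" event carries no arguments; parse once the body is complete.
  res.on("end", function () {
    var allInfos = filterData(html);
    printInfos(allInfos);
  });
}).on("error", function () {
  console.log("Failed to crawl the cnblogs home page");
});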
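Text pulled out of HTML often carries stray whitespace and line breaks, and some matched elements may turn out to be empty. As a small sketch, assuming records shaped like the { title, author } objects that filterData returns, the cleanup and the formatting could be kept apart from the printing. The helpers cleanText, cleanInfos and formatInfos are hypothetical names, and the sample data is invented.

```javascript
// Sketch: normalise scraped records and build the report as a string.
// cleanText, cleanInfos and formatInfos are hypothetical helpers,
// not part of cheerio or Node.

// Collapse runs of whitespace into one space and trim both ends.
function cleanText(text) {
  return String(text).replace(/\s+/g, " ").trim();
}

// Return cleaned copies of the records, dropping entries with no title.
function cleanInfos(allInfos) {
  return allInfos
    .map(function (item) {
      return { title: cleanText(item.title), author: cleanText(item.author) };
    })
    .filter(function (item) {
      return item.title !== "";
    });
}

// Build one text block per article, separated by a blank line.
function formatInfos(allInfos) {
  return allInfos
    .map(function (item) {
      return "Article title: " + item.title + "\n" +
        "Article author: " + item.author;
    })
    .join("\n\n");
}

// Invented sample input, shaped like the array filterData returns.
var sample = [
  { title: "  Node crawler\n basics ", author: " alice " },
  { title: "   ", author: "nobody" }
];

console.log(formatInfos(cleanInfos(sample)));
```

Returning a string instead of logging inside the loop makes it easy to check the output or write it to a file later.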