
Crawling website images with Node.js

韦小宝
Release: 2017-12-16 09:15:53

This article walks through an example of using Node.js to crawl the images on a website. Readers who are interested may want to bookmark it.

Principle:

A crawler is a clearly I/O-intensive application, so Node is a natural choice: its non-blocking I/O keeps the time spent waiting on requests small, which makes this kind of data collection more convenient.

Use the express module to build the Node service,

use the request module to fetch the HTML of the target page,

and use the cheerio module to process the HTML (cheerio's syntax is similar to jQuery's, so it is easy to pick up).

Environment configuration:


npm install express request cheerio --save


(1) Import the modules


var https = require('https'); // the target page is served over HTTPS, so use https rather than http
var request = require('request');
var cheerio = require('cheerio');
var fs = require('fs'); // used for file operations
var url = 'https://movie.douban.com/cinema/nowplaying/beijing/'; // the page to crawl


(2) Send the request

https.get(url, function (res) {
  var html = '';
  res.setEncoding('utf-8'); // prevent garbled Chinese characters
  res.on('data', function (chunk) {
    html += chunk; // listen for the data event and append one chunk at a time
  });
  res.on('end', function () {
    var $ = cheerio.load(html); // once all the data has arrived, parse the HTML
    // save the images found on the page into the images folder
    $('.mod-bd img').each(function (index, item) {
      // read the image's name and source URL
      var imgName = $(this).parent().next().text().trim();
      var imgfile = imgName + '.jpeg';
      var imgSrc = $(this).attr('src');
      // use the request module to ask the server for the image resource
      request.head(imgSrc, function (error, res, body) {
        if (error) {
          console.log('Request failed: ' + imgSrc);
        }
      });
      // pipe the image into the local images folder with the fs module
      // (the ./images folder must already exist)
      request(imgSrc).pipe(fs.createWriteStream('./images/' + imgfile));
    });
  });
});
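The crawler above uses the scraped text directly as a file name and passes the src attribute straight to request. As a minimal sketch, and assuming the page may contain relative or protocol-relative image paths and titles with characters that are not valid in file names, the helpers below show one way to normalize both using only Node's built-in modules. The names resolveImageUrl, safeFileName and imagePath, and the host img.example.com, are hypothetical and do not come from the original article.

```javascript
// Hypothetical helpers, not part of the original crawler.
var path = require('path');

// Resolve an <img> src against the page URL so that relative and
// protocol-relative paths become absolute URLs that can be requested.
function resolveImageUrl(src, pageUrl) {
  return new URL(src, pageUrl).href;
}

// Turn scraped text into a name that is safe to write to disk:
// replace path separators and other reserved characters with "_",
// collapse runs of whitespace, and fall back to a default if nothing is left.
function safeFileName(text, fallback) {
  var cleaned = String(text)
    .replace(/[\\\/:*?"<>|]/g, '_')
    .replace(/\s+/g, ' ')
    .trim();
  return cleaned || fallback;
}

// Build the local path for one image: <dir>/<safe name>.jpeg
function imagePath(dir, text, index) {
  return path.join(dir, safeFileName(text, 'image-' + index) + '.jpeg');
}

var pageUrl = 'https://movie.douban.com/cinema/nowplaying/beijing/';
// img.example.com is a placeholder host used only for illustration
console.log(resolveImageUrl('//img.example.com/poster.jpg', pageUrl));
console.log(resolveImageUrl('/static/poster.jpg', pageUrl));
console.log(imagePath('images', '  Some / Movie: Part 2 \n', 0));
```

Inside the each() callback, imgSrc could then be passed through resolveImageUrl(imgSrc, url) and the write stream opened on imagePath('./images', imgName, index), so that an empty or unusual title does not produce an unwritable path.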

That is everything for this article. I hope it is helpful!

source:php.cn