
How to crawl website images in nodejs

亚连
Release: 2018-06-20 16:40:05

This article walks through an example of crawling images from a website with Node.js. Save it for later if you find it useful.

Principle:

A crawler is a typical I/O-intensive application, which makes Node.js a natural choice: its non-blocking I/O keeps the cost of waiting on the network and the disk low, so collecting data is convenient.

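Use the express module to build the Node service (the script in this article does not actually call express; it runs as a plain Node script)

Use the request module to fetch content from the target site (in the code below it downloads the images, while the page HTML is fetched with Node's built-in HTTP client)

Use the cheerio module to process the HTML (cheerio has a jQuery-like syntax, so it is easy to pick up)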
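
To make the point about low I/O waiting overhead concrete, here is a minimal sketch that uses only timers, with no real network access. The names `simulateDownload` and `downloadAll` are hypothetical stand-ins invented for this illustration, not part of any library. Three simulated downloads of about 100 ms each are started together, and because their waits overlap they finish in roughly the time of one.

```javascript
// Hypothetical stand-in for a network request:
// resolves with `name` after `ms` milliseconds.
function simulateDownload(name, ms) {
  return new Promise(function (resolve) {
    setTimeout(function () { resolve(name); }, ms);
  });
}

// Start every download immediately and wait for all of them to finish.
function downloadAll(tasks) {
  return Promise.all(tasks.map(function (task) {
    return simulateDownload(task.name, task.ms);
  }));
}

const tasks = [
  { name: 'a.jpeg', ms: 100 },
  { name: 'b.jpeg', ms: 100 },
  { name: 'c.jpeg', ms: 100 }
];

const startedAt = Date.now();
downloadAll(tasks).then(function (names) {
  // The waits overlap, so the elapsed time is close to the longest
  // single wait rather than the sum of all three.
  console.log(names.join(', ') + ' finished in about ' + (Date.now() - startedAt) + ' ms');
});
```

A real crawler behaves the same way: while one image is still in flight, Node can issue the next request instead of blocking on the first.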

Environment configuration:

npm install express request cheerio --save

(1) Import the modules

var https = require('https');     // the target URL is HTTPS, so use https instead of http
var request = require('request');
var cheerio = require('cheerio');
var fs = require('fs');           // file system operations
var url = 'https://movie.douban.com/cinema/nowplaying/beijing/'; // the page to crawl

(2) Send the request

https.get(url, function (res) {
  var html = '';
  res.setEncoding('utf-8'); // avoid garbled Chinese characters
  res.on('data', function (chunk) {
    html += chunk; // the data event delivers the body one chunk at a time
  });
  res.on('end', function () {
    var $ = cheerio.load(html); // parse the HTML once the whole page has arrived
    // save each image into the images folder (the folder must already exist)
    $('.mod-bd img').each(function (index, item) {
      // build a file name from the text next to the image and read the image URL
      var imgName = $(this).parent().next().text().trim();
      var imgfile = imgName + '.jpeg';
      var imgSrc = $(this).attr('src');
      // use the request module to ask the server for the image resource
      request.head(imgSrc, function (error, response, body) {
        if (error) {
          console.log('Failed');
        }
      });
      // pipe the image into a local file under ./images with the fs module
      request(imgSrc).pipe(fs.createWriteStream('./images/' + imgfile));
    });
  });
});
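
The script above writes to ./images and builds the file name from text scraped off the page, so it fails if the folder is missing or the title contains a character such as `/`. The following is a minimal sketch of one way to guard against both, using only Node's built-in modules. The helpers `safeFileName`, `extensionOf`, and `targetPath` are hypothetical names introduced for this illustration. The sketch assumes a Node version that supports `fs.mkdirSync` with `recursive: true`, and assumes the image `src` is an absolute URL.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: turn a scraped title into a file name that is safe to write.
function safeFileName(title, fallback) {
  // Replace path separators, whitespace, and characters that are
  // invalid on common file systems, then trim stray underscores.
  const cleaned = String(title)
    .replace(/[\\\/:*?"<>|\s]+/g, '_')
    .replace(/^_+|_+$/g, '');
  return cleaned || fallback;
}

// Hypothetical helper: take the extension from the image URL, defaulting to .jpeg.
// Assumes imgSrc is an absolute URL; a relative one makes `new URL` throw.
function extensionOf(imgSrc) {
  const ext = path.extname(new URL(imgSrc).pathname);
  return ext || '.jpeg';
}

// Create the output folder if it is missing, then build the destination path.
function targetPath(dir, title, imgSrc, index) {
  fs.mkdirSync(dir, { recursive: true });
  return path.join(dir, safeFileName(title, 'image-' + index) + extensionOf(imgSrc));
}
```

In the crawler, `'./images/' + imgfile` could then be replaced with `targetPath('./images', imgName, imgSrc, index)`.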

That is everything I have put together on this topic. I hope it proves helpful.

source:php.cn