node.js - Node crawler: request with a proxy always throws an error, please help
学习ing 2017-06-21 10:12:18

I use request to crawl images. To avoid getting my IP banned, I route requests through a proxy, but with the proxy enabled I always get an error. I'm using the request and async modules in Node.js.

const fs = require('fs');
const request = require('request');

function download(item, cb) {
  request({
    url: item.img,
    proxy: proxys[Math.random() * proxys.length | 0], // pick a random proxy
    method: 'GET',
    timeout: 5000
  }, function (err, response, body) {
    if (response && response.statusCode == 200) {
      cb(null, item);
    }
  }).on('error', function () {
    console.log('Download error, possibly a pipe problem; retrying...');
    download(item, cb);
    // cb(null, item);
  }).pipe(fs.createWriteStream(fileDir2 + item.name + '.' + item.url_token + '.jpg'));
}
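One pattern that avoids both infinite retries and a never-fired callback is a wrapper that caps the number of attempts and guarantees cb is called exactly once per item. This is a hypothetical sketch, not the original code: `downloadWithRetry` and `attemptDownload` are illustrative names, and the downloader is simulated (synchronously, for brevity) instead of using request.

```javascript
// Hypothetical sketch: cap retries and always call cb exactly once per item,
// so async.eachLimit never stalls on a permanently failing download.
function downloadWithRetry(item, cb, maxAttempts = 3) {
  let attempt = 0;
  function tryOnce() {
    attempt++;
    attemptDownload(item, function (err) {
      if (!err) return cb(null, item);            // success: release the slot
      if (attempt >= maxAttempts) return cb(err); // give up, but still call cb
      tryOnce();                                  // otherwise retry
    });
  }
  tryOnce();
}

// Simulated downloader (stands in for request + pipe): fails twice, then succeeds.
let calls = 0;
function attemptDownload(item, done) {
  calls++;
  done(calls < 3 ? new Error('proxy CONNECT failed') : null);
}

downloadWithRetry({ img: 'http://example.com/a.jpg' }, function (err, item) {
  console.log(err ? 'failed' : 'ok after ' + calls + ' attempts');
});
// prints "ok after 3 attempts"
```

Because each attempt would also need its own write stream, the real request/pipe logic belongs inside `attemptDownload`, so a retry opens a fresh stream instead of reusing the one created on the first call.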

download(item, cb) is called from async's control flow, where cb is the flow callback:

async.eachLimit(items,10,function(item,cb){
    download(item,cb);
},function(){...})

Every time, after downloading a few files, it throws an error and execution stops:

throw new assert.AssertionError({
  ^
AssertionError: 258 == 0
at ClientRequest.onConnect (C:\Users\fox\WebstormProjects\nodejs\实战\爬虫\node_modules\tunnel-agent\index.js:160:14)

If I remove the proxy option, nothing goes wrong. And if I change download so that on error it no longer retries but calls cb() directly, no error is thrown even when the request fails:

.on('error', function () {
  console.log('Download error, possibly a pipe problem; retrying...');
  // download(item, cb);
  cb(null, item);
})

Please take a look and see if you can help me solve this. I've been stuck on it and troubleshooting for a long time, and I can't figure out why.

2 answers
伊谢尔伦

I built almost the same thing before: bulk-downloading a lot of images directly. It downloaded part of them and then threw an error. In the end I wrapped the call in a setTimeout, roughly:

setTimeout(function(){
    download(item, cb);
},400);

This actually worked well. I wrote a blog post about it, "nodejs batch downloading pictures", which you can refer to.

过去多啦不再A梦

For this kind of problem the program needs a retry mechanism.
A good retry mechanism increases the sleep time appropriately on each subsequent attempt, to give the request a better chance of succeeding.
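The retry-with-increasing-sleep idea can be sketched as a small helper. This is an illustrative example, not code from the thread: `retryWithBackoff` and the delay values are assumptions, showing exponential backoff on each failed attempt.

```javascript
// Retry a task, doubling the wait before each new attempt (exponential backoff).
function retryWithBackoff(task, maxAttempts, baseDelayMs, cb) {
  function attempt(n) {
    task(function (err, result) {
      if (!err) return cb(null, result);
      if (n >= maxAttempts) return cb(err);
      const delay = baseDelayMs * Math.pow(2, n - 1); // e.g. 400, 800, 1600, ...
      setTimeout(() => attempt(n + 1), delay);
    });
  }
  attempt(1);
}

// Demo: a task that fails on its first two calls, then succeeds.
let tries = 0;
retryWithBackoff(function (done) {
  tries++;
  done(tries < 3 ? new Error('transient failure') : null, 'downloaded');
}, 5, 10, function (err, result) {
  console.log(result, 'after', tries, 'tries');
});
```

In a crawler, `task` would wrap one download attempt; the growing delay gives an overloaded proxy or rate limiter time to recover instead of hammering it immediately.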
