javascript - Crawling Weibo with Node
仅有的幸福 2017-06-30 10:00:08

I am new to Node and want to write a crawler that collects Sina Weibo comments. The page is generated dynamically by JavaScript, so the comments cannot be retrieved with the http module. I switched to PhantomJS instead. I had heard it would be slower, but the script has been running for nearly 15 minutes, which seems far too slow, and I wonder whether I wrote it incorrectly. It still does not work. Is there a way to crawl pages like Sina Weibo?

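// This is a PhantomJS script: run it with `phantomjs script.js`, not with node.
// PhantomJS's script engine predates most ES6 syntax, so var/function are used.
var page = require("webpage").create();
var url = "http://weibo.com/1713926427/Etq2WnSiR?filter=hot&root_comment_id=0&type=comment";
/*page.settings = {
    javascriptEnabled: true,
    loadImages: false,
    webSecurityEnabled: false,
    userAgent: 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.137 Safari/537.36 LBBROWSER'
};*/
// Forward console.log calls made inside the page to the terminal.
page.onConsoleMessage = function (msg) {
    console.log("page: " + msg);
};
page.open(url, function (status) {
    console.log("Status: " + status);
    if (status !== "success") {
        console.log("failed");
        phantom.exit(1);
        return;
    }
    // The comments are rendered by scripts after the load event, so wait
    // before reading the DOM. The 5-second delay is an arbitrary guess.
    setTimeout(function () {
        var html = page.evaluate(function () {
            var listBox = document.querySelector(".list_box");
            // A DOM node cannot be passed back out of evaluate(); return a string.
            return listBox ? listBox.innerHTML : null;
        });
        console.log(html);
        phantom.exit();
    }, 5000);
});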

All replies (1)
扔个三星炸死你

I have written a Weibo crawler before. There are two approaches:

  1. If you look carefully at the requests the page makes, there should be an interface that returns the corresponding data. Request it directly, then use regular expressions to extract what you need.

  2. Weibo provides a developer API, although it is more cumbersome to use.
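
The first approach could look roughly like the sketch below. It only covers the extraction step and runs offline against a made-up sample. The response shape (`{"data": {"html": "..."}}`) and the class name `comment_text` are assumptions for illustration, not Weibo's documented format, and the function names are hypothetical. Inspect the real request in the browser's network panel and adjust both.

```javascript
// Sketch only: the response shape and the class name are assumptions.

// Collect the text of every comment in an HTML fragment using a regex.
// "comment_text" is a hypothetical class name; replace it with the real one.
function extractComments(html) {
    const pattern = /<div class="comment_text">([\s\S]*?)<\/div>/g;
    const comments = [];
    let match;
    while ((match = pattern.exec(html)) !== null) {
        // Strip nested tags, collapse whitespace, and trim the result.
        const text = match[1]
            .replace(/<[^>]+>/g, "")
            .replace(/\s+/g, " ")
            .trim();
        comments.push(text);
    }
    return comments;
}

// Parse a JSON body of the assumed form {"data": {"html": "..."}}.
function parseCommentResponse(body) {
    const payload = JSON.parse(body);
    const html = payload && payload.data && payload.data.html;
    return typeof html === "string" ? extractComments(html) : [];
}

// Offline sample standing in for a real response body.
const sampleBody = JSON.stringify({
    data: {
        html: '<div class="comment_text"><a href="#">user1</a>: first</div>' +
              '<div class="comment_text">user2:  second </div>'
    }
});

console.log(parseCommentResponse(sampleBody));
```

In a real crawler the body would come from an HTTP request to the interface found in the network panel, probably with the cookies of a logged-in session attached. A regex like this breaks on nested `div` elements, so an HTML parser may be more robust if the markup is complex.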
