I'm new to Node and want to write a crawler that collects Sina Weibo comments. The page is generated dynamically by JavaScript, so it can't be scraped with the http module alone. I switched to PhantomJS instead, but it still doesn't work. I had heard PhantomJS would be slower, but the script has now been running for nearly 15 minutes, which seems far too slow, so I wonder whether I wrote it incorrectly. Is there a way to crawl pages like Sina Weibo?
let page = require("webpage").create();
let url = "http://weibo.com/1713926427/Etq2WnSiR?filter=hot&root_comment_id=0&type=comment";
/*page.settings = {
    javascriptEnabled: true,
    loadImages: false,
    webSecurityEnabled: false,
    userAgent: 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.137 Safari/537.36 LBBROWSER'
};*/
page.open(url, (status) => {
    console.log("Status:" + status);
    if (status == "success") {
        let val = page.evaluate(() => {
            var list_box = document.querySelector(".list_box");
            console.log(list_box);
            return list_box
        });
        console.log(val)
    } else {
        console.log("failed")
    }
    phantom.exit();
});
I have written a Weibo crawler before. There are two approaches:
1. If you look carefully at the requests the page makes, there should be an interface that returns the comment data. Request it directly and use regular expressions to pull out the parts you need.
2. Weibo provides a developer API, although it is more troublesome to use.
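For the first approach, the parsing step might look roughly like the sketch below. It is only an illustration: I am assuming the interface returns JSON with the rendered comment HTML under `data.html`, and that each comment sits in a `<div class="WB_text">` element. Both the response shape and the class name are assumptions, so check the real response in your browser's network panel before relying on them. The function name `extractComments` and the sample payload are made up for the example.

```javascript
// Sketch of approach 1: pull comment text out of an interface response with a regex.
// ASSUMPTION: the response is JSON shaped like { data: { html: "..." } } and each
// comment is wrapped in <div class="WB_text">...</div>. Both are hypothetical.

function extractComments(responseText) {
  const payload = JSON.parse(responseText);
  const html = (payload.data && payload.data.html) || "";

  // Non-greedy match up to the first closing </div>; nested <div>s would need a real HTML parser.
  const commentPattern = /<div class="WB_text">([\s\S]*?)<\/div>/g;
  const comments = [];
  let match;
  while ((match = commentPattern.exec(html)) !== null) {
    // Strip any nested tags and collapse runs of whitespace.
    const text = match[1].replace(/<[^>]+>/g, "").replace(/\s+/g, " ").trim();
    comments.push(text);
  }
  return comments;
}

// Made-up sample payload standing in for a real response.
const sampleResponse = JSON.stringify({
  data: {
    html:
      '<div class="WB_text"><a href="#">alice</a>: first comment</div>' +
      '<div class="WB_text"><a href="#">bob</a>: second   comment</div>',
  },
});

console.log(extractComments(sampleResponse));
```

In a real crawler you would fetch the interface with the http module (probably with a logged-in cookie) and pass the response body to `extractComments`. If the markup turns out to be more deeply nested, an HTML parser is likely a safer choice than a regex.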