javascript - Node.js crawler: problems with pagination and language detection
淡淡烟草味 2017-05-16 13:42:31

Website: http://www.everlight.com/news...
Two questions:
1. How do I get the URL of each page (pagination)?
2. When I click through to a news item, for example http://www.everlight.com/news..., the language depends on the system.
If it is an English operating system, the English news is displayed.
If it is a Chinese system, the Chinese news is displayed.
How can I always retrieve the English news when crawling from Node?


All replies (3)
巴扎黑

Question closed...

When the form is posted, several key values are carried in hidden form fields. Setting those fields explicitly in the POST request should solve the problem.
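
The hidden fields mentioned above can be collected from the fetched HTML before posting. The following is only a minimal sketch: `extractHiddenFields` is a hypothetical helper, the sample markup is made up (ASP.NET WebForms pages typically carry fields such as `__VIEWSTATE`, but the real page has not been inspected here), and the regex-based parsing is a simplification that assumes quoted attributes and does not decode HTML entities.

```javascript
// Hypothetical helper: collect name/value pairs of hidden <input> elements.
// A regex is a simplification; a real DOM parser is more robust.
function extractHiddenFields(html) {
  const fields = {};
  const inputTags = html.match(/<input\b[^>]*>/gi) || [];
  for (const tag of inputTags) {
    // Skip anything that is not type="hidden".
    if (!/type\s*=\s*["']hidden["']/i.test(tag)) continue;
    const name = /\bname\s*=\s*["']([^"']*)["']/i.exec(tag);
    const value = /\bvalue\s*=\s*["']([^"']*)["']/i.exec(tag);
    if (name) fields[name[1]] = value ? value[1] : "";
  }
  return fields;
}

// Made-up sample markup in the style of an ASP.NET WebForms page.
const sampleHtml = [
  '<form method="post" action="./page.aspx" id="aspnetForm">',
  '<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />',
  '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc123" />',
  '<input type="text" name="q" value="search" />',
  '</form>'
].join("\n");

console.log(extractHiddenFields(sampleHtml));
```

The returned object can then be merged into the body of the POST request, with individual fields overridden as needed.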

世界只因有你

There is a language switch in the upper right corner. If you look at the code, this function is called:
function __doPostBack(eventTarget, eventArgument) {
    if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
        theForm.__EVENTTARGET.value = eventTarget;
        theForm.__EVENTARGUMENT.value = eventArgument;
        theForm.submit();
    }
}

In other words, clicking the switch just submits the form, and the form is posted back to the same page.
That is why the page flashes after the click while the URL does not change.
So, if you want the English version, send a POST request with the parameter __EVENTTARGET="ctl00$ctl00$lBtnUSA" to get the English version of the page.
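
As a rough illustration of that POST, the sketch below builds the form body and sends it with Node's built-in `http` module. `buildPostbackBody` and `postBack` are hypothetical helper names, `hiddenFields` stands for whatever hidden inputs the fetched page contains (the `__VIEWSTATE` value here is a placeholder), and whether the server also needs cookies or other fields has not been verified.

```javascript
const http = require("http");
const { URL, URLSearchParams } = require("url");

// Build the urlencoded body that __doPostBack would submit.
// hiddenFields: hidden inputs copied from the page (placeholder values below).
function buildPostbackBody(hiddenFields, eventTarget, eventArgument = "") {
  const params = new URLSearchParams(hiddenFields);
  params.set("__EVENTTARGET", eventTarget);
  params.set("__EVENTARGUMENT", eventArgument);
  return params.toString();
}

// POST the body to the same page URL and resolve with the response HTML.
function postBack(pageUrl, body) {
  return new Promise((resolve, reject) => {
    const target = new URL(pageUrl);
    const req = http.request({
      hostname: target.hostname,
      port: target.port || 80,
      path: target.pathname + target.search,
      method: "POST",
      headers: {
        "Content-Type": "application/x-www-form-urlencoded",
        "Content-Length": Buffer.byteLength(body)
      }
    }, (res) => {
      const chunks = [];
      res.on("data", (chunk) => chunks.push(chunk));
      res.on("end", () => resolve(Buffer.concat(chunks).toString("utf8")));
    });
    req.on("error", reject);
    req.end(body);
  });
}

// Placeholder hidden field; the real values come from the fetched page.
const body = buildPostbackBody({ __VIEWSTATE: "abc123" }, "ctl00$ctl00$lBtnUSA");
console.log(body);
```

A call such as `postBack(pageUrl, body).then(html => ...)` would then return the page as rendered after the language switch, assuming the server accepts the submitted fields.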

To get the URLs in a page, load the page and parse the DOM, for example with jsdom:

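// Note: jsdom.env is the callback API of older jsdom releases (before v10).
var jsdom = require("jsdom");

jsdom.env({
  url: "http://www.everlight.com/newsdetail.aspx?pcseq=4&cseq=7&seq=291",
  // Inject jQuery into the loaded page so that $ is available
  scripts: ["http://code.jquery.com/jquery.js"],
  done: function (err, window) {
    if (err) {
      console.error(err);
      return;
    }
    var $ = window.$;
    console.log("Links");
    // Print the text and href of every anchor in the page
    $("a").each(function () {
      var tmp = $(this).text() + "---" + $(this).attr("href");
      console.log(tmp);
    });
  }
});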
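
The href values printed above may be relative, duplicated, or not HTTP links at all. A small sketch of how they could be normalized before crawling, using only Node's built-in `URL` class: `toAbsoluteLinks` is a hypothetical helper and the sample hrefs are made up, not taken from the site.

```javascript
const { URL } = require("url");

// Resolve hrefs against the page URL, dropping duplicates,
// fragments, and anything that is not an http(s) link.
function toAbsoluteLinks(hrefs, pageUrl) {
  const seen = new Set();
  for (const href of hrefs) {
    if (!href) continue; // anchors without an href attribute
    let resolved;
    try {
      resolved = new URL(href, pageUrl);
    } catch (e) {
      continue; // not parseable as a URL
    }
    if (resolved.protocol !== "http:" && resolved.protocol !== "https:") continue;
    resolved.hash = ""; // "#top" and similar point at the same page
    seen.add(resolved.href);
  }
  return [...seen];
}

const pageUrl = "http://www.everlight.com/newsdetail.aspx?pcseq=4&cseq=7&seq=291";
// Made-up hrefs for illustration only.
const sampleHrefs = [
  "news.aspx?page=2",
  "/contact.aspx",
  "javascript:void(0)",
  undefined,
  "news.aspx?page=2#top"
];
console.log(toAbsoluteLinks(sampleHrefs, pageUrl));
```

Links of the form `javascript:...` are dropped here. If the pager on the site is implemented with `__doPostBack`, those links would have to be followed with a form POST rather than a plain GET.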
某草草

Look at the headers of the request. One of them can be used to set the language.
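
The reply does not name the header. The sketch below assumes it is the standard `Accept-Language` request header, which browsers fill in from the system or browser language; whether this particular server honors it has not been verified. `buildRequestOptions` and `fetchPage` are hypothetical helper names.

```javascript
const http = require("http");
const { URL } = require("url");

// Build http.request options that ask for a specific language.
// Assumption: the server selects the language from Accept-Language.
function buildRequestOptions(pageUrl, language) {
  const target = new URL(pageUrl);
  return {
    hostname: target.hostname,
    port: target.port || 80,
    path: target.pathname + target.search,
    method: "GET",
    headers: { "Accept-Language": language }
  };
}

// Fetch the page with the given language preference and resolve with the HTML.
function fetchPage(pageUrl, language) {
  return new Promise((resolve, reject) => {
    http.get(buildRequestOptions(pageUrl, language), (res) => {
      const chunks = [];
      res.on("data", (chunk) => chunks.push(chunk));
      res.on("end", () => resolve(Buffer.concat(chunks).toString("utf8")));
    }).on("error", reject);
  });
}

const options = buildRequestOptions(
  "http://www.everlight.com/newsdetail.aspx?pcseq=4&cseq=7&seq=291",
  "en-US,en;q=0.9"
);
console.log(options.headers);
```

Calling `fetchPage(url, "en-US,en;q=0.9")` would then request the English version on every run, independent of the language of the machine the crawler runs on.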
