I plan to use Node.js to scrape all the news from the website below. The usual approach would be to first get the URL of each page of the news list, then the URL of each news article,
and then use request to fetch the content of each URL and extract it.
But on the site below, the URL does not change when I switch pages or click into a news article. It seems to be implemented entirely in JS behind the scenes.
I can't see any requests for it even in the Network tab of Chrome's F12 developer tools. Can anyone guide me on how to scrape it?
http://www.xxxxxxxxx.com/glob...
1. Looking at the previous and next article links, the function bound to their click event is: boardView(1);
2. Search the page for boardView to find the corresponding function:
…………
3. The data comes from the variable list, so next look for where list is defined.
4. See at line 1739:
5. A constructor, jsList(), is called there, and its code is found here: http://www.samsungsem.com/js/...
6. Look back at the code in step 2: list.artTitles. This data is set by the cmsInit method of jsList, and in cmsInit:
The data of...
comes from the fourth parameter, data.
7. The data passed in step 4 is new data(),
so we need to find where the data function is defined.
Scrolling up in the page, we find: <script src="/global/news/data.js.jsp"></script>
8. Open it and take a look: http://www.samsungsem.com/glo...
The page looks odd when opened directly in the browser.
Right-click to view the source code:
view-source: http://www.samsungsem.com/glo...
You can see that the data function is defined here, and the data shown on the news page is in this file as well.
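To tie this back to the original question, here is a minimal Node.js sketch of where the steps above lead: skip the rendered page, download the script file from step 7, and call its data constructor yourself. Several things here are assumptions rather than facts about the site: that data.js.jsp returns plain JavaScript defining a global function named data, that you pass in the news page URL yourself (it is truncated above), and that you run Node 18+ so the built-in fetch can stand in for request. The names resolveScriptUrl, loadData and fetchNewsData are made up for illustration, and the shape of the object returned by new data() has to be checked by inspecting it.

```javascript
const vm = require('vm');

// Resolve the <script src> from step 7 against the page it was found on.
// pageUrl is whatever news page URL you are scraping (not filled in here).
function resolveScriptUrl(pageUrl, scriptSrc = '/global/news/data.js.jsp') {
  return new URL(scriptSrc, pageUrl).href;
}

// Run the downloaded script in a separate context and return `new data()`.
// Assumption: the script defines a top-level function called `data`.
function loadData(scriptSource) {
  const sandbox = {};
  vm.createContext(sandbox);
  vm.runInContext(scriptSource, sandbox, { timeout: 1000 });
  if (typeof sandbox.data !== 'function') {
    throw new Error('script did not define a data() function');
  }
  return vm.runInContext('new data()', sandbox, { timeout: 1000 });
}

// Download the script and build the data object (needs Node 18+ for fetch).
async function fetchNewsData(pageUrl) {
  const res = await fetch(resolveScriptUrl(pageUrl));
  if (!res.ok) {
    throw new Error('HTTP ' + res.status);
  }
  return loadData(await res.text());
}

// Hypothetical usage; inspect the keys first to see what the object holds:
// fetchNewsData(pageUrl).then((d) => console.log(Object.keys(d)));
```

Note that the vm module is not a security boundary, so only run scripts from a site you are willing to trust. If the file turns out not to be plain JavaScript, extracting the values with a regular expression or a parser is the safer route.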
Thanks for the answer, I’ll go take a look first...
I basically understand it now. There are still a few parts I don't fully follow, so I'll take my time going through them. Thank you very much.