Page address:
http://acm.hdu.edu.cn/showpro...
Crawling target:
If you want to crawl the code of these formulas, Chrome
Press F12
to see the code:
But the crawled code is as follows:
This code does not display the correct formula. It seems that these codes are generated by js
. How to crawl these codes.
This is parsed by the MathJax tool.
Look in the HTML code for the script with the next ID of formula p being MathJax-Element-X, copy the content inside, and add two $$ before and after the formula (two dollar signs before and after, so there are four in total ) Finally, just use MathJax to parse.
I can’t say more than the picture above:
First capture the packet and capture the ajax request. The key is to see how to construct the request, mainly to determine some parameters. The routines are as follows: 1. Search the context to see if the relevant parameter exists. If it is returned by the server, directly request the server to obtain the parameter; 2. If the parameter is obviously unchanged or changes regularly, you can directly forge it; 3. If the parameter It is very complicated and irregular, so you need to search for the key of the parameter, find the encrypted js, and then construct the value of the parameter to get the ajax url; 4. If it is too difficult, use automated tools such as selenium to drive the browser to access it, and it will directly give you Render all js