crawler - How to follow JavaScript-driven pagination in a Python crawler?
typecho 2017-06-13 09:24:39

While crawling a web page, I noticed that its pagination is implemented by the function below. After turning the page, the page URL does not change:

```html
<input class="buttonJump" name="goto2" onclick="dirGroupMblogToPage(document.getElementById('dirGroupMblogcp2').value)" type="button" value="Go"/>
```

```javascript
function dirGroupMblogToPage(currentPage){
    jQuery.post("dirGroupMblog.action", {"page.currentPage": currentPage, gid: MI.TalkBox.gid}, function(data){
        $("#talkMain").html(data);
        window.scrollTo(0, $css.getY(MI.talkList._body) - 65);
    });
}
```
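The `jQuery.post` call above can be mirrored in Python. It sends a form-encoded POST with `page.currentPage` and `gid`, and the callback inserts the response body (`data`) into `#talkMain`, which is why the URL never changes. A minimal sketch, assuming `session` is an already logged-in `requests.Session`, that `gid` is whatever `MI.TalkBox.gid` holds on the live page (the question's code uses 2632), and reusing the POST URL from the question's code; the function name `dir_group_mblog_to_page` is hypothetical:

```python
def dir_group_mblog_to_page(session, current_page, gid,
                            post_url="http://sns.icourses.cn/dirGroupMblog.action"):
    """Sketch of a Python counterpart to dirGroupMblogToPage(currentPage).

    `session` is assumed to be a logged-in requests.Session (anything with a
    compatible .post() method works). Returns the HTML fragment that the
    page's JavaScript would insert into #talkMain.
    """
    # Same form fields as the jQuery.post call; requests form-encodes a dict passed as data=
    form = {"page.currentPage": current_page, "gid": gid}
    # jQuery adds this header to same-origin AJAX requests; some servers check for it
    headers = {"X-Requested-With": "XMLHttpRequest"}
    resp = session.post(post_url, data=form, headers=headers, timeout=10)
    resp.raise_for_status()
    # The requested page is the POST response body itself (the `data` argument
    # of the jQuery callback); there is no separate URL to GET afterwards.
    return resp.text
```

For example, with the session `s` returned by `login_page()`, `html = dir_group_mblog_to_page(s, 2, 2632)` would hold the fragment for page 2. Under this reading, a follow-up `s.get(content_url)` most likely just returns the original, server-rendered page again.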

I wrote the following functions to try to turn the page:

```python
import requests


def login_page(login_url, content_url, usr_name="******@126.com", passwd="******"):
    # Log in; return the Session object and the fetched page
    post_data = {'r': 'on', 'u': usr_name, 'p': passwd}
    s = requests.Session()
    s.post(login_url, data=post_data)
    r = s.get(content_url)
    return s, r


def turn_page(s, next_page, content_url):
    post_url = "http://sns.icourses.cn/dirGroupMblog.action"
    post_headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
                    "X-Requested-With": "XMLHttpRequest"}
    post_data = {"page.currentPage": next_page, "gid": 2632}
    s.post(post_url, data=post_data, headers=post_headers)
    res = s.get(content_url)
    return res
```

However, calling turn_page() does not turn the page. How can I fix this? Also, what should I study on my own to be able to solve this kind of problem? Thanks!


All replies (2)
阿神
  • I recommend using Selenium.

  • For example, if turning the page requires clicking a "next page" button in the interface or pressing the arrow keys, Selenium WebDriver can do that. Here is a reference (I once used it to crawl novels on the Qidian Chinese website):

  • Selenium can interact with the page: click, double-click, type text, wait for the page to load (implicit and explicit waits), and more.

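```python
from selenium import webdriver
from selenium.webdriver.common.by import By
# from selenium.webdriver.common.keys import Keys

# PhantomJS is not supported by current Selenium releases:
# driver = webdriver.PhantomJS(executable_path=r"D:\phantomjs-2.1.1-windows\bin\phantomjs")
# My Windows PATH is already configured, so executable_path is not needed.
# Using Chrome requires the matching browser and driver.
driver = webdriver.Chrome()

# url is the page you need to load
url = 'http://sns.icourses.cn/*****'

# Open the page
driver.get(url)

# In your case you need to click the button: locate it by its class attribute, then call .click()
# For more precise targeting, look up the other locator strategies in By (ID, NAME, XPATH, ...)
driver.find_element(By.CLASS_NAME, "buttonJump").click()

# Selenium WebDriver has many other advanced features; look them up yourself.
# Searching should turn up an answer to your question.
```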
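For the explicit waits mentioned above, here is a sketch that types a page number into the jump box, clicks Go, and waits until the content of `#talkMain` changes. The ids `dirGroupMblogcp2` and `talkMain` and the button name `goto2` are taken from the question's HTML/JavaScript and are assumptions about the live page; `element_html_changed` and `go_to_page` are hypothetical helper names. `WebDriverWait.until()` accepts any callable that takes the driver, so the wait condition can be a plain class:

```python
class element_html_changed:
    """Wait condition: truthy once the innerHTML of the element with
    `element_id` is present and differs from `old_html`."""

    def __init__(self, element_id, old_html):
        self.element_id = element_id
        self.old_html = old_html

    def __call__(self, driver):
        # execute_script runs JavaScript in the page and returns its result
        current = driver.execute_script(
            "var el = document.getElementById(arguments[0]);"
            "return el ? el.innerHTML : null;",
            self.element_id,
        )
        return current is not None and current != self.old_html


def go_to_page(driver, page, timeout=10):
    """Enter `page` in the jump box, click Go, wait for #talkMain to refresh,
    and return its new innerHTML. Raises TimeoutException if the content
    does not change within `timeout` seconds (e.g. same page requested)."""
    # Imported here so the condition class above works without Selenium installed
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    old_html = driver.find_element(By.ID, "talkMain").get_attribute("innerHTML")
    box = driver.find_element(By.ID, "dirGroupMblogcp2")  # assumed to be the page-number input
    box.clear()
    box.send_keys(str(page))
    driver.find_element(By.NAME, "goto2").click()
    WebDriverWait(driver, timeout).until(element_html_changed("talkMain", old_html))
    return driver.find_element(By.ID, "talkMain").get_attribute("innerHTML")
```

Waiting on a content change, rather than sleeping for a fixed time, returns as soon as the AJAX response has been rendered.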
Ty80

There are a few cases:
1. Pages are turned by JavaScript (scrolling or clicking triggers a script);
2. Pages are turned by clicking ordinary hyperlinks.

Use the Network panel in Chrome's developer tools to see what each page turn actually requests, and whether the response is an HTML page or JSON that the page then renders.
JSON is easier to handle: just use the result directly. For ordinary HTML pages, use regular expressions to match the pagination links, then put those links into the pool of URLs to be crawled.
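The approach above can be sketched as a small breadth-first crawl with a link pool. This is an illustration under assumptions: `fetch(url)`, returning `(content_type, body)`, is a hypothetical interface the caller supplies (for example a wrapper around `requests.Session.get`), and `PAGE_LINK_PATTERN` is only an example regex that must be adapted to the target site's markup:

```python
import json
import re
from collections import deque
from urllib.parse import urljoin

# Example pattern only: href values containing "page=<number>". Adapt it to the real markup.
PAGE_LINK_PATTERN = re.compile(r'href="([^"]*page=\d+[^"]*)"')


def extract_page_links(html, base_url):
    """Return absolute URLs of the pagination links matched in an HTML page."""
    return [urljoin(base_url, href) for href in PAGE_LINK_PATTERN.findall(html)]


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl over pagination links, starting from start_url.

    fetch(url) must return (content_type, body). JSON bodies are parsed and
    stored; HTML bodies are stored and scanned for further page links.
    """
    pool = deque([start_url])  # URLs waiting to be crawled
    seen = {start_url}         # URLs already queued, to avoid loops
    results = []
    while pool and len(results) < max_pages:
        url = pool.popleft()
        content_type, body = fetch(url)
        if "json" in content_type:
            # JSON: use the data directly, no link extraction needed
            results.append(json.loads(body))
            continue
        results.append(body)
        for link in extract_page_links(body, url):
            if link not in seen:
                seen.add(link)
                pool.append(link)
    return results
```

The `seen` set keeps links that every page repeats (such as "page 1") from being queued twice.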

