python - Why doesn't the XPath I wrote scrape any content?

Question

# -*- coding:utf-8 -*-
import lxml, requests, sys
from bs4 import BeautifulSoup
from lxml import etree
reload(sys)
sys.setdefaultencoding("utf-8")
def main(): {code...}
# soup = BeautifulSoup(req.conte

天蓬老师 · Answer

When writing a crawler, before relying on XPath you must first confirm that the data is actually present in the page's source code. If it is not, the content is loaded asynchronously and XPath has nothing to match.

1. Open this link in the browser to view the page source, then press Ctrl+F and search for imgid

view-source:https://image.baidu.com/search/flip?tn=baiduimage&ie=utf-8&word=%E6%9A%B4%E8%B5%B0%E6%BC%AB%E7%94%BB&pn=0

2. Discover

The image list that should follow it is not in the source, so we can conclude that the images are loaded by JavaScript
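The manual check in steps 1 and 2 can also be scripted. Below is a minimal sketch, assuming the standard library's urllib as a stand-in for requests (only to keep the example dependency-free). The helper names fetch_source and marker_report and the User-Agent value are made up for illustration:

```python
import urllib.request


def fetch_source(url, timeout=10):
    """Download the raw HTML, i.e. what view-source shows (no JavaScript is run)."""
    # The User-Agent is a placeholder; some sites reject requests without one.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def marker_report(html, markers):
    """Map each marker string to True/False: does it occur in the raw HTML?"""
    return {marker: (marker in html) for marker in markers}


# Offline illustration on a tiny hand-written page (not real Baidu output).
sample = '<html><script>var d = {"objURL":"http://example.com/a.jpg"};</script></html>'
print(marker_report(sample, ["imgid", "objURL"]))

# Against the live page you would call, for example:
#   html = fetch_source("https://image.baidu.com/search/flip?tn=baiduimage&ie=utf-8"
#                       "&word=%E6%9A%B4%E8%B5%B0%E6%BC%AB%E7%94%BB&pn=0")
#   print(marker_report(html, ["imgid", "objURL"]))
```

If a marker you expect to target with XPath comes back False, the crawler is looking at a page that does not yet contain the data.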

3. Find

In the Network tab of the F12 developer tools (requests only show up after you refresh the page), I found no asynchronous request that returns the image information. So my guess is that the data is in the HTML after all, but embedded in JavaScript and processed when the images are loaded

View the page source the same way as above, search for the parameter objURL, and you will find the real image URLs
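To see why an XPath over elements comes back empty in this situation, here is a small sketch using the standard library's html.parser. It counts img tags and checks whether the marker appears inside script text instead. The class name, the function name, and the sample page are hypothetical, and the sample only imitates the situation described above:

```python
from html.parser import HTMLParser


class ScriptAndImgCollector(HTMLParser):
    """Collect <img> src attributes and the text content of <script> tags."""

    def __init__(self):
        super().__init__()
        self.img_srcs = []
        self.scripts = []
        self._in_script = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.img_srcs.append(dict(attrs).get("src"))
        elif tag == "script":
            self._in_script = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            self.scripts.append("".join(self._buf))
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self._buf.append(data)


def summarize(html, marker="objURL"):
    """Return how many <img> tags exist and how many scripts mention the marker."""
    collector = ScriptAndImgCollector()
    collector.feed(html)
    collector.close()
    return {
        "img_tags": len(collector.img_srcs),
        "scripts_with_marker": sum(marker in s for s in collector.scripts),
    }


# Hand-written sample: an empty container plus data embedded in a script.
sample_page = (
    '<html><body><div id="imgid"></div>'
    '<script>var data = {"objURL":"http://example.com/a.jpg"};</script>'
    "</body></html>"
)
print(summarize(sample_page))
```

When the image data exists only as script text, an element query such as //img has no nodes to return, so the text has to be parsed another way.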

// There are many of them, concentrated in the lower half of the HTML
http://img3.duitang.com/uploads/item/201608/06/20160806110540_MAcru.jpeg

Solution

The rest is up to you~ Find a way to parse the real URLs out of the page source!
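One possible way to do that parsing is a regular expression over the raw HTML. This is only a sketch: it assumes the values appear as "objURL":"<url>" inside the inline JavaScript, which the thread does not show, so check the real source and adjust the pattern. The name extract_obj_urls is made up for illustration:

```python
import re

# Assumed layout (not confirmed by the thread): "objURL":"http://..."
OBJ_URL_RE = re.compile(r'"objURL"\s*:\s*"(.*?)"')


def extract_obj_urls(html):
    """Return the objURL values found in the page source, in order, without duplicates."""
    seen = set()
    urls = []
    for url in OBJ_URL_RE.findall(html):
        # Inline JSON sometimes escapes slashes as \/ ; undo that if present.
        url = url.replace("\\/", "/")
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Hand-written snippet imitating the assumed layout.
snippet = (
    '{"objURL":"http://img3.duitang.com/uploads/item/201608/06/'
    '20160806110540_MAcru.jpeg","width":500},'
    '{"objURL":"http:\\/\\/example.com\\/b.jpg"}'
)
for found in extract_obj_urls(snippet):
    print(found)
```

On the real page you would pass the downloaded HTML text (for example req.text from requests) to extract_obj_urls and then download each URL.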