python - How to tell what kind of anti-crawler protection a website uses
ringa_lee 2017-06-12 09:27:51

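URL: https://www.nvshens.com/g/22377/. If I open the site directly in a browser, I can right-click an image and download it. But when my crawler requests the images directly, what comes back is already blocked. I changed the headers and set up an IP proxy, and it still doesn't work. Yet judging from the packet capture, the data isn't loaded dynamically either!!! Please help = =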


All replies (4)
过去多啦不再A梦

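The girls are pretty cute, by the way.
Right-clicking does open the image, but refresh once and you get the hotlink-protection image instead. Anti-hotlinking usually works by having the server check the Referer field in the request headers, which is why you no longer get the original image after a refresh (the Referer changes when you refresh).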

import requests

# Send the gallery page as the Referer so the server returns the real image
img_url = "https://t1.onvshen.com:85/gallery/21501/22377/s/003.jpg"
r = requests.get(img_url, headers={'Referer': "https://www.nvshens.com/g/22377/"}).content
with open("00.jpg", 'wb') as f:
    f.write(r)
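
To check that the Referer really is what decides the outcome, one option is to fetch the same image twice, with and without the header, and compare what comes back. This is only a sketch using the standard library; `build_request`, `summarize`, and `compare` are hypothetical helper names, and it assumes the server either returns an HTTP error or a different file when the Referer is missing, which has not been verified here.

```python
import hashlib
import urllib.error
import urllib.request

IMG_URL = "https://t1.onvshen.com:85/gallery/21501/22377/s/003.jpg"
PAGE_URL = "https://www.nvshens.com/g/22377/"


def build_request(url, referer=None):
    """Build a GET request, optionally carrying a Referer header."""
    headers = {"User-Agent": "Mozilla/5.0"}
    if referer is not None:
        headers["Referer"] = referer
    return urllib.request.Request(url, headers=headers)


def summarize(body):
    """Reduce a response body to its size and a short hash for comparison."""
    return {"length": len(body), "sha256": hashlib.sha256(body).hexdigest()[:16]}


def compare(url, referer):
    """Fetch url without and with the Referer and summarize both results."""
    results = {}
    for label, ref in (("without_referer", None), ("with_referer", referer)):
        try:
            with urllib.request.urlopen(build_request(url, ref), timeout=10) as resp:
                results[label] = summarize(resp.read())
        except urllib.error.HTTPError as err:
            # An outright refusal (e.g. 403) is also a useful signal.
            results[label] = {"error": err.code}
    return results
```

Calling `compare(IMG_URL, PAGE_URL)` and seeing two different sizes or hashes (or an error in only one of the two entries) would point to a Referer check; identical results would suggest the block is caused by something else.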
学霸

Capture the traffic while the image is being fetched and check whether you left out any parameter.
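
One way to act on this is to copy the request headers the browser sends for the image (from the developer tools or a capture tool) and diff them against what the script sends. A small sketch; `missing_headers` is a hypothetical helper, and the header values below are illustrative placeholders, not captured from the site.

```python
def missing_headers(browser_headers, script_headers):
    """Return browser header names the script does not send (case-insensitive)."""
    sent = {name.lower() for name in script_headers}
    return sorted(name for name in browser_headers if name.lower() not in sent)


# Illustrative values only; paste the real ones from your own capture.
browser_headers = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "image/webp,image/*,*/*;q=0.8",
    "Referer": "https://www.nvshens.com/g/22377/",
}
script_headers = {"user-agent": "Mozilla/5.0"}

print(missing_headers(browser_headers, script_headers))
# ['Accept', 'Referer']
```

Header names are compared case-insensitively because HTTP treats them that way, so `user-agent` and `User-Agent` count as the same header.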

我想大声告诉你

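I was so caught up in the site's content that I almost forgot the actual business.
You can fill in all of your request information according to

and then try again.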

女神的闺蜜爱上我

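Given how this site is designed, sending a separate Referer for each image's own page is probably closer to how a human visitor behaves than reusing one Referer for everything.
Below is complete, runnable code that grabs all the images from the 18 pages.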

# Putting it all together
import os
import shutil

import requests
from lxml import etree  # used to extract content with XPath

headers = {
    "Connection": "close",  # one way to cover tracks
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2900.1 Iron Safari/537.36",
}

def url_guess_src_large(u):
    # Build the per-image viewer URL that is sent as the Referer
    return "https://www.nvshens.com/img.html?img=" + '/'.join(u.split('/s/'))

# Download function
def get_img_using_requests(url, fn):
    headers['Referer'] = url_guess_src_large(url)  # instead of "https://www.nvshens.com/g/22377/"
    print(headers)
    response = requests.get(url, headers=headers, stream=True)
    with open(fn, 'wb') as out_file:
        shutil.copyfileobj(response.raw, out_file)
    del response

url_ = 'https://www.nvshens.com/g/22377/{p}.html'
xpaths = '//*[@id="hgallery"]/img/@src'

for page in range(1, 18 + 1):
    url = url_.format(p=page)
    html = requests.get(url, headers=headers).content.decode('utf-8')
    selector = etree.HTML(html)
    content = selector.xpath(xpaths)  # image URLs found on this page
    urls_2get = [url_guess_src_large(x) for x in content]
    # File name: "<album ids>_<original file name>"
    filenames = [os.path.split(x)[0].split('/gallery/')[1].replace("/", "_") + "_" + os.path.split(x)[1] for x in urls_2get]
    for img_url, fn in zip(content, filenames):
        get_img_using_requests(img_url, fn)
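
To make the string handling in the script concrete, here is the same transformation applied to a single image URL (the one from the first reply). `url_guess_src_large` repeats the rule from the script; `filename_for` is a hypothetical helper that mirrors how the script builds file names, using `rsplit` instead of `os.path.split` so the result does not depend on the platform. That the `/s/` segment marks a smaller variant of the image is an assumption based on the function's name.

```python
def url_guess_src_large(u):
    # Same rule as in the script: drop the "/s/" segment and wrap the result
    # in the site's img.html viewer URL.
    return "https://www.nvshens.com/img.html?img=" + "/".join(u.split("/s/"))


def filename_for(viewer_url):
    # Hypothetical helper: "<album ids>_<file name>", like the script's filenames.
    head, tail = viewer_url.rsplit("/", 1)
    return head.split("/gallery/")[1].replace("/", "_") + "_" + tail


img = "https://t1.onvshen.com:85/gallery/21501/22377/s/003.jpg"
referer = url_guess_src_large(img)
print(referer)
# https://www.nvshens.com/img.html?img=https://t1.onvshen.com:85/gallery/21501/22377/003.jpg
print(filename_for(referer))
# 21501_22377_003.jpg
```

So each download carries a Referer that points at that image's own viewer page, and the saved file name encodes the two album IDs plus the original file name.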