Crawler | Batch download of HD wallpapers (source code + tools included)-Python Tutorial-php.cn


Released: 2023-08-10 15:46:01


Unsplash is a free site for high-quality photos. All of them are real photographs at very high resolution, which makes them useful material for designers and for writers who need illustrations, and they also work well as wallpaper. The functionality described here has also been packaged into an exe tool, which I hope you find helpful. How to obtain the code and the tool is described at the end of the article.

1. Web page analysis


Let's first walk through the manual download process. Do not use right-click and "Save image as": an image saved that way is compressed by some ratio and is noticeably less sharp. Taking Nature as an example, click "Download free" and choose a download path. The resulting image is 1.43 MB.

Next, analyze the web page itself.

First, we observed that there is no page-number selector at the bottom of the page. Scrolling down shows that the pictures are loaded dynamically: as we scroll, subsequent pictures appear one after another.

After scrolling a few times, I found that each scroll makes the page issue a new request. Inspecting one of these requests shows the total number of pictures (10000) and the total number of pages (500).

Let’s take a look at a few URLs:

These links differ only in the page parameter, which increases sequentially. That makes them easy to handle: just iterate over the pages in order when sending requests.
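Because only the page parameter changes, the request URLs can be generated rather than copied by hand. A minimal sketch, assuming the URL template that getpicurls uses in section 2.2; the helper name page_urls is hypothetical:

```python
# URL template taken from the request in section 2.2; only `page` varies.
SEARCH_URL = (
    "https://unsplash.com/napi/search/photos"
    "?query=nature&per_page=20&page={}&xp=feedback-loop-v2%3Aexperiment"
)


def page_urls(first_page, last_page):
    """Return the search URLs for pages first_page..last_page (inclusive)."""
    return [SEARCH_URL.format(page) for page in range(first_page, last_page + 1)]


if __name__ == "__main__":
    for url in page_urls(1, 3):
        print(url)
```

Building the list up front keeps the page range in one place, which is convenient when only a few of the 500 pages are wanted.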

The page number problem has been solved. Next, analyze the link of each picture:


We see that the results list contains exactly 20 entries, matching the per_page value in the request, so the link to each image we are looking for must be here.
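To make the response structure concrete, here is a small sketch that parses a hand-written sample shaped like the response described above. The field names total and total_pages are assumptions, the example.com links are placeholders, and the urls/full keys follow the crawler code in section 2.2:

```python
import json

# Hand-written sample mimicking the response shape; `total` and `total_pages`
# are assumed field names and the image links are placeholders.
SAMPLE = json.dumps({
    "total": 10000,
    "total_pages": 500,
    "results": [
        {"urls": {"full": "https://example.com/photo-1.jpg"}},
        {"urls": {"full": "https://example.com/photo-2.jpg"}},
    ],
})


def extract_full_urls(text):
    """Parse one response body and return the full-size link of each result."""
    data = json.loads(text)
    return [item["urls"]["full"] for item in data["results"]]


if __name__ == "__main__":
    print(extract_full_urls(SAMPLE))
```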

Analyzing the web page is often the time-consuming part, but overall it went smoothly. Now we can start crawling the images.

2. Crawl images

2.1 Import modules

import time
import random
import json
import requests
from fake_useragent import UserAgent


time: timing and delays between requests
random: generate random numbers
json: process JSON data
requests: send web requests
fake_useragent: generate random User-Agent strings

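2.2 Get the images

Spoof the User-Agent so that the request looks like it comes from a browser. This keeps the server from identifying the request as a crawler and refusing to respond.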

ua = UserAgent(verify_ssl=False)
headers = {'User-Agent': ua.random}
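If fake_useragent is not available, the same idea can be sketched with a small hand-maintained list. The User-Agent strings below are illustrative examples and random_headers is a hypothetical helper, not part of the original code:

```python
import random

# Illustrative browser User-Agent strings; extend or update as needed.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/16.5 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0",
]


def random_headers():
    """Return request headers with a randomly chosen User-Agent."""
    return {"User-Agent": random.choice(USER_AGENTS)}
```

The returned dictionary can be passed as the headers argument in the same way as the one built with ua.random.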


Extract all image links from the response:

def getpicurls(i, headers):
    # Collect the full-size image links listed on result page i.
    picurls = []
    url = 'https://unsplash.com/napi/search/photos?query=nature&per_page=20&page={}&xp=feedback-loop-v2%3Aexperiment'.format(i)
    r = requests.get(url, headers=headers, timeout=5)
    # Pause a few seconds between requests to avoid hammering the server.
    time.sleep(random.uniform(3.1, 4.5))
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    allinfo = json.loads(r.text)
    results = allinfo['results']
    for result in results:
        href = result['urls']['full']
        picurls.append(href)
    return picurls


2.3 Save the images

Save each image to a file:

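def getpic(count, url):
    # Download one image and save it as pictures/<count>.jpg.
    # The pictures/ directory must already exist.
    r = requests.get(url, headers=headers, timeout=5)
    r.raise_for_status()
    with open('pictures/{}.jpg'.format(count), 'wb') as f:
        f.write(r.content)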
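The article does not show the loop that ties the two functions together. One possible driver, as a sketch only: crawl is a hypothetical helper that receives the fetch and save functions as arguments, so it can be tried without network access, and the default directory name follows the path used in getpic.

```python
import os


def crawl(first_page, last_page, fetch_urls, save, out_dir="pictures"):
    """Call fetch_urls(page) for each page and save(count, url) for each link.

    Returns the number of images handed to save.
    """
    os.makedirs(out_dir, exist_ok=True)  # getpic expects this directory to exist
    count = 0
    for page in range(first_page, last_page + 1):
        for url in fetch_urls(page):
            count += 1  # running index, used as the file name by getpic
            save(count, url)
    return count
```

With the functions above, a call could look like crawl(1, 2, lambda i: getpicurls(i, headers), getpic), which would fetch the first two result pages.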


Result:

3. Crawling with the EXE tool

Output of the exe tool:

Note:

Avoid crawling too frequently, so that you do not disrupt the normal operation of the site.

The pictures are high-resolution images hosted on an overseas site, so the crawling speed depends on your network and is generally not very fast.

You can build a proxy pool to crawl faster.
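As a rough sketch of the proxy pool idea, the snippet below rotates through a list of proxies in round-robin order. The addresses are placeholders from a documentation-only IP range, and next_proxies is a hypothetical helper; a real pool would also need to check that its proxies are alive.

```python
import itertools

# Placeholder addresses from a documentation-only range; replace with real proxies.
PROXY_POOL = ["http://203.0.113.10:8080", "http://203.0.113.11:8080"]
_rotation = itertools.cycle(PROXY_POOL)


def next_proxies():
    """Return the next proxy, in the mapping format that the `proxies`
    argument of requests.get accepts."""
    proxy = next(_rotation)
    return {"http": proxy, "https": proxy}
```

A request would then be sent as requests.get(url, headers=headers, proxies=next_proxies(), timeout=5).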
