Image crawler - Please tell me: encoding problem with a Python crawler, Python 3.6, Windows 10 64-bit?

Question

Here is the error message: {Code...} I have already changed the code in many places. The main cause may be that the target website is encoded in gb2312. The program downloads images normally from other websites, but on this one it fails. Could you please tell me where the problem is? I have tried several approaches and none of them worked. The source code is as follows: {code...

天蓬老师 · Answer

# coding: utf-8

import os
import urllib.request
from urllib.parse import urljoin

import requests
from pyquery import PyQuery as Q

base_url = 'http://www.shop2255.com/'


url_all = ['http://www.shop2255.com/showpro/2603.html']


for url in url_all:
    _, file_name = os.path.split(url)
    dir_name, _ = os.path.splitext(file_name)

    if not os.path.exists(dir_name):
        os.mkdir(dir_name)

    r = requests.get(url)
    for img in Q(r.text).find('img'):
        src = Q(img).attr('src')
        if not src:
            continue  # skip <img> tags that have no src attribute
        # urljoin keeps absolute URLs as they are and resolves relative ones
        # against base_url (os.path.join would insert backslashes on Windows)
        image_url = urljoin(base_url, src)
        _, image_name = os.path.split(image_url)

        image_path = os.path.join(dir_name, image_name)
        # In Python 3, urlretrieve lives in urllib.request
        urllib.request.urlretrieve(image_url, image_path)
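
Since the question suspects the gb2312 encoding of the target site, here is a minimal sketch of decoding the raw page bytes explicitly instead of relying on a guessed encoding. The helper name decode_page and the order of candidate encodings are my own assumptions, not something from the original code; gbk is tried because it is a superset of gb2312.

```python
def decode_page(raw, encodings=("utf-8", "gbk")):
    """Return raw bytes decoded with the first candidate encoding that fits.

    decode_page is a hypothetical helper. UTF-8 is tried first because it is
    strict and rejects most GB2312/GBK byte sequences; gbk covers gb2312.
    """
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: keep going and mark undecodable bytes with U+FFFD.
    return raw.decode(encodings[0], errors="replace")


# Simulate the bytes a gb2312 site would send for some Chinese text.
sample = "图片下载".encode("gb2312")
print(decode_page(sample))
```

This is only a heuristic: some byte sequences are valid in more than one encoding. With requests, the same idea is usually expressed by assigning r.encoding = 'gbk' before reading r.text.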

漂亮男人 · Answer

First, in your code local=r'D:%s%s.jpg' % (filename,imgurl.splite("/")[-1]), split is misspelled as splite.

Also, in urllib.request.urlretrieve(imgurl,local), imgurl is not a valid URL; it is only a relative URL. To turn it into an absolute URL, you need to prepend base_url = 'http://www.shop2255.com/'.
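
To illustrate the relative-to-absolute step, here is a small sketch using urljoin from the standard library rather than plain string concatenation. The function name absolute_url and the sample image paths are made up for the example; only base_url comes from the thread.

```python
from urllib.parse import urljoin

base_url = 'http://www.shop2255.com/'


def absolute_url(src, base=base_url):
    """Resolve an <img> src against the site root (hypothetical helper)."""
    # urljoin returns src unchanged when it is already an absolute URL.
    return urljoin(base, src)


# The paths below are invented samples, not real files on the site.
print(absolute_url('upload/a.jpg'))
print(absolute_url('/upload/a.jpg'))
print(absolute_url('http://example.com/b.jpg'))
```

Compared with base_url + imgurl, this avoids a doubled slash when the src starts with "/" and leaves already-absolute links untouched.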

There also seems to be a problem with the generated file path.
# -*- coding: utf-8 -*-
import os
import re
import urllib.request
from urllib.request import urlopen

base_url = 'http://www.shop2255.com/'

url_all = [
    'http://www.shop2255.com/showpro/2603.html',
    'http://www.shop2255.com/showpro/1558.html',
    'http://www.shop2255.com/showpro/1564.html',
    'http://www.shop2255.com/showpro/2411.html',
    'http://www.shop2255.com/showpro/2409.html',
    'http://www.shop2255.com/showpro/1561.html',
    'http://www.shop2255.com/showpro/2414.html',
    'http://www.shop2255.com/showpro/2609.html',
    'http://www.shop2255.com/showpro/2413.html',
    'http://www.shop2255.com/showpro/2604.html',
    'http://www.shop2255.com/showpro/2605.html',
    'http://www.shop2255.com/showpro/2606.html',
    'http://www.shop2255.com/showpro/2608.html',
    'http://www.shop2255.com/showpro/2607.html',
    'http://www.shop2255.com/showpro/2610.html']


def getHtml(url):
    response = urlopen(url)
    # print(response.read())
    # The page is gb2312-encoded; gbk is a superset of gb2312
    html = response.read().decode("gbk")
    print(html)
    return html


def getImg(html):
    # Raw string so that \. reaches the regex engine unchanged
    reg = r'src="(.+?\.jpg)"'
    imgre = re.compile(reg)
    imglist = re.findall(imgre, html)
    return imglist


for i in range(len(url_all)):
    html = getHtml(url_all[i])
    # Note: I don't get your error here; this is the only change I needed
    # imglist = getImg(html.decode())
    imglist = getImg(html)
    # print(imglist)
    x = 0
    for imgurl in imglist:
        print(x)
        file_path = url_all[i]
        (filepath, tempfilename) = os.path.split(file_path)
        (filename, extension) = os.path.splitext(tempfilename)
        if not os.path.exists(r'D:\%s' % filename):
            os.mkdir(r'D:\%s' % filename)
        # The matched name already ends in .jpg, so no extension is appended
        local = r'D:\%s\%s' % (filename, imgurl.split("/")[-1])
        try:
            # imgurl is relative, so prepend base_url to get an absolute URL
            urllib.request.urlretrieve(base_url + imgurl, local)
        except Exception:
            print("can't retrieve " + base_url + imgurl)
        x += 1
print("done")
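
Regarding the file path, here is a hedged sketch of building the local path from the URL parts instead of formatting a hard-coded D:\ string. The helper name local_image_path, the output folder "downloads" and the sample image URL are assumptions made for the illustration.

```python
import os
import posixpath
from urllib.parse import urlsplit


def local_image_path(page_url, img_url, root):
    """Build root/<page name>/<image file name> (hypothetical helper)."""
    # URL paths always use "/", so posixpath is used to take them apart.
    page_name = posixpath.basename(urlsplit(page_url).path)   # e.g. '2603.html'
    folder, _ = os.path.splitext(page_name)                   # e.g. '2603'
    # urlsplit drops any query string, so 'a.jpg?x=1' becomes 'a.jpg'.
    image_name = posixpath.basename(urlsplit(img_url).path)
    # os.path.join picks the right separator for the current OS.
    return os.path.join(root, folder, image_name)


# 'upload/a.jpg?x=1' is an invented sample src, not a real file on the site.
path = local_image_path('http://www.shop2255.com/showpro/2603.html',
                        'upload/a.jpg?x=1', 'downloads')
print(path)
```

Before downloading, the target folder can be created with os.makedirs(os.path.dirname(path), exist_ok=True), which does not fail if the folder already exists.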