python - How should images be handled in a web crawler?
PHPz 2017-04-17 17:53:05

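As the title says: when crawling news articles, for example, and an article contains images, how should the images be handled? And what if there are several images?

Something like

     [text]  
     [image]  
     [text]

or

     [text]  
     [image]  
     [text]
     [image]
     [text]

Do I need to download the images locally, or can I use the site's image URLs directly? And if I download them locally, how should the text content be handled?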


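Thanks for the answers, everyone. What I really want to ask is how to keep the images in their original positions. For example, in scrapy you can use

p.xpath('p/text()').extract()

to get the text content, and

p.xpath('p/img/@src').extract()

to locate the images. So how do I make sure each image ends up in the same position as in the original?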


All replies (6)
左手右手慢动作

If you do not need to keep or archive the images (for example, because you worry the site will shut down or the original images will disappear), you can link directly to the site's image URLs. That leaves you with no storage, management, or copyright concerns on your side, and it is also the easier option to implement.

黄舟

If you can link to the images externally, do so, but watch out for the site's hotlink protection. The safest approach is to download them locally.

左手右手慢动作

You can use Bs4 to select the corresponding nodes; XPath works too. Either one lets you extract whatever you need.
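To keep images in their original positions, one option is to walk the document once and record text and image nodes in the order they appear, instead of running two separate queries. Below is a minimal sketch of that idea using only the standard library's `html.parser`, so it runs without extra installs; the names `OrderedContentParser` and `extract_in_order` are made up for illustration. With Bs4 or XPath the same principle applies: iterate over the nodes in document order rather than collecting text and `src` values separately.

```python
from html.parser import HTMLParser


class OrderedContentParser(HTMLParser):
    """Collect text and image sources in the order they appear in the HTML."""

    def __init__(self):
        super().__init__()
        # Each item is ("text", str) or ("image", src), in document order.
        self.items = []

    def handle_starttag(self, tag, attrs):
        # <img> tags carry the image address in their src attribute.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.items.append(("image", src))

    def handle_data(self, data):
        # Skip whitespace-only runs between tags.
        text = data.strip()
        if text:
            self.items.append(("text", text))


def extract_in_order(html):
    """Return the interleaved text/image items of an HTML fragment."""
    parser = OrderedContentParser()
    parser.feed(html)
    parser.close()
    return parser.items


sample = '<p>first</p><p><img src="a.jpg"></p><p>second</p>'
for kind, value in extract_in_order(sample):
    print(kind, value)
```

Because every item carries its kind, the list can be stored as-is or turned back into HTML later with the images in the right places.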

迷茫

Download the images locally, then replace each src in the page with the local relative path.
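A rough sketch of the src replacement step, assuming the article body is kept as HTML. Everything here is illustrative: the function names `local_path_for` and `localize_images`, the `images` directory, and the hash-based file naming are assumptions, and the regular expression only handles quoted `src` attributes (a real HTML parser would be more robust). The download itself is not shown; the function just returns which URLs need to be fetched and where they should be saved.

```python
import hashlib
import os
import re
from urllib.parse import urljoin, urlparse

# Matches the src attribute of an <img> tag; \s avoids matching data-src.
IMG_SRC_RE = re.compile(r'(<img\b[^>]*?\ssrc=)(["\'])(.*?)\2', re.IGNORECASE)


def local_path_for(url, image_dir="images"):
    """Derive a stable relative path from the image URL (hypothetical scheme)."""
    ext = os.path.splitext(urlparse(url).path)[1] or ".jpg"
    name = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return f"{image_dir}/{name}{ext}"


def localize_images(html, page_url, image_dir="images"):
    """Rewrite <img src> to local paths.

    Returns (rewritten_html, downloads), where downloads maps each absolute
    image URL to the relative path it should be saved under.
    """
    downloads = {}

    def replace(match):
        # Resolve relative addresses against the page the HTML came from.
        absolute = urljoin(page_url, match.group(3))
        path = local_path_for(absolute, image_dir)
        downloads[absolute] = path
        quote = match.group(2)
        return f"{match.group(1)}{quote}{path}{quote}"

    return IMG_SRC_RE.sub(replace, html), downloads


html = '<p>a</p><img src="/pic/a.png"><p>b</p>'
new_html, downloads = localize_images(html, "http://example.com/news/1.html")
print(new_html)
print(downloads)
```

Since only the `src` value changes, the surrounding text and the position of each image stay exactly as they were in the original page.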

左手右手慢动作

News? Portal sites almost always have hotlink protection.

It is better to download the images locally first, sending a forged Referer header, and then replace the image addresses in the original text with the local ones.
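As an illustration of the forged Referer idea, here is a minimal sketch with the standard library's `urllib.request`. It assumes the site only checks that the Referer looks like one of its own pages, which may not hold everywhere; the helper names, the `User-Agent` value, and the example URLs are placeholders.

```python
import os
import urllib.request


def build_image_request(image_url, referer):
    """Build a request that claims to come from the article page."""
    return urllib.request.Request(
        image_url,
        headers={"Referer": referer, "User-Agent": "Mozilla/5.0"},
    )


def download_image(image_url, referer, dest_path, timeout=10):
    """Fetch the image and write it to dest_path, creating folders as needed."""
    request = build_image_request(image_url, referer)
    os.makedirs(os.path.dirname(dest_path) or ".", exist_ok=True)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = response.read()
    with open(dest_path, "wb") as fh:
        fh.write(data)
    return dest_path


# Example call (placeholder URLs, not executed here):
# download_image(
#     "http://example.com/pic/a.png",
#     referer="http://example.com/news/1.html",
#     dest_path="images/a.png",
# )
```

Passing the article's own URL as the Referer is usually the natural choice, since that is what a browser would send when loading the image from that page.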

巴扎黑

http://blog.csdn.net/qq_34844199/article/details/51468841 Read this and everything should be clear.
