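As the title says: when crawling news articles, for example, and an article contains images, how should the images be handled? What if there are several images?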
Something like:
[text]
[image]
[text]
or
[text]
[image]
[text]
[image]
[text]
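Should the images be downloaded locally, or should I just use the website's own image sources? And if I download them, how should the text content be handled?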
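Thanks, everyone, for the answers. What I actually want to know is how to keep the images in their original positions. For example, in Scrapy you can use
p.xpath('p/text()').extract()
to get the text content, and
p.xpath('p/img/@src').extract()
to locate the images. How do I make sure the images end up in the same positions as in the original?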
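One way to preserve the order is to walk the article body once, in document order, and record text and images as they appear, instead of running two separate queries and trying to merge the results afterwards. Below is a minimal sketch using only the standard library's html.parser, assuming the article body HTML has already been extracted; the sample markup and the helper names (ContentBlocks, extract_blocks) are made up for illustration. In Scrapy itself, a single XPath union such as p.xpath('p/text() | p/img/@src') should likewise return the matches in document order.

```python
from html.parser import HTMLParser


class ContentBlocks(HTMLParser):
    """Hypothetical helper: record text and images in document order."""

    def __init__(self):
        super().__init__()
        self.blocks = []  # list of ("text", str) or ("image", src) tuples

    def handle_starttag(self, tag, attrs):
        # <img> (and <img/>, via handle_startendtag) lands here
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.blocks.append(("image", src))

    def handle_data(self, data):
        # Skip whitespace-only text between tags
        text = data.strip()
        if text:
            self.blocks.append(("text", text))


def extract_blocks(html):
    parser = ContentBlocks()
    parser.feed(html)
    parser.close()
    return parser.blocks


sample = """
<div class="article">
  <p>First paragraph.</p>
  <p><img src="/img/a.jpg"></p>
  <p>Second paragraph.</p>
  <p><img src="/img/b.jpg"></p>
  <p>Closing words.</p>
</div>
"""

for kind, value in extract_blocks(sample):
    print(kind, value)
```

Because each block keeps its place in the list, you can store the list as-is (or re-render it as HTML) and the images stay between the same paragraphs as on the original page.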
If you don't need to archive the images (for example, because you're worried the site will shut down or the original images will stop working), you can use the site's image URLs directly. That avoids problems with storage space, management, and copyright, and it is also much easier to implement.
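If you keep the site's image URLs, note that src values are often relative, so they need to be resolved against the article's URL before they will work on another page. A small sketch with urllib.parse.urljoin follows; the article URL and image paths are made-up examples. In Scrapy, response.urljoin(src) does the same resolution against the response URL.

```python
from urllib.parse import urljoin


def absolutize(article_url, srcs):
    """Resolve relative <img src> values against the article URL."""
    return [urljoin(article_url, src) for src in srcs]


article = "https://news.example.com/2016/05/story.html"  # hypothetical article URL
print(absolutize(article, ["/img/a.jpg", "pics/b.jpg", "//cdn.example.com/c.jpg"]))
# → ['https://news.example.com/img/a.jpg', 'https://news.example.com/2016/05/pics/b.jpg', 'https://cdn.example.com/c.jpg']
```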
If you can link to the images externally, do so, but watch out for anti-hotlinking measures. The safest approach is to download them locally.
You can use bs4 (BeautifulSoup) to select the relevant nodes; XPath works too, and you can extract whatever you need.
Download the images locally, then replace the src attributes in the page with paths relative to a local directory.
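A sketch of the "replace src with a local path" step, assuming images are saved under a hypothetical images/ folder next to the saved article and named by an MD5 hash of their URL (so the same image always maps to the same file). The regex-based rewrite is deliberately naive (it does not handle unquoted attributes and could match attributes such as data-src); a real HTML parser is more robust. Because only the src value changes, each <img> stays exactly where it was in the text.

```python
import hashlib
import os
import re
from urllib.parse import urlparse

IMG_DIR = "images"  # hypothetical local folder, relative to the saved article


def local_path_for(url, img_dir=IMG_DIR):
    """Map an image URL to a stable relative path like images/<md5>.jpg."""
    ext = os.path.splitext(urlparse(url).path)[1] or ".jpg"
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return f"{img_dir}/{digest}{ext}"


# Naive pattern: <img ... src="..."> or src='...'
SRC_RE = re.compile(r'(<img\b[^>]*?\bsrc=)(["\'])(.*?)\2', re.IGNORECASE)


def rewrite_img_src(html, mapping):
    """Replace each <img src> whose URL is in mapping with its local path."""
    def repl(m):
        url = m.group(3)
        return f"{m.group(1)}{m.group(2)}{mapping.get(url, url)}{m.group(2)}"
    return SRC_RE.sub(repl, html)


url = "https://news.example.com/img/a.jpg"  # hypothetical image URL
html = f'<p>Intro</p><p><img src="{url}"></p><p>More text</p>'
mapping = {url: local_path_for(url)}
print(rewrite_img_src(html, mapping))
```

The downloader would save each image to mapping[url]; the rewritten HTML then points at those files.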
News sites? Most portal sites have anti-leeching (hotlink) protection.
It is better to spoof the Referer header, download the images locally first, and then replace the image URLs in the original text with the local paths.
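Hotlink protection usually checks the Referer header, so sending the article page's URL as the Referer often makes the image request look like a normal on-site load. A minimal standard-library sketch follows; the User-Agent string and helper names are hypothetical, and some sites use additional checks (cookies, signed URLs) that this does not handle. In Scrapy, the equivalent is passing headers={"Referer": response.url} when yielding the scrapy.Request for each image.

```python
import os
import urllib.request

UA = "Mozilla/5.0 (compatible; news-crawler-sketch)"  # hypothetical UA string


def build_image_request(img_url, article_url):
    """Send the article page as Referer so hotlink checks see an on-site request."""
    return urllib.request.Request(
        img_url,
        headers={"Referer": article_url, "User-Agent": UA},
    )


def download_image(img_url, article_url, dest_path, timeout=10):
    """Fetch one image with the spoofed Referer and write it to dest_path."""
    req = build_image_request(img_url, article_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = resp.read()
    os.makedirs(os.path.dirname(dest_path) or ".", exist_ok=True)
    with open(dest_path, "wb") as f:
        f.write(data)
    return dest_path
```

Usage would look like download_image(img_url, article_url, "images/a.jpg"), after which the src in the saved text is replaced with "images/a.jpg".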
http://blog.csdn.net/qq_34844199/article/details/51468841 After reading this, everything is clear