python - How should images be handled in a web crawler?

Question

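As the title says: when crawling news articles that contain images, how should the images be handled, and what if an article has several of them? Something like {code...} or {code...}. Should the images be downloaded locally, or should I just use the site's own image source? And if they are downloaded locally, how should the text content be handled? Many thanks, everyone...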

ringa_lee · Answer

If you don't need to archive the images (the usual reasons for archiving being that the site might shut down or the original images might go dead), you can use the site's image source directly. That way you have nothing to worry about in terms of storage, management, or copyright, and it's also the easiest option to implement.

黄舟 · Answer

If you can link to the images externally, do that, but watch out for hotlink protection on the source site. The safest approach is to download them locally.

ringa_lee · Answer

You can use bs4 (BeautifulSoup) to select the relevant nodes, or XPath if you prefer. Either one lets you extract whatever you need.
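To make the node-selection step concrete, here is a minimal sketch that pulls every image URL out of an article. It deliberately uses the standard library's `html.parser` instead of bs4 so it runs without extra installs; with bs4 the same step would be roughly `soup.find_all("img")`. The names `ImgSrcCollector` and `extract_image_urls`, the sample HTML, and the example.com URLs are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag, in document order."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the page URL.
            self.sources.append(urljoin(self.base_url, src))


def extract_image_urls(html, base_url):
    """Return absolute URLs for all images found in the HTML string."""
    parser = ImgSrcCollector(base_url)
    parser.feed(html)
    return parser.sources


# Hypothetical article fragment containing two images.
sample_html = (
    '<div><p>Text</p><img src="/a.jpg">'
    '<p>More</p><img src="http://img.example.com/b.png"></div>'
)
image_urls = extract_image_urls(sample_html, "http://news.example.com/post/1")
print(image_urls)
```

This handles any number of images per article, since each `<img>` tag is simply appended to the list in the order it appears.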

迷茫 · Answer

Download the images locally, then replace each src in the page with the local relative path.
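A rough sketch of the src-replacement half of that idea, assuming the article body is available as an HTML string. The helper names (`local_name`, `rewrite_img_src`), the `images` directory, and the sample URLs are hypothetical; the regex only covers quoted `src` attributes, so a real crawler would probably want a proper HTML parser here.

```python
import hashlib
import os
import re
from urllib.parse import urlparse


def local_name(url):
    """Derive a stable local file name from an image URL (hash + extension)."""
    ext = os.path.splitext(urlparse(url).path)[1] or ".jpg"
    return hashlib.sha256(url.encode("utf-8")).hexdigest()[:16] + ext


def rewrite_img_src(html, image_dir="images"):
    """Point every <img> src at a relative path under image_dir.

    Returns the rewritten HTML plus a {original_url: local_path} mapping,
    which tells the downloader where each file should be saved.
    """
    mapping = {}

    def swap(match):
        url = match.group(2)
        path = image_dir + "/" + local_name(url)
        mapping[url] = path
        return match.group(1) + path + match.group(3)

    # Matches the quoted src attribute of an <img> tag (not data-src).
    pattern = re.compile(r'(<img\b[^>]*?\ssrc=["\'])([^"\']+)(["\'])', re.IGNORECASE)
    return pattern.sub(swap, html), mapping


# Hypothetical article body with two remote images.
article_html = (
    '<p>Hello</p><img src="http://img.example.com/a.jpg">'
    '<img class="x" src="http://img.example.com/b.png">'
)
new_html, mapping = rewrite_img_src(article_html)
print(new_html)
print(mapping)
```

Naming files by a hash of the URL keeps the mapping deterministic, so the same image always lands at the same local path and the text can be stored with the rewritten src values alongside the image folder.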

ringa_lee · Answer

News? Portal sites almost always have hotlink protection.

It's better to download the images locally first, sending a spoofed Referer header with each request, and then replace the image URLs in the original text with the local paths.
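For the spoofed Referer part, a minimal sketch using only `urllib.request` might look like this. It assumes the site checks nothing beyond the Referer (and perhaps the User-Agent); `build_image_request`, `download_image`, and the example.com URLs are illustrative names, not anything from a specific site.

```python
import urllib.request


def build_image_request(image_url, page_url):
    """Build a request that looks like it came from the article page itself."""
    headers = {
        "Referer": page_url,          # pretend the image was embedded in this page
        "User-Agent": "Mozilla/5.0",  # some sites also reject the default Python agent
    }
    return urllib.request.Request(image_url, headers=headers)


def download_image(image_url, page_url, dest_path, timeout=10):
    """Fetch one image with the spoofed Referer and write it to dest_path."""
    request = build_image_request(image_url, page_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = response.read()
    with open(dest_path, "wb") as fh:
        fh.write(data)
    return len(data)


# Hypothetical URLs; only the request object is built here, nothing is fetched.
request = build_image_request(
    "http://img.example.com/a.jpg", "http://news.example.com/post/1"
)
print(request.get_header("Referer"))
```

Calling `download_image(image_url, page_url, "images/a.jpg")` would then perform the actual fetch; setting the Referer to the article's own URL is usually what hotlink checks expect, though individual sites may apply other checks as well.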

巴扎黑 · Answer

See http://blog.csdn.net/qq_34844199/article/details/51468841. After reading it, everything should be clear.