After getting off work yesterday, I suddenly had the idea to write a crawler to scrape content from web pages. I spent an hour learning the basics of Python syntax, and then wrote a crawler by referring to examples on the Internet.
Data crawled with Python is usually saved either to a file or to a database, and the file form is the simpler of the two. If you are just writing a crawler for yourself, saving the data to a file is enough.
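As a minimal sketch of the save-to-file approach (the helper name `save_bytes` and the paths are my own, not from the article), the whole pattern is just opening a file in binary write mode:

```python
import os

def save_bytes(path, data):
    """Write raw bytes (e.g. a downloaded image) to a local file,
    creating the parent directory first if it does not exist."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    # 'wb' because downloaded content is bytes, not text
    with open(path, "wb") as f:
        f.write(data)
```

Opening the file with a `with` block guarantees it is flushed and closed even if the write fails, which is why it is preferred over a bare `open`/`close` pair.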
#coding=utf-8
import urllib.request
import re
import os

'''
The urllib module provides an interface for reading web page data; we can
read data from www and ftp much like reading a local file.
urlopen opens a URL; read() reads the data at that URL.
'''

def getHtml(url):
    page = urllib.request.urlopen(url)
    return page.read()

def getImg(html):
    # Non-greedy .*? stops at the first closing quote,
    # so each src URL is captured separately
    imglist = re.findall('img src="(http.*?)"', html)
    return imglist

html = getHtml("https://www.zhihu.com/question/34378366").decode("utf-8")
imagesUrl = getImg(html)

if not os.path.exists("D:/imags"):
    os.mkdir("D:/imags")

count = 0
for url in imagesUrl:
    print(url)
    if url.find('.') != -1:
        # Take the file extension from the last few characters of the URL
        name = url[url.find('.', len(url) - 5):]
        data = urllib.request.urlopen(url)
        with open("D:/imags/" + str(count) + name, 'wb') as f:
            f.write(data.read())
        count += 1
After testing, the basic functionality works. Most of my time went into the regular-expression matching, because I am not very familiar with regular expressions, so that part took a while.
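To see what that regular expression actually matches, here is the same `re.findall` pattern run against a small hand-made HTML snippet (the sample markup and URLs are invented for illustration):

```python
import re

sample_html = (
    '<div>'
    '<img src="https://example.com/a.jpg">'
    '<img src="https://example.com/b.png">'
    '</div>'
)

# The parentheses capture only the URL; the non-greedy .*? makes the
# match stop at the first closing quote instead of swallowing the rest
# of the document.
imglist = re.findall('img src="(http.*?)"', sample_html)
# imglist == ['https://example.com/a.jpg', 'https://example.com/b.png']
```

Note this pattern is fragile by design: it misses tags written as `src='...'` or with attributes between `img` and `src`. For anything beyond a quick script, an HTML parser is the safer tool.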
Note: the program above is based on Python 3.5. There are some differences between Python 3 and Python 2, and I fell into a few pitfalls when first looking at the basic syntax.
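The differences that bite hardest in a script like this one are the `urllib` reorganization and the bytes/str split. A short Python 3 sketch, with the Python 2 equivalents shown only in comments:

```python
# Python 3 split the old urllib/urllib2 modules into a package.
import urllib.request   # Python 2: import urllib2
import urllib.parse     # Python 2: import urllib (for quote) / urlparse

# Python 2: page = urllib2.urlopen(url)
# Python 3: page = urllib.request.urlopen(url)

# In Python 3, read() returns bytes, which must be decoded explicitly
# before regex matching against a str pattern:
#   html = page.read().decode("utf-8")
# In Python 2, read() returned a str and no decode step was needed.

# URL quoting also moved:
quoted = urllib.parse.quote("a b")   # Python 2: urllib.quote("a b")
```

This is why the crawler above calls `.decode("utf-8")` on the result of `getHtml` before handing it to `re.findall`.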
The above is the detailed content of "Where is the Python data crawled and saved?". For more information, please follow other related articles on the PHP Chinese website!