Where is data crawled with Python saved?

(*-*)浩
Release: 2019-10-30 14:03:30


After work yesterday, I suddenly had the idea of writing a crawler to grab content from web pages. I spent an hour learning the basic syntax of Python, then wrote a crawler by following examples on the Internet.

Data crawled with Python is saved locally, usually in a file or in a database; saving to files is the simpler of the two. If you are just writing a crawler for yourself, you can save the data as files.
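As a minimal sketch of the two storage options, the following saves the same records once to a JSON file and once to a SQLite database. The sample records, the file names `pages.json` and `pages.db`, and the helper names `save_to_file` and `save_to_db` are assumptions made for illustration; they are not part of the crawler in this article.

```python
import json
import os
import sqlite3
import tempfile

# Hypothetical crawl results; a real crawler would build this list itself.
records = [
    {"url": "https://example.com/a", "title": "Page A"},
    {"url": "https://example.com/b", "title": "Page B"},
]

def save_to_file(rows, path):
    # File storage: a single call writes everything out as JSON text.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rows, f, ensure_ascii=False, indent=2)

def save_to_db(rows, path):
    # Database storage: needs a table definition, but can be queried later.
    conn = sqlite3.connect(path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT)"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO pages (url, title) VALUES (:url, :title)", rows
        )
        conn.commit()
    finally:
        conn.close()

# Write into a scratch folder so the sketch does not touch the working directory.
out_dir = tempfile.mkdtemp()
json_path = os.path.join(out_dir, "pages.json")
db_path = os.path.join(out_dir, "pages.db")
save_to_file(records, json_path)
save_to_db(records, db_path)
```

The file version needs no schema, which is why it suits a small personal crawler; the database version costs a few more lines but avoids duplicate rows when the same URL is crawled twice.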

#coding=utf-8
import urllib.request
import re
import os
 
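'''
The urllib module provides an interface for reading data from web pages; we can read data from www and ftp just like reading a local file.
The urlopen method opens a URL.
The read method reads the data at the URL.
'''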
 
def getHtml(url):
    # Fetch the page and return the raw response body as bytes.
    page = urllib.request.urlopen(url)
    html = page.read()
    return html

def getImg(html):
    # Collect every absolute image URL that appears as img src="http...".
    imglist = re.findall(r'img src="(http.*?)"', html)
    return imglist

html = getHtml("https://www.zhihu.com/question/34378366").decode("utf-8")
imagesUrl = getImg(html)
 
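# Create the target folder if it does not exist yet.
if not os.path.exists("D:/imags"):
    os.mkdir("D:/imags")

count = 0
for url in imagesUrl:
    print(url)
    if url.find('.') != -1:
        # Look for the extension (e.g. ".jpg") in the last five characters of the URL.
        dot = url.find('.', len(url) - 5)
        name = url[dot:] if dot != -1 else ''
        response = urllib.request.urlopen(url)
        # The with statement flushes and closes the file automatically.
        with open("D:/imags/" + str(count) + name, 'wb') as f:
            f.write(response.read())
        count += 1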
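The program picks the file extension by searching the last five characters of the URL, which can go wrong when the URL ends in a query string or has no extension at all. One possible alternative, sketched here with the standard `urllib.parse` and `os.path` modules, looks only at the path part of the URL. The helper name `extension_from_url` and the `.jpg` fallback are assumptions for illustration.

```python
import os
from urllib.parse import urlparse

def extension_from_url(url, default=".jpg"):
    # Use only the path part, so a query string like "?size=large" is ignored.
    path = urlparse(url).path
    ext = os.path.splitext(path)[1]
    # Fall back to an assumed default when the URL carries no extension.
    return ext if ext else default

print(extension_from_url("https://example.com/pics/cat.png?size=large"))
print(extension_from_url("https://example.com/pics/avatar"))
```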

After testing, the basic functionality works. Most of the time went into the regular-expression matching, because I am not very familiar with regular expressions.
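Since the regular expression was the hardest part, here is a small offline sketch of how the pattern from `getImg` behaves, and why the non-greedy `.*?` matters. The sample HTML string is made up for illustration.

```python
import re

# Made-up HTML snippet standing in for a downloaded page.
sample = (
    '<p><img src="https://example.com/a.jpg" alt="a">'
    '<img src="/relative/b.png">'
    '<img src="http://example.com/c.gif" width="10"></p>'
)

# Non-greedy .*? stops at the first closing quote after each URL.
# The relative src is skipped because the pattern requires "http".
lazy = re.findall(r'img src="(http.*?)"', sample)

# Greedy .* runs on to the last quote in the string, swallowing the other tags.
greedy = re.findall(r'img src="(http.*)"', sample)

print(lazy)
print(len(greedy))
```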

Note: The above program targets Python 3.5. Python 3 and Python 2 differ in a number of ways, and I fell into some pitfalls when I first started looking at the basic syntax.
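One difference that affects this crawler: in Python 3, `urlopen` lives in `urllib.request` and `read()` returns `bytes` rather than text, so the page has to be decoded before a `str` regular expression can search it. That is why the program calls `.decode("utf-8")`. The sketch below shows the failure and the fix; the byte string is a made-up stand-in for a downloaded page.

```python
import re

# Stand-in for what page.read() returns in Python 3: raw bytes, not text.
raw = b'<img src="https://example.com/a.jpg">'
pattern = r'img src="(http.*?)"'

try:
    re.findall(pattern, raw)  # a str pattern applied to bytes
    mixed_ok = True
except TypeError:
    # Python 3 refuses to mix a str pattern with a bytes input.
    mixed_ok = False

# Decoding first gives a str that the str pattern can search.
urls = re.findall(pattern, raw.decode("utf-8"))
print(mixed_ok, urls)
```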
