This article shows how to collect proxy IPs in Python, check whether each one is usable, and refresh the list on a regular schedule.
Many free proxy IP addresses are published online, but gathering them by hand is tedious. A Python script can scrape them automatically and fetch them in batches.
The code is as follows:
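# -*- coding: utf-8 -*-
# Written for Python 3.
import json
import os
import re
import socket
import time
import urllib.request


class ProxyIp(object):
    def __init__(self):
        # Directory containing this script; ip.json is stored next to it.
        self.path = os.path.split(os.path.realpath(__file__))[0]

    # Get the latest proxy IPs and save the reachable ones to ip.json
    def update_ip(self):
        print('Update Ip')
        url = 'http://www.ip3366.net/free/'
        req = urllib.request.Request(url)
        response = urllib.request.urlopen(req)
        # The pattern only matches ASCII, so undecodable bytes can be dropped.
        html = response.read().decode('utf-8', 'ignore')
        # Dots are escaped so that only dotted IPv4 addresses match.
        matches = re.findall(
            r'(\d+\.\d+\.\d+\.\d+)</td>\s+<td>(\d+)</td>\s+<td>.*?</td>\s+<td>(HTTPS?)</td>',
            html,
            re.I
        )
        ls = []
        for match in matches:
            if self.is_open(match[0], match[1]):
                ls.append({'ip': match[0], 'port': match[1], 'protocol': match[2]})
        with open('%s/ip.json' % self.path, 'w') as f:
            json.dump(ls, f)
        return ls

    # Whether the cached list is still the latest (less than 4 hours old)
    def is_last(self):
        ip_file = '%s/ip.json' % self.path
        if not os.path.exists(ip_file):
            return False  # no cache yet, so an update is needed
        m_time = int(os.path.getmtime(ip_file))
        now_time = int(time.time())
        return (now_time - m_time) < 60 * 60 * 4  # 4 hours

    # Check that a TCP connection to ip:port can be opened
    @staticmethod
    def is_open(ip, port):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(3)  # do not hang on dead proxies
        try:
            s.connect((ip, int(port)))  # connect() takes one (host, port) tuple
            return True
        except (OSError, ValueError, OverflowError):
            print('Failed IP: %s:%s' % (ip, port))
            return False
        finally:
            s.close()

    # Return the cached list if it is fresh, otherwise scrape a new one
    def get_proxy_ips(self):
        if not self.is_last():
            return self.update_ip()
        else:
            with open('%s/ip.json' % self.path, 'r') as f:
                return json.load(f)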
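Once `get_proxy_ips()` returns its list, each entry still has to be handed to an HTTP client. The following is a minimal sketch of one way to do that with the standard library's `urllib.request.ProxyHandler`; it is not part of the original script. The helper names `pick_proxy`, `proxy_mapping`, and `build_proxy_opener` are hypothetical, the sample entry uses a reserved documentation address (192.0.2.10) rather than a real proxy, and the sketch assumes that the `protocol` field names the URL scheme that should be routed through the proxy.

```python
import random
import urllib.request


def pick_proxy(ips):
    """Pick one entry at random from a list shaped like get_proxy_ips() output."""
    if not ips:
        raise ValueError('no proxy entries available')
    return random.choice(ips)


def proxy_mapping(entry):
    """Build the {scheme: proxy_url} dict that ProxyHandler expects.

    Assumption: entry['protocol'] ('HTTP' or 'HTTPS') names the scheme of the
    requests that should go through the proxy.
    """
    scheme = entry['protocol'].lower()
    return {scheme: 'http://%s:%s' % (entry['ip'], entry['port'])}


def build_proxy_opener(entry):
    """Return an opener that routes matching requests through the proxy."""
    handler = urllib.request.ProxyHandler(proxy_mapping(entry))
    return urllib.request.build_opener(handler)


# Placeholder entry: 192.0.2.0/24 is reserved for documentation, not a real proxy.
sample = {'ip': '192.0.2.10', 'port': '8080', 'protocol': 'HTTP'}
opener = build_proxy_opener(pick_proxy([sample]))
# With a live proxy, a request would look like:
# opener.open('http://example.com/', timeout=5).read()
```

In practice the list would come from `ProxyIp().get_proxy_ips()` instead of the placeholder entry. Free proxies go offline often, so retrying with another entry when a request fails is a reasonable extension.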