This article shares a small Python tool that inflates the page views of a web page. The full code is included below for anyone who needs it.
Preparation
Required environment:
Python3
Start
First, implement a simple version. Here is the code:
import urllib.request
import urllib.error


# Define the get function: open the target and return the HTTP status code
def get(url):
    code = urllib.request.urlopen(url).code
    return code


if __name__ == '__main__':
    # Set some basic attributes
    url = "http://shua.jb51.net"
    user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.63 Safari/537.36"
    headers = {'User-Agent': user_agent}
    req = urllib.request.Request(url, headers=headers)
    # Count the visits
    i = 1
    while 1:
        # Pass the Request object so the User-Agent header is actually sent;
        # urlopen accepts a Request as well as a plain URL string
        code = get(req)
        print('Visit: ' + str(code))
        i = i + 1
This version is simple and crude. It only inflates the page view (PV) count while the IP address stays the same, so search engines can detect it easily. Let's improve it next.
Add proxy function
Add the following code to the get method:
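# Requires "import random" at the top of the file, and get() must
# accept the proxy list as a second parameter: def get(url, proxies)
random_proxy = random.choice(proxies)
proxy_support = urllib.request.ProxyHandler({"http": random_proxy})
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)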
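One caveat with this snippet: `install_opener` replaces the process-wide default opener on every call. A minimal sketch of an alternative is to build a private opener per request and leave global state alone. The helper name `build_proxy_opener`, the `timeout` parameter, and its default value are illustrative assumptions, not part of the original code.

```python
import random
import urllib.request


def build_proxy_opener(proxy):
    """Return an opener that routes http:// requests through `proxy` ("host:port")."""
    handler = urllib.request.ProxyHandler({"http": proxy})
    return urllib.request.build_opener(handler)


def get(req, proxies, timeout=10):
    """Open `req` through a randomly chosen proxy and return the HTTP status code."""
    opener = build_proxy_opener(random.choice(proxies))
    # opener.open() uses only this opener; the global default opener is untouched.
    with opener.open(req, timeout=timeout) as response:
        return response.status
```

The `timeout` matters in practice because a dead proxy can otherwise block the loop for a long time before an error is raised.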
Modify the main method:
if __name__ == '__main__':
    url = "http://shua.jb51.net"
    # Add a proxy list; you can look up proxies yourself via Baidu
    proxies = ["124.88.67.22:80", "124.88.67.82:80", "124.88.67.81:80",
               "124.88.67.31:80", "124.88.67.19:80", "58.23.16.240:80"]
    user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.63 Safari/537.36"
    headers = {'User-Agent': user_agent}
    req = urllib.request.Request(url, headers=headers)
    i = 1
    while 1:
        # Pass the proxy list as the extra argument
        code = get(req, proxies)
        print('Proxy visit #' + str(i) + ': ' + str(code))
        i = i + 1
This is almost complete, but there is a bug: if the page cannot be opened or the proxy fails, the unhandled exception ends the program. Next, we add exception handling.
Exception handling
Define the mail method to send email reminders
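import smtplib
from email.mime.text import MIMEText


def mail(txt):
    _user = "your_account"
    _pwd = "your_password"
    _to = "recipient_account"
    # str() allows callers to pass a status code or an exception reason
    msg = MIMEText(str(txt), 'plain', 'utf-8')
    # Subject line
    msg["Subject"] = "Proxy failed!"
    msg["From"] = _user
    msg["To"] = _to
    try:
        # QQ Mail is used here
        s = smtplib.SMTP_SSL("smtp.qq.com", 465)
        s.login(_user, _pwd)
        s.sendmail(_user, _to, msg.as_string())
        s.quit()
        print("Success!")
    except smtplib.SMTPException as e:
        print("Failed, %s" % e)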
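The message-building part can be separated from the sending part, which makes it easy to check what the email will contain without contacting an SMTP server. The sketch below assumes a hypothetical helper named `build_alert`; the sender and recipient addresses are placeholders.

```python
from email.mime.text import MIMEText


def build_alert(txt, sender, recipient):
    """Build the alert message; str() lets callers pass ints or exception objects."""
    msg = MIMEText(str(txt), 'plain', 'utf-8')
    msg["Subject"] = "Proxy failed!"
    msg["From"] = sender
    msg["To"] = recipient
    return msg


if __name__ == '__main__':
    # Print the raw message instead of sending it.
    print(build_alert(503, "sender@example.com", "recipient@example.com").as_string())
```

Converting the body with `str()` is the relevant design choice here: an HTTP status code is an integer and a `URLError` reason may be an exception object, and `MIMEText` expects text.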
Then we modify the main method:
if __name__ == '__main__':
    url = "http://shua.jb51.net"
    proxies = ["124.88.67.22:80", "124.88.67.82:80", "124.88.67.81:80",
               "124.88.67.31:80", "124.88.67.19:80", "58.23.16.240:80"]
    user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.63 Safari/537.36"
    headers = {'User-Agent': user_agent}
    req = urllib.request.Request(url, headers=headers)
    i = 1
    while 1:
        try:
            code = get(req, proxies)
            print('Proxy visit #' + str(i) + ': ' + str(code))
            i = i + 1
        # HTTPError is a subclass of URLError, so it must be caught first
        except urllib.error.HTTPError as e:
            print(e.code)
            # Send an email alert; the mail body must be text
            mail(str(e.code))
        except urllib.error.URLError as err:
            print(err.reason)
            # Send an email alert; the reason may be an exception object
            mail(str(err.reason))
Done!
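The order of the two `except` clauses matters because `urllib.error.HTTPError` is a subclass of `urllib.error.URLError`. The sketch below illustrates the same ordering with `isinstance` checks; `describe_error` is a hypothetical helper, not part of the original program, and it can be exercised without any network access.

```python
import urllib.error


def describe_error(exc):
    """Turn a urllib exception into a short text for a log line or email body."""
    # HTTPError is a subclass of URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        return "HTTP error %d" % exc.code
    if isinstance(exc, urllib.error.URLError):
        return "URL error: %s" % exc.reason
    return "Unexpected error: %r" % (exc,)


if __name__ == '__main__':
    # Errors are constructed by hand here purely for demonstration.
    print(describe_error(urllib.error.URLError("connection refused")))
```

If the `URLError` check came first, HTTP status errors would be reported through `reason` and the status code would be lost from the message.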
Conclusion
The code is only 50 lines long, and the program can be improved:
For example: fetch the proxy list automatically, add a user interface, extend it with multi-threading, and so on.
Finally, here is another friend's version, written for Python 2:
# Note: this script targets Python 2 (urllib2, thread, print statement)
import urllib2
import timeit
import thread
import time

i = 0
# Lock protecting the shared counter i
mylock = thread.allocate_lock()


def test(no, r):
    global i
    url = 'http://blog.csdn.net'
    for j in range(1, r):
        req = urllib2.Request(url)
        req.add_header("User-Agent", "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)")
        file = urllib2.urlopen(req)
        print file.getcode()
        # Update the shared counter under the lock
        mylock.acquire()
        i += 1
        mylock.release()
        print i
    thread.exit_thread()


def fast():
    thread.start_new_thread(test, (1, 50))
    thread.start_new_thread(test, (2, 50))


fast()
# Keep the main thread alive while the workers run
time.sleep(15)
In testing, the server returned 503 errors when more than two threads were used, so two threads is about right.
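The friend's script relies on the Python 2 `thread` module, while the rest of this article uses Python 3. A minimal sketch of the same lock-protected counter pattern with Python 3's `threading` module follows. The `run_workers` function and its `task` parameter are hypothetical: `task` stands in for whatever each thread does per iteration, so the sketch itself makes no network requests.

```python
import threading


def run_workers(task, thread_count=2, repeats=49):
    """Run `task` `repeats` times in each thread and return the total call count."""
    completed = 0
    lock = threading.Lock()

    def worker():
        nonlocal completed
        for _ in range(repeats):
            task()
            # Only one thread at a time may update the shared counter.
            with lock:
                completed += 1

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for t in threads:
        t.start()
    # join() waits for the workers to finish, replacing the fixed sleep.
    for t in threads:
        t.join()
    return completed


if __name__ == '__main__':
    print(run_workers(lambda: None))
```

The default of 49 repeats mirrors `range(1, 50)` in the script above, which iterates 49 times rather than 50. Using `join()` instead of `time.sleep(15)` means the program ends as soon as the work is done and never cuts a slow run short.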