When capturing packets, I initially tried the Network panel in Chrome's developer tools but could not capture the login request, so I switched to Fiddler, which captured the data successfully. The whole process is described step by step below.
Before simulating Zhihu login, first take a look at the environment and tools used in this case:
Windows 7, Python 2.7.5
Chrome and Fiddler: used to monitor the communication between the client and the server and to locate the relevant parameters.
The overall approach:
Use Chrome together with Fiddler to monitor the communication between the client and the server;
From the captured traffic, work out the parameters that must be passed when requesting the server;
Use Python to reproduce that parameter passing.
Several key points in the communication process between the client and the server (a generic sketch of how they fit together follows this short list):
The URL requested when logging in.
There are two main ways to obtain the parameters submitted when logging in. The first is to analyze the page source code and find the form tags and their attributes; this works for relatively simple pages. The second is to use a packet capture tool to see the submitted URL and parameters; the Network panel in Chrome's developer tools, Fiddler and similar tools are commonly used.
The URL to jump to after logging in.
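To make these three pieces concrete before getting into the Zhihu specifics, here is a minimal, generic sketch of a cookie-aware login with urllib2 and cookielib; the URLs and field names are placeholders, not Zhihu's actual ones.

# -*- coding: utf-8 -*-
# Generic login skeleton; the URLs and field names below are placeholders.
import urllib
import urllib2
import cookielib

cookiejar = cookielib.LWPCookieJar()                                   # cookies keep the session alive after login
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookiejar))

login_url = "https://example.com/login"                                # 1. the URL requested when logging in
post_data = {"username": "me", "password": "secret"}                   # 2. the parameters submitted with the login
response = opener.open(urllib2.Request(login_url, urllib.urlencode(post_data)))
print response.read()                                                  # the server's reply to the login attempt

print opener.open("https://example.com/home").read()[:200]             # 3. the page reached after logging in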
First, look at the login page itself; its address is the URL we request when logging in.
From this page we can already roughly guess which fields are passed to the server: user name, password, verification code and the "Remember me" option. But which fields are actually sent? Let's analyze that below.
First check the HTML source code. In Chrome you can press Ctrl+U to view it, then Ctrl+F and search for "input" to see which field values are present. The details are as follows:
The source code shows that, besides the visible fields, a hidden field named "_xsrf" is also sent when requesting the server. The remaining question is under which names the parameters are actually submitted, so another tool is needed to capture and analyze the packets. Here I use Fiddler, which runs on Windows; other tools work just as well.
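As a rough sketch of the first approach, the hidden value can be pulled out of the page source with BeautifulSoup. Looking the input up by name assumes the field's name attribute is "_xsrf", which is what the source above shows; the full script later in the article simply takes the first hidden input instead.

# -*- coding: utf-8 -*-
# Sketch: fetch the page and read the hidden _xsrf value from its source.
import urllib2
from bs4 import BeautifulSoup as BS

html = urllib2.urlopen("https://www.zhihu.com/").read()
soup = BS(html, "html.parser")
_xsrf = soup.find("input", {"name": "_xsrf"}).get("value")
print _xsrf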
A capture contains a lot of information, which makes the details you need harder to find and the process somewhat tedious. Fiddler itself is easy to use; if you have never used it, a quick Baidu search will get you started. To keep other traffic from interfering, first clear the records in Fiddler, then enter the user name (the author logs in with an email address), password and other information and log in. Fiddler then shows the following:
Note: if you log in with a mobile phone number, the corresponding URL shown in Fiddler is "/login/phone_num".
To see the detailed request parameters, left-click the "/login/email" entry; the following information appears:
The request method is POST and the requested URL is https://www.zhihu.com/login/email. The Form Data section shows the corresponding field names:
_xsrf
captcha
email
password
remember
Of these five fields, email, password and captcha are entered manually in the code, and remember is initialized to true. The remaining one, _xsrf, can be read from the value of the hidden input tag in the login page's source, as shown above.
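Putting those five values together, the POST body might look like the following sketch; the field names follow the Fiddler capture above, while the full script later in the article sends 'remember_me' instead of 'remember', so check your own capture for the exact name.

# Sketch of the login form body; the values are placeholders.
post_data = {
    '_xsrf': _xsrf,                  # hidden value scraped from the login page source
    'email': 'you@example.com',      # becomes 'phone_num' when logging in by phone
    'password': 'your-password',
    'captcha': 'abcd',               # the characters shown in the captcha image
    'remember': 'true'               # the full script below uses 'remember_me'
}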
For the verification code, an additional request is required; its link can be found by inspecting the corresponding part of the page source:
The link is https://www.zhihu.com/captcha.gif?type=login; the ts parameter is omitted here (testing shows it can be left out). With this, the login can now be simulated in code.
Note that when logging in with a phone number, the login URL becomes https://www.zhihu.com/login/phone_num and the "email" field name becomes "phone_num".
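A short sketch of that extra captcha request follows; it assumes the cookie-carrying opener and the username variable from the sketches above. The captcha is fetched through the same opener that later sends the login POST so that the captcha session matches, which is also why the author's class below downloads it through its own opener.

# Sketch: download the captcha via the cookie-carrying opener, then pick the login endpoint.
captcha_url = "https://www.zhihu.com/captcha.gif?type=login"
with open("temp/captcha.jpg", "wb") as f:              # the temp directory must already exist
    f.write(opener.open(captcha_url).read())
captcha = raw_input("captcha shown in temp/captcha.jpg: ")

# email logins post to /login/email, phone logins to /login/phone_num
login_url = "https://www.zhihu.com/login/" + ("email" if "@" in username else "phone_num")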
# -*- coding: utf-8 -*-
"""
Created on Thu Nov 02 14:01:17 2016
@author: liudiwei
"""
import urllib
import urllib2
import cookielib
import logging


class WSpider(object):
    def __init__(self):
        # init params
        self.url_path = None
        self.post_data = None
        self.header = None
        self.domain = None
        self.operate = None
        # init cookie
        self.cookiejar = cookielib.LWPCookieJar()
        self.opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(self.cookiejar))
        urllib2.install_opener(self.opener)

    def setRequestData(self, url_path=None, post_data=None, header=None):
        self.url_path = url_path
        self.post_data = post_data
        self.header = header

    def getHtmlText(self, is_cookie=False):
        if self.post_data == None and self.header == None:
            request = urllib2.Request(self.url_path)
        else:
            request = urllib2.Request(self.url_path, urllib.urlencode(self.post_data), self.header)
        response = urllib2.urlopen(request)
        if is_cookie:
            self.operate = self.opener.open(request)
        resText = response.read()
        return resText

    def saveCaptcha(self, captcha_url, outpath, save_mode='wb'):
        """Save the captcha image to a local file."""
        picture = self.opener.open(captcha_url).read()  # fetch the captcha through the opener so the session cookie is kept
        local = open(outpath, save_mode)
        local.write(picture)
        local.close()

    def getHtml(self, url):
        page = urllib.urlopen(url)
        html = page.read()
        return html

    def output(self, content, out_path, save_mode="w"):
        """Write text content to a local file.
        @params content: text content
                out_path: output path
        """
        fw = open(out_path, save_mode)
        fw.write(content)
        fw.close()

    def createLogger(self, logger_name, log_file):
        """#EXAMPLE
        logger = createLogger('mylogger', 'temp/logger.log')
        logger.debug('logger debug message')
        logger.info('logger info message')
        logger.warning('logger warning message')
        logger.error('logger error message')
        logger.critical('logger critical message')
        """
        # create a logger
        logger = logging.getLogger(logger_name)
        logger.setLevel(logging.INFO)
        # one handler that writes to the log file
        fh = logging.FileHandler(log_file)
        # another handler that writes to the console
        ch = logging.StreamHandler()
        # define the output format used by both handlers
        formatter = logging.Formatter('%(asctime)s | %(name)s | %(levelname)s | %(message)s')
        fh.setFormatter(formatter)
        ch.setFormatter(formatter)
        # attach the handlers to the logger
        logger.addHandler(fh)
        logger.addHandler(ch)
        return logger
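Before wiring the class into the Zhihu login flow, it can be exercised on its own; a quick sanity check might look like the snippet below (the output path is illustrative and the temp directory must already exist).

# -*- coding: utf-8 -*-
# Quick sanity check of WSpider on its own; paths are illustrative.
from WSpider import WSpider

spy = WSpider()
spy.setRequestData(r"https://www.zhihu.com/")      # plain GET: no post_data, no header
print spy.getHtmlText()[:200]                      # first 200 characters of the homepage HTML
spy.saveCaptcha(r"https://www.zhihu.com/captcha.gif?type=login", "temp/captcha.jpg")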
# -*- coding: utf-8 -*-
"""
Created on Thu Nov 02 17:07:17 2016
@author: liudiwei
"""
import urllib
import getpass
import json
from bs4 import BeautifulSoup as BS
from WSpider import WSpider

"""
2016.11.03 Could not log in yet because of the captcha.
2016.11.04 Logged in successfully; the responses seen along the way:
Wrong captcha:    { "r": 1, "errcode": 1991829, "data": {"captcha":"验证码错误"}, "msg": "验证码错误" }
Expired captcha:  { "r": 1, "errcode": 1991829, "data": {"captcha":"验证码回话无效 :(","name":"ERR_VERIFY_CAPTCHA_SESSION_INVALID"}, "msg": "验证码回话无效 :(" }
Successful login: {"r":0, "msg": "登录成功"}
"""

def zhiHuLogin():
    spy = WSpider()
    logger = spy.createLogger('mylogger', 'temp/logger.log')

    homepage = r"https://www.zhihu.com/"
    html = spy.opener.open(homepage).read()
    soup = BS(html, "html.parser")
    _xsrf = soup.find("input", {'type': 'hidden'}).get("value")

    # the parameter name depends on the login method:
    # email login passes 'email', phone login passes 'phone_num'
    username = raw_input("Please input username: ")
    password = getpass.getpass("Please input your password: ")
    account_name = None
    if "@" in username:
        account_name = 'email'
    else:
        account_name = 'phone_num'

    # save the captcha locally
    logger.info("save captcha to local machine.")
    captchaURL = r"https://www.zhihu.com/captcha.gif?type=login"  # captcha url
    spy.saveCaptcha(captcha_url=captchaURL, outpath="temp/captcha.jpg")  # the temp directory must be created manually

    # parameters of the login request
    post_data = {
        '_xsrf': _xsrf,
        account_name: username,
        'password': password,
        'remember_me': 'true',
        'captcha': raw_input("Please input captcha: ")
    }

    # request headers
    header = {
        'Accept': '*/*',
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'X-Requested-With': 'XMLHttpRequest',
        'Referer': 'https://www.zhihu.com/',
        'Accept-Language': 'en-GB,en;q=0.8,zh-CN;q=0.6,zh;q=0.4',
        'Accept-Encoding': 'gzip, deflate, br',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36',
        'Host': 'www.zhihu.com'
    }

    url = r"https://www.zhihu.com/login/" + account_name
    spy.setRequestData(url, post_data, header)
    resText = spy.getHtmlText()
    jsonText = json.loads(resText)

    if jsonText["r"] == 0:
        logger.info("Login success!")
    else:
        logger.error("Login Failed!")
        logger.error("Error info ---> " + jsonText["msg"])

    text = spy.opener.open(homepage).read()  # reopen the homepage; its source now shows the logged-in state
    spy.output(text, "out/home.html")  # the out directory must be created manually

if __name__ == '__main__':
    zhiHuLogin()
Run python zhiHuLogin.py in the console and enter the requested information at the prompts. Depending on the input, you will see different results; three examples are given below: