How to use Python to implement simulated login to Zhihu

Environment and Development Tools

When capturing packets, I first tried the Network panel in Chrome's developer tools, but it failed to capture the login request. I then switched to Fiddler, which captured the data successfully. The whole process is described step by step below.

Before simulating Zhihu login, first take a look at the environment and tools used in this case:

Windows 7
Python 2.75
Chrome
Fiddler: used to monitor the communication between the client and the server and to locate the relevant parameters.

Overview of the simulation process

Use Chrome together with Fiddler to monitor the communication between the client and the server;
Based on the monitoring results, construct the parameters that are passed when requesting the server;
Use Python to reproduce the parameter passing process.

Several key points in the communication process between the client and the server:

The URL used when logging in.
The parameters (params) submitted when logging in. There are two main ways to obtain them. The first is to analyze the page source and find the form tags and their attributes; this suits relatively simple pages. The second is to use a packet capture tool to view the submitted URL and parameters, typically the Network panel in Chrome's developer tools or Fiddler.
The URL to jump to after logging in.
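As a rough illustration of how a login URL and a set of parameters come together, the sketch below builds a POST request object without sending it. It is written for Python 3 (the article's own code targets Python 2), and the URL, field names and the helper name build_login_request are placeholders chosen for this example, not values taken from Zhihu.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_login_request(login_url, params, headers=None):
    """Build (but do not send) a form POST carrying the login parameters."""
    # Passing a body makes urllib treat the request as a POST.
    data = urlencode(params).encode("utf-8")
    return Request(login_url, data=data, headers=headers or {})


# Placeholder URL and fields, for illustration only.
req = build_login_request(
    "https://example.com/login",
    {"email": "user@example.com", "password": "secret"},
)
print(req.get_method(), req.full_url)
print(req.data)
```

Sending the request and following the redirect afterwards is left out on purpose; only the construction step is shown.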

Parameter exploration

First, look at the login page; its address is the URL we use when logging in.

From this page we can roughly guess which fields are passed when requesting the server. The obvious ones are the user name, the password, the verification code and the "Remember me" option. Which fields are actually sent? Let's analyze it below.

First check the HTML source code. In Chrome, press CTRL+U to view the source, then press CTRL+F and search for "input" to see which fields the page contains.
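The same search for input tags can also be done in code. The following is a minimal sketch for Python 3 using only the standard library (the article's own code targets Python 2). The sample markup is a simplified, hypothetical stand-in for the real login page, and the names InputFieldParser and list_input_fields are made up for this example.

```python
from html.parser import HTMLParser


class InputFieldParser(HTMLParser):
    """Collect the name, type and value of every <input> tag in a page."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <input ... />.
        if tag == "input":
            attr = dict(attrs)
            self.fields.append(
                (attr.get("name"), attr.get("type"), attr.get("value"))
            )


def list_input_fields(html):
    """Return a list of (name, type, value) tuples, one per input tag."""
    parser = InputFieldParser()
    parser.feed(html)
    return parser.fields


# Hypothetical, simplified markup standing in for the real login page.
SAMPLE_HTML = """
<form>
  <input type="hidden" name="_xsrf" value="abc123"/>
  <input type="text" name="email"/>
  <input type="password" name="password"/>
</form>
"""

for name, field_type, value in list_input_fields(SAMPLE_HTML):
    print(name, field_type, value)
```

Hidden fields show up in this listing alongside the visible ones, which is the point of checking the source rather than relying on what the page displays.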

The source shows that the request to the server also carries a hidden field, "_xsrf". What remains unknown is the name under which each parameter is actually submitted, so a packet capture tool is needed to analyze the traffic. Here I use Fiddler, which runs on Windows; other tools work as well.

Packet capture returns a large amount of information, which makes the required entries harder to find and the capture process more tedious. Fiddler itself is easy to use; if you have no experience with it, a quick search on Baidu will turn up tutorials. To keep other traffic from interfering, first clear the records in Fiddler, then log in with the user name (the author logs in with an email address), password and other information. Fiddler then records the login request.

Note: If you log in with a mobile phone number, the corresponding URL in Fiddler is "/login/phone_num".

To view the detailed request parameters, left-click "/login/email" in Fiddler.

The request method is POST and the requested URL is https://www.zhihu.com/login/email. As the Form Data shows, the submitted field names are:

_xsrf
captcha
email
password
remember

Of these five fields, email, password and captcha are entered manually in the code, and remember is initialized to true. The remaining field, _xsrf, is read from the value attribute of the input tag named _xsrf in the source of the login page.
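The five fields can be assembled into a form body as in the sketch below, written for Python 3 (the article's own code targets Python 2). It uses the field names exactly as listed above; the helper name build_login_form and all sample values are made up for illustration.

```python
from urllib.parse import urlencode


def build_login_form(xsrf, email, password, captcha):
    """Assemble the five captured fields into a dict ready for urlencoding."""
    return {
        "_xsrf": xsrf,          # read from the hidden input on the login page
        "email": email,         # entered by the user
        "password": password,   # entered by the user
        "captcha": captcha,     # typed in after viewing the captcha image
        "remember": "true",     # initialized to true
    }


# Sample values, for illustration only.
form = build_login_form("abc123", "user@example.com", "secret", "x7k9")
body = urlencode(form)
print(body)
```

urlencode percent-encodes special characters such as "@", which matches the application/x-www-form-urlencoded format that a form POST expects.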

The verification code requires an additional request. Its link can be found by inspecting the captcha element in the page source.

The link is https://www.zhihu.com/captcha.gif?type=login. The ts parameter is omitted here (testing showed that it can be left out). With this, the login can now be simulated in code.
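A minimal sketch of the extra captcha request follows, written for Python 3 (the article's own code targets Python 2). It assumes the captcha has to be fetched through the same cookie-aware opener that later sends the login request, so that both share one session. The helper names make_opener and save_captcha are made up for this example.

```python
import http.cookiejar
import urllib.request

CAPTCHA_URL = "https://www.zhihu.com/captcha.gif?type=login"


def make_opener():
    """Build an opener whose cookie jar is reused by every request made through it."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def save_captcha(opener, out_path, url=CAPTCHA_URL):
    """Download the captcha through the given opener and write it to out_path."""
    with opener.open(url) as response:
        data = response.read()
    with open(out_path, "wb") as f:
        f.write(data)
    return len(data)  # number of bytes written


# Usage (performs a real network request, so it is left commented out):
# opener, jar = make_opener()
# save_captcha(opener, "captcha.gif")
```

The saved image is then opened by hand and the characters are typed in as the captcha field.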

Reminder: If you log in with a mobile phone number, the requested URL is https://www.zhihu.com/login/phone_num, and the email field name becomes "phone_num".
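The choice between the two login variants can be sketched as a small helper, written for Python 3 (the article's own code targets Python 2). It assumes, as the code later in this article does, that a user name containing "@" is an email address and anything else is a phone number. The name login_target is made up for this example.

```python
LOGIN_BASE = "https://www.zhihu.com/login/"


def login_target(username):
    """Return (login URL, account field name) for the given user name."""
    # Assumption: anything containing "@" is treated as an email address.
    field = "email" if "@" in username else "phone_num"
    return LOGIN_BASE + field, field


print(login_target("user@example.com"))
print(login_target("13800000000"))
```

The returned field name doubles as the last path segment of the URL, so one value drives both the endpoint and the form key.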

Simulation source code

In the process of writing code to implement Zhihu login, the author encapsulated some functions into a simple class WSpider for reuse. The file name is WSpider.py.

# -*- coding: utf-8 -*-
"""
Created on Thu Nov 02 14:01:17 2016
@author: liudiwei
"""
import urllib
import urllib2
import cookielib
import logging  

class WSpider(object):
    def __init__(self):
        #init params
        self.url_path = None
        self.post_data = None
        self.header = None
        self.domain = None
        self.operate = None

        #init cookie
        self.cookiejar = cookielib.LWPCookieJar()
        self.opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(self.cookiejar))
        urllib2.install_opener(self.opener)

    def setRequestData(self, url_path=None, post_data=None, header=None):
        self.url_path = url_path
        self.post_data = post_data
        self.header = header

    def getHtmlText(self, is_cookie=False):
        # No form data and no headers: plain GET. Otherwise send the urlencoded form as a POST body.
        if self.post_data is None and self.header is None:
            request = urllib2.Request(self.url_path)
        else:
            data = urllib.urlencode(self.post_data) if self.post_data is not None else None
            request = urllib2.Request(self.url_path, data, self.header or {})
        # Send the request once through the cookie-aware opener
        response = self.opener.open(request)
        if is_cookie:
            self.operate = response
        resText = response.read()
        return resText

    def saveCaptcha(self, captcha_url, outpath, save_mode='wb'):
        """Save the captcha image to a local file."""
        # Fetch the captcha through the opener so that its cookie is stored in the cookie jar
        picture = self.opener.open(captcha_url).read()
        local = open(outpath, save_mode)
        local.write(picture)
        local.close()

    def getHtml(self, url):
        page = urllib.urlopen(url)
        html = page.read()
        return html

    def output(self, content, out_path, save_mode="w"):
        """
        Write text content to a local file.
        @params
            content: the text content
            out_path: the output path
        """
        fw = open(out_path, save_mode)
        fw.write(content)
        fw.close()

    def createLogger(self, logger_name, log_file):
        """
        Create a logger that writes to a log file and to the console.
        #EXAMPLE
        logger = createLogger('mylogger', 'temp/logger.log')
        logger.debug('logger debug message')
        logger.info('logger info message')
        logger.warning('logger warning message')
        logger.error('logger error message')
        logger.critical('logger critical message')
        """
        # Create a logger
        logger = logging.getLogger(logger_name)
        logger.setLevel(logging.INFO)

        # Create a handler that writes to the log file
        fh = logging.FileHandler(log_file)

        # Create another handler that writes to the console
        ch = logging.StreamHandler()

        # Define the output format shared by both handlers
        formatter = logging.Formatter('%(asctime)s | %(name)s | %(levelname)s | %(message)s')
        fh.setFormatter(formatter)
        ch.setFormatter(formatter)

        # Attach the handlers to the logger
        logger.addHandler(fh)
        logger.addHandler(ch)
        return logger

The source code for simulated login to Zhihu is saved in the zhiHuLogin.py file. The content is as follows:

# -*- coding: utf-8 -*-
"""
Created on Thu Nov 02 17:07:17 2016
@author: liudiwei

"""
import urllib
from WSpider import WSpider
from bs4 import BeautifulSoup as BS
import getpass
import json
"""
2016.11.03 Login temporarily failed because of a captcha problem
2016.11.04 Login succeeded; the following responses were seen along the way
Wrong captcha: { "r": 1, "errcode": 1991829, "data": {"captcha":"验证码错误"}, "msg": "验证码错误" }
    ("验证码错误" means "wrong captcha")
Expired captcha: { "r": 1, "errcode": 1991829, "data": {"captcha":"验证码回话无效 :(","name":"ERR_VERIFY_CAPTCHA_SESSION_INVALID"}, "msg": "验证码回话无效 :(" }
    ("验证码回话无效" means "captcha session invalid")
Login: {"r":0, "msg": "登录成功"}
    ("登录成功" means "login succeeded")
"""
def zhiHuLogin():
    spy = WSpider()
    logger = spy.createLogger('mylogger', 'temp/logger.log')
    homepage = r"https://www.zhihu.com/"
    html = spy.opener.open(homepage).read()
    soup = BS(html, "html.parser")
    _xsrf = soup.find("input", {'type':'hidden'}).get("value")

    # The parameter name depends on the login type: 'email' for email login, 'phone_num' for phone login
    username = raw_input("Please input username: ")
    password = getpass.getpass("Please input your password: ")
    account_name = None
    if "@" in username:
        account_name = 'email'
    else:
        account_name = 'phone_num'

    # Save the captcha image
    logger.info("save captcha to local machine.")
    captchaURL = r"https://www.zhihu.com/captcha.gif?type=login" # captcha URL
    spy.saveCaptcha(captcha_url=captchaURL, outpath="temp/captcha.jpg") # the temp directory must be created manually

    # Request parameters
    post_data = {
        '_xsrf': _xsrf,
        account_name: username,
        'password': password,
        'remember_me': 'true',
        'captcha': raw_input("Please input captcha: ")
    }

    # Request headers
    header = {
        'Accept': '*/*',
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'X-Requested-With': 'XMLHttpRequest',
        'Referer': 'https://www.zhihu.com/',
        'Accept-Language': 'en-GB,en;q=0.8,zh-CN;q=0.6,zh;q=0.4',
        'Accept-Encoding': 'gzip, deflate, br',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36',
        'Host': 'www.zhihu.com'
    }

    url = r"https://www.zhihu.com/login/" + account_name
    spy.setRequestData(url, post_data, header)
    resText = spy.getHtmlText()
    jsonText = json.loads(resText)

    if jsonText["r"] == 0:
        logger.info("Login success!")
    else:
        logger.error("Login Failed!")
        logger.error("Error info ---> " + jsonText["msg"])

    text = spy.opener.open(homepage).read() # reopen the homepage; its source shows that we are now logged in
    spy.output(text, "out/home.html") # the out directory must be created manually

if __name__ == '__main__':
    zhiHuLogin()

For an explanation of the source code, refer to the comments in the code.
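The response check at the end of zhiHuLogin can be isolated as in the sketch below, written for Python 3 (the article's own code targets Python 2). It assumes the response has the JSON shape recorded in the notes at the top of zhiHuLogin.py, where "r" equal to 0 means success and "msg" carries the server's message. The helper name parse_login_response is made up for this example.

```python
import json


def parse_login_response(text):
    """Return (succeeded, message) for a login response body."""
    result = json.loads(text)
    succeeded = result.get("r") == 0
    return succeeded, result.get("msg", "")


# Sample bodies copied from the notes in zhiHuLogin.py.
ok_body = '{"r":0, "msg": "登录成功"}'
bad_captcha_body = (
    '{ "r": 1, "errcode": 1991829, '
    '"data": {"captcha":"验证码错误"}, "msg": "验证码错误" }'
)

print(parse_login_response(ok_body))
print(parse_login_response(bad_captcha_body))
```

Returning the message alongside the flag lets the caller log the server's own explanation when the login fails.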

Run results

Run python zhiHuLogin.py in the console, and then enter the corresponding content as prompted. Finally, you can get the following different results (three examples are given):

Result one: Incorrect password

Result two: Incorrect verification code

Result three: Successful login
