Implementation of python crawler web page login-Python Tutorial-php.cn

coldplay.xixi

Released: 2020-11-30 17:56:04

This article introduces how to implement web page login in a Python crawler.

When writing a Python crawler, you will often run into login problems on the sites you crawl, such as having to enter a verification code (CAPTCHA) or complete an image-dragging (slider) check when logging in. How can such problems be solved? Generally there are two options.

Use cookies to log in

We can log in using cookies: first get the cookie from a browser in which you are already logged in, then send that cookie with the requests library. The server treats you as a real logged-in user and returns pages in the logged-in state. This method is very useful. Most websites that require a verification code to log in can be handled through cookie login, because the verification code only has to be solved once, in the browser.

    # -*- coding: utf-8 -*-
    import requests

    # Target pages to visit
    targetUrlList = [
        "https://httpbin.org/ip",
        "https://httpbin.org/headers",
        "https://httpbin.org/user-agent",
    ]

    # Proxy server
    proxyHost = "t.16yun.cn"
    proxyPort = "31111"

    # Proxy tunnel authentication credentials
    proxyUser = "username"
    proxyPass = "password"

    proxyMeta = "http://%(user)s:%(pass)s@%(host)s:%(port)s" % {
        "host": proxyHost,
        "port": proxyPort,
        "user": proxyUser,
        "pass": proxyPass,
    }

    # Route both http and https requests through the HTTP proxy
    proxies = {
        "http": proxyMeta,
        "https": proxyMeta,
    }

    # Visit the sites three times with the same Session (keep-alive);
    # every request keeps the same external IP
    s = requests.Session()

    # Set the cookie (copy the real name and value from your logged-in browser)
    cookie_dict = {"JSESSION": "123456789"}
    cookies = requests.utils.cookiejar_from_dict(cookie_dict, cookiejar=None, overwrite=True)
    s.cookies = cookies

    for _ in range(3):
        for url in targetUrlList:
            r = s.get(url, proxies=proxies)
            print(r.text)
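
Rather than typing a single cookie by hand, you can copy the whole Cookie request header of a logged-in page from the browser's developer tools (Network tab) and load every cookie at once. The sketch below assumes the header holds the usual name=value pairs separated by semicolons; the helper names parse_cookie_header and attach_browser_cookies are hypothetical, not part of requests. Such cookies usually expire or get invalidated eventually, so the header has to be copied again once the server stops accepting it.

```python
from typing import Dict


def parse_cookie_header(raw: str) -> Dict[str, str]:
    """Turn a browser 'Cookie' request header into a {name: value} dict.

    Accepts either 'a=1; b=2' or the full 'Cookie: a=1; b=2' line copied
    from the browser's developer tools.
    """
    raw = raw.strip()
    if raw.lower().startswith("cookie:"):
        raw = raw[len("cookie:"):]
    cookies = {}
    for part in raw.split(";"):
        # partition() splits on the first '=' only, so values may contain '='.
        name, sep, value = part.strip().partition("=")
        if sep and name:  # skip empty segments and bare words without '='
            cookies[name] = value
    return cookies


def attach_browser_cookies(session, raw: str):
    """Load the copied cookies into a requests.Session (or any object whose
    `cookies` attribute supports dict-style update) and return the session."""
    session.cookies.update(parse_cookie_header(raw))
    return session
```

For example, attach_browser_cookies(requests.Session(), "JSESSION=123456789; theme=dark") returns a session that sends both cookies with every later get() or post(). Passing the session in, instead of creating it inside the helper, lets you configure proxies or headers on it first.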
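
If the site uses a verification code, simply calling response = requests_session.post(url=url_login, data=data) will not work. Instead, first load the login page (and its captcha) with the same session so that the session keeps the cookie the server sets, then make the later requests with that session: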

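    # requests_session = requests.Session(); url_login and url_results are the target site's URLs
    response_captcha = requests_session.get(url=url_login, cookies=cookies)
    response1 = requests.get(url_login)  # Not logged in
    response2 = requests_session.get(url_login)  # Logged in, because the session already received the response cookie!
    response3 = requests_session.get(url_results)  # Logged in, because the session already received the response cookie!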
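
To see why the captcha page must be fetched with the same session, here is a self-contained sketch against a hypothetical toy server started on 127.0.0.1. Its /login path sets a JSESSION cookie, much as a real login page does when it issues a captcha, and any other path answers "logged in" only if that cookie comes back. The handler, paths and cookie value are invented for illustration; a real site would also check the captcha answer and the password. It needs the requests package.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests  # third-party: pip install requests


class ToyLoginHandler(BaseHTTPRequestHandler):
    """Hypothetical toy site: /login sets a session cookie; every other path
    answers 'logged in' only if that cookie is sent back."""

    def do_GET(self):
        extra_headers = []
        if self.path == "/login":
            body = b"login page with captcha"
            extra_headers.append(("Set-Cookie", "JSESSION=demo123; Path=/"))
        elif "JSESSION=demo123" in self.headers.get("Cookie", ""):
            body = b"logged in"
        else:
            body = b"not logged in"
        self.send_response(200)
        for name, value in extra_headers:
            self.send_header(name, value)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


def run_demo():
    # Port 0 lets the OS pick any free port.
    server = HTTPServer(("127.0.0.1", 0), ToyLoginHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port
    try:
        requests_session = requests.Session()
        requests_session.get(base + "/login")  # the session stores the Set-Cookie
        response1 = requests.get(base + "/results")  # fresh request, empty cookie jar
        response3 = requests_session.get(base + "/results")  # stored cookie is sent back
        return response1.text, response3.text
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())
```

Calling run_demo() mirrors the calls above: the plain requests.get() starts with an empty cookie jar and is treated as logged out, while the session replays the cookie it stored from /login.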

Simulated login
I have to quote an old saying here: the ancestors plant the trees, and the descendants enjoy the shade. I once wanted to read Zhihu Yanxuan articles but got stuck at the login. After some searching, I unexpectedly found a library for simulating login, and it works very well. However, to keep it from being "harmonized" (taken down) by spreading it too widely, I won't name it here.
The general idea is to simulate the login with requests: request the verification code, solve it, and then submit it together with the login form to log in successfully.
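
A minimal sketch of that flow follows. It is not the library mentioned above; it only assumes a site that serves the captcha image at one URL and accepts a form POST at another. The URLs, the form fields and the "captcha" field name are hypothetical placeholders to be read off the real login form. The session is passed in (normally a requests.Session) so the cookie issued together with the captcha is sent back with the POST, and solving the captcha is left to a callback: the hypothetical solve_by_hand saves the image and asks you to type it, and an OCR step could be plugged in instead.

```python
from typing import Callable, Dict


def solve_by_hand(image_bytes: bytes, path: str = "captcha.png") -> str:
    """Save the captcha image to disk and let a human type what it shows."""
    with open(path, "wb") as f:
        f.write(image_bytes)
    return input("Open %s and type the captcha: " % path).strip()


def login_with_captcha(
    session,
    captcha_url: str,
    login_url: str,
    form: Dict[str, str],
    solve_captcha: Callable[[bytes], str],
    captcha_field: str = "captcha",  # hypothetical field name; check the real form
):
    """Simulated login: fetch the captcha, solve it, then POST the form.

    `session` should be a requests.Session (or anything with the same
    get/post interface returning objects with .content and
    .raise_for_status()) so the cookie set with the captcha is reused.
    """
    # 1. Download the captcha through the session; the server usually ties
    #    the expected answer to the session cookie it sets here.
    captcha_resp = session.get(captcha_url)
    captcha_resp.raise_for_status()

    # 2. Turn the image into text (a human, an OCR tool, and so on).
    answer = solve_captcha(captcha_resp.content)

    # 3. Submit the credentials plus the answer through the same session.
    payload = dict(form)  # copy so the caller's dict is left untouched
    payload[captcha_field] = answer
    login_resp = session.post(login_url, data=payload)
    login_resp.raise_for_status()
    return login_resp
```

For example, call login_with_captcha(requests.Session(), captcha_url, login_url, {"username": "me", "password": "secret"}, solve_by_hand) with the real URLs and field names. How to tell whether the login succeeded (a redirect, a new cookie, some text in the page) depends on the site, so the sketch simply returns the POST response.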
