This article introduces a method for simulating a Weibo login in Python, with code.
Today I want to build a tool for crawling personal pages on Weibo, to satisfy some private curiosity. The first thing that has to be done is the simulated login.
I cleaned up the code, ported it to Python 3.6, and added plenty of comments to make it easier to learn from.
When you log in to Sina Weibo on a PC, the username and password are pre-encrypted with JavaScript on the client, and a set of parameters is fetched with a GET request before the POST; these values are also sent as part of POST_DATA. As a result, the usual simple way of simulating a POST login (which works for sites such as Renren) cannot be used here.
1. Before submitting the POST request, you need to obtain two parameters through GET.
The address is:
http://login.sina.com.cn/sso/login.php?client=ssologin.js(v1.3.18)
The returned data contains the servertime and nonce values, which change from request to request; the other values appear to be unused.
import json
import re

import requests


def get_servertime():
    url = 'http://login.sina.com.cn/sso/prelogin.php?entry=weibo&callback=sinaSSOController.preloginCallBack&su=dW5kZWZpbmVk&client=ssologin.js(v1.3.18)&_=1329806375939'
    # requests returns a Response object, which cannot be used directly;
    # take its .text and extract the JSON payload with a regular expression.
    # The body looks roughly like this:
    # sinaSSOController.preloginCallBack({"retcode":0,"servertime":1545606770, ...})
    data = requests.request('GET', url).text
    p = re.compile(r'\((.*)\)')
    try:
        json_data = p.search(data).group(1)
        data = json.loads(json_data)
        servertime = str(data['servertime'])
        nonce = data['nonce']
        return servertime, nonce
    except (AttributeError, ValueError, KeyError):
        # No match, invalid JSON, or a missing field.
        print('Failed to get servertime!')
        return None
2. Observe the POST data with HttpFox. The parameters are fairly complex: su is the encoded username and sp is the encrypted password. servertime and nonce come from the previous step. The other parameters stay unchanged.
The username is URL-quoted and then Base64-encoded. The line below uses Python 2 names; the Python 3 equivalent is get_user further down:
username = base64.encodestring( urllib.quote(username) )[:-1]
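The encoding is easy to check by reversing it. The sketch below is only an illustration: encode_su and decode_su are hypothetical helper names, the e-mail address is made up, and base64.b64encode is used in place of encodebytes plus slicing (for short usernames both give the same bytes, since b64encode simply adds no trailing newline).

```python
import base64
from urllib.parse import quote, unquote


def encode_su(username: str) -> str:
    """URL-quote the username, then Base64-encode it (returned as text)."""
    quoted = quote(username)  # '@' becomes '%40'
    return base64.b64encode(quoted.encode("utf-8")).decode("ascii")


def decode_su(su: str) -> str:
    """Reverse of encode_su, handy for inspecting a captured request."""
    return unquote(base64.b64decode(su).decode("utf-8"))


sample_su = encode_su("user@example.com")  # made-up address
round_trip = decode_su(sample_su)

# The su value that appears in the prelogin URL used by get_servertime.
prelogin_su = decode_su("dW5kZWZpbmVk")
```

Decoding the su value from the prelogin URL gives the string "undefined", which suggests that the prelogin request was captured before any username had been entered.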
The password is hashed with SHA1 three times, with the servertime and nonce values mixed in as a salt.
That is: hash the password with SHA1 twice, append the servertime and nonce values to the result, and then hash with SHA1 once more.
import base64
import hashlib
from urllib.parse import quote


def get_pwd(pwd, servertime, nonce):
    # First pass. In Python 3 the hash functions need bytes, hence encode().
    pwd1 = hashlib.sha1(pwd.encode()).hexdigest()
    # Second pass, over the result of the first.
    pwd2 = hashlib.sha1(pwd1.encode()).hexdigest()
    # Append the servertime and nonce obtained earlier, then hash once more.
    pwd3_ = pwd2 + servertime + nonce
    pwd3 = hashlib.sha1(pwd3_.encode()).hexdigest()
    return pwd3


def get_user(username):
    # Convert the @ sign into a form that is valid inside a URL.
    _username = quote(username)
    # Base64 in Python 3 also works on bytes.
    # encodebytes appends a newline, so slice off the last character.
    username = base64.encodebytes(_username.encode())[:-1]
    return username
3. Assemble the parameters and send the POST request. The login is still not complete at this point.
The content returned by the POST contains this line:
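Assembling the form fields might look like the sketch below. It is only an outline under stated assumptions: build_post_data is a hypothetical helper, all values are made up, and the `extra` argument stands in for the unchanged parameters, which the text does not list and which would need to be copied from a captured request. Treating entry=weibo as one of them is a guess based on the prelogin URL.

```python
def build_post_data(su, sp, servertime, nonce, extra=None):
    """Assemble the login form fields.

    su, sp, servertime and nonce are the four values described in the text.
    `extra` holds the remaining, unchanged parameters taken from a captured
    request; they are deliberately not hard-coded here.
    """
    post_data = dict(extra or {})
    post_data.update({
        "su": su,                  # encoded username
        "sp": sp,                  # triple-SHA1 password
        "servertime": servertime,  # from the prelogin GET
        "nonce": nonce,            # from the prelogin GET
    })
    return post_data


# Made-up values, only to show the shape of the result.
example = build_post_data(
    su="dW5kZWZpbmVk",
    sp="0" * 40,
    servertime="1545606770",
    nonce="ABCDEF",
    extra={"entry": "weibo"},  # assumption, see the note above
)
```

The resulting dictionary can be passed as the form body of the POST request.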
location.replace("http://weibo.com/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack&retcode=101&reason=%B5%C7%C2%BC%C3%FB%BB%F2%C3%DC%C2%EB%B4%ED%CE%F3")
This is the result of a failed login. The result of a successful login looks similar, but the value of retcode is 0.
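The redirect URL and its retcode can be pulled out of the response with a regular expression and the standard URL parser. The sketch below runs on the failure response quoted above; parse_login_result is a hypothetical helper name, and decoding the reason parameter as GBK is an assumption based on that sample rather than something the site documents.

```python
import re
from urllib.parse import parse_qs, urlparse

# The failure response quoted above.
body = (
    'location.replace("http://weibo.com/ajaxlogin.php?framelogin=1'
    '&callback=parent.sinaSSOController.feedBackUrlCallBack&retcode=101'
    '&reason=%B5%C7%C2%BC%C3%FB%BB%F2%C3%DC%C2%EB%B4%ED%CE%F3")'
)


def parse_login_result(text):
    """Return (redirect_url, retcode, reason), or None if no redirect is found."""
    match = re.search(r'location\.replace\(["\'](.*?)["\']\)', text)
    if match is None:
        return None
    redirect_url = match.group(1)
    # Assumption: the reason text is percent-encoded GBK, not UTF-8.
    query = parse_qs(urlparse(redirect_url).query, encoding="gbk")
    retcode = query.get("retcode", [""])[0]
    reason = query.get("reason", [""])[0]
    return redirect_url, retcode, reason


redirect_url, retcode, reason = parse_login_result(body)
login_ok = retcode == "0"  # "0" means success according to the text
```

With the sample above, retcode is "101" and login_ok is False; the decoded reason is a Chinese message saying that the login name or password is wrong.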
Next, request this URL, and the login to Weibo will be complete.
Remember to set up the cache in advance, which here presumably means cookie storage that is shared by all of the requests.
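If the cache is read as cookie storage, one way to prepare it with the standard library is sketched below. This is an assumption about what the author meant, not their code: build_cookie_opener and finish_login are hypothetical helpers, and the 10-second timeout is arbitrary. When using requests instead, a requests.Session object plays the same role.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener():
    """Create an opener that stores cookies and sends them on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def finish_login(opener, redirect_url):
    """Request the URL taken from location.replace(...) through the shared opener."""
    with opener.open(redirect_url, timeout=10) as response:
        return response.read()


# Build the opener before the first request; no network traffic happens here.
opener, jar = build_cookie_opener()
```

The login POST and the follow-up request should both go through the same opener, so that cookies set by the first response are sent with the second.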