This is a small project that logs in to Taobao Alliance automatically in order to scrape data. I had seen similar code written in Python on GitHub, so I chose to write it in Python too. This was the first time I wrote a real program in Python, and I was struck by how concise it is. Of course, I also ran into trouble with encoding (in version 2.7), migrating the environment, and similar issues. Fortunately, all of them were solved later.
Back to the subject: the first thing to solve when scraping Taobao Alliance data is logging in. In the past this usually meant dealing with CAPTCHAs. Now that logging in by scanning a QR code is supported, it is simpler. Below is the Python login code. It fetches the QR code and prints it, then keeps checking the scan status, and requests a new QR code if the current one expires. (Read it mainly for the logic: some common methods are wrapped in helpers, so the code is not guaranteed to run as-is.)
    import sys
    import time

    import utils  # the project's own helper module (fetchAPI, printQRCode)


    def getQRCode(enableCmdQR):
        # Request a new login QR code, print it, and return its login token.
        payload = {'_ksTS': str(time.time()), 'from': 'alimama'}
        qrCodeObj = utils.fetchAPI('https://qrlogin.taobao.com/qrcodelogin/generateQRCode4Login.do', payload, "json", None, True, True)
        print(qrCodeObj)
        utils.printQRCode('http:' + qrCodeObj['url'], enableCmdQR)
        lgToken = qrCodeObj['lgToken']
        return lgToken


    def login(enableCmdQR=False):
        lgToken = getQRCode(enableCmdQR)
        code = 0
        successLoginURL = ""
        while code != 10006:
            payload = {'lgToken': lgToken,
                       'defaulturl': 'http%3A%2F%2Flogin.taobao.com%2Fmember%2Ftaobaoke%2Flogin.htm%3Fis_login%3D1&_ksTS=' + str(time.time())}
            rObj = utils.fetchAPI('https://qrlogin.taobao.com/qrcodelogin/qrcodeLoginCheck.do', payload, "json", True, False)
            code = int(rObj['code'])
            if 10000 == code:
                # Still waiting for the scan; fall through to the sleep below
                # so the server is not polled in a tight loop.
                # print("Please scan the QR code to log in")
                pass
            elif 10001 == code:
                print("QR code scanned, please confirm the login")
            elif 10004 == code:
                print("QR code expired, please scan the new one")
                # Fetch a fresh token and keep polling in this loop.
                lgToken = getQRCode(enableCmdQR)
            elif 10006 == code:
                successLoginURL = rObj["url"]
                print("Login succeeded, redirecting")
            else:
                print("Unknown error, exiting")
                sys.exit(0)
            time.sleep(5)
        print("Login succeeded, redirecting to: " + successLoginURL)
        # Follow the redirect by hand so the cookies set on each hop are saved.
        r = utils.fetchAPI(successLoginURL, None, "raw", True, False, True)
        utils.fetchAPI(r.headers['Location'], None, "raw", True, True, False)
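The polling logic can be tried out without touching the network if the status check is passed in as a function. The following is only a sketch under assumptions, not the project's code: `poll_qr_login`, `check_status`, `new_token`, `max_polls` and the example URL are hypothetical names, and the status codes are simply the ones that appear in the login code above (10000 waiting, 10001 scanned, 10004 expired, 10006 success). It is written for Python 3.

```python
import time

# Status codes as they appear in the login code above.
WAITING, SCANNED, EXPIRED, SUCCESS = 10000, 10001, 10004, 10006


def poll_qr_login(check_status, new_token, interval=5, max_polls=60, sleep=time.sleep):
    """Poll until the QR login succeeds and return the redirect URL.

    check_status(token) -> dict with 'code' and, on success, 'url' (hypothetical).
    new_token() -> token of a newly requested QR code (hypothetical).
    """
    token = new_token()
    for _ in range(max_polls):
        result = check_status(token)
        code = int(result['code'])
        if code == SUCCESS:
            return result['url']
        if code == EXPIRED:
            # Request a new QR code and keep polling in the same loop.
            token = new_token()
        elif code not in (WAITING, SCANNED):
            raise RuntimeError('unexpected status code: %d' % code)
        sleep(interval)
    raise RuntimeError('gave up after %d polls' % max_polls)


# Simulated server: waiting, scanned, expired, then success for the second token.
fake_responses = iter([
    {'code': 10000},
    {'code': 10001},
    {'code': 10004},
    {'code': '10006', 'url': 'https://example.invalid/after-login'},
])
fake_tokens = iter(['token-1', 'token-2'])

redirect_url = poll_qr_login(lambda token: next(fake_responses),
                             lambda: next(fake_tokens),
                             sleep=lambda seconds: None)  # no real waiting in the demo
print(redirect_url)
```

Capping the number of polls and raising on unknown codes, instead of exiting the process, is a design choice of this sketch; it leaves the decision about what to do next to the caller.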
With login solved, the next problem is saving the session state. Python's Requests library is very powerful: in simple cases you can use requests.Session directly to handle the session. But since many operations in this project are asynchronous, cookies have to be stored and read back explicitly, using pickle to serialize and deserialize them. By default the saved cookies are updated incrementally.
    import pickle
    import sys

    import requests

    import config  # the project's own settings module (COOKIE_FILE, Headers)


    def save_cookies(cookies, overWrite=False):
        try:
            currentCookie = requests.utils.dict_from_cookiejar(cookies)
            if len(currentCookie) < 1:
                return
            oldCookie = requests.utils.dict_from_cookiejar(load_cookies())
            if not overWrite:
                # Incremental update: new values override the saved ones.
                cookieDict = dict(oldCookie, **currentCookie)
            else:
                cookieDict = currentCookie
            # pickle data is binary, so the file is opened in 'wb' mode.
            with open(config.COOKIE_FILE, 'wb') as f:
                pickle.dump(cookieDict, f)
            print('Saved cookie')
            print(cookieDict)
        except Exception:
            print('Save cookies failed: %s' % sys.exc_info()[0])
            sys.exit(99)


    def load_cookies():
        try:
            with open(config.COOKIE_FILE, 'rb') as f:
                cookies = requests.utils.cookiejar_from_dict(pickle.load(f))
        except Exception:
            # No usable cookie file yet: start from an empty cookie jar.
            cookies = requests.utils.cookiejar_from_dict({})
        return cookies
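The incremental update can be checked in isolation with plain dictionaries. This is a minimal sketch that uses only the standard library and assumes Python 3; `load_cookie_dict`, `save_cookie_dict` and the cookie names `token` and `lang` are made up for illustration and are not part of the project or of Requests.

```python
import os
import pickle
import tempfile


def load_cookie_dict(path):
    """Return the saved cookie dict, or an empty dict if nothing usable is stored."""
    try:
        with open(path, 'rb') as f:  # pickle data is binary
            return pickle.load(f)
    except (OSError, EOFError, pickle.UnpicklingError):
        return {}


def save_cookie_dict(path, new_cookies, overwrite=False):
    """Merge new_cookies into the stored dict (new values win) and write it back.

    Returns the dict that is stored after the call.
    """
    if not new_cookies:
        # Nothing new: leave the file untouched.
        return load_cookie_dict(path)
    if overwrite:
        merged = dict(new_cookies)
    else:
        merged = dict(load_cookie_dict(path), **new_cookies)
    with open(path, 'wb') as f:
        pickle.dump(merged, f)
    return merged


# Demo with made-up cookie names, written to a temporary directory.
cookie_path = os.path.join(tempfile.mkdtemp(), 'cookies.pkl')
save_cookie_dict(cookie_path, {'token': 'abc', 'lang': 'zh'})
save_cookie_dict(cookie_path, {'lang': 'en'})  # 'token' is kept, 'lang' is replaced
print(load_cookie_dict(cookie_path))
```

Loading the old dictionary before opening the file for writing matters: opening in write mode truncates the file, so reading it afterwards would always give an empty result.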
With these helpers in place, the unified request function loads the saved cookies before each requests.Session request and saves the returned cookies afterwards.
    import requests

    import config  # the project's own settings module (COOKIE_FILE, Headers)

    s = requests.Session()


    # Unified request API
    def fetchAPI(url, params=None, resultFormat="text", isNeedCookie=True, allowRedirects=True, saveCookie=False, method='GET'):
        # Note: isNeedCookie and saveCookie are accepted but not consulted here;
        # saved cookies are always sent and response cookies are always saved.
        try:
            cookies = load_cookies()
            if 'POST' == method:
                response = s.post(url, data=params, headers=config.Headers, cookies=cookies)
            else:
                response = s.get(url, params=params, headers=config.Headers, cookies=cookies, allow_redirects=allowRedirects)
            if "json" == resultFormat:
                result = response.json()
            elif "raw" == resultFormat:
                result = response
            else:
                result = response.text
            # if saveCookie:
            # print('save cookie:' + str(response.cookies))
            save_cookies(response.cookies)
            return result
        except Exception as e:
            print(e)
            return False
With these two steps done, follow-up requests can basically all go through the unified request function, and it works well in practice.
[Screenshot of the running program]
Of course, one problem is still unresolved: how to renew the session automatically after it expires (I am not sure whether Taoding supports it). Taobao uses a unified login that runs as an independent service, and neither refreshing automatically through the browser nor continuously updating the cookies during requests has obtained a renewed ticket from the server. If anyone has ideas on this issue, I would like to hear them.
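Until the renewal question is answered, one possible fallback is to detect an expired session and run the QR login again before retrying the request. The sketch below does not solve the ticket renewal itself and is not part of the project: `fetch_with_relogin`, `fetch`, `is_expired` and `relogin` are hypothetical names, and how an expired response from Taobao can be recognized is an open assumption left to the `is_expired` predicate.

```python
def fetch_with_relogin(fetch, is_expired, relogin, max_retries=1):
    """Call fetch(); if the response looks expired, log in again and retry.

    fetch() -> response, is_expired(response) -> bool, relogin() -> None
    are all supplied by the caller (hypothetical helpers).
    """
    for attempt in range(max_retries + 1):
        response = fetch()
        if not is_expired(response):
            return response
        if attempt < max_retries:
            relogin()
    raise RuntimeError('session still expired after %d re-login(s)' % max_retries)


# Simulated session: the first request hits an expired session,
# the fake re-login makes the next request succeed.
state = {'logged_in': False, 'logins': 0}


def fake_fetch():
    return 'data' if state['logged_in'] else 'login page'


def fake_relogin():
    state['logged_in'] = True
    state['logins'] += 1


result = fetch_with_relogin(fake_fetch, lambda r: r == 'login page', fake_relogin)
print(result)
```

Since the login here relies on scanning a QR code, such a re-login still needs a person to scan, so this only automates noticing the expiry and retrying.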