Automatic acquisition and expiration of cookies through web crawlers (detailed tutorial)

亚连

Jun 01, 2018 am 10:02 AM

This article shows how a web crawler can obtain login cookies automatically and refresh them automatically once they expire.

Much of the content on social networking sites is only available after logging in. Take Weibo as an example: without logging in, you can only see the top ten posts of "big V" (verified, high-profile) accounts. Staying logged in requires cookies. Take logging in to www.weibo.cn as an example:

Open http://login.weibo.cn/login/ in Chrome.

In the developer tools console, inspect the response headers of the request; you will see several cookies set by weibo.cn.
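
To make "several cookies" concrete, here is a minimal sketch that parses Set-Cookie header values with Python's standard http.cookies module. The header strings and the helper name parse_set_cookie are made up for illustration; they are not the values weibo.cn actually returns.

```python
from http.cookies import SimpleCookie

# Hypothetical Set-Cookie header values, invented for illustration only.
raw_headers = [
    "SUB=abc123; Path=/; Domain=.weibo.cn; Max-Age=2592000",
    "_T_WM=xyz789; Path=/; Domain=.weibo.cn",
]


def parse_set_cookie(headers):
    """Return {name: {"value", "domain", "max_age"}} for each header string."""
    parsed = {}
    for header in headers:
        jar = SimpleCookie()
        jar.load(header)
        for name, morsel in jar.items():
            parsed[name] = {
                "value": morsel.value,
                "domain": morsel["domain"],
                "max_age": morsel["max-age"],  # empty string when the attribute is absent
            }
    return parsed


cookies = parse_set_cookie(raw_headers)
for name, info in cookies.items():
    print(name, info)
```

A cookie without Max-Age or Expires is a session cookie; the browser keeps it only until it is closed, which is why the cached copies later in this article are checked for an expiry value.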

Implementation steps:

1, use Selenium to log in automatically, obtain the cookies, and save them to files;

2, read the cached cookies and check their expiry time; if they have expired, perform step 1 again;

3, when requesting other pages, send the cookies to keep the login state.

1, Get cookies from the network

Use Selenium with PhantomJS to simulate a browser login and obtain the cookies.

There are usually several cookies; each one is saved to its own file with the .weibo suffix.

import pickle

from selenium import webdriver


def get_cookie_from_network():
    """Log in with a headless browser, cache each cookie on disk, return name -> value."""
    url_login = 'http://login.weibo.cn/login/'
    # webdriver.PhantomJS and find_element_by_xpath belong to older Selenium
    # releases; both are gone from current Selenium 4 releases.
    driver = webdriver.PhantomJS()
    driver.get(url_login)
    driver.find_element_by_xpath('//input[@type="text"]').send_keys('your_weibo_account')  # replace with your Weibo account
    driver.find_element_by_xpath('//input[@type="password"]').send_keys('your_weibo_password')  # replace with your Weibo password
    driver.find_element_by_xpath('//input[@type="submit"]').click()  # click the login button
    # Read the cookies set after login
    cookie_list = driver.get_cookies()
    driver.quit()
    print(cookie_list)
    cookie_dict = {}
    for cookie in cookie_list:
        # Write each cookie to its own <name>.weibo file (binary mode for pickle)
        with open(cookie['name'] + '.weibo', 'wb') as f:
            pickle.dump(cookie, f)
        if 'name' in cookie and 'value' in cookie:
            cookie_dict[cookie['name']] = cookie['value']
    return cookie_dict
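
The save step can be tried without a browser. The sketch below assumes cookie dicts shaped like the ones driver.get_cookies() returns; the sample names and values are invented, and save_cookies is a hypothetical helper that writes into a temporary directory rather than the current one.

```python
import os
import pickle
import tempfile

# Sample data shaped like the list returned by driver.get_cookies();
# the names and values are invented for illustration.
sample_cookies = [
    {"name": "SUB", "value": "abc123", "domain": ".weibo.cn", "expiry": 1893456000},
    {"name": "_T_WM", "value": "xyz789", "domain": ".weibo.cn"},  # session cookie, no expiry
]


def save_cookies(cookie_list, directory):
    """Pickle each cookie to <directory>/<name>.weibo and return a name -> value dict."""
    cookie_dict = {}
    for cookie in cookie_list:
        if "name" not in cookie or "value" not in cookie:
            continue  # skip malformed entries instead of failing on a missing key
        path = os.path.join(directory, cookie["name"] + ".weibo")
        with open(path, "wb") as f:  # pickle needs binary mode on Python 3
            pickle.dump(cookie, f)
        cookie_dict[cookie["name"]] = cookie["value"]
    return cookie_dict


with tempfile.TemporaryDirectory() as tmp:
    saved = save_cookies(sample_cookies, tmp)
    files = sorted(os.listdir(tmp))
    with open(os.path.join(tmp, "SUB.weibo"), "rb") as f:
        restored = pickle.load(f)

print(saved)
print(files)
```

Pickling the whole cookie dict, rather than only its value, is what keeps the expiry field available for the later validity check.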

2, Get cookies from files

Walk the current directory for files ending in .weibo, which are the cookie files. Unpickle each one into a dict and compare its expiry value with the current time; if any cookie has expired, return an empty dict.

import os
import pickle
import time


def get_cookie_from_cache():
    """Load cached cookies; return {} as soon as one of them has expired."""
    cookie_dict = {}
    for parent, dirnames, filenames in os.walk('./'):
        for filename in filenames:
            if filename.endswith('.weibo'):
                print(filename)
                # Build the path from the directory currently being walked
                with open(os.path.join(parent, filename), 'rb') as f:
                    d = pickle.load(f)
                if 'name' in d and 'value' in d and 'expiry' in d:
                    expiry_date = int(d['expiry'])
                    if expiry_date > int(time.time()):
                        cookie_dict[d['name']] = d['value']
                    else:
                        return {}
    return cookie_dict
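
The expiry comparison can be isolated into a small predicate, which makes it easy to test with a fixed clock. This is a sketch, not part of the original code: is_cookie_valid and its margin parameter (a safety window in seconds) are hypothetical additions.

```python
import time


def is_cookie_valid(cookie, now=None, margin=0):
    """Return True if the cookie's 'expiry' is later than now + margin seconds."""
    if "expiry" not in cookie:
        return False  # like the cache reader, do not rely on cookies without an expiry
    if now is None:
        now = time.time()
    return int(cookie["expiry"]) > int(now) + margin


# Fixed timestamp so the results do not depend on when the example runs.
now = 1_700_000_000
fresh = {"name": "SUB", "value": "abc", "expiry": now + 3600}
stale = {"name": "SUB", "value": "abc", "expiry": now - 1}

print(is_cookie_valid(fresh, now))
print(is_cookie_valid(stale, now))
print(is_cookie_valid(fresh, now, margin=7200))
```

A non-zero margin treats a cookie that is about to expire as already expired, so a long crawl does not start with a login that lapses a few seconds later.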

3, If the cached cookies have expired, get them from the network again

def get_cookie():
    # Try the on-disk cache first; an empty dict means missing or expired
    cookie_dict = get_cookie_from_cache()
    if not cookie_dict:
        cookie_dict = get_cookie_from_network()
    return cookie_dict
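
The same cache-first logic can be exercised offline by passing the two loaders in as arguments. In this sketch get_cookie_with and fake_login are hypothetical stand-ins; they only demonstrate that the login step runs when, and only when, the cache comes back empty.

```python
def get_cookie_with(load_cached, fetch_fresh):
    """Return cached cookies if there are any, otherwise fetch fresh ones."""
    cookie_dict = load_cached()
    if not cookie_dict:
        cookie_dict = fetch_fresh()
    return cookie_dict


calls = []


def fake_login():
    """Stand-in for the Selenium login; records that it was called."""
    calls.append("login")
    return {"SUB": "fresh"}


from_cache = get_cookie_with(lambda: {"SUB": "cached"}, fake_login)
from_network = get_cookie_with(lambda: {}, fake_login)

print(from_cache, from_network, calls)
```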

4, Use the cookies to request other Weibo pages

def get_weibo_list(self, user_id):
    import requests
    from bs4 import BeautifulSoup as bs
    cookdic = get_cookie()
    url = 'http://weibo.cn/stocknews88'
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.57 Safari/537.36'}
    timeout = 5
    # Send the cached cookies with the request to stay logged in
    r = requests.get(url, headers=headers, cookies=cookdic, timeout=timeout)
    soup = bs(r.text, 'lxml')
    ...
    # Parse the page with BeautifulSoup
    ...
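
On the wire, a name -> value cookie dict ends up as a single Cookie request header. The sketch below builds that header with the standard library only and sends nothing; build_request is a hypothetical helper and the cookie values are invented.

```python
import urllib.request


def build_request(url, cookie_dict, user_agent="Mozilla/5.0"):
    """Build (but do not send) a request carrying the cookies in a Cookie header."""
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookie_dict.items())
    req = urllib.request.Request(url)
    req.add_header("User-Agent", user_agent)
    if cookie_header:
        req.add_header("Cookie", cookie_header)
    return req


req = build_request("http://weibo.cn/stocknews88", {"SUB": "abc123", "_T_WM": "xyz789"})
print(req.get_header("Cookie"))
```

Printing the header this way is a quick check that the cached cookies were loaded as expected before any real request is made.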

That is the approach I have put together; I hope it proves useful.
