How to crawl websites that require login with Python (code example)

黄舟
Release: 2017-08-20 10:26:40

This article shows how to use Python to crawl a website that requires login. It walks through a complete example covering both the login step and the data scraping that follows. Readers who need this can use it as a reference.

The example uses requests to hold a session and lxml to parse the HTML: it fetches the login page, extracts the CSRF token, posts the credentials, and then scrapes a page that is only available after login. The details are as follows:


import requests
from lxml import html

# Create a session object. It keeps cookies across requests, so the login persists.
session_requests = requests.session()

# Fetch the login page and extract the CSRF token used when logging in.
login_url = "https://bitbucket.org/account/signin/?next=/"
result = session_requests.get(login_url)
tree = html.fromstring(result.text)
authenticity_token = list(set(tree.xpath("//input[@name='csrfmiddlewaretoken']/@value")))[0]

payload = {
    "username": "<your username>",
    "password": "<your password>",
    # The page source contains a hidden input tag named "csrfmiddlewaretoken".
    "csrfmiddlewaretoken": authenticity_token,
}

# Perform the login.
result = session_requests.post(
    login_url,
    data=payload,
    headers=dict(referer=login_url),
)

# Login is done; now scrape content from the Bitbucket dashboard page.
url = "https://bitbucket.org/dashboard/overview"
result = session_requests.get(
    url,
    headers=dict(referer=url),
)

# Check the scraped content.
tree = html.fromstring(result.content)
bucket_elems = tree.findall(".//span[@class='repo-name']")
# text_content() is a method, and the character to strip is the newline "\n".
bucket_names = [bucket.text_content().replace("\n", "").strip() for bucket in bucket_elems]
print(bucket_names)
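
The token extraction step in the example picks out one hidden input by name. A login form may carry other hidden fields as well, so a more general approach is to collect all of them and merge them into the payload. The sketch below illustrates that idea using the standard library's html.parser in place of lxml, so it runs without extra dependencies. The names HiddenInputCollector and hidden_fields, and the sample HTML, are hypothetical and only serve the illustration.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        # Attribute values can be None when the attribute has no value.
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(page_html):
    """Return a dict of all hidden form fields found in page_html."""
    collector = HiddenInputCollector()
    collector.feed(page_html)
    collector.close()
    return collector.fields


# Hypothetical login page fragment, used only for illustration.
sample = """
<form method="post">
  <input type="hidden" name="csrfmiddlewaretoken" value="abc123">
  <input type="text" name="username">
  <input type="password" name="password">
</form>
"""

fields = hidden_fields(sample)
# Start from the hidden fields, then add the credentials on top.
payload = dict(fields, username="<your username>", password="<your password>")
print(payload["csrfmiddlewaretoken"])  # prints abc123
```

With a real page, the text of the response returned for the login URL would be passed to hidden_fields instead of the sample string.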
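
The example treats the POST as a successful login without checking the result. One possible check, sketched below, relies on an assumption that is not confirmed by the article: after a successful login the site redirects away from the login URL, while a failed login leaves the final URL on the login page. The function login_succeeded is hypothetical, and SimpleNamespace objects stand in for requests responses (which expose the same ok and url attributes) so the sketch runs without network access.

```python
from types import SimpleNamespace


def login_succeeded(response, login_url):
    """Heuristic: the request succeeded and ended up off the login page."""
    if not response.ok:
        return False
    # Ignore query strings such as "?next=/" when comparing the two URLs.
    return response.url.split("?", 1)[0] != login_url.split("?", 1)[0]


login_url = "https://bitbucket.org/account/signin/?next=/"

# Stand-ins for requests.Response objects (assumed values, for illustration only).
redirected = SimpleNamespace(ok=True, url="https://bitbucket.org/dashboard/overview")
still_on_form = SimpleNamespace(ok=True, url="https://bitbucket.org/account/signin/")
server_error = SimpleNamespace(ok=False, url="https://bitbucket.org/dashboard/overview")

for response in (redirected, still_on_form, server_error):
    print(response.url, login_succeeded(response, login_url))
```

In the script above, the check would be called as login_succeeded(result, login_url) right after session_requests.post(...), stopping before the dashboard request if it returns False. Sites that signal failure in another way, for example with an error message on a different page, need a different test.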

source:php.cn