How to set request headers for a Python crawler

爱喝马黛茶的安东尼
Release: 2019-06-20 14:30:38

When a crawler requests a web page, the returned text sometimes contains a message such as "Sorry, Unable to Access". This means the site has refused the crawler: its anti-crawling mechanism has recognized that the request comes from a script, and the crawler has to get past that check.

Setting request headers is one way to deal with anti-crawling checks. The idea is to make the request look as if it came from an ordinary browser visiting the page, rather than from a script collecting data.

For anti-crawler web pages, you can set some header information to simulate a browser accessing the website.
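
To see why an unmodified script is easy to recognize, it can help to look at the User-Agent that Python's standard library sends when none is set. This is a minimal sketch, assuming Python 3 and `urllib.request`; the helper name `default_user_agent` is made up for illustration, and no network request is made.

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent value urllib sends when no header is set."""
    opener = urllib.request.build_opener()
    # A fresh opener keeps its default headers as (name, value) pairs.
    for name, value in opener.addheaders:
        if name.lower() == "user-agent":
            return value
    return None


print(default_user_agent())  # something like "Python-urllib/3.x"
```

A value that starts with "Python-urllib" is a clear sign to a server that the request did not come from a browser, which is why the header is worth overriding.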

headers

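In Chrome or Firefox, open the web page, then right-click and choose Inspect, or open the browser menu and choose More Tools, then Developer Tools; you can also press F12 directly. Then press F5 (Fn+F5 on some keyboards) to refresh the page so that its requests are listed; the request headers are shown for each entry in the Network panel.

In some browsers the menu entry is different: right-click -> View Element, then refresh.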
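
Once the request headers are visible in the browser, one convenient approach is to copy them as text and turn them into a Python dictionary. This is a small sketch, assuming the copied text has one `Name: value` pair per line; the function name `parse_copied_headers` and the sample text are illustrative, not part of any library.

```python
def parse_copied_headers(raw_text):
    """Turn 'Name: value' lines copied from the browser into a dict."""
    headers = {}
    for line in raw_text.splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers such as ":authority".
        if not line or line.startswith(":"):
            continue
        # Split on the first colon only, since values may contain colons.
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip()] = value.strip()
    return headers


copied = """
User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1
Referer: http://www.google.com/
"""
headers = parse_copied_headers(copied)
print(headers["Referer"])  # http://www.google.com/
```

The resulting dictionary has the same shape as the `headers` argument that Python's request functions expect.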

Note: Headers contain many fields; the most commonly used are User-Agent and Host. They appear as key-value pairs. If supplying User-Agent alone, as a single key-value pair in the headers dictionary, is enough to get past the anti-crawling check, no other pairs are needed; otherwise, add more of the key-value pairs listed under the request headers.
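
The approach in the note (start with User-Agent, add further pairs only if needed) could be sketched as follows. This assumes Python 3; `build_headers` is a hypothetical helper, and `http://example.com/page` is a placeholder URL that does not come from the article. Nothing is sent over the network here, as the request object is only built and inspected.

```python
import urllib.request
from urllib.parse import urlsplit


def build_headers(url, user_agent, include_host=False, extra=None):
    """Start with User-Agent only; add Host and other pairs when needed."""
    headers = {"User-Agent": user_agent}
    if include_host:
        # Host is the network location part of the URL, e.g. "example.com".
        headers["Host"] = urlsplit(url).netloc
    if extra:
        headers.update(extra)
    return headers


url = "http://example.com/page"  # placeholder URL (assumption)
headers = build_headers(url, "Mozilla/5.0", include_host=True,
                        extra={"Referer": "http://www.google.com/"})
request = urllib.request.Request(url, headers=headers)
# Request stores header names capitalized, e.g. "User-agent".
print(request.header_items())
```

If the site still refuses the request with User-Agent alone, calling the helper again with `include_host=True` or more `extra` pairs is a way to add fields one at a time.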

Settings

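import urllib.parse
import urllib.request

# Form fields to send; passing data makes urllib issue a POST request
values = {"username": "xxxx", "password": "xxxxx"}
data = urllib.parse.urlencode(values).encode("utf-8")  # data must be bytes
url = "https://ssl.gstatic.com/gb/images/v2_730ffe61.png"
# Header values copied from a real browser session
user_agent = "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1"
referer = "http://www.google.com/"
# Headers are passed as a dictionary of key-value pairs
headers = {"User-Agent": user_agent, "Referer": referer}
request = urllib.request.Request(url, data, headers)
response = urllib.request.urlopen(request)
print(response.read())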
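
When many pages are fetched with the same headers, an alternative to passing a dictionary to every request is to set them once on an opener. This is a minimal sketch, assuming Python 3 and `urllib.request`; `make_browser_like_opener` is a hypothetical helper name, and the actual network call is left commented out.

```python
import urllib.request


def make_browser_like_opener(user_agent, referer=None):
    """Build an opener that attaches the same headers to every request."""
    opener = urllib.request.build_opener()
    default_headers = [("User-Agent", user_agent)]
    if referer:
        default_headers.append(("Referer", referer))
    # Replaces urllib's default "Python-urllib/x.y" User-Agent.
    opener.addheaders = default_headers
    return opener


opener = make_browser_like_opener(
    "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) "
    "AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 "
    "Mobile/15A372 Safari/604.1",
    referer="http://www.google.com/",
)
print(opener.addheaders)
# response = opener.open("http://example.com/")  # placeholder URL, needs network
```

Headers set on an individual `Request` still take precedence over these defaults, so single requests can override them. Calling `urllib.request.install_opener(opener)` would make plain `urlopen()` calls use the same defaults.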
source:php.cn