Detailed examples of writing crawlers in Python with the Requests library

零下一度
Released: 2017-06-30 18:00:56

Basic GET request:

#-*- coding:utf-8 -*-
import requests

url = 'http://www.baidu.com'
r = requests.get(url)
print r.text

GET request with parameters:

#-*- coding:utf-8 -*-
import requests

url = 'http://www.baidu.com'
payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.get(url, params=payload)
print r.text
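Under the hood, requests serializes the params dict into a query string appended to the URL. A rough sketch of that step using the standard library (Python 3's urllib.parse; requests' actual encoding logic differs in details):

```python
from urllib.parse import urlencode

url = 'http://www.baidu.com'
payload = {'key1': 'value1', 'key2': 'value2'}

# Build the query string and append it to the URL, roughly what
# requests.get(url, params=payload) does before sending the request.
full_url = url + '?' + urlencode(payload)
# full_url is now 'http://www.baidu.com?key1=value1&key2=value2'
```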

POST request to simulate a login, plus some methods of the response object:

#-*- coding:utf-8 -*-
import requests

url1 = 'http://www.example.com/login'  # login URL
url2 = 'http://www.example.com/main'   # page that requires login
data = {"user": "user", "password": "pass"}
headers = {"Accept": "text/html,application/xhtml+xml,application/xml;",
           "Accept-Encoding": "gzip",
           "Accept-Language": "zh-CN,zh;q=0.8",
           "Referer": "www.example.com/",
           "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36"
           }
res1 = requests.post(url1, data=data, headers=headers)
res2 = requests.get(url2, cookies=res1.cookies, headers=headers)

print res2.content              # binary response content
print res2.raw                  # raw response content (requires stream=True on the request)
print res2.raw.read(50)
print type(res2.text)           # content decoded to unicode
print res2.url
print res2.history              # redirect history
print res2.cookies
print res2.cookies['example_cookie_name']
print res2.headers
print res2.headers['Content-Type']
print res2.headers.get('content-type')
print res2.json()               # response content decoded as JSON
print res2.encoding             # response content encoding
print res2.status_code          # HTTP status code
print res2.raise_for_status()   # raises an exception for 4xx/5xx status codes
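The last call deserves a note: raise_for_status() does not "return an error status code" — it returns None for successful responses and raises an exception for 4xx/5xx ones. A minimal sketch of that rule (check_status is a hypothetical helper written for illustration, not part of requests, which raises requests.HTTPError instead of ValueError):

```python
# A hypothetical helper sketching the rule raise_for_status() follows:
# do nothing on success, raise for 4xx client / 5xx server errors.
def check_status(status_code):
    if 400 <= status_code < 500:
        raise ValueError('%d Client Error' % status_code)
    if 500 <= status_code < 600:
        raise ValueError('%d Server Error' % status_code)

check_status(200)  # success: returns None, nothing raised
```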

Using a Session() object (Prepared Requests):

#-*- coding:utf-8 -*-
import requests

s = requests.Session()
url1 = 'http://www.example.com/login'  # login URL
url2 = 'http://www.example.com/main'   # page that requires login
data = {"user": "user", "password": "pass"}
headers = {"Accept": "text/html,application/xhtml+xml,application/xml;",
           "Accept-Encoding": "gzip",
           "Accept-Language": "zh-CN,zh;q=0.8",
           "Referer": "http://www.example.com/",
           "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36"
           }
prepped1 = requests.Request('POST', url1,
    data=data,
    headers=headers
).prepare()
s.send(prepped1)
'''
It can also be written like this:
res = requests.Request('POST', url1,
    data=data,
    headers=headers
)
prepared = s.prepare_request(res)
# do something with prepared.body
# do something with prepared.headers
s.send(prepared)
'''
prepped2 = requests.Request('POST', url2,
    headers=headers
).prepare()
res2 = s.send(prepped2)
print res2.content

Another way of writing it:

#-*- coding:utf-8 -*-
import requests

s = requests.Session()
url1 = 'http://www.example.com/login'  # login URL
url2 = 'http://www.example.com/main'   # page that requires login
data = {"user": "user", "password": "pass"}
headers = {"Accept": "text/html,application/xhtml+xml,application/xml;",
           "Accept-Encoding": "gzip",
           "Accept-Language": "zh-CN,zh;q=0.8",
           "Referer": "http://www.example.com/",
           "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36"
           }
res1 = s.post(url1, data=data, headers=headers)
res2 = s.get(url2)
print res2.content
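Notice that the second request no longer passes cookies explicitly: the Session stores the cookies set by the login response and re-sends them automatically. A toy illustration of that bookkeeping (TinySession is an invented class for this sketch, not the requests API):

```python
# TinySession is a hypothetical, simplified stand-in for requests.Session,
# showing only the cookie bookkeeping a real session does for you.
class TinySession(object):
    def __init__(self):
        self.cookies = {}

    def remember(self, set_cookie_header):
        # Store a "name=value" pair from a response's Set-Cookie header.
        name, _, value = set_cookie_header.partition('=')
        self.cookies[name] = value

    def cookie_header(self):
        # Build the Cookie header to send on every later request.
        return '; '.join('%s=%s' % kv for kv in sorted(self.cookies.items()))

s = TinySession()
s.remember('sessionid=abc123')   # as if the login response set this cookie
header = s.cookie_header()       # 'sessionid=abc123'
```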
Session API

Some other request methods:

>>> r = requests.put("http://httpbin.org/put")
>>> r = requests.delete("http://httpbin.org/delete")
>>> r = requests.head("http://httpbin.org/get")
>>> r = requests.options("http://httpbin.org/get")

Problems encountered:

When running the script in the Windows cmd console, a small error came up:

UnicodeEncodeError: 'gbk' codec can't encode character u'\xbb' in position 23460: illegal multibyte sequence

Analysis:

1. Is it an encoding or a decoding error?

UnicodeEncodeError

Clearly the error happened while encoding.

2. Which codec was used?

'gbk' codec can't encode character

The GBK codec failed: the cmd console on a Chinese-locale Windows defaults to GBK, which cannot represent this character.
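A minimal reproduction of the failure, independent of requests: the character u'\xbb' ('»') from the traceback can be encoded as UTF-8, but has no GBK encoding.

```python
# u'\xbb' is the character '»': UTF-8 can encode it, GBK cannot.
ok = u'\xbb'.encode('utf-8')        # the two UTF-8 bytes b'\xc2\xbb'
try:
    u'\xbb'.encode('gbk')
    gbk_failed = False
except UnicodeEncodeError:
    gbk_failed = True               # same error as in the traceback above
```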

Solution:

First determine the encoding of the current string. For example:

#-*- coding:utf-8 -*-
import requests

url = 'http://www.baidu.com'
r = requests.get(url)
print r.encoding  # utf-8

Having determined that the HTML string is utf-8, we can simply encode it as utf-8 before printing:

print r.text.encode('utf-8')


source: php.cn