Detailed examples of writing crawlers in Python with the Requests library

零下一度
Released: 2017-06-30 18:00:56

Basic GET request:

#-*- coding:utf-8 -*-
import requests

url = 'http://www.baidu.com'
r = requests.get(url)
print r.text

GET request with parameters:

#-*- coding:utf-8 -*-
import requests

url = 'http://www.baidu.com'
payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.get(url, params=payload)
print r.text
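Under the hood, requests serializes the params dict into a query string appended to the URL. A rough sketch of that step using the standard library (Python 3's urllib.parse; requests' actual encoding logic differs in details):

```python
from urllib.parse import urlencode

url = 'http://www.baidu.com'
payload = {'key1': 'value1', 'key2': 'value2'}

# Build the query string and append it to the URL, roughly what
# requests.get(url, params=payload) does before sending the request.
full_url = url + '?' + urlencode(payload)
# full_url is now 'http://www.baidu.com?key1=value1&key2=value2'
```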

POST request to simulate a login, plus some methods of the response object:

#-*- coding:utf-8 -*-
import requests

url1 = 'http://www.example.com/login'  # login URL
url2 = 'http://www.example.com/main'   # page that requires login
data = {"user": "user", "password": "pass"}
headers = {"Accept": "text/html,application/xhtml+xml,application/xml;",
           "Accept-Encoding": "gzip",
           "Accept-Language": "zh-CN,zh;q=0.8",
           "Referer": "www.example.com/",
           "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36"
           }
res1 = requests.post(url1, data=data, headers=headers)
res2 = requests.get(url2, cookies=res1.cookies, headers=headers)

print res2.content              # binary response content
print res2.raw                  # raw response content (requires stream=True on the request)
print res2.raw.read(50)
print type(res2.text)           # content decoded to unicode
print res2.url
print res2.history              # redirect history
print res2.cookies
print res2.cookies['example_cookie_name']
print res2.headers
print res2.headers['Content-Type']
print res2.headers.get('content-type')
print res2.json()               # response content decoded as JSON
print res2.encoding             # response content encoding
print res2.status_code          # HTTP status code
print res2.raise_for_status()   # raises an exception for 4xx/5xx status codes
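The last call deserves a note: raise_for_status() does not "return an error status code" — it returns None for successful responses and raises an exception for 4xx/5xx ones. A minimal sketch of that rule (check_status is a hypothetical helper written for illustration, not part of requests, which raises requests.HTTPError instead of ValueError):

```python
# A hypothetical helper sketching the rule raise_for_status() follows:
# do nothing on success, raise for 4xx client / 5xx server errors.
def check_status(status_code):
    if 400 <= status_code < 500:
        raise ValueError('%d Client Error' % status_code)
    if 500 <= status_code < 600:
        raise ValueError('%d Server Error' % status_code)

check_status(200)  # success: returns None, nothing raised
```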

Using a Session() object (Prepared Requests):

#-*- coding:utf-8 -*-
import requests

s = requests.Session()
url1 = 'http://www.example.com/login'  # login URL
url2 = 'http://www.example.com/main'   # page that requires login
data = {"user": "user", "password": "pass"}
headers = {"Accept": "text/html,application/xhtml+xml,application/xml;",
           "Accept-Encoding": "gzip",
           "Accept-Language": "zh-CN,zh;q=0.8",
           "Referer": "http://www.example.com/",
           "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36"
           }
prepped1 = requests.Request('POST', url1,
    data=data,
    headers=headers
).prepare()
s.send(prepped1)
'''
It can also be written like this:
res = requests.Request('POST', url1,
    data=data,
    headers=headers
)
prepared = s.prepare_request(res)
# do something with prepared.body
# do something with prepared.headers
s.send(prepared)
'''
prepped2 = requests.Request('POST', url2,
    headers=headers
).prepare()
res2 = s.send(prepped2)
print res2.content

Another way of writing it:

#-*- coding:utf-8 -*-
import requests

s = requests.Session()
url1 = 'http://www.example.com/login'  # login URL
url2 = 'http://www.example.com/main'   # page that requires login
data = {"user": "user", "password": "pass"}
headers = {"Accept": "text/html,application/xhtml+xml,application/xml;",
           "Accept-Encoding": "gzip",
           "Accept-Language": "zh-CN,zh;q=0.8",
           "Referer": "http://www.example.com/",
           "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36"
           }
res1 = s.post(url1, data=data, headers=headers)
res2 = s.get(url2)
print res2.content
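Notice that the second request no longer passes cookies explicitly: the Session stores the cookies set by the login response and re-sends them automatically. A toy illustration of that bookkeeping (TinySession is an invented class for this sketch, not the requests API):

```python
# TinySession is a hypothetical, simplified stand-in for requests.Session,
# showing only the cookie bookkeeping a real session does for you.
class TinySession(object):
    def __init__(self):
        self.cookies = {}

    def remember(self, set_cookie_header):
        # Store a "name=value" pair from a response's Set-Cookie header.
        name, _, value = set_cookie_header.partition('=')
        self.cookies[name] = value

    def cookie_header(self):
        # Build the Cookie header to send on every later request.
        return '; '.join('%s=%s' % kv for kv in sorted(self.cookies.items()))

s = TinySession()
s.remember('sessionid=abc123')   # as if the login response set this cookie
header = s.cookie_header()       # 'sessionid=abc123'
```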
Session API

Some other request methods:

>>> r = requests.put("http://httpbin.org/put")
>>> r = requests.delete("http://httpbin.org/delete")
>>> r = requests.head("http://httpbin.org/get")
>>> r = requests.options("http://httpbin.org/get")

Problems encountered:

When running the script in the Windows cmd console, a small error came up:

UnicodeEncodeError: 'gbk' codec can't encode character u'\xbb' in position 23460: illegal multibyte sequence

Analysis:

1. Is it an encoding or a decoding error?

UnicodeEncodeError

Clearly the error happened while encoding.

2. Which codec was used?

'gbk' codec can't encode character

The GBK codec failed: the cmd console on a Chinese-locale Windows defaults to GBK, which cannot represent this character.
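A minimal reproduction of the failure, independent of requests: the character u'\xbb' ('»') from the traceback can be encoded as UTF-8, but has no GBK encoding.

```python
# u'\xbb' is the character '»': UTF-8 can encode it, GBK cannot.
ok = u'\xbb'.encode('utf-8')        # the two UTF-8 bytes b'\xc2\xbb'
try:
    u'\xbb'.encode('gbk')
    gbk_failed = False
except UnicodeEncodeError:
    gbk_failed = True               # same error as in the traceback above
```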

Solution:

First determine the encoding of the current string. For example:

#-*- coding:utf-8 -*-
import requests

url = 'http://www.baidu.com'
r = requests.get(url)
print r.encoding  # utf-8

Having determined that the HTML string is utf-8, we can simply encode it as utf-8 before printing:

print r.text.encode('utf-8')


source: php.cn