This article introduces common commands for accessing and crawling web pages with Python's urllib library.
Common commands for Python to access and crawl web pages
Simple crawling of web pages:
import urllib.request

url = "http://google.cn/"
response = urllib.request.urlopen(url)  # returns a file-like response object
page = response.read()                  # read the raw page content (bytes)
Save the URL directly as a local file:
import urllib.request

url = "http://google.cn/"
response = urllib.request.urlopen(url)  # returns a file-like response object
page = response.read()                  # raw bytes of the page
with open("google.html", "wb") as f:    # "google.html" is an illustrative filename
    f.write(page)                       # write the bytes to a local file
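urllib.request also provides urlretrieve(), which downloads a URL straight to a local file in one call; a minimal sketch (the filename is illustrative, not from the original article):

import urllib.request

# urlretrieve fetches the URL and writes the response body to the given path
local_path, headers = urllib.request.urlretrieve("http://google.cn/", "google.html")
print(local_path)                        # path of the saved file
print(headers.get("Content-Type"))       # headers returned by the server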
POST method:
import urllib.parse
import urllib.request

url = "http://liuxin-blog.appspot.com/messageboard/add"
values = {"content": "test of sending a web request from the command line"}
# urlencode the form fields; in Python 3 the POST body must be bytes
data = urllib.parse.urlencode(values).encode("utf-8")
# create the request object (passing data makes this a POST request)
req = urllib.request.Request(url, data)
# get the data returned by the server
response = urllib.request.urlopen(req)
# process the data
page = response.read()
GET method:
import urllib.parse
import urllib.request

url = "http://www.google.cn/webhp"
values = {"rls": "ig"}
data = urllib.parse.urlencode(values)  # encode the query parameters
theurl = url + "?" + data              # append the query string to the URL
# create the request object
req = urllib.request.Request(theurl)
# get the data returned by the server
response = urllib.request.urlopen(req)
# process the data
page = response.read()
The response object has two other commonly used methods: geturl() and info(). geturl() returns the URL that was actually retrieved, which lets you check whether the server redirected the request, and info() returns the response headers and related metadata.
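A minimal sketch of both methods, reusing the URL from the earlier examples:

import urllib.request

response = urllib.request.urlopen("http://google.cn/")
print(response.geturl())                      # final URL after any server-side redirects
print(response.info())                        # response headers (an http.client.HTTPMessage)
print(response.info().get("Content-Type"))    # look up a single header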
To handle Chinese text, you will need encode() for encoding and decode() for decoding:
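A minimal sketch of the idea; the charset and filename below are assumptions for illustration, since real pages may declare utf-8, gbk, gb2312, and so on:

import urllib.request

response = urllib.request.urlopen("http://google.cn/")
raw = response.read()                  # bytes exactly as sent by the server

# decode() turns bytes into a str; replace errors rather than crash on a bad charset guess
text = raw.decode("utf-8", errors="replace")

# encode() turns the str back into bytes, e.g. before writing to a file
with open("page.html", "wb") as f:     # "page.html" is an illustrative filename
    f.write(text.encode("utf-8"))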
This covers the common commands for accessing and crawling web pages in Python.