A Python method for extracting hyperlinks from web pages

高洛峰
Release: 2017-02-22 16:52:18

Many people start learning Python with the goal of writing web crawlers. A crawler must first fetch a web page and then extract the hyperlink addresses it contains. This article shares a simple way to do that, which you can use as a reference.

The following is the simplest implementation: fetch the target web page, then use a regular expression to match the href attribute of each a tag and extract the hyperlinks.

The code is as follows:

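# Python 2 code: urllib2 exists only in Python 2
# (Python 3 split it into urllib.request and urllib.error).
import urllib2
import re

url = 'http://www.sunbloger.com/'

# Fetch the target page and read its HTML source.
req = urllib2.Request(url)
con = urllib2.urlopen(req)
doc = con.read()
con.close()

# Capture absolute http:// URLs from double-quoted href attributes.
# The character class is narrow: URLs containing '-', '?', '=' and
# similar characters, https:// URLs, and relative links are not matched.
links = re.findall(r'href="(http://[a-zA-Z0-9./]+)"', doc)
for a in links:
    print a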
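The listing above targets Python 2 and only matches a narrow set of absolute http:// URLs. For comparison, here is a minimal Python 3 sketch of the same fetch-then-extract idea. It is not the author's method: it swaps the regular expression for the standard library's html.parser, and the names LinkCollector, extract_links, and fetch are hypothetical helpers introduced only for this illustration. It assumes the page can be decoded with its declared charset, falling back to UTF-8.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # urljoin resolves relative links against the page URL
                # and leaves absolute URLs unchanged.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the hyperlinks found in an HTML string, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url, timeout=10):
    """Download a page and decode it to text (assumes UTF-8 if undeclared)."""
    with urlopen(url, timeout=timeout) as con:
        charset = con.headers.get_content_charset() or "utf-8"
        return con.read().decode(charset, errors="replace")


# Example usage (performs a network request):
# url = "http://www.sunbloger.com/"
# for link in extract_links(fetch(url), url):
#     print(link)
```

Compared with the regular expression, a parser-based approach also picks up https:// links, single-quoted attributes, and relative paths, at the cost of a little more code.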


source:php.cn