Python method to extract hyperlinks from web pages

高洛峰

Released: 2017-02-22 16:52:18

Many people who start learning Python plan to use it for writing web crawlers. A crawler first has to fetch a web page and then extract the hyperlink addresses it contains. This article shares a simple way to do that, which you can refer to if needed.

The following is the simplest implementation: fetch the target web page, then obtain the hyperlinks by using a regular expression to match the href attribute of each a tag.
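
The matching step can be tried on its own, without any network access. The following is a minimal sketch, assuming Python 3 and a made-up HTML snippet (the example.com URLs are placeholders, not from the original article). The pattern is the one this article's code uses, written without the unnecessary backslashes:

```python
import re

# Same pattern as in the article's code, minus the redundant escapes.
# It captures absolute http:// URLs made only of letters, digits, dots and slashes.
HREF_PATTERN = re.compile(r'href="(http://[a-zA-Z0-9./]+)"')

# Hypothetical sample markup, used only to show what the pattern accepts.
sample_html = (
    '<a href="http://example.com/index.html">Home</a>'
    '<a href="https://example.com/secure">Secure</a>'
    '<a href="http://example.com/search?q=python">Search</a>'
    '<a href="/about">About</a>'
)

links = HREF_PATTERN.findall(sample_html)
print(links)  # ['http://example.com/index.html']
```

Only the first link is captured: the pattern skips https:// addresses, URLs containing characters such as ? or =, and relative links. That is acceptable for a quick experiment, but worth keeping in mind before relying on it in a crawler.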

The code is as follows:

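# Python 2 only: urllib2 and the print statement do not exist in Python 3.
import urllib2
import re

url = 'http://www.sunbloger.com/'

# Fetch the target page and read its HTML source.
req = urllib2.Request(url)
con = urllib2.urlopen(req)
doc = con.read()
con.close()

# Capture every absolute http:// URL found in a double-quoted href attribute.
links = re.findall(r'href="(http://[a-zA-Z0-9./]+)"', doc)
for a in links:
    print a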
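
For Python 3, the same task might look like the sketch below. It is not the original author's method: it swaps the regular expression for the standard library's html.parser, which also picks up single-quoted, https and relative links. The names LinkCollector, extract_links and fetch_links are hypothetical, and the UTF-8 fallback is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links such as "/about" against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all hyperlink targets found in an HTML string."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def fetch_links(url):
    """Download a page and return its links (needs network access)."""
    with urlopen(url) as response:
        # Assumption: fall back to UTF-8 when the server declares no charset.
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_links(html, url)
```

Calling fetch_links('http://www.sunbloger.com/') would return the page's links as a list of absolute URLs, which can then be printed or queued for further crawling. Keeping extract_links separate from the download step makes the parsing easy to try on a local HTML string.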
