Many people initially learn Python and plan to use it for crawler development. Since you want to do a crawler, you must first crawl the web page and extract the hyperlink address from the web page. This article will share with you a simple method, which you can refer to if necessary.
The following is the simplest implementation method. First, capture the target web page, and then obtain the hyperlink by regularly matching the href attribute in the a tag.
The code is as follows:
import urllib2 import re url = 'http://www.sunbloger.com/' req = urllib2.Request(url) con = urllib2.urlopen(req) doc = con.read() con.close() links = re.findall(r'href\=\"(http\:\/\/[a-zA-Z0-9\.\/]+)\"', doc) for a in links: print a
For more related articles on how Python extracts hyperlinks from web pages, please pay attention to the PHP Chinese website!