How Can I Extract Hyperlinks and URLs from a Webpage Using Python and BeautifulSoup?-Python Tutorial-php.cn

Patricia Arquette

Released: 2024-12-08 00:12:11

Retrieving Web Page Links with Python and BeautifulSoup

Question: How do I extract the hyperlinks from a webpage and obtain their URLs using Python?

Answer:

To extract the links from a webpage efficiently with Python and BeautifulSoup, you can pass BeautifulSoup a SoupStrainer so that it parses only the a tags. Here's a code snippet:

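```python
import httplib2
from bs4 import BeautifulSoup, SoupStrainer

http = httplib2.Http()
# request() returns a tuple: (response headers/status, body as bytes)
response, content = http.request('https://www.nytimes.com')

# Build the parse tree from <a> tags only, skipping everything else
soup = BeautifulSoup(content, 'html.parser', parse_only=SoupStrainer('a'))

# href=True keeps only <a> tags that actually have an href attribute
for link in soup.find_all('a', href=True):
    print(link['href'])
```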


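This code first fetches the HTML content of the page with the httplib2 library, whose request() method returns a tuple of the response (status and headers) and the body. It then parses the body with BeautifulSoup, passing a SoupStrainer so that only the a tags are built into the parse tree, which saves time and memory on large pages. Finally, it prints the href attribute of every a tag that has one. These values are printed exactly as they appear in the HTML, so some of them may be relative paths rather than full URLs.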
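If you need full URLs rather than raw href values, one option is to resolve each href against the page's own address. The sketch below is a minimal, offline variation on the snippet above, assuming you want absolute, de-duplicated URLs: it parses an inline sample HTML string instead of fetching a live page, and resolves relative links with the standard library's urllib.parse.urljoin. The function name extract_links, the sample markup, and the example.com base URL are hypothetical placeholders. Keep in mind that parse_only works with the html.parser and lxml parsers but is ignored by html5lib, which always parses the whole document.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup, SoupStrainer


def extract_links(html, base_url):
    """Return unique absolute URLs for <a href="..."> tags, in document order.

    Hypothetical helper for illustration; base_url is the address the HTML came from.
    """
    # Build tree nodes only for <a> tags; everything else is skipped while parsing
    soup = BeautifulSoup(html, 'html.parser', parse_only=SoupStrainer('a'))
    seen = set()
    urls = []
    # href=True keeps only <a> tags that actually have an href attribute
    for link in soup.find_all('a', href=True):
        # urljoin resolves relative hrefs ("/path", "page.html") against base_url
        # and leaves already-absolute hrefs unchanged
        absolute = urljoin(base_url, link['href'])
        if absolute not in seen:  # skip duplicates, keep first-seen order
            seen.add(absolute)
            urls.append(absolute)
    return urls


# Hypothetical sample page standing in for a fetched response body
sample_html = """
<html><body>
  <a href="/section/world">World</a>
  <a href="https://example.org/external">External</a>
  <a name="no-href">Anchor without href</a>
  <a href="article.html">Relative</a>
  <a href="/section/world">Duplicate</a>
</body></html>
"""

for url in extract_links(sample_html, 'https://www.example.com/news/'):
    print(url)
```

Run as-is, the loop prints three URLs: https://www.example.com/section/world, https://example.org/external, and https://www.example.com/news/article.html. The anchor without an href and the repeated link are skipped. With a live page, you would pass the fetched body and the URL you requested as base_url.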

Refer to the BeautifulSoup documentation for more detailed information on various parsing scenarios:

[BeautifulSoup Documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
