
How to extract 'href' attributes using BeautifulSoup in Python?

DDD
Release: 2024-10-28 21:42:02

Extracting HREF Attribute with BeautifulSoup

Suppose you want to extract the href value "some_url" from the following HTML:

<code class="html"><a href="some_url">next</a>
<span class="class">...</span></code>

Utilizing BeautifulSoup's find_all() Method

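To retrieve the attribute, use find_all() to collect every <a> tag that has an href, then read each value with dictionary-style indexing. The sample below fills the span with a second link so the loop has two matches: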

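<code class="python">from bs4 import BeautifulSoup

html = '''<a href="some_url">next</a>
<span class="class"><a href="another_url">later</a></span>'''

# Name the parser explicitly; otherwise bs4 guesses one and emits a warning
soup = BeautifulSoup(html, "html.parser")

# href=True keeps only <a> tags that actually have an href attribute,
# so a['href'] cannot raise KeyError inside the loop
for a in soup.find_all('a', href=True):
    print("Found the URL:", a['href'])</code>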
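If you only need the first link (the "some_url" value from the scenario above), find() is enough, and select() with a CSS selector is another way to collect all of them. The following is a minimal sketch that reuses the sample HTML plus one extra <a name="top"> tag, added here only for illustration, to show that tags without an href are skipped:

```python
from bs4 import BeautifulSoup

# Sample HTML from above, plus a hypothetical <a> tag that has no href
html = '''<a href="some_url">next</a>
<span class="class"><a href="another_url">later</a></span>
<a name="top">no link here</a>'''

soup = BeautifulSoup(html, "html.parser")

# find() returns only the first matching tag, or None if nothing matches
first = soup.find("a", href=True)
first_url = first["href"] if first is not None else None

# select() takes a CSS selector; "a[href]" matches <a> tags that carry an href
all_urls = [a["href"] for a in soup.select("a[href]")]

print(first_url)  # some_url
print(all_urls)   # ['some_url', 'another_url']
```

Both approaches return tags in document order, so the <a name="top"> tag is simply ignored.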

Python 2 to Python 3 Compatibility

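Apart from the print() call, this code runs unchanged on Python 2 and Python 3; under Python 2, print shows a tuple unless you add from __future__ import print_function. Note, however, that current Beautiful Soup 4 releases support only Python 3. In Beautiful Soup 3, the find_all() method was named findAll(); Beautiful Soup 4 still accepts findAll() as a legacy alias, but find_all() is the preferred name.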

Retrieving All Tags with HREF Attributes

To retrieve every tag that has an href attribute, regardless of its tag name, omit the tag-name argument:

<code class="python">href_tags = soup.find_all(href=True)</code>
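Besides <a>, tags such as <link> and <area> can also carry an href. The sketch below uses a made-up page (an assumption for illustration) to show that find_all(href=True) picks up all of them while skipping tags like <img> that have no href:

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: <link>, <a> and <area> have href, <img> does not
html = '''<head><link rel="stylesheet" href="style.css"></head>
<body>
  <a href="/about">About</a>
  <map name="m"><area shape="rect" coords="0,0,10,10" href="/map"></map>
  <img src="logo.png">
</body>'''

soup = BeautifulSoup(html, "html.parser")

# Omitting the tag name matches any tag that has an href attribute
pairs = [(tag.name, tag["href"]) for tag in soup.find_all(href=True)]

for name, url in pairs:
    print(name, url)
```

The tag.name field makes it easy to filter the results afterwards, for example to keep stylesheet links separate from navigation links.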
