How to Extract href Attributes from HTML with BeautifulSoup

Linda Hamilton

Published: 2024-10-29 11:51:02


Extracting href Values from HTML with BeautifulSoup

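Extracting specific pieces of information from HTML is a common task in web scraping. One such piece is the href attribute of an anchor tag (<a>). BeautifulSoup is a widely used Python library that provides methods for navigating HTML and retrieving the elements you need.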

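Suppose we need to extract the href values from HTML that contains several kinds of tags, including <a> and <span> tags. With BeautifulSoup, we can use the find_all method to find every <a> tag that has an href attribute: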

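<code class="python">from bs4 import BeautifulSoup

html = '''<a href="some_url">next</a>
<span class="class"><a href="another_url">later</a></span>'''

# Name the parser explicitly; omitting it makes bs4 guess and emit a warning.
soup = BeautifulSoup(html, "html.parser")

# href=True matches only <a> tags that actually carry an href attribute.
for a in soup.find_all('a', href=True):
    print("Found the URL:", a['href'])</code>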


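Here find_all receives two filters: the tag name to search for ('a') and the keyword argument href=True, which matches only tags that carry an href attribute, whatever its value. Attribute filters can also be passed as a dictionary through the attrs parameter. The loop then prints the href value of each matching tag.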
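As a minimal sketch of the filtering described above, the snippet below compares the keyword form (href=True) with the dictionary form (attrs={'href': True}) and shows what happens when no attribute filter is used. The extra anchor without an href and the variable names are made up for illustration; they are not part of the original example.

```python
from bs4 import BeautifulSoup

# Hypothetical input: the second anchor deliberately has no href attribute.
html = '''<a href="some_url">next</a>
<a name="top">no link here</a>
<span class="class"><a href="another_url">later</a></span>'''

soup = BeautifulSoup(html, "html.parser")

# Keyword form: only <a> tags that have an href attribute.
by_keyword = [a["href"] for a in soup.find_all("a", href=True)]

# Dictionary form: the same filter passed through the attrs parameter.
by_attrs = [a["href"] for a in soup.find_all("a", attrs={"href": True})]

# Without the filter every <a> is returned; .get() yields None when the
# attribute is missing, whereas a["href"] would raise KeyError.
all_anchors = [a.get("href") for a in soup.find_all("a")]

print(by_keyword)   # ['some_url', 'another_url']
print(by_attrs)     # ['some_url', 'another_url']
print(all_anchors)  # ['some_url', None, 'another_url']
```

Filtering with href=True up front avoids having to check for None or catch KeyError inside the loop.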

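In older versions (BeautifulSoup 3), the method is named findAll rather than find_all. BeautifulSoup 4 still accepts findAll as a legacy alias, but find_all is the preferred name.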

Note that if we want every tag that has an href attribute, regardless of its name, we can omit the tag name argument:

<code class="python">href_tags = soup.find_all(href=True)</code>


This returns a list-like ResultSet containing every tag in the HTML that has an href attribute.
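To illustrate the name-independent search, here is a short sketch under the assumption that the page also contains a <link> element; that element, the style.css value, and the pairs variable are hypothetical and not part of the original example.

```python
from bs4 import BeautifulSoup

# Hypothetical input mixing different tag names; the <span> has no href.
html = '''<link rel="stylesheet" href="style.css">
<a href="some_url">next</a>
<span class="class">no href here</span>'''

soup = BeautifulSoup(html, "html.parser")

# No tag name given: any tag with an href attribute matches.
pairs = [(tag.name, tag["href"]) for tag in soup.find_all(href=True)]

print(pairs)  # [('link', 'style.css'), ('a', 'some_url')]
```

Recording tag.name alongside the value makes it easy to tell hyperlinks apart from other href-bearing elements such as stylesheets.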
