How can I efficiently extract specific text from HTML using PHP DOMDocument and DOMXPath? - PHP Tutorial - PHP中文網

Susan Sarandon

Published: 2024-10-31 01:18:29



Parsing HTML with PHP DOMDocument

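Compared with regular expressions, PHP's DOMDocument class offers a more reliable way to parse HTML, because it works on the parsed document tree rather than on raw character patterns. For extracting specific text from an HTML document, the DOMXPath class plays the central role.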

Example:

Consider the following HTML string:

<code class="html"><div class="main">
    <div class="text">
        Capture this text 1
    </div>
</div>

<div class="main">
    <div class="text">
        Capture this text 2
    </div>
</div></code>


Our goal is to extract the text "Capture this text 1" and "Capture this text 2".

The XPath query approach:

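Rather than relying on DOMDocument::getElementsByTagName, which retrieves every tag with a given name, XPath lets us target specific elements based on their position in the document structure and on their attributes.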
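To make the difference concrete, the following minimal sketch counts what each approach returns. The compact `$html` string and the variable names are assumptions chosen for illustration; they mirror the structure of the sample markup rather than reproduce it exactly.

```php
<?php
// Hypothetical sample markup mirroring the structure used in this article.
$html = '<div class="main"><div class="text">Capture this text 1</div></div>'
      . '<div class="main"><div class="text">Capture this text 2</div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// getElementsByTagName matches on the tag name only,
// so the outer "main" divs and the inner "text" divs are all returned.
$allDivs = $dom->getElementsByTagName('div');
$divCount = $allDivs->length;

// An XPath expression can also constrain the attributes and the parent element.
$xpath = new DOMXPath($dom);
$textDivs = $xpath->query('//div[@class="main"]/div[@class="text"]');
$textCount = $textDivs->length;

echo "getElementsByTagName: {$divCount} nodes, XPath: {$textCount} nodes\n";
```

With getElementsByTagName alone, the caller would still have to filter the four div nodes by hand; the XPath expression does that filtering in the query itself.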

<code class="php">$html = <<<HTML
<div class="main">
    <div class="text">
        Capture this text 1
    </div>
</div>

<div class="main">
    <div class="text">
        Capture this text 2
    </div>
</div>
HTML;

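// Parse the HTML string into a DOM tree.
$dom = new DOMDocument();
$dom->loadHTML($html);

// Create an XPath evaluator bound to the parsed document.
$xpath = new DOMXPath($dom);</code>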
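Real-world pages are rarely as clean as the sample above, and loadHTML may emit PHP warnings for markup that libxml does not recognise. The sketch below shows one common way to handle this with libxml_use_internal_errors; the `<section>` wrapper and the variable names are assumptions for illustration, and whether any particular tag triggers a warning depends on the installed libxml version.

```php
<?php
// Hypothetical input: some libxml versions report HTML5 tags such as <section> as errors.
$html = '<section><div class="main"><div class="text">Capture this text 1</div></div></section>';

// Collect libxml parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);

// Fetch any collected errors (possibly none), then clear the buffer
// and restore the previous error-handling mode.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The document is still queryable even if errors were reported.
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//div[@class="text"]');
$firstText = trim($nodes->item(0)->nodeValue);

echo $firstText, "\n";
```

Restoring the previous mode keeps the change local, so other code that relies on libxml warnings is not affected.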


With XPath, we can run the following query:

<code class="php">$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');
foreach ($tags as $tag) {
    var_dump(trim($tag->nodeValue));
}</code>


This query selects every div whose class attribute is "text" and that is a direct child of a div whose class attribute is "main".
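One caveat: `@class="text"` compares the whole attribute value, so an element with `class="text highlight"` would not match. A widely used XPath 1.0 idiom for matching a single class token is sketched below; the multi-class markup is a hypothetical variation on the sample, not part of the original example.

```php
<?php
// Hypothetical markup: the inner divs carry more than one class,
// and the third one has a class that merely starts with "text".
$html = '<div class="main"><div class="text highlight">Capture this text 1</div></div>'
      . '<div class="main"><div class="note text">Capture this text 2</div></div>'
      . '<div class="main"><div class="textual">Not a match</div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Exact comparison: none of the class attributes equals "text" exactly.
$exact = $xpath->query('//div[@class="main"]/div[@class="text"]');
$exactCount = $exact->length;

// Token match: pad the normalised class list with spaces and look for " text ".
$query = '//div[@class="main"]'
       . '/div[contains(concat(" ", normalize-space(@class), " "), " text ")]';

$matches = [];
foreach ($xpath->query($query) as $node) {
    $matches[] = trim($node->nodeValue);
}

print_r($matches);
```

The padding with spaces is what prevents "textual" from matching, which a plain `contains(@class, "text")` would not rule out.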

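Output (this formatting is what var_dump produces when Xdebug is enabled; plain PHP prints each value in the form string(19) "Capture this text 1"):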

string 'Capture this text 1' (length=19)
string 'Capture this text 2' (length=19)
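In practice it is often more convenient to collect the matches into an array than to dump them one by one. The helper below is a minimal sketch of that idea; `extractTexts` is a hypothetical function name introduced here for illustration and is not part of PHP or of the original example.

```php
<?php
/**
 * Hypothetical helper: returns the trimmed text of every node
 * matched by an XPath query against an HTML string.
 */
function extractTexts(string $html, string $query): array
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    // query() returns false when the expression is invalid.
    $nodes = $xpath->query($query);
    if ($nodes === false) {
        return [];
    }

    $texts = [];
    foreach ($nodes as $node) {
        $texts[] = trim($node->nodeValue);
    }
    return $texts;
}

$html = <<<HTML
<div class="main">
    <div class="text">
        Capture this text 1
    </div>
</div>

<div class="main">
    <div class="text">
        Capture this text 2
    </div>
</div>
HTML;

$texts = extractTexts($html, '//div[@class="main"]/div[@class="text"]');
print_r($texts);
```

Returning an empty array for an invalid expression is a design choice made for this sketch; throwing an exception would be an equally reasonable alternative.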


This shows how PHP's DOMDocument and DOMXPath can be combined to parse HTML accurately and extract specific content.
