How to Efficiently Extract Text from Specific HTML Elements Using PHP's DOMDocument and XPath?

Barbara Streisand

Released: 2024-11-02 08:48:29

Parsing HTML with PHP's DOMDocument

Question:

How can the DOMDocument object be used to capture the text inside specific HTML elements? For example, how would you extract "Capture this text 1" and "Capture this text 2" from the following HTML?

<div class="main">
    <div class="text">
    Capture this text 1
    </div>
</div>

<div class="main">
    <div class="text">
    Capture this text 2
    </div>
</div>

Answer:

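DOMDocument::getElementsByTagName() returns every element with a given tag name, so a call for div would return both the outer "main" containers and the inner "text" elements, leaving the class and nesting checks to be done by hand in PHP. An XPath query run through the DOMXPath class is a better fit: it selects only the nodes that match the required structure.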
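
For comparison, here is a minimal sketch of what the tag-name approach would involve. The shortened sample markup and the variable names are assumptions made for illustration; the point is only that every div comes back and the filtering has to be written manually.

```php
<?php
// Illustrative only: the tag-name approach that the XPath query replaces.
$html = '<div class="main"><div class="text">Capture this text 1</div></div>'
      . '<div class="main"><div class="text">Capture this text 2</div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$texts = [];
// getElementsByTagName('div') matches every <div>, outer and inner alike.
foreach ($dom->getElementsByTagName('div') as $div) {
    // Keep only <div class="text"> whose parent is <div class="main">.
    $parent = $div->parentNode;
    if ($div->getAttribute('class') === 'text'
        && $parent instanceof DOMElement
        && $parent->getAttribute('class') === 'main') {
        $texts[] = trim($div->textContent);
    }
}

print_r($texts);
```

The loop visits four elements to keep two, and the structural rule lives in PHP conditionals rather than in a single query string.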

Implementation:

Load HTML into a DOMDocument Object:

<code class="php">$html = <<<HTML
<div class="main">
 <div class="text">
 Capture this text 1
 </div>
</div>

<div class="main">
 <div class="text">
 Capture this text 2
 </div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);</code>
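
Markup fetched from real pages is often less tidy than this sample, and loadHTML() may then raise PHP warnings while parsing. One possible way to handle that, sketched here under the assumption that you would rather collect parser messages than have them emitted, is to switch libxml to internal error handling around the call. The single-div sample string is an illustrative assumption.

```php
<?php
$html = '<div class="main"><div class="text">Capture this text 1</div></div>';

// Collect libxml parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);

// Log whatever the parser complained about, if anything, then reset the buffer.
foreach (libxml_get_errors() as $error) {
    error_log(trim($error->message));
}
libxml_clear_errors();

// Restore the previous error-handling mode for the rest of the script.
libxml_use_internal_errors($previous);
```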

Instantiate a DOMXPath Object:

<code class="php">$xpath = new DOMXPath($dom);</code>

Execute XPath Query:

<code class="php">$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');</code>
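
Note that [@class="text"] compares the whole attribute value, so it would not match an element carrying additional classes such as class="text intro". If that case matters, a common XPath 1.0 idiom is to test for a single class token instead. The sketch below assumes a hypothetical helper named hasClass() and a modified sample with extra class names; neither comes from the original question.

```php
<?php
$html = '<div class="main wide"><div class="text intro">Capture this text 1</div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Hypothetical helper: builds an XPath 1.0 predicate that matches one class token.
function hasClass(string $name): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";
}

// Exact attribute comparison: finds nothing, because the values have extra classes.
$exact = $xpath->query('//div[@class="main"]/div[@class="text"]');

// Token comparison: matches the inner div despite the extra classes.
$tokens = $xpath->query('//div[' . hasClass('main') . ']/div[' . hasClass('text') . ']');

var_dump($exact->length);
var_dump($tokens->length);
```

For the markup in the question, where each element has exactly one class, the simpler exact comparison is sufficient.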

Retrieve Text Values:

<code class="php">foreach ($tags as $tag) {
 // nodeValue holds the element's text; trim() strips the surrounding whitespace and newlines.
 var_dump(trim($tag->nodeValue));
}</code>

Running the loop prints the trimmed strings "Capture this text 1" and "Capture this text 2", one var_dump() line per matched element.
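
The four steps can also be combined into one reusable function. The following is a sketch, not a definitive implementation: the name extractTexts() and the choice to throw on an invalid expression are assumptions made for illustration.

```php
<?php
/**
 * Hypothetical helper: returns the trimmed text of every node matched by $query.
 *
 * @return string[]
 */
function extractTexts(string $html, string $query): array
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $nodes = (new DOMXPath($dom))->query($query);
    if ($nodes === false) {
        // DOMXPath::query() returns false for a malformed expression.
        throw new InvalidArgumentException("Invalid XPath expression: $query");
    }

    $texts = [];
    foreach ($nodes as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

$html = <<<HTML
<div class="main">
 <div class="text">
 Capture this text 1
 </div>
</div>

<div class="main">
 <div class="text">
 Capture this text 2
 </div>
</div>
HTML;

print_r(extractTexts($html, '//div[@class="main"]/div[@class="text"]'));
```

Returning an array rather than dumping inside the loop keeps the extraction separate from whatever the caller wants to do with the text.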
