Parsing HTML with PHP's DOMDocument
Question:
How can I use the DOMDocument object to capture the text inside specific HTML elements? For example, extracting "Capture this text 1" and "Capture this text 2" from the following HTML:
<div class="main"> <div class="text"> Capture this text 1 </div> </div>
<div class="main"> <div class="text"> Capture this text 2 </div> </div>
Answer:
DOMDocument::getElementsByTagName returns every element with a given tag name, so using it here means fetching all div elements and then checking each one's class and parent by hand. An XPath query, run through the DOMXPath class, selects only the matching elements in a single expression.
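For comparison, the manual filtering that getElementsByTagName would require might look like the sketch below. It is not part of the original answer; the variable names and the compacted sample markup are chosen here purely for illustration.

```php
<?php
// Sample markup equivalent to the question's HTML (whitespace trimmed).
$html = '<div class="main"><div class="text">Capture this text 1</div></div>'
      . '<div class="main"><div class="text">Capture this text 2</div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$texts = [];
// getElementsByTagName returns ALL div elements, outer and inner alike,
// so each one has to be checked by hand.
foreach ($dom->getElementsByTagName('div') as $div) {
    $parent = $div->parentNode;
    if ($div->getAttribute('class') === 'text'
        && $parent instanceof DOMElement
        && $parent->getAttribute('class') === 'main') {
        $texts[] = trim($div->textContent);
    }
}

print_r($texts);
```

The loop visits four div elements to keep two, and the class and parent checks grow with every extra condition, which is the bookkeeping an XPath expression avoids.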
Implementation:
Load HTML into a DOMDocument Object:
<code class="php">$html = <<<HTML
<div class="main"> <div class="text"> Capture this text 1 </div> </div>
<div class="main"> <div class="text"> Capture this text 2 </div> </div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);</code>
Instantiate DOMXPath Object:
<code class="php">$xpath = new DOMXPath($dom);</code>
Execute XPath Query:
<code class="php">$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');</code>
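Note that @class="text" compares the whole attribute value, so it would not match an element declared as class="text intro". If the real markup carries several classes per element (an assumption, since the sample HTML does not), a token-based test is a common workaround. The sketch below uses a hypothetical helper, classTest, and made-up sample markup to show the idea.

```php
<?php
// Made-up markup: elements with multiple classes, plus a decoy class
// ("context") that merely contains the substring "text".
$html = '<div class="main wide"><div class="text intro"> Capture this text 1 </div></div>'
      . '<div class="main"><div class="context"> Not this </div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Hypothetical helper: builds an XPath 1.0 predicate that matches a class
// as a whole whitespace-separated token rather than as the full attribute.
function classTest(string $class): string
{
    return sprintf('contains(concat(" ", normalize-space(@class), " "), " %s ")', $class);
}

$query = sprintf('//div[%s]/div[%s]', classTest('main'), classTest('text'));
$tags = $xpath->query($query);

foreach ($tags as $tag) {
    var_dump(trim($tag->nodeValue));
}
```

Padding the attribute with spaces before calling contains() is what keeps "context" from matching "text". For markup where each element has exactly one class, as in the question, the simpler @class="..." comparison is enough.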
Retrieve Text Values:
<code class="php">foreach ($tags as $tag) {
    // nodeValue holds the element's text; trim() strips the surrounding whitespace
    var_dump(trim($tag->nodeValue));
}</code>
This approach effectively extracts "Capture this text 1" and "Capture this text 2" from the provided HTML.
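Putting the four steps together, a complete script could look something like the following. The extractTexts function is a hypothetical wrapper introduced here, not something from the original answer, and the libxml_use_internal_errors calls are an optional extra that keeps parser warnings quiet when the input is less tidy than this sample.

```php
<?php
// Hypothetical helper combining the steps above: load the HTML, run the
// XPath query, and return the trimmed text of each matching node.
function extractTexts(string $html, string $query): array
{
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query($query);
    if ($nodes === false) {
        // query() returns false when the expression is malformed.
        return [];
    }

    $texts = [];
    foreach ($nodes as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

$html = <<<HTML
<div class="main"> <div class="text"> Capture this text 1 </div> </div>
<div class="main"> <div class="text"> Capture this text 2 </div> </div>
HTML;

$texts = extractTexts($html, '//div[@class="main"]/div[@class="text"]');
print_r($texts);
```

Returning a plain array rather than dumping each value makes the result easier to reuse, for instance when the extracted strings need to be stored or compared.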