How to Efficiently Extract Text from Specific HTML Elements Using PHP's DOMDocument and XPath?

Barbara Streisand
Release: 2024-11-02 08:48:29

Parsing HTML with PHP's DOMDocument

Question:

Using the DOMDocument object, how can the text inside specific HTML elements be captured? For example, extract "Capture this text 1" and "Capture this text 2" from the following HTML:

<div class="main">
    <div class="text">
    Capture this text 1
    </div>
</div>

<div class="main">
    <div class="text">
    Capture this text 2
    </div>
</div>

Answer:

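DOMDocument::getElementsByTagName is an awkward fit for this task: calling it with 'div' returns every div in the document, including the outer "main" wrappers, so the results would still have to be filtered by class and parent by hand. An XPath query, run through the DOMXPath class, selects exactly the nested elements in a single expression.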
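To illustrate the point above, here is a rough sketch of what the getElementsByTagName route might look like. The sample markup is a condensed copy of the question's HTML, and the manual class and parent checks are an assumption about how one would filter the list, not part of the original answer.

```php
<?php
// Condensed copy of the question's markup (assumed for illustration).
$html = '<div class="main"><div class="text"> Capture this text 1 </div></div>'
      . '<div class="main"><div class="text"> Capture this text 2 </div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$texts = [];
// getElementsByTagName returns every <div>, outer wrappers included,
// so each candidate has to be checked by hand.
foreach ($dom->getElementsByTagName('div') as $div) {
    $parent = $div->parentNode;
    if ($div->getAttribute('class') === 'text'
        && $parent instanceof DOMElement
        && $parent->getAttribute('class') === 'main') {
        $texts[] = trim($div->nodeValue);
    }
}

print_r($texts);
```

The loop visits all four div elements to keep two of them, and the filtering logic grows with every extra condition, which is the bookkeeping an XPath expression avoids.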

Implementation:

  1. Load HTML into a DOMDocument Object:

    <code class="php">$html = <<<HTML
    <div class="main">
     <div class="text">
     Capture this text 1
     </div>
    </div>
    
    <div class="main">
     <div class="text">
     Capture this text 2
     </div>
    </div>
    HTML;
    
    $dom = new DOMDocument();
    $dom->loadHTML($html);</code>
  2. Instantiate DOMXPath Object:

    <code class="php">$xpath = new DOMXPath($dom);</code>
  3. Execute XPath Query:

    <code class="php">$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');</code>
  4. Retrieve Text Values:

    <code class="php">foreach ($tags as $tag) {
     // nodeValue holds the element's text, including surrounding whitespace
     var_dump(trim($tag->nodeValue));
    }</code>

Each iteration dumps one trimmed string, so the loop outputs "Capture this text 1" and "Capture this text 2" from the provided HTML.
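The four steps can be combined into one small function. The sketch below is a minimal version under stated assumptions: extractTexts is a hypothetical helper name, the libxml_use_internal_errors calls are an addition for pages that are not well-formed, and the second sample block carries extra class names (not present in the original HTML) to show that an exact @class comparison only matches when the attribute value is exactly that string.

```php
<?php
/**
 * Hypothetical helper (not part of the original answer): returns the trimmed
 * text of every node matched by $query in $html.
 */
function extractTexts(string $html, string $query): array
{
    // Assumption: real pages are often not well-formed, so collect libxml
    // parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query($query);

    $texts = [];
    if ($nodes !== false) { // query() returns false for a malformed expression
        foreach ($nodes as $node) {
            $texts[] = trim($node->textContent);
        }
    }
    return $texts;
}

// Sample markup: the second block has extra class names (an assumed variation).
$html = <<<HTML
<div class="main">
    <div class="text">
    Capture this text 1
    </div>
</div>
<div class="main highlighted">
    <div class="text intro">
    Capture this text 2
    </div>
</div>
HTML;

// Exact attribute match, as in the steps above: skips the second block,
// whose class attributes contain more than one class name.
$exact = extractTexts($html, '//div[@class="main"]/div[@class="text"]');

// Token match: pads the normalized class list with spaces and looks for " name ".
$token = extractTexts(
    $html,
    '//div[contains(concat(" ", normalize-space(@class), " "), " main ")]'
    . '/div[contains(concat(" ", normalize-space(@class), " "), " text ")]'
);

print_r($exact);
print_r($token);
```

If the target markup always uses single class names, as in the question, the shorter exact-match query is enough; the token form is only worth its extra length when elements may carry several classes.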
