How do I extract specific text from HTML using PHP's DOMDocument and XPath?

Parse HTML with PHP's DOMDocument

To extract specific text from HTML with PHP's DOMDocument, XPath queries (run through the DOMXPath class) are often more effective than DOMDocument::getElementsByTagName alone. An XPath expression can select elements by their attributes and their position in the document structure, not just by tag name.

Capturing Text from Nested DIVs

The example HTML contains nested <div> tags, where the target text is located within <div> elements with class "text", which are in turn nested within <div> elements with class "main".
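A minimal sketch of loading such markup follows. The HTML here is hypothetical, reconstructed from the description above rather than taken from the original sample, and the variable names ($html, $doc, $xpath, $allDivs) are illustrative. It also shows why selecting by tag name alone is too broad for this structure.

```php
<?php
// Hypothetical markup matching the structure described above (assumption).
$html = <<<'HTML'
<div class="main">
    <div class="text">
        Capture this text 1
    </div>
</div>
<div class="main">
    <div class="text">
        Capture this text 2
    </div>
</div>
HTML;

// Collect libxml parser warnings internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// DOMXPath wraps the loaded document and exposes query().
$xpath = new DOMXPath($doc);

// Selecting by tag name alone returns every <div>, wrappers included.
$allDivs = $doc->getElementsByTagName('div');
echo $allDivs->length, "\n"; // prints 4
```

getElementsByTagName('div') returns the two "main" wrappers as well as the two "text" elements, so further filtering would be needed to isolate the target text.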

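To capture the target text, run an XPath query through a DOMXPath instance ($xpath here) created from the loaded document, and keep the result in $tags:

<code class="php">$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');</code>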

This query selects all <div> elements whose class attribute is exactly "text" and that are direct children of <div> elements whose class attribute is exactly "main". The result is a DOMNodeList of the matching elements.
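The single slash in the query matters: / steps to direct children only, while // would also reach elements nested more deeply. The sketch below illustrates the difference on hypothetical markup (the "wrapper" class and the text content are assumptions made for this example).

```php
<?php
// Hypothetical markup: the second "text" div sits one level deeper, inside a wrapper.
$html = <<<'HTML'
<div class="main">
    <div class="text">Direct child</div>
    <div class="wrapper">
        <div class="text">Nested deeper</div>
    </div>
</div>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// "/" selects only "text" divs that are direct children of a "main" div.
$children = $xpath->query('//div[@class="main"]/div[@class="text"]');

// "//" selects "text" divs at any depth below a "main" div.
$descendants = $xpath->query('//div[@class="main"]//div[@class="text"]');

echo $children->length, "\n";    // prints 1
echo $descendants->length, "\n"; // prints 2
```

Which form is appropriate depends on how consistently the target elements are nested in the real page.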

Iterating and Extracting Node Values

To access the actual text content, each matching element can be iterated over and its nodeValue property accessed:

<code class="php">foreach ($tags as $tag) {
    var_dump(trim($tag->nodeValue));
}</code>

The trim() function is used to remove any leading or trailing whitespace from the extracted text.

Execution Output

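Executing the code will output the following (this is the var_dump() format shown when the Xdebug extension is loaded; plain PHP prints string(19) "Capture this text 1" instead):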

string 'Capture this text 1' (length=19)
string 'Capture this text 2' (length=19)
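Putting the steps together, a self-contained sketch might look like the following. The markup is again hypothetical, reconstructed from the description and the output above, and the $texts array is an illustrative addition that collects the results instead of dumping them one by one.

```php
<?php
// Hypothetical markup matching the described structure (assumption).
$html = <<<'HTML'
<div class="main">
    <div class="text">
        Capture this text 1
    </div>
</div>
<div class="main">
    <div class="text">
        Capture this text 2
    </div>
</div>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');

$texts = [];
foreach ($tags as $tag) {
    // nodeValue includes the surrounding newlines and indentation; trim() strips them.
    $texts[] = trim($tag->nodeValue);
}

print_r($texts);
```

Collecting the trimmed strings in an array keeps the extraction separate from whatever is done with the text afterwards.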
