PHP HTML Parsing: Extracting Text Between Headings
In PHP, HTML can be parsed in several ways. When the HTML is stored in a variable, a DOM parser is generally more reliable than regular expressions, which break easily on attributes, nested tags, or unexpected whitespace.
Using the PHP Document Object Model (DOM)
PHP's DOM extension provides a structured approach to parsing HTML:
// Sample HTML held in a variable
$str = '<h1>T1</h1>Lorem ipsum.<h1>T2</h1>The quick red fox...<h1>T3</h1>... jumps over the lazy brown FROG';

// Parse the markup and collect every <h1> element
$DOM = new DOMDocument;
$DOM->loadHTML($str);
$items = $DOM->getElementsByTagName('h1');

// Print the text of each heading
for ($i = 0; $i < $items->length; $i++) {
    echo $items->item($i)->nodeValue . "<br/>";
}
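Rendered in a browser, this prints the text of each heading on its own line. Note that this is the text of the headings themselves, not the text between them:
T1
T2
T3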
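To get the text between the headings with the DOM, one option is to walk the siblings that follow each h1 until the next h1 is reached. The following is a minimal sketch under that assumption. The function name textBetweenHeadings is hypothetical and not part of the original answer, and it assumes the headings are siblings at the same level and that their titles are unique.

```php
<?php
// Hypothetical helper (not from the original article): maps each <h1> title
// to the text that follows it, up to the next <h1>.
// Assumes the headings are siblings and that their titles are unique;
// a repeated title would overwrite the earlier entry.
function textBetweenHeadings(string $html): array
{
    // Collect parser warnings internally instead of printing them
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $sections = [];
    foreach ($dom->getElementsByTagName('h1') as $heading) {
        $text = '';
        // Walk the following siblings until the next <h1> or the end of the parent
        for ($node = $heading->nextSibling; $node !== null; $node = $node->nextSibling) {
            if ($node->nodeName === 'h1') {
                break;
            }
            $text .= $node->textContent;
        }
        $sections[trim($heading->textContent)] = trim($text);
    }
    return $sections;
}

$str = '<h1>T1</h1>Lorem ipsum.<h1>T2</h1>The quick red fox...<h1>T3</h1>... jumps over the lazy brown FROG';
print_r(textBetweenHeadings($str));
```

For the sample string, this should yield an array that maps T1 to "Lorem ipsum.", T2 to "The quick red fox..." and T3 to "... jumps over the lazy brown FROG".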
Alternative Approach: Regular Expression
If the desired output is the text between the headings and the markup is simple and predictable, a regular expression can strip the headings instead:
$str = '<h1>T1</h1>Lorem ipsum.<h1>T2</h1>The quick red fox...<h1>T3</h1>... jumps over the lazy brown FROG';

// Remove each <h1>...</h1> element, keeping only the text around it
echo preg_replace("#<h1.*?>.*?</h1>#", "", $str);
This expression removes each h1 element together with its content (it does not strip other tags) and leaves the remaining text, joined without separators:
Lorem ipsum.The quick red fox...... jumps over the lazy brown FROG
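If the segments are needed separately rather than as one concatenated string, a variation is to split on the headings with preg_split. This is a sketch, not part of the original answer. The pattern is tightened slightly as an assumption about the input: the i modifier also matches uppercase tags, and the s modifier lets a heading span several lines.

```php
<?php
$str = '<h1>T1</h1>Lorem ipsum.<h1>T2</h1>The quick red fox...<h1>T3</h1>... jumps over the lazy brown FROG';

// Split the string at every <h1>...</h1> element.
// PREG_SPLIT_NO_EMPTY drops the empty piece that precedes the first heading.
$segments = preg_split('#<h1\b[^>]*>.*?</h1>#is', $str, -1, PREG_SPLIT_NO_EMPTY);

print_r($segments);
```

For the sample string, $segments should hold three entries: "Lorem ipsum.", "The quick red fox..." and "... jumps over the lazy brown FROG". Like any regex-based approach, this remains fragile if the markup contains nested or malformed headings.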