Truncating HTML Text without Breaking Tags
When truncating text containing HTML, it is crucial to ensure that tags are properly handled to avoid breaking the layout and content flow.
The Problem:
Naive approaches such as a plain substr() call count markup characters as if they were text and cut at a fixed offset, so the result can end in the middle of a tag or entity, or leave elements unclosed. This can disrupt formatting, produce confusing content, and interfere with cleanup tools such as Tidy.
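To make the failure mode concrete, here is a minimal sketch of a naive cut. The sample markup and the 20-byte limit are made up for illustration:

```php
<?php
// Naive truncation: cut the raw markup at a fixed byte offset.
$html = '<b>Hello</b> <a href="/world">world</a>!';

// The limit counts markup bytes as if they were visible text.
$naive = substr($html, 0, 20);

print($naive . "\n"); // <b>Hello</b> <a href
```

The cut lands inside the opening `<a>` tag, and only six visible characters ("Hello" plus a space) survive, even though 20 were requested.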
The Solution:
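To address this problem, parse the HTML while keeping a stack of the tags that are currently open, and count only visible characters toward the length limit. Tags are passed through whole, and once the limit is reached, every tag still on the stack is closed in reverse order, so the truncated output remains well-formed.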
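The stack idea can be sketched in isolation. The helper name `unclosedTags` and its regex are assumptions made for this illustration, not part of the implementation presented in this article. It assumes well-nested input and does not treat void elements written without a trailing slash (such as `<br>`) specially:

```php
<?php
// Hypothetical helper: returns the names of the elements still open at the
// end of an HTML fragment, outermost first.
function unclosedTags(string $fragment): array
{
    $stack = [];
    // Group 1: "/" for a closing tag; group 2: tag name; group 3: "/" if self-closing.
    preg_match_all('{<(/?)([a-z][a-z0-9]*)[^>]*?(/?)>}i', $fragment, $matches, PREG_SET_ORDER);
    foreach ($matches as [, $closing, $name, $selfClosing]) {
        if ($selfClosing === '/') {
            continue;                 // e.g. <img ... /> opens nothing
        }
        if ($closing === '/') {
            array_pop($stack);        // assumes tags are properly nested
        } else {
            $stack[] = strtolower($name);
        }
    }
    return $stack;
}

// A fragment that was cut inside <b>, which is itself inside <em>.
$open = unclosedTags('<em><b>Hel');

// Close the innermost element first.
$closers = '';
foreach (array_reverse($open) as $name) {
    $closers .= "</$name>";
}
print('<em><b>Hel' . $closers . "\n"); // <em><b>Hel</b></em>
```

Appending the closers in reverse order of opening is what keeps the nesting valid.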
PHP Implementation:
The following PHP code demonstrates how to truncate HTML text while preserving tag structure:
function printTruncated($maxLength, $html, $isUtf8 = true)
{
    // Initialization
    $printedLength = 0;
    $position = 0;
    $tags = array();

    // Regex pattern for matching HTML tags and entities.
    // For UTF-8, each multibyte sequence is also matched so it counts as one character.
    $re = $isUtf8
        ? '{</?([a-z]+)[^>]*>|&#?[a-zA-Z0-9]+;|[\x80-\xFF][\x80-\xBF]*}'
        : '{</?([a-z]+)[^>]*>|&#?[a-zA-Z0-9]+;}';

    // Iterate through the HTML
    while ($printedLength < $maxLength && preg_match($re, $html, $match, PREG_OFFSET_CAPTURE, $position)) {
        // Extract tag and tag position
        list($tag, $tagPosition) = $match[0];

        // Print text leading up to the tag, cutting it if it exceeds the limit
        $str = substr($html, $position, $tagPosition - $position);
        if ($printedLength + strlen($str) > $maxLength) {
            print(substr($str, 0, $maxLength - $printedLength));
            $printedLength = $maxLength;
            break;
        }
        print($str);
        $printedLength += strlen($str);
        if ($printedLength >= $maxLength) break;

        // Handle the tag
        if ($tag[0] == '&' || ord($tag) >= 0x80) {
            // Pass entity or UTF-8 sequence unchanged; it counts as one character
            print($tag);
            $printedLength++;
        } else {
            if ($tag[1] == '/') {
                // Closing tag: pop outside assert() so the pop still happens
                // when assertions are disabled
                $openingTag = array_pop($tags);
                assert($openingTag == $match[1][0]); // Check that tags are properly nested
                print($tag);
            } else if ($tag[strlen($tag) - 2] == '/') {
                // Self-closing tag
                print($tag);
            } else {
                // Opening tag
                print($tag);
                $tags[] = $match[1][0];
            }
        }

        // Continue after the tag
        $position = $tagPosition + strlen($tag);
    }

    // Print any remaining text
    if ($printedLength < $maxLength && $position < strlen($html))
        print(substr($html, $position, $maxLength - $printedLength));

    // Close open tags
    while (!empty($tags))
        printf('</%s>', array_pop($tags));
}
Usage:
printTruncated(10, '<b>&lt;Hello&gt;</b> <img src="world.png" alt="" /> world!');
print("\n");

printTruncated(10, '<table><tr><td>Heck, </td><td>throw</td></tr><tr><td>in a</td><td>table</td></tr></table>');
print("\n");

printTruncated(10, "<em><b>Hello</b>&#20;w\xC3\xB8rld!</em>");
print("\n");
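Because printTruncated() writes directly to the output, a caller that needs the truncated markup as a string could wrap the call in output buffering. The sketch below is one possible approach: `captureOutput` is a hypothetical helper name, and a stand-in closure is used in place of printTruncated() so the snippet runs on its own:

```php
<?php
// Hypothetical helper: runs any printing callable and returns what it printed.
function captureOutput(callable $printer, ...$args): string
{
    ob_start();                       // start collecting output
    try {
        $printer(...$args);
        return (string) ob_get_contents();
    } finally {
        ob_end_clean();               // always discard the buffer, even on error
    }
}

// Stand-in for a printing function such as printTruncated().
$greet = function ($name) {
    print("Hello, $name");
};

$result = captureOutput($greet, 'world');
print($result . "\n"); // Hello, world
```

With the function from this article defined, the equivalent call would be `captureOutput('printTruncated', 10, $html)`.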