Truncating Text with Embedded HTML
When text contains HTML tags, truncating it to a fixed length takes care. A plain substring can cut through the middle of a tag or a character entity, or leave elements open, which produces invalid markup. The approaches below count only the visible characters toward the limit and close any tags that are still open at the cut point:
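To see why a plain substring is not enough, here is a minimal Python illustration using a sample string like the one in the examples of this article. naive_truncate is a hypothetical helper that simply slices the raw string, so markup is counted as if it were text.

```python
def naive_truncate(html, n):
    """Cut the raw string at index n, treating markup as ordinary text."""
    return html[:n]


sample = '<b>&lt;Hello&gt;</b> <img src="world.png" alt="" /> world!'

print(naive_truncate(sample, 5))   # <b>&l  (cut inside the &lt; entity)
print(naive_truncate(sample, 10))  # <b>&lt;Hel  (the <b> element is never closed)
print(naive_truncate(sample, 25))  # <b>&lt;Hello&gt;</b> <img  (cut inside the <img> tag)
```

Each cut point produces broken output for a different reason, which is why the implementations below track tags and entities instead of raw string positions.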
PHP Implementation:
The following PHP function uses regular expressions to parse HTML and maintains a stack of open tags:
function printTruncated($maxLength, $html, $isUtf8 = true) { ... }
The function scans the HTML input for tags and character entities. Tags do not count toward the length, each entity (such as &gt;) counts as a single character, and every tag that is still open when the limit is reached is closed, innermost first. As a result, the cut never falls inside a tag or an entity, and the output remains well-formed HTML.
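Because the function body is elided above, here is a minimal Python sketch of the same technique as described (a regular expression that finds tags and entities, plus a stack of open tags closed at the cut). It is not a port of the original printTruncated. The names truncate_html_regex, TOKEN_RE and VOID_TAGS are hypothetical, and the sketch assumes reasonably well-formed input: HTML comments and attribute values containing a literal ">" are not handled.

```python
import re

# Assumed list of void elements: they never get a closing tag, so they are not pushed.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

# One match per tag or character entity; everything between matches is plain text.
TOKEN_RE = re.compile(
    r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>"                  # tag: closing slash, name, self-closing slash
    r"|&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[a-zA-Z][a-zA-Z0-9]*);"  # entity such as &gt; or &#20;
)


def truncate_html_regex(html, max_length):
    """Return html cut to max_length visible characters, with open tags closed."""
    out = []
    open_tags = []   # stack of element names open at the current position
    printed = 0      # visible characters emitted so far
    pos = 0          # index of the first unprocessed character in html

    while printed < max_length:
        match = TOKEN_RE.search(html, pos)
        # Plain text up to the next tag/entity (or to the end of the input).
        text = html[pos:(match.start() if match else len(html))]
        if printed + len(text) >= max_length:
            out.append(text[:max_length - printed])  # the limit falls inside this text run
            break
        out.append(text)
        printed += len(text)
        if match is None:
            break  # the whole input fits within the limit
        token = match.group(0)
        if token.startswith("&"):
            printed += 1  # an entity is one visible character
        else:
            closing, name, self_closing = match.group(1), match.group(2).lower(), match.group(3)
            if closing:
                # Assumes well-formed input: only pop when the names match.
                if open_tags and open_tags[-1] == name:
                    open_tags.pop()
            elif not self_closing and name not in VOID_TAGS:
                open_tags.append(name)
        out.append(token)  # tags and entities are copied verbatim
        pos = match.end()

    # Close whatever is still open, innermost first.
    out.extend(f"</{name}>" for name in reversed(open_tags))
    return "".join(out)
```

For example, truncate_html_regex('<p>Hello <em>world</em>!</p>', 8) returns '<p>Hello <em>wo</em></p>'. A regex scanner like this is fast and dependency-free, but it is fragile on unusual markup; a real HTML parser, as in the Python section, is the safer choice for untrusted input.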
Example Usage:
printTruncated(10, '<b>&lt;Hello&gt;</b> <img src="world.png" alt="" /> world!'); // Prints: <b>&lt;Hello&gt;</b> <img src="world.png" alt="" /> w
Python Implementation:
HTML parsing libraries like BeautifulSoup can assist with this task in Python:
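from bs4 import BeautifulSoup, NavigableString, Tag

def truncate_html(text, max_length):
    # 'html.parser' keeps a fragment as-is; 'lxml' would wrap it in <html><body>
    soup = BeautifulSoup(text, 'html.parser')
    remaining = max_length  # visible characters still allowed

    def walk(parent):
        nonlocal remaining
        # Iterate over a copy, because nodes are removed or replaced while walking
        for node in list(parent.children):
            if remaining <= 0:
                node.extract()  # past the limit: drop this node and everything inside it
            elif isinstance(node, NavigableString):
                # Entities are already decoded here, so "&gt;" counts as one character
                if len(node) > remaining:
                    node.replace_with(node[:remaining])
                    remaining = 0
                else:
                    remaining -= len(node)
            elif isinstance(node, Tag):
                walk(node)  # markup costs nothing; only the text inside it is counted

    walk(soup)
    # Serializing re-escapes <, > and &, and every kept element keeps its closing tag
    return str(soup)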
Example Usage:
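print(truncate_html('<b>&lt;Hello&gt;</b> <img src="world.png" alt="" /> world!', 10))
# Keeps the <b> element with "<Hello>" (each entity counts as one character), both spaces,
# the <img> tag, and the "w" of "world": 10 visible characters in well-formed markup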
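To check that a truncated result really is balanced, Python's standard html.parser module is enough. The sketch below reports whether every non-void element that opens also closes, in the right order. TagBalanceChecker and is_balanced are hypothetical names, and the void-element list is an assumption covering common cases.

```python
from html.parser import HTMLParser

# Assumed list of elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Track open elements and flag mismatched or unclosed tags."""

    def __init__(self):
        super().__init__()
        self.stack = []  # names of elements opened but not yet closed
        self.ok = True   # becomes False on any out-of-order closing tag

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # <img ... /> style tags open and close in one step

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.ok = False


def is_balanced(html):
    """Return True if every non-void element in html is closed in order."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    return checker.ok and not checker.stack
```

For example, is_balanced('<p>Hello <em>wo</em></p>') is True, while is_balanced('<b>&lt;Hel') is False because the <b> element is never closed.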
Conclusion:
By parsing the markup instead of slicing the raw string, these methods count only visible text toward the limit and close every element that is open at the cut, so the truncated result remains well-formed HTML.
The above is the detailed content of How to Truncate Text with Embedded HTML Without Breaking Tags?. For more information, please follow other related articles on the PHP Chinese website!