How to Truncate Text with Embedded HTML Without Breaking Tags?

Linda Hamilton
Release: 2024-11-10 04:37:02

Truncating Text with Embedded HTML

Truncating text that contains HTML by raw character count can cut a tag or a character entity in half, or leave elements unclosed. A safe truncation counts only the visible characters, treats each entity as a single character, and closes any tags that are still open at the cut point. The sections below cover this in PHP and in Python.
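To see why a plain substring is not enough, the short Python sketch below cuts a sample string at fixed character offsets. The sample string and the helper name `naive_truncate` are illustrative assumptions, not part of the functions discussed in this article.

```python
def naive_truncate(html, max_length):
    """Cut the raw markup at max_length characters, ignoring its structure."""
    return html[:max_length]


sample = '<b>&lt;Hello&gt;</b> <img src="world.png" alt="" /> world!'

print(naive_truncate(sample, 5))   # <b>&l  -> cut inside the entity &lt;
print(naive_truncate(sample, 10))  # <b>&lt;Hel  -> <b> is never closed
print(naive_truncate(sample, 24))  # <b>&lt;Hello&gt;</b> <im  -> cut inside the <img> tag
```

In each case the markup itself uses up the character budget, so fewer visible characters survive than requested and the result is no longer valid HTML.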

PHP Implementation:

The following PHP function uses regular expressions to parse HTML and maintains a stack of open tags:

function printTruncated($maxLength, $html, $isUtf8 = true) { ... }

The function scans the HTML input for tags and character entities. Tags are emitted without counting toward $maxLength, each character entity counts as a single character, and every opening tag is recorded on the stack. When the limit is reached, the tags still on the stack are closed in reverse order, so the cut never falls inside a tag or entity and no element is left open.
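The approach described above can be sketched in Python using only the standard library. This is not the body of printTruncated, which is elided here; the names `TOKEN`, `VOID_TAGS`, and `truncate_html_sketch` are hypothetical. The sketch assumes well-formed input and does not handle HTML comments, script content, or `>` characters inside attribute values.

```python
import re

# One token is a tag, a character entity, or any single other character.
TOKEN = re.compile(
    r'<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*?(/?)>'  # tag: closing slash, name, self-closing slash
    r'|&#?[a-zA-Z0-9]+;'                          # entity such as &amp; or &#39;
    r'|.',                                        # any other character
    re.DOTALL,
)

# Elements that never take a closing tag.
VOID_TAGS = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
             'input', 'link', 'meta', 'source', 'track', 'wbr'}


def truncate_html_sketch(html, max_length):
    """Keep at most max_length visible characters and close any open tags."""
    out = []
    open_tags = []  # stack of currently open element names
    count = 0       # visible characters emitted so far

    for match in TOKEN.finditer(html):
        if count >= max_length:
            break
        token = match.group(0)
        name = match.group(2)
        if name is None:
            # Entity or plain character: counts as one visible character.
            count += 1
        else:
            name = name.lower()
            if match.group(1):
                # Closing tag: pop it if it matches the innermost open tag.
                if open_tags and open_tags[-1] == name:
                    open_tags.pop()
            elif not match.group(3) and name not in VOID_TAGS:
                # Opening tag that will need a closing tag later.
                open_tags.append(name)
        out.append(token)

    # Close whatever is still open, innermost first.
    out.extend('</%s>' % name for name in reversed(open_tags))
    return ''.join(out)


print(truncate_html_sketch('<p>Hello <i>brave new</i> world</p>', 8))
# <p>Hello <i>br</i></p>
```

Because Python strings are sequences of Unicode code points, the sketch needs no equivalent of the $isUtf8 parameter.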

Example Usage:

printTruncated(10, '<b>&lt;Hello&gt;</b> <img src="world.png" alt="" /> world!'); // Expected output: <b>&lt;Hello&gt;</b> <img src="world.png" alt="" /> w

Python Implementation:

HTML parsing libraries like BeautifulSoup can assist with this task in Python:

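from bs4 import BeautifulSoup, NavigableString


def truncate_html(text, max_length):
    """Truncate HTML to max_length visible characters, keeping tags balanced."""
    # html.parser does not wrap the fragment in <html><body> the way lxml does.
    soup = BeautifulSoup(text, 'html.parser')
    remaining = max_length

    def walk(parent):
        nonlocal remaining
        # Iterate over a copy because nodes are removed during the walk.
        for node in list(parent.contents):
            if remaining <= 0:
                # Budget exhausted: drop everything that follows.
                node.extract()
            elif isinstance(node, NavigableString):
                # Entities are already decoded, so each counts as one character.
                if len(node) > remaining:
                    node.replace_with(NavigableString(node[:remaining]))
                    remaining = 0
                else:
                    remaining -= len(node)
            else:
                # A tag costs nothing itself; only the text inside it is counted.
                walk(node)

    walk(soup)
    return str(soup)

Example Usage:

print(truncate_html('<b>&lt;Hello&gt;</b> <img src="world.png" alt="" /> world!', 10))  # Prints something like: <b>&lt;Hello&gt;</b> <img alt="" src="world.png"/> w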

Conclusion:

Both approaches count only the visible characters and close any tags that are still open at the cut point, so the truncated output stays well-formed HTML.

source:php.cn