How Can I Remove All HTML Tags from a String Efficiently?


DDD

Release: 2025-01-05 11:53:41



Stripping HTML from Strings without Specifying Tags

Question:

How can I remove all HTML tags from a string without having to list each tag individually?

Example:

Consider the following HTML-rich string:

string title = "<b>Hulk Hogan's Celebrity Championship Wrestling     <font color=\"#228b22\">[Proj # 206010]</font></b>     (Reality Series,)";


Ideally, with the tags removed and the extra whitespace cleaned up, we want the output to be:

"Hulk Hogan's Celebrity Championship Wrestling [Proj # 206010] (Reality Series)"


Solution:

Regex Approach:

One efficient solution utilizes regular expressions:

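using System.Text.RegularExpressions;

public static string StripHTML(string input)
{
    // Lazily match "<", then any characters up to the first ">", and delete each match.
    return Regex.Replace(input, "<.*?>", String.Empty);
}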


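The pattern <.*?> matches a < followed by as few characters as possible up to the next >, and each match is replaced with an empty string. By default, . does not match a newline, so a tag that spans several lines is left in place unless RegexOptions.Singleline is passed. Text between tags, including repeated spaces and entities such as &amp;, is returned unchanged.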
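Because tag removal alone leaves runs of spaces and encoded entities behind, a follow-up cleanup step is often useful. The following is a minimal sketch, not part of the original answer: the class name HtmlText and the method StripAndNormalize are hypothetical. It assumes that decoding entities with System.Net.WebUtility.HtmlDecode and collapsing whitespace are acceptable for your data.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Hypothetical helper: strips tags, decodes entities, and tidies whitespace.
    public static string StripAndNormalize(string input)
    {
        // Remove tags; Singleline lets "." match newlines so multi-line tags are caught.
        string noTags = Regex.Replace(input, "<.*?>", string.Empty, RegexOptions.Singleline);

        // Decode character entities such as &amp; or &nbsp; into plain characters.
        string decoded = WebUtility.HtmlDecode(noTags);

        // Collapse every run of whitespace (including non-breaking spaces) to one space.
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}
```

Applied to the example title, this sketch would collapse the repeated spaces around "[Proj # 206010]" but keep the comma in "(Reality Series,)", because the comma is text rather than markup.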

HTML Agility Pack:

Alternatively, you can leverage the HTML Agility Pack library:

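using HtmlAgilityPack; // from the HtmlAgilityPack NuGet package

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(input);
string stripped = doc.DocumentNode.InnerText; // text content only, tags dropped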


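This method parses the string into a document tree and returns the concatenated text of its nodes, without tags or attributes. Character entities are left encoded in InnerText; the library's HtmlEntity.DeEntitize method can decode them if needed.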

Caveats:

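Both methods remove HTML tags, but each has limitations:

The regex approach is not a parser. It mishandles a > inside an attribute value, and it leaves the contents of script and style elements in the output.
The HTML Agility Pack builds a full document tree, which adds an external dependency and can be slower than a single regex pass on large documents.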
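The regex limitations are easiest to see on concrete inputs. The sketch below reuses the StripHTML pattern from the regex approach. The class name RegexCaveatDemo and the sample strings are hypothetical, chosen only to trigger the failure modes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexCaveatDemo
{
    // Same pattern as the StripHTML method in the regex approach.
    public static string StripHTML(string input)
    {
        return Regex.Replace(input, "<.*?>", String.Empty);
    }

    // A ">" inside an attribute value ends the match too early,
    // so part of the tag leaks into the result:  b">link
    public static string AttributeWithAngleBracket()
    {
        return StripHTML("<a title=\"a > b\">link</a>");
    }

    // Only the <script> tags are removed; the script body stays
    // in the output: alert(1)Hello
    public static string ScriptBlock()
    {
        return StripHTML("<script>alert(1)</script>Hello");
    }
}
```

If inputs like these are possible, a parser-based approach is the more robust option.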

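Choose based on the input: for short fragments with simple, well-formed markup, such as the title above, the regex is usually sufficient; for arbitrary or untrusted HTML, a parser such as the HTML Agility Pack is the safer choice.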
