Remove HTML tags from strings in ASP.NET
In ASP.NET, HTML tags can be removed from a string in two ways: with a regular expression replacement or with an HTML parsing library.
A regular expression replacement has some limitations, but it handles simple markup well. It works in three steps:
Replace every match of "<[^>]*(>|$)" with an empty string.
Normalize whitespace by replacing every match of "[\s\r\n]+" with a single space.
Remove leading and trailing spaces from the result string.
Example:
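    using System.Text.RegularExpressions;
    ...
    string input = "<p>Hello</p>"; // any markup wrapping the text, e.g. a paragraph
    // Step 1: strip anything that looks like a tag, including an unterminated one at the end.
    string cleaned = Regex.Replace(input, @"<[^>]*(>|$)", string.Empty);
    // Steps 2 and 3: collapse whitespace runs to one space, then trim both ends.
    cleaned = Regex.Replace(cleaned, @"[\s\r\n]+", " ").Trim();
    Console.WriteLine(cleaned); // Output: "Hello"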
Note: This method fails on HTML/XML whose attribute values contain ">", because the pattern treats the first ">" it finds as the end of the tag.
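To make this limitation concrete, here is a minimal sketch that wraps the three regex steps in a helper so it can be run against a tag whose attribute value contains ">". The class name HtmlStripper and the method name StripTags are hypothetical names chosen for illustration, and the patterns are assumed to be "<[^>]*(>|$)" and "[\s\r\n]+".

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of ASP.NET or any library) that applies
// the three regex steps described in the article.
public static class HtmlStripper
{
    public static string StripTags(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return string.Empty;
        }

        // Step 1: remove everything from "<" up to the next ">" (or end of string).
        string noTags = Regex.Replace(input, @"<[^>]*(>|$)", string.Empty);

        // Step 2: collapse each run of whitespace into a single space.
        string normalized = Regex.Replace(noTags, @"[\s\r\n]+", " ");

        // Step 3: drop leading and trailing whitespace.
        return normalized.Trim();
    }
}
```

With this sketch, HtmlStripper.StripTags("<p>Hello</p>") returns "Hello", but HtmlStripper.StripTags("<a title=\"a > b\">link</a>") returns b">link, because the ">" inside the attribute value ends the match early and the rest of the attribute leaks into the output.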
For anything beyond simple markup, consider using a mature HTML parsing library such as HtmlAgilityPack.
These libraries provide comprehensive and customizable HTML parsing and sanitizing capabilities.
Example (using HtmlAgilityPack):
    using HtmlAgilityPack;
    ...
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(input);
    Console.WriteLine(doc.DocumentNode.InnerText); // Output: "Hello"