Question:
How can I remove all HTML tags from a string, regardless of which tags it contains?
Example:
Consider the following HTML-rich string:
string title = "<b>Hulk Hogan's Celebrity Championship Wrestling <font color=\"#228b22\">[Proj # 206010]</font></b> (Reality Series)";
Ideally, we want the output to be:
"Hulk Hogan's Celebrity Championship Wrestling [Proj # 206010] (Reality Series)"
Solution:
Regex Approach:
One efficient solution utilizes regular expressions:
public static string StripHTML(string input) { return Regex.Replace(input, "<.*?>", String.Empty); }
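This regex lazily matches everything from a < up to the next > and replaces it with an empty string. Because . does not match a newline by default, a tag that spans several lines is left in place unless RegexOptions.Singleline is passed.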
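As a minimal sketch of the regex approach in a self-contained form: the class name HtmlStripper, the null check, and the use of RegexOptions.Singleline are assumptions added here for illustration and are not part of the original snippet.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlStripper
{
    // Lazy match from '<' to the next '>'.
    // Singleline makes '.' match newlines too, so a tag split across lines is still removed.
    private static readonly Regex TagPattern =
        new Regex("<.*?>", RegexOptions.Singleline);

    public static string StripHtml(string input)
    {
        // Assumption: treat null or empty input as an empty result instead of throwing.
        if (string.IsNullOrEmpty(input))
        {
            return string.Empty;
        }

        return TagPattern.Replace(input, string.Empty);
    }
}
```

Calling HtmlStripper.StripHtml on the example title, written as "<b>Hulk Hogan's Celebrity Championship Wrestling <font color=\"#228b22\">[Proj # 206010]</font></b> (Reality Series)", returns "Hulk Hogan's Celebrity Championship Wrestling [Proj # 206010] (Reality Series)".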
HTML Agility Pack:
Alternatively, you can leverage the HTML Agility Pack library:
HtmlDocument doc = new HtmlDocument(); doc.LoadHtml(input); string stripped = doc.DocumentNode.InnerText;
This method parses the HTML string and returns only the text content, excluding all tags and attributes.
Caveats:
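While both methods remove HTML tags, each has limitations. The regex does not parse HTML, so it can misfire on input such as a > character inside an attribute value or literal angle brackets in ordinary text, and it leaves character entities such as &amp; untouched. HTML Agility Pack parses the markup properly but adds an external dependency.
Choose between them based on how well-formed your input is and whether an extra dependency is acceptable.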
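To make two regex limitations concrete, here is a small sketch using only the .NET standard library. The class and method names (StripHtmlCaveats, StripTags, StripTagsAndDecode) are hypothetical, and decoding entities with WebUtility.HtmlDecode is an assumption about what the caller wants, not something the original answer prescribes.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class StripHtmlCaveats
{
    // The plain regex approach: remove everything from '<' to the next '>'.
    public static string StripTags(string input)
    {
        return Regex.Replace(input, "<.*?>", string.Empty);
    }

    // Hypothetical helper: strip tags first, then decode entities such as &amp;.
    public static string StripTagsAndDecode(string input)
    {
        return WebUtility.HtmlDecode(StripTags(input));
    }
}
```

With this sketch, StripTags("<a title=\"a > b\">link</a>") returns " b\">link" rather than "link", because the match stops at the > inside the attribute value. StripTags("<b>Tom &amp; Jerry</b>") returns "Tom &amp; Jerry", while StripTagsAndDecode on the same input returns "Tom & Jerry".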