How to remove HTML tags from a string in C#
I will show you three different methods to remove HTML tags from a string in C#:
1. By using Regex:
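    using System.Text.RegularExpressions;

    // Removes every "<...>" sequence; the lazy ".*?" stops at the first '>'.
    public static string RemoveHTMLTags(string html)
    {
        return Regex.Replace(html, "<.*?>", string.Empty);
    }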
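As a quick check of the Regex method, here is a minimal sketch that wraps it in a small console program. The class name HtmlTagDemo and the sample strings are assumptions made up for illustration; they are not part of the original method.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper class, used only for this illustration.
public static class HtmlTagDemo
{
    // Same approach as method 1: delete everything from '<' to the next '>'.
    public static string RemoveHTMLTags(string html)
    {
        return Regex.Replace(html, "<.*?>", string.Empty);
    }

    public static void Main()
    {
        // Made-up sample input with nested tags.
        string sample = "<p>Hello <b>world</b>!</p>";
        Console.WriteLine(RemoveHTMLTags(sample)); // prints: Hello world!

        // Entities are not tags, so they pass through unchanged.
        Console.WriteLine(RemoveHTMLTags("<i>Tom &amp; Jerry</i>")); // prints: Tom &amp; Jerry
    }
}
```

Note that the method only strips tags. HTML entities such as &amp; stay in the output and would need a separate decoding step if plain text is the goal.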
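2. By using a compiled Regex, for better performance when the method is called repeatedly:
    // The pattern is compiled once and reused across calls.
    private static readonly Regex htmlRegex = new Regex("<.*?>", RegexOptions.Compiled);

    public static string RemoveHTMLTagsCompiled(string html)
    {
        return htmlRegex.Replace(html, string.Empty);
    }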
3. By using a char array, for faster performance when processing several HTML files:
    public static string RemoveHTMLTagsCharArray(string html)
    {
        // The output can never be longer than the input.
        char[] charArray = new char[html.Length];
        int index = 0;
        bool isInside = false; // true while scanning between '<' and '>'

        for (int i = 0; i < html.Length; i++)
        {
            char current = html[i];
            if (current == '<') { isInside = true; continue; }
            if (current == '>') { isInside = false; continue; }
            if (!isInside)
            {
                charArray[index] = current;
                index++;
            }
        }
        return new string(charArray, 0, index);
    }
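To see how the three methods compare on your own machine, a rough timing harness along the following lines could be used. This is only a sketch: the class name HtmlTagBenchmark, the Time helper, the generated test markup, and the iteration count are all assumptions chosen for illustration, and Stopwatch timings like these are indicative rather than a rigorous benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical harness class, used only for this illustration.
public static class HtmlTagBenchmark
{
    private static readonly Regex htmlRegex = new Regex("<.*?>", RegexOptions.Compiled);

    // Method 1: static Regex.Replace call.
    public static string RemoveHTMLTags(string html)
    {
        return Regex.Replace(html, "<.*?>", string.Empty);
    }

    // Method 2: reusable compiled Regex instance.
    public static string RemoveHTMLTagsCompiled(string html)
    {
        return htmlRegex.Replace(html, string.Empty);
    }

    // Method 3: manual scan that copies characters found outside '<' ... '>'.
    public static string RemoveHTMLTagsCharArray(string html)
    {
        char[] buffer = new char[html.Length];
        int index = 0;
        bool isInside = false;
        foreach (char c in html)
        {
            if (c == '<') { isInside = true; continue; }
            if (c == '>') { isInside = false; continue; }
            if (!isInside) buffer[index++] = c;
        }
        return new string(buffer, 0, index);
    }

    // Hypothetical helper: runs one method repeatedly and returns elapsed milliseconds.
    public static long Time(Func<string, string> method, string html, int iterations)
    {
        method(html); // warm-up call so one-time setup costs are not measured
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            method(html);
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        // Made-up test document: 1000 table rows of simple markup.
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < 1000; i++)
        {
            builder.Append("<tr><td class=\"c\">cell ").Append(i).Append("</td></tr>");
        }
        string html = builder.ToString();

        const int iterations = 200; // arbitrary choice
        Console.WriteLine("Regex:          " + Time(RemoveHTMLTags, html, iterations) + " ms");
        Console.WriteLine("Compiled Regex: " + Time(RemoveHTMLTagsCompiled, html, iterations) + " ms");
        Console.WriteLine("Char array:     " + Time(RemoveHTMLTagsCharArray, html, iterations) + " ms");
    }
}
```

The absolute numbers depend on the runtime, the hardware, and the input, so it is worth running the harness against documents that resemble your real HTML files before choosing a method.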
Comments
Until your html contains Then "3 < 5" will return "5"