How to remove HTML tags from string in C#

Here are three different methods to remove HTML tags from a string in C#:

1. By using Regex:

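// Requires: using System.Text.RegularExpressions;
public static string RemoveHTMLTags(string html)
{
    // Lazily match everything from '<' up to the next '>' and remove it.
    return Regex.Replace(html, "<.*?>", string.Empty);
}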
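
One thing to keep in mind with this pattern: in .NET, `.` does not match a newline by default, so a tag that wraps across lines is left in place. Below is a minimal sketch of one possible variant, assuming `RegexOptions.Singleline` is acceptable for your input. The class name `HtmlTagDemo` and the method `RemoveHTMLTagsSingleline` are illustrative and not part of the original method.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlTagDemo
{
    // Hypothetical variant: Singleline lets '.' also match '\n',
    // so tags that span several lines are removed too.
    public static string RemoveHTMLTagsSingleline(string html)
    {
        return Regex.Replace(html, "<.*?>", string.Empty, RegexOptions.Singleline);
    }

    public static void Main()
    {
        // The opening tag is split across two lines.
        string html = "<a\nhref=\"#\">link</a>";
        Console.WriteLine(RemoveHTMLTagsSingleline(html)); // prints "link"
    }
}
```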

2. By using a compiled Regex for better performance:

// Built once and reused by every call.
static readonly Regex htmlRegex = new Regex("<.*?>", RegexOptions.Compiled);

public static string RemoveHTMLTagsCompiled(string html)
{
    return htmlRegex.Replace(html, string.Empty);
}
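
Whether the compiled version is actually faster depends on your input and runtime, so it is worth measuring. The following is a rough timing sketch, not a rigorous benchmark. The class `RegexTimingDemo`, the helper `Time`, the sample string, and the iteration count are all assumptions made for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class RegexTimingDemo
{
    static readonly Regex htmlRegex = new Regex("<.*?>", RegexOptions.Compiled);

    // Same logic as methods 1 and 2, repeated here so the sketch is self-contained.
    public static string Plain(string html)
    {
        return Regex.Replace(html, "<.*?>", string.Empty);
    }

    public static string Compiled(string html)
    {
        return htmlRegex.Replace(html, string.Empty);
    }

    // Hypothetical helper: runs 'strip' on the same input 'iterations' times
    // and returns the elapsed time in milliseconds.
    public static long Time(Func<string, string> strip, string html, int iterations)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            strip(html);
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        string html = "<div><p>Hello <b>world</b></p></div>";
        Console.WriteLine("Plain:    " + Time(Plain, html, 100000) + " ms");
        Console.WriteLine("Compiled: " + Time(Compiled, html, 100000) + " ms");
    }
}
```

The numbers will differ from machine to machine, so treat them only as a relative comparison on your own data.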

3. By using a char array, which is faster when processing many HTML files:

public static string RemoveHTMLTagsCharArray(string html)
{
    // The output can never be longer than the input.
    char[] charArray = new char[html.Length];
    int index = 0;
    bool isInside = false; // true while we are between '<' and '>'

    for (int i = 0; i < html.Length; i++)
    {
        char left = html[i];

        if (left == '<')
        {
            isInside = true;
            continue;
        }

        if (left == '>')
        {
            isInside = false;
            continue;
        }

        // Copy only characters that are outside a tag.
        if (!isInside)
        {
            charArray[index] = left;
            index++;
        }
    }

    return new string(charArray, 0, index);
}
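
Here is a short usage sketch for the char array method. The method is repeated so the example is self-contained; the class name `CharArrayDemo` and the sample strings are illustrative assumptions. It also shows a limitation: a bare `<` that is not part of a tag makes this method drop everything after it, because it never sees a closing `>`.

```csharp
using System;

public static class CharArrayDemo
{
    public static string RemoveHTMLTagsCharArray(string html)
    {
        char[] charArray = new char[html.Length];
        int index = 0;
        bool isInside = false;

        for (int i = 0; i < html.Length; i++)
        {
            char left = html[i];

            if (left == '<')
            {
                isInside = true;
                continue;
            }

            if (left == '>')
            {
                isInside = false;
                continue;
            }

            if (!isInside)
            {
                charArray[index] = left;
                index++;
            }
        }

        return new string(charArray, 0, index);
    }

    public static void Main()
    {
        // Well-formed markup: only the text content remains.
        Console.WriteLine(RemoveHTMLTagsCharArray("<p>Hello <b>world</b></p>")); // prints "Hello world"

        // A bare '<' is treated as the start of a tag, so " 5" is lost.
        Console.WriteLine("[" + RemoveHTMLTagsCharArray("3 < 5") + "]"); // prints "[3 ]"
    }
}
```

If your input can contain literal `<` characters in text, a real HTML parser is likely a safer choice than any of the three methods.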

Comments (3)

  • Aaron

    Until your html contains Then " 3 < 5" will return "5"

    November 12, 2021 at 2:24 AM