Unexpected string performance in .Net 1.1

A project I started recently centered around improving performance of some existing code. My first inclination was to scan the code for obvious low hanging fruit and one of the first methods I hit upon was this:

 1:   public static string CleanString(string Text)
 2:   {
 3:    // cleans a string for fuzzy matching
 4:    // trims white space, and removes numbers
 5: 
 6:    Text = Text.Trim();
 7:    Text = Text.Replace("0","");
 8:    Text = Text.Replace("1","");
 9:    Text = Text.Replace("2","");
10:    Text = Text.Replace("3","");  
11:    Text = Text.Replace("4","");
12:    Text = Text.Replace("5","");
13:    Text = Text.Replace("6","");
14:    Text = Text.Replace("7","");
15:    Text = Text.Replace("8","");
16:    Text = Text.Replace("9","");
17:    //Text = Text.Replace(".","");
18: 
19:    return Text;
20:   }

This was pretty obviously bad. Strings are immutable, so the author, in effect, allocated 11 new strings. This function is called quite often in code so it seemed like a worthwhile place to optimize. My first inclination was to throw a RegEx at it, as follows:

 1:  /// <summary>
 2:  /// Summary description for Test.
 3:  /// </summary>
 4:  public sealed class Test
 5:  {
 6:   private static Regex _scrubExpression = new Regex(@"\d*");
 7: 
 8:   /// <summary>
 9:   ///  Cleans a string: trims white space, and removes numbers.
10:   /// </summary>
11:   /// <param name="source"></param>
12:   /// <returns></returns>
13:   public static string CleanStringOptimized1(string source)
14:   {
15:    return _scrubExpression.Replace(source.Trim(), "");
16:   }
17:  }
18: }

You'd expect that to just work. Think again. Over 100,000 runs, the RegEx code turned out to be nearly three times slower. Ok, so maybe RegEx object creation overhead just kills this. Fine. Lets try something simpler, like the good old StringBuilder:

1: public static string CleanStringOptimized3(string source)
2: {
3: StringBuilder output = new StringBuilder(source.Trim());
4:
5: output.Replace("0", "")
6: .Replace("1", "")
7: .Replace("2", "")
8: .Replace("3", "")
9: .Replace("4", "")
10: .Replace("5", "")
11: .Replace("6", "")
12: .Replace("7", "")
13: .Replace("8", "")
14: .Replace("9", "");
15: return output.ToString();
16: }

Yup. This was slower too. So maybe the answer was to go simple and do just one allocation and work with bytes. Lets try this:

1: public static string CleanStringOptimized2(string source)
2: {
3: char[] result = new string(' ', source.Length).ToCharArray();
4: char[] sourceChars = source.ToCharArray();
5:
6: int resultIndex = 0;
7: for (int count = 0; count < source.Length; count++)
8: {
9: char c = sourceChars[count];
10: if (! char.IsDigit(c))
11: {
12: result[resultIndex] = c;
13: resultIndex++;
14: }
15: }
16:
17: return new string(result).Trim();
18: }

Yes, you guessed it. This, too, runs slower than the original. In fact, the results look like this:

Original comparison took 359.375ms over 100000 runs.
Optimized 1 comparison took 10203.125ms over 100000 runs.
Optimized 2 comparison took 421.875ms over 100000 runs.
Optimized 3 comparison took 718.75ms over 100000 runs.

If anyone has any insight into why my optimizations aren't optimizing, I would love to hear it! Incidentally, dropping down to IL revealed nothing out of the orginary.

posted @ Monday, June 14, 2004 4:32 PM

Print

Comments on this entry:

# re: Unexpected string performance in .Net 1.1

Left by Jeff Atwood at 8/10/2004 10:12 PM
Gravatar
I love stories like this. It is why I always tell other programmers, NEVER assume, ALWAYS measure. Because what you *think* is true rarely is. Depressing, but true. :)

# re: Unexpected string performance in .Net 1.1

Left by Jeff Atwood at 8/11/2004 11:02 AM
Gravatar
Also did you consider compiled regex, eg OPTION_COMPILED? I don't know if that would matter or not..

# re: Unexpected string performance in .Net 1.1

Left by Michael Teper at 8/11/2004 11:16 AM
Gravatar
I'll have to keep that in mind next time I consider RegEx's. When I was running these tests, I assumed that instantiating a RegEx object caused the expression to compile. The doc's certainly didn't steer me otherwise.

# re: Unexpected string performance in .Net 1.1

Left by Roy Dictus at 9/6/2004 5:00 AM
Gravatar
What about the following?

public static string CleanString(string text)
{
return text.Trim().Replace("0", "").Replace("1", ""). ...;
}


I haven't measured this, but I remember from the old VB6 days that if you did two string operations in a single statement, that would actually be compiled differently from doing string operations one line at a time.

I think Jeff makes a great point when he says that you should never assume anything. I used to optimize classes' performance and memory usage in VB6 by replacing recursive methods by iterative methods that manage their own stack, but using .Net, there seems to be no measurable performance difference!

Roy

# re: Unexpected string performance in .Net 1.1

Left by Michael Teper at 9/6/2004 9:37 AM
Gravatar
Roy, that was the "Optimized3" version above. See my follow-up post for solution: http://michaelteper.com/archive/2004/06/14/174.aspx.
Comments have been closed on this topic.