Performance question

sunama · 4 Jan 2012 at 20:46

Performance question/ideas (C#)

I have 2 pieces of code.
On their own they do nothing.
Think of this merely as a block of code used to make a performance comparison.

Code:

int indexMax = 999999999;
string temp = "";
for (var index = 0; index <= indexMax; index++)
{
    temp = "a" + "b" + "b" + "b" + "b" + "b" + "b" + "b" + "b";
    //other code which uses temp will go here
}

Code:

int indexMax = 999999999;
for (var index = 0; index <= indexMax; index++)
{
    string temp1 = "a" + "b" + "b" + "b" + "b" + "b" + "b" + "b" + "b";
    //other code which uses temp1 will go here
}

I came up with the two simple pieces of code above in order to test which is faster when iterated many, many times.

The difference between the two blocks is that in the second block, a string is declared and initialised a billion times. In the first block, the string is declared once and then reused inside the for loop (its value is re-assigned on each iteration).

I then ran some tests, and the block that reuses the same string is marginally quicker (on my computer, over a billion iterations, the difference is about 10 ms on average).

Question: is it better to code for the marginal speed increase OR is it better to create a new string, a billion times?

I ask because I am creating an AI and I am craving every bit of performance I can gain. I am in the process of changing any piece of code that could possibly improve performance.

Your opinions please.

durbs · 4 Jan 2012 at 21:25

It depends on the language and compiler. For example, C#'s .NET compiler will see through the loop, and I bet the intermediate code (IL) for both examples would end up the same, so the answer is: either. I'd say, though, that if the variable isn't used outside the loop, it shouldn't be declared outside the loop.

What are you using for your real app language-wise?

sunama · 4 Jan 2012 at 21:45

C#, .NET 4.0.

Could a compiler really be intelligent enough to place a variable in the correct place during compilation? I.e., would it detect inefficient code and then compile a fully optimised program?

If this is the case, it would explain why, after a billion iterations, the two methods give nearly identical durations.

Logically, it would be better to declare the string, temp, inside the for loop, but if there is a performance benefit, I would quite happily put it outside.

Haircut · 4 Jan 2012 at 21:59

sunama said:
C#, .NET 4.0.

Could a compiler really be intelligent enough to place a variable in the correct place during compilation? I.e., would it detect inefficient code and then compile a fully optimised program?

It certainly is the case that the compiler, and indeed the processor, can move your code around. That's why the .NET Framework contains things like Thread.MemoryBarrier() and other constructs to help with synchronisation.

In your example, the compiler also knows that because you're concatenating several strings that are known at compile time, the resulting string is also known at compile time. It will likely be stored as a single constant ("abbbbbbbb") in your assembly, and the loop simply references that constant instead.
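A minimal sketch that checks this at run time (ConstantFoldingDemo, Folded and NotFolded are made-up names for illustration). When every operand is a literal, the result is the same interned object as the equivalent literal; when one operand is only known at run time, the compiler emits a String.Concat call that builds a new string each time.

```csharp
using System;

public static class ConstantFoldingDemo
{
    // Every operand is a string literal, so the C# compiler folds this
    // into the single literal "abbbbbbbb" at compile time.
    public static string Folded()
    {
        return "a" + "b" + "b" + "b" + "b" + "b" + "b" + "b" + "b";
    }

    // One operand is a run-time value, so this compiles to a String.Concat
    // call that allocates a new string on every call.
    public static string NotFolded(string runtimeValue)
    {
        return "a" + runtimeValue;
    }

    public static void Main()
    {
        // String literals are interned, so the folded constant and this
        // literal are the same object.
        Console.WriteLine(object.ReferenceEquals(Folded(), "abbbbbbbb")); // True

        // Same contents, but a distinct object created at run time.
        string b = "bbbbbbbb";
        Console.WriteLine(object.ReferenceEquals(NotFolded(b), "abbbbbbbb")); // False
        Console.WriteLine(NotFolded(b) == "abbbbbbbb"); // True
    }
}
```

So in the original loop, both versions just assign a reference to the same constant; neither allocates a string per iteration.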

As durbs says, if you look at the IL you should see what it's doing.

yer_averagejoe · 4 Jan 2012 at 22:09

I'd declare the string outside the loop.

At the end of the day, it's up to you. For any loop of more than a few iterations, I'd feel uneasy about declaring it inside.

In this case, it might be that the compiler is doing something nifty in the background. In the future you might come across a different compiler and be less fortunate.

sunama · 4 Jan 2012 at 22:12

In that case, rather than testing with a constant string as I have, I would need to use a random string generator producing strings of equal length.

I would then do something like this (GenerateString is a placeholder for a random string generator):

Code:

for (var i = 1; i <= 999999999; i++)
{
    string stringOne = GenerateString(5);   // length 5
    string stringTwo = GenerateString(5);   // length 5
    string stringThree = stringOne + stringTwo;
}

In the reuse version, the strings would be declared outside the for loop and then reused on every iteration.

I will give this a go and see if I get different results.

ZombieFan · 4 Jan 2012 at 22:28

If you are performing a large number of string operations in a loop, you should use a StringBuilder rather than string concatenation.

In your first example (declaring the string outside the loop), .NET is still creating 10 string objects on each iteration of the loop - a string is being created for each character in the following line:

"a" + "b" + "b" + "b" + "b" + "b" + "b" + "b" + "b";

...the result of this being placed into a tenth temporary string object created by .NET, and then copied to the 'temp' variable (I think!)

However, this may be cleaned up when it compiles into IL.

Try it with a StringBuilder. It should be quite a bit quicker.

Also, if you want to perform more accurate timing on the code, use the Stopwatch class to time exactly how long a block of code takes to execute.
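As a rough illustration of timing with Stopwatch a little more robustly than a single run, here is a sketch of a small harness (Bench and MedianMilliseconds are hypothetical names). It runs one untimed warm-up pass so JIT compilation is not counted, then repeats the timed run several times and returns the median, since individual runs are noisy.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class Bench
{
    // Times `body` run `iterations` times per trial. One untimed warm-up
    // trial runs first, then the median of `trials` timed trials is
    // returned, in milliseconds.
    public static double MedianMilliseconds(Action body, int iterations, int trials)
    {
        if (body == null) throw new ArgumentNullException("body");
        if (iterations < 1) throw new ArgumentOutOfRangeException("iterations");
        if (trials < 1) throw new ArgumentOutOfRangeException("trials");

        // Warm-up pass (not timed): makes sure `body` has been JIT-compiled.
        for (int i = 0; i < iterations; i++) body();

        var results = new List<double>(trials);
        var sw = new Stopwatch();
        for (int t = 0; t < trials; t++)
        {
            sw.Restart();
            for (int i = 0; i < iterations; i++) body();
            sw.Stop();
            results.Add(sw.Elapsed.TotalMilliseconds);
        }

        // The median is less affected by one-off pauses (GC, other
        // processes) than a single run or the mean.
        results.Sort();
        return results[results.Count / 2];
    }

    public static void Main()
    {
        double ms = MedianMilliseconds(() => new string('b', 8), 1000000, 5);
        Console.WriteLine("median: " + ms + " ms");
    }
}
```

Every iteration goes through an Action delegate, so the absolute figures include that call overhead; the numbers are only meaningful when comparing two candidates measured the same way.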

sunama · 4 Jan 2012 at 22:37

Right then. I've done a quick test:

Results (iterating 999999 times)
method1: reusing 3 string variables
method2: declaring every string variable in every loop iteration

method2 was faster by 101 milliseconds
method1 was faster by 6 milliseconds
method2 was faster by 138 milliseconds
method1 was faster by 87 milliseconds
method1 was faster by 3 milliseconds
method2 was faster by 21 milliseconds

For reference, the two sets of iterations take about 12 seconds, so the performance difference is minimal.

My conclusion is that the performance benefit of one method over the other is minimal, and that reusing variables to increase performance is not the right way to go.

Here's the code:

Code:

// Requires: using System; using System.Diagnostics; using System.Threading; using System.Windows.Forms;
private void TestButton2_Click(object sender, EventArgs e)
{
    Thread.CurrentThread.Priority = ThreadPriority.Highest;
    int indexMax = 999999;

    Stopwatch stp = new Stopwatch();

    stp.Start();

    // Method 1: declare the variables once, outside the loop, and reuse them.
    string temp1;
    string temp2;
    string temp3;
    for (var index = 0; index <= indexMax; index++)
    {
        temp1 = MathProcessingClass.GetRandomString(5);
        temp2 = MathProcessingClass.GetRandomString(5);
        temp3 = temp1 + temp2;
    }
    stp.Stop();

    var elapsedMS_fastMethod = stp.ElapsedMilliseconds;

    // Method 2: declare new variables on every iteration.
    stp.Reset();
    stp.Start();
    for (var index = 0; index <= indexMax; index++)
    {
        string temp4 = MathProcessingClass.GetRandomString(5);
        string temp5 = MathProcessingClass.GetRandomString(5);
        string temp6 = temp4 + temp5;
    }
    stp.Stop();

    var elapsedMS_slowMethod = stp.ElapsedMilliseconds;

    // A negative result means the "slow" method was actually faster.
    MessageBox.Show("Number of milliseconds by which the fast method beat the slow method: "
        + (elapsedMS_slowMethod - elapsedMS_fastMethod).ToString());
}

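public static string GetRandomString(int size, bool lowerCase = true)
{
    StringBuilder builder = new StringBuilder();
    // A new Random is created on every call. On .NET Framework it is seeded
    // from the system tick count, so calls made close together can repeat.
    Random random = new Random();
    char ch;
    for (int i = 0; i < size; i++)
    {
        // floor(26 * [0, 1) + 65) gives 65..90, i.e. 'A'..'Z'.
        ch = Convert.ToChar(Convert.ToInt32(Math.Floor(26 * random.NextDouble() + 65)));
        builder.Append(ch);
    }
    if (lowerCase)
        return builder.ToString().ToLower();
    return builder.ToString();
}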

sunama · 4 Jan 2012 at 22:43

ZombieFan said:
If you are performing a large number of string operations in a loop, you should use a StringBuilder rather than string concatenation.

I understand. The purpose of this experiment is to see the effect of reusing variables, as opposed to declaring them repeatedly. In my actual (fully optimised) code, I use StringBuilders where a long string is being built (as seen in my random string generator, which is copied from my actual project).

ZombieFan said:
.NET is still creating 10 string objects on each iteration of the loop

This would go some way to explaining why there appears to be minimal performance difference between the two methods.

ZombieFan said:
Also, if you want to perform more accurate timing on the code, use the Stopwatch class to time exactly how long a block of code takes to execute.

Yep, I do in fact use Stopwatch, as can be seen in my code above. That code has been taken directly from my project.

PS. If anybody thinks I can make GetRandomString faster, then I'm all ears. That method is used by my program.

durbs · 4 Jan 2012 at 22:52

ZombieFan said:
In your first example (declaring the string outside the loop), .NET is still creating 10 string objects on each iteration of the loop - a string is being created for each character in the following line:

"a" + "b" + "b" + "b" + "b" + "b" + "b" + "b" + "b";

...the result of this being placed into a tenth temporary string object created by .NET, and then copied to the 'temp' variable (I think!)

However, this may be cleaned up when it compiles into IL.

Try it with a StringBuilder. It should be quite a bit quicker.

I don't think that would be the case. As Haircut said, the compiler would see that those are constants and compile it as:

String x = "abbbbbbbb"

So if you used StringBuilder, it'd probably be slower.

As for generating the random string, have you tried Guid.NewGuid()? I don't know whether generating a GUID would be faster, but it's worth a go.
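A sketch of the GUID idea, for comparison (GuidStrings and FromGuid are hypothetical names). Guid.ToString("N") gives 32 lowercase hexadecimal digits with no hyphens, so the output only contains 0-9 and a-f and is at most 32 characters long, unlike the A-Z strings from GetRandomString. Also, in a version-4 GUID a few digits are fixed (the 13th is always 4), so only the leading digits are fully random. Whether it is faster would need measuring.

```csharp
using System;

public static class GuidStrings
{
    // Returns the first `size` characters of a new GUID in "N" format
    // (32 lowercase hex digits, no hyphens).
    public static string FromGuid(int size)
    {
        if (size < 0 || size > 32)
            throw new ArgumentOutOfRangeException("size");
        return Guid.NewGuid().ToString("N").Substring(0, size);
    }

    public static void Main()
    {
        // Prints five hex digits, almost certainly different on each run.
        Console.WriteLine(FromGuid(5));
    }
}
```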

ZombieFan · 4 Jan 2012 at 23:19

durbs said:
I don't think that would be the case. As Haircut said, the compiler would see that those are constants and compile it as:

String x = "abbbbbbbb"

So if you used StringBuilder, it'd probably be slower.

As for generating the random string, have you tried Guid.NewGuid()? I don't know whether generating a GUID would be faster, but it's worth a go.

Yea, you could be right. It needs profiling to know for sure.

ZombieFan · 4 Jan 2012 at 23:26

sunama said:
PS. If anybody thinks I can make GetRandomString faster, then I'm all ears. That method is used by my program.

This should be significantly faster:

Code:

public static string GetRandomString(Random random, int size, bool lowerCase = true)
{
    char[] ch = new char[size];

    for (int i = 0; i < size; i++)
    {
        // Next's upper bound is exclusive, so 91 is needed to include 'Z' (90).
        ch[i] = (char)random.Next(65, 91);
    }
    if (lowerCase)
        return new string(ch).ToLower();
    return new string(ch);
}

I'm passing in the Random object as a parameter, since there is some overhead in creating it each time you enter the method. It will also prevent the method from producing the same random strings over and over if it's called multiple times in a short period of time. But you can always add the original Random variable back in if required.
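The repeated-strings problem comes from seeding: two Random instances with the same seed produce the same sequence, and on .NET Framework the parameterless constructor seeds from Environment.TickCount, so instances created within the same clock tick share a seed (newer .NET versions seed differently). A small sketch that makes this deterministic by passing explicit seeds (SeedDemo and Make are hypothetical names):

```csharp
using System;

public static class SeedDemo
{
    // Same shape as the version above: the caller supplies the Random.
    public static string Make(Random random, int size)
    {
        char[] ch = new char[size];
        for (int i = 0; i < size; i++)
            ch[i] = (char)random.Next(65, 91); // 'A'..'Z'
        return new string(ch);
    }

    public static void Main()
    {
        // Two generators with the same seed give identical output, which is
        // what happens when new Random() is called twice in the same tick.
        string a = Make(new Random(42), 5);
        string b = Make(new Random(42), 5);
        Console.WriteLine(a == b); // True

        // One shared generator: each call continues the same sequence.
        var shared = new Random(42);
        string first = Make(shared, 5);
        string second = Make(shared, 5);
        Console.WriteLine(first == a); // True: same seed, same first five values
        Console.WriteLine(first + " " + second);
    }
}
```

Note that System.Random is not thread-safe, so a single shared instance needs synchronisation if several threads use it.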

Hope this helps

Goofball · 5 Jan 2012 at 08:47

durbs said:
I don't think that would be the case. As Haircut said, the compiler would see that those are constants and compile it as:

String x = "abbbbbbbb"

That's true for constants.

There's another optimization when non-constants are involved. If the concatenation is done in a single statement, the compiler will replace it with a call to String.Concat(), so you can use multiple plusses in a single expression and still only get a single new string allocated.

If you do:

Code:

var s = "a" + GetString() + "c";

And look at the IL, you'll see something like this for that statement:

Code:

  IL_0001:  ldstr      "a"
  IL_0006:  call       string MyAssembly.Program::GetString()
  IL_000b:  ldstr      "c"
  IL_0010:  call       string [mscorlib]System.String::Concat(string, string, string)

StringBuilder is beneficial when concatenating across multiple statements, such as appending inside a loop. Single statements are always handled nicely by the compiler.
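To make the multi-statement case concrete, a sketch where StringBuilder does help (BuildDemo, WithConcat and WithBuilder are hypothetical names). Each `s += "b"` in a loop is its own statement, so every iteration allocates a new string and copies everything built so far; StringBuilder appends into one buffer and creates the final string once.

```csharp
using System;
using System.Text;

public static class BuildDemo
{
    // One new string per iteration, each copying the previous contents,
    // so the total copying grows roughly with the square of `count`.
    public static string WithConcat(int count)
    {
        string s = "";
        for (int i = 0; i < count; i++)
            s += "b";
        return s;
    }

    // Appends into a buffer (pre-sized here) and builds the final
    // string once, in ToString().
    public static string WithBuilder(int count)
    {
        var sb = new StringBuilder(count);
        for (int i = 0; i < count; i++)
            sb.Append('b');
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(WithConcat(10000) == WithBuilder(10000)); // True
    }
}
```

For a fixed handful of operands in one expression, like the examples earlier in the thread, plain `+` remains the simpler choice.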

peterwalkley · 5 Jan 2012 at 14:48

I think you might not be seeing the wood for the trees here. Correct choices of algorithms and data structures will have far more benefit than this sort of micro-optimisation.

Have you run your code with a profiler, so you know where it's spending its time? No matter how experienced a developer you are, it is extremely easy to fall into the trap of solving the wrong problem.

Dj_Jestar · 5 Jan 2012 at 14:58

Inside the loop. Keep the scope as limited as possible.

As above, this is really, really inconsequential - even for your AI program.

sunama · 5 Jan 2012 at 17:59

peterwalkley said:
I think you might not be seeing the wood for the trees here. Correct choices of algorithms and data structures will have far more benefit than this sort of micro-optimisation.

I understand this. But when I have reached the limit of how far I can take an algorithm, the next step is to look for other areas to improve performance.

The only way to improve the algorithms is to have a completely fresh mind (different person) take a look at the code.

peterwalkley said:
Have you run your code with a profiler, so you know where it's spending its time? No matter how experienced a developer you are, it is extremely easy to fall into the trap of solving the wrong problem.

I tried briefly using ANTS profiler and I could not make head nor tail of it. To be fair, I only tried it for about 10 minutes and it seemed more trouble than it was worth.

I really should spend more time with it and look for areas to improve.

I find it fun to take little algorithms and make them run faster. Using a profiler didn't quite feel like "fun" to me.

Another, perhaps easier, way of improving performance is simply to upgrade the hardware running the program.

sunama · 5 Jan 2012 at 18:01

ZombieFan said:

Code:

public static string GetRandomString(Random random, int size, bool lowerCase = true)
{
    char[] ch = new char[size];

    for (int i = 0; i < size; i++)
    {
        // Next's upper bound is exclusive, so 91 is needed to include 'Z' (90).
        ch[i] = (char)random.Next(65, 91);
    }
    if (lowerCase)
        return new string(ch).ToLower();
    return new string(ch);
}

I shall run this code and report back my results.

Dj_Jestar · 5 Jan 2012 at 18:10

Use dotTrace from JetBrains to profile. Best profiler out there.

The Asgard · 5 Jan 2012 at 18:18

sunama said:
Question: is it better to code for the marginal speed increase OR is it better to create a new string, a billion times?

Use the first snip. Create the string outside the loop if it's being reused within the loop. Only create objects when you need to.

sunama · 5 Jan 2012 at 18:34

I've tried ZombieFan's method (with a new Random object created inside the method), and here are the results against the old method:

loops: 999999
zombie method was faster by 126 milliseconds
zombie method was faster by 144 milliseconds
zombie method was faster by 121 milliseconds
zombie method was faster by 140 milliseconds
zombie method was faster by 82 milliseconds

It seems pretty consistent.

And these are the results when we create the Random object outside the method and pass it into the method (i.e. we reuse the original Random object).

loops: 999999
zombie method was faster by 3141 milliseconds
zombie method was faster by 3130 milliseconds
zombie method was faster by 3084 milliseconds

This set of results is a little unrealistic, simply because we are not going to loop this many times using the same Random object.

So what I have done is create a single static Random object in the main declarations module of the program (along with a lock object used with Monitor, to prevent two threads using the Random object at the same time). My program can definitely use this system as is. Here are the results:

loops: 999999
zombie modified method was faster by 2959 milliseconds
zombie modified method was faster by 2986 milliseconds
zombie modified method was faster by 2982 milliseconds

Thank you, Mr ZombieFan.

Here is the code for the last version (modified zombie method)

Code:

private void TestButton2_Click(object sender, EventArgs e)
{
    Thread.CurrentThread.Priority = ThreadPriority.Highest;
    int indexMax = 999999;

    Stopwatch stp = new Stopwatch();

    // Old method: GetRandomString creates a new Random on every call.
    stp.Start();
    for (var index = 0; index <= indexMax; index++)
    {
        string temp1 = MathProcessingClass.GetRandomString(5);
    }
    stp.Stop();

    var elapsedMS_oldMethod = stp.ElapsedMilliseconds;

    // New method: GetRandomStringV3 uses the shared static Random.
    stp.Reset();
    stp.Start();
    for (var index = 0; index <= indexMax; index++)
    {
        string temp2 = MathProcessingClass.GetRandomStringV3(5);
    }
    stp.Stop();

    var elapsedMS_newMethod = stp.ElapsedMilliseconds;

    MessageBox.Show("Number of milliseconds by which the new method beat the old method: "
        + (elapsedMS_oldMethod - elapsedMS_newMethod).ToString());
}

Code:

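public static string GetRandomStringV3(int size, bool lowerCase = true)
{
    char[] ch = new char[size];

    // System.Random is not thread-safe, so the shared instance is locked.
    // lock is Monitor.Enter/Exit wrapped in try/finally, so the lock is
    // released even if an exception is thrown.
    lock (DeclarationsModule.randLock)
    {
        for (int i = 0; i < size; i++)
        {
            // Upper bound is exclusive: 91 includes 'Z' (90).
            ch[i] = (char)DeclarationsModule.random.Next(65, 91);
        }
    }

    if (lowerCase)
        return new string(ch).ToLower();
    return new string(ch);
}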
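An alternative to locking one shared instance, offered as a sketch rather than a drop-in replacement: give each thread its own Random through ThreadLocal<Random> (available since .NET 4.0), seeded from a locked shared generator so that threads started at the same moment do not get the same seed. It also generates letters directly in the requested case, which skips the extra string that ToLower() allocates. RandomStrings and its members are hypothetical names, and whether this beats the locked version in a real program would need measuring.

```csharp
using System;
using System.Threading;

public static class RandomStrings
{
    // Shared generator used only to hand out seeds; locked because
    // System.Random is not thread-safe.
    private static readonly Random SeedSource = new Random();
    private static readonly object SeedLock = new object();

    // One Random per thread, created on first use, so generating a
    // string never takes a lock.
    private static readonly ThreadLocal<Random> PerThread = new ThreadLocal<Random>(() =>
    {
        lock (SeedLock)
        {
            return new Random(SeedSource.Next());
        }
    });

    public static string GetRandomString(int size, bool lowerCase = true)
    {
        if (size < 0) throw new ArgumentOutOfRangeException("size");
        Random random = PerThread.Value;
        char first = lowerCase ? 'a' : 'A';
        char[] ch = new char[size];
        for (int i = 0; i < size; i++)
        {
            // Next(0, 26) returns 0..25, covering all 26 letters.
            ch[i] = (char)(first + random.Next(0, 26));
        }
        return new string(ch);
    }

    public static void Main()
    {
        Console.WriteLine(GetRandomString(5));
        Console.WriteLine(GetRandomString(5, false));
    }
}
```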