Performance question

sunama · 4 Jan 2012 at 20:46

Performance question/ideas (C#)

I have 2 pieces of code.
On their own they do nothing.
Think of them merely as blocks of code used to make a performance comparison.

Code:

int indexMax = 999999999;
string temp = "";
for (var index = 0; index <= indexMax; index++)
{
    temp = "a" + "b" + "b" + "b" + "b" + "b" + "b" + "b" + "b";
    //other code which uses temp will go here
}

Code:

int indexMax = 999999999;
for (var index = 0; index <= indexMax; index++)
{
    string temp1 = "a" + "b" + "b" + "b" + "b" + "b" + "b" + "b" + "b";
    //other code which uses temp1 will go here
}

I came up with the 2 simple pieces of code above to test which is faster when iterated many, many times.

The difference between the 2 code blocks is that in the 2nd block, the string variable is declared and initialised inside the loop, i.e. a billion times. In the 1st block, the variable is declared once, outside the loop, and the same variable is then reused inside the loop (its value is re-assigned).

I ran some tests, and the block that re-uses the same variable is marginally quicker (on my computer, over a billion iterations, the difference is about 10 ms on average).

Question: is it better to code for the marginal speed increase, or to declare a new string variable on each of the billion iterations?

I ask because I am creating an AI and I am craving every little bit of performance I can gain. I am in the process of altering any piece of code that could possibly improve performance.

Your opinions please.

sunama · 4 Jan 2012 at 21:45

C#, .NET 4.0.

Could a compiler really be "that" intelligent, placing a variable in the optimal place during compilation? I.e. would it detect inefficient code and compile a fully optimised program?

If this is the case, it would explain why, after a billion iterations, the 2 methods give near-identical durations.
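For this particular test, at least part of the answer is yes. In C#, concatenating string literals with + is a constant expression, which the compiler evaluates at compile time and stores as a single literal. A minimal sketch demonstrating it (the class and member names are illustrative, not from the project): the declaration only compiles as a `const` because the expression is a compile-time constant, and the result is the very same interned object as the plain literal.

```csharp
using System;

static class ConstantFoldingDemo
{
    // Only legal as a const because "a" + "b" + ... is a compile-time constant;
    // the compiler stores the single literal "abbbbbbbb" in the assembly.
    public const string Folded = "a" + "b" + "b" + "b" + "b" + "b" + "b" + "b" + "b";

    // True when the folded constant and the plain literal are the same
    // (interned) string object, i.e. nothing is concatenated at run time.
    public static bool IsSameObjectAsLiteral()
    {
        return object.ReferenceEquals(Folded, "abbbbbbbb");
    }
}
```

So in both loops above, each iteration just stores a reference to one existing string; no string is built at run time in either version, which fits the near-identical timings.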

Logically, it would be better to declare the string, temp, inside the for loop, but if there is a performance benefit... I would quite happily put it outside.

sunama · 4 Jan 2012 at 22:12

In that case, rather than testing with a constant string as I have, I would need to use a random string generator producing strings of equal length.

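I would do something like this, where GenerateString(length) is a placeholder for a random string generator:

Code:

for (int i = 1; i <= 999999999; i++)
{
    string stringOne = GenerateString(5);   // random string of length 5
    string stringTwo = GenerateString(5);   // random string of length 5
    string stringThree = stringOne + stringTwo;
}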

In the string-reuse version, we would declare the strings outside the for loop and then reuse them on every iteration.

I will give this a go and see if I get different results.

sunama · 4 Jan 2012 at 22:37

Right then. I've done a quick test:

Results (iterating 999999 times)
method1: Reusing 3 string variables vs
method2: Declaring every string variable, in every loop

method2 was faster by 101 milliseconds
method1 was faster by 6 milliseconds
method2 was faster by 138 milliseconds
method1 was faster by 87 milliseconds
method1 was faster by 3 milliseconds
method2 was faster by 21 milliseconds

For reference, the 2 sets of iterations take about 12 seconds, so the performance difference is minimal.
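To put these numbers in proportion, here is a rough per-iteration calculation, assuming the 12 seconds is the combined time of both loops (about 6 s each, for roughly a million iterations):

```latex
\begin{aligned}
t_{\text{iter}} &\approx \frac{6\ \text{s}}{10^{6}} = 6\ \mu\text{s} \\
\Delta t_{\text{iter}} &\approx \frac{3\ \text{ms}\ \ldots\ 138\ \text{ms}}{10^{6}} = 3\ \text{ns}\ \ldots\ 138\ \text{ns} \\
\frac{\Delta t_{\text{iter}}}{t_{\text{iter}}} &\lesssim \frac{138\ \text{ns}}{6\ \mu\text{s}} \approx 2.3\%
\end{aligned}
```

Since the faster method also changes from run to run, differences of this size are most likely measurement noise rather than an effect of where the variable is declared.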

What I conclude from this is that the performance benefit of one method over the other is minimal, and that reusing variables to increase performance is not the right way to go.

Here's the code:

Code:

        private void TestButton2_Click(object sender, EventArgs e)
        {
            Thread.CurrentThread.Priority = ThreadPriority.Highest;
            int indexMax = 999999;

            Stopwatch stp = new Stopwatch();

            // method1: reuse 3 string variables declared outside the loop
            // (this loop also pays the one-off JIT compilation of GetRandomString)
            stp.Start();

            string temp1;
            string temp2;
            string temp3;
            for (var index = 0; index <= indexMax; index++)
            {
                temp1 = MathProcessingClass.GetRandomString(5);
                temp2 = MathProcessingClass.GetRandomString(5);
                temp3 = temp1 + temp2;
            }
            stp.Stop();

            var elapsedMS_reuseMethod = stp.ElapsedMilliseconds;

            // method2: declare new string variables inside the loop, on every iteration
            stp.Reset();
            stp.Start();
            for (var index = 0; index <= indexMax; index++)
            {
                string temp4 = MathProcessingClass.GetRandomString(5);
                string temp5 = MathProcessingClass.GetRandomString(5);
                string temp6 = temp4 + temp5;
            }
            stp.Stop();

            var elapsedMS_declareMethod = stp.ElapsedMilliseconds;

            // positive: method1 (reuse) was faster; negative: method2 (declare) was faster
            MessageBox.Show("number of milliseconds by which the reuse method is faster than the declare method: " + (elapsedMS_declareMethod - elapsedMS_reuseMethod).ToString());
        }

        public static string GetRandomString(int size, bool lowerCase = true)
        {
            StringBuilder builder = new StringBuilder();
            // note: a new Random per call is relatively expensive to construct, and on
            // .NET Framework instances created within the same clock tick share a seed,
            // so consecutive calls can return identical strings
            Random random = new Random();
            char ch;
            for (int i = 0; i < size; i++)
            {
                // 26 * NextDouble() + 65 lies in [65, 91), so ch is 'A'..'Z'
                ch = Convert.ToChar(Convert.ToInt32(Math.Floor(26 * random.NextDouble() + 65)));
                builder.Append(ch);
            }
            if (lowerCase)
                return builder.ToString().ToLower();
            return builder.ToString();
        }

sunama · 4 Jan 2012 at 22:43

ZombieFan said:
If you are performing a large number of string operations in a loop then you should use a StringBuilder rather than string concatenation.

I understand. The purpose of this experiment is to see the effect of re-using variables, as opposed to declaring them repeatedly. In my actual (fully optimised) code, I use StringBuilders where a long string is being built (as seen in my random string generator, which is copied from my actual project).
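For readers following ZombieFan's point: repeated `+=` on a string copies the whole accumulated text into a new string each time, whereas StringBuilder appends into a growable buffer and creates the final string once. A minimal sketch of the two approaches (method names are illustrative):

```csharp
using System.Text;

static class ConcatDemo
{
    // Each += allocates a new string and copies everything built so far,
    // so building `count` pieces costs O(count^2) character copies overall.
    public static string BuildWithConcatenation(string piece, int count)
    {
        string result = "";
        for (int i = 0; i < count; i++)
            result += piece;
        return result;
    }

    // StringBuilder appends into an internal buffer that grows as needed;
    // the final string is created once by ToString().
    public static string BuildWithStringBuilder(string piece, int count)
    {
        var builder = new StringBuilder(piece.Length * count); // pre-size to avoid regrowth
        for (int i = 0; i < count; i++)
            builder.Append(piece);
        return builder.ToString();
    }
}
```

For a fixed, small number of pieces, such as the two-string concatenation in the test loop, plain `+` is fine: the compiler turns `a + b` into a single `String.Concat` call.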

ZombieFan said:
.NET is still creating 10 string objects on each iteration of the loop

This would go some way to explaining why there appears to be minimal performance difference between the 2 methods.

ZombieFan said:
Also, if you want to perform more accurate timing on the code, use the Stopwatch class to time exactly how long a block of code takes to execute.

Yep... I do in fact use Stopwatch, as can be seen in my code above. The code above has been ripped directly from my project.
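Stopwatch itself is accurate, but results as noisy as the ones above (where the winner flips between runs) usually call for a bit more care around it. A sketch of a simple harness, using only System and System.Diagnostics; the class name, method names and the choice of median are assumptions, not part of the original code. It warms each method up once so JIT compilation is not timed, forces a garbage collection before each measurement, alternates the order, and compares medians over several runs.

```csharp
using System;
using System.Diagnostics;

static class MiniBenchmark
{
    // Times `iterations` calls of `action` and returns elapsed milliseconds.
    static double TimeOnce(Action action, int iterations)
    {
        // Start from a clean heap so one method does not pay for
        // garbage left behind by the other.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action();
        stopwatch.Stop();
        return stopwatch.Elapsed.TotalMilliseconds;
    }

    static double Median(double[] values)
    {
        var sorted = (double[])values.Clone();
        Array.Sort(sorted);
        int mid = sorted.Length / 2;
        return sorted.Length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // Returns (median ms for a, median ms for b) over `runs` alternating runs.
    public static Tuple<double, double> Compare(Action a, Action b, int iterations, int runs)
    {
        // Warm-up: make sure both methods are JIT-compiled before timing.
        a();
        b();

        var timesA = new double[runs];
        var timesB = new double[runs];
        for (int run = 0; run < runs; run++)
        {
            // Alternate which method goes first to cancel out ordering effects.
            if (run % 2 == 0)
            {
                timesA[run] = TimeOnce(a, iterations);
                timesB[run] = TimeOnce(b, iterations);
            }
            else
            {
                timesB[run] = TimeOnce(b, iterations);
                timesA[run] = TimeOnce(a, iterations);
            }
        }
        return Tuple.Create(Median(timesA), Median(timesB));
    }
}
```

Calling through a delegate adds a small, equal overhead to both sides, so the harness suits comparisons rather than absolute timings.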

PS. If anybody thinks I can make GetRandomString faster, then I'm all ears. That method is used by my program.

sunama · 5 Jan 2012 at 17:59

peterwalkley said:
I think you could be seeing the trees instead of the wood here. Correct choices of algorithms and data structures will have far more benefit than this sort of micro-optimisation.

I understand this. But when I have reached the limit of how far I can take an algorithm, the next step is to look for other areas where performance can be improved.

The only way to improve the algorithms further is to have a completely fresh mind (a different person) take a look at the code.

peterwalkley said:
Have you run your code with a profiler so you know where it's spending its time? No matter how experienced a developer you are, it is extremely easy to fall into the trap of solving the wrong problem.

I tried briefly using ANTS profiler and I could not make head nor tail of it. To be fair, I only tried it for about 10 minutes and it seemed more trouble than it was worth.

I really should spend more time with it and look for areas to improve.

I find it fun to take little algorithms and make them run faster. Using a profiler didn't quite feel like "fun" to me.

Another, and perhaps easier, way of improving performance is simply to upgrade the hardware running the program.

sunama · 5 Jan 2012 at 18:01

ZombieFan said:

Code:

        public static string GetRandomString(Random random, int size, bool lowerCase = true)
        {
            char[] ch = new char[size];

            for (int i = 0; i < size; i++)
            {
                // Next's upper bound is exclusive: 91 is needed to include 'Z' (90)
                ch[i] = (char)random.Next(65, 91);
            }
            if (lowerCase)
                return new string(ch).ToLower();
            return new string(ch);
        }

I shall run this code and report back my results.

sunama · 5 Jan 2012 at 18:34

I've used ZombieFan's method (with a new Random object created inside the method), and here are the results vs. the old method:

loops: 999999
zombie method was faster by 126 milliseconds
zombie method was faster by 144 milliseconds
zombie method was faster by 121 milliseconds
zombie method was faster by 140 milliseconds
zombie method was faster by 82 milliseconds

It seems pretty consistent.

And these are the results when we create the Random object outside the method and pass it into the method (i.e. we re-use the original Random object).

loops: 999999
zombie method was faster by 3141 milliseconds
zombie method was faster by 3130 milliseconds
zombie method was faster by 3084 milliseconds

The above set of results is a little unrealistic, simply because we are not going to loop this many times using the same Random object.
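Reusing a Random object matters for correctness as well as speed. On .NET Framework (which this thread targets), `new Random()` is seeded from the system tick count, so instances created within the same clock tick, as happens when a method creates one per call in a tight loop, start from the same seed and return the same values. .NET Core and later seed the parameterless constructor differently, so that particular effect is specific to .NET Framework. The underlying rule, that equal seeds give equal sequences, can be shown deterministically with an explicit seed (the class name and seed values are arbitrary):

```csharp
using System;

static class SeedDemo
{
    // Draws `count` values in [0, 26) from a generator created with `seed`.
    public static int[] Draw(int seed, int count)
    {
        var random = new Random(seed);
        var values = new int[count];
        for (int i = 0; i < count; i++)
            values[i] = random.Next(26);
        return values;
    }

    // True when two generators built from the same seed yield identical sequences.
    public static bool SameSeedSameSequence(int seed, int count)
    {
        int[] first = Draw(seed, count);
        int[] second = Draw(seed, count);
        for (int i = 0; i < count; i++)
            if (first[i] != second[i])
                return false;
        return true;
    }
}
```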

So what I have done is create a single static Random object in the program's main declarations module (along with a lock object used with Monitor, to prevent 2 threads from using the Random object at the same time). My program can definitely use this system as is. Here are the results:

loops: 999999
zombie modified method was faster by 2959 milliseconds
zombie modified method was faster by 2986 milliseconds
zombie modified method was faster by 2982 milliseconds

Thank you, Mr ZombieFan.

Here is the code for the last version (modified zombie method):

Code:

        private void TestButton2_Click(object sender, EventArgs e)
        {
            Thread.CurrentThread.Priority = ThreadPriority.Highest;
            int indexMax = 999999;


            Stopwatch stp = new Stopwatch();

            stp.Start();
            

            for (var index = 0; index <= indexMax; index++)
            {
                string temp1 = MathProcessingClass.GetRandomString(5);
            }
            stp.Stop();

            var elapsedMS_oldMethod = stp.ElapsedMilliseconds;


            stp.Reset();
            stp.Start();
            //Random rand = new Random();
            for (var index = 0; index <= indexMax; index++)
            {

                string temp2 = MathProcessingClass.GetRandomStringV3(5);

            }
            stp.Stop();

            var elapsedMS_newMethod = stp.ElapsedMilliseconds;


            MessageBox.Show("number of milliseconds new method is faster than old method: " + (elapsedMS_oldMethod - elapsedMS_newMethod).ToString());
        }

Code:

        public static string GetRandomStringV3(int size, bool lowerCase = true)
        {
            char[] ch = new char[size];

            // lock expands to Monitor.Enter/Exit inside try/finally,
            // so the lock is released even if an exception is thrown
            lock (DeclarationsModule.randLock)
            {
                for (int i = 0; i < size; i++)
                {
                    // Next's upper bound is exclusive: 91 is needed to include 'Z' (90)
                    ch[i] = (char)DeclarationsModule.random.Next(65, 91);
                }
            }

            if (lowerCase)
                return new string(ch).ToLower();
            return new string(ch);
        }

sunama · 5 Jan 2012 at 18:40

The Asgard said:
Use the first snip. Create the string outside the loop if it's being reused within the loop. Only create objects when you need to.

Asgard, you are thinking the same way I did.

But after running the tests (see above), the difference in performance is minimal. Hence, it is better to ignore the possible performance benefits of reusing the same variable and go for the advantage of declaring the variable as close to its actual usage as possible.

In summary, from my findings, it is better to use the 2nd version (i.e. declare a new string inside the loop, on every iteration), rather than re-use the same variable on each iteration.

sunama · 5 Jan 2012 at 19:18

fez said:
As long as you can remember that in some languages you will pay a much larger price for redeclaring variables inside a loop every time it runs you should be fine.

I must admit that I was quite surprised that re-declaring a string (in the way it was done in the loop above) had virtually no time/performance penalty. I had always assumed, almost without question, that every time you declare a variable/object, a new location in memory is allocated to it, and that this action uses resources.
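That assumption can be checked directly. In C#, declaring an ordinary local variable does not allocate anything by itself: locals are storage slots reserved for the whole method, whichever block they are declared in, and memory is only allocated when a new object is created. The sketch below (names are illustrative) keeps the reference produced on each iteration and checks whether the loop yielded one shared object or a new one every time:

```csharp
using System;

static class DeclarationDemo
{
    // `temp` is declared inside the loop, but with a constant on the right-hand
    // side every iteration stores a reference to the same interned literal.
    public static bool ConstantGivesSameObject(int iterations)
    {
        string previous = null;
        for (int i = 0; i < iterations; i++)
        {
            string temp = "a" + "b"; // folded to "ab" at compile time
            if (previous != null && !object.ReferenceEquals(previous, temp))
                return false;
            previous = temp;
        }
        return true;
    }

    // Concatenating values known only at run time (both non-empty) creates a new
    // string on every iteration, wherever `temp` is declared.
    public static bool RuntimeConcatGivesNewObjects(string left, string right, int iterations)
    {
        string previous = null;
        for (int i = 0; i < iterations; i++)
        {
            string temp = left + right; // String.Concat allocates a new string
            if (previous != null && object.ReferenceEquals(previous, temp))
                return false;
            previous = temp;
        }
        return true;
    }
}
```

In the random-string test, the allocations come from GetRandomString and the concatenation, which are identical in both versions; that is why moving the declaration made no measurable difference.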

sunama · 8 Jan 2012 at 17:21

I was tinkering today, as I thought I would deal once and for all with the poor performance of string .Remove.

So I came up with a method that uses faster techniques to remove chars from a string, and where those techniques don't apply, it falls back to the original .Remove method.

In future, I will be using .Substring and my new method, RemoveUsingSubstring (code shown below).

For those of you who don't know, in C#, the .Remove method on strings is very slow and takes about 3 times as long as .Substring.

If any of you have any advice on ways to improve the code below, then please let me know. If you want to use the method then go ahead.

Code:

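        /// <summary>
        /// to be used instead of ".Remove". On average, this method is faster;
        /// typically, it is always faster than ".Remove" when startIndex = 0.
        /// Unlike ".Remove", it does not throw when the range runs past the end:
        /// a startIndex at or beyond the end returns the string unchanged, and a
        /// removal that runs past the end is clamped to the end of the string.
        /// See the comments inside the method for more information.
        /// </summary>
        /// <param name="theString">string to remove chars from (must not be null)</param>
        /// <param name="startIndex">zero-based index of the first char to remove</param>
        /// <param name="lengthOfStringToRemove">number of chars to remove</param>
        /// <returns>the string with the chars removed</returns>
        internal static string RemoveUsingSubstring(string theString, int startIndex, int lengthOfStringToRemove)
        {
            // only null is invalid here; an empty string is handled by the checks below
            if (theString == null)
                throw new ArgumentNullException("theString");

            if (lengthOfStringToRemove < 1)
                return theString;

            if (startIndex < 0)
            {
                //DeclarationsModule.errorLog.Add("ERROR337: Unable to remove chars from string (" + theString + "), because startIndex (" + startIndex + ") is negative.");
                throw new ArgumentOutOfRangeException("startIndex");
            }

            if (theString.Length <= startIndex)
                return theString;

            // the removal reaches (or runs past) the end: keep only the head.
            // Checked before the startIndex == 0 case, so an over-long removal from
            // index 0 returns "" instead of making Substring throw.
            // (written as a subtraction so startIndex + length cannot overflow)
            if (lengthOfStringToRemove >= theString.Length - startIndex)
                return theString.Substring(0, startIndex); //this is faster than .Remove

            // the removal starts at index 0: keep only the tail
            if (startIndex == 0)
                return theString.Substring(lengthOfStringToRemove); //this is faster than .Remove

            //if we reach here, we use the default remove method provided by .NET
            return theString.Remove(startIndex, lengthOfStringToRemove);
        }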

The above method was tested on random strings (looped 999999 times, compared against the standard .Remove method), removing 2 chars at random locations from a random string of length 5. On average, the above method was faster than the bog-standard .Remove method. Obviously, where possible, you should try to use .Substring.
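A sketch of the kind of test described above that also checks correctness: it generates random 5-character strings, removes 2 characters at a random in-range position with both String.Remove and a candidate method, and reports whether every result matched. The harness name, its parameters and the fixed seed are illustrative assumptions; the candidate is passed in as a delegate so RemoveUsingSubstring (or any variant) can be plugged in. Because RemoveUsingSubstring deliberately tolerates out-of-range arguments where String.Remove throws, only in-range arguments are drawn.

```csharp
using System;

static class RemoveCheck
{
    // Returns true if `candidate` agrees with String.Remove on `trials` random cases.
    public static bool MatchesRemove(Func<string, int, int, string> candidate, int trials, int seed)
    {
        var random = new Random(seed); // fixed seed so a failure can be reproduced
        var chars = new char[5];
        const int count = 2;           // chars removed per trial
        for (int t = 0; t < trials; t++)
        {
            // random 5-letter lowercase string
            for (int i = 0; i < chars.Length; i++)
                chars[i] = (char)('a' + random.Next(26));
            string text = new string(chars);

            // any start index at which String.Remove accepts a 2-char removal: 0..3
            int start = random.Next(text.Length - count + 1);

            if (candidate(text, start, count) != text.Remove(start, count))
                return false;
        }
        return true;
    }
}
```

Timing can then be wrapped around the same loop with the Stopwatch pattern used earlier in the thread.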

sunama · 8 Jan 2012 at 21:54

Haircut said:
I have to say, if you've got some code where the bottleneck is the performance of removing parts of a string then it sounds as though you're doing something wrong.

No bottlenecks...just tinkering, that's all.

Also consider that, as I am working on language processing, some strings may be processed (in some shape or form) 1000s of times per second. In this case, some string processing methods will be used many, many times. Every millisecond saved in a single call of that method can end up saving decent chunks of time if it is called 1000s of times per second.

At present my program is able to respond to approximately 100 queries per second, but that doesn't mean I can't optimise it further.
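The scaling argument above can be put into numbers. If a method is called N times per second and each call saves Δt, the time saved per second is N·Δt. As an illustration, take the largest gap measured earlier in the thread (138 ms over about a million loop iterations, i.e. roughly 0.14 µs per iteration) and 1000 calls per second:

```latex
N\,\Delta t \approx 1000\ \text{s}^{-1} \times 0.14\ \mu\text{s} = 140\ \mu\text{s per second} \approx 0.014\%\ \text{of each second}
```

A saving only becomes noticeable when Δt is a sizeable fraction of the per-call cost of a method that dominates the run time, which is what a profiler helps to find.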

sunama · 8 Jan 2012 at 22:51

Dj_Jestar said:
You also only ever Substring() when you substring from other than the beginning of the string.

So you are suggesting that .Substring should not be used to get a substring at the start of the string?

I would've thought that was the best place to use the substring.

E.g. if I want to get the first 3 letters of the string temp = "Hello";

The best way to do this is temp.Substring(0, 3);

Or are you suggesting that there is a better way of doing this?

I also use .Substring when taking a substring from the middle of a string, e.g. temp.Substring(1, 3);

I read that .Remove takes 3 times as long to complete as .Substring. Hence it is faster to use 2 .Substring calls than a single .Remove call.

I also tested the code I wrote (and this is the most important thing to me: actual working results), and on average it is quicker to use my method than to call .Remove directly. This is because, in many cases, my method makes use of the faster .Substring method, whereas calling .Remove always uses .Remove (which is slower than .Substring).

Here is an explanation of why .Substring is faster than .Remove.
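The 2-Substring removal mentioned above can be written as follows; a minimal sketch with a hypothetical class and method name, validating arguments the same way String.Remove does. It allocates three strings (the two pieces and their concatenation), whereas String.Remove builds its result in a single new string, so whether it is actually faster on a given runtime is something to measure rather than assume.

```csharp
using System;

static class SubstringRemove
{
    // Removes `count` characters starting at `startIndex` by joining the part
    // before the removed range with the part after it.
    public static string RemoveWithTwoSubstrings(string text, int startIndex, int count)
    {
        if (text == null)
            throw new ArgumentNullException("text");
        if (startIndex < 0 || count < 0 || startIndex > text.Length - count)
            throw new ArgumentOutOfRangeException("startIndex");

        string head = text.Substring(0, startIndex);      // chars before the removed range
        string tail = text.Substring(startIndex + count); // chars after the removed range
        return string.Concat(head, tail);
    }
}
```

For example, `RemoveWithTwoSubstrings("Hello", 1, 3)` gives the same result as `"Hello".Remove(1, 3)`, namely "Ho".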