Performance question

sunama · 5 Jan 2012 at 18:40

The Asgard said:
Use the first snip. Create the string outside the loop if it's being reused within the loop. Only create objects when you need to.

Asgard, you are thinking in the same way I thought.

But after running the tests (see above), the difference in performance is minimal. Hence, it is better to ignore the possible performance benefits that might come from reusing the same variable and go for the advantage of declaring the variable as close to its actual usage as possible.

In summary, from my findings, it is better to use the 2nd version (i.e. declare a new string inside each loop iteration) rather than re-use the same variable in each loop.

fez · 5 Jan 2012 at 19:11

As long as you remember that in some languages you will pay a much larger price for redeclaring variables inside a loop every time it runs, you should be fine.

sunama · 5 Jan 2012 at 19:18

fez said:
As long as you remember that in some languages you will pay a much larger price for redeclaring variables inside a loop every time it runs, you should be fine.

I must admit that I was quite surprised that re-declaring a string (in the way it was done in the loop above) had virtually no time/performance penalty. I had always assumed, almost without question, that every time you declare a variable/object a new location in memory is allocated to that object, and that this action uses resources.

Goofball · 5 Jan 2012 at 19:44

sunama said:
I must admit that I was quite surprised that re-declaring a string (in the way it was done in the loop above) had virtually no time/performance penalty.

Make a small test program, build it with release settings and disassemble it, and you'll see immediately why that's the case.

http://msdn.microsoft.com/en-us/library/f7dy01k1(v=vs.80).aspx

It's easy to see that there shouldn't be any difference between them once you realize that a local variable declaration in a statically typed language is just a handle to a location on the stack, and not something that makes anything happen in your program.

In dynamic languages, where the variable is itself an object (not to be confused with the object it references!), you might get a different situation.

ng93 · 5 Jan 2012 at 20:06

Interesting results. I remember when making an Android (Java) game engine, the garbage collection would cause a lot of stuttering if anything was declared after the game had loaded. Does C#/.NET have any similar issues? (I'm a complete noob at .NET.)

Haircut · 5 Jan 2012 at 20:49

The Asgard said:
Use the first snip. Create the string outside the loop if it's being reused within the loop. Only create objects when you need to.

sunama said:
Asgard, you are thinking in the same way I thought.

But after running the tests (see above), the difference in performance is minimal. Hence, it is better to ignore the possible performance benefits that might come from reusing the same variable and go for the advantage of declaring the variable as close to its actual usage as possible.

In summary, from my findings, it is better to use the 2nd version (i.e. declare a new string inside each loop iteration) rather than re-use the same variable in each loop.

No, no, no - this is a fundamental misunderstanding of how objects are used and memory is managed in .Net.

Where you declare the variable makes no difference to the objects that are created.
It is the new operator that allocates memory and creates the object.

For a string this is a little opaque, as a string literal doesn't use the new operator explicitly; the runtime creates the object for you.

string s1 = "my string";

is conceptually similar to being able to do

string s1 = new string("my string".ToCharArray());

(One difference: literals are interned, so the same literal inside a loop refers to a single string object rather than a fresh one per iteration.)

Your variable is just a pointer on the stack of the current method.
When you assign something (I'm assuming a reference type) to it you're simply making that pointer point to a location on the heap containing your object.

If you assign something else to that same variable it still has to create a new object on the heap and potentially garbage collect the old object at some point if nothing else references it.
Your pointer will then simply be updated to point to the new object.

The only difference between the two will be a bit of pushing and popping on the thread's stack for the local in the method. Even then the compiler would likely optimise that away, as has already been mentioned, hence the minuscule differences you're seeing.
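
A minimal sketch of the point above, using hypothetical method names (`DeclaredOutside`, `DeclaredInside`). Both loops create exactly one new string per iteration, in the call to `ToString()`, regardless of where `s` is declared. Whether the JIT emits identical machine code for the two is an assumption worth checking by disassembling a release build.

```csharp
using System;

public static class DeclarationDemo
{
    // Variable declared once, outside the loop.
    public static int DeclaredOutside(int iterations)
    {
        int total = 0;
        string s;
        for (int i = 0; i < iterations; i++)
        {
            s = i.ToString();          // the allocation happens here, inside ToString()
            total += s.Length;
        }
        return total;
    }

    // Variable declared inside the loop body.
    public static int DeclaredInside(int iterations)
    {
        int total = 0;
        for (int i = 0; i < iterations; i++)
        {
            string s = i.ToString();   // same allocation, same place
            total += s.Length;
        }
        return total;
    }
}
```

The declaration only names a slot that holds a reference; moving it in or out of the loop does not add or remove a call that creates an object.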

Dj_Jestar · 6 Jan 2012 at 09:55

The Asgard said:
Use the first snip. Create the string outside the loop if it's being reused within the loop. Only create objects when you need to.

Declaring a variable is not creating the object. But someone with as much experience as you would know that, right? Right?

Goofball · 6 Jan 2012 at 11:18

ng93 said:
Interesting results. I remember when making an Android (Java) game engine, the garbage collection would cause a lot of stuttering if anything was declared after the game had loaded. Does C#/.NET have any similar issues? (I'm a complete noob at .NET.)

You mean "created", not "declared".

The .NET CLR and the various JVMs would do about the same for this (as would C++ with the default new operator). Creating a reference type (an object) requires memory to be allocated on the heap of the program. It involves updating data structures and possibly calls to the OS to assign more space; that last part is expensive and is probably the source of your stuttering.

The CLR is very similar to the various JVMs in that respect. The differences come down to which has the better memory allocation scheme, but they do the same sort of stuff. Mobile devices, like Android/Dalvik, have to be more careful with memory than the desktop equivalents, so you'll end up asking the OS for more quite often, because it doesn't want to assign you more than you absolutely need.

What .NET does offer that JVMs don't is complex, user-defined value types. So you can allocate complex objects on the stack and bypass the heap altogether, which is a big boon when doing realtime stuff like games (and the reason why the complex types in realtime libraries for .NET, like XNA, are all structs and not classes).
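
As a rough illustration of the value-type point, here is a sketch with a hypothetical `Vec2` struct (the name and members are assumptions, only loosely modelled on the kind of vector types a library like XNA provides). Assigning a struct copies its fields, and a struct local normally needs no separate heap object.

```csharp
using System;

// Hypothetical value type for illustration only.
public struct Vec2
{
    public float X;
    public float Y;

    public Vec2(float x, float y)
    {
        X = x;
        Y = y;
    }

    public static Vec2 operator +(Vec2 a, Vec2 b)
    {
        // 'new' on a struct initialises a value; it does not create a heap object.
        return new Vec2(a.X + b.X, a.Y + b.Y);
    }
}

public static class StructDemo
{
    // Adds (1, 2) to a running total 'count' times using only struct locals.
    public static Vec2 Sum(int count)
    {
        Vec2 total = new Vec2(0f, 0f);
        Vec2 step = new Vec2(1f, 2f);
        for (int i = 0; i < count; i++)
        {
            total = total + step;   // copies field values; no new reference-type object is created
        }
        return total;
    }
}
```

Had `Vec2` been a class, each `+` in the loop would have created a new heap object for the collector to clean up later.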

sunama · 8 Jan 2012 at 17:21

I was tinkering today, as I thought I would deal once and for all with the poor performance of the string .Remove method.

So I came up with a method which uses faster techniques to remove chars from a string. Where those techniques don't apply, it falls back to the original .Remove method.

In future, I will be using .Substring and my new method, RemoveUsingSubstring (code shown below).

For those of you who don't know, in C#, the .Remove method on strings is very slow and takes about 3 times as long as .Substring.

If any of you have any advice on ways to improve the code below, then please let me know. If you want to use the method then go ahead.

Code:

        /// <summary>
        /// to be used instead of ".Remove". On average, this method is faster
        /// typically, it is always faster than ".Remove", when startIndex = 0
        /// see descriptions inside the method for more information
        /// </summary>
        /// <param name="theString"></param>
        /// <param name="startIndex"></param>
        /// <param name="lengthOfStringToRemove"></param>
        /// <returns></returns>
        internal static string RemoveUsingSubstring(string theString, int startIndex, int lengthOfStringToRemove)
        {

            if (string.IsNullOrEmpty(theString))
                throw new ArgumentNullException();


            if (lengthOfStringToRemove < 1)
                return theString;


            if (startIndex < 0)
            {
                //DeclarationsModule.errorLog.Add("ERROR337: Unable to remove chars from string (" + theString + "), because startIndex (" + startIndex + ") is negative.");
                throw new ArgumentOutOfRangeException();
            }


            if (theString.Length <= startIndex)
                return theString;


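            // Removal runs to (or past) the end of the string: keep only the prefix.
            // Checked first so that startIndex == 0 with an over-long count cannot
            // reach Substring with a negative length.
            if (theString.Length <= startIndex + lengthOfStringToRemove)
                return theString.Substring(0, startIndex); //this is faster than .Remove


            if (startIndex == 0)
                return theString.Substring(lengthOfStringToRemove, theString.Length - lengthOfStringToRemove); //this is faster than .Remove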


            //if we reach here, we use the default remove method provided by .NET
            return theString.Remove(startIndex, lengthOfStringToRemove);
        }

The above method was tested on random strings (looped 999999 times, compared against the standard .Remove method), removing 2 chars at random locations from a random string of length 5. On average, it was faster than the bog-standard .Remove method. Obviously, where possible, you should try to use .Substring.
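
For anyone wanting to repeat this kind of measurement, a minimal timing sketch follows. The `RemoveTiming` class and its `Time` and `Report` methods are hypothetical names, not the harness used for the figures above, and a loop like this gives only a rough comparison (results vary with runtime version, build settings and machine).

```csharp
using System;
using System.Diagnostics;

public static class RemoveTiming
{
    // Runs 'action' the given number of times and returns the elapsed time.
    public static TimeSpan Time(Func<string> action, int iterations)
    {
        action();   // warm-up call so JIT compilation is not part of the measurement
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        watch.Stop();
        return watch.Elapsed;
    }

    // Prints rough timings for removing the first two chars of a short string.
    public static void Report()
    {
        const string input = "hello";
        const int iterations = 999999;

        TimeSpan remove = Time(() => input.Remove(0, 2), iterations);
        TimeSpan substring = Time(() => input.Substring(2), iterations);

        Console.WriteLine("Remove:    " + remove.TotalMilliseconds + " ms");
        Console.WriteLine("Substring: " + substring.TotalMilliseconds + " ms");
    }
}
```

Build in release mode and run it several times before drawing conclusions; a single run can easily be skewed by garbage collection or other activity on the machine.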

ng93 · 8 Jan 2012 at 18:32

Goofball said:
You mean "created", not "declared".

The .NET CLR and the various JVMs would do about the same for this (as would C++ with the default new operator). Creating a reference type (an object) requires memory to be allocated on the heap of the program. It involves updating data structures and possibly calls to the OS to assign more space; that last part is expensive and is probably the source of your stuttering.

The CLR is very similar to the various JVMs in that respect. The differences come down to which has the better memory allocation scheme, but they do the same sort of stuff. Mobile devices, like Android/Dalvik, have to be more careful with memory than the desktop equivalents, so you'll end up asking the OS for more quite often, because it doesn't want to assign you more than you absolutely need.

What .NET does offer that JVMs don't is complex, user-defined value types. So you can allocate complex objects on the stack and bypass the heap altogether, which is a big boon when doing realtime stuff like games (and the reason why the complex types in realtime libraries for .NET, like XNA, are all structs and not classes).

Tad delayed but cheers for clearing it up

Haircut · 8 Jan 2012 at 19:19

sunama said:
I was tinkering today, as I thought I would deal once and for all with the poor performance of the string .Remove method.

So I came up with a method which uses faster techniques to remove chars from a string. Where those techniques don't apply, it falls back to the original .Remove method.

In future, I will be using .Substring and my new method, RemoveUsingSubstring (code shown below).

For those of you who don't know, in C#, the .Remove method on strings is very slow and takes about 3 times as long as .Substring.

I have to say, if you've got some code where the bottleneck is the performance of removing parts of a string then it sounds as though you're doing something wrong.

sunama · 8 Jan 2012 at 21:54

Haircut said:
I have to say, if you've got some code where the bottleneck is the performance of removing parts of a string then it sounds as though you're doing something wrong.

No bottlenecks...just tinkering, that's all.

Also consider that, as I am working on language processing, some strings may be processed (in some shape or form) 1000s of times per second. In this case, some string processing methods will be called many, many times. Any time saved in a single call of such a method can end up saving decent chunks of time if it is called 1000s of times per second.

At present my program is able to respond to approximately 100 queries per second, but that doesn't mean I can't optimise it further.

Dj_Jestar · 8 Jan 2012 at 22:18

sunama.. stop assuming your code is better than the .NET code. It's not.

The reason the .NET Remove() method appears slower is that it does some thread safety checks and some Contract assertions for diagnostics, then uses an external (i.e. C/C++) component to perform the removal. It is also concatenating the remainder of the string, something you are not doing with Substring(). If you were to use Substring() to remove "34" from "12345", the return value would be "12", with no "5".
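
To make the "12345" example concrete, here is a small sketch. `RemoveViaSubstrings` is a hypothetical helper, and it assumes the range to remove lies entirely inside the string.

```csharp
using System;

public static class RemoveVersusSubstring
{
    // Removes 'count' chars at 'startIndex' using two Substring calls and a concatenation.
    // Hypothetical helper; assumes 0 <= startIndex and startIndex + count <= text.Length.
    public static string RemoveViaSubstrings(string text, int startIndex, int count)
    {
        return text.Substring(0, startIndex) + text.Substring(startIndex + count);
    }

    public static void Demo()
    {
        Console.WriteLine("12345".Remove(2, 2));                  // prints "125"
        Console.WriteLine("12345".Substring(0, 2));               // prints "12" (the "5" is lost)
        Console.WriteLine(RemoveViaSubstrings("12345", 2, 2));    // prints "125"
    }
}
```

A single Substring call only matches Remove when the removed range touches the start or the end of the string; anywhere else, the tail has to be joined back on.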

Your code also duplicates the checks that the .NET methods you subsequently call already perform. Here is the actual code from the .NET class String:

Substring:

Code:

        // Returns a substring of this string. 
        // 
#if !FEATURE_CORECLR
        [TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")] 
#endif
        public String Substring (int startIndex) {
            return this.Substring (startIndex, Length-startIndex);
        } 

        // Returns a substring of this string. 
        // 
        [System.Security.SecuritySafeCritical]  // auto-generated
#if !FEATURE_CORECLR 
        [TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
#endif
        public String Substring(int startIndex, int length) {
            // okay to not enforce copying in the case of Substring(0, length), since we assume 
            // String instances are immutable.
            return InternalSubStringWithChecks(startIndex, length, false); 
        } 

 
        [System.Security.SecurityCritical]  // auto-generated
        internal String InternalSubStringWithChecks (int startIndex, int length, bool fAlwaysCopy) {

            //Bounds Checking. 
            if (startIndex < 0) {
                throw new ArgumentOutOfRangeException("startIndex", Environment.GetResourceString("ArgumentOutOfRange_StartIndex")); 
            } 

            if (startIndex > Length) { 
                throw new ArgumentOutOfRangeException("startIndex", Environment.GetResourceString("ArgumentOutOfRange_StartIndexLargerThanLength"));
            }

            if (length < 0) { 
                throw new ArgumentOutOfRangeException("length", Environment.GetResourceString("ArgumentOutOfRange_NegativeLength"));
            } 
 
            if (startIndex > Length - length) {
                throw new ArgumentOutOfRangeException("length", Environment.GetResourceString("ArgumentOutOfRange_IndexLength")); 
            }
            Contract.EndContractBlock();

            if( length == 0) { 
                return String.Empty;
            } 
            return InternalSubString(startIndex, length, fAlwaysCopy); 
        }
 
        [System.Security.SecurityCritical]  // auto-generated
        unsafe string InternalSubString(int startIndex, int length, bool fAlwaysCopy) {
            Contract.Assert( startIndex >= 0 && startIndex <= this.Length, "StartIndex is out of range!");
            Contract.Assert( length >= 0 && startIndex <= this.Length - length, "length is out of range!"); 

            if( startIndex == 0 && length == this.Length && !fAlwaysCopy)  { 
                return this; 
            }
 
            String result = FastAllocateString(length);

            fixed(char* dest = &result.m_firstChar)
                fixed(char* src = &this.m_firstChar) { 
                    wstrcpy(dest, src + startIndex, length);
                } 
 
            return result;
        }

Remove:

Code:

        [System.Security.SecurityCritical]  // auto-generated
        [ResourceExposure(ResourceScope.None)] 
        [MethodImplAttribute(MethodImplOptions.InternalCall)] 
        private extern String RemoveInternal(int startIndex, int count);
 
        [System.Security.SecuritySafeCritical]  // auto-generated
        public String Remove(int startIndex, int count)
        {
            if (startIndex < 0) 
                throw new ArgumentOutOfRangeException("startIndex",
                    Environment.GetResourceString("ArgumentOutOfRange_StartIndex")); 
            Contract.Ensures(Contract.Result<String>() != null); 
            Contract.Ensures(Contract.Result<String>().Length == this.Length - count);
            Contract.EndContractBlock(); 
            return RemoveInternal(startIndex, count);
        }

        // a remove that just takes a startindex. 
        public string Remove( int startIndex ) {
            if (startIndex < 0) { 
                throw new ArgumentOutOfRangeException("startIndex", 
                        Environment.GetResourceString("ArgumentOutOfRange_StartIndex"));
            } 

            if (startIndex >= Length) {
                throw new ArgumentOutOfRangeException("startIndex",
                        Environment.GetResourceString("ArgumentOutOfRange_StartIndexLessThanLength")); 
            }
 
            Contract.Ensures(Contract.Result<String>() != null); 
            Contract.EndContractBlock();
 
            return Substring(0, startIndex);
        }

If you are using Substring for removal.. you should just be using Substring.

You also only ever Substring() when you substring from other than the beginning of the string.

sunama · 8 Jan 2012 at 22:51

Dj_Jestar said:
You also only ever Substring() when you substring from other than the beginning of the string.

So you are suggesting that substring should not be used to get a sub-string at the start of the string?

I would've thought that was the best place to use the substring.

E.g. if I want to get the first 3 letters of a string temp = "Hello";

The best way to do this is temp.Substring(0, 3);

Or are you suggesting that there is a better way of doing this?

I also use Substring when taking a substring from the middle of a string, e.g. temp.Substring(1, 3);

I read that .Remove takes 3 times as long to complete as Substring. Hence it is faster to use 2 Substring statements than a single .Remove call.

I also tested the code I wrote (and this is the most important thing to me: actual working results), and on average it is quicker to use my method than to call .Remove directly. The reason is that in many cases my method makes use of the faster Substring method. If I call .Remove, it will always take the .Remove code path (which is slower than Substring).

Here is an explanation of why substring is faster than remove.

Dj_Jestar · 9 Jan 2012 at 02:19

Substring() and Remove() are mutually exclusive. If you want a Substring.. use Substring. If you want to Remove a substring, use Remove().

I don't know why you can't see this, but that is exactly what your RemoveUsingSubstring() method does, amongst all the other belt-and-braces stuff (which I cringe at, but also double cringe because you throw ArgumentNullExceptions without the argument name...)
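
On the argument-name point, a minimal sketch of what naming the offending parameter looks like. `ArgumentChecks.Validate` is a hypothetical helper, not part of the method posted earlier.

```csharp
using System;

public static class ArgumentChecks
{
    // Hypothetical helper: validates its arguments and names the offending
    // parameter in each exception, so the caller can see what was wrong.
    public static string Validate(string theString, int startIndex)
    {
        if (theString == null)
            throw new ArgumentNullException("theString");

        if (startIndex < 0 || startIndex > theString.Length)
            throw new ArgumentOutOfRangeException(
                "startIndex", startIndex, "startIndex must be between 0 and the string length.");

        return theString;
    }
}
```

The name passed to the constructor is exposed through the exception's `ParamName` property and appears in its message.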

When you need the first n chars of a string (i.e. the substring of a string) your method uses Substring. When you don't want the substring, it uses Remove..
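
A short sketch of the two cases where removal and taking a substring give the same result. The class and method names are hypothetical; the one-argument Remove overload delegating to Substring(0, startIndex) is taken from the String source quoted earlier in the thread.

```csharp
using System;

public static class RemoveEquivalents
{
    // Dropping the first n chars: both calls return the tail of the string.
    public static bool PrefixRemovalMatches(string text, int n)
    {
        return text.Remove(0, n) == text.Substring(n);
    }

    // Dropping everything from 'start' onward: both calls return the head.
    // Assumes start is less than text.Length, as Remove(int) requires in the quoted source.
    public static bool SuffixRemovalMatches(string text, int start)
    {
        return text.Remove(start) == text.Substring(0, start);
    }
}
```

For example, with "Hello", `Remove(0, 2)` and `Substring(2)` both give "llo", and `Remove(3)` and `Substring(0, 3)` both give "Hel".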