C# for statement and arrays...

Goofball · 4 Jan 2012 at 08:56

Or just stick to the enumerable/queryable and stay immutable.

I hardly ever need to mutate collections any more.

Also, to those arguing that List is slower: Keep in mind that the performance difference between array and list is much greater in debug build or when you start the program with a debugger attached. With a release build started on it's own (outside VS, or the "start without debugging" option) the difference is entirely gone for index access since it's pretty much all in-lined.

Any measurable performance difference you see is likely:
a) not setting the capacity correctly and having it grow out of your control
b) or running a debug build.

The Asgard · 5 Jan 2012 at 18:21

Goofball said:
Or just stick to the enumerable/queryable and stay immutable.

I hardly ever need to mutate collections any more.

Also, to those arguing that List is slower: Keep in mind that the performance difference between array and list is much greater in debug build or when you start the program with a debugger attached. With a release build started on it's own (outside VS, or the "start without debugging" option) the difference is entirely gone for index access since it's pretty much all in-lined.

Any measurable performance difference you see is likely:
a) not setting the capacity correctly and having it grow out of your control
b) or running a debug build.

It's the adds that kill the speed, especially with large lists.

Dj_Jestar · 5 Jan 2012 at 18:22

Hey look everyone, The Asgard is back, and he's still not reading the thread before quickfire quoting and posting soundbites :E

The Asgard · 5 Jan 2012 at 18:24

Dj_Jestar said:
Hey look everyone, The Asgard is back, and he's still not reading the thread before quickfire quoting and posting soundbites :E

Oh look the jealous school boy is back. Hello.

Dj_Jestar · 5 Jan 2012 at 18:30

Hardly jealous old chap, merely point out the truth of it. And hardly a school boy, neither. Have you actually read the multiple times it has already been stated that it is in the adding to lists that is slow? Thought not. :E

The Asgard · 5 Jan 2012 at 18:37

Dj_Jestar said:
Hardly jealous old chap, merely point out the truth of it. And hardly a school boy, neither.

Then please act your age and not your shoe size and I might post more in this forum.

Dj_Jestar · 5 Jan 2012 at 18:39

So you haven't read the thread, I take it?

Goofball · 5 Jan 2012 at 19:07

The Asgard said:
It's the adds that kill the speed, especially with large lists.

Lists are identical in performance to array for the same operations.

The performance of Add does not degrade with size unless you've set the initial capacity too low. The exact same is true for an array, if you cannot predict the size of the array, you end up having to resize at the same cost.

You gain or loose performance by not managing the size and using the appropriate operations on them, not by choosing List over Array.

sunama · 5 Jan 2012 at 19:23

Goofball said:
Any measurable performance difference you see is likely:

I'm not so sure about this.
In theory, you should be right. But a while ago, I was told about the capacity parameter of a list. I read that if you set the capacity to the correct size BEFORE you start adding (using .Add), you will gain some performance.

Well, I tried this in my own program and observed virtually no performance increase. I wrote about this, in a thread, on this forum.

In summary, after spending a good 3 hours adding approximately 100 ".capacity = " statements and gaining virtually nothing, I gave up on this and no longer worry about stating the capacity of a list. I just use ".Add" as I go.

If you do a search on this form, I believe I wrote down some numbers on this, as well.

EDIT: in my personal experience (forget theory...I'm talking actual performance, in numbers), I'm with Asgard on this...arrays beat Lists, hands down, every time. Where Lists win, is where you need to be able to manipulate the collection (eg, add, remove, reverse, etc). But if you were creating an array which was readonly, I would go with array, every time.

I'm going to run a simple test, comparing lists vs array. results up soon.
Give me a few minutes.

Goofball · 5 Jan 2012 at 19:54

Code:

var list = new List<string>();
//var list = new List<string>(capacity); 
//var list = new List<string>(new string[capacity]);            
//var list = new string[capacity];

sw.Start();
            
for(int i=0;i < size; i++)
    list.Add("a"); // list[i] = "a";

 sw.Stop();

10^8 Assignments, average 3 runs

Using array: 614ms
Using list index operator: 611ms
Using list Add(): 830 ms

Using list Add without setting capacity: 1620ms

sunama · 5 Jan 2012 at 19:59

Okay.
Here are the test times.
This compares loading a list with random strings VS loading an array with random strings.
The list uses ".Add".
The collections are loaded using a for loop.
Iterator count = 9999999
While the size of the array was specified during declaration,
The lists capacity was also specified before loading it.

Number of milliseconds that Array is faster than List at loading:
-6
212
10
105
155
-68
22.

Typically, full loading time of both the array and list is about 10000 MS, so the above numbers are very similar.

Personally, looking at those numbers, there is very little to choose between List and Array, with Array marginally faster.

Now, lets see what happens when we don't specify the capacity of the list and we just load it up, allowing it to resize dynamically, as it increases in size.

Number of milliseconds that list with fixed capacity is faster than List with dynamic capacity.

447
161
214
126
277

From the above results we see that if you specify the capacity of the list, performance of the list loading improves. However, the increase in performance is once again, very small.

If the full list loading procedure is about 5000MS and you can shave off about 200MS off of that, the performance improvement is about 3%.

In summary: if you were loading an array and list, the speed is as follows:

1. Array (fastest)
2. List (pre-specified capacity, using .Add)
3. List (dynamic capacity, capacity is increased dynamically when required, using .Add)

The above order also tallies with Goofball's tests.

Here's the code for the final test:

Code:

        private void TestButton2_Click(object sender, EventArgs e)
        {
            Thread.CurrentThread.Priority = ThreadPriority.Highest;
            int indexMax = 9999999;


            List<string> theList = new List<string>(); //fixed capacity
            theList.Capacity = indexMax;
            var theArray = new string[indexMax];

            List<string> theList2 = new List<string>(); //dynamic capacity


            Stopwatch stp = new Stopwatch();

            stp.Start();
            
            

            for (var index = 0; index < indexMax; index++)
            {
                theList.Add(MathProcessingClass.GetRandomString(5,false));
            }
            stp.Stop();

            var elapsedMS_loadListFixed = stp.ElapsedMilliseconds;


            stp.Reset();
            stp.Start();
            //Random rand = new Random();
            for (var index = 0; index < indexMax; index++)
            {
                //theArray[index] = MathProcessingClass.GetRandomString(5, false); //this was used for the array loading
                theList2.Add(MathProcessingClass.GetRandomString(5, false));

            }
            stp.Stop();

            var elapsedMS_loadListDynamic = stp.ElapsedMilliseconds;


            MessageBox.Show("number of milliseconds which fixed capacity list loads faster than dynamic capacity list: " + (elapsedMS_loadListDynamic - elapsedMS_loadListFixed).ToString());
        }

EDIT: I wonder if the OP is still subscribed to this thread

HumbleCoder · 7 Jan 2012 at 14:02

Not to be picky, but setting the capacity of a list is not "fixing the size" it's preallocating the initial array size and in the example above you're doing it in a bad way. You should pass it via the ctor or it will allocate the initial one then reallocate when you set the capacity. Not that it has any bearing on the timings.

Setting the capacity is a double edged sword anyway because it will grow its internal array at the capacity you set, the default exponential growth one is a lot better for 99% of the situations.

Anyway, I feel the whole point is a bit of moot one because as a general rule if you're trying to optimise operations at this level then you probably have bigger problems in your algorithm, if your algorithm is prefect then you're working in the wrong language. It's very common among Python and Ruby devs to write C modules for performance sensitive stuff. (Of course there are a few exceptions but unless you're SO or bigger you're not one of them.)

sunama · 7 Jan 2012 at 14:38

HumbleCoder said:
Not to be picky, but setting the capacity of a list is not "fixing the size" it's preallocating the initial array size and in the example above you're doing it in a bad way. You should pass it via the ctor or it will allocate the initial one then reallocate when you set the capacity. Not that it has any bearing on the timings.

The example I gave above was purely from the perspective of performance/speed - for test purposes...so no guarantee on it looking pretty.

I am however, always looking to learn performance tips/tricks from others (I'd be a fool not to).

If I declare the size of the list, in the constructor, would this help speed up the program?

With regards to performance, I tried writing a few modules in C++ and then imported them into my c# program (my main program is in c#). I found that this didnt actually increase the speed of the entire program. What I found is that everything ran at the same speed as the code which was written in c#.

It's almost as if when you import a DLL (c++) code into c#, it somehow gets converted into c# code and then runs as c# code, defeating the object of using c++ (which is faster but takes more time to code).

There are definitely some algorithms which my program uses, coded in c#, which I would love to see run faster (perhaps with c++ coding, by importing the c++ code as a DLL).

If you have any advice/tips I am all ears.

Tavalin · 7 Jan 2012 at 19:16

sunama said:
If I declare the size of the list, in the constructor, would this help speed up the program?

Yes but probably not notably - this issue is that creating a list without defining the size in the constructor creates a list with it's default size. If you set the capacity afterwards then it's likely to cause the list to have to create a new internal array with the new size you've set.

If you'd passed the size through in the constructor the internal array would've been set up with the correct size from the start.

sunama · 7 Jan 2012 at 20:51

I see your point.

But in my example, you will see that when I resized the list, it was empty.
So, although there is another "step" involved, the process of changing the size of the list would have little negative impact on performance.

Now, had the list contained some elements and then I changed its capacity, the list would need to be recreated and copied (this would carry a negative penalty).

Am I thinking along the right lines?

Tavalin · 7 Jan 2012 at 21:18

I wasn't trying to quantify how much of a performance increase you'd gain by defining the size in the constructor. I was merely pointing out that doing an additional "step" isn't going to be faster than not doing the additional step.

Now, had the list contained some elements and then I changed its capacity, the list would need to be recreated and copied (this would carry a negative penalty).

I would assume it would carry a negative penalty regardless of whether you have added anything to the list or not. I would imagine the underlying data structure would have a fixed sized and need to be recreated and copied anyway.