.Net question about List - is '.add' threadsafe?

I spent half an hour doing an experiment.

With Lists, everytime we add and the (natural) limit of the list is hit, the capacity, doubles. This doubling, allegedly, uses resources and wastes time. As you have stated, by resizing the array, before using the series of .add methods, should, in theory save time.

I tried this and it didnt work.

I set up a timing system and in general, I got the following results:

typical load time (when i resize the array, before using .add)..............7.2s
typical load time (when I do not resize the list, before using .add) ......7.0s

I thought that this was a little bizarre, but I have to rely on what I see and the results of my tests, as opposed to the theory. Hence, I shall not be resizing the capacity of the List, before calling .add.

How do mean resize the array? You set the init size via the ctor then add away, it's only a "performance tweak" if you know the size of the list upfront and have a large number (500k+?). If do you arguablebly you'd be better with other structures. Also, be very careful of that array.copy, copying a large number of items can be slow if you're only just over your init size where as exponential growth (the default) does a lot of smaller array copies.

I used this code (admittedly not thread safe but is good for benchmarking) and seen decent saving in large lists. Not so much when you have <1m items:

Default growth with 100 items: 0.0042 ms
Init size and growth set at 10 with 100 items: 0.0026 ms

Default growth with 500 items: 0.0114 ms
Init size and growth set at 250 with 500 items: 0.0107 ms

Default growth with 15000 items: 0.2524 ms
Init size and growth set at 1000 with 15000 items: 0.254 ms

Default growth with 12000 items: 0.2179 ms
Init size and growth set at 10000 with 12000 items: 0.2065 ms

Default growth with 12000 items: 0.2187 ms
Init size and growth set at 6000 with 12000 items: 0.1988 ms

Default growth with 27000 items: 0.4754 ms
Init size and growth set at 6000 with 27000 items: 0.49 ms

Default growth with 10100 items: 0.1942 ms
Init size and growth set at 10000 with 10100 items: 0.1927 ms

Default growth with 100000 items: 10.5534 ms
Init size and growth set at 100000 with 100000 items: 1.1964 ms

Default growth with 150000 items: 2.3558 ms
Init size and growth set at 100000 with 150000 items: 4.1924 ms

Default growth with 10000000 items: 1250.9683 ms
Init size and growth set at 9000000 with 10000000 items: 1452.7528 ms

Be aware all these are in milliseconds so this is extremely in micro optimisations category.
 
How do mean resize the array?

Sorry...not array, but List. (I'm so used to using arrays and arrayList)

So for example.

testList.capacity = newSizeOfList


Not so much when you have <1m items:

Word list is 114718 in size, so this would explain why the benefits are not apparent.

Be aware all these are in milliseconds so this is extremely in micro optimisations category.

This would also explain why I am getting no performance gain by defining the capacity of the List.
 
Could you upload the code somewhere? Pastebin or something?

This is the code of LoadFileContentsIntoList.
I have decided to keep the

List.capacity = xxx

line in the code as it doesnt do any harm and may help when dealing with huge lists.


Code:
    'given a file (dir and name) this sub loads its contents, line by line, into a given array
    Public Sub LoadFileContentsIntoList(ByVal directoryAndFileName As String, ByRef fileContentsList As List(Of String), Optional ByVal extract As Boolean = True, Optional ByVal useCache As Boolean = False, Optional ByVal includeDuplicates As Boolean = True)
        If File.Exists(directoryAndFileName) Then
            Dim extractYesOrNo As String
            If extract = True Then
                extractYesOrNo = "yes"
            Else
                extractYesOrNo = "no"
            End If


            Dim contentsOfFile As String = GetDataFromFileByPrecedingText(directoryAndFileName, "[all]", extractYesOrNo, useCache) ', False, False)
            fileContentsList.Capacity = fileContentsList.Capacity + GetNumberOfLinesInFile(directoryAndFileName, useCache) + 1

            If extract = False Then
                Dim ap As New ArrayProcessingClass
                ap.LoadMultipleLineContentsIntoEmptyList(contentsOfFile, fileContentsList, False)
                Exit Sub
            End If


            Dim maxNumberOfCharsInBigString As Integer = contentsOfFile.Length - 1
            Dim ignoreLine As Boolean = False
            Dim maxNumberOfChars2 As Integer = maxNumberOfCharsInBigString - 5
            Dim tp As New TextProcessingClass
            For charNumber = 0 To maxNumberOfCharsInBigString
                If extract = True Then
                    ''donot copy lines which start with ''''', into the array
                    If charNumber < maxNumberOfChars2 Then
                        If String.Equals(contentsOfFile.Substring(charNumber, 5), "'''''") Then
                            ignoreLine = True
                        End If
                    End If

                    If Char.Equals(contentsOfFile.Chars(charNumber), NEWLINE) Then
                        ignoreLine = False
                    End If
                End If


                If ignoreLine = False Then
                    If String.Equals(contentsOfFile.Chars(charNumber), "{") Then
                        'keep going until we hit "}"
                        Dim internalIndex As Integer
                        Dim startInternalIndex As Integer = charNumber + 1
                        For internalIndex = startInternalIndex To maxNumberOfCharsInBigString
                            While cpuUsage >= maxCpuUsage 'performance management
                                Threading.Thread.Sleep(50) 'halts thread if cpu usage is too high
                            End While

                            If String.Equals(contentsOfFile.Chars(internalIndex), "}") Then
                                'now assign the small string to the array, } amd { are excluded
                                Dim tempLineContents As String = contentsOfFile.Substring(charNumber + 1, internalIndex - charNumber - 1)
                                tempLineContents = tp.RemoveCurlyBracketsFromStartAndEndOfString(tempLineContents)

                                If includeDuplicates Then
                                    fileContentsList.Add(tempLineContents)
                                Else
                                    If Not fileContentsList.Contains(tempLineContents) Then fileContentsList.Add(tempLineContents)
                                End If

                                charNumber = internalIndex + 2 'this takes into account the carriage return char
                                Exit For
                            End If
                        Next internalIndex
                    End If
                End If
            Next charNumber
        Else
            errorLog.Add("ERROR27: An error occurred while reading " & directoryAndFileName & _
                    ". Operation aborted.") ', MsgBoxStyle.OkOnly, "File Read Error")
        End If

        fileContentsList.TrimExcess()
    End Sub
 
What's with the "performance management" Thread.Sleep? It's not really a surprise it took 8 minutes. Why do you not want "CPU usage to be high"?
 
My belief is that any program (especially an "intelligent" program), must have the ability to throttle/monitor its own resources. The idea is to avoid the scenario of locking up the computer with near 100% cpu usage. My belief is that the computer which runs this application must be able to deal with other stuff as well.

Performance management is still in an experimental phase and I shall probably be creating a completely separate class which deals with this. Rather than placing sleep/wait statements all over the place, it might be easier to simply hold the entire program.

In any case, cpu usage hasn't really been dealt with.

Initially, when I developed the application, the idea was to be able to run this program on any machine (even a mobile phone). As the program developed, I created a server - client scenario, which now allows the clients to be run on a slow PCs, while the main work is done by the Server.

If there is no cpu management, it is quite possible that the server could run at 100 cpu utilization 24/7. The AI is going to be fairly advanced which means that if it is sitting idle, it shall be able to go on the net and research certain subjects. It shall also have the ability to make posts on the official website. The idea is to also allow it to "tweet" and possibly even post on forums (though I haven't yet come up with the algorithm for this).

The idea is that the AI will be continuously evolving and shall never be sitting idle. If I let it go wild, it will quite happily eat 100% of the cpu, 24/7. IMO CPU management is important.
 
Use thread priority for that. Sleeping for 50ms is a huge amount of time. Even sleeping for 1ms will slow it down massively. If you have an intensive task that's not necessarily urgent, just set thread priority to low. And an urgent task to high... Let the OS deal with what gets done when. I can guarantee you its scheduler is far more advanced than Thread.Sleep ;)
 
Those were my intial thoughts, however, my understanding is as follows:

Lets say AICore (my program name), is receiving input/output requests at a rate of say 10/second. To process a single request requires a fair amount of processing power and this, when you consider that 80% of the code has not yet been written. By time I get close to completing the final version 1 (end of this year), each input/output request from a client will completely bog the server down.

Consider also that AICore (while processing input/output requests from clients), will also be going on the internet and looking up various "terms". It shall be finding meanings to words and be checking whether or not certain statements which users have written are true or false, there is heavy emphasis on web browsers being opened and closed at rapid speed.

Now, if I were to set a thread priority for each thread (consider that there will be 100s of threads and processes working concurrently), it will become very difficult to manage which thread should have a high priority and which has a low priority. Simply to remember all the different threads which may (or may not) be running, during the coding process would be a nightmare. For this reason, I felt it better to simply use a bog standard cpu throttling mechanism, whereby AICore (or the user) can throttle the entire program and not individual threads.

At present I'm at 20k+ lines of code. By the Summer I will hit the 30k mark and by December should be at 45k. As you can imagine, identifying which threads gets what priority could be prove a nightmare.

To get around this problem I was thinking about throttling the ENTIRE program, at certain points. I felt that using the CPUsage was the best method to achieve this.

Also, as I understand, even if a program has the lowest priority, CPU usage may still hit 100%.

Bear in mind, I'm still thinking how best to tackle the issue of cpu usage, so I'm just sounding out my thoughts.
 
I think Phyre's point is that you just set all worker threads to low priority, and let the OS sort it out for you. :)

Though tbh, if you really will be serving 10+/second requests, you should look at using a cluster/hive instead of one machine.
 
Yeah.

OR, you could use some form of message passing pattern (check out MPAPI at codeplex). So you have X number of threads (based on number of processor cores usually) and then pass tasks to those threads like a thread pool. In the main thread set each "task" a priority level and put it in the queue accordingly. That way high-priority tasks get processed before low-priority, but at least everything gets finished smoothly without having to resort to Sleep hacks.

Yes, low priority threads can still take the CPU to 100%. What's wrong with that? If a higher priority thread comes along the processor will devote more time to processing it.
 
Back
Top Bottom