VB.NET problem has me baffled

Right gents, I've been feverishly developing my natural language processing program and today, for the first time, I encountered something that completely baffled me. I solved it, but I had to use a very dirty solution, which is not ideal.

Perhaps some VB.NET expert can tell me what the hell is going on.

Below is my main subroutine, which calls the function (IsSimilarUserRecord) below it.

I haven't pasted all the code for IsSimilarUserRecord, as it is not important.


The parts I want you to take a look at are in bold (purple and pink)

main sub:

    Public Sub CompileShortListOfSimilarUserRecordsMain(ByVal mainUserRecordID As String, ByVal clientID As Integer, ByRef shortListOfSimilarUserRecordIDs As ArrayList) 'Optional ByVal userRecordLineContents As String = "")
        If String.IsNullOrEmpty(mainUserRecordID) Then
            errorLog.Add("ERROR179: record id input is empty (" & mainUserRecordID & "). Unable to compile short list of similar user records.")
            Exit Sub
        End If


        'we can't start a new short list until the previous short list has been completely emptied of user records
        shortListOfSimilarUserRecordIDs.Clear()
        shortListOfSimilarUserRecordIDs.Add("")

        'now load up the fields of the main user record into an array list
        Dim mainUserRecordFieldsArrayList As New ArrayList
        mainUserRecordFieldsArrayList = userRecordArrayHandler.ReturnGenericFieldsInArrayList(mainUserRecordID)
        mainUserRecordFieldsArrayList.Sort()

        'now go through all the user records which apply to the current user
        'load all the user record fields into arraylists, one by one.

        Dim arrayProcessor As New ArrayProcessingClass
        Dim localUserRecordIDAL As New ArrayList
        localUserRecordIDAL = clientInfoHolder(clientID).GetUserRecordIDArrayList

        Dim URIDIndexMax As Integer = localUserRecordIDAL.Count - 1
        For URIDIndex = 1 To URIDIndexMax 'do for each user record loaded in memory
            Dim tempURID As String = localUserRecordIDAL(URIDIndex).ToString
            Dim tempURGenericFieldArrayList As New ArrayList
            tempURGenericFieldArrayList = userRecordArrayHandler.ReturnGenericFieldsInArrayList(tempURID)
            tempURGenericFieldArrayList.Sort()

            Dim textProcessing As New TextProcessingClass
            If arrayProcessor.ArrayListStringEquals(mainUserRecordFieldsArrayList, tempURGenericFieldArrayList) Then
                'do nothing 'ignore the identical user record
            ElseIf (IsSimilarUserRecord(mainUserRecordFieldsArrayList, tempURGenericFieldArrayList)) Then 'check if user record array is similar to new user record
                'if it is we add the user record id to the short list
                shortListOfSimilarUserRecordIDs.Add(tempURID)
            End If
        Next URIDIndex

        'if none found
        If shortListOfSimilarUserRecordIDs.Count < 2 Then 'count = 0 or 1
            shortListOfSimilarUserRecordIDs.Clear()
        End If
    End Sub



    Public Function IsSimilarUserRecord(ByVal arraylist1 As ArrayList, ByVal arraylist2 As ArrayList, Optional ByVal allowableNumberOfElementsToBeNonIdentical As Integer = 1) As Boolean
        'stuff
    End Function





Now what is happening is that in the purple line (the ElseIf inside the loop) I am calling IsSimilarUserRecord. All arguments are passed ByVal (not ByRef). When I call IsSimilarUserRecord for the first time, mainUserRecordFieldsArrayList and tempURGenericFieldArrayList are passed into IsSimilarUserRecord without a problem. However, when IsSimilarUserRecord returns its boolean value, the 2 array lists, mainUserRecordFieldsArrayList and tempURGenericFieldArrayList, are both emptied of their contents. This should not be happening, as their contents were passed ByVal, so they should be unaltered after the boolean value is returned.

In itself this wouldn't be a problem but because the contents of mainUserRecordFieldsArrayList have to be repeatedly used (as it is contained in a For loop), after the 1st loop, that array list is useless.

Can somebody explain to me why the array lists are getting emptied whenever I call IsSimilarUserRecord and pass the 2 array lists to it?

Thanks
 
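This is happening because ArrayList is a reference type.
In .NET, a variable of a reference type holds a reference to the object, not the object itself.
When passing a reference type ByVal you are simply passing the reference by value, so it's a copy of the reference rather than a copy of the actual object itself. Both copies of the reference point at the same ArrayList, so any change made to it through one is visible through the other.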

See this link for an explanation: http://msdn.microsoft.com/en-us/magazine/cc301569.aspx

If you really want a copy of the object you'll need to clone the contents.
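
A minimal sketch of the point above, for anyone who wants to see it run. AddMarker is a made-up helper for illustration only and is not part of the code in this thread. It shows that a change made through a ByVal reference is visible to the caller, that reassigning the ByVal parameter is not, and that Clone gives you a separate list to work on:

```vbnet
Imports System
Imports System.Collections

Module ByValReferenceDemo
    ' Hypothetical helper: receives a copy of the reference, not a copy of the list.
    Sub AddMarker(ByVal items As ArrayList)
        items.Add("marker")        ' changes the one ArrayList object the caller also sees
        items = New ArrayList()    ' rebinds only this local copy of the reference
    End Sub

    Sub Main()
        Dim original As New ArrayList()
        original.Add("a")

        AddMarker(original)
        Console.WriteLine(original.Count)   ' 2: the Add was visible to the caller

        ' Clone creates a new ArrayList holding the same elements (a shallow copy).
        Dim copy As ArrayList = CType(original.Clone(), ArrayList)
        AddMarker(copy)
        Console.WriteLine(original.Count)   ' still 2: only the copy was changed
        Console.WriteLine(copy.Count)       ' 3
    End Sub
End Module
```

For a list of strings a shallow copy is normally enough, since strings are immutable.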
I should ask why you're clearing the contents of the ArrayLists in the subroutine though?
I would expect that a method named IsSimilarUserRecord does not make any modifications to its inputs.

This is a question I always ask in interviews, it's surprising the number of professional developers who aren't clear on the concepts of reference types and value types in .NET!

PS For heaven's sake, please use code tags when posting code too!
 
Thanks for your reply.

If you really want a copy of the object you'll need to clone the contents.

This is what I was forced to do in the main sub. It is a dirty solution and not as elegant as the simple If statement I previously had in place. I then send this copied version into IsSimilarUserRecord, where the array lists are being cleared (for some reason).

I should ask why you're clearing the contents of the ArrayLists in the subroutine though?

Are you referring to: shortListOfSimilarUserRecordIDs array list?

I would expect that a method named IsSimilarUserRecord does not make any modifications to its inputs.

IsSimilarUserRecord does make some modifications, but only to ensure the format of the array lists is correct. In general, after the "modifications", the contents and order of the array lists remain the same. With the input for which I am seeing this "error" occur, IsSimilarUserRecord makes no changes to the contents/order of the array lists.

Now, I still do not understand why the array lists (in IsSimilarUserRecord) are being emptied.

I can understand that if I clear the 2 array lists (in IsSimilarUserRecord), the array lists in the main sub will also be cleared (because ArrayList is a reference type, as you corrected me on earlier). However, at no stage am I clearing the array lists with code.

When we reach the end of IsSimilarUserRecord, both array lists are intact. Yet, the moment we get back to the main sub, those same 2 array lists are empty. Is this due to garbage collection, perhaps?

I'm still baffled.
 
What does the method "IsSimilarUserRecord" do?

Can you post the code? Have you tried commenting the code out of this method and just returning true to see if the same behaviour exists?

Edit:

This isn't garbage collection at work. The two variables are still reachable and therefore should not be marked for collection.
 
I'm having a play around with this code right now.

I've just passed the arraylists ByRef and all works as it should. I suspect that when I pass the arraylists ByVal, garbage collection is involved at the end of IsSimilarUserRecord.

In the final line of IsSimilarUserRecord, BOTH arraylists are fully intact. Something is happening between the point that IsSimilarUserRecord returns the boolean value and the point that the main sub receives this boolean value. During this time, those 2 arraylists are getting wiped out. This is not happening when the arraylists are passed ByRef.
 
Code:
    'true if 1 word away. False otherwise
    Private Function IsSimilarUserRecord(ByVal arraylist1 As ArrayList, ByVal arraylist2 As ArrayList, Optional ByVal allowableNumberOfElementsToBeNonIdentical As Integer = 1) As Boolean
        'make sure that the only empty string in both arrays is in element 0
        Try 'prepare array1
            While arraylist1.Contains("")
                arraylist1.Remove("")
            End While

            While IsNothing(arraylist1(0))
                arraylist1.RemoveAt(0)
            End While

            arraylist1.Insert(0, "")  'now insert a space in position 0
        Catch ex As Exception

        End Try


        Try 'prepare array2
            While arraylist2.Contains("")
                arraylist2.Remove("")
            End While

            While IsNothing(arraylist2(0))
                arraylist2.RemoveAt(0)
            End While

            arraylist2.Insert(0, "")  'now insert a space in position 0
        Catch ex As Exception

        End Try
        ' ------------------------------------------------------------------

        Dim arraylist1Count As Integer = arraylist1.Count
        Dim arraylist2Count As Integer = arraylist2.Count

        If arraylist1Count <> arraylist2Count Then Return False 'if the 2 arrays are of different size, then return false
        If arraylist1Count < 2 Then Return False 'if array list has no elements or only 1 element in it, then return false

        Dim ap As New ArrayProcessingClass
        ap.RemoveDuplicateEntriesInArrayList(arraylist1)
        ap.RemoveDuplicateEntriesInArrayList(arraylist2)

        arraylist1.Sort()
        arraylist2.Sort()

        Dim identicalElementCount As Integer = 0
        Dim maxIndex As Integer = arraylist1Count - 1
        Dim minimumIdenticalElementRequired As Integer = arraylist2Count - allowableNumberOfElementsToBeNonIdentical - 1
        For index = 1 To maxIndex
            Dim tempStringFromArrayList1 As String = arraylist1(index).ToString
            If arraylist2.Contains(tempStringFromArrayList1) Then
                identicalElementCount += 1
            End If
        Next

        Dim isSimilar As Boolean = False
        If identicalElementCount >= minimumIdenticalElementRequired Then
            isSimilar = True
        End If
        Return isSimilar 'at this point, both array lists are fully intact
    End Function
 
As MrRoper says, can you post up the code for the IsSimilarUserRecord method?
There's probably something going on in there that's causing it not to work as desired.

Incidentally, which version of .NET are you using? ArrayLists have effectively been deprecated since .NET 2.0 came out and you should be using a generic List.
If you're on 3.5 or above you may be able to take advantage of LINQ too if you're using generic collections, and there may be some LINQ stuff that will help you do what you're after.

EDIT: I see you've posted the code now, will take a look shortly.
 
...have you tried commenting the code out of this method and just returning true to see if the same behaviour exists?

I've just done this.
I set IsSimilarUserRecord to accept the arraylists ByVal.

I commented everything out, except for the return value.

It behaves as it should, ie. the array lists are not emptied.

So, for some reason, when I leave the other code in place (inside IsSimilarUserRecord), the array lists are being emptied at the point when the boolean value is returned. However, when that code is commented out, even when the boolean value is returned, the array lists are not being cleared.
 
Incidentally, which version of .NET are you using?

3.5.

you should be using a generic List.

Thanks for the suggestion...much appreciated.

I was previously used to using (normal) arrays and arraylists, which is why I've kept on using them. I've just checked out Lists and there are definitely some advantages to be had. If we use 'List', I can't see there being any reason to use ArrayList or arrays.

If I were using an earlier version of .NET though, would List be recognised? I don't wish to have issues with earlier versions of .NET, as that would involve a MAJOR code re-write (to accommodate earlier .NET versions).
 
Nothing immediately jumps out as to why you're getting the issues you are.
The notion of a function that modifies its inputs just doesn't sit right with me though.
Had a quick look at the logic and you should hopefully be able to replace the IsSimilarUserRecord function with the one below, which uses a bit of LINQ to do the task and doesn't modify the initial ArrayLists (I'd do a few tests to make sure it does work the same if you decide to use this though).

Code:
    Private Function IsSimilarLinq(ByVal elements1 As ArrayList, ByVal elements2 As ArrayList, Optional ByVal allowableNumberOfElementsToBeNonIdentical As Integer = 1) As Boolean

        Dim distinctElements1 = elements1.Cast(Of String).Distinct()
        Dim distinctElements2 = elements2.Cast(Of String).Distinct()

        If distinctElements1.Count <> distinctElements2.Count Then Return False

        Return distinctElements1.Intersect(distinctElements2).Count >= distinctElements1.Count - allowableNumberOfElementsToBeNonIdentical

    End Function

Doing it properly I would have a generic method to do this rather than taking in ArrayLists and casting them, but this version should just plug in to what you have already.

I'd be interested to see whether this works as expected.
 
Hi Haircut,

That's a very nice function you've written there. Makes everything nice and concise.

I may use this, though would this work with older versions of .NET?

I've nailed the problem down to 2 lines of code.
When these are commented out, the function performs as normal. When these 2 lines are left in, the arraylists are cleared after the boolean value is returned back to the calling Sub.

I have posted the code of IsSimilarUserRecord, again, but this time, with the 2 lines of code causing the problem.

Note that RemoveDuplicateEntriesInArrayList is a sub which removes any duplicate elements from the arraylist. The arrayList is passed ByRef.


Code:
    'true if 1 word away. False otherwise
    Private Function IsSimilarUserRecord(ByVal arraylist1 As ArrayList, ByVal arraylist2 As ArrayList, Optional ByVal allowableNumberOfElementsToBeNonIdentical As Integer = 1) As Boolean
        'make sure that the only empty string in both arrays is in element 0
        Try 'prepare array1
            While arraylist1.Contains("")
                arraylist1.Remove("")
            End While

            While IsNothing(arraylist1(0))
                arraylist1.RemoveAt(0)
            End While

            arraylist1.Insert(0, "")  'now insert a space in position 0
        Catch ex As Exception

        End Try


        Try 'prepare array2
            While arraylist2.Contains("")
                arraylist2.Remove("")
            End While

            While IsNothing(arraylist2(0))
                arraylist2.RemoveAt(0)
            End While

            arraylist2.Insert(0, "")  'now insert a space in position 0
        Catch ex As Exception

        End Try
        ' ------------------------------------------------------------------

        Dim arraylist1Count As Integer = arraylist1.Count
        Dim arraylist2Count As Integer = arraylist2.Count

        If arraylist1Count <> arraylist2Count Then Return False 'if the 2 arrays are of different size, then return false
        If arraylist1Count < 2 Then Return False 'if array list has no elements or only 1 element in it, then return false

        Dim ap As New ArrayProcessingClass
        ap.RemoveDuplicateEntriesInArrayList(arraylist1) '<-- problem line 1
        ap.RemoveDuplicateEntriesInArrayList(arraylist2) '<-- problem line 2

        arraylist1.Sort()
        arraylist2.Sort()

        Dim identicalElementCount As Integer = 0
        Dim maxIndex As Integer = arraylist1Count - 1
        Dim minimumIdenticalElementRequired As Integer = arraylist2Count - allowableNumberOfElementsToBeNonIdentical - 1
        For index = 1 To maxIndex
            Dim tempStringFromArrayList1 As String = arraylist1(index).ToString
            If arraylist2.Contains(tempStringFromArrayList1) Then
                identicalElementCount += 1
            End If
        Next

        Dim isSimilar As Boolean = False
        If identicalElementCount >= minimumIdenticalElementRequired Then
            isSimilar = True
        End If
        Return isSimilar 'at this point, both array lists are fully intact
    End Function


I'm still baffled as to why the array lists are intact at the end of IsSimilarUserRecord, but when the boolean value is returned, those same arrayLists are getting cleared. When those same array lists are passed ByRef, this does not happen. This only happens when using ByVal.

It could simply be that I shall just have to put this down as a mystery.

My current version of code (which does a shallow copy of the array lists, before sending them to IsSimilarUserRecord), works fine, but it is nice to know exactly why the arrayLists are being cleared.
 
The function I posted uses LINQ, which isn't available before .NET 3.5.
You should really be targeting an earlier version of the framework if this is an issue though.

Do you have the code for ArrayProcessingClass?
From what you've posted something appears to be happening in there.
 
Hi.

Backwards compatibility may be important here, with regards to the LINQ function you posted earlier. In which case, I will not include it in my code. However, I do appreciate your help.

I won't post the entire array processing class, as it is too long, but here is the RemoveDuplicateEntriesInArrayList sub:



Code:
    'input: an arraylist made up of strings
    'action: removes all duplicate entries
    Public Sub RemoveDuplicateEntriesInArrayList(ByRef stringArrayList As ArrayList)
        Dim newArrayList As New ArrayList
        newArrayList.Add("")

        While stringArrayList.Contains("")
            stringArrayList.Remove("")
        End While

        Dim indexMax As Integer = stringArrayList.Count - 1
        For index = 0 To indexMax
            Dim tempString As String = stringArrayList(index).ToString

            If Not newArrayList.Contains(tempString) Then
                newArrayList.Add(tempString)
            End If


        Next index

        stringArrayList.Clear()
        stringArrayList = CType(newArrayList.Clone, ArrayList) '<-- the highlighted line

    End Sub


I've also highlighted the line which could be causing the problem here.

Thanks.
 
Ah, that explains it.
It's because you're passing the references by value in the call to IsSimilarUserRecord.
This means that you're duplicating the reference to the object.
You then pass the duplicated reference by reference to RemoveDuplicateEntriesInArrayList, which clears the ArrayList.
You then set this reference to an instance of a new ArrayList.

So, at this point you have two references pointing to two separate ArrayLists.
Inside IsSimilarUserRecord you're pointing to the clone, outside you're pointing to the original (now empty) ArrayList.
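
The sequence described above can be reproduced in a few lines. This is a cut-down sketch, not the real code: ClearThenRebind stands in for RemoveDuplicateEntriesInArrayList and Middle stands in for IsSimilarUserRecord, and both names are invented for the example.

```vbnet
Imports System
Imports System.Collections

Module ByRefRebindDemo
    ' Stand-in for RemoveDuplicateEntriesInArrayList (ByRef parameter).
    Sub ClearThenRebind(ByRef items As ArrayList)
        Dim rebuilt As New ArrayList()
        rebuilt.Add("x")
        items.Clear()      ' empties the ORIGINAL object, which the outer caller still holds
        items = rebuilt    ' rebinds the variable that was passed in ByRef
    End Sub

    ' Stand-in for IsSimilarUserRecord (ByVal parameter).
    Function Middle(ByVal items As ArrayList) As Integer
        ClearThenRebind(items)   ' only Middle's own copy of the reference is rebound
        Return items.Count       ' counts the new list, so everything looks intact here
    End Function

    Sub Main()
        Dim callerList As New ArrayList()
        callerList.Add("x")
        callerList.Add("x")

        Console.WriteLine(Middle(callerList))   ' 1: inside, the list looks fine
        Console.WriteLine(callerList.Count)     ' 0: the caller is left with the cleared original
    End Sub
End Module
```

This is also why switching IsSimilarUserRecord to ByRef appeared to fix it: the rebinding then travels all the way back to the caller's variable, so the caller ends up pointing at the new list instead of the emptied one.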

I'll see if I can knock up a quick diagram to explain as I know it can be a little confusing!

EDIT: Added diagram for anyone that is confused how this works.
referencesy.png
 
I spotted this.

After I posted my last post, I was looking closely at the line I've highlighted in pink (above). That's when it clicked that that was the root of all the problems.

I've just made the corrections to that line and have changed it, so that a shallow copy is now what is done (and not a clone). It's now tested and working nicely.

Thank you for your help on this one Haircut...much appreciated.

The key post was MrRoper's, when he said to comment out the lines in IsSimilarUserRecord. This narrowed down the problem to the line highlighted in pink.

Thank you all for your help.

Haircut, you suggested perhaps switching to List (and moving away from ArrayList). What is the earliest version of .NET in which 'List' is recognised? I want to avoid backwards compatibility problems.
 
List has been there since .NET 2.0, which is when generics were first introduced.

What backwards compatibility problems are you worried about?
If you're currently using .NET 3.5 then you could be unknowingly putting all sorts of things in there that are 3.5 only if you don't know when they were introduced.
VS2008 can target .NET 2.0 as well, so if it really is a concern then you should be targeting this version.

PS I still stand by my original comments that the function shouldn't alter the input parameters in any way. Sticking to this would have solved all the problems :p
 
As I am using so many arraylists (and soon List), I just want to know what I am getting myself into, should I start to use List.

Originally, I was working off of .NET 2.0. Only recently did I make the switch to 3.5, so I'm pretty sure that most of my code is compatible with 2.0.

With regards to the Function not changing the original arraylist. I understand where you are coming from. I "sort of" agree with what you are saying, but for totally unrelated reasons, I have actually made it so that at the start of the Function, the arrayLists are first duplicated and then all the remaining operations are performed on the duplicates. This complies with your way of thinking.
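
For anyone following along, here is a rough sketch of what a version that leaves its inputs alone could look like with generic lists and no LINQ, so it should build against .NET 2.0. It is not the actual code from this project: DistinctNonEmpty, IsSimilar and allowedDifferences are names made up for the example, and the empty-list rule is only an approximation of the original checks.

```vbnet
Imports System
Imports System.Collections.Generic

Module NonMutatingSimilarityDemo
    ' Hypothetical helper: returns the distinct, non-empty entries in a NEW list.
    Function DistinctNonEmpty(ByVal source As List(Of String)) As List(Of String)
        Dim result As New List(Of String)()
        For Each item As String In source
            If Not String.IsNullOrEmpty(item) AndAlso Not result.Contains(item) Then
                result.Add(item)
            End If
        Next
        Return result
    End Function

    ' All work is done on the copies, so the callers' lists are never touched.
    Function IsSimilar(ByVal first As List(Of String), ByVal second As List(Of String), Optional ByVal allowedDifferences As Integer = 1) As Boolean
        Dim a As List(Of String) = DistinctNonEmpty(first)
        Dim b As List(Of String) = DistinctNonEmpty(second)
        If a.Count <> b.Count OrElse a.Count = 0 Then Return False

        Dim matches As Integer = 0
        For Each item As String In a
            If b.Contains(item) Then matches += 1
        Next
        Return matches >= a.Count - allowedDifferences
    End Function

    Sub Main()
        Dim mainFields As New List(Of String)(New String() {"red", "green", "blue"})
        Dim otherFields As New List(Of String)(New String() {"red", "green", "cyan"})
        Console.WriteLine(IsSimilar(mainFields, otherFields))   ' True: 2 of 3 entries match
        Console.WriteLine(mainFields.Count)                      ' 3: the input is untouched
    End Sub
End Module
```

Because nothing is cleared or reassigned, it no longer matters whether the lists are passed ByVal or ByRef.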

The only reason why the function I pasted had operations on the original inputs was because that particular function is used only by CompileShortListOfSimilarUserRecordsMain. No other sub/function uses it.

The only reason I placed the code into a function was to remove it from the Sub, to improve readability and clutter. Otherwise all that code would've been left in the Sub (which was originally the case).
 
Glad you got the problem sorted, but I have to say that if you want to keep compatibility with a certain version of .NET then the only 100% reliable way to do this is to target that framework. You shouldn't be targeting .NET 3.5 if you want to keep compatibility with 2.0.

If you build against .NET 3.5 then people will need the 3.5 framework installed. They won't be able to run it against 2.0 even if you haven't included any .NET 3.5 specific features.

Both VS2008 and VS2010 allow you to target .NET 2.0, so target it if that's the minimum framework you want to support. Visual Studio will also adjust its IntelliSense to only show you the areas of the framework which are compatible with the currently targeted framework, so you will know if something is not compatible straight away (VS2010 does, anyway!)
 
That's good advice. I shall make the alterations in the application's properties, changing the .NET version from 3.5 down to 2.0.

I have an innate need to always run the latest and greatest software. Unfortunately, not everybody does as I do. Many offices for instance, use very very old software (eg. old operating systems and office applications), with no updates.

Incidentally, I spent about 8-10 hours yesterday, re-writing the code, changing container usage from array lists to generic lists. This was a bit of a pain, but from reading around, it is clear that on 64-bit operating systems, there are major memory usage benefits when it comes to using generic lists, compared to array lists.

Thanks for your help gents.
 
Surely a simpler replacement for IsSimilarLinq would be (in C#, I don't use VB so won't attempt to translate):
Code:
return elements1.Count == elements2.Count 
  && elements1.Cast<string>().All(elements2.Contains);
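
For VB users, a rough translation of the C# one-liner might look like the sketch below. It needs LINQ, so .NET 3.5 or later, and the function name SameCountAndAllContained is invented for the example. Note that it is stricter than the functions posted earlier: there is no allowableNumberOfElementsToBeNonIdentical tolerance, so every element of the first list has to appear in the second.

```vbnet
Imports System
Imports System.Collections
Imports System.Linq

Module AllContainsDemo
    ' True when both lists have the same Count and every string in elements1
    ' is also present in elements2. Neither list is modified.
    Function SameCountAndAllContained(ByVal elements1 As ArrayList, ByVal elements2 As ArrayList) As Boolean
        Return elements1.Count = elements2.Count AndAlso _
               elements1.Cast(Of String)().All(Function(s) elements2.Contains(s))
    End Function

    Sub Main()
        Dim a As New ArrayList(New String() {"a", "b", "c"})
        Dim b As New ArrayList(New String() {"c", "b", "a"})
        Dim c As New ArrayList(New String() {"a", "b", "d"})

        Console.WriteLine(SameCountAndAllContained(a, b))   ' True: same items, order ignored
        Console.WriteLine(SameCountAndAllContained(a, c))   ' False: "c" is not in the third list
    End Sub
End Module
```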
 