“In an array with integers between 1 and 1,000,000 one value is in the array twice. How do you deter

Associate
Joined
21 Jun 2011
Posts
16
I have been asked the following by one of my tutors
“In an array with integers between 1 and 1,000,000 one value is in the array twice. How do you determine which one?”

How would i go about implementing the above in either java or c#?

Thanks in advance
 
Spunkey is right. You would do a quick sort, then go through the array to compare from one number to the previous. As soon as you find the one is equal to the previous, then you have found the number and the traversing can stop.
 
Theoretically, as you know the number of items in the array, once you have ordered them you should be able to simply cut the array in half, look at the first value of the second half of the array, if its 1 more than it should be then you know that the double number is in the first half of the array, if not then its in the second half. Rinse and repeat to narrow the range it can be in then you will just have to check a few values at the end to complete it.
 
Could you employ some sort of binary search once you've sorted it? Take the middle index, check the value there if it isn't what you expect then the duplicate must be in the first half, else repeat with the second half?

For instance the index 499,999 should contain the value 500,000. If it contains 499,999 then you know that the duplicate is between 0 and 499,000.

edit: great minds and all that... ^^^
 
Theoretically, as you know the number of items in the array, once you have ordered them you should be able to simply cut the array in half, look at the first value of the second half of the array, if its 1 more than it should be then you know that the double number is in the first half of the array, if not then its in the second half. Rinse and repeat to narrow the range it can be in then you will just have to check a few values at the end to complete it.
This is assuming the numbers are consecutive, but still a nice method to use.
 
they will be once you've sorted them? My understanding was that you would have 1,000,001 numbers from 1 - 1,000,000 but with a duplicate value somewhere.

If that's the case then you can just sum the numbers and subtract 500,000,500,000 (the sum of all numbers from 1 - 1,000,000) from the total to find your duplicate value.
 
Yea, we don't know from the wording of the question especially. As said though, the numbers must be consecutive and without break for our method to be of any use.
 
From the wording the array size is unknown but the numbers range from 1 - 1000000.

Just use a hashtable and add the numbers to it and check each time you iterate through the array if it's already in the table.
 
bit of a kludge but you could add the values in the array to a dictionary key, it will throw an error when it trys to add a duplicate key, catch the error and catch the value attempting to be added to the dictionary.
 
In Java you can use your own Comparator with the Arrays.sort method. So you will see the duplicate during the sort. Since there's no partial sort, you could then abort the sort by throwing an excepton once you've got your match so you don't have to sort the whole array.
 
bit of a kludge but you could add the values in the array to a dictionary key, it will throw an error when it trys to add a duplicate key, catch the error and catch the value attempting to be added to the dictionary.

That's not far off from the method I'd use to do this, but rather than a dictionary that throws I'd use a HashSet that returns true or false whether the insert succeeded.

In C# I'd define this as an extension method for IEnumerable<T>, something like the following:

Code:
void Main()
{
	var numbers = new [] { 2, 6, 8, 4, 5, 6, 9, 10, 1 };
	
	var duplicate = numbers.FirstDuplicate();
	
	duplicate.Dump();
}

public static class Extensions
{
	public static T FirstDuplicate<T>(this IEnumerable<T> source)
	{
		var hashSet = new HashSet<T>();
		
		foreach (var item in source)
		{
			if (!hashSet.Add(item))
			{
				return item;
			}
		}
		
		throw new InvalidOperationException("No matching elements");
	}
}

(This is a LinqPad snippet)

Similarly you could also define a FirstDuplicateOrDefault method to follow the patter of First/FirstOrDefault and Single/SingleOrDefault in LINQ.
This is similar to how LINQ actually implements the Distinct method, with a few modifications for null argument checking etc.
 
Last edited:
Yeah but you do know the number you're looking for, whenever you do the split you're looking for the expected number a la

Theoretically, as you know the number of items in the array, once you have ordered them you should be able to simply cut the array in half, look at the first value of the second half of the array, if its 1 more than it should be then you know that the double number is in the first half of the array, if not then its in the second half. Rinse and repeat to narrow the range it can be in then you will just have to check a few values at the end to complete it.

Could you employ some sort of binary search once you've sorted it? Take the middle index, check the value there if it isn't what you expect then the duplicate must be in the first half, else repeat with the second half?

For instance the index 499,999 should contain the value 500,000. If it contains 499,999 then you know that the duplicate is between 0 and 499,000.

edit: great minds and all that... ^^^

Good article: http://blog.computationalcomplexity.org/2005/03/finding-duplicates.html
 
Yeah but you do know the number you're looking for, whenever you do the split you're looking for the expected number a la

No we don't, all we know is it's an array with a value that is duplicate.

I don't see how a binary search would help find that, if you already know what number to search for then you don't need to search the array :)
 
Back
Top Bottom