“In an array with integers between 1 and 1,000,000 one value is in the array twice. How do you deter

XXL1990 · 12 Jun 2012 at 16:05

I have been asked the following by one of my tutors
“In an array with integers between 1 and 1,000,000 one value is in the array twice. How do you determine which one?”

How would i go about implementing the above in either java or c#?

Thanks in advance

GravyMonster · 12 Jun 2012 at 16:08

Do you need to find the location of the duplicate or remove it?

First thing would be to sort the array then traverse and compare current/previous values. There may be faster ways though, depending on what you need to do.

sachinbadiani · 12 Jun 2012 at 16:32

Spunkey is right. You would do a quick sort, then go through the array to compare from one number to the previous. As soon as you find the one is equal to the previous, then you have found the number and the traversing can stop.

fez · 12 Jun 2012 at 16:56

Theoretically, as you know the number of items in the array, once you have ordered them you should be able to simply cut the array in half, look at the first value of the second half of the array, if its 1 more than it should be then you know that the double number is in the first half of the array, if not then its in the second half. Rinse and repeat to narrow the range it can be in then you will just have to check a few values at the end to complete it.

SoapSurgeon · 12 Jun 2012 at 16:56

Could you employ some sort of binary search once you've sorted it? Take the middle index, check the value there if it isn't what you expect then the duplicate must be in the first half, else repeat with the second half?

For instance the index 499,999 should contain the value 500,000. If it contains 499,999 then you know that the duplicate is between 0 and 499,000.

edit: great minds and all that... ^^^

GravyMonster · 12 Jun 2012 at 16:59

fez said:
Theoretically, as you know the number of items in the array, once you have ordered them you should be able to simply cut the array in half, look at the first value of the second half of the array, if its 1 more than it should be then you know that the double number is in the first half of the array, if not then its in the second half. Rinse and repeat to narrow the range it can be in then you will just have to check a few values at the end to complete it.

This is assuming the numbers are consecutive, but still a nice method to use.

SoapSurgeon · 12 Jun 2012 at 17:00

Spunkey said:
This is assuming the numbers are consecutive, but still a nice method to use.

they will be once you've sorted them? My understanding was that you would have 1,000,001 numbers from 1 - 1,000,000 but with a duplicate value somewhere.

GravyMonster · 12 Jun 2012 at 17:07

Could be the case, but OP only states that the array contains numbers from 1-1,000,000, not that it has 1,000,000 values in the array.

Anyway I'm just splitting hairs, that method is the best for consecutive numbers IMO.

Haircut · 12 Jun 2012 at 17:08

SoapSurgeon said:
they will be once you've sorted them? My understanding was that you would have 1,000,001 numbers from 1 - 1,000,000 but with a duplicate value somewhere.

If that's the case then you can just sum the numbers and subtract 500,000,500,000 (the sum of all numbers from 1 - 1,000,000) from the total to find your duplicate value.

fez · 12 Jun 2012 at 17:09

Yea, we don't know from the wording of the question especially. As said though, the numbers must be consecutive and without break for our method to be of any use.

moogle · 12 Jun 2012 at 17:39

From the wording the array size is unknown but the numbers range from 1 - 1000000.

Just use a hashtable and add the numbers to it and check each time you iterate through the array if it's already in the table.

dazzerd · 12 Jun 2012 at 20:53

bit of a kludge but you could add the values in the array to a dictionary key, it will throw an error when it trys to add a duplicate key, catch the error and catch the value attempting to be added to the dictionary.

ProfessorFate · 12 Jun 2012 at 23:35

In Java you can use your own Comparator with the Arrays.sort method. So you will see the duplicate during the sort. Since there's no partial sort, you could then abort the sort by throwing an excepton once you've got your match so you don't have to sort the whole array.

Haircut · 13 Jun 2012 at 09:31

dazzerd said:
bit of a kludge but you could add the values in the array to a dictionary key, it will throw an error when it trys to add a duplicate key, catch the error and catch the value attempting to be added to the dictionary.

That's not far off from the method I'd use to do this, but rather than a dictionary that throws I'd use a HashSet that returns true or false whether the insert succeeded.

In C# I'd define this as an extension method for IEnumerable<T>, something like the following:

Code:

void Main()
{
	var numbers = new [] { 2, 6, 8, 4, 5, 6, 9, 10, 1 };
	
	var duplicate = numbers.FirstDuplicate();
	
	duplicate.Dump();
}

public static class Extensions
{
	public static T FirstDuplicate<T>(this IEnumerable<T> source)
	{
		var hashSet = new HashSet<T>();
		
		foreach (var item in source)
		{
			if (!hashSet.Add(item))
			{
				return item;
			}
		}
		
		throw new InvalidOperationException("No matching elements");
	}
}

(This is a LinqPad snippet)

Similarly you could also define a FirstDuplicateOrDefault method to follow the patter of First/FirstOrDefault and Single/SingleOrDefault in LINQ.
This is similar to how LINQ actually implements the Distinct method, with a few modifications for null argument checking etc.

ProfessorFate · 13 Jun 2012 at 19:26

Thinking about it, not sure my idea would work in practice due to the number of comparisons the sort actually does :confused:

Brian The Snail · 13 Jun 2012 at 19:34

Why not just take the mode of the array?

slylittlefox · 13 Jun 2012 at 20:04

Clever answer ^

But I'd guess that a binary search is probably the most time efficient solution

moogle · 13 Jun 2012 at 20:11

slylittlefox said:
Clever answer ^

But I'd guess that a binary search is probably the most time efficient solution

Wouldn't you need to know the value you're searching for in the first place?

slylittlefox · 13 Jun 2012 at 20:19

Yeah but you do know the number you're looking for, whenever you do the split you're looking for the expected number a la

fez said:
Theoretically, as you know the number of items in the array, once you have ordered them you should be able to simply cut the array in half, look at the first value of the second half of the array, if its 1 more than it should be then you know that the double number is in the first half of the array, if not then its in the second half. Rinse and repeat to narrow the range it can be in then you will just have to check a few values at the end to complete it.

SoapSurgeon said:
Could you employ some sort of binary search once you've sorted it? Take the middle index, check the value there if it isn't what you expect then the duplicate must be in the first half, else repeat with the second half?

For instance the index 499,999 should contain the value 500,000. If it contains 499,999 then you know that the duplicate is between 0 and 499,000.

edit: great minds and all that... ^^^

Good article: http://blog.computationalcomplexity.org/2005/03/finding-duplicates.html

moogle · 13 Jun 2012 at 20:21

slylittlefox said:
Yeah but you do know the number you're looking for, whenever you do the split you're looking for the expected number a la

No we don't, all we know is it's an array with a value that is duplicate.

I don't see how a binary search would help find that, if you already know what number to search for then you don't need to search the array