Question for database types..

I have 2 database files from 1 table exported as .xls

one from last year and one from this year, each with thousands of rows. Is there a program of some sort that will give me the rows that are in the new one but not in the old,

i.e. the difference between the two?

Help appreciated!
 
Can't you just do a VLOOKUP on the new sheet against the old one? Wherever you get "#N/A" the line isn't in the old one; the ones it finds are in both.

Example:
[screenshot of a VLOOKUP example]
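
If Excel isn't handy, the same lookup idea works as a quick script. This is only a rough sketch: it assumes both exports have been saved as CSV (old.csv / new.csv are placeholder names) and that a whole row is what's being matched.

import csv

def rows(path):
    with open(path, newline="") as f:
        return [tuple(row) for row in csv.reader(f)]

# every row from last year's export, used as the lookup table
old_rows = set(rows("old.csv"))

# keep the rows from this year's export that never appear in last year's
new_only = [row for row in rows("new.csv") if row not in old_rows]

for row in new_only:
    print(",".join(row))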
 
That only works if the lines he is comparing are all unique.

* Insert the rows from both files into one temp table.
* SELECT rowdata FROM (SELECT rowdata, COUNT(*) AS cnt FROM temptable GROUP BY rowdata) t WHERE cnt < 2;
* Boosh, you've got the difference between the two files. If the rowdata is in both files, it won't come back from the select.

Not exactly rocket science.
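
If SQL isn't to hand, the same merge-and-count idea can be run as a script against the two exports. Just a sketch, assuming the sheets are saved as CSV first (old.csv / new.csv are made-up names) and each whole row is treated as the rowdata:

import csv
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temptable (rowdata TEXT)")

# load every row from both exports into one table
for path in ("old.csv", "new.csv"):
    with open(path, newline="") as f:
        for row in csv.reader(f):
            conn.execute("INSERT INTO temptable (rowdata) VALUES (?)",
                         (",".join(row),))

# rows that appear only once across the two files, i.e. in one file but not the other
query = ("SELECT rowdata FROM "
         "(SELECT rowdata, COUNT(*) AS cnt FROM temptable GROUP BY rowdata) t "
         "WHERE cnt < 2")
for (rowdata,) in conn.execute(query):
    print(rowdata)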
 
I'm not too sure about the data you have, but this could work.

Copy all the values into one list. In the next column put in the formula '=COUNTIF(A:A,A2)>1'.

This will give the result TRUE if the value is in the list more than once. If you filter for FALSE, that will be your list of new values.
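
Same caveat about the data, but here's that COUNTIF idea as a small script (a sketch only; old.csv / new.csv are placeholder names for the two exports saved as CSV):

import csv
from collections import Counter

def rows(path):
    with open(path, newline="") as f:
        return [tuple(row) for row in csv.reader(f)]

# count how often each row appears across the combined list
counts = Counter(rows("old.csv") + rows("new.csv"))

# rows that appear only once are the ones unique to one of the files
for row, n in counts.items():
    if n == 1:
        print(",".join(row))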
 
This is all going way over my head, aha. It's for a client we've adopted who is on an old system and wanted inventory dumps; I was hoping there was a simpler way >.<
 
All of the above work but without a bit more information about the data it would be difficult to say which is best.

Is the data sensitive? Email me in trust and I can have a look for you if not.

/edit Just saw your last post. Probably can't help if it's business information.

Also, if this is work related, how has it fallen to you when you're clearly out of your depth? lol
 
It's not sensitive info, it's just a list of products you'd find on the client's website anyway.

And because I'm a web dev. They asked for a dump last year and one this year for their accountant, but now they've received this one they 'don't want the whole thing, as half of it's the same; we want the difference'.

Left to me..!
 
All of the above work but without a bit more information about the data it would be difficult to say which is best.

There's a billion different ways to do it :p

If you're on Linux, cat the two files into main.txt:

cat main.txt | sort | uniq -c | sort -n > results.txt

will give you a count for each instance of a row :)
 
* Insert the rows from both files into one temp table.
* SELECT rowdata FROM (SELECT rowdata, COUNT(*) AS cnt FROM temptable GROUP BY rowdata) t WHERE cnt < 2;
* Boosh, you've got the difference between the two files. If the rowdata is in both files, it won't come back from the select.

Not exactly rocket science.

Can't you see that method would fail if any of the files contained more than one row that was the same? If the new file, for example, has two identical rows in it which aren't in the old one, and you merge it with the old file and run a query looking for any line that appears less than twice, those lines would not be highlighted despite being missing from the old list.

Not only that, it would highlight every difference between the two, when the OP is only interested in what is new on the new list (missing from the old list).

VLOOKUP is the easiest way; it checks every line in one file against every line in the other and doesn't require any of the lines in either file to be unique.

Look, here's a comparison of both methods: VLOOKUP on the left, merging the lists then counting for lines appearing less than twice on the right...

[screenshot comparing the two methods]


As you can see, the lines with "2" in them are correctly picked up by the VLOOKUP method but are missed by your suggested method, because they appear twice in the new list and thus fail your condition. Secondly, 3, 6, 9 and 10 are actually ones missing from the new list but present in the old, which the OP isn't interested in.
 
This is all going way over my head, aha. It's for a client we've adopted who is on an old system and wanted inventory dumps; I was hoping there was a simpler way >.<

You can't really get simpler than a VLOOKUP, mate. If you give me the names of the columns you have and what kind of data is contained in them, I'll write it for you so you can just paste it in, if you really want.
 
Cheers for the replies guys. The primary key data was totally random because some of it was auto-generated and some of it was put in directly.

(It's for products, so it started off well: they put in the product IDs as things like tshirt-red, but then got lazy and left them, so a lot of them are like 123457.) I've sorted it by fobbing them off anyway; it's their problem really. I just tried to help in case it was a 2 minute job, but seemingly not, so it's for them to sort.
 