Best way to analyse 1m rows of data?

Thug (Soldato)
Joined: 4 Jan 2013 | Posts: 3,783
Looking for any help from Excel Gurus on this one!

I've written a macro to pick up and process all files in a folder, but I'm finding Excel is struggling when it gets close to the 1 million row mark and is crashing every time.

Is there a better way to do this? Anything that would be causing the crashing? Am I pulling it wrong, do I need to split it?

Usually I'd take a sample, but on this occasion I actually need the data for totals, and for generating stats...

There is a further caveat: it'll have to be done on a PC with Excel 2007 (don't ask). The PC is reasonably specced (4770K, 8GB RAM), but nothing special. Can't see speed being a hindrance, but I think Excel 2007 has limitations?

There'll be 5 formulas on each line (data goes across to column AC), but they're simple conditional formulas based on VLOOKUP. I'll be using a pivot table to derive my results.

Any help at all would be useful. Got a tight deadline :(
 
1M rows is the limit in Excel, IIRC (1,048,576, to be exact).

Just split the file into two data sets if Excel is the only tool you have and pool the aggregates (two pivots), then pivot those results.
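
If you want to do the split itself in VBA, something like this rough sketch would do it (the sheet name "Data" and the 500,000-row cut point are just assumptions, adjust to suit):

Sub SplitData()
    ' Rough sketch: copy the first and second half of a "Data" sheet to two
    ' new sheets so each half can get its own pivot.
    Dim src As Worksheet, p1 As Worksheet, p2 As Worksheet
    Dim lastRow As Long, cutRow As Long
    Set src = Worksheets("Data")
    lastRow = src.Cells(src.Rows.Count, "A").End(xlUp).Row
    cutRow = 500000                                  ' assumed split point
    Set p1 = Worksheets.Add: p1.Name = "Part1"
    Set p2 = Worksheets.Add: p2.Name = "Part2"
    src.Rows(1).Copy p1.Rows(1)                      ' headers for both halves
    src.Rows(1).Copy p2.Rows(1)
    src.Rows("2:" & cutRow).Copy p1.Rows(2)
    src.Rows(cutRow + 1 & ":" & lastRow).Copy p2.Rows(2)
End Sub

Pivot Part1 and Part2 separately, then pivot (or just total) the two small result tables.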
 
Were they all created with 64-bit Excel in .xlsx format?

We found that files made in 32-bit Excel are subject to performance limitations (both RAM and CPU) even when opened in a 64-bit version.
 
1M rows is the limit in Excel, IIRC (1,048,576, to be exact).

Just split the file into two data sets if Excel is the only tool you have and pool the aggregates (two pivots), then pivot those results.

Ah, didn't realise that! I thought it was unlimited for some stupid reason...

Great idea on the two pivots, will try that. Having researched the topic more, it looks like I scraped everything including formatting, which probably doesn't help...
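
I'll change the consolidation step to pull values only, roughly like this (a sketch; the sheet arguments and row counter are placeholders for whatever my macro actually uses):

Sub CopyValuesOnly(srcSheet As Worksheet, destSheet As Worksheet, nextRow As Long)
    ' Sketch: bring across values only (no formatting) when consolidating
    ' one source sheet onto the master sheet.
    Dim src As Range
    Set src = srcSheet.UsedRange
    destSheet.Cells(nextRow, 1).Resize(src.Rows.Count, src.Columns.Count).Value = src.Value
End Sub

Assigning .Value directly also avoids the clipboard, which should help with the crashing.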

Do the stats in the source data and just pool the results?

Excel doesn't like lots of formulas at once; if you have 5, add them one at a time, then paste over them with their values.

Thanks for the idea, unfortunately there are several hundred spreadsheets. Also I'll be using external data (Libor rates etc) and I don't want to run the risk of getting values wrong for each sheet. Much better to have one sheet and know everything is fully standardised.

Good idea on pasting the values though, I like it!
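
For the paste-over step I'm picturing something like this per formula column (a sketch; the sheet name, column AD and the VLOOKUP are just stand-ins for my real formulas):

Sub AddOneFormulaThenPasteValues()
    ' Sketch: fill one helper column with a formula, calculate it, then
    ' overwrite the formulas with their values before adding the next column.
    Dim rng As Range, lastRow As Long
    With Worksheets("Data")
        lastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
        Set rng = .Range("AD2:AD" & lastRow)
        rng.Formula = "=IF(ISNA(VLOOKUP(A2,Rates!A:B,2,0)),0,VLOOKUP(A2,Rates!A:B,2,0))"
        .Calculate
        rng.Value = rng.Value
    End With
End Sub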

Were they all created with 64-bit Excel in .xlsx format?

We found that files made in 32-bit Excel are subject to performance limitations (both RAM and CPU) even when opened in a 64-bit version.

Hmm not sure on this, will have to check work version. Thanks for the heads up though.

With so much data, have you looked into something like R? It's geared towards big data analysis.

Alas I don't have the time to learn new software, nor do I think work would be happy with me installing random stuff on PCs.

Is this software particularly useful? I'm going to be doing a lot more over time, so if there is an advantage over Excel, then I'd be happy to learn it myself in my spare time. :)
 
I do a lot of data analysis in Excel for work, and learning R is currently high on my list of 'to-dos'.

There's a SAMS 'Teach yourself R in 24 hours'. Waiting for a few more reviews before I get it, though.

Edit: Here's a good intro to R
 
Get a copy of QlikView; it's free for non-commercial use. Import and analyse; if all the files are standardised you're good to go. Just think of it as a pivot table which can handle data sets in the 100s of gigs in size.
 

I would have also thought R and RStudio are best. If you ever do science or many other disciplines to a good level, you will use R.

Not the easiest program to use and master, but to do basic analysis it is quite simple.
 
Could you use Access?

This might be the better option; if you've got Excel you surely have Access too?

Linking a few tables instead of VLOOKUPs would be a 5-minute job to learn, even if you've had no experience in Access.

How would you use Access? I had to build some relational databases during my masters, but that was a long time ago!

I've got Access, so if there's a better way I'm all ears.

My problem would be scraping all the data from the sheets into the program. Unless you can batch import? (There are 5 tabs on each sheet; I just need the one, not the rest.)
 
How good is your VBA?

Could you feed all the data into a massive array, do the analysis and then spit it back out again?
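
i.e. read the whole used range into a Variant array, crunch it in memory, and only touch the sheet again at the end. A rough sketch (the sheet name and which column to total are just guesses):

Sub ProcessInMemory()
    ' Sketch: pull everything into a Variant array, do the number crunching
    ' in memory, then report/write the results back in one go.
    Dim v As Variant, i As Long, total As Double
    With Worksheets("Data")
        v = .Range("A2:AC" & .Cells(.Rows.Count, "A").End(xlUp).Row).Value
    End With
    For i = 1 To UBound(v, 1)
        total = total + v(i, 3)        ' e.g. running total of column C
    Next i
    MsgBox "Total: " & Format(total, "#,##0.00")
End Sub

Working on the array is far quicker than touching cells row by row, and it avoids holding formulas on the sheet at all.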
 
I agree with the Access suggestions above. When I was doing my computer science degree (some thirteen years ago), one of the best teachers we had always said 'everything belongs in a database', and he knew his stuff. :)

From what I understand, though, the newer Access iterations stopped offering pivots, but from memory 2007 should still have that as standard.
 
This suggestion is along the same lines as using Access or VBA.
In an engineering environment, I often use the Perl language (it runs on Windows) as a one-stop shop to read and process data from source text files and then prepare CSV files to load into Excel, also calculating stats in Perl (e.g. binning of the data or standard deviations).
Matlab and Octave (licensed/free) are also tools that can be used to produce nice charts, if that is how you finally present the data.

Code along these lines (a runnable Perl sketch, assuming comma-separated source text files in a data/ folder):

use strict; use warnings;
my %data;
foreach my $file (glob "data/*.txt") {                       # foreach file x in directory/folder
    open my $fh, '<', $file or die "Can't open $file: $!";   # open file x
    while (defined(my $line = <$fh>)) {                      # foreach line y in file, fields y1,y2,...
        chomp $line;
        my ($y1, $y2) = split /,/, $line;
        $data{$file}{$y1} = $y2;                             # data_array(x, y1) = y2
    }
}

So build a big hash of the data which you can then traverse and process in Perl.
I think the learning curve to get into Perl is much less than VBA.


I did not understand the role of the 'several hundred spreadsheets', though?
 
You should be able to "import external data" in Access, or add the Excel sheets as linked tables.

You can then query via SQL/query builder.
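
If you want to batch it, a loop in Access VBA along these lines should pull the one tab you need from every workbook (the folder path, table name tblRaw and tab name "Data" are just examples):

Sub ImportAllWorkbooks()
    ' Sketch: import the "Data" tab from every .xlsx in a folder into an
    ' existing Access table called tblRaw.
    Dim f As String
    f = Dir("C:\Reports\*.xlsx")
    Do While f <> ""
        DoCmd.TransferSpreadsheet acImport, acSpreadsheetTypeExcel12Xml, _
            "tblRaw", "C:\Reports\" & f, True, "Data!A1:AC1048576"
        f = Dir
    Loop
End Sub

Once it's all in one table you can do the totals with a couple of GROUP BY queries instead of pivots.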
 
Get a copy of QlikView; it's free for non-commercial use. Import and analyse; if all the files are standardised you're good to go. Just think of it as a pivot table which can handle data sets in the 100s of gigs in size.

+1. QlikView is a fantastic and powerful tool. I use it at work. It can handle large data sets very easily.
 