If I use PCA and keep enough components to explain 95% of the variance, that takes far more than 1/3 of the PCs.
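A rough sketch of how that count can be checked, assuming numpy and standardised variables. The function name `n_components_for_variance` and the synthetic data are made up for illustration; they are not my real dataset or the software I use.

```python
import numpy as np

def n_components_for_variance(X, threshold=0.95):
    """Count the principal components needed to reach `threshold`
    cumulative explained variance. Assumes no column is constant."""
    X = np.asarray(X, dtype=float)
    # Standardise so no variable dominates just because of its units.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Squared singular values are proportional to the PC variances.
    s = np.linalg.svd(Xs, compute_uv=False)
    cumulative = np.cumsum(s**2 / np.sum(s**2))
    # First position where the cumulative ratio reaches the threshold.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(cumulative)), cumulative

# Hypothetical stand-in data: 12 variables, two of them near-duplicates.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=200)

k, cumulative = n_components_for_variance(X)
print(f"{k} of {X.shape[1]} PCs needed for 95% of the variance")
```

When most variables carry their own information, only the truly redundant directions drop out, so the count stays well above a third.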
The reason I used PCA in the first place was that there are often pairs of variables that correlate strongly, and I know they are pretty much the same thing in real life. The automated model-making software does not exclude these.
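For the simple pairwise case, something like the following would flag those pairs directly. This is only a sketch assuming numpy; `correlated_pairs`, the 0.9 cutoff, and the variable names are placeholders I picked for the example.

```python
import numpy as np

def correlated_pairs(X, names, cutoff=0.9):
    """List variable pairs whose absolute Pearson correlation is >= cutoff."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= cutoff:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

# Hypothetical data: "b" is "a" plus a little noise, "c" is unrelated.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = a + 0.05 * rng.normal(size=200)
c = rng.normal(size=200)

pairs = correlated_pairs(np.column_stack([a, b, c]), ["a", "b", "c"])
for first, second, r in pairs:
    print(f"{first} ~ {second}: r = {r:.3f}")
```

One variable from each flagged pair can then be dropped by hand before the data goes to the modelling software.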
The distributions of many of the variables are not good: almost categorical in appearance and far from a nice normal distribution.
I have definitely found that PCA does not pick out complex relationships. Running a regression against a random variable and looking at the variance inflation factors does a good job here.
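The random response works because variance inflation factors depend only on the predictors: the VIF for variable j is 1 / (1 - R_j^2), where R_j^2 comes from regressing variable j on all the other predictors. A minimal sketch that computes them directly with numpy, rather than through a dummy regression, on made-up data (`variance_inflation_factors` is a name chosen for this example):

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF per column: regress each column on all the others (with an
    intercept) and return total SS / residual SS = 1 / (1 - R^2)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(np.sum((y - y.mean()) ** 2))
        vifs[j] = ss_tot / ss_res if ss_res > 0 else np.inf
    return vifs

# Hypothetical data: d is roughly a + b + c, so no single pairwise
# correlation is extreme, yet d is almost fully determined by the others.
rng = np.random.default_rng(1)
a, b, c, e = (rng.normal(size=300) for _ in range(4))
d = a + b + c + 0.1 * rng.normal(size=300)
X = np.column_stack([a, b, c, d, e])

vifs = variance_inflation_factors(X)
max_pair_corr = np.max(np.abs(np.corrcoef(X, rowvar=False) - np.eye(5)))
print("VIFs:", np.round(vifs, 1))
print("largest pairwise |r|:", round(float(max_pair_corr), 2))
```

In this example the multi-variable dependency shows up as large VIFs for a, b, c and d even though no pairwise correlation looks alarming, which is the kind of relationship a pairwise check misses.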
When actually making models I do trim off factors that do not contribute. The models even pick out near-perfect correlation. It's the highly correlated variables they fail on.
Neural nets are something I have access to, and they have produced a good model on a different basic dataset, but that was after successful variable sorting, where most of the variables were nicely distributed.
Sorry, this reply is from my phone; I'll try to give a better one when I'm home.