Degree-level stats/mathematics help. I feel my job is beginning to get beyond my level of understanding

If I use PCA and keep enough components to explain 95% of the variance, I end up keeping far more than 1/3 of the PCs.
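As a rough illustration of the 95% point, here is a minimal sketch on made-up data, assuming numpy is available. The function name `components_for_variance` and both synthetic datasets are hypothetical, not the real data. It counts how many components are needed to reach 95% of the variance when the variables are unrelated and when they are noisy copies of two hidden factors.

```python
import numpy as np

def components_for_variance(X, threshold=0.95):
    """Count the principal components needed to explain `threshold` of the variance."""
    # PCA on the correlation matrix, i.e. on standardised variables.
    # Assumes no column is constant (that would give NaN correlations).
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]          # largest first
    cumulative = np.cumsum(eigvals) / eigvals.sum()   # cumulative share of variance
    # First position where the cumulative share reaches the threshold.
    return int(np.searchsorted(cumulative, threshold)) + 1

rng = np.random.default_rng(0)
n = 500

# Case 1 (synthetic): 10 unrelated variables.
X_indep = rng.normal(size=(n, 10))

# Case 2 (synthetic): 10 variables that are noisy copies of 2 hidden factors.
f1, f2 = rng.normal(size=(2, n))
X_corr = np.column_stack(
    [f1 + 0.1 * rng.normal(size=n) for _ in range(5)]
    + [f2 + 0.1 * rng.normal(size=n) for _ in range(5)]
)

k_indep = components_for_variance(X_indep)
k_corr = components_for_variance(X_corr)
print("unrelated variables:", k_indep, "of 10 components")
print("redundant variables:", k_corr, "of 10 components")
```

If nearly all the components are needed to reach the threshold, the variables are mostly not redundant in a linear sense, so PCA has little to compress.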

The reason I used PCA in the first place was that often there are two variables that correlate strongly and I know they are pretty much the same thing in real life. The automated model making software does not exclude these.

The distributions among many of the variables are not good: almost categorical in appearance and far from a nice normal distribution.

I have definitely identified that PCA does not pick out complex relationships. Regressing a random variable on the predictors and looking at the variance inflation factors does a good job here.
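The variance inflation check can be sketched as follows, assuming numpy and made-up data. Here the VIF of each variable is computed directly as 1 / (1 - R²), where R² comes from regressing that variable on all the others. That is the standard definition rather than necessarily what any particular software package reports. The function name and the variables `x1` to `x4` are hypothetical.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF for each column: 1 / (1 - R^2) from regressing it on the other columns."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])      # add an intercept column
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # ordinary least squares
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + 0.1 * rng.normal(size=n)   # nearly the sum of x1 and x2
x4 = rng.normal(size=n)                   # unrelated to the rest
X = np.column_stack([x1, x2, x3, x4])

vifs = variance_inflation_factors(X)
for name, v in zip(["x1", "x2", "x3", "x4"], vifs):
    print(name, round(float(v), 1))
```

In this made-up example no single pair of variables is correlated above roughly 0.7, yet x1, x2 and x3 all get large VIFs because each is almost fully determined by the other two together. That is the kind of relationship a pairwise correlation check would miss.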

When actually making models I do trim off factors which do not contribute. The models even pick out near perfect correlation. It's the highly correlated variables it fails on.
Neural nets are something I have access to, and one has produced a good model on a different, more basic dataset, but that was after successful variable sorting, and most of the variables there are nicely distributed.

Sorry, this reply is on my phone, I'll try and give a better one when home
 
Today I have been trying non-linear methods without PCA at all

Does this seem valid?
Taking the points above, PCA in this case seems to be creating more problems than it is solving
So I have been putting large chunks of variables into non-linear model-making software
Removing any variables that definitely explain the same thing, perhaps differing only by a scaling factor
Looking at the factors which do not contribute to the model and removing these
Of the ones that do, I have been using linear regression to find correlations among them, confirming them on the real data, and removing 2 if 3 are correlated
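The "remove 2 if 3 are correlated" step can be sketched like this, assuming numpy and made-up data. The function name `prune_correlated`, the 0.9 cutoff and the variable names `a` to `d` are all hypothetical choices for illustration. It keeps a variable only if it is not strongly correlated with one already kept.

```python
import numpy as np

def prune_correlated(X, names, threshold=0.9):
    """Greedily keep variables whose |correlation| with every kept one is below threshold."""
    corr = np.abs(np.corrcoef(np.asarray(X, dtype=float), rowvar=False))
    kept = []
    for j in range(len(names)):
        # Keep column j only if it is not a near-duplicate of a kept column.
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

rng = np.random.default_rng(2)
n = 500
a = rng.normal(size=n)
b = 2.0 * a + 0.05 * rng.normal(size=n)    # same thing as a, different scale
c = -a + 0.05 * rng.normal(size=n)         # same thing as a, sign flipped
d = rng.normal(size=n)                     # genuinely different variable
X = np.column_stack([a, b, c, d])

kept = prune_correlated(X, ["a", "b", "c", "d"])
print(kept)
```

The order of the columns decides which member of a correlated group survives, so it may be worth putting the variable that makes most sense in real life first.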

I am also splitting the data into training and validation to confirm I am not over-fitting
The models initially over-fit, but I am finding that by using the steps above I can reduce the gap between training and validation performance
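A minimal sketch of the training versus validation comparison, assuming numpy and made-up data. Plain linear regression stands in for the non-linear models and neural net here, purely to keep the example short. The function names and the synthetic dataset (2 useful variables plus 38 irrelevant ones) are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def rmse(coef, X, y):
    """Root mean squared error of the fitted model on (X, y)."""
    A = np.column_stack([np.ones(len(X)), X])
    return float(np.sqrt(np.mean((y - A @ coef) ** 2)))

def train_val_errors(X, y, n_train):
    """Fit on the first n_train rows, then report train error, validation error, gap."""
    # Rows are assumed to be in random order already; shuffle real data first.
    coef = fit_linear(X[:n_train], y[:n_train])
    train = rmse(coef, X[:n_train], y[:n_train])
    val = rmse(coef, X[n_train:], y[n_train:])
    return train, val, val - train

rng = np.random.default_rng(3)
n_train, n_val = 60, 200
n = n_train + n_val
X_full = rng.normal(size=(n, 40))                       # 40 candidate variables
y = X_full[:, 0] + X_full[:, 1] + rng.normal(size=n)    # only the first 2 matter
X_pruned = X_full[:, :2]                                # after removing the rest

train_full, val_full, gap_full = train_val_errors(X_full, y, n_train)
train_pruned, val_pruned, gap_pruned = train_val_errors(X_pruned, y, n_train)
print("all variables:    train", round(train_full, 2), "validation", round(val_full, 2))
print("pruned variables: train", round(train_pruned, 2), "validation", round(val_pruned, 2))
```

A training error well below the validation error is the usual sign of over-fitting. Removing variables that do not contribute should pull the two closer together, which matches what is described above.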

The neural net seems to work best, although it is the model I know least about
 
You can learn ANYTHING OP.

That's the beauty of the internet. Use this opportunity to try and beast stats in your spare time on the internetz.

Don't give up because you may leave this job and in the future this knowledge you force yourself to gain now can be used down the line to get a better job.

They say, "Oh, you've got a degree with relevant statistical work?" You say, "No, however I worked for x and x company doing it, blah blah." "Oh right, cool, you've got the job."

Give it your best shot first, good luck.
 
I think I may reluctantly have to try something else. I have been looking at what I feel I need to understand to do this job well, and have come to the conclusion that I have somehow missed out on a whole range of topics that would help with this, despite having a bio sciences degree and A-level maths.

I think the time I have here is just too short to plug the holes, and I can't afford to not have a job for any length of time.

This is a bit of a downer for me tbh. All I can think of is powering through as many AAT accountancy modules as I can. My previous work experience is not ideal for getting jobs, unfortunately, as I found when applying for this one.
 