Statistical Analysis :S

Help me OCUK, you're my only hope...

A year or so back I asked you chaps to undertake a survey for some stuff I was doing with work / uni, and you very helpfully did and provided me with really good data. Anyway, I didn't finish that piece of work due to work/family (children being born)/covid, general real life stuff that gets in the way etc. Until now!

I've got to do some form of data analysis on this but it's not in any way something I understand (I'm in Architecture). I have an Excel spreadsheet which I can take very basic visual graphs and pie charts from, but I am wondering if there's a better way to do it given it's quite a varied data set.

For reference, it relates to physical movement around your working office environment and sedentary behaviour within the office, so each participant provided quite a lot of data, and each cell can hold several answers (example below).

Basic example:
Q. What physical activities outside of work do you participate in?
A1. Gym, Walking, House work, Gardening
A2. Badminton, Walking
A3. Cycling
...
A100. Running, Gym, Cycling, Walking, House work

Can anyone offer any ideas as to how I can analyse this? I have enabled the analysis plug-in for Excel but it's beyond me really as to what it's asking lol.
 
Frequency of particular answer and correlation between say age sets or sex ect.

I agree I need to break this down a bit more into something more useable
 
You will need to do some data preparation, and I don't think Excel will be a good place for this.

Some very simple Python code would let you split the text into tokens; you could then map them to particular categories like tom said.


Are these text strings likely to be coherent, i.e. no spelling errors or alternative names for the same thing (walk vs walking)?
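
To give an idea of what that splitting and mapping could look like, here is a minimal sketch using only the Python standard library. It assumes the answers are comma-separated as in the example in the first post. The `ALIASES` table and the function names are made up for illustration; you would fill the table in after eyeballing the real spellings.

```python
from collections import Counter

# Hypothetical alias table: maps spelling variants onto one canonical label.
ALIASES = {"walk": "walking", "house work": "housework", "cycle": "cycling"}

def tokenize(answer):
    """Split one comma-separated answer cell into cleaned, canonical tokens."""
    tokens = []
    for raw in answer.split(","):
        token = " ".join(raw.lower().split())  # lowercase, trim, collapse spaces
        if token:
            tokens.append(ALIASES.get(token, token))
    return tokens

def count_activities(answers):
    """Count how many respondents mentioned each activity (once per person)."""
    counts = Counter()
    for answer in answers:
        counts.update(set(tokenize(answer)))
    return counts

# Example answers copied from the first post.
answers = [
    "Gym, Walking, House work, Gardening",
    "Badminton, Walking",
    "Cycling",
    "Running, Gym, Cycling, Walking, House work",
]
counts = count_activities(answers)
for activity, n in counts.most_common():
    print(f"{activity}: {n}")
```

Anything that is not in the alias table passes through unchanged, so odd entries from an "other" field would show up as their own low-count categories and can be added to the table as you spot them.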

You lost me at Python, sadly. It's not a massive data set; it's spread over 2 surveys with 62 and 100 participants, and the answers were for the most part multiple choice, but some anomalies crept in because of the use of an "other" field in some questions.
 
Can you post a link to the data set on here, if it's not confidential in any way?

Really hard to know how to approach this without a clearer idea of the data.

LINK

I've stripped out names and any personally identifying data, but that is what it looks like in its current state. The other one is the same but with more people and slightly different questions.
 
Do you have access to IBM SPSS?

Also, rather than correlation, you're probably looking for statistically significant differences, i.e. "There was a statistically significant difference in activity intensity between the males and females (F(1,23)=33.07, p<.05) with the male activity intensity (µMAI=106.40s) being significantly higher than the female activity intensity (µFAI=210.90s)."

There are various tests to run to look for significant differences - from simple t-tests for a pair of means to one-way and two-way ANOVA etc.
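
For the simplest case, a pair of means, the calculation can be sketched in a few lines of plain Python. This is the Welch (unequal-variance) version of the t-test, which is my choice for the example and not necessarily the variant you would end up using. The two lists are made-up numbers purely for illustration, not values from the survey.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, degrees of freedom) for Welch's unequal-variance t-test."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Made-up numbers purely for illustration (e.g. hours seated per day).
males = [6.5, 7.0, 8.0, 7.5, 6.0]
females = [5.0, 6.0, 5.5, 6.5, 5.0]

t, df = welch_t(males, females)
print(f"t = {t:.2f}, df = {df:.1f}")
```

This only gives the test statistic and degrees of freedom; the p-value still has to come from a t-distribution table or a stats package. If SciPy happens to be installed, `scipy.stats.ttest_ind(males, females, equal_var=False)` should return the same t along with the p-value.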

No access to IBM SPSS, no. I am very green with statistics as it's way outside of my field.

This is going to be an evening with a large glass of wine, some music and digging through your spreadsheet to sort the data out.

Python, R and SPSS are going to overcomplicate matters at this stage.

I think you are probably correct, how would you sort the information out? (sorry)

Am I the only one that read that as sex acts?
Pervert :P
 
Did you get the answer choices grouped in the way you've shown above? Cycling shows in both A3 and A100?

Are those the choices respondents gave, or have they been grouped?

Not sure I understand, cell A3 is Male and A100 is blank.
 
Looks pretty feasible for many of the columns.

Had a go just with a few options (age, gender, activity):
https://drive.google.com/file/d/1Qf81s-v_NiOj7Y8KC97YJJrh9N37ypRp/view?usp=sharing

Not the prettiest method, but it's just an example of how you could go about splitting up the strings of text into something that you can make comparisons with.
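
For anyone who would rather script the same idea than do it in a spreadsheet, here is a rough Python sketch of it: split each respondent's activity string, then count respondents per activity and gender. The rows are invented for illustration and the `crosstab` helper is just a name I picked; it is not how the linked file does it.

```python
from collections import defaultdict

# Hypothetical rows: (gender, comma-separated activities), invented for illustration.
rows = [
    ("Male", "Gym, Walking"),
    ("Female", "Walking, Cycling"),
    ("Male", "Cycling"),
    ("Female", "Gym, Walking, Gardening"),
]

def crosstab(rows):
    """Count respondents per (activity, group) pair."""
    table = defaultdict(lambda: defaultdict(int))
    for group, answer in rows:
        # A set ensures each person counts once per activity.
        activities = {a.strip().lower() for a in answer.split(",") if a.strip()}
        for activity in activities:
            table[activity][group] += 1
    return table

table = crosstab(rows)
groups = sorted({group for group, _ in rows})

# Print a small activity-by-group table of counts.
print("activity".ljust(12) + "".join(g.rjust(8) for g in groups))
for activity in sorted(table):
    cells = "".join(str(table[activity][g]).rjust(8) for g in groups)
    print(activity.ljust(12) + cells)
```

Swapping gender for an age band in the first element of each row gives the same kind of table by age, which is the sort of comparison asked about earlier in the thread.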

Apologies, I was parked up on my phone when I first saw that and didn't realise what you had done, so many thanks. I can butcher that to deal with more or less every question by creating a new tab for each one and adjusting the targets etc.
 