Any statistics kings on here?

I have statistical analysis to do, but I've long since finished the little stats course I did, and I need some basic help!

Basically, I've asked a number of people:

"Of your 5 closest friends, how many of them smoke?"

and I know whether they themselves smoke or not.

I would like to find out if there is any correlation between the number of a person's friends that smoke and whether that person smokes. But I can't get my head around what test I'd need to do.

Any help would be great! Thank you
 
Test a group of 50 smokers and a group of 50 non-smokers and in each test ask them "Of your closest five friends, how many of them smoke?". Then you could do a bar graph for both non-smokers and smokers, with the horizontal axis "How many of them smoke?" (0 - 5), and the vertical axis going from 0 - 50.

Then you could compare the two bar graphs as required?
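
As a rough sketch of that tabulation in Python: the responses below are made-up placeholder data, not real survey results, and the variable names are only for illustration.

```python
from collections import Counter

# Hypothetical responses: (is_smoker, number of the 5 closest friends who smoke).
responses = [
    (True, 4), (True, 3), (True, 5), (True, 2), (True, 4),
    (False, 0), (False, 1), (False, 0), (False, 2), (False, 1),
]

# Count how many people in each group gave each answer.
smoker_counts = Counter(n for is_smoker, n in responses if is_smoker)
non_smoker_counts = Counter(n for is_smoker, n in responses if not is_smoker)

# One row per possible answer (0 to 5); these are the bar heights for each group.
print("friends  smokers  non-smokers")
for n in range(6):
    print(f"{n:>7}  {smoker_counts[n]:>7}  {non_smoker_counts[n]:>11}")
```

The two columns are the heights of the two bar graphs, and the same table can be fed straight into a chi-squared test.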
 
If it's a test for correlation, I think it may be a chi squared test.

EDIT: http://en.wikipedia.org/wiki/Chi-squared_test
http://en.wikipedia.org/wiki/Pearson's_chi-squared_test

Not 100% sure though.
 
I've used a lot of chi-squared tests to find out if there is a significant difference (i.e. reject the null hypothesis) for simpler questions (e.g. the association between you smoking and living with a smoker), but I can't figure out how to set up the number of friends smoking against you smoking in a chi-squared test.

Perhaps I can just work out the average number of friends who smoke for smokers and for non-smokers.
Or use a Mann-Whitney U test?
 
:eek:

After reading this - I've decided... I'm not a Statistics King. :p
 
I am a statistics pauper. I bloody hate the things!

I'm swaying towards either a Mann-Whitney U test or a Wilcoxon signed-rank test. Both look to see if there is a significant difference (by rejecting the null hypothesis that there isn't one), but which to use?

OK, neither, it seems, as I can only find the critical values for n = 30 or less.
The n value for my largest sample is 61.
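
For samples larger than the printed tables cover, the usual route is the normal approximation to U rather than a critical-value lookup. Below is a minimal sketch of that, assuming two independent groups. The data are made up, the function name is hypothetical, and it applies a tie correction but no continuity correction.

```python
from math import sqrt
from statistics import NormalDist

def mann_whitney_u(x, y):
    """Return (U for x, z, two-sided p) using the normal approximation."""
    combined = sorted(x + y)
    n = len(combined)
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j] == combined[i]:
            j += 1
        t = j - i                                # size of this group of ties
        rank_of[combined[i]] = (i + 1 + j) / 2   # average of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j
    n1, n2 = len(x), len(y)
    u1 = sum(rank_of[v] for v in x) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Variance of U with the standard correction for ties.
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - mean_u) / sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, z, p

# Hypothetical answers to "how many of your 5 closest friends smoke?"
smokers = [4, 3, 5, 2, 4, 3, 5, 1]
non_smokers = [0, 1, 0, 2, 1, 3, 0, 2, 1, 0]

u, z, p = mann_whitney_u(smokers, non_smokers)
print(f"U = {u}, z = {z:.2f}, p = {p:.4f}")
```

With answers limited to 0 to 5 there will be a lot of ties, which is why the tie correction is included.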
 
Suppose it depends how far you want to go into the statistics! Excel and Matlab have lots of pre-written functions that may be useful. If you assume that what you're trying to model is roughly normal in distribution, then Student's t-test or Pearson's correlation coefficient may give a better idea of correlation / hypothesis testing than a standard chi-squared test.
 
Can do it quite easily with a Chi Sq. Pearson's correlation coefficient, as mentioned, would also work. I like Chi Sq since you can do it with a pencil and paper and a calculator.

You have 2 groups (smokers, non-smokers) and one variable with 6 categories (number of close friends who smoke, 0 to 5).

Your interest is: Is the number of close friends who smoke related to whether a person smokes or not?

H0. Number of friends who smoke and smoking status are independent (no association)
H1. Number of friends who smoke and smoking status are not independent (association)

Set up your Chi Sq table

6 rows (number of friends who smoke, 0 to 5)
2 columns (smoker, non-smoker)

You can calculate your expected value for each cell using

Expected = (row total * column total) / total observations

Then work out the chi value for each cell and add them up.

Chi Sq = Sum of ((O - E)^2 / E) over all cells

This will give you your Chi value. Use a lookup table to give you the critical value of Chi Sq for the correct degrees of freedom: (rows - 1) * (columns - 1) = (6 - 1) * (2 - 1) = 5.

If your Chi Sq is larger than the critical value at the 0.05 level (i.e. p < 0.05), you can reject H0 and state you have a statistically significant association between smoking status and the number of friends who smoke.

Note: This test won't tell you what the association IS, only that there is an association. You could graphically represent the data using bar charts to see if there is an observed association and use the Chi Sq to confirm there is one. If you want to know the direction of the correlation (i.e. +ve or -ve) and how strong it is, then you will need to use something like Spearman's rank correlation.
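
For anyone who would rather not use pencil and paper, here is a minimal Python sketch of the same calculation. The observed counts are made up purely for illustration. With six possible answers (0 to 5) the table has 6 rows, so df = (6 - 1) * (2 - 1) = 5, and 11.070 is the standard table value for df = 5 at the 0.05 level.

```python
# Hypothetical observed counts (made up for illustration only).
# Rows: number of the 5 closest friends who smoke (0 to 5).
# Columns: (smokers, non-smokers).
observed = [
    (2, 20),
    (4, 15),
    (6, 10),
    (8, 8),
    (10, 4),
    (10, 3),
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected = (row total * column total) / total observations
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Chi Sq = sum over all cells of (O - E)^2 / E
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom = (rows - 1) * (columns - 1)
df = (len(observed) - 1) * (len(col_totals) - 1)

# Critical value from a standard chi-squared table for df = 5 at the 0.05 level.
CRITICAL_0_05_DF5 = 11.070

print(f"chi-squared = {chi_sq:.2f}, df = {df}")
print("reject H0" if chi_sq > CRITICAL_0_05_DF5 else "fail to reject H0")
```

The hard-coded critical value only applies to a 6 x 2 table; if you merge rows (for example because some expected counts are very small), look up the value for the new df instead.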

Sorry for the messy post. I will mock up an example and post it shortly.

/Salsa

ChiSqExample.jpg
 
Salsa, you hero, thank you so much for that. I made two stupid mistakes which meant I couldn't calculate it. Very much appreciate your efforts - is there any chance you still have that worksheet open?
 
I do.

Please note this test will not tell you what the association is, only that there is an association. You would need to use Spearman's rank correlation to put a value on the strength and direction of your correlation, if there is one. Bit laborious to do by hand tho ;)
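
A few lines of Python take the labour out of it. This is only a sketch: the data are made up, the function names are hypothetical, and it computes Spearman's rho as the Pearson correlation of average ranks, which is the usual way to handle ties.

```python
from math import sqrt

def average_ranks(values):
    """Rank the values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        avg = (i + 1 + j) / 2          # average of ranks i+1 .. j
        for k in order[i:j]:
            ranks[k] = avg
        i = j
    return ranks

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def spearman(a, b):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(a), average_ranks(b))

# Hypothetical data: 1 = smoker, 0 = non-smoker, paired with friend counts.
smokes = [1, 1, 1, 1, 0, 0, 0, 0]
friends = [4, 3, 5, 2, 0, 1, 0, 2]

rho = spearman(smokes, friends)
print(f"Spearman's rho = {rho:.2f}")
```

A rho near +1 would mean smokers tend to report more smoking friends, near -1 the opposite, and near 0 no monotonic relationship.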

/Salsa
 
Actually I fixed my own, thank you again. I'm happy to accept the null hypothesis!

No problem.

Just for info, you can never accept your null hypothesis. You can only fail to reject it at the significance level used.

The possibility still exists that there is a real association which your sample did not detect, in which case the null would be false even though you could not reject it.

/Salsa
 