Probability Maths Question

I am trying to work this out... and my maths is failing me. I know it's basically lottery maths.

You have 14000 boxes with balls in them, 350 of the balls are green and the rest are yellow. You want to find as many green balls as possible but you only have 35 chances.

I think the probability of getting one green ball is 8.75%...

But I can't then work out how to calculate the probability of getting 2 green balls, or how it steps down to possibly getting all 35. Anyone care to assist and save my sanity? :D
 
Assuming 1 ball per box, after you open the first box you then have 13999 boxes and either 350 or 349 green balls left.
 
Your question is not well defined and needs clarification before it can be answered. Does each box have one and only one ball in it? Is each box removed from the set once you have selected it or does it get returned empty? And do you ever do your own maths homework?
 
I fear the information in that link is beyond my brain power, if you could give an explanation/workings that would be handy, thanks! :D

Binomial distribution is going to be beyond you if you don't have at least A-level mathematics. Not saying don't learn it, but you may need to study properly to do so. Answer the questions people have asked and we can probably give you the answer you want, however.
 
1 in 40 (2.5%) chance of the first step being a green ball, since 350 / 14000 = 0.025

2nd step would depend if you picked a green ball on the first step

I think ...... :p
 
Ok let me try the story again lol

There are 14000 boxes in front of you, each box has one ball in.
350 of the balls are green and the rest are yellow.
You want to find as many green balls as possible but you only have time to look in 35 boxes.
Once a ball is found you put it to one side to look for more.

This is for a non-ball-related issue where I work, but the theory would apply. I can normally come up with amazing stuff in Excel (in my opinion), but for some reason my brain just isn't getting this :)
 
Once a ball is found you put it to one side to look for more.

This sounds like non-replacement. Consequently the probabilities change after each ball is discarded (because the total number of balls reduces by one and the number of balls of one colour reduces by one). You need to use the hypergeometric distribution.
 
To draw one green:

k (num green drawn) = 1, K - k (num green not drawn) = 349, K (num green at start) = 350
n - k (num yellow drawn) = 34, N + k - n - K (num yellow not drawn) = 13616, N - K (num yellows at start) = 13650
n (num balls drawn) = 35, N - n (num balls not drawn) = 13965, N (total num balls) = 14000

f(k; N, K, n) = [KCk * (N - K)C(n - k)] / NCn

P(x = 1) = f(1; 14000, 350, 35) = [350C1 * (14000 - 350)C(35 - 1)] / 14000C35

Where nCr is the combinations of r in n, given by n! / (r! * (n - r)!) - where ! is the factorial operator.

These numbers are crazy big. You will struggle to find a device to handle 13650!. I ran it through Matlab 2017a on a 64 bit platform and it calculated 0.3705 with the caveat that "Warning: Result may not be exact. Coefficient is greater than 9.007199e+15 and is only accurate to 15 digits"

But hopefully you get the idea.
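For what it's worth, the factorials never have to be evaluated on their own. Below is a minimal Python sketch of the formula above. Python is my choice for illustration, not something used in this thread, and `hypergeom_pmf` is a made-up helper name. It assumes Python 3.8 or later for `math.comb`, and relies on Python integers being arbitrary precision so the big coefficients stay exact.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(exactly k green) when drawing n balls without replacement
    from N balls of which K are green: KCk * (N-K)C(n-k) / NCn."""
    # The binomial coefficients are computed as exact integers;
    # only the final conversion to float involves any rounding.
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))


p_one_green = hypergeom_pmf(1, 14000, 350, 35)
print(f"P(x = 1) = {float(p_one_green):.4f}")
```

The printed value should agree with the Matlab figure quoted above to four decimal places, without the precision warning.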
 
OK am back in now after my night out.

now, as you've looked at that link to the binomial distribution and it doesn't make sense, let's start with a simple example to build the intuition behind this

say you've got 10 boxes and 3 green balls

Hopefully it is obvious that the probability of drawing once and getting a green ball is 3/10 and so likewise the probability of not getting a green ball (getting a yellow ball) is 7/10

now consider the case where you draw twice - what are the possible outcomes?

you could get a green ball then a green ball = 3/10 * 3/10 = 9/100
you could get a green ball then a yellow ball = 3/10 * 7/10 = 21/100
you could get a yellow ball then a green ball = 7/10 * 3/10 = 21/100
you could get a yellow ball then a yellow ball = 7/10 * 7/10 = 49/100

now look at the above possibilities - if you were to ask what is the chance of getting 1 green ball after two draws then you can see there are two ways for that to occur - the order doesn't matter, both have the same probability so the overall probability of getting one green ball from two draws is 2 * 21/100 = 42/100

now let's consider the possibilities for 3 draws - denoting green balls with 'G' and yellow balls with 'Y'

GGG
GGY
GYG
YGG
YYG
YGY
GYY
YYY

Now how many ways can you pick three green balls in three draws? You can see from the above 1

How about two green balls? Again you can see from the above - 3

How about 1 green ball? Again you can see from the above there are three ways you could pick one green ball from 3 draws

How about no green balls - well you can see there is one way that no green balls are picked.

So how can we use this to calculate probabilities?

Well let's say we want the probability of picking 2 green balls from three draws... we could write

(3/10 * 3/10 * 7/10) + (3/10 * 7/10 * 3/10) + (7/10 * 3/10 * 3/10)

but as you can see that is a bit tedious

instead we can write 3* 3/10 * 3/10 * 7/10 as we know there are three ways this can happen

or better still 3 * (3/10)^2 * 7/10 which comes to 189/1000
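If you'd rather let a computer write out the possibilities, here is a rough Python sketch (my own illustration, not part of the original working). It lists every ordered outcome of three draws and adds up the probabilities by number of greens. Like the arithmetic above, it assumes each ball goes back in its box after a draw, so every draw has the same 3/10 chance.

```python
from fractions import Fraction
from itertools import product

p_green = Fraction(3, 10)   # 3 green balls in 10 boxes
p_yellow = 1 - p_green      # 7/10

# Every ordered outcome of three draws, e.g. ('G', 'G', 'Y')
outcomes = list(product("GY", repeat=3))

# Group the outcomes by how many greens they contain and add up
# the probability of each one (the draws are independent here).
prob_by_greens = {g: Fraction(0) for g in range(4)}
for outcome in outcomes:
    prob = Fraction(1)
    for ball in outcome:
        prob *= p_green if ball == "G" else p_yellow
    prob_by_greens[outcome.count("G")] += prob

for greens, prob in prob_by_greens.items():
    print(greens, "green:", prob)
```

The entry for 2 greens comes out as 189/1000, matching the hand calculation.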

now do we really want to write out all the possibilities or is there an easier way of working them out? Yup there is: you can use the binomial distribution. Since the wikipedia link confused you, it is probably better to just demonstrate how it applies to the simple example above.

this is the important bit from the article - the probability mass function of the binomial distribution:

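[Screenshot of the binomial pmf, Screen_Shot_2017-03-24_at_00.19.04.png. In the notation used in this thread it reads:]

P(X = k) = nCk * p^k * (1 - p)^(n - k)
nCk = n! / (k! * (n - k)!)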


Now if you're struggling with the article I'm not going to assume much - so I will explain that the symbol '!' represents the factorial:

3! means 1*2*3 = 6
7! means 1*2*3*4*5*6*7 = 5040

you should be able to find the '!' symbol on your calculator

now look at the above line of the above screen shot, this is how we calculate how many different ways an event can occur - so n is the number of trials and k is the number of successes. In our simple case of picking 2 green balls from 3 attempts n = 3 and k = 2

so using the above formula we have 3!/(2! * (3-2)!) = 3!/(2! * 1!) = 6/2 = 3

this is the same answer we had before by simply writing out the possible combinations

in fact if you've got a calculator with the nCr symbol you can simply use that and type 3C2 and your calculator will do the above calculation for you - I'll make use of that notation for simplicity... essentially '3 choose 2'

now look again at the screen shot, the formula for the binomial distribution, this is the 'probability mass function' or pmf

now p is the probability of getting a green ball = 3/10

1-p is the probability of NOT getting a green ball = 7/10 (sometimes we'd call this 'q')

so to use the binomial distribution for our simple example we have:

(3 choose 2) * (3/10)^2 * (7/10)^1 = 189/1000

same answer as we had above.... hopefully you can now see how the binomial distribution can be used to solve problems of this type - to solve your larger problem we simply need to plug in the numbers

 
so in your larger case

p = 350/14000 = 0.0250

1-p = 0.975

and the number of trials 'n' = 35

so you want to know the probability of picking 1 green ball:

35C1 * 0.025^1 * 0.975^34 = 0.3700 to 4 d.p.

and you wanted to know the probability of picking 2 green balls:

35C2 * 0.025^2 * 0.975^33 = 0.1613 to 4 d.p.

and you wanted to know the probability of picking 35 green balls

35C35 * 0.025^35 * 0.975^0

which is the same as simply writing 0.025^35 = 8.4703e-57 (to 5 significant figures). Basically it is a tiny, tiny number and you've got next to no chance of picking 35 green balls in 35 tries

(you'll note it should be intuitive that 35C1 = 35 (as there are 35 ways you could pick 1 green ball) and 35C35 =1 as there is only 1 way you can pick 35 green balls)
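The same plug-in-the-numbers step can be sketched in a few lines of Python (my choice of language for illustration; `binom_pmf` is just a name I've picked, and `math.comb` needs Python 3.8 or later). As with the working above, it treats every draw as having the same 350/14000 chance, i.e. as if the boxes were replaced.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials), each with chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p = 350 / 14000  # chance that any single box holds a green ball
n = 35           # number of boxes opened

for k in (1, 2, 35):
    print(f"P({k} green) = {binom_pmf(k, n, p):.4g}")
```

Calling `binom_pmf(2, 3, 0.3)` reproduces the 189/1000 from the small 10-box example, up to floating-point rounding.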







 
I wish I'd read your second post and the rest of the thread....

If you're not replacing the boxes then Tuppy is correct in that you need the hypergeometric distribution:

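[Screenshot of the hypergeometric pmf, Screen_Shot_2017-03-24_at_01.00.56.png. In the notation of Tuppy's post: P(X = k) = [KCk * (N - K)C(n - k)] / NCn]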


I don't really want to write another wall of text - but briefly, hopefully you can see that for our simple 10 boxes, 3 green balls example, picking 3 green balls in 3 draws (the chance of green drops after each green is removed) would be:

3/10 * 2/9 * 1/8 = 0.0083 to 4 d.p.

you can write out the probabilities for the other combinations outlined in the previous post yourself
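As a quick sanity check, here is a small Python sketch (again my own illustration, with made-up variable names) that multiplies the step-by-step chances for the 10-box example and compares the result with the hypergeometric formula, 3C3 * 7C0 / 10C3. Exact fractions are used so the two can be compared directly.

```python
from fractions import Fraction
from math import comb

boxes, greens, draws = 10, 3, 3

# Multiply the step-by-step chances: 3/10, then 2/9, then 1/8.
# Each green found leaves one fewer green and one fewer box.
step_by_step = Fraction(1)
for i in range(draws):
    step_by_step *= Fraction(greens - i, boxes - i)

# Same event via the hypergeometric formula: 3C3 * 7C0 / 10C3
via_formula = Fraction(comb(greens, draws) * comb(boxes - greens, 0),
                       comb(boxes, draws))

print(step_by_step, via_formula, round(float(step_by_step), 4))
```

Both routes give 1/120, which is the 0.0083 above.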

now using the distribution, we could do as Tuppy has done in his answer - calculate the factorials etc.. though fortunately you'll find the binomial, hypergeometric and other distributions are already implemented for you in Excel, Matlab etc..

so using Matlab let's first try the simple example we calculated above - 3 green balls from 3 draws without replacement:

[Screenshot of the Matlab output: Screen_Shot_2017-03-24_at_01.11.59.png]


sure enough we get the same answer to 4 d.p.

now let's try the questions you originally asked:

35 green balls

1 green ball

2 green balls

[Screenshot of the Matlab output for the three cases above: Screen_Shot_2017-03-24_at_01.10.55.png]


and helpfully the middle answer, for 1 green ball, is the same as the answer given in Tuppy's post above

as you said you use Excel - you can make use of the built-in functions there for the binomial distribution or hypergeometric distribution:

https://support.office.com/en-us/article/BINOM-DIST-function-c5ae37b6-f39c-4be2-94c2-509a1480770c
https://support.office.com/en-us/article/HYPGEOM-DIST-function-6dbd547f-1d12-4b1f-8ae5-b0d9e3d22fbf

so perhaps try yourself using HYPGEOM.DIST setting cumulative = FALSE


notice also that the probabilities we got above aren't too dissimilar to those calculated with the binomial distribution... after all it doesn't affect things too much if a ball isn't replaced given there are 14000 of them to start with and we're only picking 35 in total!
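To see how close the two distributions are for these numbers, here is a rough Python sketch that prints both side by side for small counts of green balls. The helper names are mine, and it assumes Python 3.8+ for `math.comb`; the Matlab and Excel built-ins mentioned above would do the same job.

```python
from math import comb

N, K, n = 14000, 350, 35   # boxes, green balls, boxes opened


def binom_pmf(k):
    """With replacement: every draw has the same K/N chance of green."""
    p = K / N
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def hypergeom_pmf(k):
    """Without replacement: KCk * (N-K)C(n-k) / NCn."""
    # Python divides the exact big integers, so nothing overflows.
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


print("greens  binomial  hypergeometric")
for k in range(4):
    print(f"{k:6d}  {binom_pmf(k):8.4f}  {hypergeom_pmf(k):14.4f}")
```

The two columns only differ from around the fourth decimal place, which is the point about 35 draws being a tiny fraction of 14000 boxes.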
 