
The Cereal Box Prize Distribution

In October 2015, General Mills introduced a line of Star Wars prizes in some of their cereal boxes.
Much like the boy who asked, "Mr. Owl, how many licks does it take to get to the Tootsie Roll center of a Tootsie Pop?" I wanted to know:

How many boxes of cereal do I need to buy to get all the prizes?

If you looked at the image, screamed "6!" at your screen, and wondered why there are additional sections to this post, let me clarify: we don't know what prize is inside until we open the box.

It's random.

Now, we're in statistics country.



In statistics, we're all about distributions: models that say how likely each possible outcome is. You're probably familiar with at least one, the normal distribution (a.k.a. the Gaussian distribution, a.k.a. the bell curve).
If we model grades with a normal distribution whose average is a C, then C sits in the middle and is also the most common grade. An A sits far out to the right (less common), and a B falls in between. The height of the curve at each point indicates how likely that outcome is.
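To make "height means likelihood" concrete, here is a small Python sketch that evaluates the bell curve's height at C, B, and A. The 4-point scale (C = 2, B = 3, A = 4), the average of 2.0, and the spread of 0.8 are all assumptions picked for illustration; `normal_pdf` is a hypothetical helper written out from the standard normal density formula.

```python
import math

def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Height of the normal (bell) curve at x, from the standard density formula."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

# Assumed setup for illustration: grades on a 4-point scale (A=4, B=3, C=2),
# an average of C (2.0), and a spread (standard deviation) of 0.8.
MEAN, SD = 2.0, 0.8

for grade, points in [("C", 2.0), ("B", 3.0), ("A", 4.0)]:
    # Taller curve = more likely; the height shrinks as we move away from C.
    print(grade, round(normal_pdf(points, MEAN, SD), 3))
```

The printed heights shrink as we move from C out to A, which is the bell curve's way of saying A's are rarer under these assumed numbers.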

Another common distribution (though you may not have thought of it that way) is the Bernoulli distribution. It also goes by another name: the coin flip. But this isn't just any coin flip. It can be an unfair coin, where the probability of heads may not be exactly 1/2. Whereas the normal distribution can produce any real number as its outcome (just with varying probabilities, concentrated around the middle), the Bernoulli distribution has only two outcomes: heads with probability p and tails with probability 1-p. The question it answers is, "how likely is a single coin flip to come up heads?" This will be the basis for the cereal box prize distribution.
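A biased coin is easy to simulate: draw a uniform random number in [0, 1) and call it heads when it falls below p. The sketch below assumes p = 0.3 and a fixed seed so the run is repeatable; `bernoulli` is a hypothetical helper name, not a library function.

```python
import random

def bernoulli(p: float, rng: random.Random) -> int:
    """One (possibly unfair) coin flip: 1 (heads) with probability p, else 0 (tails)."""
    # rng.random() is uniform on [0, 1), so it lands below p with probability exactly p.
    return 1 if rng.random() < p else 0

rng = random.Random(42)  # fixed seed so the run is repeatable
p = 0.3                  # assumed unfair coin: heads 30% of the time
flips = [bernoulli(p, rng) for _ in range(100_000)]
print(f"fraction of heads: {sum(flips) / len(flips):.3f} (p = {p})")
```

With 100,000 flips, the fraction of heads lands very close to p, which is exactly what the Bernoulli distribution promises.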

Now, let's go deeper.



One more distribution we need is the geometric distribution. It asks a question similar to the cereal box prize distribution: "how many times do I need to flip a coin until I see heads?" While it may sound like a nitpicky distinction, this is actually completely different from the Bernoulli distribution.

If we take one experiment, that is, one sample, from the Bernoulli distribution, we get either heads or tails (a 1 or a 0). If we take one experiment from the geometric distribution, we start flipping a coin, stop when we see heads, and report the number of tosses. This can be 1, 2, 3, 4, ... with no upper limit. Though it may sound impossible, we can fairly easily compute the probability that it takes exactly one toss, two tosses, and so on: needing exactly k tosses means k-1 tails followed by one head, which happens with probability (1-p)^(k-1) × p.
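Here is a minimal sketch of both views of the geometric distribution, assuming 0 < p <= 1: `geometric_sample` runs the experiment literally (flip until heads, count the tosses), and `geometric_pmf` computes the chance of needing exactly k tosses as k-1 tails followed by one head. Both helper names, the fair coin, and the seed are illustrative choices.

```python
import random

def geometric_sample(p: float, rng: random.Random) -> int:
    """Flip a coin with P(heads) = p until heads appears; return the number of flips.

    Assumes 0 < p <= 1 (with p = 0 the loop would never end).
    """
    tosses = 1
    while rng.random() >= p:  # tails: keep flipping
        tosses += 1
    return tosses

def geometric_pmf(k: int, p: float) -> float:
    """P(first head on toss k): k - 1 tails, then one head."""
    return (1 - p) ** (k - 1) * p

p = 0.5  # fair coin
for k in range(1, 6):
    print(k, geometric_pmf(k, p))

# Compare the formula with the literal experiment.
rng = random.Random(0)
samples = [geometric_sample(p, rng) for _ in range(100_000)]
print("fraction needing exactly 2 tosses:", samples.count(2) / len(samples))
```

For a fair coin the probabilities halve at each step (0.5, 0.25, 0.125, ...), and the simulated fraction of runs needing exactly 2 tosses sits near 0.25.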

The details are a longer discussion for another time, but once we can compute those probabilities, we can compute all sorts of things. For example, we can find the average number of coin tosses we need to see heads. If the probability of a single toss coming up heads is p, then on average we should expect it to take 1/p tosses; a fair coin (p = 0.5) takes 2 tosses on average. Intuitively, the higher the chance of a single toss being heads, the fewer times we have to toss the coin to see heads (again, on average).
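The 1/p rule can be checked by simulation. This sketch assumes the same flip-until-heads experiment, written as a hypothetical `geometric_sample` helper, and three arbitrary values of p; it averages 100,000 runs for each and prints the result next to 1/p.

```python
import random

def geometric_sample(p: float, rng: random.Random) -> int:
    """Number of flips until the first head, where P(heads) = p (assumes 0 < p <= 1)."""
    tosses = 1
    while rng.random() >= p:  # tails: keep flipping
        tosses += 1
    return tosses

rng = random.Random(1)  # fixed seed so the run is repeatable
averages = {}
for p in (0.5, 0.25, 0.1):
    trials = [geometric_sample(p, rng) for _ in range(100_000)]
    averages[p] = sum(trials) / len(trials)
    print(f"p = {p}: simulated average {averages[p]:.2f}, 1/p = {1 / p:.2f}")
```

The smaller p gets, the longer the wait, and the simulated averages track 1/p closely.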

Finally, we're ready for the cereal box prizes.



I just lied. We're going to do a simplified version of the cereal box prize distribution. The full version is trickier, and I haven't quite figured it out yet. Look for Part II sometime soon.

The simplified version is when there are only 2 prizes. Let's say C-3PO and R2-D2. The probability of getting C-3PO will be p, which makes the probability of getting R2-D2 1-p. If p=0.5, then it's an even chance of each. p=0.6 is 60-40, and p=0.25 is 25-75.

Now, we can actually answer our main question: how many boxes (on average) do I need to buy to reunite the galaxy-saving duo? It turns out that this question is just like the geometric distribution, with a small twist. We buy one box first. If it's C-3PO, then we only care about finding R2-D2. If it's R2-D2, then we only care about finding C-3PO.
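Before doing any math, we can answer the question by brute force: keep opening simulated boxes until both droids show up, and average the count over many runs. This Monte Carlo sketch is a swapped-in check, not the derivation used in this post; it assumes 0 < p < 1 (otherwise one droid never appears), and `boxes_until_both`, the seed, and the chosen values of p are illustrative.

```python
import random

def boxes_until_both(p: float, rng: random.Random) -> int:
    """Open boxes until both droids have appeared; return how many boxes that took.

    Each box holds C-3PO with probability p and R2-D2 with probability 1 - p.
    Assumes 0 < p < 1.
    """
    seen = set()
    boxes = 0
    while len(seen) < 2:
        boxes += 1
        seen.add("C-3PO" if rng.random() < p else "R2-D2")
    return boxes

rng = random.Random(2015)  # fixed seed so the run is repeatable
averages = {}
for p in (0.5, 0.17, 0.1):
    runs = [boxes_until_both(p, rng) for _ in range(100_000)]
    averages[p] = sum(runs) / len(runs)
    print(f"p = {p}: about {averages[p]:.2f} boxes on average")
```

The averages come out close to 3 boxes for p = 0.5 and noticeably higher for the lopsided splits.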

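We just have two geometric distributions! If the first box held R2-D2, we wait for C-3PO with success rate p; if it held C-3PO, we wait for R2-D2 with success rate 1-p. Adding the first box to the expected length of whichever wait we end up in (which happens with probability 1-p and p, respectively), we arrive at a nice, elegant answer for the expected number of boxes N:
$$E[N] = 1 + (1-p)\cdot\frac{1}{p} + p\cdot\frac{1}{1-p} = \frac{1}{p} + \frac{1}{1-p} - 1$$
Plotted against p, the expected number of boxes traces a U-shaped curve.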
For a 50-50 chance, we'll need to buy 3 cereal boxes on average. The farther p is from 0.5, the worse it gets: a 17-83 split means about 6 boxes on average, a 10-90 split about 10, and a 2-98 split about 50!
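The closed-form answer is short enough to evaluate directly. This sketch assumes 0 < p < 1 and writes the expected count as the first box plus the expected geometric wait for whichever droid is still missing, 1 + p/(1-p) + (1-p)/p, which simplifies to 1/p + 1/(1-p) - 1; `expected_boxes` is a hypothetical helper name.

```python
def expected_boxes(p: float) -> float:
    """Expected boxes to collect both droids when P(C-3PO) = p, for 0 < p < 1.

    First box, then a geometric wait for whichever droid is still missing:
    1 + p * 1/(1 - p) + (1 - p) * 1/p, which simplifies to 1/p + 1/(1 - p) - 1.
    """
    return 1 / p + 1 / (1 - p) - 1

for p in (0.5, 0.17, 0.10, 0.02):
    print(f"{p:.0%} / {1 - p:.0%} split: {expected_boxes(p):.2f} boxes")
```

It prints 3.00, 6.09, 10.11, and 50.02 boxes for the 50-50, 17-83, 10-90, and 2-98 splits, which round to the figures above.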

And there you have it. The cereal box prize distribution.



Image credits:

  • Hunt, Kevin. "These are the droids you're looking for." Taste of General Mills. 2015 October.
  • Freeman, Matthew. "A visual comparison of normal and paranormal distributions." Journal of Epidemiology and Community Health. 2006 January; 60(1): 6.
