What is the probability that in a box of a dozen donuts picked from 14 flavors there’s no more than 3 flavors in the box?

Problem

Dave’s Donuts offers 14 flavors of donuts (consider the supply of each flavor as being unlimited). The “grab bag” box consists of flavors randomly selected to be in the box, each flavor equally likely for each one of the dozen donuts. What is the probability that at most three flavors are in the grab bag box of a dozen?

Solution

For this we will need the multinomial distribution, which is a discrete probability distribution. In a string of characters there are $k$ characters possible to fill one position of the string, which is $n$ characters long. Therandom variable $X_1$ counts the number of occurrences of character 1 in the string, $X_2$ the number of occurrences of character 2, and so on until $X_{k}$. Let $p_1,\ldots,p_{k}$ be the individual probability each of the characters could appear in a position of the string; each position is filled independently of the characters in other positions. Let $x_1,\ldots,x_{k} \in [n]\cup \left\{0\right\} $ such that $x_1+\ldots+x_{k}=n$. Then

$\displaystyle \Prob{X_1=x_1,\ldots,X_{k}=x_{k}} =\frac{n!}{x_1!\ldots x_{k}!}p_1^{x_1}\ldots p_{k}^{x_{k}}.$

Here, $k=14$, $p_1=\ldots=p_{14}=\frac{1}{14}$, and $n=12$. So we can say

$\displaystyle \Prob{X_1=x_1,\ldots,X_{14}=x_{14}}=\frac{12!}{x_1!\ldots x_{14}!}\left(\frac{1}{14} \right)^{12}.$ (1)

We will say

$\displaystyle \Prob{\text{At most three flavors}}=$ $\displaystyle \Prob{\text{Exactly one flavor}}$    
  $\displaystyle +\Prob{\text{Exactly two flavors}}$    
  $\displaystyle +\Prob{\text{Exactly three flavors}}.$    

Compute each of those probabilities separately.

If $X_1=12$ and $X_2=\ldots=X_{14}=0$, there is exactly one flavor in the box.(1) shows the probability this happens is $\frac{1}{14^{12}}$. Since we could pick an $X_{i}=12$ and there were 14 ways to make this decision, we can say

$\displaystyle \Prob{\text{Exactly one flavor}}=\frac{1}{14^{11}}.$ (1)

Let’s now compute $\Prob{\text{Exactly two flavors}}$. We start by fixing $X_3=\ldots=X_{14}=0$. We get

$\displaystyle \Prob{X_3=\ldots=X_{14}=0}=\sum_{i=0}^{12} \binom{n}{i} \left( \frac{1}{14}\right)^{12} = \left( \frac{2}{14} \right)^{12}.$ (2)

Unfortunately (2) includes cases where there’s actually only one flavor present in the box, so compute

$\displaystyle \Prob{X_1\ge 1,X_2\ge 1,X_3=0,\ldots,X_{14}=0}$ $\displaystyle = \sum_{i=1}^{11}\binom{n}{i} \left( \frac{1}{14} \right)^{12}$ (3)
  $\displaystyle = \left( \frac{2}{14}\right)^{12}-2\left(\frac{1}{14} \right)^{12}.$ (4)

Of course we could have picked different variables to fix at zero, andthere were $\binom{14}{2}$ ways to pick the variables to fix at zero (or equivalently, pick the variables to not fix at zero), finally yielding

$\displaystyle \Prob{\text{Exactly two flavors}}=\binom{14}{2} \left( \left( \frac{2}{14}\right)^{12}-2\left(\frac{1}{14} \right)^{12} \right).$ (5)

Now to compute $\Prob{\text{Exactly three flavors}}$. Again we start by fixing $X_4=\ldots=X_{14}=0$ and compute

$\displaystyle \Prob{X_1\ge 1,X_2\ge 1,X_3\ge 1,X_4=\ldots=X_{14}=0}=\sum_{i=1}^......\left( \frac{12!}{i!j!(12 - i - j)!} \right) \left(\frac{1}{14} \right)^{12}.$ (6)

We could try and use tricks to compute (6) or we can acknowledge that we’re busy people and ask SymPy to do it. Check that the following Python code is correct:

from sympy import init_session, binomial
init_session()

def multinomial(params):
    if len(params) == 1:
        return 1
    return binomial(sum(params), params[-1]) * \
                    multinomial(params[:-1])

l1 = list()
for i in range(1, 10 + 1):
    v = sum([multinomial([i, j, (12 - i - j)]) for j in range(1,
                                                    11 - i + 1)])
    l1.append(v)

sum(l1)/14**12    # Solution

The resulting probability is $\frac{129789}{14173478093824}$. We could have picked different flavors to fix, and there were $\binom{14}{3}$ ways to pick the flavors to fix, so we get

$\displaystyle \Prob{\text{Exactly three flavors}} = \binom{14}{3}\left( \frac{129789}{14173478093824} \right) = \frac{1687257}{506195646208}.$ (7)

We can write (1) and (5) as $\frac{1}{4049565169664}$ and $\frac{26611}{4049565169664}$, respectively. Summing these probabilities yields

$\displaystyle \Prob{\text{At most three flavors}}=\frac{3381167}{1012391292416}\approx3.34\times 10^{-6}.$ (8)

Related Question

This is the proper way to obtain the probability that there are at most three flavors in the “grab bag” box, but how many boxes exist in which there are at most three flavors when we discount the number of ways there are to arrange the donuts in a box?

If there’s exactly one flavor, then we pick it and fill the box with that flavor; there’s 14 ways to pick one flavor. If there’s exactly two flavors in the box, we’ll call them Flavor 1 and Flavor 2. There is at least one donut of Flavor 1 and one of Flavor 2. Now pick the rest of the donuts’ flavors, order doesn’t matter, there is replacement; there are $\binom{10 + 2 - 1}{10} =\binom{11}{10} = 11$ ways to do that. Then pick the two flavors: there’s$\binom{14}{2}$ ways to do that, and thus $11\binom{14}{2}$ boxes with exactly twoflavors. Similarly, for exactly three flavors, there are $\binom{14}{3}\binom{9 + 3 - 1}{9} = \binom{14}{3} \binom{11}{9}$ ways for there to be exactly three flavors. Sum these numbers. (See https://math.stackexchange.com/q/3230011.) There are 21,035 such boxes.

Special thanks to Math Stack Exchange user wavex for his help with this problem! He provided the following R script for simulating it:

total = 0
for (y in 1:10000000){
  x = rmultinom(1,12,c(1/14,1/14,1/14,1/14,1/14,
                       1/14,1/14,1/14,1/14,1/14,1/14,1/14,1/14,1/14))
  x <- c(x)

  count = 14
  for(i in x){
    if(i==0){
     count = count -1
   }   

  }
 if(count <= 3){total = total + 1}
}
sprintf("%.20f", total / 10000000)

In his run of the code this event occured only 27 out of 10,000,000 times. A rare event indeed!

About this document …

This document was generated using the LaTeX2HTML translator Version 2019 (Released January 1, 2019)

The command line arguments were:
latex2html -split 0 -nonavigation -lcase_tags -image_type gif Example6Solution.tex

The translation was initiated on 2019-05-20


Packt Publishing published a book for me entitled Hands-On Data Analysis with NumPy and Pandas, a book based on my video course Unpacking NumPy and Pandas. This book covers the basics of setting up a Python environment for data analysis with Anaconda, using Jupyter notebooks, and using NumPy and pandas. If you are starting out using Python for data analysis or know someone who is, please consider buying my book or at least spreading the word about it. You can buy the book directly or purchase a subscription to Mapt and read it there.

If you like my blog and would like to support it, spread the word (if not get a copy yourself)!

Advertisements

My Tutorial Book on Anaconda, NumPy and Pandas Is Out: Hands-On Data Analysis with NumPy and Pandas

I announced months ago that one of my video courses, Unpacking NumPy and Pandas, was going to be turned into a book. Today I’m pleased to announce that this book is available!

Continue reading

Learn Foundations of Python Natural Language Processing and Computer Vision with my Video Course: Applications of Statistical Learning with Python

I’m pleased to announce my fourth and final video course. The course has already been out for a couple months by now, but that doesn’t mean it’s too late for me to write about it!

Continue reading

High Dimensional Data, MSRI, and San Francisco in 2018; Reflections

Last fall my adviser alerted me to the MSRI workshop on high-dimensional data and suggested I may be interested. I applied and was accepted to participate. Thus, from July 9th to July 20th I stayed in San Francisco (for the first time in my life), living in the dorms of UC Berkeley and attending the workshop. I got to experience San Francisco’s legendary weather (escaping Salt Lake City’s triple-digit heat) while learning mathematics. I enjoyed the experience and wanted to share it.

Continue reading

Stock Data Analysis with Python (Second Edition)

Introduction

This is a lecture for MATH 4100/CS 5160: Introduction to Data Science, offered at the University of Utah, introducing time series data analysis applied to finance. This is also an update to my earlier blog posts on the same topic (this one combining them together). I strongly advise referring to this blog post instead of the previous ones (which I am not altering for the sake of preserving a record). The code should work as of July 7th, 2018. (And sorry for some of the formatting; WordPress.com’s free version doesn’t play nice with code or tables.)

Continue reading

Learn Basic Python and scikit-learn Machine Learning Hands-On with My Course: Training Your Systems with Python Statistical Modelling

This post is actually months late, but like with my last video course announcement, it’s better late than never. And besides, of my video courses, I had the most fun writing this one.

Continue reading