How likely is that?

Learning outcomes

How likely is it that we ask the question, “How likely is that?” A lot! This unit moves our thinking from simply counting how often a sampled event occurs to asking how probable that event is. The move is subtle.

Relative frequency empirically becomes associated with the probability that an event occurs, and
Cumulative relative frequency empirically becomes associated with the cumulative probability that a set of events occurs.

In this unit we will

Define and use sample spaces of events (all of our sampled data) summarized in a cross-tabulation table
Calculate the probability that either one event or another might occur
Calculate the probability that both one event and another might occur
Calculate the probability that an event might occur given that another event might occur
Calculate a revision of a probability given that new or different information has arrived

So it is one table and four calculations in this unit. What do you get from this exercise? The ability to use empirical data and query the many possible relationships in that data all summarized by intersections (both - and), unions (either - or), and conditions (one thing given the existence of another). These queries will answer the question: “how likely is that?”

Invitation to a formal

Let’s get specific about what we are talking about:

Sample space $S$: all possible outcomes of an experiment $E$. Example: “Raining two days in a row,” $S = \{ (rain, not), (not, rain), (rain, rain), (not, not) \}$
Events in a sample space $F$: a subset of the sample space. Example: “If it rains today, then,” $F = \{ (rain, not), (rain, rain) \}$
Measure of probability $P$: maps each event in $F$ to a real number in the interval $[0,1]$

$p \in [0,1]$
summed over all outcomes $i \in S$: $\Sigma_i p_i = 1$
for the union of any disjoint events $A_j \in F$: $Prob(\cup_j\,A_j) = \Sigma_j\,Prob(A_j)$

We use these same ideas when we talk about non-overlapping intervals of data, the frequency of the number of observations in an interval, and the relative and cumulative relative frequency of the empirical occurrence of data in an interval. What we are now adding is how one data stream might relate to another.

We can now ask a question like this: “When we see high prices, how likely is it that we also see high square-foot lot sizes?”

Let’s relate our formal concepts to the language we normally use.

A bit of English

Let’s wrap some everyday language around these formal concepts.

phrase	meaning	math
$x$ is at least $k$	the least $x$ is allowed to be is $k$	$x \geq k$
$x$ is at most $k$	the most $x$ is allowed to be is $k$	$x \leq k$
both $x$ and $y$	common elements of $x$ and $y$	$x \cap y$
either $x$ or $y$	one, or the other, or both $x$ and $y$	$x \cup y$

Dress down a bit

We can use probability to help us anticipate the range of possible outcomes we might experience. For example, in two days will the stock of Johnson and Johnson rise or fall? It turns out it can potentially do both. How likely, and by how much, will that stock price rise (or fall)?

Suppose the experiment $E$ consists of stock price moves UP (U) and DOWN (D). So on day 1 (tomorrow), the stock price sample space is just $S_1 = \{U, D\}$, where the subscript stands for day 1, that simple.

But on day two, the day after tomorrow, the paths get a bit more complicated. If the stock goes up tomorrow, then on the next day the stock might either go up or go down. Thus there are two events we have to account for in two days if the stock price was up the day before, $S_2^U = \{UU, UD\}$, where the superscript helps us to cull only day 1 ups. The same logic will apply if on day one the stock price had gone down. In this case the sequence is $S_2^D = \{DU, DD\}$.

Putting it all together the sample space by the end of day two may be represented by $S_2 = \{UU, UD, DU, DD\}$.

This is just like a toss of a coin. Here the coin is the “market.” Given that the coin is fair and that the coin is tossed in an independent and identical manner, it is reasonable to apply the equally likely model: a 50% chance of a rise or a fall in any given flip of the coin, or any given day in the life of a stock price. But we can also load the coin and be more pessimistic or optimistic about a rise or fall in the price.

What is the probability of at least 1 UP tick on day one or two? Looking at the sample space we see three (out of four) elements $\{UU, UD, DU\}$ have at least one UP; thus, Prob(at least 1 UP) = 3/4 = 75%.
What is the probability of no UP ticks? Notice that the event $\{$no UPs$\}$ = $\{$at least one UP$\}^c$ = $\{$all DOWN$\}$, so that, using the notation $-UP$ for no UP:

\[ Prob(-UP) = 1 - Prob(at\,\,least\,\,one\,\,UP) \] \[ = 1 - Prob(\{UU\}\cup\{UD\}\cup\{DU\}) = 1 - 3/4 = 1/4. \] Here we use the language of either-or, a union of events, together with its complement. We will use this trick often.
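The two-day sample space and the complement trick can be checked by brute-force enumeration. This is only a sketch under the equally likely model described above (independent days, a 50% chance of UP); the names `moves` and `sample_space` are illustrative, not part of the original notes.

```python
from itertools import product

# Assumption: UP and DOWN are equally likely and the two days are independent.
p_up = 0.5
moves = {"U": p_up, "D": 1 - p_up}

# Sample space after two days: every ordered pair of moves with its probability.
sample_space = {a + b: moves[a] * moves[b] for a, b in product(moves, repeat=2)}

# Either-or (union) of the paths that contain at least one UP.
at_least_one_up = sum(p for path, p in sample_space.items() if "U" in path)

# Complement rule: no UPs is everything that is not "at least one UP".
no_up = 1 - at_least_one_up

print(sorted(sample_space))    # ['DD', 'DU', 'UD', 'UU']
print(at_least_one_up, no_up)  # 0.75 0.25
```

Changing `p_up` loads the coin, which is one way to express a more optimistic or pessimistic view of the market.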

Try another

Suppose the experiment $E$ consists of 92 people, some of whom identify with being female (F) or being male (M), some of whom smoke (S) or don’t (N). There are several events in this space that combine gender and smoking status.

What does the event space look like?

Here is a table of empirical results from the survey.

Counts: Gender x Smoking
	no	yes	Sum
female	27	8	35
male	37	20	57
Sum	64	28	92

The interior cells of the table hold the counts. The last column and the last row hold the row and column sums. We can use this information to answer several questions to enhance our knowledge of what might or might not happen. That knowledge is what we might label anticipation, or even belief, and if we are really bold, a forecast or prediction.

What is…

There are several ways we can interpret our results by querying:

What is the probability of being both a female and a smoker?
What is the probability of either being a female or a smoker?
If you are a female, what is the probability of being a smoker?

We can use our table of counts to help us with our inquiries.

Let’s try question one first. What can we calculate?

Let’s use the same idea but twist it a bit into an either-or, or union calculation instead of an intersection to answer question 2. Let’s try it.
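One possible way to work questions one and two is directly from the table of counts. The sketch below assumes nothing beyond the four cell counts in the Gender x Smoking table; the dictionary layout and variable names are hypothetical choices made for illustration.

```python
from fractions import Fraction

# Cell counts copied from the Gender x Smoking table.
counts = {
    ("female", "no"): 27, ("female", "yes"): 8,
    ("male", "no"): 37, ("male", "yes"): 20,
}
n = sum(counts.values())  # 92 respondents in all

n_female = sum(c for (g, s), c in counts.items() if g == "female")  # row sum, 35
n_smoker = sum(c for (g, s), c in counts.items() if s == "yes")     # column sum, 28
n_both = counts[("female", "yes")]                                   # one cell, 8

# Question 1: both female and smoker (intersection).
p_both = Fraction(n_both, n)

# Question 2: either female or smoker (union); subtract the overlap
# so that female smokers are not counted twice.
p_either = Fraction(n_female + n_smoker - n_both, n)

print(f"{float(p_both):.4f}")    # 0.0870
print(f"{float(p_either):.4f}")  # 0.5978
```

The intersection uses a single cell of the table, while the union adds a row sum and a column sum and then removes the cell they share.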

If you are a female, what is the probability of being a smoker?

We focus on females only and thus look at the row in the table for female. We use the total of this row as the basis (also known as the denominator) for figuring out the probability we seek.

We again consult our tabulation table. But this time we only look at the counts in the female row, and in particular at the cell that intersects with smokers, because respondents who identify as female might smoke or not.

We need this data:

How many females smoke?
How many females are there in the survey?

Now we define $Prob(S | F)$ as the probability of seeing a smoker among the respondents in the survey who report as female. The symbol $|$ is read as conditional on, or given, so $S | F$ means $S$ conditional on $F$, or $S$ given only $F$. Thus we look at all $F$ observations and filter only the $S$ of those observations.

We would be right if we guessed that $F$ is the independent variable and that $S$ depends on the $F$ part of the sample. In fact one way of looking at our regression model

\[ Y = a + bX + e \] is using the notation $Y \mid X$: given $X$, how do we get $Y$? Here $Y$ and $S$ are conditional on the occurrences in $X$ and $F$, respectively. Just like the way we calculated the slope $b$ as

\[ b = \frac{cov(X,Y)}{var(X)} \]

to recognize that there are covariations of $X$ and $Y$ relative to the total variation in $X$, we have

\[ Prob(S\,\, given\,\,F) = Prob(S\mid F) = \frac{n(S\cap F) / n}{n(F) / n} = \frac{n(S\cap F)}{n(F)} \]

where $n$ is the total number of respondents. With numbers from the tabulation for gender and smoking status we have

\[ Prob(S \mid F) = \frac{Prob(S\cap F)}{Prob(F)} = \frac{8/92}{35/92} = \frac{8.70\%}{38.04\%} \approx 22.86\% \]

In the numerator are the number of ways in which smokers, the dependent variable, and females interact (intersect and might even co-vary). In the denominator are the total number of females, the independent variable.
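The conditional calculation can be sketched in a few lines using the counts from the table (8 female smokers, 35 females, 92 respondents). The variable names are illustrative only.

```python
from fractions import Fraction

n_total = 92          # all respondents
n_female = 35         # row sum for female
n_female_smoker = 8   # cell where female meets smoker

# Joint and marginal probabilities, both relative to the whole survey.
p_joint = Fraction(n_female_smoker, n_total)
p_female = Fraction(n_female, n_total)

# Conditioning on F rescales the joint probability by Prob(F);
# the survey total cancels, leaving 8 out of 35.
p_smoker_given_female = p_joint / p_female

print(p_smoker_given_female)                  # 8/35
print(f"{float(p_smoker_given_female):.2%}")  # 22.86%
```

Working with exact fractions avoids the small rounding drift that appears when the percentages 8.70% and 38.04% are divided directly.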

But…what is $Prob(F \mid S)$?

Law of Conditional Probability (Bayes)

Our female smokers lead us to a more general approach here. If we know $Prob(B \mid A)$, can we find $Prob(A \mid B)$? In what follows $n(S)$ counts all of the outcomes in the sample space $S$. \[ Prob(B \mid A) =\frac{Prob(A \cap B)}{Prob(A)} = \frac{Prob(B \cap A)}{Prob(A)} \] \[ Prob(A \cap B) = Prob(B \mid A) Prob(A) \] \[ Prob(A \cap B) = \frac{n(A \cap B)}{n(A)} \frac{n(A)}{n(S)} = \frac{n(A \cap B)}{n(S)} \]

But then we also have \[ Prob(B \cap A) = Prob(A \mid B) Prob(B) \] \[ Prob(B \cap A) = \frac{n(B \cap A)}{n(B)} \frac{n(B)}{n(S)} = \frac{n(B \cap A)}{n(S)} \] But from our work on contingency tables we also know that $Prob(B \cap A) = Prob(A \cap B)$, and this means that

\[ Prob(B \cap A) = Prob(A \mid B) Prob(B) = Prob(A \cap B) = Prob(B \mid A) Prob(A) \] \[ Prob(A \mid B) Prob(B) = Prob(B \mid A) Prob(A) \] Bayes’ Rule (Theorem) \[ Prob(A \mid B) = \frac{Prob(B \mid A) Prob(A)}{Prob(B)} \] Yes we can.
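As a quick check of the rule, we can return to the survey and answer the earlier question about $Prob(F \mid S)$. This sketch takes $A = F$ and $B = S$ and uses only counts from the Gender x Smoking table; the variable names are hypothetical.

```python
from fractions import Fraction

n = 92                            # all respondents
p_f = Fraction(35, n)             # Prob(F): females
p_s = Fraction(28, n)             # Prob(S): smokers
p_s_given_f = Fraction(8, 35)     # Prob(S | F), from the female row

# Bayes' rule: Prob(F | S) = Prob(S | F) Prob(F) / Prob(S)
p_f_given_s = p_s_given_f * p_f / p_s

print(p_f_given_s)                  # 2/7
print(f"{float(p_f_given_s):.2%}")  # 28.57%

# Direct read from the table: 8 female smokers out of 28 smokers.
assert p_f_given_s == Fraction(8, 28)
```

The rule and the direct table lookup agree, which is the point: Bayes lets us flip the condition when only the other direction is known.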

Practice 1

Your company keeps detailed records of quality metrics for its manufacturing operations. At the gargleblaster plant in West Adelbrad the morning shift (“M”), 200 gargleblasters are defective (“D”) for every 100,000 produced. For the evening shift (“E”), 500 items are defective per 100,000 produced. In the average 24 hour period 1,000 items are produced by the morning shift and 600 by the evening shift.

What is the probability that an item picked at random from the average total output produced in the 24 hour period:

Was produced by the morning shift and is defective?
Was produced by the evening shift and is defective?
Was produced by the evening shift and is not defective?
Is defective, whether produced by the evening or the morning shift?
Is defective and was produced only by the morning shift?
Is not defective and was produced only by the evening shift?

What is the probability, $Prob()$, that an item picked at random from the average total produced in the 24 hour period:

Was produced by the morning shift and is defective?

We are looking for the probability of output that is from “both the morning shift and is defective,” that is, $Prob(M \cap D)$. This probability is found from the multiplication law of probability which states

\[ Prob(M \cap D) = Prob(M) Prob(D \mid M) \]

and similarly for $Prob(E \cap D)$.

The marginal probability of picking an item produced by the morning shift (“M”) is

\[ Prob(M) = \frac{1,000}{1,600} = 0.625 \]

and by the evening shift (“E”) \[ Prob(E) = \frac{600}{1,600} = 0.375 \]

These are marginal probabilities across both defective and not defective items.

The probability of picking a defective item (“D”) from the morning shift is

\[ Prob(D \mid M) = \frac{200}{100,000} = 0.002 \]

This conditional probability examines the split of defective and not defective items within the sample of morning shift items.

The conditional probability of picking a defective item from the sample of evening shift output is

\[ Prob(D \mid E) = \frac{500}{100,000} = 0.005 \]

The probability that an item picked at random from the total of 1,600 items produced during a 24 hour period was produced by the morning shift and is defective is

\[ Prob(M \cap D) = Prob(M) Prob(D \mid M) = (0.625)(0.002) = 0.00125 \]

Was produced by the evening shift and is defective?

\[ Prob(E \cap D) = Prob(E) Prob(D \mid E) = (0.375)(0.005) = 0.001875 \]

Was produced by the evening shift and is not defective?

Given the evening shift output only, the conditional probability of no defects ($\neg$ means “not”) is

\[ Prob(\neg D \mid E) = 1 - Prob(D \mid E) = 1 - 0.005 = 0.995 \]

so we then have

\[ Prob(E \cap \neg D) = Prob(E) Prob(\neg D \mid E) = (0.375)(0.995) = 0.373125 \]

Is defective and produced by the morning or the evening shift?

The expected number of defective items from the morning shift equals the probability of a defective item from the morning shift times the number of items produced only by the morning shift.

\[ Prob(D \mid M)Count(M) = (0.002)(1,000) = 2 \]

defective items.

From the evening shift we expect

\[ Prob(D \mid E)Count(E) = (0.005)(600) = 3 \]

defective items. Thus we expect $2 + 3 = 5$ defective items to be produced during a 24 hour period.

With 5 expected defective items, the probability of randomly picking a defective item out of the total of 1,600 items is

\[ Prob(D) = 5/1,600 = 0.003125 \]
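The joint and marginal probabilities so far can be reproduced with the multiplication law and a sum over shifts. This is a sketch using only the figures given in the problem statement; the dictionary names are illustrative.

```python
from fractions import Fraction

# Figures from the problem statement.
output = {"M": 1000, "E": 600}                    # items per 24 hour period
defect_rate = {"M": Fraction(200, 100_000),       # Prob(D | M)
               "E": Fraction(500, 100_000)}       # Prob(D | E)

total = sum(output.values())                      # 1,600 items
p_shift = {s: Fraction(k, total) for s, k in output.items()}

# Multiplication law: Prob(shift and D) = Prob(shift) * Prob(D | shift)
joint_defective = {s: p_shift[s] * defect_rate[s] for s in output}
joint_ok = {s: p_shift[s] * (1 - defect_rate[s]) for s in output}

# A defective item comes from one shift or the other (disjoint events).
p_defective = sum(joint_defective.values())

print(float(joint_defective["M"]))  # 0.00125
print(float(joint_defective["E"]))  # 0.001875
print(float(joint_ok["E"]))         # 0.373125
print(float(p_defective))           # 0.003125
```

Summing the two joint probabilities gives the same answer as counting 5 expected defective items out of 1,600.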

Is defective and was produced only by the morning shift?

We are looking for $Prob(M \mid D)$. We already have $Prob(D \mid M)$. Here we can use Bayes’ theorem to retrieve $Prob(M \mid D)$.

Let’s derive Bayes’ theorem first. We already know that according to the multiplication law of probability we have

\[ Prob(D \cap M) = Prob(M)Prob(D \mid M) \]

Dividing both sides of this equation by $Prob(M)$, and rearranging, gives us a formula for conditional probability

\[ Prob(D \mid M) = \frac{Prob(D \cap M)}{Prob(M)} \]

We also know that $Prob(D \cap M) = Prob(M \cap D)$ from our work on contingency tables. Thus substituting $Prob(M \cap D)$ for $Prob(D \cap M)$ we get

\[ Prob(D \mid M) = \frac{Prob(M \cap D)}{Prob(M)} = \frac{Prob(D)Prob(M \mid D)}{Prob(M)} \]

Solving for $Prob(M \mid D)$ we finally have Bayes’ theorem

\[ Prob(M \mid D) = \frac{Prob(M)Prob(D \mid M)}{Prob(D)} \]

For our data, $Prob(M) = 0.625$, $Prob(D \mid M) = 0.002$, and $Prob(D) = 0.003125$, so we have

\[ Prob(M \mid D) = \frac{(0.625)(0.002)}{0.003125} = \frac{0.00125}{0.003125} = 0.40 \]

The probability that a defective item picked at random from the total 24 hour output was produced by the morning shift is 40%. This agrees with the expected counts: 2 of the 5 expected defective items come from the morning shift. This probability expresses the contribution of defective items by the morning shift only.

Is not defective and was produced only by the evening shift?

Now we look for $Prob(E \mid \neg D)$. Using Bayes’ theorem we have

\[ Prob(E \mid \neg D) = \frac{Prob(E)Prob(\neg D \mid E)}{Prob(\neg D)} \]

For our data, $Prob(E) = 0.375$, $Prob(\neg D \mid E) = 1 - 0.005 = 0.995$, and $Prob(\neg D) = 1- 0.003125 = 0.996875$, so we have

\[ Prob(E \mid \neg D) = \frac{(0.375)(0.995)}{0.996875} = 0.3742947 \]

The probability that a not defective item picked at random from the total 24 hour output was produced by the evening shift is 37.43%. This probability expresses the contribution of not defective items by the evening shift only.
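Both Bayes calculations in this practice can be checked with exact fractions. The sketch below assumes only the shift outputs and defect rates from the problem statement; the variable names are hypothetical.

```python
from fractions import Fraction

# Marginal shift probabilities and conditional defect rates from the problem.
p_m, p_e = Fraction(1000, 1600), Fraction(600, 1600)
p_d_given_m, p_d_given_e = Fraction(200, 100_000), Fraction(500, 100_000)

# Total probability of a defective item across both shifts.
p_d = p_m * p_d_given_m + p_e * p_d_given_e

# Bayes' theorem, once for defective items and once for not defective items.
p_m_given_d = p_m * p_d_given_m / p_d
p_e_given_not_d = p_e * (1 - p_d_given_e) / (1 - p_d)

print(p_m_given_d)                       # 2/5
print(round(float(p_e_given_not_d), 4))  # 0.3743
```

The first result says that 2 of every 5 defective items are expected to come from the morning shift, which matches the expected counts of 2 and 3 defective items.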

Practice 2

The marketing department of an everyday-low-price retailer is attempting to optimize marketing promotions. The department analysts estimate that approximately 1 in 50 potential buyers of a product will see the ad after hearing about the ad from a friend, and 1 in 5 sees a corresponding ad on the internet. One in 100 potential buyers will both hear about the ad and see it on the internet. One in 3 actually purchases the product after seeing the ad, while 1 in 10 purchases without seeing it.

What is the probability that a randomly selected potential customer will purchase the product?

Define the following events:

A: buyer hears about the ad from a friend (“word of mouth”)
B: buyer sees the internet ad
C: buyer purchases the product

What is the probability that the buyer hears about the ad through word of mouth?

\[ Prob(A) = \frac{1}{50} = 0.02 \]

What is the probability that the buyer sees the internet ad?

\[ Prob(B) = \frac{1}{5} = 0.20 \]

What is the probability that a buyer hears about the ad and sees it on the internet?

\[ Prob(A \cap B) = \frac{1}{100} = 0.01 \]

What is the probability that a buyer knows about the ad, either by hearing about it or by seeing it on the internet?

\[ Prob(A \cup B) = Prob(A) + Prob(B) - Prob(A \cap B) = 0.02 + 0.20 - 0.01 = 0.21 \]

What is the probability that a buyer neither hears about the ad nor sees it on the internet?

\[ Prob(\neg A \cap \neg B) = 1 - Prob(A \cup B) = 1 - 0.21 = 0.79 \]

What is the probability that a buyer will purchase after knowing about the ad?

\[ Prob(C \mid (A \cup B)) = \frac{1}{3} = 0.33 \]

What is the probability that a buyer will purchase without knowing about the ad?

\[ Prob(C \mid (\neg A \cap \neg B)) = \frac{1}{10} = 0.1 \]

Finally, we get to answer the question: What is the probability that a buyer will purchase the item?

There are two ways the buyer would purchase the item. One way is through the ad channel, so that the probability of buying is composed of the probability that a buyer hears about the ad or sees it on the internet and the probability that a buyer will purchase after knowing about the ad. Here $Ads = (A \cup B)$, so that

\[ Prob(Buying \cap Ads) = Prob(A \cup B)Prob(C \mid (A \cup B)) = (0.21)(0.33) = 0.0693 \]

The second channel is without knowledge of ads. The probability of buying is composed of the probability that a buyer does not hear about the ad and does not see it on the internet and the probability that a buyer will still purchase after not knowing about the ad. Here $\neg Ads = (\neg A \cap \neg B)$, the complement of $(A \cup B)$, so that

\[ Prob(Buying \cap \neg Ads) = Prob(\neg A \cap \neg B)Prob(C \mid (\neg A \cap \neg B)) = (0.79)(0.10) = 0.0790 \]

Combining the two channels we have

\[Prob(Buying) = 0.0693 + 0.0790 = 0.1483\]

The probability that a randomly picked customer buys the product is 14.83%.
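The two-channel calculation can be sketched with exact fractions. Note one assumption that differs from the hand calculation: the code keeps $1/3$ exact instead of rounding it to 0.33, so the total comes out at 0.149 rather than 0.1483. The variable names are illustrative.

```python
from fractions import Fraction

# Estimates from the problem statement.
p_a = Fraction(1, 50)             # hears about the ad from a friend
p_b = Fraction(1, 5)              # sees the internet ad
p_a_and_b = Fraction(1, 100)      # both hears and sees
p_buy_given_ad = Fraction(1, 3)   # purchases after knowing about the ad
p_buy_given_no_ad = Fraction(1, 10)

# Knows about the ad through at least one channel (union),
# and its complement: neither hears nor sees.
p_ad = p_a + p_b - p_a_and_b
p_no_ad = 1 - p_ad

# Total probability: add the two disjoint ways of buying.
p_buy = p_ad * p_buy_given_ad + p_no_ad * p_buy_given_no_ad

print(p_ad, p_no_ad)  # 21/100 79/100
print(p_buy)          # 149/1000
```

The small gap between 0.149 and 0.1483 is purely a rounding effect from using 0.33 in place of one third.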

Nothing is random!

Well, not quite. Many outcomes are simply uncertain:

There’s a 20% chance of rain: will it rain?
It might (rain = 1, with probability 20%).
It might not (rain = 0, with probability 80%).

Rain as defined here is a so-called random variable. Any random variable (really a functional mapping) is a set of possible outcomes, each of which is assigned a probability of occurrence.

a number of outcomes {1, 0}
each outcome assigned a probability {20%, 80%}

Suppose 20% of the people in a city prefer Pepsi-Cola as their soft drink of choice.

If a random sample of six people is chosen, what values could the number of Pepsi drinkers in the sample take?

Shown here are the possible numbers of Pepsi drinkers in a sample of six people and the probability of that number of Pepsi drinkers occurring in the sample.

Drinkers	Probability
0	0.262
1	0.393
2	0.246
3	0.082
4	0.015
5	0.002
6	0.000
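The notes do not say how the table was produced. One plausible reading, and it is only an assumption here, is a binomial model with six independent picks and a 20% chance of a Pepsi drinker on each pick. The sketch below computes that model so it can be compared with the table.

```python
from math import comb

# Assumption: each of the 6 sampled people is independently a Pepsi drinker
# with probability 0.2 (a binomial model; not stated in the notes).
n, p = 6, 0.2

# Binomial probability of exactly k drinkers out of n.
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

for k, prob in enumerate(pmf):
    print(k, f"{prob:.3f}")
```

Rounded to three decimals, these values agree with the table, which supports the binomial reading.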

Is there a random variable in our midst?

Is this a discrete or a continuous random variable?

What is the probability that 2 or more people in the sample are Pepsi-Cola drinkers?
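One way to approach this question is to add up the table entries for 2 through 6, or to use the complement of "fewer than 2". The sketch below uses the rounded probabilities from the table as given; the dictionary name `probs` is illustrative.

```python
# Rounded probabilities copied from the Drinkers table.
probs = {0: 0.262, 1: 0.393, 2: 0.246, 3: 0.082, 4: 0.015, 5: 0.002, 6: 0.000}

# Direct route: a union of the disjoint outcomes 2, 3, 4, 5, and 6.
p_two_or_more = sum(p for d, p in probs.items() if d >= 2)

# Complement route: 1 minus the probability of 0 or 1 drinkers.
p_via_complement = 1 - probs[0] - probs[1]

print(round(p_two_or_more, 3), round(p_via_complement, 3))  # 0.345 0.345
```

Both routes give the same answer because the table's probabilities sum to one.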

What are the mean and standard deviation of the random variable drinkers of Pepsi, which we will call $D$?

Index	Drinkers	Probability	Expected Outcome	Expected Deviation Squared
\(i\)	\(D_i\)	\(P(D_i)\)	\(P(D_i) D_i\)	\(P(D_i) ( D_i - \mu)^2\)
0	0	0.262	0	0.377
1	1	0.393	0.393	0.016
2	2	0.246	0.492	0.157
3	3	0.082	0.246	0.266
4	4	0.015	0.061	0.118
5	5	0.002	0.008	0.029
6	6	0	0	0.000
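The last two columns of the table are the ingredients for the mean and the variance: sum the expected outcomes to get $\mu$, sum the expected squared deviations to get the variance, and take the square root for the standard deviation. The sketch below assumes the exact binomial probabilities (6 picks, 20% each) instead of the rounded table entries, so its totals differ slightly from what adding the rounded columns would give.

```python
from fractions import Fraction
from math import comb, sqrt

# Assumption: exact binomial probabilities with n = 6 and p = 1/5.
n, p = 6, Fraction(1, 5)
pmf = {d: comb(n, d) * p**d * (1 - p) ** (n - d) for d in range(n + 1)}

# Mean: probability-weighted sum of outcomes (the "Expected Outcome" column).
mean = sum(prob * d for d, prob in pmf.items())

# Variance: probability-weighted squared deviations from the mean
# (the "Expected Deviation Squared" column).
variance = sum(prob * (d - mean) ** 2 for d, prob in pmf.items())
std_dev = sqrt(variance)

print(mean, variance)     # 6/5 24/25
print(round(std_dev, 4))  # 0.9798
```

So on average we would expect about 1.2 Pepsi drinkers in a sample of six, give or take about 0.98.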