Learning outcomes

How likely is it that we ask the question, “How likely is that?” A lot! This unit moves our thinking from simply counting how often a sampled event occurs to asking how probable that event is. The move is subtle.

  • Relative frequency empirically becomes associated with the probability that an event occurs, and

  • Cumulative relative frequency empirically becomes associated with the cumulative probability that a set of events occurs.

In this unit we will

  • Define and use sample spaces of events (all of our sampled data) summarized in a cross-tabulation table

  • Calculate the probability that either one event or another might occur

  • Calculate the probability that both one event and another might occur

  • Calculate the probability that an event might occur given that another event might occur

  • Calculate a revision of a probability given that new or different information has arrived

So it is one table and four calculations in this unit. What do you get from this exercise? The ability to use empirical data and query the many possible relationships in that data all summarized by intersections (both - and), unions (either - or), and conditions (one thing given the existence of another). These queries will answer the question: “how likely is that?”

Invitation to a formal

Let’s get specific about what we are talking about:

  • Sample space \(S\): all possible outcomes of an experiment \(E\). Example: “Raining two days in a row” \(S = \{ (rain, not), (not, rain), (rain, rain), (not, not) \}\)

  • Events in a sample space \(F\): a subset of the sample space. Example: “If it rains today, then” \(F = \{ (rain, not), (rain, rain) \}\)

  • Measure of probability \(P\): maps events in \(F\) to the real numbers in \([0,1]\)

  1. \(p \in [0,1]\)

  2. for all events \(i \in F\): \(\Sigma_i p_i = 1\)

  3. the union of any disjoint events \(A_j \in F\): \(Prob(\cup_j\,A_j) = \Sigma_j\,Prob(A_j)\)
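The three properties can be checked on the two-day rain example with a minimal sketch. The equally likely weights are an assumption made only for illustration; the text does not say how likely rain is.

```python
from fractions import Fraction

# Sample space for "raining two days in a row" (from the example above)
S = {("rain", "not"), ("not", "rain"), ("rain", "rain"), ("not", "not")}

# Event: it rains on the first day
F = {outcome for outcome in S if outcome[0] == "rain"}

# ASSUMPTION (illustration only): all four outcomes are equally likely
p = {outcome: Fraction(1, len(S)) for outcome in S}

def prob(event):
    """Probability of an event = sum of the probabilities of its outcomes."""
    return sum(p[outcome] for outcome in event)

# Property 1: every probability lies in [0, 1]
assert all(0 <= p[outcome] <= 1 for outcome in S)

# Property 2: the probabilities over the whole sample space sum to 1
assert prob(S) == 1

# Property 3: for disjoint events, the probability of the union is the sum
A = {("rain", "rain")}
B = {("not", "not")}
assert A.isdisjoint(B)
assert prob(A | B) == prob(A) + prob(B)

print(prob(F))  # probability of rain on the first day under the assumption
```

Using `Fraction` keeps the arithmetic exact, so the equalities can be tested without rounding error.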

We use these same ideas when we talk about non-overlapping intervals of data, the frequency of the number of observations in an interval, and the relative and cumulative relative frequency of the empirical occurrence of data in an interval. What we are now adding is how one data stream might relate to another.

We can now ask a question like this: “how likely is it that when we see high prices, we also see high square-foot lot sizes?”

Let’s relate our formal concepts to the language we normally use.

A bit of English

Let’s wrap some everyday language around these formal concepts.

phrase | meaning | math
\(x\) is at least \(k\) | the least \(x\) is allowed to be is \(k\) | \(x \geq k\)
\(x\) is at most \(k\) | the most \(x\) is allowed to be is \(k\) | \(x \leq k\)
both \(x\) and \(y\) | common elements of \(x\) and \(y\) | \(x \cap y\)
either \(x\) or \(y\) | one, or the other, or both \(x\) and \(y\) | \(x \cup y\)

Dress down a bit

We can use probability to help us anticipate the range of possible outcomes we might experience. For example, in two days will the stock of Johnson and Johnson rise or fall? It turns out it can potentially do both. How likely, and by how much, will that stock price rise (or fall)?

Suppose the experiment \(E\) consists of stock price moves UP (U) and DOWN (D). So on day 1 (tomorrow), the stock price sample space is just \(S_1 = \{U, D\}\), where the subscript stands for day 1, that simple.

But on day two, the day after tomorrow, the paths get a bit more complicated. If the stock goes up tomorrow, then on the next day the stock might either go up or go down. Thus there are two events we have to account for in two days if the stock price was up the day before: \(S_2^U = \{UU, UD\}\), where the superscript helps us to cull only day 1 ups. The same logic applies if on day one the stock price had gone down. In this case the sequence is \(S_2^D = \{DU, DD\}\).

Putting it all together, the sample space by the end of day two may be represented by \(S_2 = \{UU, UD, DU, DD\}\), just like two tosses of a coin. Here the coin is the “market.” Given that the coin is fair and that the coin is tossed in an independent and identical manner, it is reasonable to apply the equally likely model: a 50% chance of a rise or a fall in any given round of flip the coin, or a day in the life of a stock price. But we can also load the coin and be more pessimistic or optimistic about a rise or fall in the price.

  1. What is the probability of at least 1 UP tick on day one or two? Looking at the sample space we see three (out of four) elements \(\{UU, UD, DU\}\) have at least one UP; thus, Prob(at least 1 UP) = 3/4 = 75%.

  2. What is the probability of no UP ticks? Notice that the event {no UPs} = {at least one UP}\(^c\) = {all DOWN}, so that using the notation \(-UP\) for no UP:

\[ Prob(-UP) = 1 - Prob(at\,\,least\,\,one\,\,UP) \] \[ = 1 - Prob(\{UU\}\cup\{UD\}\cup\{DU\}) = 1 - 3/4 = 1/4. \] Here we use the language either-or, a union of events, and then take the complement. We will use this trick often.
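The two-day stock example can be enumerated directly. This is a small sketch under the equally likely (fair coin) model described above; the variable names are ours.

```python
from fractions import Fraction
from itertools import product

# All two-day paths of UP (U) and DOWN (D) moves: UU, UD, DU, DD
S2 = ["".join(path) for path in product("UD", repeat=2)]

# Equally likely model: each path gets the same probability
p_each = Fraction(1, len(S2))

# Event: at least one UP tick over the two days
at_least_one_up = [path for path in S2 if "U" in path]
p_at_least_one_up = len(at_least_one_up) * p_each

# Complement: no UP ticks at all (the single path DD)
p_no_up = 1 - p_at_least_one_up

print(S2, p_at_least_one_up, p_no_up)
```

Loading the coin would mean replacing `p_each` with path-specific probabilities; the complement step stays the same.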

Try another

Suppose the experiment \(E\) consists of 92 people, each of whom identifies as female (F) or male (M), and each of whom smokes (S) or doesn’t (N). There are several events in this space that combine gender and smoking status.

What does the event space look like?



Here is a table of empirical results from the survey.

Counts: Gender x Smoking
gender | no | yes | Sum
female | 27 | 8 | 35
male | 37 | 20 | 57
Sum | 64 | 28 | 92

We notice the counts are in a bold black font. The row and column sums are in bold blue. We can use this information to answer several questions to enhance our knowledge of what might or might not happen. That knowledge is what we might label anticipation, or even belief, and if we are really bold, a forecast or prediction.
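To make the table something we can query, here is a minimal sketch that stores the four cell counts and recomputes the row sums, column sums, and relative frequencies. The dictionary layout and names are our own choice, not part of the survey.

```python
# Cell counts from the Gender x Smoking table
counts = {
    "female": {"no": 27, "yes": 8},
    "male": {"no": 37, "yes": 20},
}

# Row sums (by gender) and column sums (by smoking status)
row_sums = {gender: sum(cells.values()) for gender, cells in counts.items()}
col_sums = {status: sum(counts[g][status] for g in counts) for status in ("no", "yes")}

# Grand total of respondents
n = sum(row_sums.values())

# Relative frequency of each cell: count divided by the grand total
rel_freq = {
    gender: {status: count / n for status, count in cells.items()}
    for gender, cells in counts.items()
}

print(row_sums, col_sums, n)
print(f"female smokers: {rel_freq['female']['yes']:.2%}")
```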

What is…

There are several ways we can interpret our results by querying:

  1. What is the probability of being both a female and a smoker?

  2. What is the probability of either being a female or a smoker?

  3. If you are a female, what is the probability of being a smoker?

We can use our table of counts to help us with our inquiries.

Let’s try question one first. What can we calculate?
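As a worked sketch of question 1, we read the female-yes cell (8) and the grand total (92) from the table of counts, writing \(n\) as our shorthand for the total number of respondents:

\[ Prob(F \cap S) = \frac{n(F \cap S)}{n} = \frac{8}{92} \approx 8.70\% \]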



Let’s use the same idea but twist it a bit into an either-or, or union calculation instead of an intersection to answer question 2. Let’s try it.
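One way to work question 2 is the addition rule, which subtracts the overlap so that female smokers are not counted twice:

\[ Prob(F \cup S) = Prob(F) + Prob(S) - Prob(F \cap S) = \frac{35 + 28 - 8}{92} = \frac{55}{92} \approx 59.78\% \]

The same count comes from adding the three cells female-no, female-yes, and male-yes: \(27 + 8 + 20 = 55\).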



  3. If you are a female, what is the probability of being a smoker?

We focus on females only and thus look at the row in the table for female. We use the row total as the basis (also known as the denominator) for figuring out the probability we seek.

We again consult our tabulation table. But this time we only look at the counts in the female row, in particular at the cell that intersects with smokers, because respondents who identify as females might smoke or not.

We need this data:

  • How many females smoke?

  • How many females are there in the survey?



Now we define \(Prob(S | F)\) as the probability of seeing a smoker among the respondents in the survey who report as female. The symbol \(|\) is read as conditional on or given, so \(S | F\) means \(S\) conditional on \(F\), or \(S\) given only \(F\). Thus we look at all \(F\) observations and filter only the \(S\) among those observations.

We would be right if we guessed that \(F\) is the independent variable and that \(S\) depends on the \(F\) part of the sample. In fact one way of looking at our regression model

\[ Y = a + bX + e \] is using the notation that \(Y \mid X\): given \(X\) how do we get \(Y\). Here \(Y\) and \(S\) are conditional on the occurrences in \(X\) and \(F\), respectively. Just like the way we calculated the slope \(b\) as

\[ b = \frac{cov(X,Y)}{var(X)} \]

to recognize that there are covariations of \(X\) and \(Y\) relative to the total variation in \(X\), we have

\[ Prob(S\,\, given\,\,F) = Prob(S\mid F) = \frac{n(S\cap F) / n}{n(F) / n} = \frac{n(S\cap F)}{n(F)} \]

where \(n\) is the total number of respondents in the survey, or with numbers from the tabulation we have for gender and smoking status:

\[ Prob(S \mid F) = \frac{Prob(S\cap F)}{Prob(F)} = \frac{8/92}{35/92} = \frac{8}{35} \approx 22.86\% \]

In the numerator is the number of ways in which smokers, the dependent variable, and females interact (intersect and might even co-vary). In the denominator is the total number of females, the independent variable.
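The conditional calculation can be sketched with exact fractions. The counts come from the table; the variable names are ours. Working from the counts gives 8/35, about 22.86%; dividing the rounded percentages 8.70 and 38.04 instead shifts the last digit slightly.

```python
from fractions import Fraction

n_FS = 8   # female smokers (the intersection cell)
n_F = 35   # all females (the female row sum)
n = 92     # all respondents

# Ratio of probabilities: Prob(S and F) / Prob(F)
p_FS = Fraction(n_FS, n)
p_F = Fraction(n_F, n)
p_S_given_F = p_FS / p_F

# The grand total cancels, leaving a ratio of counts
assert p_S_given_F == Fraction(n_FS, n_F)

print(f"Prob(S | F) = {p_S_given_F} = {float(p_S_given_F):.2%}")
```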

But…what is \(Prob(F \mid S)\)?


Law of Conditional Probability (Bayes)

Our female smokers lead us to a more general approach here. If we know \(Prob(B \mid A)\), can we find \(Prob(A \mid B)\)?

\[ Prob(B \mid A) =\frac{Prob(A \cap B)}{Prob(A)} = \frac{Prob(B \cap A)}{Prob(A)} \]
\[ Prob(A \cap B) = Prob(B \mid A) Prob(A) \]
\[ Prob(A \cap B) = \frac{n(A \cap B)}{n(A)} \frac{n(A)}{n(S)} = \frac{n(A \cap B)}{n(S)} \]

Here \(S\) stands for the whole sample space, not for smokers. But then we also have

\[ Prob(B \cap A) = Prob(A \mid B) Prob(B) \]
\[ Prob(B \cap A) = \frac{n(B \cap A)}{n(B)} \frac{n(B)}{n(S)} = \frac{n(B \cap A)}{n(S)} \]

But from our work on contingency tables we also know that \(Prob(B \cap A) = Prob(A \cap B)\), and this means that

\[ Prob(B \cap A) = Prob(A \mid B) Prob(B) = Prob(A \cap B) = Prob(B \mid A) Prob(A) \]
\[ Prob(A \mid B) Prob(B) = Prob(B \mid A) Prob(A) \]

Bayes’ Rule (Theorem)

\[ Prob(A \mid B) = \frac{Prob(B \mid A) Prob(A)}{Prob(B)} \]

Yes we can.
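Bayes’ rule can be checked against the gender and smoking counts. This sketch turns \(Prob(S \mid F)\) into \(Prob(F \mid S)\) and compares the result with the direct count from the smoker column; the names are ours.

```python
from fractions import Fraction

n = 92      # all respondents
n_F = 35    # females
n_S = 28    # smokers
n_FS = 8    # female smokers

p_F = Fraction(n_F, n)
p_S = Fraction(n_S, n)
p_S_given_F = Fraction(n_FS, n_F)

# Bayes' rule: Prob(F | S) = Prob(S | F) * Prob(F) / Prob(S)
p_F_given_S = p_S_given_F * p_F / p_S

# Direct check: female smokers as a share of all smokers
assert p_F_given_S == Fraction(n_FS, n_S)

print(f"Prob(F | S) = {p_F_given_S} = {float(p_F_given_S):.2%}")
```

The two conditionals differ (8/35 versus 8/28) because they are measured against different bases: all females in one case, all smokers in the other.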

Practice 1

Your company keeps detailed records of quality metrics for its manufacturing operations. At the gargleblaster plant in West Adelbrad, in the morning shift (“M”), 200 gargleblasters are defective (“D”) for every 100,000 produced. For the evening shift (“E”), 500 items are defective per 100,000 produced. In an average 24-hour period, 1,000 items are produced by the morning shift and 600 by the evening shift.

What is the probability that an item picked at random from the average total output produced in the 24 hour period:

  1. Was produced by the morning shift and is defective?
  2. Was produced by the evening shift and is defective?
  3. Was produced by the evening shift and is not defective?
  4. Is defective, whether produced by the evening or the morning shift?
  5. Is defective and was produced only by the morning shift?
  6. Is not defective and was produced only by the evening shift?


Among the items in the total 24-hour output that are not defective, the probability that an item picked at random was produced by the evening shift is 37.43%. This probability expresses the evening shift’s share of the not-defective items.
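A sketch of the calculations for Practice 1 follows. It assumes that questions 1 to 3 ask for joint probabilities, question 4 for a total probability, and questions 5 and 6 for conditional probabilities (the shift given the defect status). That last reading is our interpretation; it is the one that reproduces the 37.43% figure.

```python
from fractions import Fraction

# Defect rates per shift and average daily output, from the problem statement
defect_rate = {"M": Fraction(200, 100_000), "E": Fraction(500, 100_000)}
output = {"M": 1000, "E": 600}
total = sum(output.values())

# Probability that a randomly picked item comes from each shift
p_shift = {s: Fraction(q, total) for s, q in output.items()}

# Joint probabilities: shift AND defective, shift AND not defective
p_joint_def = {s: p_shift[s] * defect_rate[s] for s in output}
p_joint_ok = {s: p_shift[s] * (1 - defect_rate[s]) for s in output}

# Total probability of a defective item, from either shift
p_def = sum(p_joint_def.values())
p_ok = 1 - p_def

# Conditional probabilities (ASSUMED reading of questions 5 and 6)
p_M_given_def = p_joint_def["M"] / p_def
p_E_given_ok = p_joint_ok["E"] / p_ok

print(f"1. M and D:     {float(p_joint_def['M']):.4%}")
print(f"2. E and D:     {float(p_joint_def['E']):.4%}")
print(f"3. E and not D: {float(p_joint_ok['E']):.4%}")
print(f"4. D:           {float(p_def):.4%}")
print(f"5. M given D:   {float(p_M_given_def):.2%}")
print(f"6. E given not D: {float(p_E_given_ok):.2%}")
```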

Practice 2

The marketing department of an everyday-low-price retailer is attempting to optimize marketing promotions. The department analysts estimate that approximately 1 in 50 potential buyers of a product will see the ad after hearing about the ad from a friend, and 1 in 5 sees a corresponding ad on the internet. One in 100 potential buyers will both hear about the ad and see it on the internet. One in 3 actually purchases the product after seeing the ad, while 1 in 10 purchases without seeing it.

  • What is the probability that a randomly selected potential customer will purchase the product?
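One possible route to an answer is sketched below. It rests on two assumptions that should be checked against the intended reading of the problem: the 1-in-100 figure is taken as the overlap (buyers reached through both channels, since an either-or probability cannot be smaller than either channel alone), and “seeing the ad” means being reached through at least one channel.

```python
from fractions import Fraction

p_friend = Fraction(1, 50)   # sees the ad after hearing about it from a friend
p_web = Fraction(1, 5)       # sees the corresponding ad on the internet
p_both = Fraction(1, 100)    # ASSUMPTION: overlap of the two channels

# Addition rule: reached through at least one channel
p_seen = p_friend + p_web - p_both
p_not_seen = 1 - p_seen

p_buy_given_seen = Fraction(1, 3)
p_buy_given_not_seen = Fraction(1, 10)

# Law of total probability over seen / not seen
p_buy = p_buy_given_seen * p_seen + p_buy_given_not_seen * p_not_seen

print(f"Prob(seen) = {float(p_seen):.1%}, Prob(buy) = {float(p_buy):.1%}")
```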


Nothing is random!

Well, not quite. Many outcomes are simply uncertain:

  • there’s a 20% chance of rain: will it rain? It might (rain = 1, with probability 20%), or it might not (rain = 0, with probability 80%)

Rain as defined here is a so-called random variable. Any random variable (really a functional mapping) is a set of possible outcomes, each of which is assigned a probability of occurrence.

  • a number of outcomes {1, 0}

  • each outcome assigned a probability {20%, 80%}
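In code, such a random variable is just a mapping from outcomes to probabilities. A minimal sketch, with names of our choosing:

```python
# Random variable "rain": outcome -> probability
rain = {1: 0.20, 0: 0.80}

# The probabilities over all outcomes must sum to 1
assert abs(sum(rain.values()) - 1.0) < 1e-12

# Expected value: each outcome weighted by its probability
expected_rain = sum(outcome * prob for outcome, prob in rain.items())

print(expected_rain)
```

For a 0/1 variable like this one, the expected value is simply the probability of the outcome coded as 1.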

Suppose 20% of the people in a city prefer Pepsi-Cola as their soft drink of choice.

  1. If a random sample of six people is chosen, what is the range of the number of Pepsi drinkers we could see?


Shown here are the possible numbers of Pepsi drinkers in a sample of six people and the probability of that number of Pepsi drinkers occurring in the sample.

Drinkers Probability
0 0.262
1 0.393
2 0.246
3 0.082
4 0.015
5 0.002
6 0.000
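The text does not say how these probabilities were obtained. One model that agrees with them to three decimals is the binomial distribution with six independent draws and a 20% chance per person; that model is our assumption in the sketch below.

```python
from math import comb

n_people = 6   # sample size, from the text
p_pepsi = 0.2  # share of Pepsi drinkers in the city, from the text

# ASSUMPTION: binomial model, Prob(D = k) = C(n, k) p^k (1 - p)^(n - k)
pmf = {
    k: comb(n_people, k) * p_pepsi**k * (1 - p_pepsi) ** (n_people - k)
    for k in range(n_people + 1)
}

# Probabilities as tabulated in the text
table = {0: 0.262, 1: 0.393, 2: 0.246, 3: 0.082, 4: 0.015, 5: 0.002, 6: 0.000}

# The model matches the table to within rounding at three decimals
for k in table:
    assert abs(pmf[k] - table[k]) < 0.0005

for k, prob in pmf.items():
    print(k, f"{prob:.3f}")
```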
  2. Is there a random variable in our midst?


  3. Is this a discrete or a continuous random variable?


  4. What is the probability that 2 or more people in the sample are Pepsi-Cola drinkers?


  5. What is the mean and standard deviation of the random variable drinkers of Pepsi, which we will call \(D\)?
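The last two questions can be worked straight from the tabulated probabilities. This sketch uses only the rounded table values, so the results carry that rounding; the names are ours.

```python
from math import sqrt

# Number of Pepsi drinkers D -> probability, as tabulated
table = {0: 0.262, 1: 0.393, 2: 0.246, 3: 0.082, 4: 0.015, 5: 0.002, 6: 0.000}

# Probability of 2 or more drinkers: add the probabilities for D >= 2
p_two_or_more = sum(prob for d, prob in table.items() if d >= 2)

# Mean: each outcome weighted by its probability
mean_D = sum(d * prob for d, prob in table.items())

# Variance: probability-weighted squared deviations from the mean
var_D = sum((d - mean_D) ** 2 * prob for d, prob in table.items())
sd_D = sqrt(var_D)

print(f"Prob(D >= 2) = {p_two_or_more:.3f}")
print(f"mean = {mean_D:.3f}, sd = {sd_D:.3f}")
```

Equivalently, the probability of 2 or more is one minus the probabilities of 0 and 1 drinkers, the same complement trick used for the stock price example.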