The following table gives frequencies of counties by type and health outcome. Find the probability of a bad health outcome if you were to live in a rural county.
| | bad | good |
|---|---|---|
| rural | 501 | 68 |
| urban | 103 | 391 |
Use only the rural row. \(P(B|R)\) is the probability of a bad outcome given that the county is rural. Adding up the cells gives 1063 counties in the sample, of which \(501 + 68 = 569\) are rural, so \(P(B|R) = 501/569 \approx 0.8805\).
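The row-restriction idea can be checked with a minimal Python sketch. The dictionary layout and variable names are illustrative assumptions; only the four cell counts come from the table.

```python
# Frequencies from the table: outer keys are county type, inner keys are health outcome.
counts = {
    "rural": {"bad": 501, "good": 68},
    "urban": {"bad": 103, "good": 391},
}

# Row total for rural counties and grand total for the whole sample.
rural_total = sum(counts["rural"].values())
grand_total = sum(sum(row.values()) for row in counts.values())

# Conditional probability: restrict attention to the rural row only.
p_bad_given_rural = counts["rural"]["bad"] / rural_total

print(rural_total, grand_total, round(p_bad_given_rural, 4))
```

Dividing by the rural row total rather than the grand total is what makes this a conditional probability, not a joint one.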
About 2% of the time you would expect to succumb to an infection that keeps you from going to work in any given week. What is the probability of being out due to this event exactly once in a 10 week period?
For this problem \(n=10\), \(k=1\), \(p=0.02\). The probability of a single path is
\[
p^k(1-p)^{n-k} = (0.02)^1(0.98)^{9} = (0.02)(0.8337) = 0.0167
\]
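The number of possible paths that have this probability is \(\binom{n}{k} = \binom{10}{1} = 10\), so the probability of being out exactly once in the 10 week period is approximately \(10 \times 0.0167 \approx 0.167\).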
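As a cross-check on the binomial calculation, here is a short Python sketch using only the standard library. The variable names are illustrative, and it assumes that weeks are independent with the same 2% chance each week.

```python
from math import comb

n, k, p = 10, 1, 0.02

# Probability of one specific sequence with exactly k absences in n weeks.
single_path = p**k * (1 - p) ** (n - k)

# Number of distinct sequences with exactly k absences.
paths = comb(n, k)

# Binomial probability of exactly k absences.
prob = paths * single_path

print(round(single_path, 4), paths, round(prob, 4))
```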
You have no idea about the shape of the distribution of your stock portfolio. You do know that it can range from 10,000 to 15,000 dollars in a 52 week period. What mean and standard deviation can you expect if you repeatedly (and maybe even annoyingly!) sampled the opinions of 5 expert portfolio managers?
We use the uniform distribution to model our diffuse beliefs. We also sample several portfolio managers (repeatedly!). For this problem \(a = 10000\), \(b = 15000\), and the number of samples is \(n = 5\). We let \(X\) be the sampled portfolio values.
The population mean of the uniform distribution is
\[
\mu = \frac{(a+b)}{2} = \frac{10000 + 15000}{2} = 12500
\]
As the number of repeated samplings grows ever larger, the mean of the many sampled means also converges to the population mean \(\mu = 12500\).
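The question also asks for a standard deviation. A minimal Python sketch, assuming the five opinions are independent draws from the same uniform distribution, uses the standard results that a uniform distribution on \([a, b]\) has standard deviation \((b-a)/\sqrt{12}\) and that means of samples of size \(n\) have standard deviation \(\sigma/\sqrt{n}\). The variable names are illustrative.

```python
from math import sqrt

a, b, n = 10_000, 15_000, 5

# Population mean and standard deviation of a uniform distribution on [a, b].
mu = (a + b) / 2
sigma = (b - a) / sqrt(12)

# Standard deviation of the means of repeated samples of size n
# (assumes independent draws from the same distribution).
se_mean = sigma / sqrt(n)

print(mu, round(sigma, 2), round(se_mean, 2))
```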
Mid-term grades averaged 74 with a standard deviation of 7. It was such a hard exam that the professor strongly felt that 5% of the students should get an A for their heroic efforts. What is the cut off grade for A’s?
We use the \(z\) score to solve this problem. We know that the \(z\) score for 95% cumulative probability (leaving 5% of students in the upper tail of the normal distribution) is 1.6449. Then with the \(z\) transform of the A cutoff grade \(X\) with mean \(\mu = 74\) and standard deviation \(\sigma = 7\) we have
\[
z = 1.6449 = \frac{X-\mu}{\sigma} = \frac{X-74}{7}
\]
Solving this equation for \(X\), the A cutoff grade, gives
\[
X = 74 + (1.6449)(7) = 85.514
\]
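The same cutoff can be reproduced with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch that assumes, as the solution does, that grades are normally distributed; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 74, 7
upper_tail = 0.05  # share of students who should receive an A

# z score that leaves 5% in the upper tail of the standard normal.
z = NormalDist().inv_cdf(1 - upper_tail)

# Invert the z transform to recover the cutoff grade.
cutoff = mu + z * sigma

# Equivalent one-step form: the 95% quantile of Normal(mu, sigma).
cutoff_direct = NormalDist(mu, sigma).inv_cdf(1 - upper_tail)

print(round(z, 4), round(cutoff, 3))
```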
Confidence Interval – known population variance.
You sample 100 workers in a warehouse that employs 2,000 workers. You find that 60 prefer to form a collective bargaining unit (that is, join a labor union). What is the 95% confidence interval for the proportion of all workers in the warehouse who prefer to form a collective bargaining unit?
We are looking for the confidence interval around the sample proportion \(\bar p = \bar X / n = 60 / 100 = 0.6\), where \(\bar X\) is the sampled number of workers who prefer a union. The population variance of the binomially distributed number of workers that vote for a union is
\[
\sigma_X^2 = np(1-p)
\]
The population variance of \(\bar X / n = \bar p\) (standard deviation squared) is then
\[
\sigma_{\bar p}^2 = \frac{\sigma_X^2}{n^2} = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n}
\]
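To carry the calculation through numerically, here is a hedged Python sketch of the usual normal-approximation interval for a proportion. It rests on two assumptions not stated in the text: the unknown \(p\) is replaced by the sample proportion 0.6, and no finite population correction is applied for the 2,000-worker warehouse. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, successes = 100, 60
p_hat = successes / n

# Standard error of the sample proportion, plugging p_hat in for p (assumption).
se = sqrt(p_hat * (1 - p_hat) / n)

# Two-sided 95% interval leaves 2.5% in each tail.
z = NormalDist().inv_cdf(0.975)

lower = p_hat - z * se
upper = p_hat + z * se

print(round(se, 4), round(lower, 3), round(upper, 3))
```

Under these assumptions the interval runs from roughly 0.50 to 0.70. Applying a finite population correction would narrow it slightly.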
A random sample of 9 application specific integrated circuit (ASIC) chips for inventory control were found to have a mean life of 3,000 operating hours with a standard deviation of 450 hours. The typical ASIC chip standard deviation has never been reported by the manufacturer. What is the confidence interval for the mean life of the entire shipment?
Here we need to use the Student's t distribution because the population standard deviation is unknown. We have sample size \(n = 9\), mean life \(\bar X = 3000\), and sample standard deviation \(s = 450\). The upper and lower \(t\) scores for \(df = n - 1 = 8\) degrees of freedom are \(+2.306\) and \(-2.306\), respectively; these values correspond to a 95% confidence level.
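The standard deviation of the sampled means is estimated by \(s_{\bar X} = s / \sqrt{n} = 450 / 3 = 150\). The confidence interval with upper tail \(t = 2.306\) has lower bound \(LB\)
\[
LB = \bar X - t\,s_{\bar X} = 3000 - (2.306)(150) = 2654.1
\]
and upper bound \(UB\)
\[
UB = \bar X + t\,s_{\bar X} = 3000 + (2.306)(150) = 3345.9
\]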
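Both bounds can be verified with a few lines of Python. The critical value 2.306 is copied from the text rather than computed, because the Python standard library has no Student's t quantile function. The variable names are illustrative.

```python
from math import sqrt

n, x_bar, s = 9, 3000, 450

# Critical t score for 8 degrees of freedom, taken from the text (not computed here).
t_crit = 2.306

# Estimated standard deviation of the sampled means: 450 / 3.
se = s / sqrt(n)

lower = x_bar - t_crit * se
upper = x_bar + t_crit * se

print(se, round(lower, 1), round(upper, 1))
```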
You purchase 9 cans of so-called salt-free tomato sauce to test the manufacturer’s claim that there is no more than 35 grams of sodium in each can. The manufacturer also claims that there is a standard deviation of 4 grams for all cans shipped during the past 12 months. Your analysis indicates that there is a mean of 40 grams in your sample. Should you accept the manufacturer’s claim at the 95% level?
We know the population standard deviation \(\sigma = 4\), and with sample size \(n=9\) the standard deviation of sampled means is \(\sigma_{\bar X} = \sigma / \sqrt{n} = 4 / \sqrt{9} = 4 / 3 = 1.3333\).
The null hypothesis \(H_0\) is that the mean equals the manufacturer’s claim, \(\mu = \mu_0 = 35\). The alternative hypothesis is that \(\mu > 35\). For an upper tail test like this, the critical \(z\) score is the \(z\) associated with a 95% cumulative probability under the normal distribution curve, so that \(z^* = 1.6449\). If the \(z\) score of our sample mean exceeds this number, we reject the null hypothesis in favor of the alternative hypothesis, accepting a 5% probability of rejecting a null hypothesis that is in fact true.
Our calculation is
\[
z = \frac{\bar X - \mu_0}{\sigma_{\bar X}} = \frac{40 - 35}{1.3333} = 3.75
\]
Since \(z = 3.75\) exceeds the critical \(z^* = 1.6449\) we reject the manufacturer’s claim.
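The upper tail \(z\) test can be sketched in Python with `statistics.NormalDist`. The variable names are illustrative, and the p-value is an extra check that the solution does not compute.

```python
from math import sqrt
from statistics import NormalDist

mu_0, sigma, n, x_bar = 35, 4, 9, 40
alpha = 0.05

# Standard deviation of sampled means with known population sigma: 4 / 3.
se = sigma / sqrt(n)

# Observed test statistic and upper tail critical value.
z = (x_bar - mu_0) / se
z_crit = NormalDist().inv_cdf(1 - alpha)

# Upper tail p-value: probability of a z at least this large if H0 is true.
p_value = 1 - NormalDist().cdf(z)

reject = z > z_crit
print(round(z, 2), round(z_crit, 4), reject)
```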
Hypothesis testing – sampled variance.
You purchase 9 cans of so-called salt-free tomato sauce to test the manufacturer’s claim that there is no more than 35 grams of sodium in each can. Your analysis indicates that there is a mean of 40 grams with a standard deviation of 7 grams of sodium in your sample. You do not believe the manufacturer’s claim that there is only a standard deviation of 4 grams in the so-called population of all cans produced. Should you accept the manufacturer’s claim at the 95% level? (Use =T.INV() in Excel to compute the critical t-score).
Now we call the manufacturer’s standard deviation into question and use the sample standard deviation. We thus need to use the Student's t distribution.
We calculate the sample standard deviation \(s = 7\), and with sample size \(n=9\) the estimated standard deviation of sampled means is \(s_{\bar X} = s / \sqrt{n} = 7 / \sqrt{9} = 7 / 3 = 2.3333\).
The null hypothesis \(H_0\) is that the mean equals the manufacturer’s claim, \(\mu = \mu_0 = 35\). The alternative hypothesis is that \(\mu > 35\). For an upper tail test like this, the critical \(t\) score is the \(t\) associated with a 95% cumulative probability under the Student's t distribution with \(n - 1 = 8\) degrees of freedom, so that \(t^* = 1.8595\). If the \(t\) score of our sample mean exceeds this number, we reject the null hypothesis in favor of the alternative hypothesis, accepting a 5% probability of rejecting a null hypothesis that is in fact true.
Our calculation is
\[
t = \frac{\bar X - \mu_0}{s_{\bar X}} = \frac{40 - 35}{2.3333} = 2.1429
\]
Since \(t = 2.1429\) exceeds the critical \(t^* = 1.8595\) we reject the manufacturer’s claim.
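A matching Python sketch for the \(t\) test follows. The critical value 1.8595 is copied from the text (the Python standard library has no Student's t quantile function), and the variable names are illustrative.

```python
from math import sqrt

mu_0, s, n, x_bar = 35, 7, 9, 40

# Upper tail critical t score for 8 degrees of freedom, taken from the text.
t_crit = 1.8595

# Estimated standard deviation of sampled means: 7 / 3.
se = s / sqrt(n)

# Observed test statistic.
t_stat = (x_bar - mu_0) / se

reject = t_stat > t_crit
print(round(t_stat, 4), reject)
```

With the larger sample standard deviation the statistic drops from 3.75 to about 2.14, and the critical value rises from 1.6449 to 1.8595, so the rejection is less emphatic than in the known-variance case.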
Regression – inference
Regression – slope confidence interval. What is the 95% range of sampled elasticities of the influence of lot size on Bronx housing prices if the sampled mean of slopes is 0.53 with a standard deviation of 0.39? There are 14 observations of prices and lot sizes. (Use =T.INV() in Excel to compute the critical t-score).
UNDER CONSTRUCTION
Regression – slope hypothesis test. Is the sampled elasticity of the influence of lot size on Bronx housing prices meaningful? That is, is the sampled mean elasticity significantly different from zero? The sampled mean of slopes is 0.53 with a standard deviation of 0.39. There are 14 observations of prices and lot sizes. (Use =T.INV() in Excel to compute the critical t-score).