1 What is normal?

Let’s draw \(n\) observations \(X_i\), \(i = 1, \dots, n\), independently from a population where \(\mu\) and \(\sigma\) are known. We would then discover that the standardized average

\[ \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]

is asymptotically normal with mean 0 and variance 1 (often called Normal(0,1)).

  • This can be interpreted as: when \(n\) is large enough, the average is approximately normal with mean \(\mu\) and standard deviation \(\sigma / \sqrt{n}\).

  • This result is also known as the Central Limit Theorem (CLT), a cornerstone of classical statistics.

  • Normal means a symmetric distribution with mesokurtic “tailedness”, or kurtosis of 3. This implies there are not too many rare outcomes in the tails of the distribution.

2 How can we check this?

Simulation is an excellent way. Now to get your feet good and wet…

Let’s first do this for the binomial distribution. The CLT translates into saying that if \(x_n\) is a binomial outcome with parameters \(n\) and \(p\), and we standardize it as

\[ z = \frac{x_n - np}{\sqrt{np(1-p)}} \]

then the standardized \(x_n\), called \(z\), is approximately Normal(0,1) when \(n\) is large.
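As a quick worked example (the outcome \(x_n = 4\) is chosen here purely for illustration), take \(n = 10\) and \(p = 0.20\):

\[ z = \frac{4 - 10(0.20)}{\sqrt{10(0.20)(0.80)}} = \frac{2}{\sqrt{1.6}} \approx 1.58 \]

So 4 successes in 10 trials sits about 1.58 standard deviations above the expected 2 successes.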

3 Let’s investigate

  • Create binomial random numbers in Excel using BINOM.INV(n, p, RAND()).

  • RAND() supplies a random number between 0 and 1, which BINOM.INV treats as a cumulative probability and converts into the matching number of successes.

  • Start with just a few trials: \(n = 10\) and \(p = 0.20\).

  • Then generate these outcomes in 1000 separate cells and standardize each one into a \(z\)-score.

Almost bell-shaped. A little lop-sided too… Here are some statistics on our experimental runs.

mean      std_dev   median    skewness  kurtosis
-0.027    0.9983    -0.3651   0.3635    3.0047

The kurtosis is almost the “normal” mesokurtic value of 3.0. The small positive skewness indicates a little asymmetry.
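The Excel experiment above can also be sketched in code. The following is a minimal Python stand-in (standard library only), not the spreadsheet itself. It draws each binomial outcome by counting successes in \(n\) Bernoulli trials, rather than inverting the cumulative distribution as BINOM.INV does. The helper names `binomial_draw`, `standardized_draws`, and `summarize`, the fixed seed, and the use of population (divide-by-count) moments are all assumptions made for illustration. The numbers it prints will differ from the table above because the random draws differ.

```python
import math
import random
import statistics


def binomial_draw(n, p, rng):
    """Count successes in n independent Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def standardized_draws(n, p, runs, seed=0):
    """Return `runs` binomial outcomes, each converted to a z-score."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return [(binomial_draw(n, p, rng) - mean) / sd for _ in range(runs)]


def summarize(z):
    """Mean, standard deviation, median, skewness, and kurtosis of a sample."""
    m = statistics.fmean(z)
    s = statistics.pstdev(z)  # population form: divide by the count
    skew = sum(((v - m) / s) ** 3 for v in z) / len(z)
    kurt = sum(((v - m) / s) ** 4 for v in z) / len(z)  # normal value is 3
    return {
        "mean": m,
        "std_dev": s,
        "median": statistics.median(z),
        "skewness": skew,
        "kurtosis": kurt,
    }


# Same settings as the experiment: n = 10 trials, p = 0.20, 1000 runs.
summary = summarize(standardized_draws(n=10, p=0.20, runs=1000))
for name, value in summary.items():
    print(f"{name:>9}: {value:.4f}")
```

Changing `n` in the last call is enough to rerun the experiment with a different number of trials.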

Now try this

  • Use a lot more than a few trials: \(n = 1000\) and \(p = 0.20\).

  • Then generate these outcomes in 1000 separate cells and standardize each one into a \(z\)-score.

Much more symmetric. Here are some summary statistics.

mean      std_dev   median    skewness  kurtosis
0.0096    1.0005    0         -0.0524   3.1849

  • Skewness is slightly negative, but so small that the longer tail is barely visible by eye.

  • Mean is near zero, and median not far from zero too.

  • Standard deviation is nearly 1.

  • Kurtosis is almost on that magic normal mesokurtic number of 3.0.
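The move toward symmetry is what the theoretical moments of the binomial distribution suggest. As a side check that is not part of the Excel exercise, the standard formulas for binomial skewness and kurtosis are

\[ \text{skewness} = \frac{1 - 2p}{\sqrt{np(1-p)}}, \qquad \text{kurtosis} = 3 + \frac{1 - 6p(1-p)}{np(1-p)} \]

With \(p = 0.20\), these give

\[ n = 10: \quad \frac{0.6}{\sqrt{1.6}} \approx 0.474, \qquad 3 + \frac{0.04}{1.6} = 3.025 \]

\[ n = 1000: \quad \frac{0.6}{\sqrt{160}} \approx 0.047, \qquad 3 + \frac{0.04}{160} = 3.00025 \]

Both quantities head toward the normal values of 0 and 3 as \(n\) grows. Statistics computed from 1000 simulated runs will scatter around these theoretical values rather than match them exactly.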

3.1 Let’s look at this

Here is a standard normal distribution, with \(\mu = 0\) and \(\sigma = 1\). This is a distribution centered on 0. How is this like the calculated \(z\)-score?

Here is a graph of the z-score you generated above.

gnorm(0, 1, a = -2, b = 2, calcProb = TRUE)

What is the interpretation of the \(-2\) and \(+2\)?

What is the probability that an outcome is not within 2 standard deviations of the mean?
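One way to check an answer to these questions is to compute the area under the standard normal curve directly. The sketch below uses Python’s standard-library `statistics.NormalDist` and is an illustrative alternative to the `gnorm` call above, not a description of what that function does internally. The variable names are chosen here for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist(mu=0.0, sigma=1.0)

# Area between -2 and +2, i.e. within 2 standard deviations of the mean.
p_within = z.cdf(2) - z.cdf(-2)

# Complement: an outcome 2 or more standard deviations from the mean.
p_outside = 1 - p_within

print(f"P(-2 < Z < 2) = {p_within:.4f}")   # prints 0.9545
print(f"P(|Z| >= 2)   = {p_outside:.4f}")  # prints 0.0455
```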

3.2 Where do normal outcomes come from?

Is anything normal? Normal distributions very naturally come from a very interesting source: sums and averages of random samples of any set of outcomes.

Suppose we think that, in a random sample of 15 students from course sections (classes), the number who voted in last year’s election averages 1.2 students/year-class. Let’s sample this intensity using the Poisson distribution with \(\lambda = 1.2\), 10,000 times. When we do this, we calculate the sum, average, and variance of each and every one of the 10,000 samples. Then we plot. Here’s the result.

clt_sim(15, source = "P", param1 = 1.2)

What do we notice?
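To explore this question numerically, here is a minimal Python sketch (standard library only) of the same kind of experiment. It is a stand-in written for illustration and makes no claim about how `clt_sim` works internally. It assumes \(\lambda = 1.2\) as stated in the text, samples of size 15, and 10,000 repetitions. The helper names `poisson_draw` and `sample_means` are hypothetical, and only the sample averages are tracked here, not the sums or variances.

```python
import math
import random
import statistics


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_means(lam, sample_size, runs, seed=0):
    """Average of each of `runs` independent Poisson samples."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    means = []
    for _ in range(runs):
        sample = [poisson_draw(lam, rng) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means


LAM = 1.2          # intensity taken from the text (assumed here)
SAMPLE_SIZE = 15   # students per sample
RUNS = 10_000      # number of repeated samples

means = sample_means(LAM, SAMPLE_SIZE, RUNS)
mean_of_means = statistics.fmean(means)
sd_of_means = statistics.pstdev(means)

# The CLT suggests the averages centre on lambda with spread sqrt(lambda / n).
print(f"mean of sample means: {mean_of_means:.4f} (theory {LAM:.4f})")
print(f"sd of sample means:   {sd_of_means:.4f} "
      f"(theory {math.sqrt(LAM / SAMPLE_SIZE):.4f})")
```

A histogram of `means` should look roughly bell-shaped even though a single Poisson draw with such a small \(\lambda\) is clearly right-skewed.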

