Fundamentals of Statistics 3: Sampling :: The central limit theorem
If we were to repeatedly take samples from a population and graph all the means, we'd have a normal distribution of sample means, with the standard deviation of these means called the standard error.

This concept is so important in statistics that almost everything else you learn after this depends on it. It's called the central limit theorem. In its shortest form, the central limit theorem states that the sampling distribution of the sample means approaches a normal distribution as the sample size gets larger, regardless of the shape of the population distribution. So the sample means will be approximately normally distributed (especially when the sample size is above 30) even if the population is positively skewed, negatively skewed or even binomial (having only 2 outcomes).
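To see the theorem at work without collecting real data, here is a minimal Python sketch, assuming a hypothetical positively skewed population (an exponential distribution with mean 1, chosen only for illustration). It draws 5,000 samples at each of several sample sizes and reports the skewness of the resulting sample means. A normal distribution has skewness 0; for this particular population, theory predicts a skewness of about 2/√n for the means of samples of size n, so the values should shrink toward 0 as n grows.

```python
import random
import statistics


def draw_skewed(rng):
    # Hypothetical positively skewed population: exponential with mean 1.
    return rng.expovariate(1.0)


def sample_means(population_draw, n, reps, rng):
    """Draw `reps` independent samples of size `n`; return the mean of each sample."""
    return [statistics.fmean(population_draw(rng) for _ in range(n)) for _ in range(reps)]


def skewness(xs):
    """Sample skewness: the average cubed z-score (0 for a symmetric distribution)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)


rng = random.Random(42)  # fixed seed so the run is repeatable
results = {}
for n in (1, 5, 30, 100):
    means = sample_means(draw_skewed, n, reps=5000, rng=rng)
    results[n] = skewness(means)
    print(f"n={n:>3}  skewness of sample means = {results[n]:.2f}")
```

The helper names `draw_skewed`, `sample_means` and `skewness`, the seed, and the sample sizes are arbitrary choices for this sketch; swapping in a left-skewed population should show the same shrinking skewness, just with a negative sign.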

Here are two key points from the central limit theorem:

The average of our sample means will itself be (very close to) the population mean.  
The standard deviation of the sample means is the standard error of the mean, which equals the population standard deviation divided by the square root of the sample size.  
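Both points can be checked with a quick simulation. This minimal Python sketch assumes a hypothetical right-skewed population of 10,000 values (drawn from a lognormal distribution; the numbers are made up, not taken from a real study). It samples with replacement so each draw is independent; sampling without replacement from a small population would call for a finite population correction.

```python
import math
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical right-skewed population of 10,000 values (e.g., task times in seconds).
population = [rng.lognormvariate(4.0, 0.5) for _ in range(10_000)]
mu = statistics.fmean(population)       # population mean
sigma = statistics.pstdev(population)   # population standard deviation

n, reps = 30, 4000
# Each entry is the mean of one random sample of size n (drawn with replacement).
means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(reps)]

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.pstdev(means)
standard_error = sigma / math.sqrt(n)   # point 2: sigma / sqrt(n)

print(f"population mean      = {mu:.2f}")
print(f"mean of sample means = {mean_of_means:.2f}")
print(f"SD of sample means   = {sd_of_means:.2f}")
print(f"sigma / sqrt(n)      = {standard_error:.2f}")
```

The first pair of printed numbers illustrates point 1 and the second pair illustrates point 2; the agreement gets tighter as `reps` increases.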

The cool part about the central limit theorem is that the sampling distribution of the means is approximately normal even when the population is not. For example, I have a large dataset of people who were attempting to locate an address on a rental car website. Only 54% of those who attempted were successful. In the dataset there are only two outcomes, success and failure, which are coded as 1 and 0 (see the graph below).

In a large dataset of people who tried to find a rental car location, 54% were able to find it on their first attempt. The distribution is binomial (only two outcomes: pass or fail).

I randomly sampled 50 people from this large dataset, then computed the percentage who succeeded (the mean of their 1s and 0s). Just like with the heights, I repeated this 30 times and graphed the means.
Graph of the distribution of percentages from taking 30 samples of 50 users from a population that is binomial. The distribution of means is roughly normal.

Once again we see that the distribution of means is roughly normal, which is the key tenet of the central limit theorem.
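The rental-car example can be mimicked in a few lines. This is a minimal Python sketch, assuming the large dataset behaves like independent draws in which each user has a 54% chance of success (the real dataset isn't reproduced here). It takes 30 samples of 50 users, as in the text, and compares the spread of the sample percentages with the theoretical standard error. For a population of 1s and 0s the standard deviation is √(p(1 - p)), so the standard error is √(p(1 - p)/n), which works out to about 0.070 (7 percentage points) for p = 0.54 and n = 50.

```python
import math
import random
import statistics

P_SUCCESS = 0.54  # share of users who found the location (from the example)
N_USERS = 50      # users per sample
N_SAMPLES = 30    # number of repeated samples


def sample_proportions(p, n, reps, rng):
    """Each sample codes n users as 1 (success) or 0 (failure); return each sample's mean."""
    return [statistics.fmean(1 if rng.random() < p else 0 for _ in range(n)) for _ in range(reps)]


rng = random.Random(2024)  # fixed seed so the run is repeatable
props = sample_proportions(P_SUCCESS, N_USERS, N_SAMPLES, rng)

# For a 0/1 population, sigma = sqrt(p * (1 - p)), so the standard error is sqrt(p(1 - p) / n).
theoretical_se = math.sqrt(P_SUCCESS * (1 - P_SUCCESS) / N_USERS)

print(f"mean of the {N_SAMPLES} sample percentages: {statistics.fmean(props):.1%}")
print(f"SD of the sample percentages: {statistics.stdev(props):.3f}")
print(f"theoretical standard error:   {theoretical_se:.3f}")
```

With only 30 samples the observed spread will bounce around the theoretical value; raising `N_SAMPLES` makes a histogram of `props` look more clearly normal.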

Next we'll see how we can use the properties of the normal curve, such as z-scores and the empirical rule, to gauge how much our sample means vary around the population mean.
