Fundamentals of Statistics 3: Sampling :: The central limit theorem
If we were to repeatedly take samples from a population and graph all of their means, we'd have an approximately normal distribution of sample means. The standard deviation of these means is called the standard error.
This concept is so important in statistics that almost everything you learn after it depends on it. It's called the central limit theorem.
The central limit theorem in its shortest form states that the sampling distribution of the sample means approaches a normal distribution as the sample size gets larger, regardless of the shape of the population distribution. So the sample means will be approximately normally distributed (especially when the sample size is above 30) even if the population is positively skewed, negatively skewed or binomial (having only 2 outcomes).
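To make the "as the sample size gets larger" part concrete, here is a minimal simulation sketch. It assumes a positively skewed population (an exponential distribution, chosen only for illustration) and uses a hand-written `skewness` helper, neither of which comes from the tutorial's data. Skewness near 0 means the distribution of sample means is close to symmetric, as a normal distribution is.

```python
import random
import statistics


def skewness(values):
    """Average cubed z-score: about 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def sample_means(sample_size, n_samples, rng):
    """Draw n_samples samples from an exponential population; return their means."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


rng = random.Random(42)  # fixed seed so the run is repeatable

skew_by_size = {}
for sample_size in (2, 30, 200):
    means = sample_means(sample_size, 5000, rng)
    skew_by_size[sample_size] = skewness(means)
    print(f"n = {sample_size:>3}: skewness of sample means = "
          f"{skew_by_size[sample_size]:.2f}")
```

The printed skewness should shrink toward 0 as the sample size grows, even though the population itself stays strongly skewed.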
Here are two key points from the central limit theorem:
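The average of our sample means will itself be the population mean (in the long run, across many samples).
The standard deviation of the sample means equals the standard error of the mean: the population standard deviation divided by the square root of the sample size.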
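Both key points can be checked with a short simulation. The sketch below assumes an exponential population with mean 10 (and therefore standard deviation 10), a sample size of 25, and 4,000 repeated samples. These numbers are illustrative choices, not values from the tutorial.

```python
import math
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

# Assumed population: exponential with mean 10 (its standard deviation is also 10).
population_mean = 10.0
population_sd = 10.0
sample_size = 25
n_samples = 4000

# Take many samples and record the mean of each one.
sample_means = [
    statistics.fmean(
        rng.expovariate(1 / population_mean) for _ in range(sample_size)
    )
    for _ in range(n_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
standard_error = population_sd / math.sqrt(sample_size)  # sigma / sqrt(n)

print(f"population mean            : {population_mean:.2f}")
print(f"average of the sample means: {mean_of_means:.2f}")
print(f"standard error (formula)   : {standard_error:.2f}")
print(f"SD of the sample means     : {sd_of_means:.2f}")
```

The average of the sample means should land close to 10, and their standard deviation close to the formula value of 10 / √25 = 2.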
The cool part about the central limit theorem is that the sampling distribution of the means is approximately normal even if the population is not. For example, I have a large dataset of people who were attempting to locate an address on a rental car website. Only 54% of those who attempted were successful. In the dataset there are only two options, success and failure, which are represented by a 1 and a 0 (see the graph below).
In a large dataset I have a set of people who tried to find a rental car location, and 54% were able to find it on their first attempt. The distribution is binomial (having only 2 options: pass or fail).
I randomly sampled 50 people from this large dataset and computed the percentage who succeeded. Just like with the heights, I repeated this 30 times and graphed the means.
Graph of the distribution of percentages from taking 30 samples of 50 users from a population which is binomial. The distribution of means is roughly normal.
Once again we see that the means are roughly normally distributed, the key tenet of the central limit theorem.
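The sampling exercise above can be imitated in a few lines. Since the original dataset isn't available here, this sketch assumes the population can be stood in for by independent 0/1 outcomes with a 54% chance of success. It then takes 30 samples of 50 and prints a rough text histogram of the sample percentages.

```python
import math
import random
import statistics
from collections import Counter

rng = random.Random(2014)  # fixed seed so the run is repeatable

p_success = 0.54   # share of users who succeeded (from the text)
sample_size = 50   # users per sample
n_samples = 30     # number of repeated samples

# Assumption: each user is an independent 1 (success) or 0 (failure).
success_counts = [
    sum(1 for _ in range(sample_size) if rng.random() < p_success)
    for _ in range(n_samples)
]
sample_means = [count / sample_size for count in success_counts]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
# Standard error of a proportion: sqrt(p * (1 - p) / n)
theoretical_se = math.sqrt(p_success * (1 - p_success) / sample_size)

# Rough text histogram: one '#' per sample with that success rate.
for count, freq in sorted(Counter(success_counts).items()):
    print(f"{count / sample_size:.2f} {'#' * freq}")

print(f"average of the sample means: {mean_of_means:.3f}")
print(f"SD of the sample means     : {sd_of_means:.3f}")
print(f"standard error (formula)   : {theoretical_se:.3f}")
```

With these inputs the formula gives a standard error of about 0.07, or 7 percentage points. Because only 30 samples are drawn, the observed standard deviation of the means will bounce around that value from run to run.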
Next we'll see how we can use the properties of the normal curve, such as z-scores and the empirical rule, to know how variable our estimate of the population mean is.
July 20, 2014 | anonomous wrote:
Thank you!