## Question 878:

1. Q878_1_878-chi-square.xls
We are asked to determine whether the observed frequency counts of credit applications follow a Poisson distribution. The Poisson distribution is a discrete (count data) distribution. For small means it looks roughly like a ski jump without the bump at the end: it is high on the left, then goes down and tails off to the right. A normal distribution, by contrast, looks like a bell and is for continuous data (like height, weight and IQ).

This is a classic case for the chi-square goodness-of-fit test. We want to know how well our observed data "fit" or match the expected frequencies of a Poisson distribution with a mean of 2. The null hypothesis is that there is no difference between our data and the expected distribution. We will reject the null if the p-value is less than .05.

Poisson distributions typically describe events that don't happen a lot, and more generally they apply to many situations that involve a count of things happening in a given place or period of time. For example, the number of car accidents at an intersection or the number of errors a user commits while attempting to complete a task in software.

A Poisson distribution has only one parameter, the mean (unlike the normal, which has two: mean and standard deviation). We are given a mean of 2, so we can test whether our observed data actually follow this distribution.
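
To make the single-parameter point concrete, here is a minimal Python sketch that evaluates the Poisson probability formula P(k) = e^(-mean) × mean^k / k! for a mean of 2. The helper name `poisson_pmf` is only an illustrative choice; it is not part of the question or the spreadsheet.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k events when counts follow a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

mean = 2  # the mean given in the question

# Probabilities for counts 0 through 8; the mean is the only input needed.
probs = [poisson_pmf(k, mean) for k in range(9)]
for k, p in enumerate(probs):
    print(f"{k}: {p:.4f}")
```

With a mean of 2, the probabilities for counts of 1 and 2 are equal and are the largest values, after which the distribution tails off to the right.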

As indicated in the question, the first step is to find the probability of each count.

1. In Excel we use the formula =POISSON(0,2,FALSE) where 0 is the first count value, 2 is the mean and FALSE means to use the exact probability instead of the cumulative probability. We get an expected probability of .1353. That means, if this really were a Poisson distribution with a mean of 2, we'd expect to see 0 credit card applications around 13.5% of the time.
2. To get the expected frequency, we just multiply the probability by 300 (the sample size) and we get an expected count of 40.60.
3. We continue this for all values up to 4. When we get to 5 it is a special case because it says "5 or more." So we can't use the regular formula, as this would give us the probability of exactly 5. To find the probability of 5 or more, we just subtract the total probability of 0 to 4 from 1. We can do this because by definition the probabilities of all values have to add up to 1 or 100%. The total expected probability from 0 to 4 is .9473, making the probability of 5 or more 1 - .9473 = .0527 and the expected count 15.80.
4. Now we plug our values into the chi-square formula. It uses the simple formula (Observed Count - Expected Count)^2 / Expected Count. So for the first value of 0 we have: (50 - 40.60)^2 / 40.60 = 2.18.
5. We repeat step 4 for all values up through 5.
6. Add up all the values from step 5. The sum is our test statistic, the chi-square. We get a chi-square value of 4.16.
7. To evaluate whether this is significant we use a chi-square table or the Excel formula =CHIDIST(4.16,5), where the 5 is the degrees of freedom (the number of rows, 6, minus 1; no further degree of freedom is lost because the mean of 2 was given rather than estimated from the data).
8. We get a p-value of .5273, which is well above the alpha level of .05, so we fail to reject the null hypothesis and conclude that the data are consistent with a Poisson distribution with a mean of 2.
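
Steps 1 through 4 can also be sketched outside Excel. The Python below is a rough illustration, not the spreadsheet's own method: `poisson_pmf` and `chi_square_term` are hypothetical helper names, and only the one observed count quoted above (50 for a count of 0) is used, since the remaining observed counts live in the spreadsheet.

```python
import math

def poisson_pmf(k, mean):
    """Exact (non-cumulative) Poisson probability, like =POISSON(k, mean, FALSE)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def chi_square_term(observed, expected_count):
    """One cell's contribution: (Observed - Expected)^2 / Expected."""
    return (observed - expected_count) ** 2 / expected_count

MEAN = 2   # mean stated in the question
N = 300    # sample size

# Steps 1 and 3: probabilities for exactly 0..4, then "5 or more" as the remainder.
probs = [poisson_pmf(k, MEAN) for k in range(5)]
probs.append(1 - sum(probs))

# Step 2: expected counts are probability times sample size.
expected = [N * p for p in probs]

labels = ["0", "1", "2", "3", "4", "5 or more"]
for label, p, e in zip(labels, probs, expected):
    print(f"{label:>9}: probability {p:.4f}, expected count {e:.2f}")

# Step 4 for the first row only, using the observed count of 50 given in the text.
first_term = chi_square_term(50, expected[0])
print(f"chi-square contribution for count 0: {first_term:.2f}")
```

Summing `chi_square_term` over all six rows, with the observed counts from the spreadsheet, would give the test statistic from step 6.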
See the attached spreadsheet for the calculations and data.
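
As a cross-check on the CHIDIST lookup, the right-tail chi-square probability can be approximated with only the Python standard library. This is a minimal sketch under stated assumptions: `chi2_sf` is a hypothetical helper that uses the standard series expansion of the regularized lower incomplete gamma function, and the test statistic is taken as the rounded value 4.16 reported above.

```python
import math

def chi2_sf(x, df):
    """Right-tail probability of a chi-square distribution, comparable to =CHIDIST(x, df).

    Computes 1 - P(df/2, x/2), where P is the regularized lower incomplete
    gamma function, evaluated by its power series.
    """
    if x <= 0:
        return 1.0
    a = df / 2
    half = x / 2
    term = 1.0
    total = 1.0
    n = 1
    # Series: sum over n of half^n / ((a+1)(a+2)...(a+n)).
    while term > 1e-15 * total:
        term *= half / (a + n)
        total += term
        n += 1
    lower = math.exp(-half + a * math.log(half) - math.lgamma(a + 1)) * total
    return 1 - lower

chi_square = 4.16  # rounded test statistic from step 6
df = 6 - 1         # six rows minus one

p_value = chi2_sf(chi_square, df)
print(f"p-value: {p_value:.4f}")
print("reject null" if p_value < 0.05 else "fail to reject null")
```

Because 4.16 is itself rounded, the result should match the spreadsheet's .5273 only to about two decimal places; the conclusion (fail to reject) is the same either way.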