## Question 544:

1

a. Correlation analysis can be based on only two variables: one dependent and one independent. Regression analysis can be based on one dependent variable but multiple independent variables. Regression analysis provides the slope of the regression line (which is closely related to the correlation) and also the y-intercept. The y-intercept is the point where the line crosses the y-axis, that is, the predicted value of the dependent variable when the independent variable is 0. With both the slope and the y-intercept you can build a regression equation, which allows you to predict the value of the dependent variable from values of the independent variable.
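To make the regression equation concrete, here is a minimal sketch of fitting a line with one independent variable and using it to predict. The data and the function names (`fit_line`, `predict`) are made up for illustration and are not part of the original question.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross products of deviations, and sum of squared x deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # predicted y when x is 0
    return slope, intercept

def predict(x, slope, intercept):
    """Predict the dependent variable from a value of the independent variable."""
    return intercept + slope * x

# Illustrative data only (assumed): these points lie exactly on y = 1 + 2x
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
prediction = predict(6, slope, intercept)
print(slope, intercept, prediction)
```

The intercept is the value the equation returns for x = 0, which is why it is described as the point where the line crosses the y-axis.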

b. A correlation coefficient reveals the strength and direction of the linear relationship between two variables. It is bounded between -1 and 1. A stronger relationship will have values closer to -1 or 1. A correlation of 0 means there is no linear association.

c. For the quick rule: since the only two ingredients in testing the significance of a correlation are r and the sample size N, we can use the following rule. If the absolute value of r is greater than or equal to 2/SQRT(N), then we can consider it significant. For example, given a correlation of .2 and a sample size of 10 we get 2/SQRT(10) = .632, which is greater than .2, so we'd say it's not significant. If we had a correlation of .7 and a sample of 20 we'd have 2/SQRT(20) = .4472, which is less than the correlation of .7, so we'd say this correlation is significant.
See http://kelley.iupui.edu/dsjie/Tips/krehbiel.htm for more detail.

The limitation of this approach is that it is conservative: it will tend to tell you correlations are not significant when the full t statistic on the correlation would reveal them as significant. In short, if the result of 2/SQRT(N) is close to r, then I'd use the t statistic (see e below).
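The quick rule is simple enough to sketch in a few lines. The function names below are hypothetical; the two cases are the ones worked through above.

```python
import math

def quick_rule_threshold(n):
    """Cutoff 2/SQRT(N) from the quick rule."""
    return 2 / math.sqrt(n)

def quick_rule_significant(r, n):
    """True when |r| is at least 2/SQRT(N)."""
    return abs(r) >= quick_rule_threshold(n)

# The two examples from the text: (r, N)
for r, n in [(0.2, 10), (0.7, 20)]:
    cutoff = quick_rule_threshold(n)
    verdict = "significant" if quick_rule_significant(r, n) else "not significant"
    print(f"r={r}, N={n}: cutoff={cutoff:.4f} -> {verdict}")
```

Because the rule uses the absolute value of r, a correlation of -.7 with a sample of 20 is treated the same way as +.7.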

d. You'd need the sum of the products (x-xbar)*(y-ybar) over all value pairs. For each pair, take the x value minus the mean of x and the y value minus the mean of y, and multiply those two deviations together. Then add up the products across all pairs. That's one sum. Then you need the total number of pairs minus 1 (called the degrees of freedom).
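As a rough sketch of how these two ingredients fit together, the code below assumes the quantity being asked about is the sample covariance, which is the sum of cross products divided by the number of pairs minus 1. That reading, the function name, and the data are assumptions made for illustration.

```python
from statistics import mean

def sample_covariance(xs, ys):
    """Sum of (x - xbar) * (y - ybar) over all pairs, divided by (pairs - 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar, y_bar = mean(xs), mean(ys)
    # One product per pair, then add the products up
    cross_products = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    degrees_of_freedom = len(xs) - 1
    return cross_products / degrees_of_freedom

# Illustrative data only (assumed)
xs = [2, 4, 6, 8]
ys = [1, 3, 2, 6]
cov = sample_covariance(xs, ys)
print(cov)
```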

e. The two most common ways of testing a correlation coefficient for significance are the t statistic and the so-called quick rule, which was explained above in c. The t statistic is computed using the following formula:
t = r / sqrt[(1r2)/(N2)]
For example, given a correlation of .3 and sample size of 16 we get:
t = .3 /sqrt[(1.09)/(162)]
This evaluates to a t of about 1.177 on N - 2 = 14 degrees of freedom (we lose two because we estimate the mean of each of the two variables). Using the Excel function =TDIST(1.177,14,2), where the final 2 asks for the two-tailed probability, we get a p-value of about .259, which is above .05, so the correlation is not significantly different from 0. We can also use the web calculator at http://www.usablestats.com/calcs/tdist and get the same result.
The quick rule agrees: 2/SQRT(16) = .5, which is greater than the correlation of .3, so the correlation is again not significant.
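For readers without Excel, the same test can be sketched using only the Python standard library. The two-tailed p-value here is obtained by numerically integrating the t density with Simpson's rule, which is a swapped-in technique rather than what TDIST does internally; the function names are hypothetical.

```python
import math

def t_statistic(r, n):
    """t = r / sqrt[(1 - r^2)/(N - 2)], on N - 2 degrees of freedom."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

def t_pdf(x, df):
    """Density of Student's t distribution (math.gamma overflows for very large df)."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 minus the area under the density between -|t| and |t|."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 1 - 2 * area_zero_to_t

# The example from the text: r = .3, N = 16
r, n = 0.3, 16
t = t_statistic(r, n)
p = two_tailed_p(t, n - 2)
print(f"t = {t:.3f}, df = {n - 2}, p = {p:.3f}")
print("significant at .05" if p < 0.05 else "not significant at .05")
```

If SciPy is available, `scipy.stats.pearsonr` reports a correlation and its p-value directly from the raw data, which avoids the hand-rolled integration.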