## Question 876:

1

We're asked to look at the relationship between two variables, which is a classic correlation/regression problem.  We can use the Excel Data Analysis Tool-pack. Select Regression from the menu.
a.    Insert a scatter plot graph with Income as the Y and number of occupied rooms as the X. See the attached excel file for a scatter diagram between rooms occupied and income. We can see that there is a slight positive relationship between the two. Right-Click on one of the dots and select "Add Trendline." When the menu appears select "Display Equation on Chart" and Display "R-Square value"  We can see the dots scatter somewhat around the diagonal (regression line). We know the relationship is positive because the diagonal line points up (from left to right) and not down. By looking at a couple points we can see that as the number of occupied rooms increases the income gets slightly larger.
b.    The correlation coefficient "r" can be found by using excel function =PEARSON() or by selecting the Regression option from the Data Analysis Add-in. Highlight both the X and Y variables as input (where Y is income). The output tells us the Pearson Correlation is a .4229. In the regression output it is referred to as "Multiple R."

If there were no association between variables we'd have a correlation of 0. A correlation of .4229 could be characterized as a medium positive correlation ( correlation's can range from -1 to 1).

c.    We look at the p-value from the F-ratio and see that it is .03515 which is less than the .10 alpha level so we conclude that it is statistically different than 0.
d.    By squaring the correlation coefficient we get something called the coefficient of determination (r-squared). It tells us how much variation in the dependent variable (income) is explained by the independent variable (rooms occupied). We see the r-squared value is .1789 which can also be expressed as 17.89% of the variation in income is explained by occupied rooms. The adjusted r-square value takes into account the sample size and in this case adjusts it down a bit to 14.3% variation. When in doubt use the adjusted-r square value.

See the attached excel file for the graph and output.