11.4 Explanatory power of correlations (R²)
One way to assess the explanatory power of a linear regression equation is to compute the coefficient of determination, R². For a simple linear regression (one independent variable), R² is the square of the correlation coefficient r. R² measures the proportion (often expressed as a percentage) of the variation in the dependent variable that can be explained by its linear relationship with the independent variable. In other words, it is a goodness-of-fit measure for linear regression models. For a least-squares fit with an intercept, R² always lies between 0 and 1. In a dataset with a low R², the data points are widely scattered around the regression line. In a dataset with a high R², they cluster closely around it.
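The two ways of reading R² (as the square of r, and as the share of total variation left unexplained by the fitted line, subtracted from 1) can be checked numerically. Below is a minimal sketch in Python, assuming simple linear regression fitted by ordinary least squares with an intercept. The function names `pearson_r` and `r_squared_from_fit` and both datasets are hypothetical, chosen only to contrast tightly clustered points with widely scattered ones.

```python
from math import fsum


def pearson_r(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(xs)
    mx, my = fsum(xs) / n, fsum(ys) / n
    sxy = fsum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = fsum((x - mx) ** 2 for x in xs)
    syy = fsum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def r_squared_from_fit(xs, ys):
    """R^2 = 1 - SS_res / SS_tot for the least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = fsum(xs) / n, fsum(ys) / n
    b = fsum((x - mx) * (y - my) for x, y in zip(xs, ys)) / fsum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    ss_res = fsum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # variation not explained by the line
    ss_tot = fsum((y - my) ** 2 for y in ys)  # total variation around the mean of y
    return 1 - ss_res / ss_tot


# Hypothetical illustrative data: same x values, one tight and one scattered y series.
xs = [1, 2, 3, 4, 5, 6]
ys_tight = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
ys_spread = [2.1, 6.5, 3.0, 9.9, 6.0, 12.2]

for label, ys in (("tight", ys_tight), ("spread", ys_spread)):
    r = pearson_r(xs, ys)
    print(f"{label}: r = {r:.3f}, r^2 = {r ** 2:.3f}, 1 - SS_res/SS_tot = {r_squared_from_fit(xs, ys):.3f}")
```

For each dataset the last two printed values agree, and the scattered series has the noticeably lower R², matching the description above.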
Example: Correlation and causation
Researchers found a correlation between the latitude at which people live and the mortality rate from skin cancer. A scatter plot showed that the relationship appeared to be linear, and the correlation coefficient for the data was r = 0.71 (a high positive correlation). What percentage of the variation in the skin cancer mortality rate can be attributed to latitude?
R² = (0.71)² = 0.5041 ≈ 0.50. Therefore, about 50% of the variation in the skin cancer mortality rate can be attributed to latitude. In other words, the remaining 50% of the variation is due to factor(s) other than latitude. Note also from this example that a seemingly high value of r (e.g. r ≈ 0.7) explains only about 50% of the variability in the response variable.
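The arithmetic in this example can be reproduced in a few lines of Python. This is an illustrative sketch only: the function name `explained_share` is hypothetical, and the r values other than 0.71 are arbitrary choices that show how R² shrinks relative to r.

```python
def explained_share(r):
    """Coefficient of determination for simple linear regression: R^2 = r^2.

    Squaring discards the sign, so r = -0.71 gives the same R^2 as r = 0.71.
    """
    return r ** 2


for r in (0.3, 0.5, 0.71, 0.9):
    r2 = explained_share(r)
    print(f"r = {r:.2f} -> R^2 = {r2:.4f}: {100 * r2:.1f}% explained, {100 * (1 - r2):.1f}% other factors")
```

The r = 0.71 row prints R² = 0.5041, that is, 50.4% of the variation explained and 49.6% left to other factors.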
It is important to use the correct notation for the correlation coefficient (r or R) and for the coefficient of determination (r² or R²). Mixing them up can easily lead to misleading interpretations of the data.