Statistics - Assumptions underlying correlation and regression analysis (Never trust summary statistics alone)


The magnitude of a correlation depends upon many factors, including:

Anscombe's quartet

In 1973, statistician Dr. Frank Anscombe developed a classic example to illustrate several of the assumptions underlying correlation and linear regression.

The below scatter-plots have the same correlation coefficient and thus the same regression line.

They have also the same mean and variance.

<MATH> Y = 3 + 0.5 X </MATH>

Only the first one on the upper left satisfies the assumptions underlying a:

Datasaurus: Never trust summary statistics alone; always visualize your data

The Datasaurus Dozen. While different in appearance, each dataset has the same summary statistics (mean, standard deviation, and Pearson's correlation) to two decimal places.


Bring your own doodles linear regression

Most of the examples of using linear regression just show a regression line with some dataset. it's much more fun to understand it by drawing data in. Bring your own doodles linear regression

How to

test the assumptions in a regression analysis ?

To test the assumptions in a regression analysis, we look a those residual as a function of the X productive variable. (X remaining on the X axis and the residuals coming on the Y axis).

For each of the individual, the residual can be calculated as the difference between the predicted score and a actual score.

If the assumptions are good, there must be:

  • no relationship between X and the residual. They must be independent. The relation coefficient must be zero.
  • some of the points above zero and some of them below zero. It will indicate Homoscedasticity

Powered by ComboStrap