Regression is a statistical analysis used:
Regression analysis is a statistical process (Supervised function) for:
“Regression” problems principally aim to predict an outcome with a continuous (numeric) value, but regression can also be applied to a nominal outcome.
Regression analysis helps one understand how the typical value of the (outcome|dependent) variable changes when any one of the (predictor|independent) variables is varied while the other (predictor|independent) variables are held fixed (i.e. which of the independent variables are related to the dependent variable).
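The "held fixed" reading of a coefficient can be illustrated with a small sketch. The data, the helper names (`solve`, `fit_least_squares`, `predict`) and the generating function y = 1 + 2·x1 - 3·x2 are all assumptions made for this example; the fit uses the ordinary least squares normal equations in plain Python.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_least_squares(rows, y):
    """Return [intercept, b1, b2, ...] from the normal equations (X'X) b = X'y."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(beta, row):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


# Hypothetical, noise-free data generated from y = 1 + 2*x1 - 3*x2.
rows = [(float(x1), float(x2)) for x1 in range(4) for x2 in range(4)]
y = [1.0 + 2.0 * x1 - 3.0 * x2 for x1, x2 in rows]

beta = fit_least_squares(rows, y)

# Raise x1 by one unit while x2 stays fixed at 1.0.
effect_x1 = predict(beta, (2.0, 1.0)) - predict(beta, (1.0, 1.0))
print(beta, effect_x1)
```

In this sketch the change in the prediction equals the fitted coefficient of x1, which is what "the other variables are held fixed" means for a linear model.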
The term “regression” was coined by Francis Galton in the nineteenth century to describe a biological phenomenon. The phenomenon was that the heights of descendants of tall ancestors tend to regress down towards a normal average (a phenomenon also known as regression toward the mean). For Galton, regression had only this biological meaning, but his work was later extended by Udny Yule and Karl Pearson to a more general statistical context.
“Regression” comes historically from the idea of regression towards the mean, a concept that was discussed in the early 1900s. The term has become time-honored, so we have to live with it, but you can mentally replace it with “model”.
Many (techniques|methods) for carrying out regression analysis have been developed.
Technique | Parametric | Description |
---|---|---|
Nearest Neighbors | No | |
Linear regression | Yes | |
Data Mining - (Global) Polynomial Regression (Degree) | Yes | |
Statistics - Standard Least Squares Fit (Gaussian linear model) | | |
Ordinary least squares regression | Yes | The earliest form of regression, published by Legendre in 1805 and by Gauss in 1809. |
Multiple Regression (GLM) | | |
Support Vector Machine (SVM) | | |
Logistic regression | | |
LeastMedSq | | LeastMedSq gives an accurate regression line even when there are outliers. However, it is computationally very expensive. In practical situations it is common to delete outliers manually and then use LinearRegression. |
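To make the non-parametric entry of the table concrete, here is a minimal sketch of nearest-neighbors regression on a single predictor. The function name `knn_regress`, the data and the choice k = 3 are assumptions for illustration only. No coefficients are estimated: the prediction is simply the average target of the k closest training points.

```python
def knn_regress(train_x, train_y, query, k=3):
    """Predict the target at `query` as the mean target of the k nearest training points."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - query))[:k]
    return sum(target for _, target in nearest) / k


# Hypothetical training data.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [1.2, 1.9, 3.2, 3.8, 5.1]

# The three nearest points to 3.1 are x = 3, 4 and 2.
prediction = knn_regress(train_x, train_y, 3.1, k=3)
print(prediction)
```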
A model predicts a specific target value for each case from among (possibly) infinitely many values.
The regression model is used to model or predict future behaviour and involves the following variables:
<MATH> Y = f(X) + \epsilon </MATH>
<MATH> \begin{array}{ccc} Y & \approx & f ( {X}, \beta ) \\ E(Y | X) & = & f ( {X}, \beta ) \end{array} </MATH> The approximation is usually formalized as <MATH> E(Y | X) = f(X, \beta) </MATH>
The true function above always comes with an error term <MATH> \epsilon </MATH>. When a fitted model shows no error at all on its data, it is overfitting.
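The link between "no error" and overfitting can be sketched with simulated data. Everything below is an assumption made for illustration: the true function f(x) = 1 + 2x, Gaussian noise with standard deviation 1, and the comparison between a straight line and a polynomial that passes through every training point (Lagrange interpolation).

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable


def true_f(x):
    """Hypothetical true function f(X)."""
    return 1.0 + 2.0 * x


# Y = f(X) + epsilon
xs = [float(i) for i in range(8)]
ys = [true_f(x) + random.gauss(0.0, 1.0) for x in xs]


def fit_line(xs, ys):
    """Least squares intercept and slope for one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def interpolate(xs, ys, x):
    """Evaluate the Lagrange polynomial that goes through every training point."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total


def mse(predicted, target):
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


b0, b1 = fit_line(xs, ys)
line_train_mse = mse([b0 + b1 * x for x in xs], ys)
poly_train_mse = mse([interpolate(xs, ys, x) for x in xs], ys)

# Compare both models with the true function at points not used for fitting.
new_xs = [x + 0.5 for x in xs[:-1]]
truth = [true_f(x) for x in new_xs]
line_new_mse = mse([b0 + b1 * x for x in new_xs], truth)
poly_new_mse = mse([interpolate(xs, ys, x) for x in new_xs], truth)

print("training MSE:", line_train_mse, poly_train_mse)
print("new-point MSE:", line_new_mse, poly_new_mse)
```

The interpolating polynomial reproduces the noisy training values exactly, so its training error is zero even though the data contain noise. That zero is a sign that the model has fitted the error term as well as the function.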
A model is:
Example of a simple linear regression model: <math>Model = \hat{Y} = B_0 + B_1 \cdot X_1</math>
The goal is to produce better models so that we can generate more accurate predictions.
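As a minimal sketch of how the two coefficients of this simple model can be estimated, the following uses the usual least squares formulas, B_1 = S_xy / S_xx and B_0 = mean(Y) - B_1 · mean(X). The data set and the function names are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (b0, b1) minimizing the sum of squared residuals of y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def predict(b0, b1, x):
    """Y hat = B_0 + B_1 * X_1"""
    return b0 + b1 * x


# Hypothetical sample.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1 = fit_simple_linear_regression(xs, ys)
y_hat = predict(b0, b1, 6.0)
print(b0, b1, y_hat)
```

For this sample the means are 3 and 4, S_xy = 6 and S_xx = 10, so B_1 = 0.6 and B_0 = 2.2, and the predicted value at X_1 = 6 is 5.8 (up to floating-point rounding).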
We can improve a model by:
When we do regression, we are mostly engaging in inferential statistics, and we look at the following statistics:
in order to know whether the results from this sample will generalize to other samples.
We want to know whether it is possible to make an inference from this sample data to a more general population.
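As an illustration of the kind of statistics used for this inference, the sketch below computes R², the standard error of the slope and its t statistic for a simple linear regression. The choice of these three statistics and the data are assumptions for this example, not a list taken from the text above.

```python
import math

# Hypothetical sample.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

# Least squares estimates.
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

# Residual and total sums of squares.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
rss = sum(r * r for r in residuals)
tss = sum((y - mean_y) ** 2 for y in ys)

r_squared = 1.0 - rss / tss              # share of variance explained
residual_variance = rss / (n - 2)        # two estimated coefficients
se_b1 = math.sqrt(residual_variance / sxx)
t_b1 = b1 / se_b1                        # tests the hypothesis "slope = 0"

print(r_squared, se_b1, t_b1)
```

The t statistic is compared with a Student t distribution with n - 2 degrees of freedom. A large absolute value suggests that the slope seen in the sample is unlikely to be a product of sampling noise alone.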
The lm function