Statistics - Standard Least Squares Fit (Gaussian linear model)

Least squares is a regression method.

In least squares, the coefficients are chosen to make the residual sum of squares (RSS) as small as possible.
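As a minimal sketch of this idea, the following Python snippet fits a line with one predictor using the textbook closed-form formulas and checks that moving the coefficients away from the fit does not lower the RSS. The data values and the helper names `fit_simple_ols` and `rss` are made up for illustration.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) minimizing RSS for the model y ~ b0 + b1 * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Centered cross-product and centered sum of squares
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def rss(x, y, intercept, slope):
    """Residual sum of squares of the line intercept + slope * x."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Illustrative data (not from any real data set)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_ols(x, y)
best = rss(x, y, b0, b1)

# Any other choice of coefficients gives an RSS at least as large
print(best <= rss(x, y, b0 + 0.1, b1))
print(best <= rss(x, y, b0, b1 - 0.1))
```

With several predictors the same principle applies, but the minimization is usually carried out with matrix routines rather than by hand.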

When p (the number of predictors) is much bigger than n (the number of samples), we can't use full least squares, because the solution is not even uniquely defined: infinitely many coefficient vectors fit the data equally well.
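A small sketch can make this concrete. Assuming a toy data set with n = 2 samples and p = 3 predictors (values invented for illustration, no intercept), two different coefficient vectors both reach an RSS of zero, so least squares has no single answer to return.

```python
def predict(row, beta):
    """Linear prediction: dot product of one sample with the coefficients."""
    return sum(xj * bj for xj, bj in zip(row, beta))


def rss(X, y, beta):
    """Residual sum of squares for coefficient vector beta."""
    return sum((yi - predict(row, beta)) ** 2 for row, yi in zip(X, y))


# n = 2 samples, p = 3 predictors (hypothetical values)
X = [
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
]
y = [1.0, 2.0]

beta_a = [1.0, 2.0, 0.0]
beta_b = [0.0, 1.0, 1.0]

# Both coefficient vectors fit the two samples exactly
rss_a = rss(X, y, beta_a)
rss_b = rss(X, y, beta_b)
print(rss_a, rss_b)
```

Because different coefficient vectors are indistinguishable on the training data, some extra constraint is needed to pick one of them.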

Legendre published the method of least squares in 1805.


Standard least squares is scale-invariant. The scaling of a variable doesn't matter because if a feature is multiplied by a constant, its coefficient can be divided by the same constant in order to get the same fitted values.

Whether a length is measured in feet or inches is not going to matter because the coefficient can just account for the change in units.
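The feet-versus-inches case can be checked with a short sketch. It assumes a single predictor and invented data, and reuses the closed-form simple regression formulas; the helper name `fit_simple_ols` is hypothetical.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) minimizing RSS for the model y ~ b0 + b1 * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Illustrative data: the same lengths expressed in feet and in inches
length_feet = [1.0, 2.0, 3.0, 4.0, 5.0]
length_inches = [12.0 * v for v in length_feet]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0_ft, b1_ft = fit_simple_ols(length_feet, y)
b0_in, b1_in = fit_simple_ols(length_inches, y)

# The slope absorbs the unit change; the fitted values stay the same
fitted_ft = [b0_ft + b1_ft * v for v in length_feet]
fitted_in = [b0_in + b1_in * v for v in length_inches]
print(b1_ft / b1_in)
```

The feature was multiplied by 12, so the fitted slope is divided by 12 (up to floating-point rounding), and the predictions are unchanged.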


Ordinary Least Squares (OLS) is the least squares procedure performed on the raw predictors.
