Statistics - Bias-variance trade-off (between overfitting and underfitting)

About

The bias-variance trade-off marks the point beyond which adding model complexity (flexibility) only adds noise. The training error keeps going down, as it must, but the test error starts to go up. Past this point, the model begins to overfit.

When the nature of the problem changes, the trade-off changes (see Nature of the problem below).

Formula

The ingredients of prediction error are:

  • bias: how far off, on average, the model is from the truth.
  • variance: how much the estimate varies around its average.

Together with the irreducible noise, bias and variance make up the prediction error.

The expected squared prediction error, with <math>e = Y - \hat{Y}</math>, can be expressed in terms of variance and bias:

<math>E(e^2) = var(model) + var(chance) + bias^2</math>

where:

  • <math>var(model)</math> is the variance due to the particular training data set selected (reducible)
  • <math>var(chance)</math> is the variance due to chance, i.e. the noise in the data (irreducible)
  • bias is the average of <math>\hat{Y}</math> over all training data sets minus the true Y (reducible)
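The decomposition can be checked numerically. The sketch below simulates many training sets, fits a model to each, and compares the average squared error at a single input with the sum of the three terms; in the code the bias enters squared, as in the usual statement of the decomposition. It assumes a true function f(x) = sin(2πx) on [0, 1], Gaussian noise, and a least-squares polynomial fitted with NumPy's `Polynomial.fit`. The helper `decompose` and the constants `SIGMA`, `N_TRAIN`, `N_REPS` and `X0` are hypothetical choices for illustration.

```python
import numpy as np

# Illustrative assumptions, not taken from the text: true function
# f(x) = sin(2*pi*x) on [0, 1], Gaussian noise with SD SIGMA, and a
# least-squares polynomial of the given degree as the model.
SIGMA = 0.3
N_TRAIN = 30
N_REPS = 20_000
X0 = 0.1  # the single input at which the error is decomposed


def f(x):
    return np.sin(2 * np.pi * x)


def decompose(degree=1, seed=0):
    """Monte Carlo estimate of the error terms at X0 (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    preds = np.empty(N_REPS)
    sq_errors = np.empty(N_REPS)
    for r in range(N_REPS):
        # A fresh training set on every repetition: the source of var(model).
        x = rng.uniform(0.0, 1.0, N_TRAIN)
        y = f(x) + rng.normal(0.0, SIGMA, N_TRAIN)
        preds[r] = np.polynomial.Polynomial.fit(x, y, degree)(X0)
        # A fresh noisy observation at X0: the source of var(chance).
        y0 = f(X0) + rng.normal(0.0, SIGMA)
        sq_errors[r] = (y0 - preds[r]) ** 2
    bias = preds.mean() - f(X0)  # average prediction minus the truth
    return {
        "error": sq_errors.mean(),
        "var_model": preds.var(),
        "var_chance": SIGMA**2,
        "bias_sq": bias**2,
    }


parts = decompose()
total = parts["var_model"] + parts["var_chance"] + parts["bias_sq"]
print(parts, total)
```

The two quantities agree only up to Monte Carlo error, so the comparison is approximate rather than exact.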

As the flexibility (complexity) of the model increases, its variance increases and its bias decreases. So choosing the flexibility based on average test error amounts to a bias-variance trade-off.
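A minimal sketch of this trade-off, assuming polynomial degree as the measure of flexibility and a simulated setting of our own (truth sin(2πx), noise SD 0.3, 30 training points): for each degree it estimates the squared bias and the variance of the fit, averaged over a grid of inputs. The helper `bias_variance_by_degree` and the constants are hypothetical.

```python
import numpy as np

# Illustrative assumptions, not from the text: f(x) = sin(2*pi*x), noise SD 0.3,
# 30 training points, and polynomial degree standing in for "flexibility".
SIGMA = 0.3
N_TRAIN = 30
N_REPS = 500
X_GRID = np.linspace(0.05, 0.95, 50)  # inputs where bias and variance are measured


def f(x):
    return np.sin(2 * np.pi * x)


def bias_variance_by_degree(degrees, seed=0):
    """Return {degree: (squared bias, variance)} averaged over X_GRID (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Draw the training sets once so every degree sees the same data.
    train_sets = []
    for _ in range(N_REPS):
        x = rng.uniform(0.0, 1.0, N_TRAIN)
        train_sets.append((x, f(x) + rng.normal(0.0, SIGMA, N_TRAIN)))
    results = {}
    for d in degrees:
        # preds has shape (N_REPS, len(X_GRID)): one fitted curve per training set.
        preds = np.array(
            [np.polynomial.Polynomial.fit(x, y, d)(X_GRID) for x, y in train_sets]
        )
        bias_sq = np.mean((preds.mean(axis=0) - f(X_GRID)) ** 2)
        variance = np.mean(preds.var(axis=0))
        results[d] = (bias_sq, variance)
    return results


for d, (b2, v) in bias_variance_by_degree([1, 3, 5, 9]).items():
    print(f"degree {d}: bias^2 = {b2:.4f}, variance = {v:.4f}")
```

Under these assumptions one would expect the squared bias to shrink and the variance to grow as the degree rises; the exact numbers depend on the seed.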

Illustration

(Figure: error versus model complexity, for the training set and the test set)

We want to find the model complexity that gives the smallest test error.
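A sketch of that search, under illustrative assumptions (simulated data from sin(2πx) plus Gaussian noise, polynomial degree as the complexity knob): it fits degrees 1 to 12 on one training set, records the training and test mean squared error for each, and picks the degree with the smallest test error. `train_test_errors` is a hypothetical helper name.

```python
import numpy as np


def f(x):
    # Assumed true function, for illustration only.
    return np.sin(2 * np.pi * x)


def train_test_errors(max_degree=12, n_train=50, n_test=5000, sigma=0.3, seed=1):
    """Return {degree: (training MSE, test MSE)} for polynomial fits (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    x_tr = rng.uniform(0.0, 1.0, n_train)
    y_tr = f(x_tr) + rng.normal(0.0, sigma, n_train)
    # Independent test set, kept inside the training range to avoid extrapolation.
    x_te = rng.uniform(x_tr.min(), x_tr.max(), n_test)
    y_te = f(x_te) + rng.normal(0.0, sigma, n_test)
    errors = {}
    for d in range(1, max_degree + 1):
        model = np.polynomial.Polynomial.fit(x_tr, y_tr, d)
        errors[d] = (
            np.mean((y_tr - model(x_tr)) ** 2),
            np.mean((y_te - model(x_te)) ** 2),
        )
    return errors


errors = train_test_errors()
best = min(errors, key=lambda d: errors[d][1])  # smallest test error
print("chosen degree:", best)
```

The training error can only go down as the degree grows, because each polynomial model contains the previous one; that is why the choice is made on the test error instead.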

Nature of the problem

Where the best trade-off lies depends on how wiggly the true function is and how noisy the data are:

  • the truth is wiggly and the noise is high, so the quadratic model does best

(Figure: Bias Variance Trade Off 1)

  • the truth is smoother, so the linear model does really well

(Figure: Bias Variance Trade Off 2)

  • the truth is wiggly and the noise is low, so the more flexible model does best

(Figure: Bias Variance Trade Off 3)
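The second and third scenarios can be reproduced roughly with a small simulation. This is a sketch under assumptions of our own, not the setup behind the figures: a straight line stands in for the smooth truth, sin(4πx) for the wiggly one, and polynomial degrees 1 and 10 for the rigid and flexible models. `avg_test_mse`, `smooth` and `wiggly` are hypothetical names, and the noise levels are arbitrary.

```python
import numpy as np


def avg_test_mse(f, sigma, degree, n_train=50, n_reps=200, seed=0):
    """Average test MSE of a degree-`degree` polynomial fit (hypothetical helper).

    The same seed gives the same training and test data for every degree,
    so two models can be compared on identical draws.
    """
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0.1, 0.9, 200)  # interior grid, avoids extrapolation
    total = 0.0
    for _ in range(n_reps):
        x = rng.uniform(0.0, 1.0, n_train)
        y = f(x) + rng.normal(0.0, sigma, n_train)
        fit = np.polynomial.Polynomial.fit(x, y, degree)
        y_test = f(x_test) + rng.normal(0.0, sigma, x_test.size)
        total += np.mean((y_test - fit(x_test)) ** 2)
    return total / n_reps


def smooth(x):
    # Assumed smooth truth: a straight line.
    return 1.0 + 2.0 * x


def wiggly(x):
    # Assumed wiggly truth: two full sine periods on [0, 1].
    return np.sin(4 * np.pi * x)


for name, truth, sigma in [("smooth truth", smooth, 0.5), ("wiggly truth, low noise", wiggly, 0.1)]:
    linear = avg_test_mse(truth, sigma, degree=1)
    flexible = avg_test_mse(truth, sigma, degree=10)
    print(f"{name}: linear {linear:.3f}, degree 10 {flexible:.3f}")
```

With a smooth truth the linear fit is unbiased and has the lower variance, so it should win; with a wiggly truth and little noise, the bias of the linear fit dominates and the flexible fit should win.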

When is model complexity better or worse?

Model Complexity = Flexibility

  • The sample size is extremely large, and the number of predictors is small: Flexible is better. A flexible model lets us take full advantage of the large sample size.
  • The number of predictors is extremely large, and the sample size is small: Flexible is worse. A flexible model will overfit because of the small sample size.
  • The relationship between the predictors and the response is highly non-linear: Flexible is better. A flexible model is needed to capture the non-linear effect.
  • The variance of the error terms, i.e. <math>\sigma^2 = var(\epsilon)</math>, is extremely high: Flexible is worse. A flexible model will fit too much of the noise in the problem.
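Two of these rules can be checked with a sketch under illustrative assumptions: a single predictor, a truth of sin(2πx), Gaussian noise, and polynomial degree as the flexibility knob. The same degree-10 model is compared at 20 and 2000 observations, and under very high noise it is compared with a cubic on identical draws. `sine_test_mse` and `truth` are hypothetical names.

```python
import numpy as np


def truth(x):
    # Assumed true function, for illustration only.
    return np.sin(2 * np.pi * x)


def sine_test_mse(sigma, degree, n_train, n_reps=200, seed=0):
    """Average test MSE of a polynomial fit to noisy sin(2*pi*x) data (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0.1, 0.9, 200)  # interior grid, avoids extrapolation
    total = 0.0
    for _ in range(n_reps):
        x = rng.uniform(0.0, 1.0, n_train)
        y = truth(x) + rng.normal(0.0, sigma, n_train)
        fit = np.polynomial.Polynomial.fit(x, y, degree)
        y_test = truth(x_test) + rng.normal(0.0, sigma, x_test.size)
        total += np.mean((y_test - fit(x_test)) ** 2)
    return total / n_reps


# Sample size: the same flexible (degree-10) model with few vs. many observations.
small_n = sine_test_mse(sigma=0.3, degree=10, n_train=20)
large_n = sine_test_mse(sigma=0.3, degree=10, n_train=2000)

# Very noisy error terms: flexible (degree 10) vs. a cubic on identical draws.
noisy_flexible = sine_test_mse(sigma=3.0, degree=10, n_train=50)
noisy_cubic = sine_test_mse(sigma=3.0, degree=3, n_train=50)
print(f"degree 10, n=20: {small_n:.3f}; n=2000: {large_n:.3f}")
print(f"sigma=3: degree 10 {noisy_flexible:.3f}; cubic {noisy_cubic:.3f}")
```

Under these assumptions the flexible model's test error should drop sharply as the sample grows, and with very noisy data the cubic should beat it because the extra flexibility mostly fits noise.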
