Statistics Learning - Prediction Error (Training versus Test)


Prediction error captures how far a model's predictions fall from reality, including the effect of noise; it is analyzed through the distinction between training error and test error.

We fit our model to the training set, and then apply it to new data that the model hasn't seen.

In general, the more data you have (the bigger the sample size), the more information you have, and the lower the error is.
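As a small sketch of this claim (the data, the true slope of 3, and the function names here are all made up for illustration), we can estimate the slope of a noisy line by least squares and watch the estimation error tend to shrink as the sample grows:

```python
import random

random.seed(1)

def fit_slope(xs, ys):
    # Least-squares slope for a line through the origin: sum(x*y) / sum(x*x)
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def sample(n):
    # Synthetic data: y = 3x plus Gaussian noise (illustrative choice)
    xs = [random.uniform(0, 1) for _ in range(n)]
    ys = [3 * x + random.gauss(0, 0.5) for x in xs]
    return xs, ys

for n in (10, 100, 10000):
    xs, ys = sample(n)
    # Absolute error of the estimated slope; on average it shrinks with n
    print(n, abs(fit_slope(xs, ys) - 3))
```

The shrinkage is a statistical tendency, not a guarantee for any single run: a particular small sample can get lucky and a particular large one unlucky.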


Error calculation:


Most of the regression metrics are based on the residual, see Regression Accuracy metrics for a list.


Most of the classification metrics are based on the error rate, see Classification Accuracy metrics for a list.
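To make the two families of metrics concrete, here is a minimal sketch of one residual-based regression metric (mean squared error) and the classification error rate; the example values are made up:

```python
def mse(y_true, y_pred):
    # Mean squared error: average of the squared residuals (y - y_hat)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    # Mean absolute error: average of the absolute residuals
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def error_rate(y_true, y_pred):
    # Fraction of misclassified points
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

print(mse([3.0, 2.0, 4.0], [2.5, 2.0, 5.0]))   # residuals 0.5, 0, -1 -> ~0.4167
print(error_rate([1, 0, 1, 1], [1, 1, 1, 0]))  # 2 of 4 wrong -> 0.5
```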



Training error is the error we get when applying the model to the same data it was trained on.


Test error is the error we incur on new data; it measures how well the model will actually do on future data it hasn't seen.

Training vs Test

Training error almost always UNDERestimates test error, sometimes dramatically.

Training error usually UNDERestimates test error when the model is very complex (compared to the training set size), and is a fairly good estimate when the model is not very complex. However, it's always possible that by chance too few hard-to-predict points land in the test set, or too many land in the training set. In that case the test error can be LESS than the training error, simply because the test set happened to contain easier cases.
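An extreme case of a complex model makes the underestimation vivid. The sketch below (synthetic data; all names are illustrative) uses a 1-nearest-neighbour regressor, which memorizes the training set: its training error is exactly zero, while its test error on fresh noisy data is not:

```python
import random

random.seed(0)

def make_data(n):
    # Synthetic data: true signal y = 2x plus Gaussian noise
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + random.gauss(0, 1) for x in xs]
    return xs, ys

def predict_1nn(train_x, train_y, x):
    # Predict with the y-value of the closest training point
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]

def mse_1nn(xs, ys, train_x, train_y):
    # Mean squared error of the 1-NN predictions on (xs, ys)
    return sum((y - predict_1nn(train_x, train_y, x)) ** 2
               for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = make_data(50)
test_x, test_y = make_data(50)

train_err = mse_1nn(train_x, train_y, train_x, train_y)  # 0.0: each point is its own nearest neighbour
test_err = mse_1nn(test_x, test_y, train_x, train_y)     # > 0: new, noisy data
print(train_err, test_err)
```

Reading zero training error as "perfect model" would be exactly the mistake this section warns about.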
