Statistics - Model Building (Training|Learning|Fitting)


Training (also known as building) is the phase of a supervised problem in which the software analyzes many cases where the target value is already known.

In the training process, the model “learns” the logic for making the prediction.

Building a model is therefore also called “training” it, and the model is said to “learn” from the training data.

A common data mining practice is to build (train) the model on one part of the source data and then test it on the remaining portion. See evaluation method.
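A minimal sketch of such a holdout split, using only the Python standard library. The helper name `holdout_split`, the 30% test fraction and the fixed seed are illustrative assumptions, not part of any particular tool.

```python
import random

def holdout_split(records, test_fraction=0.3, seed=42):
    """Hypothetical helper: shuffle a copy of `records` and split it into (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(records)                # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)   # seeded shuffle so the split is reproducible
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

train, test = holdout_split(range(10), test_fraction=0.3)
print(len(train), len(test))  # 7 3
```

The model is then built only on `train`, and `test` is kept aside so the evaluation measures performance on cases the model has never seen.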

Practitioners often train many models to identify the set of features, algorithms, and hyper-parameters that produces the best model for their problem. Before arriving at the final model, it is not uncommon to train hundreds of candidates that do not make the cut.
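This search over candidate models can be sketched as a grid search: train one model per hyper-parameter combination, score each on held-out validation data, and keep the best. The `grid_search` helper, the toy one-dimensional data and the choice of a k-nearest-neighbours classifier (with `k` as the only hyper-parameter) are assumptions made for illustration.

```python
from itertools import product

def grid_search(train_fn, score_fn, grid):
    """Hypothetical helper: train one model per parameter combination, keep the best.

    train_fn(params) -> model; score_fn(model) -> float (higher is better).
    """
    names = list(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        model = train_fn(params)
        results.append((score_fn(model), params))
    best_score, best_params = max(results, key=lambda r: r[0])  # first best wins ties
    return best_params, best_score, results

# Toy data: (feature, label); the training point at 3.5 is a deliberately noisy label.
train = [(1.0, 0), (2.0, 0), (3.0, 0), (3.5, 1), (6.0, 1), (7.0, 1), (8.0, 1)]
valid = [(2.5, 0), (3.4, 0), (5.5, 1), (7.5, 1)]

def train_knn(params):
    """'Training' a k-NN model just memorizes the data; k is the hyper-parameter."""
    k = params["k"]
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        votes = sum(label for _, label in nearest)
        return 1 if votes * 2 > k else 0     # majority vote over the k neighbours
    return predict

def accuracy(model):
    return sum(model(x) == y for x, y in valid) / len(valid)

best_params, best_score, results = grid_search(train_knn, accuracy, {"k": [1, 3, 5]})
print(best_params, best_score)  # {'k': 3} 1.0
```

Here `k=1` is misled by the noisy training point at 3.5 and scores 0.75, while `k=3` and `k=5` both classify the validation set perfectly; `max` keeps the first best candidate it encounters.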


For example, a model that seeks to identify the customers who are likely to respond to a promotion must be trained by analyzing the characteristics of many customers who are known to have responded, or not responded, to a promotion in the past.
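As a hedged illustration of learning from cases whose target is already known, the sketch below fits a one-feature logistic regression by batch gradient descent. Logistic regression is a choice made here, not one prescribed by the text, and the `past_purchases` / `responded` history is invented.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(respond) = sigmoid(w * x + b) by batch gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y   # derivative of the log-loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n               # step against the average gradient
        b -= lr * grad_b / n
    return w, b

# Hypothetical history: past purchases per customer, and whether they responded (1) or not (0).
past_purchases = [0, 1, 1, 2, 3, 5, 6, 8]
responded      = [0, 0, 1, 0, 1, 1, 1, 1]

w, b = train_logistic(past_purchases, responded)
for purchases in (0, 7):
    print(purchases, round(sigmoid(w * purchases + b), 2))
```

Because the learned weight `w` is positive, the trained model assigns a higher response probability to a customer with many past purchases than to one with none.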



See Data Mining - Training (Data|Set)

Pre-trained model


To build a model history, each trained model should be recorded. A typical record contains:

  • Who trained the model
  • Start and end time of the training job
  • Full model configuration (features used, hyper-parameter values, etc.)
  • Reference to training and test data sets
  • Distribution and relative importance of each feature
  • Model accuracy metrics
  • Standard charts and graphs for each model type (e.g. ROC curve, PR curve, and confusion matrix for a binary classifier)
  • Full learned parameters of the model
  • Summary statistics for model visualization
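One way to hold such a record is a small Python dataclass with one field per item in the list above. The class and field names are hypothetical, and the values in the usage example are made-up placeholders.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timedelta
from typing import Any, Dict

@dataclass
class TrainingRecord:
    """Hypothetical model-history entry; one field per item of the record."""
    trained_by: str                       # who trained the model
    start_time: datetime                  # start of the training job
    end_time: datetime                    # end of the training job
    config: Dict[str, Any]                # features used, hyper-parameter values, etc.
    train_data_ref: str                   # reference to the training data set
    test_data_ref: str                    # reference to the test data set
    feature_distribution: Dict[str, Any] = field(default_factory=dict)
    feature_importance: Dict[str, float] = field(default_factory=dict)
    metrics: Dict[str, float] = field(default_factory=dict)       # accuracy metrics
    charts: Dict[str, str] = field(default_factory=dict)          # e.g. "roc_curve" -> image path
    learned_params: Dict[str, Any] = field(default_factory=dict)  # full learned parameters
    summary_stats: Dict[str, Any] = field(default_factory=dict)   # for model visualization

    @property
    def duration(self) -> timedelta:
        """Wall-clock length of the training job."""
        return self.end_time - self.start_time

history = []
history.append(TrainingRecord(
    trained_by="alice",
    start_time=datetime(2024, 1, 1, 9, 0),
    end_time=datetime(2024, 1, 1, 9, 45),
    config={"features": ["past_purchases"], "learning_rate": 0.1, "epochs": 2000},
    train_data_ref="customers_train.csv",
    test_data_ref="customers_test.csv",
))
print(history[0].duration)  # 0:45:00
```

`asdict` turns a record into a plain dictionary, which makes it straightforward to serialize the history to JSON or a database table.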

Documentation / Reference

Task Runner