# Data Mining - Training (Data|Set)

In statistics, the training data is the sample, whereas in data mining and machine learning the training data is often a subset of the data set. See Model Building - ReSampling Validation.

The training set represents hindsight (known data the model is fitted on), whereas the test set represents foresight (unseen data the model must predict).
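As a minimal sketch of how a training set can be carved out of a data set, the snippet below shuffles the rows and holds part of them back as a test set. The function name `split_train_test`, the 75% training fraction, and the fixed seed are illustrative assumptions, not something this page prescribes.

```python
import random


def split_train_test(rows, train_fraction=0.75, seed=0):
    """Return (training_set, test_set) from a shuffled copy of rows.

    Hypothetical helper: the 75/25 split and the seed are assumptions
    chosen only for illustration.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(rows)                   # copy, so the input is left untouched
    random.Random(seed).shuffle(shuffled)   # seeded, so the split is reproducible
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


data = list(range(20))                      # stand-in for 20 rows of a data set
training_set, test_set = split_train_test(data)
print(len(training_set), len(test_set))     # 15 5
```

The model would be fitted on `training_set` only (hindsight), and `test_set` would be kept aside to check how it behaves on rows it has never seen (foresight).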
