Data Mining - Ensemble Learning (meta set)

Thomas Bayes


Ensemble learning combines multiple models into an ensemble in order to produce better predictions than any single model.

An ensemble makes a (committee | collective intelligence) decision from different classifier algorithms.

Having different classifiers (also known as experts or base learners) with different perspectives and letting them vote is often a very effective and robust way of making good decisions.

  • Bagging randomizes the training data: it obtains classifiers from different training-set samples.
  • Random forests randomize the algorithm. How to randomize depends on the algorithm (here, the decision tree algorithm). A decision tree selects the best attribute to split on; this step is randomized by not necessarily taking the very best attribute but choosing a few of the best options and picking among them at random. Generally, if you randomize decision trees and bag the results, you get better performance.
  • Boosting forces new classifiers to focus on the errors produced by earlier ones.
  • Stacking uses a “meta learner” to combine the predictions of “base learners.”
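The four techniques above can be sketched side by side. This is a minimal illustration, assuming scikit-learn (a choice of library not made by the article) and a synthetic dataset; the estimator settings are arbitrary defaults for demonstration, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data, split into train and test sets
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ensembles = {
    # Bagging: one decision tree per bootstrap sample of the training data
    "bagging": BaggingClassifier(DecisionTreeClassifier(),
                                 n_estimators=50, random_state=0),
    # Random forest: bagging plus randomized attribute selection at each split
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    # Boosting: each new learner focuses on the errors of the earlier ones
    "boosting": AdaBoostClassifier(n_estimators=50, random_state=0),
    # Stacking: a meta learner combines the base learners' predictions
    "stacking": StackingClassifier(
        estimators=[("tree", DecisionTreeClassifier(random_state=0)),
                    ("forest", RandomForestClassifier(n_estimators=25,
                                                      random_state=0))],
        final_estimator=LogisticRegression()),
}

scores = {name: model.fit(X_train, y_train).score(X_test, y_test)
          for name, model in ensembles.items()}
print(scores)
```

Note that only stacking uses a second-level learner; the other three combine their base learners by (weighted) voting.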

Diversity helps, especially when the learners (models) are unstable, i.e. when small changes in the training data can produce large changes in the learned model.

Create diversity by

  • Bagging: resampling the training set
  • Random forests: alternative branches in decision trees
  • Boosting: focus on where the existing model makes errors
  • Stacking: combine results using another learner (instead of voting)
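The first of these, resampling the training set, can be shown from scratch with only the standard library: draw bootstrap samples with replacement, fit one learner per sample, and combine them by majority vote. The threshold "stump" learner below is a hypothetical, deliberately weak classifier invented for this sketch.

```python
import random
from collections import Counter

def fit_stump(data):
    """Fit a trivial one-feature classifier: predict 1 above the midpoint
    between the two class means, else 0."""
    def mean(label):
        members = [x for x, y in data if y == label]
        return sum(members) / max(1, len(members))
    threshold = (mean(0) + mean(1)) / 2
    return lambda x: 1 if x > threshold else 0

def bag(data, n_models=25, seed=0):
    """Bagging: fit one stump per bootstrap resample of the training set."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]  # resample with replacement
        models.append(fit_stump(sample))
    # Committee decision: every model votes, the majority wins
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]

# Toy training set: class 0 clusters near 2, class 1 near 5
train = ([(x, 0) for x in (1.0, 1.5, 2.0, 2.5)]
         + [(x, 1) for x in (4.0, 4.5, 5.0, 5.5)])
predict = bag(train)
print(predict(1.2), predict(5.2))
```

Each bootstrap sample gives the stump a slightly different threshold, so the vote averages out the instability of any single learner.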

Advantage / Disadvantage

The output of an ensemble is hard to analyze (there is no single interpretable model), but the performance is usually very good.
