Machine learning - Bootstrap aggregating (bagging)

About

Bootstrap aggregating (bagging) is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression.

It also reduces variance and helps to avoid over-fitting. Although it is usually applied to decision tree methods, it can be used with any type of method.
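The variance reduction can be made concrete with a standard identity for the average of correlated estimates. The symbols are assumptions for illustration and do not appear in the original: $B$ is the number of bagged models, $\hat f_b(x)$ is the prediction of model $b$, $\sigma^2$ is the variance of a single prediction, and $\rho$ is the pairwise correlation between predictions.

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B}\hat f_b(x)\right)
  = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
```

Under these assumptions, adding models shrinks only the second term, so the benefit is largest when the individual models are weakly correlated.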

Bagging:

  1. Produces several different training sets of the same size by sampling the original training set with replacement
  2. Builds a model for each one using the same machine learning scheme
  3. Combines the predictions by voting for a nominal target or by averaging for a numeric target
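The three steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the base learner (a one-feature decision stump) and the names `bootstrap_sample`, `train_stump`, `bagging_fit` and `bagging_predict` are all hypothetical, and only the nominal-target (voting) case is shown.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Step 1: draw len(data) instances from data, with replacement."""
    return [rng.choice(data) for _ in data]


def train_stump(sample):
    """Hypothetical base learner: a one-feature threshold rule.

    Each instance is (x, label). Every observed x is tried as a threshold
    and the one with the fewest training errors is kept.
    """
    best = None
    for threshold, _ in sample:
        left = [y for x, y in sample if x <= threshold]   # never empty
        right = [y for x, y in sample if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


def bagging_fit(data, n_models, seed=0):
    """Step 2: build one model per bootstrap sample, same scheme each time."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]


def bagging_predict(models, x):
    """Step 3: combine predictions by majority vote (nominal target)."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Toy data, invented for the example: small x values are "a", large are "b".
data = [(1, "a"), (2, "a"), (3, "a"), (7, "b"), (8, "b"), (9, "b")]
models = bagging_fit(data, n_models=25, seed=0)
prediction = bagging_predict(models, 0.5)
```

For a numeric target, the vote in `bagging_predict` would be replaced by the mean of the individual predictions.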

Bagging can be parallelized.
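Bagging parallelizes because each model depends only on its own bootstrap sample. The sketch below assumes a deliberately trivial base learner (the mean target of the sample) and hypothetical names `train_on_bootstrap` and `parallel_bagging`. It uses a thread pool for simplicity; for CPU-bound learners in CPython a process pool would likely be the better choice.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def train_on_bootstrap(job):
    """Train one model; needs nothing from the other models."""
    data, seed = job
    rng = random.Random(seed)                  # independent RNG per model
    sample = [rng.choice(data) for _ in data]  # bootstrap sample
    # Hypothetical base learner: always predict the sample's mean target.
    return mean(y for _, y in sample)


def parallel_bagging(data, n_models, max_workers=4):
    """Train n_models models concurrently, one job per bootstrap sample."""
    jobs = [(data, seed) for seed in range(n_models)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(train_on_bootstrap, jobs))


# Toy numeric data, invented for the example: (x, y) pairs.
data = [(1, 2.0), (2, 4.1), (3, 5.9), (4, 8.2), (5, 9.8)]
models = parallel_bagging(data, n_models=8)
bagged_prediction = mean(models)  # averaging for a numeric target
```

Because every job carries its own seed, the parallel run produces the same models as a sequential loop over the same jobs.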

Advantages / Disadvantages

Bagging is well suited to “unstable” learning schemes, that is, schemes in which a small change in the training data can produce a big change in the model.

Example: decision trees are a very unstable scheme, whereas Naïve Bayes and instance‐based learning are not, because all attributes contribute independently.

Replacement

In bagging, you sample the training set “with replacement”, which means that the same instance can appear more than once in a sample.
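A short sketch of what sampling with replacement does in practice. The function name `bootstrap_indices`, the sample size and the seed are assumptions chosen for illustration. Since each instance is missed by a single draw with probability 1 - 1/n, a sample of size n is expected to contain roughly 63% of the distinct instances for large n, with the rest of the sample made up of repeats.

```python
import random


def bootstrap_indices(n, rng):
    """Draw n indices from range(n), with replacement."""
    return [rng.randrange(n) for _ in range(n)]


rng = random.Random(42)
n = 10_000
indices = bootstrap_indices(n, rng)

# Fraction of distinct instances that made it into the sample;
# approaches 1 - 1/e (about 0.632) as n grows.
unique_fraction = len(set(indices)) / n

# Instances never drawn ("out-of-bag") can serve as held-out data.
out_of_bag = set(range(n)) - set(indices)
```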

Implementation

Weka

In the Weka classifier list: meta > Bagging (class weka.classifiers.meta.Bagging)
