Boosting forces new classifiers to focus on the errors produced by earlier ones. Boosting works by aggressively reducing the training error.
Gradient Boosting is an algorithm based on an ensemble of decision trees, similar to random forests.
Instead of building each tree from a different random subset, boosted trees take the errors made by the previous tree and use them to improve the next one.
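That residual-fitting idea can be sketched in a few lines. Below is a toy Python version for regression with squared loss, assuming scikit-learn is available; the function names are my own, not from any library. Each new tree is trained on the residual errors left by the ensemble built so far.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_trees=100, lr=0.1):
    """Toy gradient boosting for squared loss: each tree fits the
    residuals (errors) left by the ensemble built so far."""
    y = np.asarray(y, dtype=float)
    base = y.mean()                      # start from the mean prediction
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        residuals = y - pred             # errors of the current ensemble
        tree = DecisionTreeRegressor(max_depth=3)
        tree.fit(X, residuals)           # the next tree learns to correct them
        pred = pred + lr * tree.predict(X)
        trees.append(tree)
    return base, trees

def gradient_boost_predict(base, trees, X, lr=0.1):
    return base + lr * sum(t.predict(X) for t in trees)
```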
It's an iterative algorithm. The idea is that you build a model and then look at the instances it misclassifies (the hard ones to classify). You put extra weight on those instances to form the training set for the next model in the iteration. This encourages the new model to become an “expert” on the instances that were misclassified.
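A minimal from-scratch sketch of that reweighting loop (AdaBoost-style, with binary labels coded as -1/+1 and depth-1 scikit-learn trees as the weak learners; the function names here are assumptions of mine, not Weka's):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_fit(X, y, n_rounds=10):
    """AdaBoost-style boosting for binary labels y in {-1, +1}."""
    y = np.asarray(y)
    n = len(y)
    w = np.full(n, 1.0 / n)                  # start with uniform instance weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)     # train on the weighted instances
        pred = stump.predict(X)
        err = w[pred != y].sum()             # weighted training error
        if err == 0 or err >= 0.5:           # stop if perfect or no better than chance
            break
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * pred)       # misclassified instances get more weight
        w /= w.sum()                         # renormalise
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def boost_predict(stumps, alphas, X):
    votes = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.sign(votes)                    # weighted vote of all the stumps
```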
Iterative: new models are influenced by the performance of previously built ones.
If there is no structure to the features, or if you have a limited amount of time to spend on a problem, you should definitely consider boosted trees.
Boosting is sequential (not parallel).
Boosting seems not to overfit (why boosting doesn't overfit).
In Weka: meta > AdaBoostM1. It is a standard and very good implementation. By default, AdaBoostM1 uses DecisionStump as its base learner.
Boosting the baseline algorithm (ZeroR) produces practically the same classifier for any subset of the data, so combining these identical classifiers gives the same result as the baseline by itself.
The default configuration of AdaBoostM1 is 10 boosting iterations using the DecisionStump classifier.
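Outside the Weka GUI, roughly the same configuration (10 rounds of boosting over decision stumps) can be sketched with scikit-learn's AdaBoostClassifier as an analogue; this is not Weka's own code:

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Rough scikit-learn analogue of AdaBoostM1's defaults:
# 10 boosting iterations over a decision stump (depth-1 tree).
# The base-learner argument is `estimator` in recent scikit-learn
# releases and `base_estimator` in older ones.
model = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=10,
)
# model.fit(X_train, y_train); model.predict(X_test)
```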
Performance might improve as the number of iterations increases: with boosting, accuracy generally improves, up to an asymptote, as more iterations are run.
Example with AdaBoostM1 on the diabetes dataset.
numIterations | Accuracy (%) |
---|---|
1 | 71.875 |
10 | 74.349 |
20 | 75.2604 |
30 | 74.7396 |
40 | 74.7396 |
50 | 74.349 |
60 | 75.3906 |
70 | 75.1302 |
80 | 74.4792 |
90 | 74.8698 |
100 | 75.3906 |
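A sketch of how the sweep in the table above could be reproduced with scikit-learn; the CSV path and the "class" column name are hypothetical placeholders for Weka's diabetes.arff exported to CSV, and the exact numbers will differ from the Weka run:

```python
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical CSV export of Weka's diabetes.arff; "class" is an assumed label column.
df = pd.read_csv("diabetes.csv")
X, y = df.drop(columns="class"), df["class"]

for n in (1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100):
    model = AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=1),  # stump base learner, as in AdaBoostM1
        n_estimators=n,
    )
    acc = cross_val_score(model, X, y, cv=10).mean()    # 10-fold cross-validation accuracy
    print(f"{n:3d} iterations: {100 * acc:.2f}%")
```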