Decision Trees (DT) are supervised classification algorithms.
- easy to interpret (due to the tree structure)
- equivalent to a Boolean function (if each decision is binary, i.e. true or false)
Decision trees extract predictive information in the form of human-understandable tree rules. They are useful for many classification problems and can explain the model's logic through human-readable "If… Then…" rules.
- a reliable and robust algorithm.
- simple to implement.
- works on categorical attributes.
- handles many attributes, so works in large-p, small-n cases.
Each decision in the tree can be seen as a test on a feature.
The creation of a tree is a quest for:
- purity (a pure node contains only one class: only yes or only no)
- the smallest tree
At each level, choose the attribute that produces the "purest" child nodes (i.e. the attribute with the highest information gain).
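Information gain can be computed directly from Shannon entropy: the entropy of the parent node minus the weighted entropy of the children after the split. A minimal sketch (the toy data below is hypothetical, for illustration only):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Parent entropy minus the weighted entropy of the children
    produced by splitting on `attribute`."""
    n = len(labels)
    splits = {}
    for row, label in zip(rows, labels):
        splits.setdefault(row[attribute], []).append(label)
    children = sum(len(s) / n * entropy(s) for s in splits.values())
    return entropy(labels) - children

# Toy data: splitting on "sex" separates the classes perfectly.
rows = [{"sex": "f"}, {"sex": "f"}, {"sex": "m"}, {"sex": "m"}]
labels = ["yes", "yes", "no", "no"]
print(information_gain(rows, labels, "sex"))  # 1.0, the maximum for a binary class
```

A gain of 1.0 means the split yields perfectly pure children; an attribute that leaves the class mix unchanged would score 0.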
Decision Trees are prone to overfitting:
- whereas ensembles of trees are not; see random forests.
- Pruning can help: remove or aggregate sub-trees that provide little discriminatory power
Decision trees can overfit badly because of the highly complex decision boundaries they can produce; pruning ameliorates, but rarely completely eliminates, the effect.
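One simple form of pruning can be sketched as a bottom-up pass that collapses any split whose children all predict the same class, since that split adds no discriminatory power. This assumes a hypothetical tree representation (a dict with an attribute name and a `children` map; leaves are bare class labels):

```python
def prune(tree):
    """Bottom-up pruning: collapse any subtree whose leaves all
    predict the same class, since the split is useless."""
    if not isinstance(tree, dict):  # already a leaf
        return tree
    tree["children"] = {v: prune(sub) for v, sub in tree["children"].items()}
    kids = list(tree["children"].values())
    if all(not isinstance(k, dict) for k in kids) and len(set(kids)) == 1:
        return kids[0]  # replace the whole split with a single leaf
    return tree

# Both branches predict "yes", so the split on Age is removed.
tree = {"attr": "age<5", "children": {True: "yes", False: "yes"}}
print(prune(tree))  # yes
```

Real implementations (e.g. cost-complexity pruning) also weigh subtree error against subtree size on held-out data; the sketch only shows the structural idea of removing sub-trees with no discriminatory power.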
- package=FFTrees - Create, visualize, and test fast-and-frugal decision trees (FFTs). FFTs are very simple decision trees for binary classification problems. FFTs can be preferable to more complex algorithms because they are easy to communicate, require very little information, and are robust against overfitting.
Titanic (Survive: yes or no)

if Ticket Class = "1" then
    if Sex = "female" then Survive = "yes"
    if Sex = "male" and Age < 5 then Survive = "yes"
if Ticket Class = "2" then
    if Sex = "female" then Survive = "yes"
    if Sex = "male" then Survive = "no"
if Ticket Class = "3" then
    if Sex = "male" then Survive = "no"
    if Sex = "female" then
        if Age < 4 then Survive = "yes"
        if Age >= 4 then Survive = "no"
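The Titanic rule set above translates directly into code; every branch below is one "If… Then…" rule. Note the original rules leave first-class men aged 5 or older unspecified, so the sketch assumes "no" for that case:

```python
def survive(ticket_class, sex, age):
    """Titanic survival rules, transcribed from the tree above."""
    if ticket_class == 1:
        # Assumption: first-class men aged >= 5 default to "no"
        # (the original rule set does not cover them).
        return "yes" if sex == "female" or age < 5 else "no"
    if ticket_class == 2:
        return "yes" if sex == "female" else "no"
    if ticket_class == 3:
        if sex == "male":
            return "no"
        return "yes" if age < 4 else "no"

print(survive(3, "female", 2))   # yes  (third class, girl under 4)
print(survive(1, "male", 40))    # no
print(survive(2, "female", 30))  # yes
```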
Every path from the root to a leaf is a rule.
Single tests at the nodes (e.g. Sex = "female")
Compound tests at the nodes (e.g. Sex = "male" and Age < 5)