Data Mining - Decision Tree (DT) Algorithm



Decision Trees (DT) are supervised classification algorithms.

They are:

  • easy to interpret (due to the tree structure)
  • a boolean function (if each decision is binary, i.e. false or true)

Decision trees extract predictive information in the form of human-understandable tree rules. They are useful for many classification problems and can help explain the model’s logic using human-readable “If … Then …” rules.

They are also:

  • reliable and robust,
  • simple to implement.

They can:

  • work on categorical attributes,
  • handle many attributes, including cases with more attributes (big p) than observations (small n).

Each decision in the tree can be seen as a feature.


The creation of a tree is a quest for:

  • purity (only pure nodes: each leaf contains only yes or only no)
  • the smallest tree

At each level, choose the attribute that produces the “purest” nodes (i.e. the attribute with the highest information gain).
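The attribute choice described above can be sketched in a few lines. This is a minimal illustration, assuming entropy as the purity measure and a list of dictionaries as the data format; the function names (`entropy`, `information_gain`, `best_attribute`) and the four toy rows are made up for the example and are not taken from the Titanic data set.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels; 0 means pure."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the parent node minus the weighted entropy of its children."""
    parent = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - weighted


def best_attribute(rows, attributes, target):
    """Pick the attribute whose split yields the purest child nodes."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Toy rows, invented for illustration only.
rows = [
    {"Sex": "female", "Ticket Class": "1", "Survive": "yes"},
    {"Sex": "female", "Ticket Class": "3", "Survive": "yes"},
    {"Sex": "male", "Ticket Class": "1", "Survive": "no"},
    {"Sex": "male", "Ticket Class": "3", "Survive": "no"},
]

chosen = best_attribute(rows, ["Sex", "Ticket Class"], "Survive")
print(chosen)
```

In these toy rows, splitting on Sex gives two pure nodes while splitting on Ticket Class leaves both nodes mixed, so Sex is chosen for the first level.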



Decision Trees are prone to overfitting:

  • ensembles of trees are much less prone to it (see random forest)
  • pruning can help: remove or aggregate sub-trees that provide little discriminatory power

Decision trees can overfit badly because of the highly complex decision boundaries they can produce; pruning ameliorates the effect but rarely eliminates it completely.
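One simple way to picture pruning is a bottom-up pass that collapses a sub-tree into a single leaf when the split removes too few training errors. The sketch below is only an illustration under stated assumptions: the nested-dictionary tree format, the `min_error_reduction` threshold, and the class counts are all hypothetical, and practical pruning methods usually judge sub-trees on held-out data or with a complexity penalty rather than on training errors alone.

```python
from collections import Counter


def leaf_errors(counts):
    """Training errors made if the node predicts its majority class."""
    return sum(counts.values()) - max(counts.values())


def subtree_counts(node):
    """Class counts of all training rows that reach this node."""
    if "children" not in node:
        return Counter(node["counts"])
    total = Counter()
    for child in node["children"].values():
        total += subtree_counts(child)
    return total


def subtree_errors(node):
    """Training errors made by the leaves below this node."""
    if "children" not in node:
        return leaf_errors(node["counts"])
    return sum(subtree_errors(c) for c in node["children"].values())


def prune(node, min_error_reduction=1):
    """Collapse sub-trees whose split saves fewer errors than the threshold."""
    if "children" not in node:
        return node
    children = {k: prune(c, min_error_reduction)
                for k, c in node["children"].items()}
    pruned = {"test": node["test"], "children": children}
    counts = subtree_counts(pruned)
    saved = leaf_errors(counts) - subtree_errors(pruned)
    if saved < min_error_reduction:
        return {"counts": dict(counts)}  # little discriminatory power
    return pruned


# Hypothetical tree with invented class counts at the leaves.
tree = {
    "test": "Sex",
    "children": {
        "female": {"counts": {"yes": 8, "no": 2}},
        "male": {
            "test": "Ticket Class",
            "children": {
                "1": {"counts": {"yes": 1, "no": 4}},
                "3": {"counts": {"yes": 0, "no": 5}},
            },
        },
    },
}

pruned = prune(tree)
print(pruned)
```

Here both leaves under "male" predict "no", so that split saves no errors and is merged into one leaf, while the split on Sex is kept.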


  • R package FFTrees: creates, visualizes, and tests fast-and-frugal decision trees (FFTs). FFTs are very simple decision trees for binary classification problems. They can be preferable to more complex algorithms because they are easy to communicate, require very little information, and are robust against overfitting.


Titanic (Survive Yes or No)

Titanic Data Set

if Ticket Class = "1" then
   if Sex = "female" then Survive = "yes"
   if Sex = "male" and Age < 5 then Survive = "yes"
if Ticket Class = "2" then
   if Sex = "female" then Survive = "yes"
   if Sex = "male" then Survive = "no"
if Ticket Class = "3" then
   if Sex = "male" then Survive = "no"
   if Sex = "female" then
      if Age < 4 then Survive = "yes"
      if Age >= 4 then Survive = "no"

Every path from the root to a leaf is a rule.
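To make the path-to-rule correspondence concrete, the sketch below walks a small tree and emits one rule per leaf. The nested-dictionary representation (condition string mapped to a sub-tree or to a leaf label) and the function name `extract_rules` are assumptions made for this example; the tree mirrors only the Ticket Class = "3" branch of the rules above.

```python
def extract_rules(node, path=()):
    """Return one (conditions, label) pair per root-to-leaf path."""
    if isinstance(node, str):  # a leaf holds the predicted label
        return [(path, node)]
    rules = []
    for condition, child in node.items():
        rules.extend(extract_rules(child, path + (condition,)))
    return rules


# Hypothetical encoding of the third-class branch of the Titanic rules.
tree = {
    'Ticket Class = "3"': {
        'Sex = "male"': "no",
        'Sex = "female"': {
            "Age < 4": "yes",
            "Age >= 4": "no",
        },
    },
}

rules = extract_rules(tree)
for conditions, label in rules:
    print("if " + " and ".join(conditions) + f' then Survive = "{label}"')
```

Each printed line is a flat "if ... then ..." rule whose conditions are the tests met along one path, so the tree has exactly as many rules as it has leaves.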



Single tests at the nodes


Compound tests at the nodes
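As a rough illustration of the two headings above, and assuming that a single test checks one attribute while a compound test combines several attribute checks in one node, the difference can be written as two predicates. The function names and the passenger record are hypothetical; the compound condition reuses the Sex = "male" and Age < 5 test from the Titanic rules.

```python
def single_test(passenger):
    """Node test on one attribute only."""
    return passenger["Sex"] == "male"


def compound_test(passenger):
    """Node test combining two attributes in a single decision."""
    return passenger["Sex"] == "male" and passenger["Age"] < 5


# Invented passenger record, for illustration only.
passenger = {"Sex": "male", "Age": 30}
print(single_test(passenger), compound_test(passenger))
```

A compound test can shorten the tree because one node does the work of two nested single tests, at the cost of a larger set of candidate tests to search when the tree is built.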
