Machine Learning - (One|Simple) Rule - (One Level Decision Tree)


One Rule (OneR) is a simple method based on a 1-level decision tree, described in 1993 by Rob Holte (Alberta, Canada).

Simple rules often outperform far more complex methods because some datasets are:

  • really simple
  • so small/noisy/complex that nothing can be learned from them

How it works:

  • One branch for each value of the attribute
  • Each branch assigns the most frequent class
  • Error rate: proportion of instances that don’t belong to the majority class of their corresponding branch
  • Choose the attribute with the smallest error rate

For each attribute,
   For each value of the attribute,
   make a rule as follows:
       count how often each class appears
       find the most frequent class
       make the rule assign that class to this attribute-value
   Calculate the error rate of this attribute’s rules
Choose the attribute with the smallest error rate

Example of output for the weather data set (the rule tests the outlook attribute):

    if sunny	-> no
    if overcast	-> yes
    if rainy	-> yes

With this one-level decision tree, 10 of the 14 instances are classified correctly.
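The count of 10 out of 14 can be checked with a few lines of Python. This is only a sketch: the (outlook, play) pairs below are assumed to be the standard 14-instance nominal weather data, which the text above does not list, and the variable names are illustrative.

```python
from collections import Counter, defaultdict

# (outlook, play) pairs; assumed to be the standard nominal weather data.
pairs = [
    ("sunny", "no"), ("sunny", "no"), ("overcast", "yes"), ("rainy", "yes"),
    ("rainy", "yes"), ("rainy", "no"), ("overcast", "yes"), ("sunny", "no"),
    ("sunny", "yes"), ("rainy", "yes"), ("sunny", "yes"), ("overcast", "yes"),
    ("overcast", "yes"), ("rainy", "no"),
]

# Count how often each class appears for each outlook value.
counts = defaultdict(Counter)
for outlook, play in pairs:
    counts[outlook][play] += 1

# Each branch predicts its most frequent class.
rule = {value: max(c, key=c.get) for value, c in counts.items()}

# Instances covered by the majority class of their branch are correct.
correct = sum(max(c.values()) for c in counts.values())

print(rule)
print(f"{correct} of {len(pairs)} correct")
```

The sunny and rainy branches each misclassify 2 instances and the overcast branch none, which gives 4 errors in total.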


Algorithm to choose the best rule

For each attribute:
  For each value of that attribute, create a rule:
      1. count how often each class appears
      2. find the most frequent class, c
      3. make a rule "if attribute=value then class=c"
  Calculate the error rate of this attribute's rules
Pick the attribute whose rules produce the lowest error rate
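The steps above can be sketched in Python as follows. This is a minimal illustration, not Weka's implementation: the function names `build_rule` and `one_r` are hypothetical, the data is assumed to be the standard nominal weather data set, and ties between attributes are broken by keeping the first one encountered.

```python
from collections import Counter, defaultdict

ATTRIBUTES = ["outlook", "temperature", "humidity", "windy"]

# Assumed data: the standard 14-instance nominal weather set; last column is the class.
WEATHER = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]


def build_rule(rows, index):
    """Build the rules for one attribute and count their training errors."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[index]][row[-1]] += 1  # 1. count classes per value
    # 2. and 3. each value predicts its most frequent class
    rule = {value: max(c, key=c.get) for value, c in counts.items()}
    errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
    return rule, errors


def one_r(rows, attribute_names):
    """Return (attribute, rule, errors) for the attribute with the fewest errors."""
    best = None
    for index, name in enumerate(attribute_names):
        rule, errors = build_rule(rows, index)
        if best is None or errors < best[2]:  # strict "<": first attribute wins ties
            best = (name, rule, errors)
    return best


if __name__ == "__main__":
    attribute, rule, errors = one_r(WEATHER, ATTRIBUTES)
    print(attribute, rule, f"errors: {errors}/{len(WEATHER)}")
```

On this data, outlook and humidity both make 4 errors, so the attribute that is reported depends on the tie-breaking choice.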

One Rule vs Baseline

OneR always outperforms (or, at worst, equals) the baseline (ZeroR, which always predicts the majority class) when evaluated on the training data. However, evaluating on the training data doesn't reflect performance on independent test data.

ZeroR sometimes outperforms OneR on independent data: if the target distribution is skewed or limited data is available, predicting the majority class can yield better results than basing a rule on a single attribute. This happens with the nominal weather dataset.
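The difference between the two evaluations can be illustrated with a small Python sketch. The assumptions are stated plainly: the data is the standard nominal weather set, leave-one-out evaluation stands in for an independent test set, ties are broken by first occurrence, and all function names are hypothetical.

```python
from collections import Counter, defaultdict

# Assumed data: standard nominal weather set (outlook, temperature, humidity, windy, play).
WEATHER = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]


def majority(labels):
    """Most frequent label; the first one seen wins a tie."""
    counts = Counter(labels)
    return max(counts, key=counts.get)


def fit_zero_r(rows):
    """ZeroR: always predict the majority class of the training rows."""
    label = majority(row[-1] for row in rows)
    return lambda row: label


def fit_one_r(rows):
    """OneR: rules on the single attribute with the fewest training errors."""
    default = majority(row[-1] for row in rows)  # fallback for unseen values
    best = None
    for index in range(len(rows[0]) - 1):
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[index]][row[-1]] += 1
        rule = {value: max(c, key=c.get) for value, c in counts.items()}
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        if best is None or errors < best[0]:
            best = (errors, index, rule)
    _, index, rule = best
    return lambda row: rule.get(row[index], default)


def training_correct(fit, rows):
    """Number of training rows classified correctly by a model fit on them."""
    predict = fit(rows)
    return sum(predict(row) == row[-1] for row in rows)


def leave_one_out_correct(fit, rows):
    """Number of rows classified correctly when each is held out in turn."""
    correct = 0
    for i, held_out in enumerate(rows):
        predict = fit(rows[:i] + rows[i + 1:])
        correct += predict(held_out) == held_out[-1]
    return correct


if __name__ == "__main__":
    for name, fit in [("ZeroR", fit_zero_r), ("OneR", fit_one_r)]:
        print(name, training_correct(fit, WEATHER), leave_one_out_correct(fit, WEATHER))
```

With these choices, OneR gets 10 training instances right against 9 for ZeroR, but under leave-one-out it gets 7 right against 9 for ZeroR. The exact held-out figures depend on the evaluation protocol and on how ties are broken.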

minBucket Size

The “minBucket size” parameter of Weka limits the complexity of the rules in order to avoid overfitting (default: 6).
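To see why a minimum bucket size limits rule complexity, here is a simplified Python sketch of bucket-based discretization of a numeric attribute. It is an assumption-laden illustration, not Weka's code: the function `bucketize` is hypothetical, the temperature values are taken from the numeric version of the weather data (not shown in this page), and the step that merges adjacent buckets with the same majority class is left out.

```python
from collections import Counter

# Assumed data: (temperature, play) pairs from the numeric weather data set.
TEMPERATURE = [
    (85, "no"), (80, "no"), (83, "yes"), (70, "yes"), (68, "yes"),
    (65, "no"), (64, "yes"), (72, "no"), (69, "yes"), (75, "yes"),
    (75, "yes"), (72, "yes"), (81, "yes"), (71, "no"),
]


def bucketize(pairs, min_bucket):
    """Split sorted (value, label) pairs into buckets (one rule branch each).

    A bucket is closed once its majority class has at least `min_bucket`
    instances. Before closing, it is extended while the next instance has
    the majority class or shares its value with the last instance, so that
    equal values are never split. Leftover instances form a final bucket.
    """
    data = sorted(pairs)
    buckets, current = [], []
    i = 0
    while i < len(data):
        current.append(data[i])
        i += 1
        top_label, top_count = Counter(
            label for _, label in current
        ).most_common(1)[0]
        if top_count >= min_bucket:
            while i < len(data) and (
                data[i][1] == top_label or data[i][0] == current[-1][0]
            ):
                current.append(data[i])
                i += 1
            buckets.append(current)
            current = []
    if current:
        buckets.append(current)
    return buckets


if __name__ == "__main__":
    for size in (1, 3, 6):
        print(size, [len(b) for b in bucketize(TEMPERATURE, size)])
```

On this data the sketch yields 8 buckets for a minimum of 1, 3 buckets for a minimum of 3, and 2 buckets for a minimum of 6: a larger minimum means fewer branches and therefore a simpler rule.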

With a “minBucket size” of 1, the accuracy on the training data set is very high; it decreases as the “minBucket size” increases.

Evaluating with cross-validation (10 folds) limits this effect: the accuracy is more stable across the “minBucket size” values.

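OneR accuracy by minBucket size (column labels are inferred from the text above; the source does not label the last column):

minBucket size   Cross-validation accuracy (%)   Training accuracy (%)   (unlabeled)
1                47.66                           92.99                   106
2                48.13                           71.5                    31
3                59.81                           68.22                   14
4                59.35                           66.36                   10
5                57.94                           63.55                   8
6                57.94                           63.08                   8
7                58.41                           62.14                   6
8                56.07                           61.68                   6
9                57.48                           60.75                   4
10               57.94                           59.34                   4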
