Data Mining - (Decision) Rule


Some forms of predictive data mining generate rules: conditions that imply a given outcome.

Rules are if-then-else expressions; they explain the decisions that lead to the prediction.

They are extracted from a decision tree or mined from associations in the data (association rules).

For example, a rule might specify that a person:

  • who has a bachelor's degree (an attribute)
  • and lives in a certain neighbourhood (another attribute)

is likely to have an income greater than the regional average.

In ODM basket analysis, for example, both the antecedent and the consequent of a rule contain product IDs.



A classification rule predicts the value of a given attribute:

If outlook = sunny and humidity = high then play = no
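
As an illustration, the rule above could be represented as data and applied to an instance. This is only a minimal sketch: the `Rule` class and its `matches` and `predict` methods are hypothetical names introduced here, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical container for an if-then rule (illustration only)."""
    antecedent: dict   # attribute -> required value (the "if" part)
    consequent: tuple  # (attribute, predicted value) (the "then" part)

    def matches(self, instance: dict) -> bool:
        # The rule fires only when every condition of the antecedent holds.
        return all(instance.get(a) == v for a, v in self.antecedent.items())

    def predict(self, instance: dict):
        # Return the predicted value, or None when the rule does not fire.
        return self.consequent[1] if self.matches(instance) else None


rule = Rule({"outlook": "sunny", "humidity": "high"}, ("play", "no"))
day = {"outlook": "sunny", "humidity": "high", "windy": False}
print(rule.predict(day))  # no
```

An instance that does not satisfy the antecedent (for example a rainy day) yields `None`, meaning the rule says nothing about it.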


An association rule predicts the value of an arbitrary attribute (or combination of attributes):

If temperature = cool                  then humidity = normal
If humidity = normal and windy = false then play = yes
If outlook = sunny and play = no       then humidity = high
If windy = false and play = no         then outlook = sunny and humidity = high

Set of Rules

A single rule is easy to interpret, but a complex set of rules probably isn’t.

A sequential covering algorithm learns sets of rules with complex conditions, one rule at a time.


Strategies for Learning Each Rule:

  • General-to-Specific:
    • Start with an empty rule
    • Add constraints to eliminate negative examples
    • Stop when only positives are covered
  • Specific-to-General:
    • Start with a rule that identifies a single random instance
    • Remove constraints in order to cover more positives
    • Stop when further generalization results in covering negatives
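
The specific-to-general strategy can be sketched as follows. This is a simplified illustration under stated assumptions: the seed instance is passed in rather than drawn at random, constraints are tried in a fixed order, and `covers` and `generalize` are hypothetical helper names.

```python
def covers(antecedent, instance):
    """True when the instance satisfies every condition of the antecedent."""
    return all(instance.get(a) == v for a, v in antecedent.items())


def generalize(seed, negatives):
    """Start from a rule describing one positive instance, then drop constraints.

    A constraint is removed only if the more general rule still covers
    no negative example.
    """
    rule = dict(seed)
    for attr in list(rule):
        trial = {a: v for a, v in rule.items() if a != attr}
        if not any(covers(trial, n) for n in negatives):
            rule = trial  # generalization is safe: keep it
    return rule


positives = [
    {"outlook": "overcast", "humidity": "high"},
    {"outlook": "overcast", "humidity": "normal"},
]
negatives = [{"outlook": "sunny", "humidity": "high"}]

rule = generalize(positives[0], negatives)
print(rule)  # {'outlook': 'overcast'}
```

Dropping `outlook` first would cover the negative example, so that constraint is kept; dropping `humidity` is safe, and the resulting rule covers both positives.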

If more than one rule is triggered (a conflict):

  • Choose the “most specific” rule
  • Use domain knowledge to order rules by priority
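
The two conflict-resolution strategies can be combined in a small sketch. The assumptions here are illustrative: "most specific" is taken to mean "has the most conditions", a lower `priority` number means a higher priority, and priority is used only to break ties. The `resolve` function and the dictionary layout are hypothetical.

```python
def resolve(rules, instance):
    """Return the prediction of the winning rule, or None if no rule fires."""
    fired = [
        r for r in rules
        if all(instance.get(a) == v for a, v in r["if"].items())
    ]
    if not fired:
        return None
    # Most conditions wins; among equally specific rules, the lowest
    # priority number (assumed to mean highest priority) wins.
    best = max(fired, key=lambda r: (len(r["if"]), -r["priority"]))
    return best["then"]


rules = [
    {"if": {"outlook": "sunny"}, "then": "yes", "priority": 2},
    {"if": {"outlook": "sunny", "humidity": "high"}, "then": "no", "priority": 1},
]

print(resolve(rules, {"outlook": "sunny", "humidity": "high"}))    # no
print(resolve(rules, {"outlook": "sunny", "humidity": "normal"}))  # yes
```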


Learning Rules by Sequential Covering (src: Alvarez)

Initialize the ruleset R to the empty set (D is the set of training instances)

for each class C {
    while D contains instances of class C {
        Construct one rule r that correctly classifies
        some instances in D that belong to class C
        and does not incorrectly classify any non-C instances
        # See below "Finding next rule for class C"

        Add rule r to ruleset R
        Remove from D all instances correctly classified by r
    }
}
return the ruleset R

Sequential Covering: Finding next rule for class C

Initialize A as the set of all attributes over D
Initialize r as the rule with an empty antecedent: {} => C

while r incorrectly classifies some non-C instances of D and A is nonempty
  write r as antecedent(r) => C
  for each attribute-value pair (a=v),
  where a belongs to A and v is a value of a,
      compute the accuracy of the rule
          antecedent(r) and (a=v) => C
  let (a*=v*) be the attribute-value pair of
  maximum accuracy over D; in case of a tie,
  choose the pair that covers the greatest
  number of instances of D
  update r by adding (a*=v*) to its antecedent:
      r = ( antecedent(r) and (a*=v*) ) => C
  remove the attribute a* from the set A:
      A = A - {a*}
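
The two procedures above can be sketched in Python on a small weather dataset. This is a minimal illustration, not the reference implementation: the function names and the dataset layout are assumptions, the covering loop for a class stops once no instance of that class remains, and the working set is reset to the full dataset for every class (a PRISM-style choice that the pseudocode does not state explicitly).

```python
ATTRS = ["outlook", "temperature", "humidity", "windy"]
ROWS = [
    ("sunny", "hot", "high", False, "no"),
    ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),
    ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),
    ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),
    ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),
    ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),
    ("rainy", "mild", "high", True, "no"),
]
DATA = [dict(zip(ATTRS + ["play"], row)) for row in ROWS]


def covers(antecedent, inst):
    return all(inst[a] == v for a, v in antecedent.items())


def find_rule(data, target, cls):
    """Grow one rule for class cls, general to specific."""
    antecedent, attrs = {}, set(ATTRS)
    while attrs:
        covered = [x for x in data if covers(antecedent, x)]
        if all(x[target] == cls for x in covered):
            break  # the rule no longer covers any non-cls instance
        best = None
        for a in sorted(attrs):
            for v in sorted({x[a] for x in covered}):
                sub = [x for x in covered if x[a] == v]
                pos = sum(x[target] == cls for x in sub)
                # Rank by accuracy, then by coverage to break ties.
                score = (pos / len(sub), len(sub))
                if pos and (best is None or score > best[0]):
                    best = (score, a, v)
        if best is None:
            break
        _, a, v = best
        antecedent[a] = v
        attrs.remove(a)
    return antecedent


def sequential_covering(data, target):
    ruleset = []
    for cls in sorted({x[target] for x in data}):
        remaining = list(data)  # assumption: restart from the full set per class
        while any(x[target] == cls for x in remaining):
            ant = find_rule(remaining, target, cls)
            ruleset.append((ant, cls))
            # Remove the instances that the new rule classifies correctly.
            remaining = [x for x in remaining
                         if not (covers(ant, x) and x[target] == cls)]
    return ruleset


for antecedent, cls in sequential_covering(DATA, "play"):
    print(antecedent, "=> play =", cls)
```

Because this dataset contains no two identical instances with different classes, every learned rule covers only instances of its own class, and every instance is covered by at least one rule.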



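Rules have an associated support: the percentage of the population that satisfies the rule, that is, that matches both the antecedent and the consequent.


Confidence is the proportion of the occurrences of the antecedent that also result in the consequent. For example, for the rule {A, B} ⇒ C: how often do we get C when we have A and B?


Lift indicates the strength of a rule over the random co-occurrence of the antecedent and the consequent. It is the confidence of the rule divided by the overall frequency of the consequent; a lift above 1 means that the antecedent and the consequent occur together more often than would be expected if they were independent.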
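
The three measures can be computed on a toy set of baskets. This is a small sketch: the transactions are made up for illustration and `rule_metrics` is a hypothetical helper, not a library function.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent => consequent."""
    n = len(transactions)
    n_ante = sum(antecedent <= t for t in transactions)  # baskets with the antecedent
    n_cons = sum(consequent <= t for t in transactions)  # baskets with the consequent
    n_both = sum((antecedent | consequent) <= t for t in transactions)

    support = n_both / n
    confidence = n_both / n_ante
    lift = confidence / (n_cons / n)
    return support, confidence, lift


# Made-up baskets, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "butter", "milk"},
]

support, confidence, lift = rule_metrics(
    transactions, {"bread", "milk"}, {"butter"}
)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# support=0.40 confidence=0.67 lift=1.11
```

Here 2 of the 5 baskets contain bread, milk and butter (support 2/5), 3 baskets contain bread and milk (confidence 2/3), and butter appears in 3 of 5 baskets, so the lift is (2/3) / (3/5), slightly above 1.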
