Data Mining - (Decision) Rule

1 - About

Some forms of predictive data mining generate rules that are conditions that imply a given outcome.

Rules are if-then-else expressions; they explain the decisions that lead to the prediction.

They are produced from a decision tree or association (such as association rule)

For example, a rule might specify that a person:

  • who has a bachelor's degree (an attribute)
  • and lives in a certain neighbourhood (an other attribute)

is likely to have an income greater than the regional average.

ODM Basket Analysis rules where antecedent and consequent contain the product ID.

3 - Type

3.1 - Classification

A classification rule predicts value of a given attribute:

If outlook = sunny and humidity = high then play = no

3.2 - Association

A Association rule predicts value of arbitrary attribute (or combination)

If temperature = cool                  then humidity = normal
If humidity = normal and windy = false then play = yes
If outlook = sunny and play = no       then humidity = high
If windy = false and play = no         then outlook = sunny and humidity = high

4 - Set of Rules

rule is easy to interpret, but a complex set of rules probably isn’t.

A sequential cover algorithm for sets of rules with complex conditions. Sets of rules are hard to interpret.

5 - Algorithm

Strategies for Learning Each Rule:

  • General-to-Specific:
    • Start with an empty rule
    • Add constraints to eliminate negative examples
    • Stop when only positives are covered
  • Specific-to-General
    • Start with a rule that identifies a single random instance
    • Remove constraints in order to cover more positives
    • Stop when further generalization results in covering negatives

If more than one rule is triggered (Conflicts),

  • choose the “most specific” rule
  • Use domain knowledge to order rules by priority

5.1 - General-to-Specific

Learning Rules by Sequential Covering (src: Alvarez)

Initialize the ruleset R to the empty set

for each class C {
    while D is nonempty {
        Construct one rule r that correctly classifies
        some instances in D that belong to class C 
        and does not incorrectly classify any non-C instances
        # See below "Finding next rule for class C"

        Add rule r to ruleset R
        Remove from D all instances correctly classified by r
return the ruleset R

Sequential Covering: Finding next rule for class C

Initialize A as the set of all attributes over D

while r incorrectly classifies some non-C instances of D 
  write r as antecedent(r) => C
  for each attribute-value pair (a=v), 
  where a belongs to A and v is a value of a,
  compute the accuracy of the rule
      antecedent(r) and (a=v) => C
  let (a*=v*) be the attribute-value pair of
  maximum accuracy over D; in case of a tie,
  choose the pair that covers the greatest
  number of instances of D
  update r by adding (a*=v*) to its antecedent:
      r = ( antecedent(r) and (a*=v*) ) => C
  remove the attribute a* from the set A:
      A = A - {a*}

6 - Properties

6.1 - Support

Rules have an associated support (What percentage of the population satisfies the rule?).

6.2 - Confidence

Confidence is the proportion of the occurrences of the antecedent that result in the consequent e.g. how many times do we get C when we have A and B {A, B} ⇒ C

6.3 - Lift

Lift indicates the strength of a rule over the random co-occurrence of the antecedent and the consequent

7 - Documentation / Reference

Data Science
Data Analysis
Data Science
Linear Algebra Mathematics

Powered by ComboStrap