About
Some forms of predictive data mining generate rules that are conditions that imply a given outcome.
Rules are if-then-else expressions; they explain the decisions that lead to the prediction.
They are produced from a decision tree or association (such as association rule)
For example, a rule might specify that a person:
- who has a bachelor's degree (an attribute)
- and lives in a certain neighbourhood (an other attribute)
is likely to have an income greater than the regional average.
ODM Basket Analysis rules where antecedent and consequent contain the product ID.
Articles Related
Type
Classification
A classification rule predicts value of a given attribute:
If outlook = sunny and humidity = high then play = no
Association
A Association rule predicts value of arbitrary attribute (or combination)
If temperature = cool then humidity = normal
If humidity = normal and windy = false then play = yes
If outlook = sunny and play = no then humidity = high
If windy = false and play = no then outlook = sunny and humidity = high
Set of Rules
rule is easy to interpret, but a complex set of rules probably isn’t.
A sequential cover algorithm for sets of rules with complex conditions. Sets of rules are hard to interpret.
Algorithm
Strategies for Learning Each Rule:
- General-to-Specific:
- Start with an empty rule
- Add constraints to eliminate negative examples
- Stop when only positives are covered
- Specific-to-General
- Start with a rule that identifies a single random instance
- Remove constraints in order to cover more positives
- Stop when further generalization results in covering negatives
If more than one rule is triggered (Conflicts),
- choose the “most specific” rule
- Use domain knowledge to order rules by priority
General-to-Specific
Learning Rules by Sequential Covering (src: Alvarez)
Initialize the ruleset R to the empty set
for each class C {
while D is nonempty {
Construct one rule r that correctly classifies
some instances in D that belong to class C
and does not incorrectly classify any non-C instances
# See below "Finding next rule for class C"
Add rule r to ruleset R
Remove from D all instances correctly classified by r
}
}
return the ruleset R
Sequential Covering: Finding next rule for class C
Initialize A as the set of all attributes over D
while r incorrectly classifies some non-C instances of D
{
write r as antecedent(r) => C
for each attribute-value pair (a=v),
where a belongs to A and v is a value of a,
compute the accuracy of the rule
antecedent(r) and (a=v) => C
let (a*=v*) be the attribute-value pair of
maximum accuracy over D; in case of a tie,
choose the pair that covers the greatest
number of instances of D
update r by adding (a*=v*) to its antecedent:
r = ( antecedent(r) and (a*=v*) ) => C
remove the attribute a* from the set A:
A = A - {a*}
}
Properties
Support
Rules have an associated support (What percentage of the population satisfies the rule?).
Confidence
Confidence is the proportion of the occurrences of the antecedent that result in the consequent e.g. how many times do we get C when we have A and B {A, B} ⇒ C
Lift
Lift indicates the strength of a rule over the random co-occurrence of the antecedent and the consequent