Oracle® Data Mining Concepts 11g Release 1 (11.1) Part Number B28129-01
This chapter describes Apriori, the algorithm used by Oracle Data Mining for calculating association rules.
See Also:
Chapter 8, "Association Rules"
Associations are calculated using the Apriori algorithm. The association mining problem can be decomposed into two subproblems:
1. Find all combinations of items, called frequent itemsets, whose support is greater than the specified minimum support.
2. Use the frequent itemsets to generate the desired rules. Oracle Data Mining association supports single consequent rules only (for example, "ABC implies D").
The number of frequent itemsets is controlled by the minimum support parameter. The number of rules generated is controlled by the number of frequent itemsets and the confidence parameter. If the confidence parameter is set too high, there may be frequent itemsets in the association model but no rules.
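The two subproblems above can be sketched in a few lines of Python. This is a minimal illustration of the textbook Apriori procedure, not Oracle Data Mining's implementation: the function names `frequent_itemsets` and `single_consequent_rules`, the toy transactions, and the inclusive confidence threshold are all assumptions made for the example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets whose support exceeds min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Start with every single item as a candidate 1-itemset.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt for c, cnt in counts.items() if cnt / n > min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def single_consequent_rules(frequent, min_confidence):
    """Build rules of the form (antecedent, consequent, confidence)."""
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue
        for consequent in itemset:
            antecedent = itemset - {consequent}
            # Every subset of a frequent itemset is frequent, so this lookup succeeds.
            confidence = count / frequent[antecedent]
            # Assumption: the confidence threshold is treated as inclusive.
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, confidence))
    return rules


# Hypothetical toy data: six market baskets over items A-D.
transactions = [
    {"A", "B", "C", "D"},
    {"A", "B", "C"},
    {"A", "B", "D"},
    {"A", "C"},
    {"B", "D"},
    {"C", "D"},
]
itemsets = frequent_itemsets(transactions, min_support=0.4)
rules = single_consequent_rules(itemsets, min_confidence=0.7)
no_rules = single_consequent_rules(itemsets, min_confidence=0.9)
print(len(itemsets), len(rules), len(no_rules))
```

In this toy data every candidate rule has a confidence of 0.75, so raising the confidence threshold to 0.9 leaves the frequent itemsets in place but produces no rules, which is the situation described above.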
When Apriori uses equi-width binning, outliers cause most of the data to concentrate in a few bins, sometimes a single bin. As a result, the discriminating power of the algorithm can be significantly reduced.
For example, an association model might have all the values of a numerical attribute concentrated in a single bin, except for one value (the outlier) that belongs to a different bin. If this attribute is income, there will not be any rules reflecting different levels of income. All rules containing income will only reflect the range in the single bin; this range is essentially the income range for the whole population.
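The effect of a single outlier on equi-width binning can be seen with a small sketch. The helper `equi_width_bins`, the income values, and the choice of four bins are hypothetical and chosen only for illustration; this is generic equi-width binning, not Oracle's binning routine.

```python
from collections import Counter


def equi_width_bins(values, num_bins):
    """Assign each value to one of num_bins equally wide intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    if width == 0:
        # All values are identical, so everything falls in the first bin.
        return [0] * len(values)
    # The maximum value would land on the upper edge, so clamp it to the last bin.
    return [min(int((v - lo) / width), num_bins - 1) for v in values]


# Hypothetical incomes; the last value is the outlier.
incomes = [28_000, 35_000, 41_000, 52_000, 60_000, 75_000, 5_000_000]

with_outlier = equi_width_bins(incomes, 4)
without_outlier = equi_width_bins(incomes[:-1], 4)

print(Counter(with_outlier))     # bin occupancy with the outlier present
print(Counter(without_outlier))  # bin occupancy after dropping the outlier
```

With the outlier present, the six ordinary incomes all fall in the first bin and only the outlier occupies another one, so an income item in a rule cannot distinguish income levels. Once the outlier is dropped, the same six values spread across all four bins.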