38 DBMS_DATA_MINING

Oracle Data Mining is an analytical technology for deriving actionable information from large data stores. You can use Oracle Data Mining to evaluate the probability of future events and discover unsuspected associations and groupings within your data.

The DBMS_DATA_MINING package is the primary interface to Oracle Data Mining. A Java API is layered on the PL/SQL API.Oracle Data Mining supports a family of SQL functions for deploying data mining models.

See Also:

Chapter 39, "DBMS_DATA_MINING_TRANSFORM". This package supports data pre-processing for mining models.
Chapter 90, "DBMS_PREDICTIVE_ANALYTICS". This package supports several routines that perform automated data mining.
Oracle Database SQL Language Reference for information about the SQL Data Mining functions.
Oracle Data Mining Concepts for an overview of Oracle Data Mining.
What's New in Oracle Data Mining in Oracle Data Mining Concepts for new features in 11g Release 1 (11.1)

This chapter contains the following topics:

Using DBMS_DATA_MINING
Summary of DBMS_DATA_MINING Subprograms

Using DBMS_DATA_MINING

This section contains topics that relate to using the DBMS_DATA_MINING package.

Overview
Mining Model Objects
Deprecated Subprograms
Mining Function Constants
Settings Table
Data Types

Overview

Oracle Data Mining supports both supervised and unsupervised data mining. Supervised data mining predicts a target value based on historical data. Unsupervised data mining discovers natural groupings and does not use a target.

See Also:

Oracle Data Mining Concepts for background information on supervised and unsupervised data mining, and for additional information about Data Mining functions and algorithms.

A data mining function refers to the methods for solving a given class of data mining problems. The mining function must be specified when a model is created. See "Mining Function Constants".

Note on Terminology:

In data mining terminology, a function is a general type of problem to be solved by a given approach to data mining. In SQL language terminology, a function is an operator that returns a value.

In Oracle Data Mining documentation, the term function, or mining function refers to a data mining function; the term SQL function or SQL Data Mining function refers to a SQL function for scoring (deploying) data mining models. The SQL Data Mining functions are documented in Oracle Database SQL Language Reference.

Supervised data mining functions include:

Classification
Regression
Attribute Importance

Unsupervised data mining functions include:

Clustering
Association
Feature Extraction

The steps you use to build and apply a mining model depend on the data mining function and the algorithm being used. The algorithms supported by Oracle Data Mining are listed in Table 38-1.

Table 38-1 Oracle Data Mining Algorithms

Algorithm	Abbreviation	Function
Adaptive Bayes Network (deprecated)	ABN	Classification
Apriori	AP	Association
Decision Tree	DT	Classification
Generalized Linear Model	GLM	Classification and regression
k-Means (default clustering algorithm)	KM	Clustering
Minimal Descriptor Length	MDL	Attribute Importance
Naive Bayes (default classification algorithm)	NB	Classification
Non-Negative Matrix Factorization	NMF	Feature Extraction
Orthogonal Partitioning Clustering	O-Cluster	Clustering
Support Vector Machine (default regression algorithm)	SVM	Classification and regression (and anomaly detection through classification)

Mining Model Objects

Mining models are Oracle Database schema objects. They support the standard security features of Oracle Database. Mining models are also supported by SQL COMMENT and SQL AUDIT.

See Also:

Oracle Data Mining Administrator's Guide for information about mining model objects, SQL COMMENT, and SQL AUDIT.
Oracle Data Mining Administrator's Guide for information about mining model security
Oracle Data Mining Administrator's Guide for information about the sample Data Mining programs.

Note that each of these links will take you to the relevant chapter in the Oracle Data Mining Administrator's Guide.

ALL_MINING_MODELS

You can query the data dictionary view ALL_MINING_MODELS to obtain a list of accessible mining models.

Example 38-1 ALL_MINING_MODELS

SQL> describe all_mining_models
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 OWNER                                     NOT NULL VARCHAR2(30)
 MODEL_NAME                                NOT NULL VARCHAR2(30)
 MINING_FUNCTION                                    VARCHAR2(30)
 ALGORITHM                                          VARCHAR2(30)
 CREATION_DATE                             NOT NULL DATE
 BUILD_DURATION                                     NUMBER
 MODEL_SIZE                                         NUMBER
 COMMENTS                                           VARCHAR2(4000)

See Also:

Oracle Data Mining Application Developer's Guide for more information about ALL_MINING_MODELS.

Mining Model Naming Restrictions

The naming rules for models are more restrictive than the naming rules for most database schema objects. A model name must satisfy the following additional requirements:

It must be 25 or fewer characters long.
It must be a nonquoted identifier. Oracle requires that nonquoted identifiers contain only alphanumeric characters, the underscore (_), dollar sign ($), and pound sign (#); the initial character must be alphabetic. Oracle strongly discourages the use of the dollar sign and pound sign in nonquoted literals.

Naming requirements for schema objects are fully documented in Oracle Database SQL Language Reference.

ALL_MINING_MODEL_ATTRIBUTES

You can query the data dictionary view ALL_MINING_MODEL_ATTRIBUTES to obtain a list of the data attributes for each accessible mining model. Data attributes are the columns of data used by an algorithm to build a model. Some or all of these columns must be present in the data to which the model is applied.

Data attributes are referred to as the model signature. The ALL_MINING_MODEL_ATTRIBUTES view lists the data attributes in the model signature plus the target if the model is supervised.

An algorithm builds an internal representation of the data attributes and uses them as either categoricals (data that classifies or categorizes) or as numericals (continuous data). These internal model attributes can be viewed using the GET_MODEL_DETAILS functions.

Example 38-2 ALL_MINING_MODEL_ATTRIBUTES

SQL> describe all_mining_model_attributes
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 OWNER                                     NOT NULL VARCHAR2(30)
 MODEL_NAME                                NOT NULL VARCHAR2(30)
 ATTRIBUTE_NAME                            NOT NULL VARCHAR2(30)
 ATTRIBUTE_TYPE                                     VARCHAR2(11)
 DATA_TYPE                                          VARCHAR2(12)
 DATA_LENGTH                                        NUMBER
 DATA_PRECISION                                     NUMBER
 DATA_SCALE                                         NUMBER
 USAGE_TYPE                                         VARCHAR2(8)
 TARGET                                             VARCHAR2(3)

See Also:

Oracle Data Mining Application Developer's Guide for more information about attributes and ALL_MINING_MODEL_ATTRIBUTES.

ALL_MINING_MODEL_SETTINGS

The view ALL_MINING_MODEL_SETTINGS returns the settings for each accessible mining model. Settings control various characteristics of mining models.

All settings have default values. The values of some settings are generated by the algorithm by default. You can override the default value of a setting by specifying its value in a settings table for the model. All settings, both default and user-specified, are listed in ALL_MINING_MODEL_SETTINGS.

Example 38-3 ALL_MINING_MODEL_SETTINGS

SQL> describe all_mining_model_settings
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 OWNER                                     NOT NULL VARCHAR2(30)
 MODEL_NAME                                NOT NULL VARCHAR2(30)
 SETTING_NAME                              NOT NULL VARCHAR2(30)
 SETTING_VALUE                                      VARCHAR2(4000)
 SETTING_TYPE                                       VARCHAR2(7)

See Also:

Settings Table
Oracle Data Mining Application Developer's Guide for more information about ALL_MINING_MODEL_SETTINGS

Deprecated Subprograms

The following subprograms are deprecated in 11g Release 1 (11.1).

GET_DEFAULT_SETTINGS

Replaced with data dictionary views: USER/ALL/DBA_MINING_MODEL_SETTINGS
GET_MODEL_SETTINGS

Replaced with data dictionary views: USER/ALL/DBA_MINING_MODEL_SETTINGS
GET_MODEL_SIGNATURE

Replaced with data dictionary views: USER/ALL/DBA_MINING_MODEL_ATTRIBUTES

The following view is deprecated in 11g Release 1 (11.1).

DM_USER_MODELS

Replaced with data dictionary views: USER/ALL/DBA_MINING_MODELS

The Adaptive Bayes Network algorithm is deprecated in 11g Release 1 (11.1).

Note:

Oracle recommends that you do not use deprecated procedures in new applications. Support for deprecated features is for backward compatibility only.

The DMSYS schema no longer exists in 11g Release 1 (11.1). Oracle Data Mining metadata is now in SYS.

Mining Function Constants

The constants that specify the mining function of a model are listed in Table 38-2. The concept of a "mining function" is introduced in "Overview".

All models are created with a mining function. The mining function is a required argument to the CREATE_MODEL Procedure.

Table 38-2 Mining Functions

Value	Description
`ASSOCIATION`	Association is a descriptive mining function. An association model identifies relationships and the probability of their occurrence within a data set. Association models use the Apriori algorithm.
`ATTRIBUTE_IMPORTANCE`	Attribute Importance is a predictive mining function. An attribute importance model identifies the relative importance of an attribute in predicting a given outcome. Attribute Importance models use the Minimal Descriptor Length algorithm.
`CLASSIFICATION`	Classification is a predictive mining function. A classification model uses historical data to predict a categorical target. Classification models can use: Naive Bayes, Adaptive Bayes Network (deprecated), Decision Tree, Logistic Regression, or Support Vector Machine algorithms. The default is Naive Bayes. The classification function can also be used for anomaly detection. In this case, the SVM algorithm with a null target is used (One-Class SVM).
`CLUSTERING`	Clustering is a descriptive mining function. A clustering model identifies natural groupings within a data set. Clustering models can use: k-Means or O-Cluster algorithms. The default is k-Means.
`FEATURE_EXTRACTION`	Feature Extraction is a descriptive mining function. A feature extraction model creates an optimized data set on which to base a model. Feature extraction models use the Non-Negative Matrix Factorization algorithm.
`REGRESSION`	Regression is a predictive mining function. A regression model uses historical data to predict a numerical target. Regression models can use Support Vector Machine or Linear Regression. The default is Support Vector Machine.

Settings Table

Oracle Data Mining uses settings to specify the algorithm and other characteristics of a model. Some settings are general, some are specific to a mining function, and some are specific to an algorithm.

All settings have default values. If you want to override one or more of the settings for a model, you must create a settings table. The settings table must have the column names and data types shown in Table 38-3.

Table 38-3 Required Columns in the Model Settings Table

Column Name	Data Type
`SETTING_NAME`	`VARCHAR2(30)`
`SETTING_VALUE`	`VARCHAR2(4000)`

The information you provide in the settings table is used by the model at build time. The name of the settings table is an optional argument to the CREATE_MODEL Procedure.

You can find the settings used by a model by querying the data dictionary view ALL_MINING_MODEL_SETTINGS. This view lists all the model settings used by the mining models to which you have access. All the setting values are included in the view, whether default or user-specified. See "ALL_MINING_MODEL_SETTINGS".

Algorithm Name Setting

The ALGO_NAME setting specifies the model algorithm. Oracle Data Mining supports more than one algorithm for the classification, regression, and clustering mining functions. Each of these mining functions has a default algorithm, as shown in Table 38-4.

Table 38-4 Default Algorithms

Mining Function	Default Algorithm
Classification	Naive Bayes
Regression	Support Vector Machine
Clustering	k-Means

The values for the ALGO_NAME setting are listed in Table 38-5.

Table 38-5 Algorithm Names

ALGO_NAME Value	Description	Mining Function
`ALGO_ADAPTIVE_BAYES_NETWORK`	Adaptive Bayes Network (deprecated)	Classification
`ALGO_DECISION_TREE`	Decision Tree	Classification
`ALGO_NAIVE_BAYES`	Naive Bayes	Classification
`ALGO_GENERALIZED_LINEAR_MODEL`	Generalized Linear Model	Classification and Regression
`ALGO_SUPPORT_VECTOR_MACHINES`	Support Vector Machine	Classification and Regression
`ALGO_KMEANS`	Enhanced k_Means	Clustering
`ALGO_O_CLUSTER`	O-Cluster	Clustering
`ALGO_AI_MDL`	Minimum Description Length	Attribute Importance
`ALGO_APRIORI_ASSOCIATION_RULES`	Apriori	Association Rules
`ALGO_NONNEGATIVE_MATRIX_FACTOR`	Non-Negative Matrix Factorization	Feature Extraction

Automatic Data Preparation Setting

The PREP_AUTO setting indicates whether or not the model will use Automatic Data Preparation (ADP). By default ADP is disabled.

When you enable ADP, the model uses heuristics to transform the build data according to the requirements of the algorithm. The transformation instructions are stored with the model and reused whenever the model is applied. You can view the transformation instructions in the model details.

You can choose to supplement automatic data preparations by specifying additional transformations in the xform_list parameter when you build the model. (See "CREATE_MODEL Procedure".)

If you do not use ADP (default) and do not specify transformations in the xform_list parameter to CREATE_MODEL (also the default), you will continue to operate in 10.2 mode. This means that you must implement your own transformations separately in the build, test, and scoring data; you must take special care to implement the exact same transformations in each data set.

If you do not use ADP, but you do specify transformations in the xform_list parameter to CREATE_MODEL, Oracle Data Mining embeds the transformation definitions in the model and prepares the test and scoring data to match the build data. Because of automatic and embedded data preparation, mining models are known as supermodels.

The values for the PREP_AUTO setting are described in Table 38-6.

Table 38-6 Automatic Data Preparation Setting

PREP_AUTO =	Description
`PREP_AUTO_OFF`	Disable Automatic Data Preparation (default).
`PREP_AUTO_ON`	Enable Automatic Data Preparation.

See Also:

Oracle Data Mining Concepts for information about data preparation.

Mining Function Settings

The settings described in Table 38-7 apply to a mining function.

Table 38-7 Mining Function Settings

Setting Name	Setting Value	Description
`ASSO_MAX_RULE_LENGTH`	`TO_CHAR(2<=` `numeric_expr` `<=20)`	Maximum rule length for association rules. Default is 4.
`ASSO_MIN_CONFIDENCE`	`TO_CHAR(0<=` `numeric_expr` `<=1)`	Minimum confidence for association rules. Default is 0.1.
`ASSO_MIN_SUPPORT`	`TO_CHAR(0<=` `numeric_expr` `<=1)`	Minimum support for association rules. Default is 0.1.
`CLAS_COST_TABLE_NAME`	table_name	(Decision Tree only) Name of a table that stores the cost matrix for a Decision Tree model. The cost matrix specifies the cost of misclassifications. Only Decision Tree models can use a cost matrix at build time. All classification algorithms can use a cost matrix at apply time. The cost matrix table is user-created. See Oracle Data Mining Concepts for the column requirements and for additional information.
`CLAS_PRIORS_TABLE_NAME`	table_name	(Naive Bayes) Name of a table that stores prior probabilities to offset differences in distribution between the build data and the scoring data. SVM classification uses the priors table for weights. The priors table is user-created. See Oracle Data Mining Concepts for the column requirements and for additional information.
`CLUS_NUM_CLUSTERS`	`TO_CHAR(numeric_expr` `>=1)`	Number of clusters generated by a clustering algorithm. Default is 10.
`CLAS_WEIGHTS_TABLE_NAME`	table_name	Name of a table that stores weighting information for individual target values in a GLM classification model. The weights are used by the algorithm to bias the model in favor of higher weighted classes. The class weights table is user-created. See Oracle Data Mining Concepts for the column requirements and for additional information.
`FEAT_NUM_FEATURES`	`TO_CHAR(numeric_expr` `>=1)`	Number of features to be extracted by a feature extraction model. The default is estimated from the data by the algorithm.

See Also:

Oracle Data Mining Concepts for information about mining functions.

Global Settings

The settings in Table 38-8 are applicable to any type of model, but are currently only implemented for GLM.

You can query the data dictionary view *_MINING_MODEL_SETTINGS (using the ALL, USER, or DBA prefix) to find the setting values for a model. See Oracle Data Mining Application Developer's Guide for information about *_ALL_MINING_MODEL_SETTINGS.

Table 38-8 Global Settings

Setting Name Setting Value Description

Setting Name	Setting Value	Description
`ODMS_MISSING_VALUE_TREATMENT`	`ODMS_MISSING_VALUE_MEAN_MODE` `ODMS_MISSING_VALUE_DELETE_ROW`	(GLM only) How to treat missing values in the training data. This setting does not affect the scoring data. Oracle Data Mining replaces missing values with the mean (numeric attributes) or the mode (categorical attributes) both at build time and apply time. You can set `ODMS_MISSING_VALUE_TREATMENT` to `ODMS_MISSING_VALUE_DELETE_ROW` to override this behavior in the training data. When `ODMS_MISSING_VALUE_TREATMENT` is set to `ODMS_MISSING_VALUE_DELETE_ROW`, the rows in the training data that contain missing values are deleted. However, if you want to replicate this missing value treatment in the scoring data, you must perform the transformation explicitly. For instructions, see Oracle Data Mining Concepts. The value `ODMS_MISSING_VALUE_DELETE_ROW` is only valid for tables without nested columns. If this value is used with nested data, an exception is raised.
`ODMS_ROW_WEIGHT_COLUMN_NAME`	column_name	(GLM only) Name of a column in the training data that contains a weighting factor for the rows. Row weights can be used as a compact representation of repeated rows, as in the design of experiments where a specific configuration is repeated several times. Row weights can also be used to emphasize certain rows during model construction. For example, to bias the model towards rows that are more recent and away from potentially obsolete data.

ODMS_MISSING_VALUE_TREATMENT

ODMS_MISSING_VALUE_MEAN_MODE

ODMS_MISSING_VALUE_DELETE_ROW

(GLM only) How to treat missing values in the training data. This setting does not affect the scoring data.

Oracle Data Mining replaces missing values with the mean (numeric attributes) or the mode (categorical attributes) both at build time and apply time. You can set ODMS_MISSING_VALUE_TREATMENT to ODMS_MISSING_VALUE_DELETE_ROW to override this behavior in the training data. When ODMS_MISSING_VALUE_TREATMENT is set to ODMS_MISSING_VALUE_DELETE_ROW, the rows in the training data that contain missing values are deleted. However, if you want to replicate this missing value treatment in the scoring data, you must perform the transformation explicitly. For instructions, see Oracle Data Mining Concepts.

The value ODMS_MISSING_VALUE_DELETE_ROW is only valid for tables without nested columns. If this value is used with nested data, an exception is raised.

ODMS_ROW_WEIGHT_COLUMN_NAME

column_name

(GLM only) Name of a column in the training data that contains a weighting factor for the rows.

Row weights can be used as a compact representation of repeated rows, as in the design of experiments where a specific configuration is repeated several times. Row weights can also be used to emphasize certain rows during model construction. For example, to bias the model towards rows that are more recent and away from potentially obsolete data.

See Also:

Oracle Data Mining Concepts for information about GLM.

Algorithm Settings: Adaptive Bayes Network (deprecated)

These settings affect the behavior of the Adaptive Bayes Network algorithm.

Table 38-9 ABN Settings

Setting	Value	Description
`ABNS_MAX_BUILD_MINUTES`	`TO_CHAR(` `numeric_expr` `>=0)`	Maximum time to complete an ABN model build. Default is 0, which implies no time limit.
`ABNS_MAX_NB_PREDICTORS`	`TO_CHAR(` `numeric_expr` `>0)`	Maximum number of predictors, measured by their MDL ranking, to be considered for building an ABN model of type `abns_naive_bayes`. Default is 10.
`ABNS_MAX_PREDICTORS`	`TO_CHAR(numeric_expr` `>0)`	Maximum number of predictors, measured by their MDL ranking, to be considered for building an ABN model of type `abns_single_feature` or `abns_multi_feature`. Default is 25.
`ABNS_MODEL_TYPE`	`ABNS_MULTI_FEATURE` `ABNS_NAIVE_BAYES` `ABNS_SINGLE_FEATURE`	Type of ABN model. The default is `multi_feature`.

Algorithm Settings: Decision Tree

These settings affect the behavior of the Decision Tree algorithm.

Table 38-10 Decision Tree Settings

Setting	Value	Description
`TREE_IMPURITY_METRIC`	`TREE_IMPURITY_ENTROPY` `TREE_IMPURITY_GINI`	Tree impurity metric for Decision Tree. Tree algorithms seek the best test question for splitting data at each node. The best splitter and split value are those that result in the largest increase in target value homogeneity (purity) for the entities in the node. Purity is measured in accordance with a metric. Decision trees can use either gini (`TREE_IMPURITY_GINI`) or entropy (`TREE_IMPURITY_ENTROPY`) as the purity metric. By default, the algorithm uses gini.
`TREE_TERM_MAX_DEPTH`	`TO_CHAR( 2<=` `numeric_expr` `<=20)`	Criteria for splits: maximum tree depth (the maximum number of nodes between the root and any leaf node, including the leaf node). Default is 7.
`TREE_TERM_MINPCT_MODE`	`TO_CHAR( 0<=` `numeric_expr` `<=10)`	No child shall have fewer records than this number, which is expressed as a percentage of the training rows. Default is 0.05, indicating 0.05%.
`TREE_TERM_MINPCT_SPLIT`	`TO_CHAR( 0 <=` `numeric_expr` `<=20)`	Criteria for splits: minimum number of records in a parent node expressed as a percent of the total number of records used to train the model. No split is attempted if number of records is below this value. Default is 0.1, indicating 0.1%.
`TREE_TERM_MINREC_NODE`	`TO_CHAR(numeric_expr` `>=0)`	No child shall have fewer records than this number. Default is 10.
`TREE_TERM_MINREC_SPLIT`	`TO_CHAR( numeric_expr` `>=0)`	Criteria for splits: minimum number of records in a parent node expressed as a value. No split is attempted if number of records is below this value. Default is 20.

Algorithm Settings: Generalized Linear Models

These settings affect the behavior of GLM models. GLM can be used for classification (logistic regression) or regression (linear regression).

Table 38-11 GLM Settings

Setting Name	Setting Value	Description
`GLMS_CONF_LEVEL`	`TO_CHAR(0<` `numeric_expr` `<1)`	The confidence level for coefficient confidence intervals. The default confidence level is 0.95.
`GLMS_DIAGNOSTICS_TABLE_NAME`	table_name	The name of a table to contain row-level diagnostic information for a GLM model. The table is created during model build. If you want to create a diagnostics table, you must specify a case ID when you build the model. (See the CREATE_MODEL Procedure.) If you specify a diagnostics table but do not provide a case ID, an exception is raised. For information on GLM diagnostics, see Oracle Data Mining Concepts.
`GLMS_REFERENCE_CLASS_NAME`	target_value	The target value to be used as the reference value in a logistic regression model. Probabilities will be produced for the other (non-reference) class. By default, the algorithm chooses the value with the highest prevalence (the most cases) for the reference class.
`GLMS_RIDGE_REGRESSION`	`GLMS_RIDGE_REG_ENABLE` `GLMS_RIDGE_REG_DISABLE`	Whether or not ridge regression will be enabled. By default, the algorithm determines whether or not to use ridge. You can explicitly enable ridge by setting `GLMS_RIDGE_REGRESSION` to `GLMS_RIDGE_REG_ENABLE`. Ridge applies to both regression and classification mining functions. When ridge is enabled, no prediction bounds are produced by the `PREDICTION_BOUNDS` SQL operator.
`GLMS_RIDGE_VALUE`	`TO_CHAR(0<` `numeric_expr)`	The value for the ridge parameter used by the algorithm. This setting is only used when you explicitly enable ridge regression by setting `GLMS_RIDGE_REGRESSION` to `GLMS_RIDGE_REG_ENABLE`. If ridge regression is enabled internally by the algorithm, the ridge parameter is determined by the algorithm.
`GLMS_VIF_FOR_RIDGE`	`GLMS_VIF_RIDGE_ENABLE` `GLMS_VIF_RIDGE_DISABLE`	(Linear regression only) Whether or not to produce Variance Inflation Factor (VIF) statistics when ridge is being used. By default, VIF is not produced when ridge is enabled. When you explicitly enable ridge regression by setting `GLMS_RIDGE_REGRESSION` to `GLMS_RIDGE_REG_ENABLE`, you can request VIF statistics by setting `GLMS_VIF_FOR_RIDGE` to `GLMS_VIF_RIDGE_ENABLE`; the algorithm will produce VIF if enough system resources are available.

See Also:

Oracle Data Mining Concepts for information about GLM.

Algorithm Settings: k-Means

These settings affect the behavior of the k-Means algorithm.

Table 38-12 k-Means Settings

Setting Name	Setting Value	Description
`KMNS_BLOCK_GROWTH`	`TO_CHAR(1<` `numeric_expr` `<=5)`	Growth factor for memory allocated to hold cluster data Default value is 2
`KMNS_CONV_TOLERANCE`	`TO_CHAR(0<` `numeric_expr` `<=0.5)`	Convergence tolerance for k-Means algorithm Default is 0.01
`KMNS_DISTANCE`	`KMNS_COSINE` `KMNS_EUCLIDEAN` `KMNS_FAST_COSINE`	Distance Function for k-Means Clustering. The default is euclidean.
`KMNS_ITERATIONS`	`TO_CHAR(0<` `numeric_expr` `<=20)`	Number of iterations for k-Means algorithm Default is 3
`KMNS_MIN_PCT_ATTR_SUPPORT`	`TO_CHAR(0<=` `numeric_expr` `<=1)`	The fraction of attribute values that must be non-null in order for the attribute to be included in the rule description for the cluster. Setting the parameter value too high in data with missing values can result in very short or even empty rules. Default is 0.1.
`KMNS_NUM_BINS`	`TO_CHAR(numeric_expr` `>0)`	Number of histogram bins. Specifies the number of bins in the attribute histogram produced by k-Means. The bin boundaries for each attribute are computed globally on the entire training data set. The binning method is equi-width. All attributes have the same number of bins with the exception of attributes with a single value that have only one bin. Default is 10.
`KMNS_SPLIT_CRITERION`	`KMNS_SIZE` `KMNS_VARIANCE`	Split criterion for k-Means Clustering. The default criterion is the variance.

See Also:

Oracle Data Mining Concepts for information about k-Means.

Algorithm Settings: Naive Bayes

These settings affect the behavior of the Naive Bayes Algorithm.

Table 38-13 Naive Bayes Settings

Setting Name	Setting Value	Description
`NABS_PAIRWISE_THRESHOLD`	`TO_CHAR(0<=` `numeric_expr` `<=1)`	Value of pairwise threshold for NB algorithm Default is 0.01.
`NABS_SINGLETON_THRESHOLD`	`TO_CHAR(0<=` `numeric_expr` `<=1)`	Value of singleton threshold for NB algorithm Default value is 0.01

See Also:

Oracle Data Mining Concepts for information about Naive Bayes

Algorithm Settings: Non-Negative Matrix Factorization

These settings affect the behavior of the Non-Negative Matrix Factorization algorithm.

Table 38-14 NMF Settings

Setting Name	Setting Value	Description
`NMFS_CONV_TOLERANCE`	`TO_CHAR(0<` `numeric_expr` `<=0.5)`	Convergence tolerance for NMF algorithm Default is 0.05
`NMFS_NUM_ITERATIONS`	`TO_CHAR(1 <=` `numeric_expr` `<=500)`	Number of iterations for NMF algorithm Default is 50
`NMFS_RANDOM_SEED`	`TO_CHAR(numeric_expr)`	Random seed for NMF algorithm. Default is –1.

See Also:

Oracle Data Mining Concepts for information about NMF

Algorithm Settings: O-Cluster

These settings affect the behavior of the O-Cluster algorithm.

Table 38-15 O-CLuster Settings

Setting Name	Setting Value	Description
`OCLT_MAX_BUFFER`	`TO_CHAR(numeric_expr` `>0)`	Buffer size for O-Cluster. Default is 50,000.
`OCLT_SENSITIVITY`	`TO_CHAR(0 <=numeric_expr` `<=1)`	A fraction that specifies the peak density required for separating a new cluster. The fraction is related to the global uniform density. Default is 0.5.

See Also:

Oracle Data Mining Concepts for information about O-Cluster.

Algorithm Settings: Support Vector Machine

These settings affect the behavior of the Support Vector Machine algorithm. SVM can be used for classification or regression, or for anomaly detection (classification with a null target).

Table 38-16 SVM Settings

Setting Name	Setting Value	Description
`SVMS_ACTIVE_LEARNING`	`SVMS_AL_DISABLE` `SVMS_AL_ENABLE`	Whether active learning is enabled or disabled. By default, active learning is enabled. When active learning is enabled, the SVM algorithm uses active learning to build a reduced size model. When active learning is disabled, the SVM algorithm builds a standard model.
`SVMS_COMPLEXITY_FACTOR`	`TO_CHAR(numeric_expr` `>0)`	Value of complexity factor for SVM algorithm (both classification and regression). Default value estimated from the data by the algorithm.
`SVMS_CONV_TOLERANCE`	`TO_CHAR(numeric_expr` `>0)`	Convergence tolerance for SVM algorithm. Default is 0.001.
`SVMS_EPSILON`	`TO_CHAR(numeric_expr` `>0)`	Value of epsilon factor for SVM regression. Default value estimated from the data by the algorithm.
`SVMS_KERNEL_CACHE_SIZE`	`TO_CHAR(numeric_expr` `>0)`	Value of kernel cache size for SVM algorithm. Applies to Gaussian kernel only. Default is 50000000 bytes.
`SVMS_KERNEL_FUNCTION`	`svm_gaussian` `svms_linear`	Kernel for Support Vector Machine. The default is determined by the algorithm based on the number of attributes in the training data. When there are many attributes, the algorithm uses a linear kernel, otherwise it uses a nonlinear (Gaussian) kernel. The number of attributes does not correspond to the number of columns in the training data. The algorithm explodes categorical attributes to binary, numeric attributes. In addition, Oracle Data Mining handles each row in a nested column as a separate attribute. SVM takes these factors into account when choosing the kernel function.
`SVMS_OUTLIER_RATE`	`TO_CHAR(0< numeric_expr` `<1)`	The desired rate of outliers in the training data. Valid for One-Class SVM models only (anomaly detection). Default is.1.
`SVMS_STD_DEV`	`TO_CHAR(numeric_expr` `>0)`	Value of standard deviation for SVM algorithm. This is applicable only for Gaussian kernel. Default value estimated from the data by the algorithm.

See Also:

Oracle Data Mining Concepts for information about SVM.

Data Types

The DBMS_DATA_MINING package uses object data types to store information about model attributes. Most of these types are returned by the table functions GET_n, where n identifies the type of information to return. These functions take a model name as input and return the requested information as a collection of rows.

For a list of the GET functions, see "Summary of DBMS_DATA_MINING Subprograms".

Oracle Data Mining also uses object data types for handling transactional data. These types, DM_NESTED_NUMERICALS and DM_NESTED_CATEGORICALS specify nested tables that can be used for storing a set of mining attributes in a single column. For more information on nested tables, see the Oracle Data Mining Application Developer's Guide.

All the table functions use pipelining, which causes each row of output to be materialized as it is read from model storage, without waiting for the generation of the complete table object. For more information on pipelined, parallel table functions, consult the Oracle Database PL/SQL Language Reference.

The Data Mining object data types are described in Table 38-17.

Table 38-17 DBMS_DATA_MINING Summary of Data Types

Data Type	Description
`DM_ABN_DETAIL`	Information about an attribute in an Adaptive Bayes Network model.
`DM_ABN_DETAILS`	A collection of `DM_ABN_DETAIL`. Returned by GET_MODEL_DETAILS_ABN Function.
`DM_CENTROID`	The centroid of a cluster.
`DM_CENTROIDS`	A collection of `DM_CENTROID`. A member of `DM_CLUSTER`.
`DM_CHILD`	A child node of a cluster.
`DM_CHILDREN`	A collection of `DM_CHILD`. A member of `DM_CLUSTER`.
`DM_CLUSTER`	A cluster. A cluster includes `DM_PREDICATES`, `DM_CHILDREN`, `DM_CENTROIDS`, and `DM_HISTOGRAMS`. It also includes a `DM_RULE`.
`DM_CLUSTERS`	A collection of `DM_CLUSTER`. Returned by GET_MODEL_DETAILS_KM Function and GET_MODEL_DETAILS_OC Function.
`DM_CONDITIONAL`	The conditional probability of an attribute in a Naive Bayes model.
`DM_CONDITIONALS`	A collection of `DM_CONDITIONAL`. Returned by GET_MODEL_DETAILS_NB Function.
`DM_COST_ELEMENT`	The actual and predicted values in a cost matrix.
`DM_COST_MATRIX`	A collection of `DM_COST_ELEMENT`. Returned by GET_MODEL_COST_MATRIX Function.
`DM_GLM_COEFF`	The coefficient and associated statistics of an attribute in a Generalized Linear Model.
`DM_GLM_COEFF_SET`	A collection of `DM_GLM_COEFF`. Returned by GET_MODEL_DETAILS_GLM Function.
`DM_HISTOGRAM_BIN`	A histogram associated with a cluster.
`DM_HISTOGRAMS`	A collection of `DM_HISTOGRAM_BIN`. A member of `DM_CLUSTER`.
`DM_ITEM`	An item in an association rule.
`DM_ITEMS`	A collection of `DM_ITEM`.
`DM_ITEMSET`	A collection of `DM_ITEMS`.
`DM_ITEMSETS`	A collection of `DM_ITEMSET`. Returned by GET_FREQUENT_ITEMSETS Function.
`DM_MODEL_GLOBAL_DETAIL`	High-level statistics about a model.
`DM_MODEL_GLOBAL_DETAILS`	A collection of `DM_MODEL_GLOBAL_DETAIL`. Returned by GET_MODEL_DETAILS_GLOBAL Function.
`DM_MODEL_SETTING`	A model setting.
`DM_MODEL_SETTINGS`	A collection of `DM_MODEL_SETTING`. Returned by GET_MODEL_SETTINGS Function and GET_DEFAULT_SETTINGS Function.
`DM_MODEL_SIGNATURE_ATTRIBUTE`	An attribute in the model signature.
`DM_MODEL_SIGNATURE`	A collection of `DM_MODEL_SIGNATURE`. Returned by GET_MODEL_SIGNATURE Function.
`DM_NB_DETAIL`	Information about an attribute in a Naive Bayes model.
`DM_NB_DETAILS`	A collection of `DM_DB_DETAIL`. Returned by GET_MODEL_DETAILS_NB Function.
`DM_NESTED_CATEGORICAL`	The name and value of a categorical attribute.
`DM_NESTED_CATEGORICALS`	A collection of `DM_NESTED_CATEGORICAL`. A collection of attributes defined as a single model attribute. Transactional data must be defined as nested attributes for Data Mining.
`DM_NESTED_NUMERICAL`	The name and value of a numerical attribute.
`DM_NESTED_NUMERICALS`	A collection of `DM_NESTED_NUMERICAL`. A collection of attributes defined as a single model attribute. Transactional data must be defined as nested attributes for Data Mining.
`DM_NMF_ATTRIBUTE`	An attribute in a feature of a Non-Negative Matrix Factorization model.
`DM_NMF_ATTRIBUTE_SET`	A collection of `DM_NMF_ATTRIBUTE`. A member of `DM_NMF_FEATURE`.
`DM_NMF_FEATURE`	A feature in a Non-Negative Matrix Factorization model.
`DM_NMF_FEATURE_SET`	A collection of `DM_NMF_FEATURE`. Returned by GET_MODEL_DETAILS_NMF Function.
`DM_PREDICATE`	Antecedent and consequent attributes.
`DM_PREDICATES`	A collection of `DM_PREDICATE`. A member of `DM_RULE`, `DM_CLUSTER`, and `DM_ABN_DETAIL`.
`DM_RANKED_ATTRIBUTE`	An attribute ranked by its importance in an Attribute Importance model.
`DM_RANKED_ATTRIBUTES`	A collection of `DM_RANKED_ATTRIBUTE`. Returned by GET_MODEL_DETAILS_AI Function.
`DM_RULE`	A rule that defines a conditional relationship. The rule can be one of the association rules returned by GET_ASSOCIATION_RULES Function, or it can be a rule associated with a cluster in the collection of clusters returned by GET_MODEL_DETAILS_KM Function and GET_MODEL_DETAILS_OC Function.
`DM_RULES`	A collection of `DM_RULE`. Returned by GET_ASSOCIATION_RULES Function.
`DM_SVM_ATTRIBUTE`	The name, value, and coefficient of an attribute in a Support Vector Machine model.
`DM_SVM_ATTRIBUTE_SET`	A collection of `DM_SVM_ATTRIBUTE`. Returned by GET_MODEL_DETAILS_SVM Function. Also a member of `DM_SVM_LINEAR_COEFF`.
`DM_SVM_LINEAR_COEFF`	The linear coefficient of each attribute in a Support Vector Machine model.
`DM_SVM_LINEAR_COEFF_SET`	A collection of `DM_SVM_LINEAR_COEFF`. Returned by GET_MODEL_DETAILS_SVM Function for an SVM model built using the linear kernel.
`DM_TRANSFORM`	The transformation and reverse transformation expressions for an attribute.
`DM_TRANSFORMS`	A collection of `DM_TRANSFORM`. Returned by GET_MODEL_TRANSFORMATIONS Function.
`TRANSFORM_LIST`	A list of user-specified transformations for a model. Accepted as a parameter by the CREATE_MODEL Procedure. This collection type is defined in the `DBMS_DATA_MINING_TRANSFORM` package.

Summary of DBMS_DATA_MINING Subprograms

Table 38-18 summarizes the subprograms included in the DBMS_DATA_MINING package.

Table 38-18 DBMS_DATA_MINING Package Subprograms

Data Type	Purpose
ADD_COST_MATRIX Procedure	Adds a cost matrix to a classification model
ALTER_REVERSE_EXPRESSION Procedure	Changes the reverse transformation expression to an expression that you specify
APPLY Procedure	Applies a model to a data set (scores the data)
COMPUTE_CONFUSION_MATRIX Procedure	Computes the confusion matrix from the `APPLY` results on test data for a classification model; also provides the accuracy of the model
COMPUTE_LIFT Procedure	Computes lift for a given positive target value from the `APPLY` results on test data for a classification model
COMPUTE_ROC Procedure	Computes Receiver Operating Characteristic (ROC) for a classification model
CREATE_MODEL Procedure	Creates (builds) a model
DROP_MODEL Procedure	Drops a model
EXPORT_MODEL Procedure	Exports a model to a dump file
GET_ASSOCIATION_RULES Function	Returns the rules from an association model
GET_DEFAULT_SETTINGS Function	Returns all the default settings for all mining functions and algorithms
GET_FREQUENT_ITEMSETS Function	Returns the frequent itemsets for an association model
GET_MODEL_COST_MATRIX Function	Returns the cost matrix for a model
GET_MODEL_DETAILS_ABN Function	Returns the details of an Adaptive Bayes Network model
GET_MODEL_DETAILS_AI Function	Returns the details of an Attribute Importance model
GET_MODEL_DETAILS_GLM Function	Returns the details of a Generalized Linear Model
GET_MODEL_DETAILS_GLOBAL Function	Returns high-level statistics about a model
GET_MODEL_DETAILS_KM Function	Returns the details of a k-Means model
GET_MODEL_DETAILS_NB Function	Returns the details of a Naive Bayes model
GET_MODEL_DETAILS_NMF Function	Returns the details of an NMF model
GET_MODEL_DETAILS_OC Function	Returns the details of an O-Cluster model
GET_MODEL_DETAILS_SVM Function	Returns the details of an SVM model with a linear kernel
GET_MODEL_DETAILS_XML Function	Returns the details of a Decision Tree model
GET_MODEL_SETTINGS Function	Returns the settings used to build a model
GET_MODEL_SIGNATURE Function	Returns the signature of a model
GET_MODEL_TRANSFORMATIONS Function	Returns the tuser-specified ransformation definitions embedded in the model, as well as many of the ADP transformations
GET_TRANSFORM_LIST Procedure	Converts between two different transformation specification formats
IMPORT_MODEL Procedure	Imports a model into a user schema
RANK_APPLY Procedure	Ranks the predictions from the `APPLY` results for a classification model
REMOVE_COST_MATRIX Procedure	Removes a cost matrix from a model
RENAME_MODEL Procedure	Renames a model

ADD_COST_MATRIX Procedure

This procedure associates a cost matrix table with a classification model. The cost matrix biases the model by assigning costs or benefits to specific model outcomes.

The cost matrix is stored with the model and taken into account when the model is scored. The stored cost matrix is the default scoring matrix for the model.

You can also specify a cost matrix inline when you invoke a Data Mining SQL function for scoring. When an inline cost matrix is specified, it is used instead of the default, stored cost matrix (if one exists).

To obtain the default scoring matrix for a model, use the GET_MODEL_COST_MATRIX function. To remove the default scoring matrix from a model, use the REMOVE_COST_MATRIX procedure. See "GET_MODEL_COST_MATRIX Function" and "REMOVE_COST_MATRIX Procedure".

See Also:

"Biasing a Classification Model" in Oracle Data Mining Concepts for more information about costs
Oracle Database SQL Language Reference for syntax of inline cost matrix

Syntax

DBMS_DATA_MINING.ADD_COST_MATRIX (
       model_name                IN VARCHAR2,
       cost_matrix_table_name    IN VARCHAR2,
       cost_matrix_schema_name   IN VARCHAR2 DEFAULT NULL);

Parameters

Table 38-19 ADD_COST_MATRIX Procedure Parameters

Parameter	Description
`model_name`	Name of the model in the form [schema_name.]model_name. If you do not specify a schema, your own schema is assumed.
`cost_matrix_table_name`	Name of the cost matrix table (described in Table 38-20).
`cost_matrix_schema_name`	Schema of the cost matrix table. If no schema is specified, the current schema is used.

Usage Notes

If the model is not in your schema, then ADD_COST_MATRIX requires the ALTER ANY MINING MODEL system privilege or the ALTER object privilege for the mining model.

The cost matrix table must have the columns shown in Table 38-20. Note that the actual and predicted target values must have the same data type.

Table 38-20 Required Columns in a Cost Matrix Table

Column Name	Data Type
`ACTUAL_TARGET_VALUE`	`VARCHAR2(4000)` for categorical targets `NUMBER` for numeric targets
`PREDICTED_TARGET_VALUE`	`VARCHAR2(4000)`for categorical targets `NUMBER` for numeric targets
`COST`	`NUMBER`

Since a benefit can be viewed as a negative cost, you can specify a benefit for a given outcome by providing a negative number in the costs column of the cost matrix table.
All classification algorithms can use a cost matrix for scoring. The Decision Tree algorithm can also use a cost matrix at build time.If you want to build a Decision Tree model with a cost matrix, specify the cost matrix table name in the CLAS_COST_TABLE_NAME setting in the settings table for the model. See Table 38-7, "Mining Function Settings".

The cost matrix used to create a Decision Tree model becomes the default scoring matrix for the model. If you want to specify different costs for scoring, use the REMOVE_COST_MATRIX procedure to remove the cost matrix and the ADD_COST_MATRIX procedure to add a new one.

Example

This example creates a cost matrix table called COSTS_NB and adds it to a Naive Bayes model called NB_SH_CLAS_SAMPLE. The model has a binary target: 1 means that the customer responds to a promotion; 0 means that the customer does not respond. The cost matrix assigns a cost of 1 to misclassifications of customers who do not respond and a cost of 4 to misclassifications of customers who do respond. This means that it is four times more costly to misclassify responders than it is to misclassify nonresponders.

CREATE TABLE costs_nb (
  actual_target_value           NUMBER,
  predicted_target_value        NUMBER,
  cost                          NUMBER);
INSERT INTO costs_nb values (0, 0, 0);
INSERT INTO costs_nb values (0, 1, 1);
INSERT INTO costs_nb values (1, 0, 4);
INSERT INTO costs_nb values (1, 1, 0);
COMMIT;
 
EXEC dbms_data_mining.add_cost_matrix('nb_sh_clas_sample', 'costs_nb');
 
SELECT cust_gender, COUNT(*) AS cnt, ROUND(AVG(age)) AS avg_age
   FROM mining_data_apply_v
   WHERE PREDICTION(nb_sh_clas_sample COST MODEL
      USING cust_marital_status, education, household_size) = 1
   GROUP BY cust_gender
   ORDER BY cust_gender;
   
C        CNT    AVG_AGE
- ---------- ----------
F         80         39
M        572         43

ALTER_REVERSE_EXPRESSION Procedure

This procedure changes a reverse transformation expression to an expression that you specify.

A transformation expression specifies how an attribute will be changed before it is used to create a model. The reverse transformation expression specifies how the attribute will be returned to its original form (or some approximation of its original form) for purposes of model transparency.

The GET_MODEL_DETAILS functions apply reverse transformations before returning information about a model's attributes. When the target is transformed, its reverse transformation is applied to the results of scoring. See the Usage Notes.

You can use ALTER_REVERSE_EXPRESSION to associate a name/description with a cluster or feature; the descriptor that you specify will be displayed when the model is scored.You can use the CLUSTER_ID, CLUSTER_PROBABILITY, and CLUSTER_SET functions to score clustering models. You can use the FEATURE_ID, FEATURE_SET, or FEATURE_VALUE functions to score feature extraction models. See the example and Oracle Database SQL Language Reference.

You can use ALTER_REVERSE_EXPRESSION to associate a name/description with the results of an anomaly detection model. These models use One-Class SVM to predict either 1 or 0 for each record, indicating whether or not the record is anomalous. With ALTER_REVERSE_EXPRESSION, you can replace the zeros and ones with labels that are more meaningful.

Syntax

DBMS_DATA_MINING. ALTER_REVERSE_EXPRESSION (
         model_name             VARCHAR2,
         expression             CLOB,
         attribute_name         VARCHAR2 DEFAULT NULL,
         attribute_subname      VARCHAR2 DEFAULT NULL);

Parameters

Table 38-21 ALTER_REVERSE_EXPRESSION Procedure Parameters

Parameter	Description
`model_name`	Name of the model in the form [schema_name.]model_name. If you do not specify a schema, your own schema is used.
`expression`	A SQL expression.
`attribute_name`	Name of the attribute. Specify `NULL` if you wish to apply `expression` to a cluster , feature, or One-Class SVM prediction.
`attribute_subname`	Name of the nested attribute if `attribute_name` is a nested column, otherwise `NULL`.

Usage Notes

If you are using automatic data preparation, the transformations and reverse transformations are performed automatically by the system. When you create the model, you can specify transformations and reverse transformations to be applied in addition to the system-generated expressions. If you are not using automatic data preparation, the transformations and reverse transformations that you specify are the only ones that are applied.

Example

This example specifies labels for the clusters that result from the sh_clus model. The labels consist of the word "Cluster" and the internal numeric identifier for the cluster.

SQL> begin
  2    dbms_data_mining.alter_reverse_expression(
  3      'sh_clus', '''Cluster ''||value');
  4  end;
  5  /
 
SQL> select cust_id, cluster_id(sh_clus using *) cluster_id
  2    from sh_aprep_num
  3   where cust_id < 100011
  4   order by cust_id;
 
CUST_ID CLUSTER_ID
------- ------------------------------------------------
 100001 Cluster 18
 100002 Cluster 14
 100003 Cluster 14
 100004 Cluster 18
 100005 Cluster 19
 100006 Cluster 7
 100007 Cluster 18
 100008 Cluster 14
 100009 Cluster 8
 100010 Cluster 8

APPLY Procedure

This procedure applies a mining model to the data of interest, and generates the results in a table. The apply process is also referred to as scoring.

For predictive mining functions, the apply process generates predictions in a target column. For descriptive mining functions such as clustering, the apply process assigns each case to a cluster with a probability.

The apply operation is not applicable to association models and attribute importance models.

Note:

Scoring can also be performed directly in SQL using the Data Mining functions. See "Scoring and Deployment" in Oracle Data Mining Application Developer's Guide.

Syntax

DBMS_DATA_MINING.APPLY (
      model_name           IN VARCHAR2,
      data_table_name      IN VARCHAR2,
      case_id_column_name  IN VARCHAR2,
      result_table_name    IN VARCHAR2,
      data_schema_name     IN VARCHAR2 DEFAULT NULL);

Parameters

Table 38-22 APPLY Procedure Parameters

Parameter	Description
`model_name`	Name of the model in the form [schema_name.]model_name. If you do not specify a schema, your own schema is used.
`data_table_name`	Name of table or view representing data to be scored
`case_id_column_name`	Name of the case identifier column
`result_table_name`	Name of the table to store apply results
`data_schema_name`	Name of the schema containing the data to be scored

Usage Notes

The data provided for APPLY must undergo the same preprocessing as the data used to create and test the model. When you use Automatic Data Preparation, the preprocessing required by the algorithm is handled for you by the model — both at build time and apply time. (See "Automatic Data Preparation Setting".)
APPLY creates a table in the user's schema to hold the results. The columns are algorithm-specific.

The columns in the results table are listed in subsequent sections. The CASE_ID column will match the case identifier column name provided by you. The type of the incoming case ID column is preserved in APPLY output.
The data type for PREDICTION, CLUSTER_ID, and FEATURE_ID is influenced by any reverse expression that is embedded in the model by the user. If the user does not provide a reverse expression that alters the scored value type, then the types will conform to the descriptions in the following tables. See "ALTER_REVERSE_EXPRESSION Procedure".

Classification

The results table for classification has the columns described in Table 38-23. If the target of the model is categorical, the PREDICTION column will have a VARCHAR2 data type. If the target is numerical, the PREDICTION column will have a NUMBER data type.

Table 38-23 Columns in the APPLY Results Table for Classification Models

Column Name	Data Type
`CASE_ID`	`VARCHAR2` or `NUMBER`
`PREDICTION`	`VARCHAR2` or `NUMBER`
`PROBABILITY`	`NUMBER`

One-Class SVM (Anomaly Detection)

The results table for anomaly detection has the columns described in Table 38-24.

Table 38-24 Columns in the APPLY Results Table for Anomaly Detection Models

Column Name	Data Type
`CASE_ID`	`VARCHAR2` or `NUMBER`
`PREDICTION`	`NUMBER`
`PROBABILITY`	`NUMBER`

Values in the PREDICTION column can be either 0 or 1. When the prediction is 1, the case is a typical example. When the prediction is 0, the case is an outlier.

Regression using SVM or GLM

The results table for regression has the columns described in Table 38-25.

Table 38-25 Columns in the APPLY Results Table for Regression Models

Column Name	Data Type
`CASE_ID`	`VARCHAR2` or `NUMBER`
`PREDICTION`	`NUMBER`

Clustering using k-Means or O-Cluster

Clustering is an unsupervised mining function, and hence there are no targets. The results of an APPLY operation will contain simply the cluster identifier corresponding to a case, and the associated probability. The results table has the columns described in Table 38-26.

Table 38-26 Columns in the APPLY Results Table for Clustering Models

Column Name	Data Type
`CASE_ID`	`VARCHAR2` or `NUMBER`
`CLUSTER_ID`	`NUMBER`
`PROBABILITY`	`NUMBER`

Feature Extraction using NMF

Feature extraction is also an unsupervised mining function, and hence there are no targets. The results of an APPLY operation will contain simply the feature identifier corresponding to a case, and the associated match quality. The results table has the columns described in Table 38-27.

Table 38-27 Columns in the APPLY Results Table for Feature Extraction Models

Column Name	Data Type
`CASE_ID`	`VARCHAR2` or `NUMBER`
FEATURE_ID	`NUMBER`
`MATCH_QUALITY`	`NUMBER`

Examples

This example applies the GLM regression model GLMR_SH_REGR_SAMPLE to the data in the MINING_DATA_APPLY_V view. The apply results are output to the table REGRESSION_APPLY_RESULT.

SQL> BEGIN
       DBMS_DATA_MINING.APPLY (
       model_name     => 'glmr_sh_regr_sample',
       data_table_name     => 'mining_data_apply_v',
       case_id_column_name => 'cust_id',
       result_table_name   => 'regression_apply_result');
    END;
    /
 
SQL> SELECT * FROM regression_apply_result WHERE cust_id >  101485;
 
   CUST_ID PREDICTION
---------- ----------
    101486 22.8048824
    101487 25.0261101
    101488 48.6146619
    101489   51.82595
    101490 22.6220714
    101491 61.3856816
    101492 24.1400748
    101493  58.034631
    101494 45.7253149
    101495 26.9763318
    101496 48.1433425
    101497 32.0573434
    101498 49.8965531
    101499  56.270656
    101500 21.1153047

COMPUTE_CONFUSION_MATRIX Procedure

This procedure computes the confusion matrix for a classification model and also provides the accuracy of the model.

See Oracle Data Mining Concepts for a description of confusion matrix.

Before executing a COMPUTE_CONFUSION_MATRIX procedure:

Apply the model on the test data
Create a target table or view containing only the case identifier and target columns from the test data

You will specify this table or view and the apply results table as input to the procedure.

Syntax

DBMS_DATA_MINING.COMPUTE_CONFUSION_MATRIX (
      accuracy                     OUT NUMBER,
      apply_result_table_name      IN  VARCHAR2,
      target_table_name            IN  VARCHAR2,
      case_id_column_name          IN  VARCHAR2,
      target_column_name           IN  VARCHAR2,
      confusion_matrix_table_name  IN  VARCHAR2,
      score_column_name            IN  VARCHAR2 DEFAULT 'PREDICTION',
      score_criterion_column_name  IN  VARCHAR2 DEFAULT 'PROBABILITY',
      cost_matrix_table_name       IN  VARCHAR2 DEFAULT NULL,
      apply_result_schema_name     IN  VARCHAR2 DEFAULT NULL,
      target_schema_name           IN  VARCHAR2 DEFAULT NULL,
      cost_matrix_schema_name      IN  VARCHAR2 DEFAULT NULL);

Parameters

Table 38-28 COMPUTE_CONFUSION_MATRIX Procedure Parameters

Parameter	Description
`accuracy`	Accuracy of the model
`apply_result_table_name`	Name of the table containing the results of an `APPLY` operation on the test dataset (see Usage Notes)
`target_table_name`	Name of the table or view containing only the case identifier column and target column values (see Usage Notes)
`case_id_column_name`	Name of the case identifier column in the test data set. This must be common across the target table and the apply results table.
`target_column_name`	Name of the target column in the target table
`confusion_matrix_table_name`	Name of the table into which the confusion matrix is to be generated
`score_column_name`	Name of the column representing the score from the apply results table. In the fixed schema table generated by `APPLY`, this column has the name `PREDICTION`, which is the default.
`score_criterion_column_name`	Name of the column representing the ranking factor for the score from the apply results table. In the fixed schema table generated by `APPLY` for classification models, this column has the name `PROBABILITY`, which is the default. Values in this column must be represented numerically.
`cost_matrix_table_name`	Name of the fixed-schema cost matrix table
`apply_result_schema_name`	Name of the schema hosting the `APPLY` results table
`target_schema_name`	Name of the schema hosting the targets table
`cost_matrix_schema_name`	Name of the schema hosting the cost matrix table

Usage Notes

You can also provide a cost matrix as an optional input in order to have the cost of predictions reflected in the results.

It is important to note that the inputs to COMPUTE_CONFUSION_MATRIX do not always have to be generated using APPLY. As long as the definition of the two input tables matches the ones discussed in this section, with appropriate content, the procedure can produce the confusion matrix and accuracy. The quality of the results depends on the quality of the data.

The data provided for testing your classification model must match the data provided to CREATE_MODEL in schema and relevant content. If the data provided as input to CREATE_MODEL has been pre-processed, then the data input to APPLY must also be pre-processed using the statistics from the CREATE_MODEL data pre-processing.

Before you use the COMPUTE_CONFUSION_MATRIX procedure, you must prepare two data input streams from your test data.

First, you must APPLY the model on your test data. Use the result table name from APPLY as apply_result_table_name in the COMPUTE_CONFUSION_MATRIX procedure.

Next, you must create a table or view containing only the case identifier column and the target column in its schema. Use the name of this second table as target_table_name.

The definition for the second view or table name for a numerical target attribute is:

(case_identifier_column_name  VARCHAR2/NUMBER, 
target_column_name            NUMBER)

The definition for the second view or table name for a categorical target attribute is:

(case_identifier_column_name  VARCHAR2/NUMBER, 
target_column_name            NUMBER)

You must provide the name of the table in which the confusion matrix is to be generated. The resulting fixed schema table will always be created in the schema owning the model.

For numerical target attributes, the confusion matrix table will have the definition:

(actual_target_value    NUMBER,
predicted_target_value  NUMBER,
value                   NUMBER)

For categorical target attributes, the confusion matrix table will have the definition:

(actual_target_value     VARCHAR2,
predicted_target_value   VARCHAR2,
value                    NUMBER)

Examples

Assume that you have built a classification model census_model using the Naive Bayes algorithm, and you have been provided the test data in a table called census_2d_test, with case identifier column name person_id, and the target column name class.

DECLARE
  v_sql_stmt VARCHAR2(4000);
  v_accuracy NUMBER;
BEGIN

/* apply the model census_model on test data */
DBMS_DATA_MINING.APPLY(
  model_name           => 'census_model',
  data_table_name      => 'census_2d_test',
  case_id_column_name  => 'person_id',
  result_table_name    => 'census_test_result');
CREATE VIEW census_2d_test_view as select person_id, class from census_2d_test;

/* now compute the confusion matrix from the two
 * data streams, also providing a cost matrix as input.
 */
DBMS_DATA_MINING.COMPUTE_CONFUSION_MATRIX (
  accuracy                     => v_accuracy,
  apply_result_table_name      => 'census_test_result',
  target_table_name            => 'census_2d_test_view',
  case_id_column_name          => 'person_id',
  target_column_name           => 'class',
  confusion_matrix_table_name  => 'census_confusion_matrix',
  cost_matrix_table_name       => 'census_cost_matrix');
DBMS_OUTPUT.PUT_LINE('Accuracy of the model: ' || v_accuracy);
END;
/

-- View the confusion matrix using Oracle SQL
SELECT actual_target_value, predicted_target_value, value
  FROM census_confusion_matrix;

COMPUTE_LIFT Procedure

This procedure computes a lift table for a given positive target for a classification model. See Oracle Data Mining Concepts for a description of lift.

Before executing a COMPUTE_LIFT procedure:

Apply the model on the test data
Create a target table or view containing only the case identifier and target columns from the test data

You will specify this table or view and the apply results table as input to the procedure.

Syntax

DBMS_DATA_MINING.COMPUTE_LIFT (
      apply_result_table_name      IN VARCHAR2,
      target_table_name            IN VARCHAR2,
      case_id_column_name          IN VARCHAR2,
      target_column_name           IN VARCHAR2,
      lift_table_name              IN VARCHAR2,
      positive_target_value        IN VARCHAR2,
      score_column_name            IN VARCHAR2 DEFAULT 'PREDICTION',
      score_criterion_column_name  IN VARCHAR2 DEFAULT 'PROBABILITY',
      num_quantiles                IN NUMBER DEFAULT 10,
      cost_matrix_table_name       IN VARCHAR2 DEFAULT NULL,
      apply_result_schema_name     IN VARCHAR2 DEFAULT NULL,
      target_schema_name           IN VARCHAR2 DEFAULT NULL,
      cost_matrix_schema_name      IN VARCHAR2 DEFAULT NULL);

Parameters

Table 38-29 COMPUTE_LIFT Procedure Parameters

Parameter	Description
`apply_result_table_name`	Name of the table containing the results of an `APPLY` operation on the test dataset (see Usage Notes)
`target_table_name`	Name of the table or view containing only the case identifier column and target column values (see Usage Notes)
`case_id_column_name`	Name of the case identifier column in the test data set. This must be common across the targets table and the apply results table.
`target_column_name`	Name of the target column
`lift_table_name`	Name of the table into which the lift table is to be generated
`positive_target_value`	Value of the positive target. If the target column is of `NUMBER` type, use `TO_CHAR()` operator to provide the value as a string.
`score_column_name`	Name of the column representing the score in the apply results table. In the fixed schema table generated by `APPLY`, this column has the name `PREDICTION`, which is the default.
`score_criterion_column_name`	Name of the column representing the ranking factor for the score in the apply results table. In the fixed schema table generated by `APPLY` for classification models, this column has the name `PROBABILITY`, which is the default. This column must be a numerical type.
`num_quantiles`	Number of quantiles required in the lift table
`cost_matrix_table_name`	Name of the cost matrix table
`apply_result_schema_name`	Name of the schema hosting the `APPLY` results table
`target_schema_name`	Name of the schema hosting the targets table
`cost_matrix_schema_name`	Name of the schema hosting the cost matrix table

Usage Notes

You can also provide a cost matrix as an optional input to have the cost of predictions reflected in the results.

It is important to note that the data inputs to COMPUTE_LIFT do not always have to be generated using APPLY. As long as the schema of the two input tables matches the ones discussed in this section, with appropriate content, the procedure can provide the lift table as output. The quality of the results depends on the quality of the data.

Before you use the COMPUTE_LIFT procedure, you must prepare two data input streams from your test data.

First, you must APPLY the model on your test data. The parameter apply_result_table_name in the COMPUTE_LIFT procedure represents the table that will be generated in your schema as a result of the APPLY operation.

Next, you must create a table or view containing only the case identifier column and the target column in its schema. The parameter target_table_name reflects this input. The definition for this view or table name for a numerical target attribute is:

(case_identifier_column_name  VARCHAR2/NUMBER,
target_column_name             NUMBER)

The definition for this view or table name for a categorical target attribute is:

(case_identifier_column_name  VARCHAR2/NUMBER,
target_column_name            NUMBER)

You must provide the name of the table in which the lift table is to be generated. The resulting fixed schema table is always created in the schema that owns the model.

The resulting lift table will have the following definition:

(quantile_number               NUMBER,
 probability_threshold         NUMBER,
 gain_cumulative               NUMBER,
 quantile_total_count          NUMBER,
 quantile_target_count         NUMBER,
 percent_records_cumulative    NUMBER,
 lift_cumulative               NUMBER,
 target_density_cumulative     NUMBER,
 targets_cumulative            NUMBER,
 non_targets_cumulative        NUMBER,
 lift_quantile                 NUMBER,
 target_density                NUMBER)

When a cost matrix is passed to the COMPUTE_LIFT procedure, the cost threshold is returned in the probability_threshold column.

The output columns are explained in Oracle Data Mining Concepts.

Examples

DECLARE
  v_sql_stmt VARCHAR2(4000);
BEGIN

/* apply the model census_model on test data */
DBMS_DATA_MINING.APPLY(
  model_name           => 'census_model',
  data_table_name      => 'census_2d_test,
  case_id_column_name  => 'person_id',
  result_table_name    => 'census_test_result');

/* next create a view from test data that projects
 * only the case identifier and target column
 */

/* now compute lift with the default 10 quantiles
 * from the two data streams
 */
DBMS_DATA_MINING.COMPUTE_LIFT (
  apply_result_table_name   => 'census_test_result',
  target_table_name         => 'census_2d_test_view',
  case_id_column_name       => 'person_id',
  target_column_name        => 'class',
  lift_table_name           => 'census_lift',
  positive_target_value     => '1',
  cost_matrix_table_name    => 'census_cost_matrix');
END;
/

-- View the lift table contents using SQL
SELECT *
  FROM census_lift;

COMPUTE_ROC Procedure

This procedure computes the receiver operating characteristic (ROC) for a binary classification model. See Oracle Data Mining Concepts for a description of receiver operating characteristic.

Before executing a COMPUTE_ROC procedure:

Apply the model on the test data
Create a target table or view containing only the case identifier and target columns from the test data

You will specify this table or view and the apply results table as input to the procedure.

Syntax

DBMS_DATA_MINING.COMPUTE_ROC (
      roc_area_under_curve         OUT NUMBER,
      apply_result_table_name      IN  VARCHAR2,
      target_table_name            IN  VARCHAR2,
      case_id_column_name          IN  VARCHAR2,
      target_column_name           IN  VARCHAR2,
      roc_table_name               IN  VARCHAR2,
      positive_target_value        IN  VARCHAR2,
      score_column_name            IN  VARCHAR2 DEFAULT 'PREDICTION',
      score_criterion_column_name  IN  VARCHAR2 DEFAULT 'PROBABILITY',
      apply_result_schema_name     IN  VARCHAR2 DEFAULT NULL,
      target_schema_name           IN  VARCHAR2 DEFAULT NULL);

Parameters

Table 38-30 COMPUTE_ROC Procedure Parameters

Parameter	Description
`roc_area_under_the_curve`	A measure of model accuracy, specifically, the probability that the model will correctly rank a randomly chosen pair of rows of opposite classes.
`apply_result_table_name`	Name of the table containing the results of an `APPLY` operation on the test dataset (see Usage Notes)
`target_table_name`	Name of the table or view containing the case identifiers and target values from the test data. (See the Usage Notes.)
`case_id_column_name`	Name of the case identifier column in the test data set. This must be common across the targets table and the apply results table.
`target_column_name`	Name of the target column
`roc_table_name`	Name of the table into which ROC results are to be generated. See Table 38-31, "COMPUTE_ROC Output".
`positive_target_value`	Value of the positive target. If the target column is of `NUMBER` type, use `TO_CHAR()` operator to provide the value as a string.
`score_column_name`	Name of the column representing the score in the apply results table. In the fixed schema table generated by `APPLY`, this column has the name `PREDICTION`, which is the default.
`score_criterion_column_name`	Name of the column representing the ranking factor for the score in the apply results table. In the fixed schema table generated by `APPLY` for classification models, this column has the name `PROBABILITY`, which is the default. Values in this column must be represented numerically.
`apply_result_schema_name`	Name of the schema hosting the `APPLY` results table
`target_schema_name`	Name of the schema hosting the targets table

Usage Notes

It is important to note that the data inputs to COMPUTE_ROC do not always have to be generated using APPLY. As long as the schema of the two input tables matches the ones discussed in this section, with appropriate content, the procedure can provide the ROC table as output. The quality of the results depends on the quality of the data.

Before you use the COMPUTE_ROC procedure, you must prepare two data input streams from your test data.

First, you must APPLY the model on your test data. The parameter apply_result_table_name in the COMPUTE_ROC procedure identifies the table that will be generated in your schema as a result of the APPLY operation.

Next, you must create a table or view containing only the case identifiers and target values from the test data. The parameter target_table_name identifies this table. For a numerical target attribute, the columns of this table are:

case_identifier_column_name  VARCHAR2/NUMBER,
target_column_name           NUMBER

For a categorical target attribute, the columns of this table are:

case_identifier_column_name  VARCHAR2/NUMBER,
target_column_name           VARCHAR2

You must provide the name of the table in which the ROC table is to be generated. The resulting table will always be created in the schema that owns the model, and it will always have the following columns.

(probability              NUMBER,
 true_positives           NUMBER,
 false_negatives          NUMBER,
 false_positives          NUMBER,
 true_negatives           NUMBER,
 true_positive_fraction   NUMBER,
 false_positive_fraction  NUMBER)

The output columns are explained in Table 38-31.

Table 38-31 COMPUTE_ROC Output

Output Column	Description
`probability`	Minimum predicted positive class probability resulting in a positive class prediction. Thus, different threshold values result in different hit rates and `false_alarm_rates`.
`true_negatives`	Negative cases in the test data with predicted probabilities below the `probability_threshold` (correctly predicted)
`true_positives`	Positive cases in the test data with predicted probabilities above the `probability_threshold` (correctly predicted)
`false_negatives`	Positive cases in the test data with predicted probabilities below the `probability_threshold` (incorrectly predicted)
`false_positives`	Negative cases in the test data with predicted probabilities above the `probability_threshold` (incorrectly predicted)
`true_positive_fraction`	t`rue_positives/(true_positives + false_negatives)`
`false_positive_fraction`	`false_positives/(false_positives + true_negatives)`

The typical use scenario is to examine the true_positive_fraction and false_positive_fraction to determine the most desirable probability_threshold. This threshold is then used to predict class values in subsequent apply operations. For example, to identify positively predicted cases in probability rank order from an apply result table, given a probability_threshold:

select case_id_column_name from apply_result_table_name where probability > probability_threshold order by probability DESC;

There are two procedures one might use to identify the most desirable probability_threshold. One procedure applies when the relative cost of positive class versus negative class prediction errors are known to the user. The other applies when such costs are not well known to the user. In the first instance, one can apply the relative costs to the ROC table to compute the minimum cost probability_threshold. Suppose the relative cost ratio, Positive Class Error Cost / Negative Class Error Cost = 20. Then execute a query like:

WITH cost AS (
  SELECT probability_threshold, 20 * false_negatives + false positives cost 
    FROM ROC_table 
  GROUP BY probability_threshold), 
    minCost AS (
      SELECT min(cost) minCost 
        FROM cost)
      SELECT max(probability_threshold)probability_threshold 
        FROM cost, minCost 
    WHERE cost = minCost;

If relative costs are not well known, the user simply scans the values in the table (in sorted order) and makes a determination about which of the displayed trade-offs (misclassified positives versus misclassified negatives) is most desirable:

select * from ROC_table order by probability_threshold

Examples

Assume that you have built a classification model census_model using the SVM algorithm, and you have been provided the test data in a table called census_2d_test, with case identifier column name person_id, and the target column name class.

DECLARE
  v_sql_stmt VARCHAR2(4000);
  v_accuracy NUMBER;
BEGIN

/* apply the model census_model on test data */
DBMS_DATA_MINING.APPLY(
  model_name           => 'census_model',
  data_table_name      => 'census_2d_test',
  case_id_column_name  => 'person_id',
  result_table_name    => 'census_test_result');

/* next create a view from test data that projects
 * only the case identifier and target column
 */
v_sql_stmt :=
'CREATE VIEW census_2d_test_view AS ' ||
'SELECT person_id, class FROM census_2d_test';
EXECUTE IMMEDIATE v_sql_stmt;

/* now compute the receiver operating characterestics from
 * the two data streams, also providing a cost matrix
 * as input.
 */
DBMS_DATA_MINING.COMPUTE_ROC (
  accuracy                 => v_accuracy,
  apply_result_table_name  => 'census_test_result',
  target_table_name        => 'census_2d_test_view',
  case_id_column_name      => 'person_id',
  target_column_name       => 'class',
  roc_table_name           => 'census_roc',
  cost_matrix_table_name   => 'census_cost_matrix');
END;
/

-- View the ROC results using Oracle SQL
SELECT *
  FROM census_roc;

CREATE_MODEL Procedure

This procedure creates a mining model with a given mining function.

By passing an xform_list to CREATE_MODEL, you can specify a list of transformations to be performed on the input data. If the PREP_AUTO setting is on, the transformations are used in addition to the automatic transformations. If the PREP_AUTO setting is off, the specified transformations are the only ones implemented by the model. In both cases, the transformation definitions are embedded in the model, causing the data for model building, testing, and scoring to be prepared. See "Automatic Data Preparation Setting".

Syntax

DBMS_DATA_MINING.CREATE_MODEL (
      model_name            IN VARCHAR2,
      mining_function       IN VARCHAR2,
      data_table_name       IN VARCHAR2,
      case_id_column_name   IN VARCHAR2,
      target_column_name    IN VARCHAR2 DEFAULT NULL,
      settings_table_name   IN VARCHAR2 DEFAULT NULL,
      data_schema_name      IN VARCHAR2 DEFAULT NULL,
      settings_schema_name  IN VARCHAR2 DEFAULT NULL,
      xform_list            IN TRANSFORM_LIST DEFAULT NULL);

Parameters

Table 38-32 CREATE_MODEL Procedure Parameters

Parameter	Description
`model_name`	Name of the model in the form [schema_name.]model_name. If you do not specify a schema, your own schema is used. See also "Mining Model Naming Restrictions".
`mining_function`	Constant representing the mining function. The values are listed in Table 38-2, "Mining Functions"
`data_table_name`	Name of the table or view containing the training data
`case_id_column_name`	Name of the case identifier column
`target_column_name`	Name of the target column — null for descriptive models and for One-Class SVM models
`settings_table_name`	Name of the table or view containing mining function settings and algorithm settings
`data_schema_name`	Name of the schema hosting the training data
`settings_schema_name`	Name of the schema hosting the settings table/view
`transform_list`	A list of transformations to be used in addition to or instead of automatic transformations, depending on the value of the `PREP_AUTO` setting. (See "Automatic Data Preparation Setting".) The data type is `TRANSFORM_LIST`, which consists of records of type `TRANSFORM_REC`. TYPE TRANFORM_REC IS RECORD ( attribute_name VARCHAR2(4000), attribute_subname VARCHAR2(4000), expression EXPRESSION_REC, reverse_expression EXPRESSION_REC, attribute_spec VARCHAR2(4000)); Each `TRANSFORM_REC` describes the transformation and reverse transformation for an attribute. `EXPRESSION_REC` stores a SQL expression for an individual transformation. The SQL expression stored in `EXPRESSION_REC` can be manipulated using routines in `DBMS_DATA_MINING_TRANSFORM`. Examples are: `SET_EXPRESSION`, `GET_EXPRESSION`, and `SET_TRANSFORM`.

Usage Notes

You can obtain information about a model by querying these data dictionary views.

ALL_MINING_MODELS
ALL_MINING_MODEL_ATTRIBUTES
ALL_MINING_MODEL_SETTINGS

Specify the USER prefix instead of ALL to obtain information about models in your own schema only.

See Also:

Oracle Data Mining Application Developer's Guide for information on the data dictionary views.

Examples

The first example builds a classification model using the Support Vector Machine algorithm.

/* prepare a settings table to override default
 * settings (Naive Bayes is the default classifier)
 */
CREATE TABLE census_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(128));

BEGIN
/* indicate that SVM is the chosen classifier */
INSERT INTO census_settings VALUES (
DBMS_DATA_MINING.ALGO_NAME, DBMS_DATA_MINING.ALGO_SUPPORT_VECTOR_MACHINES);

/* override the default value for complexity factor */
INSERT INTO census_settings (setting_name, setting_value)
VALUES (dbms_data_mining.svms_complexity_factor, TO_CHAR(0.081));
COMMIT;

/* build a model with name census_model */
DBMS_DATA_MINING.CREATE_MODEL(
  model_name           => 'census_model',
  mining_function      => DBMS_DATA_MINING.CLASSIFICATION,
  data_table_name      => 'census_2d_build',
  case_id_column_name  => 'person_id',
  target_column_name   => 'class',
  settings_table_name  => 'census_settings');
END;
/

You use similar code to build a One-Class SVM model. The main difference is that the target column is empty.

/* prepare a settings table to override default
 * settings (Naive Bayes is the default classifier)
 */
CREATE TABLE census_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(128));

BEGIN
/* indicate that SVM is the chosen classifier */
INSERT INTO census_settings VALUES (
DBMS_DATA_MINING.ALGO_NAME, DBMS_DATA_MINING.ALGO_SUPPORT_VECTOR_MACHINES);

/* override the default value for outlier rate */
INSERT INTO census_settings (setting_name, setting_value)
VALUES (dbms_data_mining.svms_outlier_rate, TO_CHAR(0.05));
COMMIT;

/* build a model with name census_model */
DBMS_DATA_MINING.CREATE_MODEL(
  model_name           => 'census_model',
  mining_function      => DBMS_DATA_MINING.CLASSIFICATION,
  data_table_name      => 'census_2d_build',
  case_id_column_name  => 'person_id',
  target_column_name   => NULL,
  settings_table_name  => 'census_settings');
END;
/

DROP_MODEL Procedure

This procedure drops an existing mining model.

Syntax

DBMS_DATA_MINING.DROP_MODEL (model_name IN VARCHAR2);

Parameters

Table 38-33 DROP_MODEL Procedure Parameters

Parameter	Description
`model_name`	Name of the model in the form [schema_name.]model_name. If you do not specify a schema, your own schema is used.

Usage Notes

If an APPLY operation is using a model, and you attempt to drop the model during that time, the DROP will succeed and APPLY will return indeterminate results.

Examples

Assume the existence of a model census_model. The following example shows how to drop this model.

BEGIN
  DBMS_DATA_MINING.DROP_MODEL(model_name => 'census_model');
END;
/

EXPORT_MODEL Procedure

This procedure exports the specified data mining models to a dump file set. You can import from the dump file set using the IMPORT_MODEL procedure. Both EXPORT_MODEL and IMPORT_MODEL use Oracle Data Pump technology.

See Also:

Oracle Data Mining Administrator's Guide for more information on model export and import.

Syntax

DBMS_DATA_MINING.EXPORT_MODEL (
      filename          IN VARCHAR2,
      directory         IN VARCHAR2,
      model_filter      IN VARCHAR2 DEFAULT NULL,
      filesize          IN VARCHAR2 DEFAULT NULL,
      operation         IN VARCHAR2 DEFAULT NULL,
      remote_link       IN VARCHAR2 DEFAULT NULL,
      jobname           IN VARCHAR2 DEFAULT NULL);

Parameters

Table 38-34 EXPORT_MODEL Procedure Parameters

Parameter	Description
`filename`	Name of the dump file set to which the models should be exported. The name must be unique within the schema. The dump file set can contain one or more files. The number of files in a dump file set is determined by the size of the models being exported (both metadata and data) and a specified or estimated maximum file size. You can specify the file size in the `filesize` parameter, or you can use the `operation` parameter to cause Oracle Data Pump to estimate the file size. If the size of the models to export is greater than the maximum file size, one or more additional files are created. When the export operation completes successfully, the name of the dump file set is automatically expanded to `filename01.dmp`, even if there is only one file in the dump set. If there are additional files, they are named sequentially as `filename02.dmp`, `filename03.dmp`, and so forth.
`directory`	Name of a pre-defined directory object that specifies where the dump file set should be created. You must have read/write privileges on the directory object and on the file system directory that it identifies.
`model_filter`	Optional parameter that specifies which model or models to export. If you do not specify a value for `model_filter`, all models in the schema are exported. You can also specify `NULL` (the default) or `'ALL'` to export all models. You can export individual models by name and groups of models based on mining function or algorithm. For instance, you could export all regression models or all Naive Bayes models. Examples are provided in Table 38-35.
`filesize`	Optional parameter that specifies the maximum size of a file in the dump file set. The size may be specified in bytes, kilobytes (K), megabytes (M), or gigabytes (G). The default size is 50 MB. If the size of the models to export is larger than `filesize`, one or more additional files are created within the dump set. See the description of the `filename` parameter for more information.
`operation`	Optional parameter that specifies whether or not to estimate the size of the files in the dump set. By default the size is not estimated and the value of the `filesize` parameter determines the size of the files. You can specify either of the following values for `operation`: `'EXPORT'` — Export all or the specified models. (Default) `'ESTIMATE'` — Estimate the size of the exporting models.
`remote_link`	Optional parameter not used in this release. Set to `NULL`.
`jobname`	Optional parameter that specifies the name of the export job. By default, the name has the form `username_exp_nnnn`, where `nnnn` is a number. For example, a job name in the `SCOTT` schema might be `SCOTT_exp_134`. If you specify a job name, it must be unique within the schema. The maximum length of the job name is 30 characters. A log file for the export job, named `jobname.log`, is created in the same directory as the dump file set.

Usage Notes

The model_filter parameter specifies which models to export. You can list the models by name, or you can specify all models that have the same mining function or algorithm. You can query the USER_MINING_MODELS view to list the models in your schema.

SQL> describe user_mining_models
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 MODEL_NAME                                NOT NULL VARCHAR2(30)
 MINING_FUNCTION                                    VARCHAR2(30)
 ALGORITHM                                          VARCHAR2(30)
 CREATION_DATE                             NOT NULL DATE
 BUILD_DURATION                                     NUMBER
 MODEL_SIZE                                         NUMBER
 COMMENTS                                           VARCHAR2(4000)

For more information on data dictionary views of mining models, see "Mining Model Objects".

Examples of model filters are provided in Table 38-35.

Table 38-35 Sample Values for the Model Filter Parameter

Sample Value	Meaning
`'mymodel'`	Export the model named `mymodel`
`'name= ''mymodel'''`	Export the model named `mymodel`
`'name IN (''mymodel2'',''mymodel3'')'`	Export the models named `mymodel2` and `mymodel3`
`'ALGORITHM_NAME = ''NAIVE_BAYES'''`	Export all Naive Bayes models. See Table 38-5 for a list of algorithm names.
`'FUNCTION_NAME =''CLASSIFICATION'''`	Export all classification models. See Table 38-2 for a list of mining functions.

Examples

The following statement exports all the models in the DMUSER3 schema to a dump file set called models_out in the directory $ORACLE_HOME/rdbms/log. This directory is mapped to a directory object called DATA_PUMP_DIR. The DMUSER3 user has read/write access to the directory and to the directory object.

SQL>execute dbms_data_mining.export_model ('models_out', 'DATA_PUMP_DIR');

You can exit SQL*Plus and list the resulting dump file and log file.

SQL>exit
>cd $ORACLE_HOME/rdbms/log
>ls
>DMUSER3_exp_1027.log  models_out01.dmp

The following example uses the same directory object and is executed by the same user. It exports the models called NMF_SH_SAMPLE and SVMR_SH_REGR_SAMPLE to a different dump file set in the same directory.

SQL>execute dbms_data_mining.export_model ( 'models2_out', 'DATA_PUMP_DIR',
            'name in (''NMF_SH_SAMPLE'', ''SVMR_SH_REGR_SAMPLE'')');
SQL>exit
>cd $ORACLE_HOME/rdbms/log
>ls
>DMUSER3_exp_1027.log  models_out01.dmp
 DMUSER3_exp_924.log  models2_out01.dmp

GET_ASSOCIATION_RULES Function

This table function returns the rules from an Association model.

You can specify filtering criteria to cause GET_ASSOCIATION_RULES to return a subset of the rules. Filtering criteria can improve the performance of the table function. If the number of rules is large, the greatest performance improvement will result from specifying the topn parameter.

Syntax

DBMS_DATA_MINING.GET_ASSOCIATION_RULES (
   model_name            IN VARCHAR2,
   topn                  IN NUMBER DEFAULT NULL,
   rule_id               IN INTEGER DEFAULT NULL,
   min_confidence        IN NUMBER DEFAULT NULL,
   min_support           IN NUMBER DEFAULT NULL,
   max_rule_length       IN INTEGER DEFAULT NULL,
   min_rule_length       IN INTEGER DEFAULT NULL,
   sort_order            IN ORA_MINING_VARCHAR2_NT DEFAULT NULL,
   antecedent_items      IN DM_ITEMS DEFAULT NULL,
   consequent_items      IN DM_ITEMS DEFAULT NULL,
   min_lift              IN NUMBER DEFAULT NULL)
 RETURN DM_RULES PIPELINED;

Parameters

Table 38-36 GET_ASSOCIATION_RULES Function Parameters

Parameter	Description
`model_name`	Name of the model in the form [schema_name.]model_name. If you do not specify a schema, your own schema is used. This is the only required parameter of `GET_ASSOCIATION_RULES`. All other parameters specify optional filters on the rules to return.
`topn`	Return the n top rules ordered by confidence and then support, both descending. If you specify a sort order, the top n rules are derived after the sort is performed. If `topn` is specified and no maximum or minimum rule length is specified, then the only columns allowed in the sort order are `RULE_CONFIDENCE` and `RULE_SUPPORT`. If topn is specified and a maximum or minimum rule length is specified, then `RULE_CONFIDENCE`, `RULE_SUPPORT`, and `NUMBER_OF_ITEMS` are allowed in the sort order.
`rule_id`	Identifier of the rule to return. If you specify a value for `rule_id`, do not specify values for the other filtering parameters.
`min_confidence`	Return the rules with confidence greater than or equal to this number
`min_support`	Return the rules with support greater than or equal to this number
`max_rule_length`	Return the rules with a length less than or equal to this number. Rule length refers to the number of items in the rule (See `NUMBER_OF_ITEMS` in Table 38-37). For example, in the rule A=>B (if A, then B), the number of items is 2. If `max_rule_length` is specified, then the `NUMBER_OF_ITEMS` column is permitted in the sort order.
`min_rule_length`	Return the rules with a length greater than or equal to this number. See `max_rule_length` for a description of rule length. If `min_rule_length` is specified, then the `NUMBER_OF_ITEMS` column is permitted in the sort order.
`sort_order`	Sort the rules by the values in one or more of the returned columns. Specify one or more column names, each followed by `ASC` for ascending order or `DESC` for descending order. For example, to sort the result set in descending order first by the `NUMBER_OF_ITEMS` column, then by the `RULE_CONFIDENCE` column, you would specify: `ORA_MINING_VARCHAR2_NT('NUMBER_OF_ITEMS DESC', 'RULE_CONFIDENCE DESC')` If you specify `topn`, the results will vary depending on the sort order. By default, the results are sorted by confidence in descending order, then by support in descending order. See the examples.
`antecedent_items`	Return the rules with these items in the antecedent. See the examples.
`consequent_items`	Return the rules with this item in the consequent. See the examples.
`min_lift`	Return the rules with lift greater than or equal to this number.

Return Values

Table 38-37 GET_ASSOCIATION RULES Function Return Values

Return Value Description

Return Value	Description
`DM_RULES`	Represents a set of rows of type `DM_RULE`. The rows have the following columns: (rule_id INTEGER, antecedent DM_PREDICATES, consequent DM_PREDICATES, rule_support NUMBER, rule_confidence NUMBER, rule_lift NUMBER, antecedent_support NUMBER, consequent_support NUMBER, number_of_items INTEGER )
	The `antecedent` and `consequent` columns each return nested tables of type `DM_PREDICATES`.The rows, of type `DM_PREDICATE`, have the following columns: (attribute_name VARCHAR2(4000), attribute_subname VARCHAR2(4000), conditional_operator CHAR(2)/=,<>,<,>,<=,>=/, attribute_num_value NUMBER, attribute_str_value VARCHAR2(4000), attribute_support NUMBER, attribute_confidence NUMBER)

DM_RULES

Represents a set of rows of type DM_RULE. The rows have the following columns:

(rule_id              INTEGER,
 antecedent           DM_PREDICATES,
 consequent           DM_PREDICATES,
 rule_support         NUMBER,
 rule_confidence      NUMBER,
 rule_lift            NUMBER,
 antecedent_support   NUMBER,
 consequent_support   NUMBER,
 number_of_items      INTEGER )

The antecedent and consequent columns each return nested tables of type DM_PREDICATES.The rows, of type DM_PREDICATE, have the following columns:

(attribute_name            VARCHAR2(4000),
      attribute_subname         VARCHAR2(4000),
      conditional_operator      CHAR(2)/*=,<>,<,>,<=,>=*/,
      attribute_num_value       NUMBER,
      attribute_str_value       VARCHAR2(4000),
      attribute_support         NUMBER,
      attribute_confidence      NUMBER)

Usage Notes

This table function pipes out rows of type DM_RULES. For information on Data Mining data types and piped output from table functions, see "Data Types".

The ORA_MINING_VARCHAR2_NT type is defined as a table of VARCHAR2(4000).

Examples

The following example demonstrates an Association model build followed by several invocations of the GET_ASSOCIATION_RULES table function.

-- prepare a settings table to override default settings
CREATE TABLE market_settings AS
SELECT *
  FROM TABLE(DBMS_DATA_MINING.GET_DEFAULT_SETTINGS)
 WHERE setting_name LIKE 'ASSO_%';
BEGIN
-- update the value of the minimum confidence
UPDATE census_settings
   SET setting_value = TO_CHAR(0.081)
 WHERE setting_name = DBMS_DATA_MINING.asso_min_confidence;

-- build an AR model 
DBMS_DATA_MINING.CREATE_MODEL(
  model_name => 'market_model',
  function => DBMS_DATA_MINING.ASSOCIATION,
  data_table_name => 'market_build',
  case_id_column_name => 'item_id',
  target_column_name => NULL,
  settings_table_name => 'census_settings');
END;
/
-- View the (unformatted) rules 
SELECT rule_id, antecedent, consequent, rule_support,
       rule_confidence
  FROM TABLE(DBMS_DATA_MINING.GET_ASSOCIATION_RULES('market_model'));

In the previous example, you view all rules. To view just the top 20 rules, use the following statement.

-- View the top 20 (unformatted) rules
SELECT rule_id, antecedent, consequent, rule_support,
       rule_confidence
  FROM TABLE(DBMS_DATA_MINING.GET_ASSOCIATION_RULES('market_model', 20));

The following example returns all the rules which have 'AQUATIC' or 'EGGS' in the antecedent, and has 'VENOMOUS' as the consequent. The rules are sorted first by NUMBER_OF_ITEMS in descending order, then by RULE_CONFIDENCE in descending order, and finally by RULE_SUPPORT in descending order.

SELECT * FROM TABLE
(    DBMS_DATA_MINING.GET_ASSOCIATION_RULES
       ('AR_Model_31', 120, NULL, 1, .51, 7,
         ORA_MINING_VARCHAR2_NT
          ('NUMBER_OF_ITEMS DESC', 'RULE_CONFIDENCE DESC', 'RULE_SUPPORT DESC'),
         ORA_MINING_VARCHAR2_NT('AQUATIC', 'EGGS'),
         ORA_MINING_VARCHAR2_NT('VENOMOUS')));

GET_DEFAULT_SETTINGS Function

The GET_DEFAULT_SETTINGS function is deprecated in 11g Release 1 (11.1). It is replaced with the data dictionary view, *_MINING_MODEL_SETTINGS. USER_, ALL_, and DBA_ versions of the view are available. See Oracle Database Reference.

Oracle recommends that you do not use deprecated procedures in new applications. Support for deprecated features is for backward compatibility only.

Note:

See "Deprecated Subprograms"

This table function returns the default settings for all mining functions and algorithms supported in the DBMS_DATA_MINING package.

Syntax

DBMS_DATA_MINING.GET_DEFAULT_SETTINGS
  RETURN DM_MODEL_SETTINGS PIPELINED;

Return Values

Table 38-38 GET_DEFAULT_SETTINGS Function Return Values

Return Value	Description
`DM_MODEL_SETTINGS`	Represents a set of rows of type `DM_MODEL_SETTING`. The rows have the following columns: (setting_name VARCHAR2(30), setting_value VARCHAR2(128))

Usage Notes

This table function pipes out rows of type DM_MODEL_SETTING. For information on Data Mining data types and piped output from table functions, see "Data Types".

This function is particularly useful if you do not know what settings are associated with a particular function or algorithm, and you want to override some or all of them.

Examples

For example, if you want to override some or all of k-Means clustering settings, you can create a settings table as shown, and update individual settings as required.

BEGIN
  CREATE TABLE mysettings AS
  SELECT * 
  FROM TABLE(DBMS_DATA_MINING.GET_DEFAULT_SETTINGS)
   WHERE setting_name LIKE 'KMNS%';
  -- now update individual settings as required
  UPDATE mysettings
     SET setting_value = 0.02
   WHERE setting_name = DBMS_DATA_MINING.KMNS_MIN_PCT_ATTR_SUPPORT;
END;
/

GET_FREQUENT_ITEMSETS Function

This table function returns a set of rows that represent the frequent itemsets from an Association model. For a detailed description of frequent itemsets, consult Oracle Data Mining Concepts.

Syntax

DBMS_DATA_MINING.GET_FREQUENT_ITEMSETS (
    model_name          IN VARCHAR2,
    topn                IN NUMBER DEFAULT NULL,
    max_itemset_length  IN NUMBER DEFAULT NULL)
  RETURN DM_ITEMSETS PIPELINED;

Parameters

Table 38-39 GET_FREQUENT_ITEMSETS Function Parameters

Parameter	Description
`model_name`	Name of the model in the form [schema_name.]model_name. If you do not specify a schema, your own schema is used.
`topn`	When not `NULL`, return the top n rows ordered by support in descending order
`max_itemset_length`	Maximum length of an item set.

Return Values

Table 38-40 GET_FREQUENT_ITEMSETS Function Return Values

Return Value Description

Return Value	Description
`DM_ITEMSETS`	Represents a set of rows of type `DM_ITEMSET`. The rows have the following columns: (itemsets_id NUMBER, items DM_ITEMS, support NUMBER, number_of_items NUMBER) The `items` column returns a nested table of type `DM_ITEMS`. The rows have type `DM_ITEM`: (attribute_name VARCHAR2(4000), attribute_subname VARCHAR2(4000), attribute_num_value NUMBER, attribute_str_value VARCHAR2(4000))

DM_ITEMSETS

Represents a set of rows of type DM_ITEMSET. The rows have the following columns:

(itemsets_id      NUMBER,
items             DM_ITEMS,
support           NUMBER,
number_of_items   NUMBER)

The items column returns a nested table of type DM_ITEMS. The rows have type DM_ITEM:

(attribute_name      VARCHAR2(4000),
attribute_subname    VARCHAR2(4000),
attribute_num_value  NUMBER,
attribute_str_value  VARCHAR2(4000))

Usage Notes

This table function pipes out rows of type DM_ITEMSETS. For information on Data Mining data types and piped output from table functions, see "Data Types".

Examples

The following example demonstrates an Association model build followed by an invocation of GET_FREQUENT_ITEMSETS table function from Oracle SQL.

-- prepare a settings table to override default settings
CREATE TABLE market_settings AS

SELECT *

FROM TABLE(DBMS_DATA_MINING.GET_DEFAULT_SETTINGS)
 WHERE setting_name LIKE 'ASSO_%';
BEGIN
-- update the value of the minimum confidence
UPDATE market_settings
   SET setting_value = TO_CHAR(0.081)
 WHERE setting_name = DBMS_DATA_MINING.asso_min_confidence;

/* build a AR model */
DBMS_DATA_MINING.CREATE_MODEL(
  model_name           => 'market_model',
  function             => DBMS_DATA_MINING.ASSOCIATION,
  data_table_name      => 'market_build',
  case_id_column_name  => 'item_id',
  target_column_name   => NULL,
  settings_table_name  => 'census_settings');
END;
/

-- View the (unformatted) Itemsets from SQL*Plus
SELECT itemset_id, items, support, number_of_items
  FROM TABLE(DBMS_DATA_MINING.GET_FREQUENT_ITEMSETS('market_model'));

In the example above, you view all itemsets. To view just the top 20 itemsets, use the following statement:

-- View the top 20 (unformatted) Itemsets from SQL*Plus
SELECT itemset_id, items, support, number_of_items
  FROM TABLE(DBMS_DATA_MINING.GET_FREQUENT_ITEMSETS('market_model', 20));

GET_MODEL_COST_MATRIX Function

This function returns the rows of the default scoring matrix associated with the specified model.

By default, this function returns the scoring matrix that was added to the model with the ADD_COST_MATRIX procedure. If you wish to obtain the cost matrix used to create a model, specify cost_matrix_type_create as the matrix_type. See Table 38-41.

Return Value	Description
`DM_COST_MATRIX`	Represents a set of rows of type `DM_COST_ELEMENT`. The rows have the following columns: actual VARCHAR2(4000), predicted VARCHAR2(4000), cost NUMBER)

GET_MODEL_DETAILS_ABN Function

The Adaptive Bayes Network algorithm ABN algorithm is deprecated in 11g Release 1 (11.1).

Oracle recommends that you do not use deprecated procedures in new applications. Support for deprecated features is for backward compatibility only.

Note:

See "Deprecated Subprograms"

This table function returns a set of rows that provide the details of an Adaptive Bayes Network model.

Syntax

DBMS_DATA_MINING.GET_MODEL_DETAILS_ABN (
    model_name         IN VARCHAR2)
  RETURN DM_ABN_DETAILS PIPELINED;

Parameters

Table 38-43 GET_MODEL_DETAILS_ABN Function Parameters

Parameter	Description
`model_name`	Name of the model in the form [schema_name.]model_name. If you do not specify a schema, your own schema is used.

Return Values

Table 38-44 GET_MODEL_DETAILS_ABN Function Return Values

Return Value Description

Return Value	Description
`DM_ABN_DETAILS`	Represents a set of rows of type `DM_ABN_DETAIL`. The rows have the following columns: (rule_id INTEGER, antecedent DM_PREDICATES, consequent DM_PREDICATES, rule_support NUMBER)
	The `antecedent` and `consequent` columns of `DM_ABN_DETAIL` each return nested tables of type `DM_PREDICATES`. The rows, of type `DM_PREDICATE`, have the following columns: (attribute_name VARCHAR2(4000), attribute_subname VARCHAR2(4000), conditional_operator CHAR(2), /=,<>,<,>,<=,>=/ attribute_num_value NUMBER, attribute_str_value VARCHAR2(4000), attribute_support NUMBER, attribute_confidence NUMBER)

DM_ABN_DETAILS

Represents a set of rows of type DM_ABN_DETAIL. The rows have the following columns:

(rule_id           INTEGER,
 antecedent        DM_PREDICATES,
 consequent        DM_PREDICATES,
 rule_support      NUMBER)

The antecedent and consequent columns of DM_ABN_DETAIL each return nested tables of type DM_PREDICATES. The rows, of type DM_PREDICATE, have the following columns:

(attribute_name          VARCHAR2(4000),
      attribute_subname       VARCHAR2(4000),
      conditional_operator    CHAR(2), /*=,<>,<,>,<=,>=*/
      attribute_num_value     NUMBER,
      attribute_str_value     VARCHAR2(4000),
      attribute_support       NUMBER,
      attribute_confidence    NUMBER)

Usage Notes

This table function pipes out rows of type DM_ABN_DETAIL. For information on Data Mining data types and piped output from table functions, see "Data Types".

This function returns details only for a single feature ABN model.

Examples

The following example demonstrates an ABN model build followed by an invocation of GET_MODEL_DETAILS_ABN table function from Oracle SQL.

BEGIN
  -- prepare a settings table to override default algorithm and model type
  CREATE TABLE abn_settings (setting_name VARCHAR2(30),
  setting_value 
VARCHAR2(128));
  INSERT INTO abn_settings VALUES (DBMS_DATA_MINING.ALGO_NAME,
    DBMS_DATA_MINING.ALGO_ADAPTIVE_BAYES_NETWORK);
  INSERT INTO abn_settings VALUES    (DBMS_DATA_MINING.ABNS_MODEL_TYPE,     DBMS_DATA_MINING.ABNS_SINGLE_FEATURE);
   COMMIT;
  -- create a model
  DBMS_DATA_MINING.CREATE_MODEL (
    model_name           => 'abn_model',
    function             => DBMS_DATA_MINING.CLASSIFICATION,
    data_table_name      => 'abn_build',
    case_id_column_name  => 'id',
    target_column_name   => NULL,
    settings_table_name  => 'abn_settings');
END;
/
-- View the (unformatted) results from SQL*Plus
SELECT *
    FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_ABN('abn_model'));

GET_MODEL_DETAILS_AI Function

This table function returns a set of rows that provide the details of an Attribute Importance model.

Syntax

DBMS_DATA_MINING.GET_MODEL_DETAILS_AI (
  model_name         IN VARCHAR2)
 RETURN DM_RANKED_ATTRIBUTES PIPELINED;

Parameters

Table 38-45 GET_MODEL_DETAILS_AI Function Parameters

Parameter	Description
`model_name`	Name of the model in the form [schema_name.]model_name. If you do not specify a schema, your own schema is used.

Return Values

Table 38-46 GET_MODEL_DETAILS_AI Function Return Values

Return Value	Description
`DM_RANKED_ATTRIBUTES`	Represents a set of rows of type `DM_RANKED_ATTRIBUTE`. The rows have the following columns: (attribute_name VARCHAR2(4000, attribute_subname VARCHAR2(4000), importance_value NUMBER, rank NUMBER(38))

GET_MODEL_DETAILS_GLM Function

This table function returns the coefficient statistics for a Generalized Linear Model.

The same set of statistics is returned for both linear and logistic regression, but statistics that do not apply to the mining function are returned as NULL. For more details, see the Usage Notes.

Syntax

DBMS_DATA_MINING.GET_MODEL_DETAILS_GLM (
             model_name             VARCHAR2)
RETURN DM_GLM_COEFF_SET PIPELINED;

Parameters

Table 38-47 GET_MODEL_DETAILS_GLM Function Parameters

Parameter	Description
`model_name`	Name of the model in the form [schema_name.]model_name. If you do not specify a schema, your own schema is used.

Return Values

Table 38-48 GET_MODEL_DETAILS_GLM Return Values

Return Value Description

DM_GLM_COEFF_SET

Represents a set of rows of type DM_GLM_COEFF. The rows have the following columns:

(class                   VARCHAR2(4000),
 attribute_name          VARCHAR2(4000),
 attribute_subname       VARCHAR2(4000),
 attribute_value         VARCHAR2(4000),
 coefficient             NUMBER,
 std_error               NUMBER,
 test_statistic          NUMBER,
 p_value                 NUMBER,
 VIF                     NUMBER,
 std_coefficient         NUMBER,
 lower_coeff_limit       NUMBER,
 upper_coeff_limit       NUMBER,
 exp_coefficient         BINARY_DOUBLE,
 exp_lower_coeff_limit   BINARY_DOUBLE,
 exp_upper_coeff_limit   BINARY_DOUBLE)

GET_MODEL_DETAILS_GLM returns a row of statistics for each attribute and one extra row for the intercept, which is identified by a null value in the attribute name. Each row has the DM_GLM_COEFF data type. The statistics are described in Table 38-49.

Table 38-49 DM_GLM_COEFF Data Type Description

Column	Description
`class`	The non-reference target class for logistic regression. The model is built to predict the probability of this class. The other class (the reference class) is specified in the model setting `GLMS_REFERENCE_CLASS_NAME`. See Table 38-11, "GLM Settings". For linear regression, `class` is null.
`attribute_name`	The attribute name when there is no subname, or first part of the attribute name when there is a subname. The value of `attribute_name` is also the name of the column in the case table that is the source for this attribute. For the intercept, `attribute_name` is null. Intercepts are equivalent to the bias term in SVM models.
`attribute_subname`	The name of an attribute in a nested table. The full name of a nested attribute has the form: `attribute_name.attribute_subname` where `attribute_name` is the name of the nested column in the case table that is the source for this attribute. If the attribute is not nested, `attribute_subname` is null. If the attribute is an intercept, both the `attribute_name` and the `attribute_subname` are null.
`attribute_value`	The value of the attribute (categorical attribute only). For numerical attributes, `attribute_value` is null.
`coefficient`	The linear coefficient estimate.
`std_error`	Standard error of the coefficient estimate.
`test_statistic`	For linear regression, the t-value of the coefficient estimate. For logistic regression, the Wald chi-square value of the coefficient estimate.
`p-value`	Probability of the `test_statistic`. Used to analyze the significance of specific attributes in the model.
`VIF`	Variance Inflation Factor. The value is zero for the intercept. For logistic regression, `VIF` is null.
`std_coefficient`	Standardized estimate of the coefficient.
`lower_coeff_limit`	Lower confidence bound of the coefficient.
`upper_coeff_limit`	Upper confidence bound of the coefficient.
`exp_coefficient`	Exponentiated coefficient for logistic regression. For linear regression, `exp_coefficient` is null.
`exp_lower_coeff_limit`	Exponentiated coefficient for lower confidence bound of the coefficient for logistic regression. For linear regression, `exp_lower_coeff_limit` is null.
`exp_upper_coeff_limit`	Exponentiated coefficient for upper confidence bound of the coefficient for logistic regression. For linear regression, `exp_lower_coeff_limit` is null.

Usage Notes

Not all statistics are necessarily returned for each coefficient. Statistics will be null if:

They do not apply to the mining function. For example, exp_coefficient does not apply to linear regression.
They cannot be computed from a theoretical standpoint. For example, when ridge regression is enabled, the coefficient values are returned with no statistics except VIF if it is enabled. (For information on ridge regression, see Table 38-11, "GLM Settings".)
They cannot be computed because of limitations in system resources.
Their values would be infinity.

GET_MODEL_DETAILS_GLOBAL Function

This table function returns statistics about the model as a whole. Global details are available for GLM and for association rules.

Separate global details are returned for linear and logistic regression. When ridge regression is enabled, fewer global details are returned. For information about ridge, see Table 38-11, "GLM Settings".

Syntax

DBMS_DATA_MINING.GET_MODEL_DETAILS_GLOBAL (
      model_name     IN  VARCHAR2)
RETURN DM_MODEL_GLOBAL_DETAILS PIPELINED;

Parameters

Table 38-50 GET_MODEL_DETAILS_GLOBAL Function Parameters

Parameter	Description
`model_name`	Name of the model in the form [schema_name.]model_name. If you do not specify a schema, your own schema is used.

Return Values

Table 38-51 GET_MODEL_DETAILS_GLOBAL Function Return Values

Return Value	Description
`DM_MODEL_GLOBAL_DETAILS`	A collection of rows of type `DM_MODEL_GLOBAL_DETAIL`. The rows have the following columns: (global_detail_name VARCHAR2(30), global_detail_value NUMBER)

Global Details for GLM: Linear Regression

Table 38-52 Global Details for Linear Regression

GLOBAL_DETAIL_NAME	Description
`MODEL_DF`	Model degrees of freedom
`MODEL_SUM_SQUARES`	Model sum of squares
`MODEL_MEAN_SQUARE`	Model mean square
`F_VALUE`	Model F value statistic
`MODEL_F_P_VALUE`	Model F value probability
`ERROR_DF`	Error degrees of freedom
`ERROR_SUM_SQUARES`	Error sum of squares
`ERROR_MEAN_SQUARE`	Error mean square
`CORRECTED_TOTAL_DF`	Corrected total degrees of freedom
`CORRECTED_TOT_SS`	Corrected total sum of squares
`ROOT_MEAN_SQ`	Root mean square error
`DEPENDENT_MEAN`	Dependent mean
`COEFF_VAR`	Coefficient of variation
`R_SQ`	R-Square
`ADJUSTED_R_SQUARE`	Adjusted R-Square
`AIC`	Akaike's information criterion
`SBIC`	Schwarz's Bayesian information criterion
`GMSEP`	Estimated mean square error of the prediction, assuming multivariate normality
`HOCKING_SP`	Hocking Sp statistic
`J_P`	JP statistic (the final prediction error)
`NUM_PARAMS`	Number of parameters (the number of coefficients, including the intercept)
`NUM_ROWS`	Number of rows
`MODEL_CONVERGED`	Whether or not the model converged. Value is `1` if it converged, or `0` if it did not converge
`VALID_COVARIANCE_MATRIX`	Valid covariance matrix. Value is `1` if the covariance matrix was computed, or `0` if it was not computed

Global Details for GLM: Logistic Regression

Table 38-53 Global Details for Logistic Regression

GLOBAL_DETAIL_NAME	Description
`AIC_INTERCEPT`	Akaike's criterion for the fit of the intercept only model
`AIC_MODEL`	Akaike's criterion for the fit of the intercept and the covariates (predictors) mode
`SC_INTERCEPT`	Schwarz's Criterion for the fit of the intercept only model
`SC_MODEL`	Schwarz's Criterion for the fit of the intercept and the covariates (predictors) model
`NEG2_LL_INTERCEPT`	-2 log likelihood of the intercept only model
`NEG2_LL_MODEL`	-2 log likelihood of the model
`LR_DF`	Likelihood ratio degrees of freedom
`LR_CHI_SQ`	Likelihood ratio chi-square value
`LR_CHI_SQ_P_VALUE`	Likelihood ratio chi-square probability value
`PSEUDO_R_SQ_CS`	Pseudo R-square Cox and Snell
`PSEUDO_R_SQ_N`	Pseudo R-square Nagelkerke
`DEPENDENT_MEAN`	Dependent mean
`PCT_CORRECT`	Percent of correct predictions
`PCT_INCORRECT`	Percent of incorrectly predicted rows
`PCT_TIED`	Percent of cases where probability for both cases is the same
`NUM_PARAMS`	Number of parameters (the number of coefficients, including the intercept)
`NUM_ROWS`	Number of rows
`MODEL_CONVERGED`	Whether or not the model converged. Value is `1`if it converged, or `0` if it did not converge.
`VALID_COVARIANCE_MATRIX`	Valid covariance matrix. Value is `1` if the covariance matrix was computed, or `0` if the covariance matrix not computed

Global Detail for Association Rules

A single global detail is produced by an Association model.

Table 38-54 Global Detail for Association Rules

GLOBAL_DETAIL_NAME	Description
`RULE_COUNT`	The number of association rules in the model.

GET_MODEL_DETAILS_KM Function

This table function returns a set of rows that provide the details of a k-Means clustering model.

You can provide input to GET_MODEL_DETAILS_KM to request specific information about the model, thus improving the performance of the query. If you do not specify filtering parameters, GET_MODEL_DETAILS_KM returns all the information about the model.

Syntax

DBMS_DATA_MINING.GET_MODEL_DETAILS_KM (
          model_name          VARCHAR2,
          cluster_id          NUMBER    DEFAULT NULL,
          attribute           VARCHAR2  DEFAULT NULL,
          centroid            NUMBER    DEFAULT 1, 
          histogram           NUMBER    DEFAULT 1, 
          rules               NUMBER    DEFAULT 2,
          attribute_subname   VARCHAR2  DEFAULT NULL)
RETURN DM_CLUSTERS PIPELINED;

Parameters

Table 38-55 GET_MODEL_DETAILS_KM Function Parameters

Parameter	Description
`model_name`	Name of the model in the form [schema_name.]model_name. If you do not specify a schema, your own schema is used.
`cluster_id`	The ID of a cluster in the model. When a valid cluster ID is specified, only the details of this cluster are returned. Otherwise the details for all clusters are returned.
`attribute`	The name of an attribute. When a valid attribute name is specified, only the details of this attribute are returned. Otherwise the details for all attributes are returned
`centroid`	This parameter accepts the following values: 1 — Details about centroids are returned (default) 0 — Details about centroids are not returned
`histogram`	This parameter accepts the following values: 1 — Details about histograms are returned (default) 0 — Details about histograms are not returned
`rules`	This parameter accepts the following values: 2 — Details about rules are returned (default) 1 — Rule summaries are returned 0 — No information about rules is returned

Return Values

Table 38-56 GET_MODEL_DETAILS_KM Function Return Values

Return Value	Description
`DM_CLUSTERS`	Represents a set of rows of type `DM_CLUSTER`. The rows have the following columns: (id NUMBER, cluster_id VARCHAR2(4000), record_count NUMBER, parent NUMBER, tree_level NUMBER, dispersion NUMBER, split_predicate DM_PREDICATES, child DM_CHILDREN, centroid DM_CENTROIDS, histogram DM_HISTOGRAMS, rule DM_RULE)
	The `split_predicate` column of `DM_CLUSTER` returns a nested table of type `DM_PREDICATES`. Each row, of type `DM_PREDICATE`, has the following columns: (attribute_name VARCHAR2(4000), attribute_subname VARCHAR2(4000), conditional_operator CHAR(2) /=,<>,<,>,<=,>=/, attribute_num_value NUMBER, attribute_str_value VARCHAR2(4000), attribute_support NUMBER, attribute_confidence NUMBER)
	The `child` column of `DM_CLUSTER` returns a nested table of type `DM_CHILDREN`. The rows, of type `DM_CHILD`, have a single column of type `NUMBER`, which contains the identifiers of each child.
	The `centroid` column of `DM_CLUSTER` returns a nested table of type `DM_CENTROIDS`. The rows, of type `DM_CENTROID`, have the following columns: (attribute_name VARCHAR2(4000), attribute_subname VARCHAR2(4000), mean NUMBER, mode_value VARCHAR2(4000), variance NUMBER)
	The `histogram` column of `DM_CLUSTER` returns a nested table of type `DM_HISTOGRAMS`. The rows, of type `DM_HISTOGRAM_BIN`, have the following columns: (attribute_name VARCHAR2(4000), attribute_subname VARCHAR2(4000), bin_id NUMBER, lower_bound NUMBER, upper_bound NUMBER, label VARCHAR2(4000), count NUMBER)
	The `rule` column of `DM_CLUSTER` returns a single row of type `DM_RULE`. The columns are: (rule_id INTEGER, antecedent DM_PREDICATES, consequent DM_PREDICATES, rule_support NUMBER, rule_confidence NUMBER, rule_lift NUMBER, antecedent_support NUMBER, consequent_support NUMBER, number_of_items INTEGER)
	The `antecedent` and `consequent` columns of `DM_RULE` each return nested tables of type `DM_PREDICATES`. The rows, of type `DM_PREDICATE`, have the following columns: (attribute_name VARCHAR2(4000), attribute_subname VARCHAR2(4000), conditional_operator CHAR(2)/=,<>,<,>,<=,>=/, attribute_num_value NUMBER, attribute_str_value VARCHAR2(4000), attribute_support NUMBER, attribute_confidence NUMBER)

Usage Notes

The table function pipes out rows of type DM_CLUSTERS. For information on Data Mining data types and piped output from table functions, see "Data Types".

Examples

The following example demonstrates a k-Means clustering model build followed by an invocation of GET_MODEL_DETAILS_KM table function from Oracle SQL.

BEGIN
-- create a settings table
UPDATE cluster_settings
   SET setting_value = 3 
 WHERE setting_name = DBMS_DATA_MINING.KMEANS_BLOCK_GROWTH;

/* build a k-Means clustering model */
DBMS_DATA_MINING.CREATE_MODEL(
  model_name           => 'eight_clouds',
  function             => DBMS_DATA_MINING.CLUSTERING,
  data_table_name      => 'eight_clouds_build',
  case_id_column_name  => 'id',
  target_column_name   => NULL,
  settings_table_name  => 'cluster_settings');
END;
/

-- View the (unformatted) rules from SQL*Plus
SELECT id, record_count, parent, tree_level, dispersion,
       child, centroid, histogram, rule
  FROM TABLE(DBMS_DATA_MINING_GET_MODEL_DETAILS_KM('eight_clouds'));

GET_MODEL_DETAILS_NB Function

This table function returns a set of rows that provide the details of a Naive Bayes model.

Syntax

DBMS_DATA_MINING.GET_MODEL_DETAILS_NB (
   model_name      IN       VARCHAR2)
 RETURN DM_NB_DETAILS PIPELINED;

Parameters

Table 38-57 GET_MODEL_DETAILS_NB Function Parameters

Parameter	Description
`model_name`	Name of the model in the form [schema_name.]model_name. If you do not specify a schema, your own schema is used.

Return Values

Table 38-58 GET_MODEL_DETAILS_NB Function Return Values

Return Value Description

DM_NB_DETAILS

Represents a set of rows of type DM_NB_DETAIL. The rows have the following columns:

(target_attribute_name          VARCHAR2(30),
 target_attribute_str_value     VARCHAR2(4000),
 target_attribute_num_value     NUMBER,
 prior_probability              NUMBER,
 conditionals                   DM_CONDITIONALS)

The conditionals column of DM_NB_DETAIL returns a nested table of type DM_CONDITIONALS. The rows, of type DM_CONDITIONAL, have the following columns:

(attribute_name             VARCHAR2(4000),
      attribute_subname          VARCHAR2(4000),
      attribute_str_value        VARCHAR2(4000),
      attribute_num_value        NUMBER,
      conditional_probability    NUMBER)

Usage Notes

The table function pipes out rows of type DM_NB_DETAILS. For information on Data Mining data types and piped output from table functions, see "Data Types".

Examples

Assume that you have built a classification model census_model using the Naive Bayes algorithm. You can retrieve the model details as shown in this example.

-- You can view the Naive Bayes model details in many ways
-- Consult the Oracle Application Developer's Guide -
-- Object-Relational Features for different ways of 
-- accessing Oracle Objects.

-- View the (unformatted) details from SQL*Plus
SELECT attribute_name, attribute_num_value, attribute_str_value,
       prior_probability, conditionals,
  FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_NB('census_model');

See nbdemo.sql for generation of formatted rules.

GET_MODEL_DETAILS_NMF Function

This table function returns a set of rows that provide the details of a Non-Negative Matrix Factorization model.

Syntax

DBMS_DATA_MINING.GET_MODEL_DETAILS_NMF (
   model_name        IN        VARCHAR2)
 RETURN DM_NMF_FEATURE_SET PIPELINED;

Parameters

Table 38-59 GET_MODEL_DETAILS_NMF Function Parameters

Parameter	Description
`model_name`	Name of the model in the form [schema_name.]model_name. If you do not specify a schema, your own schema is used.

Return Values

Table 38-60 GET_MODEL_DETAILS_NMF Function Return Values

Return Value	Description
`DM_NMF_FEATURE_SET`	Represents a set of rows of `DM_NMF_FEATURE`. The rows have the following columns: (feature_id NUMBER, mapped_feature_id VARCHAR2(4000), attribute_set DM_NMF_ATTRIBUTE_SET)
	The `attribute_set` column of `DM_NMF_FEATURE` returns a nested table of type `DM_NMF_ATTRIBUTE_SET`. The rows, of type `DM_NMF_ATTRIBUTE`, have the following columns: (attribute_name VARCHAR2(4000), attribute_subname VARCHAR2(4000), attribute_value VARCHAR2(4000), coefficient NUMBER)

Usage Notes

The table function pipes out rows of type DM_NMF_FEATURE_SET. For information on Data Mining data types and piped output from table functions, see "Data Types".

Examples

Assume you have built an NMF model called my_nmf_model. You can retrieve model details as shown:

--View (unformatted) details from SQL*Plus 
SELECT feature_id, attribute_set 
FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_NMF(
        'my_nmf_model'));

GET_MODEL_DETAILS_OC Function

This table function returns a set of rows that provide the details of an O-Cluster clustering model. The rows are an enumeration of the clustering patterns generated during the creation of the model.

You can provide input to GET_MODEL_DETAILS_OC to request specific information about the model, thus improving the performance of the query. If you do not specify filtering parameters, GET_MODEL_DETAILS_OC returns all the information about the model.

Syntax

DBMS_DATA_MINING.GET_MODEL_DETAILS_OC (
          model_name         VARCHAR2,
          cluster_id         NUMBER    DEFAULT NULL,
          attribute          VARCHAR2  DEFAULT NULL,
          centroid           NUMBER    DEFAULT 1, 
          histogram          NUMBER    DEFAULT 1, 
          rules              NUMBER    DEFAULT 2)
RETURN DM_CLUSTERS PIPELINED;

Parameters

Table 38-61 GET_MODEL_DETAILS_OC Function Parameters

Parameter	Description
`model_name`	Name of the model in the form [schema_name.]model_name. If you do not specify a schema, your own schema is used.
`cluster_id`	The ID of a cluster in the model. When a valid cluster ID is specified, only the details of this cluster are returned. Otherwise the details for all clusters are returned.
`attribute`	The name of an attribute. When a valid attribute name is specified, only the details of this attribute are returned. Otherwise the details for all attributes are returned
`centroid`	This parameter accepts the following values: 1 — Details about centroids are returned (default) 0 — Details about centroids are not returned
`histogram`	This parameter accepts the following values: 1 — Details about histograms are returned (default) 0 — Details about histograms are not returned
`rules`	This parameter accepts the following values: 2 — Details about rules are returned (default) 1 — Rule summaries are returned 0 — No information about rules is returned

Return Values

Table 38-62 GET_MODEL_DETAILS_OC Function Return Values

Return Value	Description
`DM_CLUSTERS`	Represents a set of rows of type `DM_CLUSTER`. The rows have the following columns: (id NUMBER, cluster_id VARCHAR2(4000), record_count NUMBER, parent NUMBER, tree_level NUMBER, dispersion NUMBER, split_predicate DM_PREDICATES, child DM_CHILDREN, centroid DM_CENTROIDS, histogram DM_HISTOGRAMS, rule DM_RULE)
	The `split_predicate` column of `DM_CLUSTER` returns a nested table of type `DM_PREDICATES`. Each row, of type `DM_PREDICATE`, has the following columns: (attribute_name VARCHAR2(4000), attribute_subname VARCHAR2(4000), conditional_operator CHAR(2) /=,<>,<,>,<=,>=/, attribute_num_value NUMBER, attribute_str_value VARCHAR2(4000), attribute_support NUMBER, attribute_confidence NUMBER)
	The `child` column of `DM_CLUSTER` returns a nested table of type `DM_CHILDREN`. The rows, of type `DM_CHILD`, have a single column of type `NUMBER`, which contains the identifiers of each child.
	The `centroid` column of `DM_CLUSTER` returns a nested table of type `DM_CENTROIDS`. The rows, of type `DM_CENTROID`, have the following columns: (attribute_name VARCHAR2(4000), attribute_subname VARCHAR2(4000), mean NUMBER, mode_value VARCHAR2(4000), variance NUMBER)
	The `histogram` column of `DM_CLUSTER` returns a nested table of type `DM_HISTOGRAMS`. The rows, of type `DM_HISTOGRAM_BIN`, have the following columns: (attribute_name VARCHAR2(4000), attribute_subname VARCHAR2(4000), bin_id NUMBER, lower_bound NUMBER, upper_bound NUMBER, label VARCHAR2(4000), count NUMBER)
	The `rule` column of `DM_CLUSTER` returns a single row of type `DM_RULE`. The columns are: (rule_id INTEGER, antecedent DM_PREDICATES, consequent DM_PREDICATES, rule_support NUMBER, rule_confidence NUMBER, rule_lift NUMBER, antecedent_support NUMBER, consequent_support NUMBER, number_of_items INTEGER)
	The `antecedent` and `consequent` columns each return nested tables of type `DM_PREDICATES`.The rows, of type `DM_PREDICATE`, have the following columns: (attribute_name VARCHAR2(4000), attribute_subname VARCHAR2(4000), conditional_operator CHAR(2)/=,<>,<,>,<=,>=/, attribute_num_value NUMBER, attribute_str_value VARCHAR2(4000), attribute_support NUMBER, attribute_confidence NUMBER)

Usage Notes

The table function pipes out rows of type DM_CLUSTER. For information about Data Mining data types and piped output from table functions, see "Data Types".

Examples

Assume you have built an OC model called my_oc_model. You can retrieve information from the model details as shown:

--View (unformatted) details from SQL*Plus 
SELECT T.id           clu_id,
       T.record_count rec_cnt,
       T.parent       parent,
       T.tree_level   tree_level
  FROM (SELECT *
          FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_OC(
                 'my_oc_model'))
        ORDER BY id) T
 WHERE ROWNUM < 11;

GET_MODEL_DETAILS_SVM Function

This table function returns a set of rows that provide the details of a Support Vector Machine model. This is applicable only for classification or regression models built using a linear kernel. For any other kernel, the table function returns ORA-40215.

In linear SVM models, only nonzero coefficients are stored. This reduces storage and speeds up model loading. As a result, if an attribute is missing in the coefficient list returned by GET_MODEL_DETAILS_SVM, then the coefficient of this attribute should be interpreted as zero.

Syntax

DBMS_DATA_MINING.GET_MODEL_DETAILS_SVM (
  model_name      IN       VARCHAR2)
 RETURN DM_SVM_LINEAR_COEFF_SET PIPELINED;

Parameters

Table 38-63 GET_MODEL_DETAILS_SVM Function Parameters

Parameter	Description
`model_name`	Name of the model in the form [schema_name.]model_name. If you do not specify a schema, your own schema is used.

Return Values

Table 38-64 GET_MODEL_DETAILS_SVM Function Return Values

Return Value Description

DM_SVM_LINEAR_COEFF_SET

Represents a set of rows of type DM_SVM_LINEAR_COEFF. The rows have the following columns:

(class            VARCHAR2(4000),
 attribute_set    DM_SVM_ATTRIBUTE_SET)

The attribute_set column returns a nested table of type DM_SVM_ATTRIBUTE_SET. The rows, of type DM_SVM_ATTRIBUTE, have the following columns:

(attribute_name      VARCHAR2(4000),
      attribute_subname   VARCHAR2(4000),
      attribute_value     VARCHAR2(4000),
      coefficient         NUMBER)

See Usage Notes.

Usage Notes

The table function pipes out rows of type DM_SVM_LINEAR_COEFF. For information on Data Mining data types and piped output from table functions, see "Data Types".

The class column of DM_SVM_LINEAR_COEFF represents classification target values. For regression targets, class is null. For each classification target value for classification models, a set of coefficients is returned. For binary classification, one-class classifier, and regression models, only a single set of coefficients is returned.

The attribute_value column in the nested table DM_SVM_ATTRIBUTE_SET is used for categorical attributes. The coefficient column is the linear coefficient value.

Examples

The following example demonstrates an SVM model build followed by an invocation of GET_MODEL_DETAILS_SVM table function from Oracle SQL:

-- Create SVM model
BEGIN
  dbms_data_mining.create_model(
    model_name           => 'SVM_Clas_sample',
    mining_function      => dbms_data_mining.classification,
    data_table_name      => 'svmc_sample_build_prepared',
    case_id_column_name  => 'id',
    target_column_name   => 'affinity_card',
    settings_table_name  => 'svmc_sample_settings');
END;
/
-- Display model details
SELECT *
  FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_SVM('SVM_Clas_sample'))
ORDER BY class;

GET_MODEL_DETAILS_XML Function

This table function returns an XML object that provides the details of a Decision Tree model.

Syntax

DBMS_DATA_MINING.GET_MODEL_DETAILS_XML (
         model_name      IN       VARCHAR2)
   RETURN XMLTYPE;

Parameters

Table 38-65 GET_MODEL_DETAILS_XML Function Parameters

Parameter	Description
`model_name`	Name of the model in the form [schema_name.]model_name. If you do not specify a schema, your own schema is used.

Return Values

Table 38-66 GET_MODEL_DETAILS_XML Function Return Value

Return Value	Description
`XMLTYPE`	The PMML 2.1 XML definition for the decision tree model.

Usage Notes

The function returns the XML representing the decision tree; the definition is the one specified in the Data Mining Group Predictive Model Markup Language (PMML) version 2.1 specification. The specification is available at http://www.dmg.org.

GET_MODEL_SETTINGS Function

The GET_MODEL_SETTINGS function is deprecated in 11g Release 1 (11.1). It is replaced with the data dictionary view, *_MINING_MODEL_SETTINGS. USER_, ALL_, and DBA_ versions of the view are available. See Oracle Database Reference.

Oracle recommends that you do not use deprecated procedures in new applications. Support for deprecated features is for backward compatibility only.

Note:

See "Deprecated Subprograms"

This table function returns the list of settings that were used to build the model.

Syntax

DBMS_DATA_MINING.GET_MODEL_SETTINGS(
   model_name           IN VARCHAR2)
 RETURN DM_MODEL_SETTINGS PIPELINED;

Parameters

Table 38-67 GET_MODEL_SETTINGS Function Parameters

Parameter	Description
`model_name`	Name of the model in the form [schema_name.]model_name. If you do not specify a schema, your own schema is used.

Return Values

Table 38-68 GET_MODEL_SETTINGS Function Return Values

Return Value	Description
`DM_MODEL_SETTINGS`	Represents a set of rows of type `DM_MODEL_SETTING`. The rows have the following columns: (setting_name VARCHAR2(30), setting_value VARCHAR2(128))

Usage Notes

The table function pipes out rows of type DM_MODEL_SETTING. For information about Data Mining data types and piped output from table functions, see "Data Types".

You can use this table function to determine the settings that were used to build the model. This is purely for informational purposes only — you cannot alter the model to adopt new settings.

Examples

Assume that you have built a classification model census_model using the Naive Bayes algorithm. You can retrieve the model settings using Oracle SQL as follows:

SELECT setting_name, setting_value
  FROM TABLE(DBMS_DATA_MINING.GET_MODEL_SETTINGS('census_model'));

GET_MODEL_SIGNATURE Function

The GET_MODEL_SIGNATURE function is deprecated in 11g Release 1 (11.1). It is replaced with the data dictionary view, *_MINING_MODEL_ATTRIBUTES. USER_, ALL_, and DBA_ versions of the view are available. See Oracle Database Reference.

Oracle recommends that you do not use deprecated procedures in new applications. Support for deprecated features is for backward compatibility only.

Note:

See "Deprecated Subprograms"

This table function returns the model signature, which lists the column attributes used to build the model and which should be present in the scoring data.

The case identifier is not considered a mining attribute. For classification and regression models, the target attribute is also not considered part of the model signature.

See Also:

Instead of using the GET_MODEL_SIGNATURE function, you can query the data dictionary view, ALL_MINING_MODEL_ATTRIBUTES. See the Oracle Database Reference.

Syntax

DBMS_DATA_MINING.GET_MODEL_SIGNATURE(
  model_name           IN VARCHAR2)
RETURN DM_MODEL_SIGNATURE PIPELINED;

Parameters

Table 38-69 GET_MODEL_SIGNATURE Function Parameters

Parameter	Description
`model_name`	Name of the model in the form [schema_name.]model_name. If you do not specify a schema, your own schema is used.

Return Values

Table 38-70 GET_MODEL_SIGNATURE Function Return Values

Return Value	Description
`DM_MODEL_SIGNATURE`	Represents a set of rows of type `DM_MODEL_SIGNATURE_ATTRIBUTE`. The rows have the following columns: (attribute_name VARCHAR2(30), attribute_type VARCHAR2(106))

Usage Notes

This table function pipes out rows of type DM_MODEL_SIGNATURE. For information on Data Mining data types and piped output from table functions, see "Data Types".

Examples

Assume that you have built a classification model census_model using the Naive Bayes algorithm. You can retrieve the model details using Oracle SQL as follows:

SELECT attribute_name, attribute_type
  FROM TABLE(DBMS_DATA_MINING.GET_MODEL_SIGNATURE('census_model');

GET_MODEL_TRANSFORMATIONS Function

This function returns the transformation definitions associated with a model.

The data is transformed and the transformation definitions are embedded in the model during the model training (create) process. The transformations can be automatically generated, user-specified, or both.

Syntax

DBMS_DATA_MINING.GET_MODEL_TRANSFORMATIONS (
      model_name     IN VARCHAR2)
RETURN DM_TRANSFORMS PIPELINED;

Parameters

Table 38-71 GET_MODEL_TRANSFORMATIONS Function Parameters

Parameter	Description
`model_name`	Name of the model in the form [schema_name.]model_name. If you do not specify a schema, your own schema is used.

Return Values

Table 38-72 GET_MODEL_TRANSFORMATIONS Function Return Values

Return Value	Description
`DM_TRANSFORMS`	Represents a set of rows of type `DM_TRANSFORM`. The rows have the following columns: (attribute_name VARCHAR2(4000), attribute_subname VARCHAR2(4000), expression CLOB, reverse_expression CLOB)

In addition to the transformation definitions, GET_MODEL_TRANSFORMATIONS returns the reverse transformation definitions. The reverse transformations are applied to the model attributes returned by the GET_MODEL_DETAILS functions. They are also applied to the target when a predictive model is applied. Reverse transformations enable model transparency, meaning that attribute information generated by the model is in its original, untransformed state, to the extent that it is possible.

GET_TRANSFORM_LIST Procedure

This function is used to convert a transformation specification from type DM_TRANSFORMS to type TRANSFORM_LIST.

DM_TRANSFORMS is used to output model transforms that are returned from the GET_MODEL_TRANSFORMATIONS function.

CREATE_MODEL and functions in the DBMS_DATA_MINING_TRANSFORM package accept TRANSFORM_LIST. To use them on the output of GET_MODEL_TANSFORMATIONS, you must first call GET_TRANSFORM_LIST to convert to TRANSFORM_LIST type.

Syntax

DBMS_DATA_MINING.GET_TRANSFORM_LIST (
      xform_list           OUT NOCOPY TRANSFORM_LIST,
      model_xforms         IN  DM_TRANSFORMS);

Parameters

Table 38-73 GET_TRANSFORM_LIST Procedure Parameters

Parameter	Description
`xform_list`	The transformation specification converted from `model_xform`s to type `TRANSFORM_LIST`.
`model_xforms`	A transformation specification returned by `GET_MODEL_TRANSFORMATIONS`

IMPORT_MODEL Procedure

This procedure imports the specified data mining models from a dump file set that was created with EXPORT_MODEL or with the expdp export utility. Both IMPORT_MODEL and EXPORT_MODEL use Oracle Data Pump technology.

See Also:

Oracle Data Mining Administrator's Guide for more information on model export and import.

Syntax

DBMS_DATA_MINING.IMPORT_MODEL (
    filename             IN  VARCHAR2,
    directory            IN  VARCHAR2,
    model_filter         IN  VARCHAR2 DEFAULT NULL,
    operation            IN  VARCHAR2 DEFAULT NULL,
    remote_link          IN  VARCHAR2 DEFAULT NULL,
    jobname              IN  VARCHAR2 DEFAULT NULL,
    schema_remap         IN  VARCHAR2 DEFAULT NULL);

Parameters

Table 38-74 IMPORT_MODEL Procedure Parameters

Parameter	Description
`filename`	Name of the dump file set from which the models should be imported. The dump file set must have been created by the `EXPORT_MODEL` procedure or the `expdp` export utility of Oracle Data Pump. The dump file set can contain one or more files. (Refer to "EXPORT_MODEL Procedure" for details.) If the dump file set contains multiple files, you can specify `'filename%U'` instead of listing them. For example, if your dump file set contains 3 files, `archive01.dmp`, `archive02.dmp`, and `archive03.dmp`, you can import them by specifying `'archive%U'`.
`directory`	Name of a pre-defined directory object that specifies where the dump file set is located. You must have read/write privileges on the directory object and on the file system directory that it identifies.
`model_filter`	Optional parameter that specifies one or more models to import. If you do not specify a value for `model_filter`, all models in the dump file set are imported. You can also specify `NULL` (the default) or 'ALL' to import all models. The value of `model_filter` can be one or more model names. The following are valid filters. 'mymodel1' 'name IN (''mymodel2'',''mymodel3'')' The first causes `IMPORT_MODEL` to import a single model named `mymodel1`. The second causes `IMPORT_MODEL` to import two models, `mymodel2` and `mymodel3`.
`operation`	Optional parameter that specifies whether to import the models or the SQL statements that create the models. By default, the models are imported. You can specify either of the following values for `operation`: `'IMPORT'` — Import the models (Default) `'SQL_FILE'`— Write the SQL DDL for creating the models to a text file. The text file is named `job_name.sql` and is located in the dump set directory.
`remote_link`	Optional parameter not used in this release. Set to `NULL`
`jobname`	Optional parameter that specifies the name of the import job. By default, the name has the form `username_imp_nnnn`, where `nnnn` is a number. For example, a job name in the `SCOTT` schema might be `SCOTT_imp_134`. If you specify a job name, it must be unique within the schema. The maximum length of the job name is 30 characters. A log file for the import job, named `jobname.log,` is created in the same directory as the dump file set.
`schema_remap`	Optional parameter for importing into a different schema. By default, models are exported and imported within the same schema. If the dump file set belongs to a different schema, you must specify a schema mapping in the form `export_user`:`import_user`. For example, you would specify `'SCOTT:MARY'` to import a model exported by `SCOTT` into the `MARY` schema. NOTE: In some cases, you may need to have the `IMPORT_FULL_DATABASE` privilege or the `SYS` role to import a model from a different schema.

Examples

This example shows a model being exported and imported within the schema dmuser2. Then the same model is imported into the dmuser3 schema. The dmuser3 user has the IMPORT_FULL_DATABASE privilege.

SQL>CONNECT dmuser2/dmuser2_psw
SQL> select model_name from user_mining_models;
 
MODEL_NAME
------------------------------
NMF_SH_SAMPLE
SVMO_SH_CLAS_SAMPLE
SVMR_SH_REGR_SAMPLE

-- export the model called NMF_SH_SAMPLE to a dump file in same schema
SQL>EXECUTE DBMS_DATA_MINING.EXPORT_MODEL ('NMF_SH_SAMPLE_out', 'DATA_PUMP_DIR',
                            'name = ''NMF_SH_SAMPLE''');
-- import the model back into the same schema
SQL>EXECUTE DBMS_DATA_MINING.IMPORT_MODEL ('NMF_SH_SAMPLE_out01.dmp', 
                            'DATA_PUMP_DIR', 'name = ''NMF_SH_SAMPLE''');

-- connect as different user
-- import same model into that schema
SQL>CONNECT dmuser3/dmuser3_psw
SQL>EXECUTE DBMS_DATA_MINING.IMPORT_MODEL ('NMF_SH_SAMPLE_out01.dmp', 
                            'DATA_PUMP_DIR', 'name = ''NMF_SH_SAMPLE''',
                            'IMPORT', NULL, 'nmf_imp_job', 'dmuser2:dmuser3');

The following example shows user MARY importing all models from a dump file, model_exp_001.dmp, which was created by user SCOTT. The dump file is located in the file system directory mapped to a directory object called DM_DUMP. If user MARY does not have IMPORT_FULL_DATABASE privileges, IMPORT_MODEL will raise an error.

-- import all models
DECLARE
  file_name       VARCHAR2(40);
BEGIN
  file_name := 'model_exp_001.dmp';
  DBMS_DATA_MINING.IMPORT_MODEL(
                filename=>file_name,
               directory=>'DM_DUMP',                 schema_remap=>'SCOTT:MARY');
  DBMS_OUTPUT.PUT_LINE(
'DBMS_DATA_MINING.IMPORT_MODEL of all models from SCOTT done!');
END;
/

RANK_APPLY Procedure

This procedure ranks the results of an APPLY operation based on a top-N specification for predictive and descriptive model results. For classification models, you can provide a cost matrix as input, and obtain the ranked results with costs applied to the predictions.

Syntax

DBMS_DATA_MINING.RANK_APPLY (
      apply_result_table_name        IN VARCHAR2,
      case_id_column_name            IN VARCHAR2,
      score_column_name              IN VARCHAR2,
      score_criterion_column_name    IN VARCHAR2,
      ranked_apply_table_name        IN VARCHAR2,
      top_N                          IN INTEGER DEFAULT 1,
      cost_matrix_table_name         IN VARCHAR2 DEFAULT NULL,
      apply_result_schema_name       IN VARCHAR2 DEFAULT NULL,
      cost_matrix_schema_name        IN VARCHAR2 DEFAULT NULL);

Parameters

Table 38-75 RANK_APPLY Procedure Parameters

Parameter	Description
`apply_result_table_name`	Name of the table or view containing the results of an `APPLY` operation on the test dataset (see Usage Notes)
`case_id_column_name`	Name of the case identifier column. This must be the same as the one used for generating `APPLY` results.
`score_column_name`	Name of the prediction column in the apply results table
`score_criterion_column_name`	Name of the probability column in the apply results table
`ranked_apply_result_tab_name`	Name of the table containing the ranked apply results
`top_N`	Top N predictions to be considered from the `APPLY` results for precision recall computation
`cost_matrix_table_name`	Name of the cost matrix table
`apply_result_schema_name`	Name of the schema hosting the `APPLY` results table
`cost_matrix_schema_name`	Name of the schema hosting the cost matrix table

Usage Notes

You can use RANK_APPLY to generate ranked apply results, based on a top-N filter and also with application of cost for predictions, if the model was built with costs.

The behavior of RANK_APPLY is similar to that of APPLY with respect to other DDL-like operations such as CREATE_MODEL, DROP_MODEL, and RENAME_MODEL. The procedure does not depend on the model; the only input of relevance is the apply results generated in a fixed schema table from APPLY.

The main intended use of RANK_APPLY is for the generation of the final APPLY results against the scoring data in a production setting. You can apply the model against test data using APPLY, compute various test metrics against various cost matrix tables, and use the candidate cost matrix for RANK_APPLY.

The schema for the apply results from each of the supported algorithms is listed in subsequent sections. The case_id column will be the same case identifier column as that of the apply results.

Classification Models — NB, ABN, SVM

For numerical targets, the ranked results table will have the definition as shown:

(case_id       VARCHAR2/NUMBER,
prediction     NUMBER,
probability    NUMBER,
cost           NUMBER,
rank           INTEGER)

For categorical targets, the ranked results table will have the following definition:

(case_id       VARCHAR2/NUMBER,
prediction     VARCHAR2,
probability    NUMBER,
cost           NUMBER,
rank           INTEGER)

Clustering using k-Means or O-Cluster

Clustering is an unsupervised mining function, and hence there are no targets. The results of an APPLY operation contains simply the cluster identifier corresponding to a case, and the associated probability. Cost matrix is not considered here. The ranked results table will have the definition as shown, and contains the cluster ids ranked by top-N.

(case_id       VARCHAR2/NUMBER,
cluster_id     NUMBER,
probability    NUMBER,
rank           INTEGER)

Feature Extraction using NMF

Feature extraction is also an unsupervised mining function, and hence there are no targets. The results of an APPLY operation contains simply the feature identifier corresponding to a case, and the associated match quality. Cost matrix is not considered here. The ranked results table will have the definition as shown, and contains the feature ids ranked by top-N.

(case_id        VARCHAR2/NUMBER,
feature_id      NUMBER,
match_quality   NUMBER,
rank            INTEGER)

Examples

BEGIN
/* build a model with name census_model.
 * (See example under CREATE_MODEL)
 */ 

/* if training data was pre-processed in any manner,
 * perform the same pre-processing steps on apply
 * data also.
 * (See examples in the section on DBMS_DATA_MINING_TRANSFORM)
 */

/* apply the model to data to be scored */
DBMS_DATA_MINING.RANK_APPLY(
  apply_result_table_name       => 'census_apply_result',
  case_id_column_name           => 'person_id',
  score_column_name             => 'prediction',
  score_criterion_column_name   => 'probability
  ranked_apply_result_tab_name  => 'census_ranked_apply_result',
  top_N                         => 3,
  cost_matrix_table_name        => 'census_cost_matrix');
END;
/

-- View Ranked Apply Results
SELECT *
  FROM census_ranked_apply_result;

REMOVE_COST_MATRIX Procedure

Removes the default scoring matrix from a classification model.

See Also:

"ADD_COST_MATRIX Procedure"
"REMOVE_COST_MATRIX Procedure"

Syntax

DBMS_DATA_MINING.REMOVE_COST_MATRIX (
      model_name   IN  VARCHAR2);

Parameters

Table 38-76 Remove_Cost_Matrix Procedure Parameters

Parameter	Description
`model_name`	Name of the model in the form [schema_name.]model_name. If you do not specify a schema, your own schema is used.

Usage Notes

If the model is not in your schema, then REMOVE_COST_MATRIX requires the ALTER ANY MINING MODEL system privilege or the ALTER object privilege for the mining model.

Example

The Naive Bayes model NB_SH_CLAS_SAMPLE has an associated cost matrix that can be used for scoring the model.

SQL>SELECT *
      FROM TABLE(dbms_data_mining.get_model_cost_matrix('nb_sh_clas_sample'))
      ORDER BY predicted, actual;
 
ACTUAL     PREDICTED        COST
---------- ---------- ----------
0          0                   0
1          0                   4
0          1                   1
1          1                   0

You can remove the cost matrix with REMOVE_COST_MATRIX.

SQL>EXECUTE dbms_data_mining.remove_cost_matrix('nb_sh_clas_sample');

SQL>SELECT *
      FROM TABLE(dbms_data_mining.get_model_cost_matrix('nb_sh_clas_sample'))
      ORDER BY predicted, actual;

no rows selected

RENAME_MODEL Procedure

This procedure renames a mining model to a new name that you specify.

The model name is in the form [schema_name.]model_name. If you do not specify a schema, your own schema is used. For mining model naming restrictions, see

Syntax

DBMS_DATA_MINING.RENAME_MODEL (
     model_name            IN VARCHAR2,
     new_model_name        IN VARCHAR2);

Parameters

Table 38-77 RENAME_MODEL Procedure Parameters

Parameter	Description
`model_name`	Old name of the model
`new_model_name`	New name of the model.

Usage Notes

If an APPLY operation is using a model, and you attempt to rename the model during that time, the RENAME will succeed and APPLY will return indeterminate results.

Examples

Assume the existence of a model census_model. The following example shows how to rename this model.

BEGIN
  DBMS_DATA_MINING.RENAME_MODEL(
    model_name      => 'census_model',
    new_model_name  => 'census_new_model');
END;
/