Oracle® Database PL/SQL Packages and Types Reference 11g Release 1 (11.1) Part Number B28419-01 |
|
|
View PDF |
The DBMS_DATA_MINING_TRANSFORM
package contains transformation routines that can be used to prepare data for mining.
See Also:
Oracle Data Mining Concepts for information about automatic and embedded data transformation in Oracle Data Mining.
This chapter contains the following topics:
Using DBMS_DATA_MINING_TRANSFORM
Data Transformation with Oracle Data Mining
Types of Transformations
The DBMS_DATA_MINING_TRANSFORM Framework
Data Types
This section contains topics that relate to using the DBMS_DATA_MINING_TRANSFORM
package.
A transformation is a SQL expression that modifies one or more of the columns used in data mining.
Most data mining algorithms require some form of data transformation, and many columns of data can be affected. Data sets used for creating, testing, and scoring the model must all undergo the same transformations. To replicate these changes in each data set can be very time consuming.
Oracle Data Mining embeds transformation expressions in the model. The transformations used to create the model are automatically performed on any data sets to which the model is applied. This ability to deploy a model together with its transformations significantly enhances the model's usability while simplifying the process of model management.
Oracle Data Mining further simplifies the data transformation process by offering an Automatic Data Preparation mode (ADP). When you create a model using ADP, the training data is automatically transformed according to the requirements of the algorithm; the automatic transformations are embedded in the model. To enable ADP, set the PREP_AUTO
setting to PREP_AUTO_ON
in the settings table for the model.
You can supplement automatic transformations by passing a list of transformation expressions to DBMS_DATA_MINING.CREATE_MODEL
. Your transformations are embedded in the model along with the automatic transformations. If ADP is turned off, only the transformations that you pass to CREATE_MODEL
are embedded.
See Also:
"CREATE_MODEL Procedure" in Chapter 38, "DBMS_DATA_MINING" for information about the transformation list parameterThe routines in DBMS_DATA_MINING_TRANSFORM
are convenience routines to assist you in creating your own transformations. If these routines do not entirely suit your needs, you can write SQL to modify them, or you can write your own routines.
Note on Notation:
This chapter uses standard interval notation for number sets:
[a,b] is the set of all real numbers greater than or equal to a and less than or equal to b
[a,b) is the set of all real numbers greater than or equal to a and less than b.
(b is in the set [a,b]; b is not in the set [a,b).)
Subscripts do not conform to standard notation; instead "X_N" is used for "XN".
The DBMS_DATA_MINING_TRANSFORM
package supports transformations that are typically used in data mining: binning, normalization, outlier treatment, and missing value treatment.
Binning involves mapping both continuous and discrete values to discrete values of reduced cardinality. For example, the age of persons can be binned into discrete numeric bins: 1-20 to 1, 21-40 to 2, and so on. Popular car manufacturers such as Ford, Chrysler, BMW, Volkswagen can be binned into discrete categorical bins: {Ford, Chrysler} to US_Car_Makers, and {BMW, Volkswagen} to European_Car_Makers.
DBMS_DATA_MINING_TRANSFORM
supports binning for both categorical and numerical attributes. Categorical attributes have VARCHAR2/CHAR
data types; numerical attributes have NUMBER
data types.
The bin contents are groupings calculated by a Decision Tree model.
The bin contents are based on the frequency of each value in the column. The number of bins is user-specified. The first bin contains the most frequent values, the second bin contains the second-most frequent values, and so on. The final bin contains all the remaining values.
Each bin contains the same number of values. The number of bins is user-specified or estimated from the data.The first bin contains the highest values, the second bin contains the next highest values, and so on.
Each bin contains the numbers in a quantile that is computed using the SQL NTILE function. Quantiles are partitions of a data distribution. For example, percentiles are quantiles that divide a data distribution into 100 partitions.
Normalization is the process of scaling continuous values down to a specific range, typically between zero and one. Normalization applies only to numerical attributes.
Normalization transforms each number by subtracting (shifting) it by a given quantity and dividing the result by the scale. For example:
x_new = (x_old-shift)/scale
The normalization definition for each attribute is computed based on the minimum and maximum values of the data. The values for shift and scale are:
shift = min scale = max-min
The normalization definition for each attribute is computed based on the minimum and maximum values of the data. The values for shift and scale are:
shift = 0 scale = max{abs(max), abs(min)}
The normalization definition for each attribute is computed based on the values for mean and standard deviation that are computed from the data. The values for shift and scale are:
shift = mean scale = standard_deviation
Some computations on attribute values can be significantly affected by extreme values. One approach to achieving a more robust computation is to either winsorize or trim (clip) the data as a preprocessing step.
Winsorizing involves setting the tail values of a particular attribute to some specified value. For example, for a 90% winsorization, the bottom 5% are set equal to the minimum value in the 6th percentile, while the upper 5% are set equal to the value corresponding to the maximum value in the 95th percentile.
Trimming sets the tail values to NULL.
Oracle Data Mining distinguishes between sparse data and data that contains random missing values. The latter means that some attribute values are unknown. Sparse data, on the other hand, contains values that are assumed to be known, although they are not represented in the data.
NULLs in columns with simple data types (not nested) are assumed to represent random missing values. Values that are not present in nested columns indicate sparsity. Oracle Data Mining replaces numerical data that is missing at random with the mean and categorical data that is missing at random with the mode. Missing numerical values in sparse data are replaced with zeros. Missing categorical values in sparse data are replaced with a zero vector or a special value DM$SPARSE
, depending on the algorithm.
DBMS_DATA_MINING_TRANSFORM
provides routines for handling missing values. These routines are available for anyone who wishes to manage missing value treatment explicitly, instead of relying on the default treatment.
See Also:
Oracle Data Mining Application Developer's Guide for more information about missing value treatment by Oracle Data MiningDBMS_DATA_MINING_TRANSFORM
provides a framework for defining transformations. The framework is implemented with CREATE
, INSERT
, STACK
, and XFORM
operations. You can modify the output or implement your own version of any of these operations.
The DBMS_DATA_MINING_TRANSFORM
framework consists of:
CREATE routines that create transformation definition tables.
The tables have the columns needed for specifying any of the following transformations:
Binning (categorical and numerical)
Column removal
Normalization
Outlier treatment
Missing value treatment
INSERT routines that populate the definition tables with information necessary for transforming selected columns of input data.
You can invoke CREATE
and INSERT
routines several times to transform all relevant columns from various data sources until the definition tables fully represents all the data attributes for a given mining problem.
STACK routines that generate SQL expressions based on the definition tables.
STACK
routines add the expressions to a transformation list. The transformation list can be passed to the CREATE_MODEL
procedure or to other STACK
routines. (See "CREATE_MODEL Procedure".)
XFORM routines that generate views based on the definition tables.
The views are suitable for mining (creating, testing, or scoring a model).
DBMS_DATA_MINING_TRANSFORM
defines the data types described in this section.
This type is used to store quoted and non-quoted identifiers for column names.
TYPE COLUMN_LIST IS VARRAY(1000) OF VARCHAR2(32);
This type holds the results of describe operations.
SUBTYPE DESCRIBE_LIST IS dbms_sql.DESC_TAB2;
This type is used for storing transformation expressions.
TYPE EXPRESSION_REC IS RECORD ( ...........);,
Unlike VARCHAR2
, EXPRESSION_REC
can be used for expressions that can grow larger than 32K.
Use SET_EXPRESSION
to write to an EXPRESSION_REC
. Use GET_EXPRESSION
to retrieve the contents of an EXPRESSION_REC
.
This type is used for storing the in-memory transformation stack definition for a single attribute.
TYPE TRANSFORM_REC IS RECORD ( attribute_name VARCHAR2(30), attribute_subname VARCHAR2(4000), expression EXPRESSION_REC, reverse_expression EXPRESSION_REC, attribute_spec VARCHAR2(4000));
The attribute_spec
field is used to specify additional information and actions for an attribute. You can specify the value NOPREP
for attribute_spec
.
NOPREP
disables ADP for an individual attribute. NOPREP
can be used in the transformation list passed to DBMS_DATA_MINING.CREATE_MODEL
. (See "CREATE_MODEL Procedure" .)
This type is used for storing the in-memory stack definition.
TYPE TRANSFORM_LIST IS TABLE OF TRANSFORM_REC;
These constants are used for specifying data types of nested columns.
NEST_NUM_COL_TYPE CONSTANT NUMBER := 100001; NEST_CAT_COL_TAYPE CONSTANT NUMBER := 100002;
Table 39-1 DBMS_DATA_MINING_TRANSFORM Package Subprograms
Subprogram | Purpose |
---|---|
CREATE_BIN_CAT Procedure |
Creates a categorical bin definition table |
CREATE_BIN_NUM Procedure |
Creates a numerical bin definition table |
CREATE_CLIP Procedure |
Creates a clipping definition table |
CREATE_COL_REM Procedure |
Creates a column removal definition table |
CREATE_MISS_CAT Procedure |
Creates a categorical missing value treatment definition table |
CREATE_MISS_NUM Procedure |
Creates a numerical missing value treatment definition table |
CREATE_NORM_LIN Procedure |
Creates a normalization definition table |
DESCRIBE_STACK Procedure |
Describes the stack definition |
INSERT_AUTOBIN_NUM_EQWIDTH Procedure | Populates the numerical bin definition table, using the number of bins estimated from the data |
GET_EXPRESSION Function |
Returns a transformation expression |
INSERT_BIN_CAT_FREQ Procedure | Populates the categorical bin definition table, applying frequency-based binning to the categorical input data |
INSERT_BIN_NUM_EQWIDTH Procedure |
Populates the numerical bin definition table, applying equi-width binning to the numerical input data |
INSERT_BIN_NUM_QTILE Procedure |
Populates the numerical bin definition table, applying quantile binning to the numerical input data |
INSERT_BIN_SUPER Procedure |
Populates a binning definition table, applying supervised binning to both numerical and categorical data |
INSERT_CLIP_TRIM_TAIL Procedure |
Populates the clipping definition table, applying trimming based on tail fraction to the numerical input data |
INSERT_CLIP_WINSOR_TAIL Procedure |
Populates the clipping definition table, applying Winsorizing based on tail fraction to the numerical input data |
INSERT_MISS_CAT_MODE Procedure |
Populates the categorical missing value treatment definition table, applying the mode to each missing value |
INSERT_MISS_NUM_MEAN Procedure |
Populates the numerical missing value treatment definition table, applying the mean to each missing value |
INSERT_NORM_LIN_MINMAX Procedure |
Populates the normalization definition table, applying min-max normalization to the numerical input data |
INSERT_NORM_LIN_SCALE Procedure | Populates the normalization definition table, applying scale normalization to the numerical input data |
INSERT_NORM_LIN_ZSCORE Procedure |
Populates the normalization definition table applying z-score normalization to the numerical input data |
SET_TRANSFORM Procedure |
Specifies a transformation expression |
STACK_BIN_CAT Procedure |
Add categorical binning expressions to the transformation list |
STACK_BIN_NUM Procedure |
Add numerical binning expressions to the transformation list |
STACK_CLIP Procedure |
Add clipping expressions to the transformation list |
STACK_COL_REM Procedure |
Add column removal expressions to the transformation list |
STACK_MISS_CAT Procedure |
Add categorical missing value treatment expressions to the transformation list |
STACK_MISS_NUM Procedure |
Add numerical missing value treatment expressions to the transformation list |
STACK_NORM_LIN Procedure |
Add linear normalization expressions to the transformation list |
XFORM_BIN_CAT Procedure |
Creates a view that applies a categorical binning transformation to the data |
XFORM_BIN_NUM Procedure |
Creates a view that applies a numerical binning transformation to the data |
XFORM_CLIP Procedure |
Creates a view that applies a clipping transformation to the data |
XFORM_COL_REM Procedure |
Creates a view that removes the specified columns from the data |
XFORM_EXPR_NUM Procedure |
Creates a view that applies a transformation expression to a numerical column in the data |
XFORM_EXPR_STR Procedure |
Creates a view that applies a transformation expression to a character column in the data |
XFORM_MISS_CAT Procedure |
Creates a view that performs missing value treatment on the categorical attributes in the data |
XFORM_MISS_NUM Procedure |
Creates a view that performs missing value treatment on the numerical attributes in the data |
XFORM_NORM_LIN Procedure |
Creates a view that performs linear normalization to numerical attributes in the data |
XFORM_STACK Procedure |
Creates a view that transforms the data as specified by the stack definition (transformation list) |
This procedure creates a categorical binning definition table. This table is used as input to the INSERT_BIN_CAT_FREQ
and XFORM_BIN_CAT
procedures.
Syntax
DBMS_DATA_MINING_TRANSFORM.CREATE_BIN_CAT ( bin_table_name IN VARCHAR2, bin_schema_name IN VARCHAR2 DEFAULT NULL);
Parameters
Table 39-2 CREATE_BIN_CAT Procedure Parameters
Parameter | Description |
---|---|
bin_table_name |
Name of the bin definition table |
bin_schema_name |
Name of the schema hosting the bin definition table |
Usage Notes
The generated bin definition table will have the following columns.
Column Name | Data Type |
---|---|
col |
VARCHAR2(30) |
att |
VARCHAR2(4000) |
val |
VARCHAR2(4000) |
bin |
VARCHAR2(4000) |
Examples
BEGIN DBMS_DATA_MINING_TRANSFORM.CREATE_BIN_CAT('build_bin_cat_table'); END; / DESCRIBE build_bin_cat_table Name Null? Type ----------------------------------------- -------- ---------------------------- COL VARCHAR2(30) ATT VARCHAR2(4000) VAL VARCHAR2(4000) BIN VARCHAR2(4000)
This procedure creates a numerical binning definition table. This table is used as input to the INSERT_BIN_NUM_EQWIDTH
, INSERT_BIN_NUM_QTILE
, INSERT_AUTOBIN_NUM_EQWIDTH
, and XFORM_BIN_NUM
procedures.
Syntax
DBMS_DATA_MINING_TRANSFORM.CREATE_BIN_NUM ( bin_table_name IN VARCHAR2, bin_schema_name IN VARCHAR2 DEFAULT NULL);
Parameters
Table 39-3 CREATE_BIN_NUM Procedure Parameters
Parameter | Description |
---|---|
bin_table_name |
Name of the bin definition table |
bin_schema_name |
Name of the schema hosting the bin definition table |
Usage Notes
The generated bin definition table will have the following columns.
Column Name | Data Type |
---|---|
col |
VARCHAR2(30) |
att |
VARCHAR2(4000) |
val |
NUMBER |
bin |
VARCHAR2(4000) |
Examples
BEGIN DBMS_DATA_MINING_TRANSFORM.CREATE_BIN_NUM('build_bin_num_table'); END; / DESCRIBE build_bin_num_table Name Null? Type ----------------------------------------- -------- ---------------------------- COL VARCHAR2(30) ATT VARCHAR2(4000) VAL NUMBER BIN VARCHAR2(4000)
This procedure creates a clipping definition table. This table is used as input to the INSERT_CLIP_WINSOR_TAIL
, INSERT_CLIP_TRIM_TAIL
, and XFORM_CLIP
procedures.
Syntax
DBMS_DATA_MINING_TRANSFORM.CREATE_CLIP ( clip_table_name IN VARCHAR2, clip_schema_name IN VARCHAR2 DEFAULT NULL);
Parameters
Table 39-4 CREATE_CLIP Procedure Parameters
Parameter | Description |
---|---|
clip_table_name |
Name of the clipping definition table |
clip_schema_name |
Name of the schema hosting the clipping definition table |
Usage Notes
The generated clipping definition table will have the following columns.
Column Name | Data Type |
---|---|
col |
VARCHAR2(30) |
att |
VARCHAR2(4000) |
lcut |
NUMBER |
lval |
NUMBER |
rcut |
NUMBER |
rval |
NUMBER |
Examples
BEGIN DBMS_DATA_MINING_TRANSFORM.CREATE_CLIP('build_clip_table'); END; / DESCRIBE build_clip_table Name Null? Type ----------------------------------------- -------- ---------------------------- COL VARCHAR2(30) ATT VARCHAR2(4000) LCUT NUMBER LVAL NUMBER RCUT NUMBER RVAL NUMBER
This procedure creates a column removal definition table. This table is used by the STACK_COL_REM
, XFORM_COL_REM
, INSERT_BIN_SUPER
, and INSERT_AUTOBIN_NUM_EQWIDTH
procedures.
Syntax
DBMS_DATA_MINING_TRANSFORM.CREATE_COL_REM ( rem_table_name VARCHAR2, rem_schema_name VARCHAR2 DEFAULT NULL);
Parameters
Table 39-5 CREATE_COL_REM Procedure Parameters
Parameter | Description |
---|---|
rem_table_name |
Name of the column removal definition table. |
rem_schema_name |
Schema of the column removal definition table. |
Usage Notes
The generated column removal definition table has the following columns.
Column Name | Data Type |
---|---|
col |
VARCHAR2(30) |
att |
VARCHAR2(4000) |
The column removal definition table lists the column names and the associated attribute names to remove from the data source. The col
column is case sensitive, since it generates quoted identifiers. There can be multiple entries in the column removal definition table for the same col
when it is a nested column.
Examples
BEGIN DBMS_DATA_MINING_TRANSFORM.CREATE_COL_REM ('col_rem_tbl'); END; / DESCRIBE col_rem_tbl Name Null? Type ----------------------------------------- -------- ---------------------------- COL VARCHAR2(30) ATT VARCHAR2(4000)
This procedure creates a categorical missing value treatment definition table. This table is used as input to the INSERT_MISS_CAT_MODE
procedure.
Syntax
DBMS_DATA_MINING_TRANSFORM.CREATE_MISS_CAT ( miss_table_name IN VARCHAR2, miss_schema_name IN VARCHAR2 DEFAULT NULL);
Parameters
Table 39-6 CREATE_MISS_CAT Procedure Parameters
Parameter | Description |
---|---|
miss_table_name |
Name of the categorical missing value treatment definition table |
miss_schema_name |
Name of the schema hosting the categorical missing value treatment definition table |
Usage Notes
The generated categorical missing value treatment definition table will have the following columns.
Column Name | Data Type |
---|---|
col |
VARCHAR2(30) |
att |
VARCHAR2(4000) |
val |
VARCHAR2(4000) |
Examples
BEGIN
DBMS_DATA_MINING_TRANSFORM.CREATE_MISS_CAT('build_miss_cat_table'); END; / DESCRIBE build_miss_cat_table Name Null? Type ----------------------------------------- -------- ---------------------------- COL VARCHAR2(30) ATT VARCHAR2(4000) VAL VARCHAR2(4000)
This procedure creates a numerical missing value treatment definition table. This table is used as input to the INSERT_MISS_NUM_MEAN
procedure.
Syntax
DBMS_DATA_MINING_TRANSFORM.CREATE_MISS_NUM ( miss_table_name IN VARCHAR2, miss_schema_name IN VARCHAR2 DEFAULT NULL);
Parameters
Table 39-7 CREATE_MISS_NUM Procedure Parameters
Parameter | Description |
---|---|
miss_table_name |
Name of the numeric missing value treatment definition table |
miss_schema_name |
Name of the schema hosting the numeric missing value treatment definition table |
Usage Notes
The generated numeric missing value definition table will have the following columns.
Column Name | Data Type |
---|---|
col |
VARCHAR2(30) |
att |
VARCHAR(4000) |
val |
NUMBER |
Example
BEGIN DBMS_DATA_MINING_TRANSFORM.CREATE_MISS_NUM('build_miss_num_table'); END; / DESCRIBE build_miss_num_table Name Null? Type ----------------------------------------- -------- ---------------------------- COL VARCHAR2(30) ATT VARCHAR2(4000) VAL NUMBER
This procedure creates a linear normalization definition table. This table is used as input to the INSERT_NORM_LIN_MINMAX
, INSERT_NORM_LIN_SCALE
, INSERT_NORM_LIN_ZSCORE
, and XFORM_NORM_LIN
procedures.
Syntax
DBMS_DATA_MINING_TRANSFORM.CREATE_NORM_LIN ( norm_table_name IN VARCHAR2, norm_schema_name IN VARCHAR2 DEFAULT NULL);
Parameters
Table 39-8 CREATE_NORMALIZE_LIN Procedure Parameters
Parameter | Description |
---|---|
norm_table_name |
Name of the normalization definition table |
norm_schema_name |
Name of the schema hosting the normalization definition table |
Usage Notes
The generated linear normalization definition table will have the following columns.
Column Name | Data Type |
---|---|
col |
VARCHAR2(30) |
att |
VARCHAR2(4000) |
shift |
NUMBER |
scale |
NUMBER |
Examples
BEGIN DBMS_DATA_MINING_TRANSFORM.CREATE_NORM_LIN('build_norm_table'); END; / DESCRIBE build_norm_table Name Null? Type ----------------------------------------- -------- ---------------------------- COL VARCHAR2(30) ATT VARCHAR2(4000) SHIFT NUMBER SCALE NUMBER
This procedure describes the transformation of the data table specified by the stack definition.
Only the columns that are specified in the stack definition are transformed. The remaining columns do not change.
All columns in the stack definition are applied. Columns with NULL in the expression field are excluded. Columns in the stack definition that do not have a matching column in the data are added to the describe list.
Syntax
DBMS_DATA_MINING_TRANSFORM.DESCRIBE_STACK ( xform_list IN TRANSFORM_LIST, data_table_name IN VARCHAR2, describe_list OUT DESCRIBE_LIST data_schema_name IN VARCHAR2 DEFAULT NULL);
Parameters
Table 39-9 DESCRIBE_STACK Procedure Parameters
Parameter | Description |
---|---|
xform_list |
List of stacked transformations |
data_table_name |
Table that contains the mining data |
describe_list |
Descriptions of stacked transformations |
data_schema_name |
Schema of table that contains the mining data. If you do not specify a schema, your own schema is used. |
Usage Notes
For DM_NESTED_NUMERICALS
, the value of col_type
is nest_num_col_type
. For DM_NESTED_CATEGORICALS
, the value of col_type
is nest_cat_col_type
.
This procedure returns a VARCHAR2
chunk from the expression.
Chunk numbering starts with one. For chunks outside of the range, the return value is NULL. When a chunk number is NULL the whole expression is returned as a string. If the expression is too big, a VALUE_ERROR
is raised.
Syntax
DBMS_DATA_MINING_TRANSFORM.GET_EXPRESSION ( expression IN EXPRESSION_REC, chunk_num IN PLS_INTEGER DEFAULT NULL); RETURN VARCHAR2;
Parameters
Table 39-10 GET_EXPRESSION Function Parameters
Parameter | Description |
---|---|
expression |
Transformation expression |
chunk_num |
Number of the chunk |
This procedure finds the numerical binning definition for every numerical column in the data table that is not specified in the exclusion list and inserts the definition into the numerical binning definition table that was created using CREATE_BIN_NUM
. Based on the statistical information it collects on the input data, this procedure calculates the number of bins.
Definition for each relevant column is computed based on the minimum and maximum values that are computed from the data table.
N, the number of bins, is computed for each column separately and is based on the number of non-NULL
values (cnt
), the maximum (max
), the minimum (min
), the standard deviation (dev
) and the constant C=3.49/0.9 as follows:
N=floor(power(cnt,1/3)*(max-min)/(c*dev))
Each of the bin_num
(= N) bins bin_1,..., bin_N span ranges of equal width inc = (max – min) / N where bin_I = I when N > 0 or bin_I = N+1–I when N < 0, and bin_0 = bin_(N+1) = NULL
. The values of the val
column are rounded to round_num
significant digits prior to scoring them in the definition table.
The parameter bin_num
is used to adjust N to be at least bin_num
. No adjustment is done when bin_num
is NULL
or 0. The parameter max_bin_num
is used to adjust N to be at most max_bin_num
. No adjustment is done when bin_num
is NULL
or 0. For columns with all integer values (discrete columns), N is adjusted to be at most the maximum number of distinct values in the observed range max-min+1
.
The parameter sample_size
is used to adjust cnt
to be at most sample_size
. No adjustment is done when sample_size
is NULL
or 0.
Syntax
DBMS_DATA_MINING_TRANSFORM.INSERT_AUTOBIN_NUM_EQWIDTH ( bin_table_name IN VARCHAR2, data_table_name IN VARCHAR2, bin_num IN PLS_INTEGER DEFAULT 3, max_bin_num IN PLS_INTEGER DEFAULT 100, exclude_list IN Column_List DEFAULT NULL, round_num IN PLS_INTEGER DEFAULT 6, sample_size IN PLS_INTEGER DEFAULT 50000, bin_schema_name IN VARCHAR2 DEFAULT NULL, data_schema_name IN VARCHAR2 DEFAULT NULL, rem_table_name IN VARCHAR2 DEFAULT NULL, rem_schema_name IN VARCHAR2 DEFAULT NULL));
Parameters
Table 39-11 INSERT_AUTOBIN_EQWIDTH Procedure Parameters
Parameter | Description |
---|---|
bin_table_name |
Name of the categorical bin table generated using CREATE_BIN_NUM procedure |
data_table_name |
Name of the table containing the data |
bin_num |
Minimum number of bins; default number is 3 |
max_bin_num |
Maximum number of bins that sets the upper limit for estimates of bin numbers; default is 100 |
exclude_list |
List of columns (attributes) to be excluded from this iteration of the binning process; categorical attributes are automatically excluded |
round_num |
Number of significant digits; default is 6 |
sample_size |
Size of the data sample; default is 50,000 |
bin_schema_name |
Name of the schema hosting the bin definition table; default is user schema |
data_schema_name |
Name of the schema hosting the table with data; default is user schema |
rem_table_name |
Name of the column removal definition table created with CREATE_COL_REM . |
rem_schema_name |
Schema of the column removal definition table. |
Usage Notes
For a given input table, you can call this routine several times with different specifications for number of bins for a given input table. For each call, you can selectively exclude attributes (that is, column names) using the exclude_list
parameter for a particular binning specification.
Columns with all NULL
values or only one unique value are ignored. The sign of bin_num
, max_bin_num
, and sample_size
have no effect on the result; absolute values are used. The value adjustment of N is done in the following order: First bin_num
, next, max_bin_num
, and, finally, discrete column adjustment.
Examples
The simplest invocation of this routine populates bin definitions in the num_bin_table
for all the numerical attributes found in build_table
.
BEGIN
DBMS_DATA_MINING_TRANSFORM.INSERT_AUTOBIN_NUM_EQUIWIDTH( 'num_bin_table', 'build_data_table');
END;
/
This procedure finds the categorical binning definition for every VARCHAR2
and CHAR
column in the data table that is not specified in the exclusion list and inserts the definition into the categorical binning definition table created using CREATE_BIN_CAT
.
Definition for each relevant column is computed based on the occurrence frequency of column values that are computed from the data table. Each of the bin_num(N) bins bin_1, ..., bin_N corresponds to the values with top frequencies when N > 0 or bottom frequencies when N < 0, and bin_(N+1) to all remaining values, where bin_I = I. Ordering ties among identical frequencies are broken by ordering on column values (ASC for N > 0 or DESC for N < 0). When the number of distinct values C < N only C+1 bins will be created.
The parameter default_num
(D) is used for pruning based on the number of values that fall into the default bin. When D > 0 only columns that have at least D defaults are kept while others are ignored. When D < 0 only columns that have at most D values are kept. No pruning is done when D is NULL
or D = 0. Parameter bin_support (SUP) is used for restricting bins to frequent (SUP > 0) values frq >= SUP*tot, or infrequent (SUP < 0) ones frq <= –SUP*tot, where frq is a given value count and tot is a sum of all counts as computed from the data. No support filtering is done when SUP is NULL
or when SUP = 0.
Syntax
DBMS_DATA_MINING_TRANSFORM.INSERT_BIN_CAT_FREQ ( bin_table_name IN VARCHAR2, data_table_name IN VARCHAR2, bin_num IN PLS_INTEGER DEFAULT 9, exclude_list IN Column_List DEFAULT NULL, default_num IN PLS_INTEGER DEFAULT 2, bin_support NUMBER DEFAULT NULL, bin_schema_name IN VARCHAR2 DEFAULT NULL, data_schema_name IN VARCHAR2 DEFAULT NULL);
Parameters
Table 39-12 INSERT_BIN_CAT_FREQ Procedure Parameters
Parameter | Description |
---|---|
bin_table_name |
Name of the categorical bin table generated using CREATE_BIN_CAT procedure |
data_table_name |
Name of the table containing the data |
bin_num |
Number of bins |
exclude_list |
List of columns (attributes) to be excluded from this iteration of the binning process |
default_num |
Number of default values |
bin_support |
Bin support as a fraction |
bin_schema_name |
Name of the schema hosting the bin definition table |
data_schema_name |
Name of the schema hosting the table with data |
Usage Notes
For a given input table, you can iteratively call this routine several times with different specifications for number of bins for a given input table. For each iteration, you can selectively exclude attributes (that is, column names) using the exclude_list
parameter for a particular binning specification.
Columns with all NULL
s are ignored. No bin definitions are populated when bin_num = 0, or bin_num, is NULL
.
Examples
The simplest invocation of this routine populates bin definitions in the cat_bin_table
for all the categorical attributes found in build_table
.
BEGIN
DBMS_DATA_MINING_TRANSFORM.INSERT_BIN_NUM( 'cat_bin_table', 'build_table');
END; /
This procedure finds the numerical binning definition for every NUMBER
column in the data table that is not specified in the exclusion list and inserts the definition into the numerical binning definition table that was created using CREATE_BIN_NUM
.
Definition for each relevant column is computed based on the minimum and maximum values that are computed from the data table. Each of the bin_num (= N) bins bin_1,..., bin_N span ranges of equal width inc = (max – min) / N where bin_I = I when N > 0 or bin_I = N+1–I when N < 0, and bin_0 = bin_(N+1) = NULL
.
The values of the val
column in the bin definition table are rounded to round_num
significant digits. For more information, see the Usage Notes.
Syntax
DBMS_DATA_MINING_TRANSFORM.INSERT_BIN_NUM_EQWIDTH ( bin_table_name IN VARCHAR2, data_table_name IN VARCHAR2, bin_num IN PLS_INTEGER DEFAULT 10, exclude_list IN Column_List DEFAULT NULL, round_num IN PLS_INTEGER DEFAULT 6, bin_schema_name IN VARCHAR2 DEFAULT NULL, data_schema_name IN VARCHAR2 DEFAULT NULL);
Parameters
Table 39-13 INSERT_BIN_EQWIDTH Procedure Parameters
Parameter | Description |
---|---|
bin_table_name |
Name of the numerical bin table generated using CREATE_BIN_NUM procedure |
data_table_name |
Name of the table containing the data |
bin_num |
Number of bins |
exclude_list |
List of columns (attributes) to be excluded from this iteration of the binning process |
round_num |
Number of significant digits. See Usage Notes. |
bin_schema_name |
Name of the schema hosting the bin definition table |
data_schema_name |
Name of the schema hosting the table with data |
Usage Notes
For a given input table, you can iteratively call this routine with different specifications for number of bins for a given input table. For each iteration, you can selectively exclude attributes (that is, column names) using the exclude_list
parameter for a particular binning specification.
Columns with all NULL
values or only one unique value are ignored. No bin definitions are populated when bin_num
= 0, or bin_num
is NULL
.
For example, when N=2, col='mycol', min=10, and max = 21, the following three rows are inserted into the definition table (inc = 5.5):
COL VAL BIN ----- ----- ----- mycol 10 NULL mycol 15.5 1 mycol 21 2
The round_num
parameter specifies how to round the number in the VAL
column of the definition table. When round_num
is positive, it specifies the most significant digits to retain. When round_num
is negative, it specifies the least significant digits to remove. In both cases, the result is rounded to the specified number of digits. When round_num
is 0, the value is unchanged.
For example, a value of 308.162 would be rounded as follows.
For a value of 308.162: when round_num = 1 result is 300 when round_num = 2 result is 310 when round_num = 3 result is 308 when round_num = 0 result is 308.162 when round_num = -1 result is 308.16 when round_num = -2 result is 308.2 when round_num = NULL result is NULL
Examples
The simplest invocation of this routine populates bin definitions in the num_bin_table
for all the numerical attributes found in build_table
.
BEGIN
DBMS_DATA_MINING_TRANSFORM.INSERT_BIN_NUM( 'num_bin_table', 'build_table');
END;
/
This procedure finds a numerical binning definition for every NUMBER
column in the data table that is not specified in the exclusion list and inserts the definition into the binning definition table that was created using CREATE_BIN_NUM
.
The definition for each relevant column is computed based on the minimum values for each quantile, where quantiles are computed from the data using NTILE
function. Bins bin_1,..., bin_N span the following ranges: bin_1 spans [min_1,min_2]; bin_2,..., bin_i,..., bin_N-1 span (min_i, min_(i+1)] and bin_N spans (min_N, max_N]. Bins with equal left and right boundaries are collapsed.
Syntax
DBMS_DATA_MINING_TRANSFORM.INSERT_BIN_NUM_QTILE ( bin_table_name IN VARCHAR2, data_table_name IN VARCHAR2, bin_num IN PLS_INTEGER DEFAULT 10, exclude_list IN Column_List DEFAULT NULL, bin_schema_name IN VARCHAR2 DEFAULT NULL, data_schema_name IN VARCHAR2 DEFAULT NULL);
Parameters
Table 39-14 INSERT_BIN_NUM_QTILE Procedure Parameters
Parameter | Description |
---|---|
bin_table_name |
Name of the numerical binning definition table generated using the CREATE_BIN_NUM procedure |
data_table_name |
Name of the table containing the data |
bin_num |
Number of bins |
exclude_list |
List of columns (attributes) to be excluded from this iteration of the binning process |
bin_schema_name |
Name of the schema hosting the numerical binning definition table |
data_schema_name |
Name of the schema hosting the table with data |
Usage Notes
For a given input table, you can iteratively call this routine several times with different specifications for bin_num
for a given input table. For each iteration, you can selectively exclude attributes (that is, column names) using the exclude_list
parameter for a particular specification. Columns with all NULL
values are ignored.
Example 1. When N = 4, col='mycol', and data is {1,2,2,2,2,3,4}, the following three rows are inserted into the definition table:
COL VAL BIN ----- ----- ----- mycol 1 NULL mycol 2 1 mycol 4 2
Here quantities are {1,2}, {2,2}, {2,3}, {4} and min(1) = 1, min(2) = 2, min(3) = 2, min(4) = 4, max(4) = 4, and ranges are [1,2], (2,2], (2,4], (4,4]. After collapsing [1,2] and (2,4].
Examples
The simplest invocation of this routine populates numerical binning definitions in the num_bin_table
for all the numerical attributes found in build_table
.
BEGIN DBMS_DATA_MINING_TRANSFORM.INSERT_BIN_NUM_QTILE( 'num_bin_table', 'build_table'); END;
This procedure finds a numerical and categorical binning definition for every NUMBER
, FLOAT
, VARCHAR2
, or CHAR
column in the data table that is not in the exclusion list and inserts the definition into the numerical or categorical binning definition table.
The bin boundaries are computed using a technique called Supervised Binning. The definition for each relevant column is computed based on the splits found by a Decision Tree model built on a single predictor. Columns that have no interesting splits are inserted into the column removal definition table (if one is specified).
Syntax
DBMS_DATA_MINING_TRANSFORM.INSERT_BIN_SUPER ( num_table_name IN VARCHAR2, cat_table_name IN VARCHAR2, data_table_name IN VARCHAR2, target_column_name IN VARCHAR2, max_bin_num IN PLS_INTEGER DEFAULT 1000, exclude_list IN COLUMN_LIST DEFAULT NULL, num_schema_name IN VARCHAR2 DEFAULT NULL, cat_schema_name IN VARCHAR2 DEFAULT NULL, data_schema_name IN VARCHAR2 DEFAULT NULL, rem_table_name IN VARCHAR2 DEFAULT NULL, rem_schema_name IN VARCHAR2 DEFAULT NULL);
Parameters
Table 39-15 INSERT_BIN_SUPER Procedure Parameters
Parameter | Description |
---|---|
num_table_name |
Name of the numerical bin definition table generated using the CREATE_BIN_NUM procedure. |
cat_table_name |
Name of the categorical bin definition table generated using the CREATE_BIN_CAT procedure. |
data_table_name |
Name of the table containing the data. |
target_column_name |
Name of the target column. |
max_bin_num |
The maximum number of bins. |
exclude_list |
List of columns (attributes) to be excluded from this iteration of the binning process. |
num_schema_name |
Schema of the numerical bin definition table. |
cat_schema_name |
Schema of the categorical bin definition table. |
data_schema_name |
Schema of the table containing the data. |
rem_table_name |
Name of the column removal definition table created with CREATE_COL_REM . |
rem_schema_name |
Schema of the column removal definition table. |
This procedure finds the trimming definition for every NUMBER
column in the data table that is not specified in the exclusion list and inserts the definition into the clipping definition table that was created using CREATE_CLIP
.
The definition for each relevant column is computed based on the non-NULL
values sorted in ascending order such that val(1) < val(2) <... < val(N), where N is a total number of non-NULL
values in a column:
lcut = val(1+floor(N*q)) lval = NULL rcut = val(N–floor(*N*q)) rval = NULL
where q = ABS(NVL(tail_frac,0)). Nothing is done when q >= 0.5.
Syntax
DBMS_DATA_MINING_TRANSFORM.INSERT_CLIP_TRIM_TAIL ( clip_table_name IN VARCHAR2, data_table_name IN VARCHAR2, tail_frac IN NUMBER DEFAULT 0.025, exclude_list IN Column_List DEFAULT NULL, clip_schema_name IN VARCHAR2 DEFAULT NULL, data_schema_name IN VARCHAR2 DEFAULT NULL);
Parameters
Table 39-16 INSERT_CLIP_TRIM_TAIL Procedure Parameters
Parameter | Description |
---|---|
clip_table_name |
Name of the clipping definition table generated using the CREATE_CLIP procedure |
data_table_name |
Name of the table containing the data |
tail_frac |
Tail fraction |
exclude_list |
List of columns (attributes) to be excluded from this iteration of the clipping process |
clip_schema_name |
Name of the schema hosting the clipping definition table |
data_schema_name |
Name of the schema hosting the table with data |
Usage Notes
For a given input table, you can iteratively call this routine several times with different specifications for tail_frac
for a given input table. For each iteration, you can selectively exclude attributes (that is, column names) using the exclude_list
parameter for a particular specification.
Example 1. When q = 0.2, col='mycol', and data is {1,2,2,2,3,4,4}, the following row is inserted into the definition table:
COL LCUT LVAL RCUT RVAL ----- ----- ----- ----- ----- mycol 2 NULL 4 NULL
Here 1 + floor(N*q) = 1 + floor(7*0.2) = 2, lcut = val(2) = 2.
N – floor(N*q) = 7 – floor(7*0.2) = 6, rcut = val(6) = 4.
Examples
The simplest invocation of this routine populates clipping definitions in the clip_table
for all the numerical attributes found in build_table
.
BEGIN DBMS_DATA_MINING_TRANSFORM.INSERT_CLIP_TRIM_TAIL( 'clip_table', 'build_table'); END;
This procedure finds the Winsorizing definition for every NUMBER
column in the data table that is not specified in the exclusion list and inserts the definition into the clipping definition table that was created using CREATE_CLIP
.
Definition for each relevant column is computed based on the non-NULL
values sorted in ascending order such that val(1) < val(2) <... < val(N), where N is a total number of non-NULL
values in a column:
lcut = val(1+floor(N*q)) lval = lcut rcut = val(N–floor(N*q)) rval = rcut
where q = ABS(NVL(tail_fraq,0)). Nothing is done when q >= 0.5.
Syntax
DBMS_DATA_MINING_TRANSFORM.INSERT_CLIP_WINSOR_TAIL ( clip_table_name IN VARCHAR2, data_table_name IN VARCHAR2, tail_frac IN NUMBER DEFAULT 0.025, exclude_list IN Column_List DEFAULT NULL, clip_schema_name IN VARCHAR2 DEFAULT NULL, data_schema_name IN VARCHAR2 DEFAULT NULL);
Parameters
Table 39-17 INSERT_CLIP_WINSOR_TAIL Procedure Parameters
Parameter | Description |
---|---|
clip_table_name |
Name of the clipping definition table generated using CREATE_CLIP procedure |
data_table_name |
Name of the table containing the data |
tail_frac |
Tail fraction |
exclude_list |
List of columns (attributes) to be excluded from this iteration of the clipping process |
clip_schema_name |
Name of the schema hosting the clipping definition table |
data_schema_name |
Name of the schema hosting the table with data |
Usage Notes
For a given input table, you can iteratively call this routine several times with different specifications for tail_frac
for a given input table. For each iteration, you can selectively exclude attribute (that is, column names using the exclude_list
parameter for a particular specification. Columns with all NULL
values are ignored.
Example 1. When q = 0.2, col='mycol', and data is {1,2,2,2,3,4,4}, the following row is inserted into the definition table:
COL LCUT LVAL RCUT RVAL ----- ----- ----- ----- ----- mycol 2 2 4 4
Here 1 + floor(N*q) = 1 + floor(7*0.2) = 2, lcut = val(2) = 2.
N – floor(N*q) = 7 – floor(7*0.2) = 6, rcut = val(6) = 4.
Examples
The simplest invocation of this routine populates clipping definitions in the clip_table
for all the numerical attributes found in build_table
.
BEGIN DBMS_DATA_MINING_TRANSFORM.INSERT_CLIP_WINSOR_TAIL( 'clip_table', 'build_table'); END;
This procedure finds the categorical missing value treatment definition for every VARCHAR2
and CHAR
column in the data table that is not specified in the exclusion list and inserts the definition into the definition table that was created using CREATE_MISS_CAT
.
The definition for each selected column is computed based on the mode value that is computed from the data table.
Syntax
DBMS_DATA_MINING_TRANSFORM.INSERT_MISS_CAT_MODE ( miss_table_name IN VARCHAR2, data_table_name IN VARCHAR2, exclude_list IN COLUMN_LIST DEFAULT NULL, miss_schema_name IN VARCHAR2 DEFAULT NULL, data_schema_name IN VARCHAR2 DEFAULT NULL);
Parameters
Table 39-18 INSERT_MISS_CAT_MODE Procedure Parameters
Parameter | Description |
---|---|
miss_table_name |
Name of the categorical missing value treatment definition table generated using CREATE_MISS_CAT |
data_table_name |
Name of the table containing the data |
exclude_list |
List of columns (attributes) to be excluded from this iteration of the missing value treatment. See "Data Types" for the definition of COLUMN_LIST . |
miss_schema_name |
Name of the schema hosting the categorical missing value treatment definition table |
data_schema_name |
Name of the schema hosting the table containing the data |
Usage Notes
You can choose the categorical attributes that will receive missing value treatment by using the exclude_list
parameter. NULL
values in all the selected attributes will be replaced with the mode (the most commonly occurring value) for the attribute.
If you wish to replace NULL
s with some other value, you can edit the definition table.
Example
The simplest invocation of this routine populates missing value definitions (the mode) in miss_table
for all categorical attributes found in build_table
.
BEGIN DBMS_DATA_MINING_TRANSFORM.INSERT_MISS_CAT_MODE( 'miss_table', 'build_table'); END;
This procedure finds the numerical missing value treatment definition for every NUMBER
column in the data table that is not specified in the exclusion list and inserts the definition into the definition table that was created using CREATE_MISS_NUM
.
The definition for each selected column is computed based on the mean value that is computed from the data table. The value of mean is rounded to round_num
significant digits prior to storing it in the definition table.
Syntax
DBMS_DATA_MINING_TRANSFORM.INSERT_MISS_NUM_MEAN ( miss_table_name IN VARCHAR2, data_table_name IN VARCHAR2, exclude_list IN COLUMN_LIST DEFAULT NULL, round_num IN PLS_INTEGER DEFAULT 6, miss_schema_name IN VARCHAR2 DEFAULT NULL, data_schema_name IN VARCHAR2 DEFAULT NULL);
Parameters
Table 39-19 INSERT_MISS_NUM_MEAN Procedure Parameters
Parameter | Description |
---|---|
miss_table_name |
Name of the categorical missing value treatment definition table generated using CREATE_MISS_CAT |
data_table_name |
Name of the table containing the data |
exclude_list |
List of columns (attributes) to be excluded from this iteration of the miss value treatment. See "Data Types" for the definition of COLUMN_LIST . |
round_num |
The number of significant digits |
miss_schema_name |
Name of the schema hosting the numerical missing value treatment definition table |
data_schema_name |
Name of the schema hosting the table containing the data |
Usage Notes
You can choose the numerical attributes that will receive missing value treatment by using the exclude_list
parameter. NULL
values in all the selected attributes will be replaced with the mean (average) value for the attribute.
If you wish to replace NULL
s with some other value, you can edit the definition table.
Example
The simplest invocation of this routine populates missing value definitions (the mode) in miss_table
for all numerical attributes found in build_table
.
BEGIN DBMS_DATA_MINING_TRANSFORM.INSERT_MISS_CAT_MODE( 'miss_table', 'build_table'); END;
This procedure finds the normalization definition for every NUMBER
column in the data table that is not specified in the exclusion list and inserts the definition based on min-max normalization into the table that was created using CREATE_NORM_LIN
.
Definition for each relevant column is computed based on the mean and standard deviation that are computed from the data table, such that shift = mean and scale = standard deviation. The values of shift and scale are rounded to round_num
significant digits prior to storing them in the definition table.
Syntax
DBMS_DATA_MINING_TRANSFORM.INSERT_NORM_LIN_MINMAX ( norm_table_name IN VARCHAR2, data_table_name IN VARCHAR2, exclude_list IN Column_List DEFAULT NULL, round_num IN PLS_INTEGER DEFAULT 6, norm_schema_name IN VARCHAR2 DEFAULT NULL, data_schema_name IN VARCHAR2 DEFAULT NULL);
Parameters
Table 39-20 INSERT_NORM_LIN_MINMAX Procedure Parameters
Parameter | Description |
---|---|
norm_table_name |
Name of the normalization table generated using CREATE_NORM_LIN procedure |
data_table_name |
Name of the table containing the data |
exclude_list |
List of columns (attributes) to be excluded from this iteration of the normalization process |
round_num |
Number of significant digits |
norm_schema_name |
Name of the schema hosting the normalization definition table |
data_schema_name |
Name of the schema hosting the table with data |
Usage Notes
For a given input table, you can iteratively call this routine several times with selective exclusion of attributes (that is, column names) using the exclude_list
parameter for a particular normalization specification.
Columns with all NULL
values or only one unique value are ignored.
Examples
The simplest invocation of this routine populates normalization definitions in the norm_minmax_table
for all the numerical attributes found in build_table
.
BEGIN
DBMS_DATA_MINING_TRANSFORM.INSERT_NORM_LIN_MINMAX( 'norm_minmax_table', 'build_table');
END;
This procedure finds the normalization definition for every NUMBER
column in the data table that is not specified in the exclusion list and inserts the definition based on min-max normalization into the table that was created using CREATE_NORM_LIN
.
The normalization definition for each attribute is computed based on the minimum and maximum values of the data. The values for shift
and scale
are shift = 0
and scale = max{abs(max), abs(min)}.
Syntax
DBMS_DATA_MINING_TRANSFORM.INSERT_NORM_LIN_SCALE ( norm_table_name IN VARCHAR2, data_table_name IN VARCHAR2, exclude_list IN Column_List DEFAULT NULL, round_num IN PLS_INTEGER DEFAULT 6, norm_schema_name IN VARCHAR2 DEFAULT NULL, data_schema_name IN VARCHAR2 DEFAULT NULL);
Parameters
Table 39-21 INSERT_NORM_LIN_SCALE Procedure Parameters
Parameter | Description |
---|---|
norm_table_name |
Name of the normalization table generated using CREATE_NORM_LIN procedure |
data_table_name |
Name of the table containing the data |
exclude_list |
List of columns (attributes) to be excluded from this iteration of the normalization process |
round_num |
Number of significant digits |
norm_schema_name |
Name of the schema hosting the normalization definition table |
data_schema_name |
Name of the schema hosting the table with data |
Usage Notes
For a given input table, you can iteratively call this routine several times with selective exclusion of attributes (that is, column names) using the exclude_list parameter for a particular normalization specification.
Columns with all NULL
values or only one unique value are ignored.
Examples
The simplest invocation of this routine populates normalization definitions in the norm_minmax_table
for all the numerical attributes found in build_table
.
BEGIN
DBMS_DATA_MINING_TRANSFORM.INSERT_NORM_LIN_SCALE( 'norm_scale_table', 'build_table');
END;
This procedure finds the normalization definition for every NUMBER
column in the data table that is not specified in the exclusion list and inserts the definition based on z-score normalization into the table that was created using CREATE_NORM_LIN
.
Definition for each relevant column is computed based on the minimum and maximum values that are computed from the data table, such that shift = min and scale = max – min. The values of shift and scale are rounded to round_num significant digits prior to storing them in the definition table.
Syntax
DBMS_DATA_MINING_TRANSFORM.INSERT_NORM_LIN_ZSCORE ( norm_table_name IN VARCHAR2, data_table_name IN VARCHAR2, exclude_list IN Column_List DEFAULT NULL, round_num IN PLS_INTEGER DEFAULT 6, norm_schema_name IN VARCHAR2 DEFAULT NULL, data_schema_name IN VARCHAR2 DEFAULT NULL);
Parameters
Table 39-22 INSERT_NORM_LIN_ZSCORE Procedure Parameters
Parameter | Description |
---|---|
norm_table_name |
Name of the normalization table generated using CREATE_NORM_LIN procedure |
data_table_name |
Name of the table containing the data |
exclude_list |
List of columns (attributes) to be excluded from this iteration of the normalization process |
round_num |
Number of significant digits |
norm_schema_name |
Name of the schema hosting the normalization definition table |
data_schema_name |
Name of the schema hosting the table with data |
Usage Notes
For a given input table, you can iteratively call this routine several times with selective exclusion of attributes (that is, column names) using the exclude_list
parameter for a particular binning specification.
Columns with all NULL
values or only one unique value are ignored.
Examples
The simplest invocation of this routine populates normalization definitions in the norm_zscore_table
for all the numerical attributes found in build_table
.
BEGIN
DBMS_DATA_MINING_TRANSFORM.INSERT_NORM_LIN_ZSCORE( 'norm_zscore_table', 'build_table');
END;
/
This procedure appends a VARCHAR2
chunk to the expression. Appending NULL clears the expression.
Syntax
DBMS_DATA_MINING_TRANSFORM.SET_EXPRESSION ( expression IN OUT NOCOPY EXPRESSION_REC, chunk VARCHAR2 DEFAULT NULL);
Parameters
Table 39-23 SET_EXPRESSION Procedure Parameters
Parameter | Description |
---|---|
expression |
Transformation expression |
chunk |
Chunk to add to the expression |
This procedure specifies a transformation expression and a reverse transformation expression for an attribute.
The new transformation specification is appended to the transformation list.
Syntax
DBMS_DATA_MINING_TRANSFORM.SET_TRANSFORM ( xform_list IN OUT NOCOPY TRANSFORM_LIST, attribute_name VARCHAR2, attribute_subname VARCHAR2, expression VARCHAR2, reverse_expression VARCHAR2, attribute_spec VARCHAR2 DEFAULT NULL);
Parameters
Table 39-24 SET_TRANSFORM Procedure Parameters
Parameter | Description |
---|---|
xform_list |
The transformation list for the attribute |
attribute_name |
Name of the attribute |
attribute_subname |
Name of the nested attribute if attribute_name is a nested column, otherwise NULL . |
expression |
A SQL expression. |
reverse_expression |
SQL expression that reverses the transformation for readability in model details and in the target of a supervised model (if the attribute is a target) |
attribute_spec |
You can specify the value NOPREP for attribute_spec to prevent this attribute from being automatically transformed in DBMS_DATA_MINING.CREATE_MODEL . (See "CREATE_MODEL Procedure".) |
For every column in the stack definition that has a matching entry in the transformation definition, this procedure stacks a categorical binning expression on top of the existing expression and updates the stack definition.
Columns in the transformation definition that have no matching entries in the stack definition are added to the stack definition (using "{col}" in place of the original expression or "VALUE" for nested attributes).
Reverse expressions in the stack definition are updated accordingly, that is if an expression is updated, added or remains unchanged then a corresponding reverse expression is also updated, added or remains unchanged. When a reverse expression is NULL, it is treated as an identity function ("{attr}" or "VALUE" for nested attributes).
Given an entry ({attr}, {expr}, {rexp}) in the stack definition and a matching entry in the transformation definition, {expr} has the following form after stacking:
DECODE({expr}, '{val1}', '{bin1}', ... '{valN}', '{binN}', NULL, NULL, '{bin(N+1)}')
and {rexp} has the following form:
DECODE("{attr}", '{bin1}', '{rev1}', ... '{binN}', '{revN}', NULL, 'NULL', '{bin(N+1)}', 'DEFAULT')
where {revI} is the result of applying reverse expression to {valI}.
If {binI} and {binJ} are equal, then the corresponding entries of the DECODE function above are merged into:
'{binI}', '{revI}, {revJ}'
If more than one entry maps to the default bin {bin(N+1)}, they are all merged to
'{bin(N+1)}', 'DEFAULT'
Note that the reverse expression implicitly maps invalid bins to NULL.
Syntax
DBMS_DATA_MINING_TRANSFORM.STACK_BIN_CAT ( bin_table_name IN VARCHAR2, transform_list IN/OUT DM_TRANSFORMS, literal_flag IN BOOLEAN DEFAULT FALSE, bin_schema_name IN VARCHAR2 DEFAULT NULL);
Parameters
Table 39-25 STACK_BIN_CAT Procedure Parameters
Parameter | Description |
---|---|
bin_table_name |
Name of the bin definition table |
transform_list |
List of stacked transformations |
literal_flag |
Literal flag (See the Usage Notes) |
bin_schema_name |
Schema of the bin definition table |
Usage Notes
The Literal flag indicates whether the values in {bin} are valid SQL literals. When the the flag is set to TRUE, the value of {bin} is used "as is" in the expression generation, otherwise it is converted into a valid text literal (surrounded by quotes and double the quotes inside). By default the flag is set to FALSE. One example when it can be set to TRUE is in cases when all {bin} are numbers. In that case the transformed column will remain numeric as opposed to textual (default behavior). For example, for the following transform definition:
COL VAL BIN ----- -------------- ---- mycol Waltham 1 mycol Burlington 1 mycol Redwood Shores 2
the following {expr} and {rexp} are generated when the flag is FALSE:
expr = DECODE("mycol", 'Waltham', '1', 'Burlington', '1', 'Redwood Shores', '2') rexp = DECODE("COL1", '1', '''Waltham'', ''Burlington''', '2', '''Redwood Shores''', NULL, 'DEFAULT')
and when the flag is set to TRUE:
expr = DECODE("mycol", 'Waltham', 1, 'Burlington', 1, 'Redwood Shores', 2) rexp = DECODE("COL1", 1, '''Waltham'', ''Burlington''', 2, '''Redwood Shores''', NULL, 'DEFAULT')
Example
Given transformation definition:
{col = COL1, val = waltham, bin = MA} {col = COL1, val = burlington, bin = MA} {col = COL1, val = redwood shores, bin = CA} {col = COL2, val = MA, bin = East} {col = COL2, val = CA, bin = West} {col = COL2, val = NULL, bin = USA}
and stack definition:
{attr = COL1, expr = lower(COL1), rexp = initcap(COL1)} {attr = COL3, expr = upper(COL3), rexp = initcap(COL3)}
the following updated stack definition is generated:
{attr = COL1, expr = DECODE(lower(COL1), 'waltham', 'MA', 'burlington', 'MA', 'redwood shores', 'CA'), rexp = DECODE("COL1", 'MA', '''Waltham'', ''Burlington''', 'CA', '''Redwood Shores''', NULL, 'DEFAULT')} {attr = COL3, expr = upper(COL3), rexp = initcap(COL3)} {attr = COL2, expr = DECODE("COL2", 'MA', 'East', 'NY', 'East', 'CA', 'West', NULL, NULL, 'USA') rexp = DECODE("COL2", 'East', '''MA''', 'West', '''CA''', NULL, 'NULL', 'USA', 'DEFAULT')}
For every column in the stack definition that has a matching entry in the transformation definition, this procedure stacks numerical a binning expression on top of the existing expression and updates the stack definition.
Columns in the stack definition that have no matching entries in the transformation definition or have NULL expression are not changed. Columns in the transformation definition that have no matching entries in the stack definition are added to the stack definition (using "{col}" in place of the original expression or "VALUE" for nested attributes).
Reverse expressions in the stack definition are updated accordingly, that is if an expression is updated, added, or remains unchanged, then a corresponding reverse expression is also updated, added, or remains unchanged. When reverse expression is NULL it is treated as identity function ("{attr}" or "VALUE" for nested attributes).
Given an entry ({attr}, {expr}, {rexp}) in the stack definition and a matching entry in the transformation definition, {expr} has the following form after stacking:
CASE WHEN ({expr}) < {val0} THEN '{bin0}' WHEN ({expr}) <= {val1} THEN '{bin1}' ... WHEN ({expr}) <= {valN} THEN '{binN}' WHEN ({expr}) IS NOT NULL THEN '{bin(N+1)}' END
and {rexp} has the following form:
DECODE("{attr}", '{bin0}', '( ; {rev0})', '{bin1}', '[{rev0}; {rev1})', ... '{binN}', '[{rev(N-1)}; {revN}]', '{bin(N+1)}', '({revN}; )', NULL, 'NULL')
where {revI} is the result of applying reverse expression to {valI}.
If {binI} and {binJ} are equal then the corresponding entries of the DECODE function above are merged into:
'{binI}', '[{rev(I-1)}; {revI}), [{rev(J-1)}; {revJ})'
Note that reverse expression implicitly maps invalid bins to NULL.
Syntax
DBMS_DATA_MINING_TRANSFORM.STACK_BIN_NUM ( bin_table_name IN VARCHAR2, transform_list IN/OUT DM_TRANSFORMS, literal_flag IN BOOLEAN DEFAULT FALSE, bin_schema_name IN VARCHAR2 DEFAULT NULL);
Parameters
Table 39-26 STACK_BIN_NUM Procedure Parameters
Parameter | Description |
---|---|
bin_table_name |
Name of the bin definition table. |
transform_list |
Stack definition. |
literal_flag |
Literal flag (See Usage Notes) |
bin_schema_name |
Schema of the bin definition table |
Usage Notes
Literal flag indicates whether the values in {bin} are valid SQL literals. When the flag is set to TRUE the value of {bin} is used "as is" in the expression generation, otherwise it is converted into a valid text literal (surround by quotes and double the quotes inside. By default the flag is set to FALSE. One example when it can be set to TRUE is in cases when all {bin} are numbers. In that case the transformed column will remain numeric as opposed to textual (default behavior). For example, for the following xform definition:
COL VAL BIN ----- ----- ----- mycol 10 NULL mycol 15 1 mycol 20 2
the following {expr} and {rexp} are generated when the flag is FALSE:
expr = CASE WHEN "mycol" < 10 THEN NULL WHEN "mycol" <= 15 THEN '1' WHEN "mycol" <= 20 THEN '2' END rexp = DECODE("mycol", '1', '[10; 15)', '2', '[15; 20]' NULL, '( ; 10) OR (20; ) OR NULL')
and when the flag is set to TRUE:
expr = CASE WHEN "mycol" < 10 THEN NULL WHEN "mycol" <= 15 THEN 1 WHEN "mycol" <= 20 THEN 2 END rexp = DECODE("mycol", 1, '[10; 15)', 2, '[15; 20]', NULL, '( ; 10) OR (20; ) OR NULL')
Examples
Given transformation definition:
{col = COL1, val = 0, bin = NULL} {col = COL1, val = 1, bin = A} {col = COL1, val = 2, bin = B} {col = COL1, val = 3, bin = A} {col = COL2, val = 10, bin = NULL} {col = COL2, val = 15, bin = 1} {col = COL2, val = 20, bin = 2}
and stack definition:
{attr = COL1, expr = log(10, COL1), rexp = power(10, COL1)} {attr = COL3, expr = ln(COL3), rexp = exp(COL3)}
the following updated stack definition is generated:
{attr = COL1, expr = CASE WHEN (log(10, COL1)) < 0 THEN NULL WHEN (log(10, COL1)) <= 1 THEN 'A' WHEN (log(10, COL1)) <= 2 THEN 'B' WHEN (log(10, COL1)) <= 3 THEN 'A' END, rexp = DECODE("COL1", 'A', '[1; 10), [100; 1000]', 'B', '[10; 100)', NULL, '( ; 1), (1000; ), NULL')} {attr = COL3, expr = ln(COL3), rexp = exp(COL3)} {attr = COL2, expr = CASE WHEN "COL2" < 10 THEN NULL WHEN "COL2" <= 15 THEN '1' WHEN "COL2" <= 20 THEN '2' END rexp = DECODE("COL2", '1', '[10; 15)', '2', '[15; 20]', NULL, '( ; 10) OR (20; ) OR NULL')}
For every column in the stack definition that has a matching entry in the transformation definition. Stacks clipping expression on top of the existing expression and updates the stack definition.
Columns in the stack definition that have no matching entries in the transformation definition or have NULL expression are not changed.
Columns in the transformation definition that have no matching entries in the stack definition are added to the stack definition (using "{col}" in place of the original expression or "VALUE" for nested attributes).
Reverse expressions in the stack definition are updated accordingly, that is if an expression is updated, added or remains unchanged then a corresponding reverse expression is also updated, added or remains unchanged. When the reverse expression is NULL it is treated as identity function ("{attr}" or "VALUE" for nested attributes).
Given an entry ({attr}, {expr}, {rexp}) in the stack definition and a matching entry in the transformation definition, after stacking {expr} has the following form:
CASE WHEN {expr} < {lcut} THEN {lval} WHEN {expr} > {rcut} THEN {rval} ELSE {expr} END
and {rexp} remains unchanged
Syntax
DBMS_DATA_MINING_TRANSFORM.STACK_CLIP ( clip_table_name VARCHAR2, xform_list IN OUT NOCOPY TRANSFORM_LIST, clip_schema_name VARCHAR2 DEFAULT NULL);
Parameters
Table 39-27 STACK_CLIP Procedure Parameters
Parameter | Description |
---|---|
clip_table_name |
Clipping definition table |
xform_list |
Stack definition |
clip_schema_list |
Schema of the clipping definition table. If you do not specify a schema, you own schema is assumed. |
Example
Given transformation definition:
{col = COL1, lcut = -1.5, lval = -1.5, rcut = 4.5, rval = 4.5} {col = COL2, lcut = 0, lval = 0, rcut = 1, rval = 1}
and stack definition:
{attr = COL1, expr = log(10, COL1), rexp = power(10, COL1)} {attr = COL3, expr = ln(COL3), rexp = exp(COL3)}
the following updated stack definition is generated:
{attr = COL1, expr = CASE WHEN log(10, COL1) < -1.5 THEN -1.5 WHEN log(10, COL1) > 4.5 THEN 4.5 ELSE log(10, COL1) END, rexp = power(10, COL1)} {attr = COL3, expr = ln(COL3), rexp = exp(COL3)} {attr = COL2, expr = CASE WHEN COL2 < 0 THEN 0 WHEN COL2 > 1 THEN 1 ELSE COL2 END, rexp = NULL}
For every column in the stack definition that has a matching entry in the transformation definition, this procedure stacks column removal expression (NULL expression) instead of the existing expression and updates the stack definition.
Columns in the stack definition that have no matching entries in the transformation definition or have NULL expression are not changed. Columns in the transformation definition that have no matching entries in the stack definition are added to the stack definition (with NULL expression).
Reverse expressions in the stack definition are updated accordingly, that is if an expression is updated, added or remains unchanged then a corresponding reverse expression is also updated, added or remains unchanged. When reverse expression is NULL it is treated as "{attr}" (identity function) or "VALUE" for nested columns.
Given an entry ({attr}, {expr}, {rexp}) in the stack definition and a matching entry in the transformation definition, after stacking {expr} and {rexp} are both changed to NULL.
Syntax
DBMS_DATA_MINING_TRANSFORM.STACK_COL_REM ( rem_table_name IN VARCHAR2, transform_list IN/OUT DM_TRANSFORMS, rem_schema_name IN VARCHAR2 DEFAULT NULL);
Parameters
Table 39-28 STACK_COL_REM Procedure Parameters
Parameter | Description |
---|---|
rem_table_name |
Name of the column removal definition table. |
transform_list |
Stack definition |
rem_schema_name |
Schema of the column removal definition table. |
Examples
Given transformation definition:
{col = COL1, att = NULL} {col = COL2, att = NULL}
and stack definition:
{attr = COL1, expr = log(10, COL1), rexp = power(10, COL1)} {attr = COL3, expr = ln(COL3), rexp = exp(COL3)}
the following updated stack definition is generated:
{attr = COL1, expr = NULL, rexp = NULL} {attr = COL3, expr = ln(COL3), rexp = exp(COL3)} {attr = COL2, expr = NULL, rexp = NULL}
For every column in the stack definition that has a matching entry in the transformation definition, this procedure stacks categorical missing value treatment expression on top of the existing expression and updates the stack definition.
Columns in the stack definition that have no matching entries in the transformation definition or have NULL expression are not changed. Columns in the transformation definition that have no matching entries in the stack definition are added to the stack definition (using "{col}" in place of the original expression or "VALUE" for nested attributes).
Reverse expressions in the stack definition are updated accordingly, that is if an expression is updated, added or remains unchanged then a corresponding reverse expression is also updated, added or remains unchanged. When reverse expression is NULL it is treated as identity function ("{attr}" or "VALUE" for nested attributes). Given an entry ({attr}, {expr}, {rexp}) in the stack definition and a matching entry in the transformation definition, after stacking {expr} has the following form:
NVL({expr}, {val})
and {rexp} remains unchanged.
Syntax
DBMS_DATA_MINING_TRANSFORM.STACK_MISS_CAT ( miss_table_name IN VARCHAR2, transform_list IN/OUT DM_TRANSFORMS, miss_schema_name IN VARCHAR2 DEFAULT NULL);
Parameters
Table 39-29 STACK_MISS_CAT Procedure Parameters
Parameter | Description |
---|---|
miss_table_name |
Name of the missing value definition table. |
transform_list |
Stack definition. |
miss_schema_name |
Schema of the missing value definition table. |
Examples
Given transformation definition:
{col = COL1, val = 'ma'} {col = COL2, val = 'CA'}
and stack definition:
{attr = COL1, expr = lower(COL1), rexp = initcap(COL1)} {attr = COL3, expr = upper(COL3), rexp = initcap(COL3)}
the following updated stack definition is generated:
{attr = COL1, expr = NVL(lower(COL1), 'ma'), rexp = initcap(COL1)} {attr = COL3, expr = upper(COL3), rexp = initcap(COL3)} {attr = COL2, expr = NVL(COL2, 'CA'), rexp = NULL}
For every column in the stack definition that has a matching entry in the transformation definition, stacks numerical missing value treatment expression on top of the existing expression and updates the stack definition.
Columns in the stack definition that have no matching entries in the transformation definition or have NULL expression are not changed. Columns in the transformation definition that have no matching entries in the stack definition are added to the stack definition (using "{col}" in place of the original expression or "VALUE" for nested attributes).
Reverse expressions in the stack definition are updated accordingly, that is if an expression is updated, added or remains unchanged then a corresponding reverse expression is also updated, added or remains unchanged. When reverse expression is NULL it is treated as identity function ("{attr}" or "VALUE" for nested attributes).
Given an entry ({attr}, {expr}, {rexp}) in the stack definition and a matching entry in the transformation definition, after stacking {expr} has the following form:
NVL({expr}, {val})
and {rexp} remains unchanged
Syntax
DBMS_DATA_MINING_TRANSFORM.STACK_MISS_NUM ( miss_table_name IN VARCHAR2, transform_list IN/OUT DM_TRANSFORMS, miss_schema_name IN VARCHAR2 DEFAULT NULL);
Parameters
Table 39-30 STACK_MISS_NUM Procedure Parameters
Parameter | Description |
---|---|
miss_table_name |
Name of the missing value definition table. |
transform_list |
Stack definition. |
miss_schema_name |
Schema of the missing value definition table. |
Examples
Given transformation definition:
{col = COL1, val = 4.5} {col = COL2, val = 0}
and stack definition:
{attr = COL1, expr = log(10, COL1), rexp = power(10, COL1)} {attr = COL3, expr = ln(COL3), rexp = exp(COL3)}
the following updated stack definition is generated:
{attr = COL1, expr = NVL(log(10, COL1), 4.5), rexp = power(10, COL1)} {attr = COL3, expr = ln(COL3), rexp = exp(COL3)} {attr = COL2, expr = NVL(COL2, 0), rexp = NULL}
For every column in the stack definition that has a matching entry in the transformation definition, stacks linear normalization expression on top of the existing expression and updates the stack definition.
Columns in the stack definition that have no matching entries in the transformation definition or have NULL expression are not changed. Columns in the transformation definition that have no matching entries in the stack definition are added to the stack definition (using "{col}" in place of the original expression or "VALUE" for nested attributes).
Reverse expressions in the stack definition are updated accordingly, that is if an expression is updated, added or remains unchanged then a corresponding reverse expression is also updated, added or remains unchanged. When reverse expression is NULL it is treated as identity function ("{attr}" or "VALUE" for nested attributes).
Given an entry ({attr}, {expr}, {rexp}) in the stack definition and a matching entry in the transformation definition, after stacking {expr} has the following form:
({expr} - {shift})/{scale}
and {rexp} maintains the following form with every occurrence of {attr} replaced with:
{attr}*{scale} + {shift}
Syntax
DBMS_DATA_MINING_TRANSFORM.STACK_NORM_LIN ( norm_table_name IN VARCHAR2, transform_list IN/OUT DM_TRANSFORMS, norm_schema_name IN VARCHAR2 DEFAULT NULL);
Parameters
Table 39-31 STACK_NORM_LIN Procedure Parameters
Parameter | Description |
---|---|
norm_table_name | Name of the linear normalization definition table. |
transform_list | Stack definition. |
norm_schema_name | Schema of the linear normalization definition table. |
Examples
Given transformation definition:
{col = COL1, shift = -1.5, scale = 20} {col = COL2, shift = 0, scale = 10}
and stack definition:
{attr = COL1, expr = log(10, COL1), rexp = power(10, COL1)} {attr = COL3, expr = ln(COL3), rexp = exp(COL3)}
the following updated stack definition is generated:
{attr = COL1, expr = (log(10, COL1) - (-1.5)) / 20, rexp = power(10, COL1*20 + (-1.5))} {attr = COL3, expr = ln(COL3), rexp = exp(COL3)} {attr = COL2, expr = (COL2 - 0) / 10, rexp = COL2*10 + 0}
This procedure creates the view that performs categorical binning. Only the columns that are specified in the definition table are transformed; the remaining columns do not change.
Syntax
DBMS_DATA_MINING_TRANSFORM.XFORM_BIN_CAT ( bin_table_name IN VARCHAR2, data_table_name IN VARCHAR2, xform_view_name IN VARCHAR2, literal_flag IN BOOLEAN DEFAULT FALSE, bin_schema_name IN VARCHAR2 DEFAULT NULL, data_schema_name IN VARCHAR2 DEFAULT NULL, xform_schema_name IN VARCHAR2 DEFAULT NULL);
Parameters
Table 39-32 XFORM_BIN_CAT Procedure Parameters
Parameter | Description |
---|---|
bin_table_name |
Name of the categorized binning definition table generated using CREATE_BIN_CAT procedure |
data_table_name |
Name of the table containing the data |
xform_view_name |
View representing the transformed output |
literal_flag |
Literal flag |
bin_schema_name |
Name of the schema hosting the bin definition table |
data_schema_name |
Name of the schema hosting the data table |
xform_schema_name |
Name of the schema hosting the view representing the transformed output |
Usage Notes
The bin table created by CREATE_BIN_CAT
and populated with bin definitions by INSERT_BIN_CAT_FREQ
is used to guide the query generation process to construct categorical binning expressions of the following form:
DECODE("col", val_1, bin_1, ... val_N, bin_N, NULL, NULL, bin_(N+1)) "col"
This expression maps values val_1,..., val_N
into N bins bin_1,..., bin_N
, and other values into bin_(N+1)
, while NULL
values remain unchanged. bin_(N+1) is optional. If not specified, it defaults to NULL
. To specify bin_(N+1)
provide a row with val set to NULL
.
The literal_flag
parameter indicates whether the values in bin are valid SQL literals. When the flag is set to TRUE, the value of bin is used as is in query generation; otherwise it is converted into a valid text literal (surrounded by quotes and each single quote is replaced by two single quotes). By default, the flag is set to FALSE. One example of when it can be set to TRUE is in cases when all bin are numbers. In that case the transformed column will be numeric as opposed to textual (default behavior).
Set literal_flag
to TRUE when the data is binned for an O-Cluster model build.
The col
parameter is case-sensitive since it generates quoted identifiers. In cases when there are multiple entries with the same col
,val
combination with different bin
, the behavior is undefined — any one of the bin
values might be used.
Examples
Example 1. bin_cat contains four rows with col = 'mycol':
{col = 'mycol', val = 'Waltham', bin = 'MA'} {col = 'mycol', val = 'Burlington', bin = 'MA'} {col = 'mycol', val = 'Redwood Shores', bin = 'CA'} {col = 'mycol', val = NULL, bin = 'OTHER'}
the following expression is generated:
DECODE("mycol", 'Waltham', 'MA', 'Burlington', 'MA', 'Redwood Shores', 'CA', NULL, NULL, 'OTHER') "mycol"
Example 2. bin_cat contains three rows with col = 'mycol':
{col = 'mycol', val = 'Waltham', bin = 'MA'} {col = 'mycol', val = 'Burlington', bin = 'MA'} {col = 'mycol', val = 'Redwood Shores', bin = 'CA'}
the following expression is generated:
DECODE("mycol", 'Waltham', 'MA', 'Burlington', 'MA', 'Redwood Shores', 'CA') "mycol"
Example 3. For the definition:
COL VAL BIN ----- ---------- --- mycol Waltham 1 mycol Burlington 1 mycol Redwood Shores 2
the following expression is generated when the literal flag is set to FALSE:
DECODE ("mycol", 'Waltham', '1', 'Burlington' '1', 'Redwood Shores', '2') "mycol"
and when the flag is set to TRUE:
DECODE("mycol", 'Waltham', 1, 'Burlington', 1, 'Redwood Shores', 2) "mycol"
The simplest invocation of this routine generates a view build_view that represents the transformation query on build_table based on bin definitions in the cat_bin_table.
BEGIN
DBMS_DATA_MINING_TRANSFORM.XFORM_BIN_CAT( 'cat_bin_table', 'build_table', 'build_view');
END; /
This procedure creates the view that performs numerical binning. Only the columns that are specified in the definition table are transformed; the remaining columns do not change.
Syntax
DBMS_DATA_MINING_TRANSFORM.XFORM_BIN_NUM ( bin_table_name IN VARCHAR2, data_table_name IN VARCHAR2, xform_view_name IN VARCHAR2, literal_flag IN BOOLEAN DEFAULT FALSE, bin_schema_name IN VARCHAR2 DEFAULT NULL, data_schema_name IN VARCHAR2 DEFAULT NULL, xform_schema_name IN VARCHAR2 DEFAULT NULL);
Parameters
Table 39-33 XFORM_BIN_NUM Procedure Parameters
Parameter | Description |
---|---|
bin_table_name |
Name of the numerical binning definition table generated using CREATE_BIN_NUM procedure |
data_table_name |
Name of the table containing the data |
xform_view_name |
View representing the transformed output |
literal_flag |
Literal flag |
bin_schema_name |
Name of the schema hosting the bin definition table |
data_schema_name |
Name of the schema hosting the data table |
xform_schema_name |
Name of the schema hosting the view representing the transformed output |
Usage Notes
The bin table created by CREATE_BIN_NUM
and populated with bin definitions by INSERT_BIN_NUM_EQWIDTH
or INSERT_BIN_NUM_QTILE
is used to guide the query generation process to construct numerical binning expressions of the following form:
CASE WHEN "col" < val_0 THEN 'bin0_0 WHEN "col" <= val_1 THEN 'bin_1' ... WHEN "col" <= val_N THEN 'bin_N' WHEN "col" IS NOT NULL THEN 'bin_(N+1)' END "col"
This expression maps values in the range [val_0;val_N] into N bins bin_1,..., bin_N, values outside of this range into bin_0 or bin_(N+1), such that
(-inf; val_0) -> bin_0 [val_0; val_1) -> bin_1 ... (val_(N-1); val_N] -> bin_N (val_N; +inf) -> bin_(N+1)
NULL
values remain unchanged. bin_(N+1) is optional. If it is not specified, the values ("col" > val_N) are mapped to NULL
. To specify bin_(N+1), provide a row with val set to NULL
. The order of the WHEN... THEN pairs is based on the ascending order of val for a given col.
The literal_flag parameter indicates whether the values in bin are valid SQL literals. When the flag is set to TRUE, the value of bin is used as is in query generation; otherwise it is converted into a valid text literal (surrounded by quotes and each single quote is replaced by two single quotes). By default, the flag is set to FALSE. One example of when it can be set to TRUE is in cases when all bin are numbers. In that case the transformed column will be numeric as opposed to textual (default behavior).
Note that col is case-sensitive since it generates quoted identifiers. In cases where there are multiple entries with the same col,val combination with different bin, the behavior is undefined — any one of the bin values might be used.
Examples
Example 1. bin_num contains four rows with col = 'mycol':
{col = 'mycol', val = 15.5, bin = 'small'} {col = 'mycol', val = 10, bin = 'tiny'} {col = 'mycol', val = 20, bin = 'large'} {col = 'mycol', val = NULL, bin = 'huge'}
the following expression is generated:
CASE WHEN "mycol" < 10 THEN 'tiny' WHEN "mycol" <= 15.5 THEN 'small' WHEN "mycol" <= 20 THEN 'large' WHEN "mycol" IS NOT NULL THEN 'huge' END "mycol"
Example 2. bin_num contains three rows with col = 'mycol':
{col = 'mycol', val = 15.5, bin = NULL} {col = 'mycol', val = 10, bin = 'tiny'} {col = 'mycol', val = 20, bin = 'large'}
the following expression is generated:
CASE WHEN "mycol" < 10 THEN NULL WHEN "mycol" <= 15.5 THEN 'small' WHEN "mycol" <= 20 THEN 'large' END "mycol"
Example 3. For the definition:
COL VAL BIN ----- ---- --- mycol 10 NULL mycol 15.5 1 mycol 21 2
the following expression is generated when the literal flag is set to FALSE:
CASE WHEN "mycol" < 10 THEN NULL WHEN "mycol" <= 15.5 THEN '1' WHEN "mycol" <= 20 THEN '2' END "mycol"
and when the flag is set to TRUE:
CASE WHEN "mycol" < 10 THEN NULL WHEN "mycol" <= 15.5 THEN 1 WHEN "mycol" <= 20 THEN 2 END "mycol"
The simplest invocation of this routine generates a view build_view
that represents the transformation query on build_table
based on transform definitions in bin definitions in the num_bin_table
.
BEGIN
DBMS_DATA_MINING_TRANSFORM.XFORM_BIN_NUM( 'num_bin_table', 'build_table', 'build_view');
END; /
This procedure creates the view that performs clipping. Only the columns that are specified in the transform definition are clipped; the remaining columns do not change.
Syntax
DBMS_DATA_MINING_TRANSFORM.XFORM_CLIP ( clip_table_name IN VARCHAR2, data_table_name IN VARCHAR2, xform_view_name IN VARCHAR2, clip_schema_name IN VARCHAR2 DEFAULT NULL, data_schema_name IN VARCHAR2,DEFAULT NULL, xform_schema_name IN VARCHAR2,DEFAULT NULL;
Parameters
Table 39-34 XFORM_CLIP Procedure Parameters
Parameter | Description |
---|---|
clip_table_name |
Name of the clipping definition table generated using CREATE_CLIP |
data_table_name |
Name of the table containing the data |
xform_view_name |
View representing the transformed output |
clip_schema_name |
Name of the schema hosting the clipping definition table |
data_schema_name |
Name of the schema hosting the data table |
xform_schema_name |
Name of the schema hosting the view representing the transformed output |
Usage Notes
The clipping definition table created by CREATE_CLIP
and populated with clipping definitions by INSERT_CLIP_WINSOR_TAIL
or INSERT_CLIP_TRIM_TAIL
is used to guide query generation process to construct clipping expressions of the following form:
CASE WHEN "col" < lcut THEN lval WHEN "col" > rcut THEN rval ELSE "col" END "col"
Note that col is case-sensitive since it generates quoted identifiers. When there are multiple entries in the transform definition table for the same col, the behavior is undefined. Any one of the definitions may be used in query generation. NULL
values remain unchanged.
Example 1 (Winsorizing). When col = 'my_col', lcut = –1.5, lval = –1.5, and rcut = 4.5 and rval = 4.5, the following expression is generated:
CASE WHEN "my_col" < –1.5 THEN -1.5 WHEN "my_col" > 4.5 THEN 4.5 ELSE "my_col" END "my_col"
Examples
The simplest invocation of this routine generates a view object build_view
that represents the transformation query on build_table
based on transform definitions in clipping definitions in the clip_table
.
BEGIN DBMS_DATA_MINING_TRANSFORM.XFORM_CLIP( 'clip_table', 'build_table', 'build_view'); END;
This procedure creates a view that removes columns from the data table. Only the columns that are specified in the transform definition are removed; the remaining columns do not change.
Syntax
DBMS_DATA_MINING_TRANSFORM.XFORM_COL_REM ( rem_table_name IN VARCHAR2, data_table_name IN VARCHAR2, xform_view_name IN VARCHAR2, rem_schema_name IN VARCHAR2 DEFAULT NULL, data_schema_name IN VARCHAR2 DEFAULT NULL, xform_schema_name IN VARCHAR2 DEFAULT NULL);
Parameters
Table 39-35 XFORM_COL_REM Procedure Parameters
Parameter | Description |
---|---|
rem_table_name |
Name of the column removal definition table. |
data_table_name |
Name of the table containing the data. |
xform_view_name |
View representing the transformed output. |
rem_schema_name |
Schema of the column removal definition table. |
data_schema_name |
Schema of the data table. |
xform_schema_name |
Schema of the transforming view. |
Creates a view that applies a given expression for every NUMBER
column in the data table that is not in the exclusion list and in the inclusion list. The remaining columns do not change.
Expressions are constructed from the expression pattern by replacing every occurrence of the column pattern with an actual column name.
Syntax
DBMS_DATA_MINING_TRANSFORM.XFORM_EXPR_NUM ( expr_pattern IN VARCHAR2, data_table_name IN VARCHAR2, xform_view_name IN VARCHAR2, exclude_list IN COLUMN_LIST DEFAULT NULL, include_list IN COLUMN_LIST DEFAULT NULL, col_pattern IN VARCHAR2 DEFAULT ':col', data_schema_name IN VARCHAR2 DEFAULT NULL, xform_schema_name IN VARCHAR2 DEFAULT NULL);
Parameters
Table 39-36 XFORM_EXPR_NUM Procedure Parameters
Parameter | Description |
---|---|
expr_pattern |
Expression pattern |
data_table_name |
Name of the table containing the data. |
xform_view_name |
View representing the transformed output. |
exclude_list |
List of columns to exclude. |
include_list |
List of columns to include. |
col_pattern |
Column pattern. |
data_schema_name |
Schema of the data table. |
xform_schema_name |
Schema of the transformation view. |
Examples
For a table TAB
with two NUMBER
columns CN1
, CN3
and one CHAR
columns CC2
and expression pattern TO_CHAR
(:col
) , the following query is generated:
SELECT TO_CHAR("CN1") "CN1", "CC2", TO_CHAR("CN3") "CN3" FROM TAB
This procedure can be used for clipping (winsorizing) normalized data to a [0..1] range, that is values x > 1 become 1 and values x < 0 become 0.
For the table in example 1 and pattern
CASE WHEN :col < 0 THEN 0 WHEN :col > 1 THEN 1 ELSE :col END
the following query is generated:
SELECT CASE WHEN "CN1" < 0 THEN 0 WHEN "CN1" > 1 THEN 1 ELSE "CN1" END "CN1", "CC2", CASE WHEN "CN3" < 0 THEN 0 WHEN "CN3" > 1 THEN 1 ELSE "CN3" END "CN3" FROM TAB
This procedure creates a view that applies a given expression for every VARCHAR2
or CHAR
column in the data table that is not in the exclusion list and in the inclusion list. The remaining columns do not change.
Similar to xform_expr_num
, except that it applies to CHAR
and VARCHAR2
columns instead of NUMBER
. See "XFORM_EXPR_NUM Procedure"
Syntax
DBMS_DATA_MINING_TRANSFORM.XFORM_EXPR_STR ( expr_pattern IN VARCHAR2, data_table_name IN VARCHAR2, xform_view_name IN VARCHAR2, exclude_list IN COLUMN_LIST DEFAULT NULL, include_list IN COLUMN_LIST DEFAULT NULL, col_pattern IN VARCHAR2 DEFAULT ':col', data_schema_name IN VARCHAR2 DEFAULT NULL, xform_schema_name IN VARCHAR2 DEFAULT NULL);
Parameters
Table 39-37 XFORM_EXPR_STR Procedure Parameters
Parameter | Description |
---|---|
expr_pattern |
Expression pattern |
data_table_name |
Name of the table containing the data. |
xform_view_name |
View representing the transformed output. |
exclude_list |
List of columns to exclude. |
include_list |
List of columns to include. |
col_pattern |
Column pattern. |
data_schema_name |
Schema of the data table. |
xform_schema_name |
Schema of the transformation view. |
This procedure creates a view that performs categorical missing value treatment. Only the columns that are specified in the xform definition are treated; the remaining columns do not change.
Syntax
DBMS_DATA_MINING_TRANSFORM.XFORM_MISS_CAT ( miss_table_name IN VARCHAR2, data_table_name IN VARCHAR2, xform_view_name IN VARCHAR2, miss_schema_name IN VARCHAR2 DEFAULT NULL, data_schema_name IN VARCHAR2 DEFAULT NULL, xform_schema_name IN VARCHAR2 DEFAULT NULL;
Parameters
Table 39-38 XFORM_MISS_CAT Procedure Parameters
Parameter | Description |
---|---|
miss_table_name |
Name of the categorical missing value treatment definition table generated using CREATE_MISS_CAT |
data_table_name |
Name of the table containing the data |
xform_view_name |
View representing the transformed output |
miss_schema_name |
Name of the schema hosting the categorical missing value treatment definition table |
data_schema_name |
Name of the schema hosting the data table |
xform_schema_name |
Name of the schema hosting the view representing the transformed output |
Usage Notes
The data type of the transformed columns is preserved by putting a CAST
expression around the NVL
function. For example, when col = 'state', val = 'MA' the data type is CHAR(2)
the following expression is generated:
CAST (NVL("state", 'MA') AS CHAR(2)) "state"
Examples
The simplest invocation of this routine generates a view object build_view
that represents the transformation query on build_table
based on transform definitions in missing value definitions in miss_table
.
BEGIN DBMS_DATA_MINING_TRANSFORM.XFORM_MISS_CAT( 'miss_table', 'build_table', 'build_view'); END;
This procedure creates a view that performs numerical missing value treatment of the data table. Only the columns that are specified in the xform definition are treated, the remaining columns do not change.
Syntax
DBMS_DATA_MINING_TRANSFORM.XFORM_MISS_NUM ( miss_table_name IN VARCHAR2, data_table_name IN VARCHAR2, xform_view_name IN VARCHAR2, miss_schema_name IN VARCHAR2 DEFAULT NULL, data_schema_name IN VARCHAR2 DEFAULT NULL, xform_schema_name IN VARCHAR2 DEFAULT NULL;
Parameters
Table 39-39 XFORM_MISS_NUM Procedure Parameters
Parameter | Description |
---|---|
miss_table_name |
Name of the numeric missing value treatment definition table generated using CREATE_MISS_NUM |
data_table_name |
Name of the table containing the data |
xform_view_name |
View representing the transformed output |
miss_schema_name |
Name of the schema hosting the numerical missing value treatment definition table |
data_schema_name |
Name of the schema hosting the data table |
xform_schema_name |
Name of the schema hosting the view representing the transformed output |
Examples
The simplest invocation of this routine generates a view object build_view
that represents the transformation query on build_table
based on transform definitions in missing value definitions in miss_table
.
BEGIN DBMS_DATA_MINING_TRANSFORM.XFORM_MISS_NUM( 'miss_table', 'build_table', 'build_view'); END;
This procedure creates the view that performs linear normalization. Only the columns that are specified in the definition table are transformed; the remaining columns do not change.
Syntax
DBMS_DATA_MINING_TRANSFORM.XFORM_NORM_LIN ( norm_table_name IN VARCHAR2, data_table_name IN VARCHAR2, xform_view_name IN VARCHAR2, norm_schema_name IN VARCHAR2 DEFAULT NULL, data_schema_name IN VARCHAR2 DEFAULT NULL, xform_schema_name IN VARCHAR2 DEFAULT NULL);
Parameters
Table 39-40 XFORM_NORM_LIN Procedure Parameters
Parameter | Description |
---|---|
norm_table_name |
Name of the normalization definition table generated using CREATE_NORM_LIN procedure |
data_table_name |
Name of the table containing the data |
xform_view_name |
View representing the transformed output |
norm_schema_name |
Name of the schema hosting the normalization definition table |
data_schema_name |
Name of the schema hosting the data table |
xform_schema_name |
Name of the schema hosting the view representing the transformed output |
Usage Notes
The normalization table created by CREATE_NORM_LIN
is populated with definitions by either INSERT_NORM_LIN_ZSCORE
or INSERT_NORM_LIN_MINMAX
is used to guide the query generation process to construct normalization expressions of the following form:
("col" - shift)/scale "col"
Note that col is case-sensitive since it generates quoted identifiers. When there are multiple entries in the transform definition table for the same col, the behavior is undefined. Any one of the definitions may be used in query generation. NULL
values remain unchanged.
For example, when col = 'my_col', shift = -1.5, and scale = 20. The following expression is generated:
("my_col" - (-1.5))/20 "my_col"
Examples
The simplest invocation of this routine generates a view build_view
that represents the transformation query on build_table
based on normalization definitions in the norm_minmax_table
.
BEGIN
DBMS_DATA_MINING_TRANSFORM.XFORM_NORM_LIN( 'norm_minmax_table', 'build_table', 'build_view');
END;
Creates a view that performs transformation of the data table specified by the stack definition. Only the columns that are specified in the stack definition are transformed, the remaining columns do not change.
All columns in the stack definition are applied. Columns with NULL value in the expression field are excluded. Columns in the stack definition that do not have a matching column in the data are added to the view.
Syntax
DBMS_DATA_MINING_TRANSFORM.XFORM_STACK ( transform_list IN DM_TRANSFORMS, data_table_name IN VARCHAR2, xform_view_name IN VARCHAR2, data_schema_name IN VARCHAR2 DEFAULT NULL, xform_schema_name IN VARCHAR2 DEFAULT NULL);
Parameters
Table 39-41 XFORM_STACK Procedure Parameters
Parameter | Description |
---|---|
transform_list |
Stack definition. |
data_table_name |
Name of the table containing the data. |
xform_view_name |
View representing the transformed output. |
data_schema_name |
Schema of the data table. |
xform_schema_name |
Schema of the transformation view. |