Oracle® Data Mining Application Developer's Guide 11g Release 1 (11.1) Part Number B28131-01 |
|
|
View PDF |
This chapter presents an overview of the Oracle Data Mining Java API. The Java API is based on JDM, the industry-standard Java API for data mining.
Note:
The Java API is layered on the PL/SQL and SQL language interfaces to Oracle Data Mining. All the SQL-based functionality described in this manual is also implemented in the Java API.However, the Java API supports several features not implemented in SQL, such as asynchronous execution of mining tasks and text transformation.
See Also:
Oracle Data Mining Java API Reference (javadoc)
"What's New in Oracle Data Mining" in Oracle Data Mining Concepts for a summary of the new features in the Oracle Data Mining Java API.
This chapter contains the following topics:
The Oracle Data Mining Java API requires Oracle Database 11g Release 1 (11.1)and J2SE 1.5. It is backward compatible and can be used with Oracle Database 10.2.
To use the Oracle Data Mining Java API, include the following libraries in your CLASSPATH
:
$ORACLE_HOME/rdbms/jlib/jdm.jar $ORACLE_HOME/rdbms/jlib/ojdm_api.jar $ORACLE_HOME/rdbms/jlib/xdb.jar $ORACLE_HOME/jdbc/lib/ojdbc5.jar $ORACLE_HOME//oc4j/j2ee/home/lib/connector.jar $ORACLE_HOME/oracle/jlib/orai18n.jar $ORACLE_HOME/oracle/jlib/orai18n-mapping.jar $ORACLE_HOME/lib/xmlparserv2.jar
The Data Mining Engine (DME) is the infrastructure that offers a set of data mining services to its JDM clients. The Oracle Database provides the in-database data mining functionality for JDM through the core Oracle Data Mining option. So in the rest of this document the Oracle Database is referred to as the DME.
To access data mining functionality in the database, a DME Connection needs to be created. To connect to the DME, OJDM supports following different options.
The DME Connection
object is described in detail in "Features of a DME Connection".
The Connection factory is used to create a DME connection. The JDM standard defines ConnectionFactory as a Java interface to provide a vendor neutral approach to create a DME Connection. In this approach, the application infrastructure needs to register the instance of ConnectionFactory in a JNDI server. Applications can lookup for ConnectionFactory in the JNDI server to instantiate a Connection using this factory.
OJDM provides oracle.dmt.jdm.resource.OraConnectionFactory
class, which can either be instantiated and accessed to create the connection or can be registered in JNDI server. Following code illustrates these two approaches to create a connection factory.
//Create OraConnectionFactory javax.datamining.resource.ConnectionFactory connFactory = oracle.dmt.jdm.resource.OraConnectionFactory();
//Setup the initial context to connect to the JNDI server Hashtable env = new Hashtable(); env.put( Context.INITIAL_CONTEXT_FACTORY, "oracle.dmt.jdm.resource.OraConnectionFactory" ); env.put( Context.PROVIDER_URL, "http://myHost:myPort/myService" ); env.put( Context.SECURITY_PRINCIPAL, "user" ); env.put( Context.SECURITY_CREDENTIALS, "password" ); InitialContext jndiContext = new javax.naming.InitialContext( env ); // Perform JNDI lookup to obtain the connection factory javax.datamining.resource.ConnectionFactory dmeConnFactory = (ConnectionFactory) jndiContext.lookup("java:comp/env/jdm/MyServer"); //Lookup ConnectionFactory javax.datamining.resource.ConnectionFactory connFactory = (ConnectionFactory) jndiContext.lookup("java:comp/env/jdm/MyServer");
This option is useful when the applications want to control the JDBC Connections outside the OJDM and allow the OraConnectionFactory to use the specified OracleDataSource to create the database connection. This approach gives applications the ability to use the implicit connection caching features as required. By default, OJDM doesn't enable the implicit connection caching. Oracle Database JDBC Developer's Guide and Reference for information about connection caching.
//Create an OracleDataSource OracleDataSource ods = new OracleDataSource(); ods.setURL(URL); ods.setUser(user); ods.setPassword(password); //Create a connection factory using the OracleDataSource javax.datamining.resource.ConnectionFactory connFactory = oracle.dmt.jdm.resource.OraConnectionFactory(ods); //Create DME Connection javax.datamining.resource.Connection dmeConn = connFactory.getConnection();
This option is useful when the application doesn't want to pre-create the JDBC Connection and allow OJDM to maintain the JDBC Connection. Here the user needs to create an empty ConnectionSpec
instance using getConnectionSpec
() method in the oracle.dmt.jdm.resource.OraConnectionFactory
class and create a DME Connection using the connection spec. The following code illustrates the usage.
//Create ConnectionSpec ConnectionSpec connSpec = m_dmeConnFactory.getConnectionSpec(); connSpec.setURI("jdbc:oracle:thin:@host:port:sid"); connSpec.setName("user"); connSpec.setPassword("password"); //Create DME Connection javax.datamining.resource.Connection m_dmeConn = m_dmeConnFactory.getConnection(connSpec);
In the Oracle Data Mining Java API, the DME Connection
is the primary factory object. The Connection
instantiates the object factories using the getFactory
method. The Connection
object provides named object lookup, persistence, and task execution features.
The Connection.getFactory
method creates a factory object. For example, to create a factory for the PhysicalDataSet
object, pass the absolute name of the object to this method. The getFactory
method creates an instance of PhysicalDataSetFactory
.
javax.datamining.data.PhysicalDataSetFactory pdsFactory = dmeConn.getFactory("javax.datamining.data.PhysicalDataSet");
The Connection
object provides methods for retrieving metadata about mining objects.
Method | Description |
---|---|
getCreationDate |
Returns the creation date of the specified named object.
getCreationDate(java.lang.String objectName, NamedObject objectType) returns java.util.Date |
getDescription |
Returns the description of the specified mining object.
getDescription(java.lang.String objectName, NamedObject objectType) returns java.lang.String |
getObjectNames |
Returns a collection of the names of the objects of the specified type.
getObjectNames(NamedObject objectType) returns java.util.Collection |
getObjectNames |
This is an Oracle JDM extension method that is added in 11.1 to provide a listing of mining object names across schemas or within a schema. It provides various optional method arguments that can be used to get a filtered list of arguments.
getObjectNames( java.lang.String schemaPattern, NamedObject objectType, java.lang.String objectNamePattern, javax.datamining.Enum minorType_1, javax.datamining.Enum minorType_2 ): returns java.sql.ResultSet See Example 7-1. |
Example 7-1 Oracle JDM Extension Method getObjectNames
This example illustrates the getObjectNames
method.
To list the names of classification test metrics computed by the user SCOTT
, specify:
schemaPattern
as "SCOTT"objectType
as NamedObject.testMetrics
objectPattern
as nullminorType_1
as MiningFunction.classification
minorType_2
as nullIrrespective of the type of filters specified, the getObjectNames
method returns the java.sql.ResultSet
object with the following columns.
Column Name | Data Type | Description |
---|---|---|
SCHEMA_NAME |
String | Name of the schema (can be null) |
TYPE |
String | Type of the mining object |
NAME |
String | Name of the mining object |
MINOR_TYPE_1 |
String | Mining objects can have minor/sub types. For example, model objects can have function and algorithm as minor types. |
MINOR_TYPE_2 |
String | Mining objects can have more than one minor type. If they have a second minor type, then this column is used. |
CREATION_DATE |
Timestamp | Date when this object was created |
DESCRIPTION |
String | Description of the object |
The Connection
object provides methods for retrieving mining objects and saving them in the DME. Persistent objects are stored as database objects. Transient objects are stored in memory by the Connection
object.
Method | Description |
---|---|
saveObject |
Saves the named object in the metadata repository associated with the connection.
saveObject(java.lang.String name, MiningObject object, boolean replace) |
retrieveObject |
Retrieves a copy of the specified named object from the metadata repository associated with the connection.
retrieveObject(java.lang.String objectIdentifier) returns MiningObject |
retrieveObject |
Retrieves a copy of the named object from the metadata repository associated with the connection.
retrieveObject(java.lang.String name, NamedObject objectType) returns MiningObject |
retrieveObjects |
Returns a collection of mining objects of the given type that were created within the specified time interval (from createAfter to createBefore).
(java.util.Date createdAfter, java.util.Date createdBefore, NamedObject objectType): returns java.util.Collection |
retrieveObjects |
Returns a collection of mining objects of the specified type that were created within the specified time interval (from createAfter to createBefore)
retrieveObjects(java.util.Date createdAfter, java.util.Date createdBefore, NamedObject objectType, Enum minorType): returns java.util.Collection |
The Connection
object provides an execute
method, which can execute mining tasks either asynchronously or synchronously. The DME uses the database Scheduler to execute mining tasks, which are stored in the user's schema as Scheduler jobs. The following methods are used to execute the tasks.
Task Execution | execute method syntax |
---|---|
asynchronous |
execute(java.lang.String taskName) returns ExecutionHandle |
synchronous |
execute(Task task,java.lang.Long timeout)) returns ExecutionHandleTypically to be used with single record scoring, but it may be used in other contexts as well. |
See Also:
Oracle Database Administrator's Guide for information about the Oracle Database Scheduler.
The Connection
interface provides a ConnectionMetaData
and supportsCapability
retrieval methods. This feature is useful for applications to know more about the DME at runtime. The following methods are used for retrieving this information from the connection.
Method | Description |
---|---|
getMetaData |
Returns information about the underlying DME instance represented through an active connection. ConnectionMetaData provides version information for the JDM implementation and Oracle Database.
getMetaData() returns ConnectionMetaData |
getSupportedFunctions |
Returns an array of mining functions that are supported by the implementation.
getSupportedFunctions() returns MiningFunction[] |
getSupportedAlgorithms |
Returns an array of mining algorithms that are supported by the specified mining function.
getSupportedAlgorithms(MiningFunction function) returns MiningAlgorithm[] |
supportsCapability |
Returns true if the specified combination of mining capabilities is supported. If an algorithm is not specified, returns true if the specified function is supported.
supportsCapability(MiningFunction function, MiningAlgorithm algorithm, MiningTask taskType) returns boolean |
The ConnectionMetaData
object provides methods for retrieving JDM standard version information and Oracle version information.
Method | Description |
---|---|
getVersion |
Returns the version of the JDM Standard API. It must be "JDM 1.0" for the first release of JDM.
getVersion() returns String |
getMajorVersion |
Returns the major version number. For the first release of JDM, this is "1".
getMajorVersion() returns int |
getMinorVersion |
Returns the minor version number. For the first release of JDM, this is "0".
getMinorVersion() returns int |
getProviderName |
Returns the provider name as "Oracle Corporation".
getProviderName() returns String |
getProviderVersion |
Returns the version of the Oracle Database that shipped the Oracle Data Mining Java API jar file.
getProviderVersion() returns String |
getProviderDMEVersion |
Returns the DME version of the provider
getProviderDMEVersion() returns String |
This section briefly describes the OJDM design principles to familiarize the developers with the API. The JDM standard uses the factory method pattern as the core design pattern for the API. User can instantiate a JDM object using its factory. This enables JDM vendors like Oracle to implement a vendor neutral API. OJDM follows the same factory method pattern for its extensions. javax.datamining
is the base package for the JDM standard defined classes and oracle.dmt.jdm is the base package for the Oracle extensions to the JDM standard
The JDM standard organizes its packages by the mining functions and mining algorithms. For example, javax.datamining.supervised package has all the supervised functions related classes and sub-packages java.datamining.supervised.classification and java.datamining.supervised.regression. Each function sub-package has the classes related to that function. Similarly, javax.datamining.algorithm is the base package for all algorithms and each algorithm has its sub-package under this package, for example, javax.datamining.algorithm.naivebayes is the sub-package for Naïve Bayes algorithm related classes. OJDM follows a similar package structure for the extensions, for example, feature extraction is a non-JDM standard function supported by the OJDM, here oracle.dmt.jdm.featureextraction is the package for this function and oracle.dmt.jdm.algorithm.nmf package for the Non-Negative Matrix Factorization algorithm used for feature extraction.
The JDM standard has some core packages that define common classes and packages for tasks, model details, rules and statistics. For more details refer to the JDM javadoc. The class diagram in Figure 7-1 illustrates the inheritance hierarchy of the named mining objects that are discussed in Chapter 2. In the subsequent sections more class diagrams are used to illustrate other OJDM objects. Note that the classes/interfaces shown in gray color are oracle JDM extension interfaces/classes. In Figure 7-1, oracle.dmt.jdm.transform.OraTransformationSequence is an Oracle extension to the mining objects defined in JDM 1.1 standard.
Figure 7-1 JDM Named Objects Class Diagram
The JDM standard defines physical and logical data objects to describe the mining attribute characteristics of the data as well as statistical computations for describing the data.
The javax.datamining.data
package contains all the data-related classes. The class diagram in Figure 7-2 illustrates the class relationships of the data objects supported by the Oracle Data Mining Java API.
Figure 7-2 Data Objects in Oracle Data Mining Java API
The PhysicalDataSet
object is used to specify the name and location of the dataset used for mining operations. For example, to represent a model build input dataset MINING_DATA_BUILD_V
in a DMUSER
schema account, PhysicalDataSet
object is created with the data URI DMUSER.MINING_DATA_BUILD_V
. The schema name prefix is optional when accessing the datasets in the same user account.
Note that in the class diagram in Figure 7-2 a PhysicalDataSet
can have PhysicalAttribute
objects. A PhysicalAttribute
represents physical characteristics of the columns of the input dataset; optionally physical attributes can specify the roles of the column. For example, in the MINING_DATA_BUILD_V
dataset, CUST_ID
uniquely identifies each case used for mining. So the role of the CUST_ID
column is specified as case id.
Example 7-2 illustrates the code sample that creates the PhysicalDataSet
object. Note that the PhysicalDataSet
object is saved with the name JDM_BUILD_PDS
that can be specified as input to model build that we discuss later in "Build Settings"
The PhysicalDataRecord
object shown in Figure 7-2 is used to specify a single record of a dataset. It is used for single record apply that we will discuss in a later section of this Chapter. The SignatureAttribute
is used to specify the model signature of a mining model that will be discussed in later section of this Chapter.In OJDM attribute data types are used to implicitly specify the mining attribute types. For example, all VARCHAR2
columns are treated as categorical and all NUMBER
columns are treated as numerical. So there is no need to specify logical data details in OJDM. However, to rename attributes of a column, a user can specify the embedded transformations that are discussed in the next section.
Example 7-2 Creation of a Physical Dataset
//Create PhysicalDataSetFactory PhysicalDataSetFactory pdsFactory = (PhysicalDataSetFactory)m_dmeConn.getFactory("javax.datamining.data.PhysicalDataSet"); //Create a PhysicalDataSet object PhysicalDataSet buildData = pdsFactory.create("DMUSER.MINING_DATA_BUILD_V", false); //Create PhysicalAttributeFactory PhysicalAttributeFactory paFactory = (PhysicalAttributeFactory)m_dmeConn.getFactory("javax.datamining.data.PhysicalAttribute"); //Create PhysicalAttribute object PhysicalAttribute pAttr = paFactory.create( "cust_id", AttributeDataType.integerType, PhysicalAttributeRole.caseId ); //Add the attribute to the PhysicalDataSet object buildData.addAtribute(pAttr); //Save the physical data set object dmeConn.saveObject("JDM_BUILD_PDS", buildData, true);
In the Oracle Data Mining Java API, the BuildSettings
object is saved as a table in the database. The settings table is compatible with the DBMS_DATA_MINING.CREATE_MODEL
procedure. The name of the settings table must be unique in the user's schema. Figure 7-3 illustrates the build settings class hierarchy.
The code in Example 7-3 illustrates the creation and storing of a classification settings object with a tree algorithm.
Example 7-3 Creation of a Classification Settings with Decision Tree Algorithm
//Create a classification settings factory ClassificationSettingsFactory clasFactory = (ClassificationSettingsFactory)dmeConn.getFactory ("javax.datamining.supervised.classification.ClassificationSettings"); //Create a ClassificationSettings object ClassificationSettings clas = clasFactory.create(); //Set target attribute name clas.setTargetAttributeName("AFFINITY_CARD"); //Create a TreeSettingsFactory TreeSettingsFactory treeFactory = (TreeSettingsFactory)dmeConn.getFactory ("javax.datamining.algorithm.tree.TreeSettings"); //Create TreeSettings instance TreeSettings treeAlgo = treeFactory.create(); treeAlgo.setBuildHomogeneityMetric(TreeHomogeneityMetric.entropy); treeAlgo.setMaxDepth(10); treeAlgo.setMinNodeSize( 10, SizeUnit.count ); //Set algorithm settings in the classification settings clas.setAlgorithmSettings(treeAlgo); //Save the build settings object in the database dmeConn.saveObject("JDM_TREE_CLAS", clas, true);
In 11.1, all mining algorithms support automated data preparations (ADP). By default for decision tree and GLM algorithms, ADP is enabled. For other algorithms it is disabled by default for backward compatibility reasons. To enable ADP explicitly for the algorithms that do not enable by default, invoke the following function, by specifying the useAutomatedDataPreparations
boolean flag as true.
OraBuildSettings.useAutomatedDataPreparations (boolean useAutomatedDataPreparations)
For more information about automatic data preparation, see Oracle Data Mining Concepts.
OJDM uses the DBMS_SCHEDULER
infrastructure for executing mining tasks either synchronously or asynchronously in the database. A mining task is saved as a DBMS_SCHEDULER
job in the user schema and is set to DISABLED
state. When user calls the execute method in DME Connection, the job state will be changed to ENABLED
and scheduler starts executing the mining task by creating a new database session for asynchronous executions. For synchronous executions scheduler uses the same database session opened by the DME connection.
The class diagram in Figure 7-4 illustrates the different types of tasks that are available in OJDM and its class hierarchy. Subsequent sections will discuss more about the individual tasks shown in this diagram
DBMS_SCHEDULER
provides additional scheduling and resource management features. You can extend the capabilities of Oracle Data Mining tasks by using the Scheduler infrastructure.
See Also:
Oracle Database Administrator's Guide for information about the database scheduler.In Oracle Data Mining 11.1, the task infrastructure supports applications to specify dependent tasks through the API and deploy the execution of the tasks to the database server. The server executes complete workflow of tasks specified through the API and once deployed it does not depend on client. Client can monitor the execution process using OJDM API. For example, typically after the completion of data preparations, model is built and then tested and applied. Both test and apply can be done in parallel after model build is successful.
To build a task flow invoke the method OraTask.addDependency
(String parentTaskName
). For example, the code in Example 7-4 illustrates how to setup a mining task workflow, where first run the transformations task and then model build task. After successful completion of the build task run apply and test tasks in parallel.
Example 7-4 Building Mining Task Workflows
//Task objects declarations private TransformationTask xformTask; private BuildTask buildTask; private TestTask testTask; private DataSetApplyTask applyTask; //Creation of the tasks and task input objects are skipped for this example … //Save the first task in the workflow (i.e., transformations task) dmeConn.saveObject("transformationTask", xformTask, true); //Specify dependencies before saving of the tasks buildTask.addDependency("transformationTask"); dmeConn.saveObject("modelBuildTask", buildTask, true); testTask.addDependency("modelBuildTask"); dmeConn.saveObject("modelTestTask", testTask, true); applyTask.addDependency("modelBuildTask"); dmeConn.saveObject("modelApplyTask", applyTask, true); //Execute the first task in the workflow to initiate the execution of the whole workflow dmeConn.execute("transformationTask");
The javax.datamining.task.BuildTask
class is used to build a mining model. Prior to building a model, a PhysicalDataSet
object and a BuildSettings
object must be saved.
Example 7-5 illustrates the building of a tree model using the PhysicalDataSet
described in "Describing the Mining Data" and the BuildSettings
described in "Build Settings".
Example 7-5 Building a Model
//Create BuildTaskFactory BuildTaskFactory buildTaskFactory = dmeConn.getFactory("javax.datamining.task.BuildTask"); //Create BuildTask object BuildTask buildTask = buildTaskFactory.create ( "JDM_BUILD_PDS","JDM_TREE_CLAS","JDM_TREE_MODEL"); //Save BuildTask object dmeConn.saveObject("JDM_BUILD_TASK", buildTask, true); //Execute build task asynchronously in the database ExecutionHandle execHandle = dmeConn.execute("JDM_BUILD_TASK"); //Wait for completion of the task ExecutionStatus execStatus = execHandle.waitForCompletion(Integer.MAX_VALUE);
After building a model using the BuildTask
, a model object is persisted in the database. It can be retrieved to explore the model details.
The class diagram in Figure 7-5 illustrates the different types of model objects and model details objects supported by the Oracle Data Mining Java API.
Figure 7-5 Model and Model Detail Class Diagram
Example 7-6 illustrates the retrieval of the classification tree model built in "Building a Mining Model" and its TreeModelDetail
.
Example 7-6 Retrieve Model Details
//Retrieve classification model from the DME ClassificationModel treeModel = (ClassificationModel)dmeConn.retrieveObject ( "JDM_TREE_MODEL", NamedObject.model); //Retrieve tree model detail from the model TreeModelDetail treeDetail = (TreeModelDetail)treeModel.getModelDetail(); //Get the root node TreeNode rootNode = treeDetail.getRootNode(); //Get child nodes TreeNode[] childNodes = rootNode.getChildren(); //Get details of the first child node int nodeId = childNodes[0].getIdentifier(); long caseCount = childNodes[0].getCaseCount(); Object prediction = childNodes[0].getPrediction();
Once a supervised model has been built, it can be evaluated using a test operation. The JDM standard defines two types of test operations: one that takes the mining model as input, and the other that takes the apply output table with the actual and predicted value columns.
javax.datamining.supervised.TestTask
is the base class for the model- based test tasks, and javax.datamining.supervised.TestMetricsTask
is the base class for the apply output table-based test tasks.
The test operation creates and persists a test metrics object in the DME. For classification model testing, either of the following can be used.
javax.datamining.supervised.classification.ClassificationTestTask javax.datamining.supervised.classification.ClassificationTestMetricsTask
Both of these tasks create the named object javax.datamining.supervised.classification.ClassificationTestMetrics
and store it as a table in the user's schema.
The classification test metrics components, confusion matrix, lift results, and ROC associated with the ClassificationTestMetrics
object are stored in separate tables whose names are the ClassificationTestMetrics
object name followed by the suffix _CFM
, _LFT
, or _ROC
. These tables can be used to display test results in dashboards, BI platforms such as Oracle BI, Business Objects, and so on.
Similarly for regression model testing, either of the following can be used:
javax.datamining.supervised.regression.RegressionTestTask javax.datamining.supervised.regression.RegressionTestMtericsTask
Both these tasks create a named object javax.datamining.supervised.regression.RegressionTestMetrics
and store it as a table in the user's schema.
The class diagram in Figure 7-6 illustrates the test metrics class hierarchy. It refers to Figure 7-4, "Task Class Diagram" for the class hierarchy of test tasks.
Example 7-7 illustrates the test of a tree model JDM_TREE_MODEL
using the ClassificationTestTask
on the dataset MINING_DATA_TEST_V
.
Example 7-7 Testing a Model
//Create & save PhysicalDataSpecification PhysicalDataSet testData = m_pdsFactory.create( "MINING_DATA_TEST_V", false ); PhysicalAttribute pa = m_paFactory.create("cust_id", AttributeDataType.integerType, PhysicalAttributeRole.caseId ); testData.addAttribute( pa ); m_dmeConn.saveObject( "JDM_TEST_PDS", testData, true ); //Create ClassificationTestTaskFactory ClassificationTestTaskFactory testTaskFactory = (ClassificationTestTaskFactory)dmeConn.getFactory( "javax.datamining.supervised.classification.ClassificationTestTask"); //Create, store & execute Test Task ClassificationTestTask testTask = testTaskFactory.create( "JDM_TEST_PDS", "JDM_TREE_MODEL", "JDM_TREE_TESTMETRICS" ); testTask.setNumberOfLiftQuantiles(10); testTask.setPositiveTargetValue(new Integer(1)); //Save TestTask object dmeConn.saveObject("JDM_TEST_TASK", testTask, true); //Execute test task asynchronously in the database ExecutionHandle execHandle = dmeConn.execute("JDM_TEST_TASK"); //Wait for completion of the task ExecutionStatus execStatus = execHandle.waitForCompletion(Integer.MAX_VALUE); //Explore the test metrics after successful completion of the task if(ExecutionState.success.equals(execStatus.getState())) { //Retrieve the test metrics object ClassificationTestMetrics testMetrics = (ClassificationTestMetrics)dmeConn.getObject("JDM_TREE_TESTMETRICS"); //Retrieve confusion matrix and accuracy Double accuracy = testMetrics.getAccuracy(); ConfusionMatrix cfm = testMetrics.getConfusionMatrix(); //Retrieve lift Lift lift = testMetrics.getLift(); //Retrieve ROC ReceiverOperatingCharacterics roc = testMetrics.getROC(); }
In Example 7-7, a test metrics object is stored as a table called JDM_TREE_TESTMETRICS
. The confusion matrix is stored in the JDM_TREE_TESTMETRICS_CFM
table, lift is stored in the JDB_TREE_TESTMETRICS_LFT
table, and ROC
is stored in the JDM_TREE_TESTMETRICS_ROC
table. You can use BI tools like Oracle Discoverer to query these tables and create reports.
All supervised models can be applied to data to find the prediction. Some of the unsupervised models, such as clustering and feature extraction, support the apply operation to find the cluster id or feature id for new records.
The JDM standard API provides an ApplySettings
object to specify the type of output for the scored results. javax.datamining.task.apply.ApplySettings
is the base class for all apply settings. In the Oracle Data Mining Java API, the ApplySettings
object is transient; it is stored in the Connection
context, not in the database.
The class diagram in Figure 7-7 illustrates the class hierarchy of the apply settings available in the Oracle Data Mining Java API.
In the Oracle Data Mining Java API, default apply settings produce the apply output table in fixed format. The list in Table 7-1 illustrates the default output formats for different functions.
Table 7-1 Default Output Formats for Different Functions
Mining Function | ||||
---|---|---|---|---|
Classification without Cost |
Case ID |
Prediction |
Probability |
|
Classification with Cost |
Case ID |
Prediction |
Probability |
Cost |
Regression |
Case ID |
Prediction |
||
Feature extraction |
Case ID |
Feature ID |
Value |
All types of apply settings support source and destination attribute mappings. For example, if the original apply table has customer name and age columns that need to be carried forward to the apply output table, it can be done by specifying the source destination mappings in the apply settings.
In the Oracle Data Mining Java API, classification apply settings support map by rank, top prediction, map by category, and map all predictions. Regression apply settings support map prediction value. Clustering apply settings support map by rank, map by cluster id, map top cluster, and map all clusters. Feature extraction apply settings support map by rank, map by feature id, map top feature, and map all features.
Example 7-8 illustrates the applying of a tree model JDM_TREE_MODEL
using ClassificationApplyTask
on the dataset MINING_DATA_APPLY_V
.
Example 7-8 Applying a Model
//Create & save PhysicalDataSpecification PhysicalDataSet applyData = m_pdsFactory.create( "MINING_DATA_APPLY_V", false ); PhysicalAttribute pa = m_paFactory.create("cust_id", AttributeDataType.integerType, PhysicalAttributeRole.caseId ); applyData.addAttribute( pa ); m_dmeConn.saveObject( "JDM_APPLY_PDS", applyData, true ); //Create ClassificationApplySettingsFactory ClassificationApplySettingsFactory applySettingsFactory = (ClassificationApplySettingsFactory)dmeConn.getFactory( "javax.datamining.supervised.classification. ClassificationApplySettings"); //Create & save ClassificationApplySettings ClassificationApplySettings clasAS = applySettingsFactory.create(); m_dmeConn.saveObject( "JDM_APPLY_SETTINGS", clasAS, true); //Create DataSetApplyTaskFactory DataSetApplyTaskFactory applyTaskFactory = (DataSetApplyTaskFactory)dmeConn.getFactory( "javax.datamining.task.apply.DataSetApplyTask"); //Create, store & execute apply Task DataSetApplyTask applyTask = m_dsApplyFactory.create( " JDM_APPLY_PDS ", "JDM_TREE_MODEL", " JDM_APPLY_SETTINGS ", "JDM_APPLY_OUTPUT_TABLE"); //Save ApplyTask object dmeConn.saveObject("JDM_APPLY_TASK", applyTask, true); //Execute test task asynchronously in the database ExecutionHandle execHandle = dmeConn.execute("JDM_APPLY_TASK"); //Wait for completion of the task ExecutionStatus execStatus = execHandle.waitForCompletion(Integer.MAX_VALUE);
The class javax.datamining.supervised.classification.CostMatrix
is used to represent the costs of the false positive and false negative predictions. It is used for classification problems to specify the costs associated with the false predictions.
In the Oracle Data Mining Java API, cost matrix is supported in apply and test operations for all classification models. For the decision tree algorithm, a cost matrix can be specified at build time. For more information about cost matrix, see Oracle Data Mining Concepts.
Example 7-9 illustrates how to create a cost matrix object where the target has two classes: YES
(1) and NO
(0). Suppose a positive (YES
) response to the promotion generates $2 and the cost of the promotion is $1. Then the cost of misclassifying a positive responder is $2. The cost of misclassifying a non-responder is $1.
Example 7-9 Creating a Cost Matrix
//Create category set factory & cost matrix factory CategorySetFactory catSetFactory = (CategorySetFactory)m_dmeConn.getFactory( "javax.datamining.data.CategorySet" ); CostMatrixFactory costMatrixFactory = (CostMatrixFactory)m_dmeConn.getFactory( "javax.datamining.supervised.classification.CostMatrix"); //Create categorySet CategorySet catSet = m_catSetFactory.create(AttributeDataType.integerType); //Add category values catSet.addCategory(new Integer(0), CategoryProperty.valid); catSet.addCategory(new Integer(1), CategoryProperty.valid); //create cost matrix CostMatrix costMatrix = m_costMatrixFactory.create(catSet); costMatrix.setCellValue(new Integer(0), new Integer(0), 0); costMatrix.setCellValue (new Integer(1), new Integer(1), 0); costMatrix.setCellValue (new Integer(0), new Integer(1), 2); costMatrix.setCellValue (new Integer(1), new Integer(0), 1); //Save cost matrix in the DME dmeConn.saveObject("JDM_COST_MATRIX", costMatrix);
Prior probabilities are used for classification problems if the actual data has a different distribution for target values than the data provided for the model build. A user can specify the prior probabilities in the classification function settings, using setPriorProbabilitiesMap
. For more information about prior probabilities, see Oracle Data Mining Concepts.
Note:
Priors are not supported with decision trees.Example 7-10 illustrates how to create a PriorProbabilities
object, when the target has two classes: YES
(1) and NO
(0), and probability of YES
is 0.05, probability of NO
is 0.95.
In 11.1, OJDM supports embedding transformations with the model metadata. When the transformations are embedded with the model, they are implicitly applied to apply and test datasets. For example, user can embed a transformation that recodes the response attributes value representation from 1/0 to Yes/No; model uses this transformation when applying the model to the new data.
Users can specify these transformations as SQL expressions or can use the OJDM transformations discussed in Section 2.13 and build a transformation sequence.
In this section, the first example discusses the simple expression transformation using the oracle.dmt.jdm.transform.OraExpressionTransform class specified as input for model build.
The second example illustrates how to build a sequence of complex transformations and persist them as a transformation sequence and embed them into the model.
Using OraTransformationFactory
user can create transformation objects such as OraTransformationSequence
, OraExpressionTransform
, OraBinningTransform
, OraNormalizationTransform
and OraClippingTransform
.
In Example 7-11, we create an expression transform that defines a simple log transformation for age attribute, recode transformation for affinity_card attribute and explicit exclusion of original age attribute from the model build. The code illustrates using OJDM API how one can embed these simple SQL expression transformations with the model.
Example 7-11 Simple Expression Transformation
//Create OraTransformationFactory OraTransformationFactory m_xformFactory = (OraTransformationFactory)m_dmeConn.getFactory( "oracle.dmt.jdm.transform.OraTransformation" ); //Create OraExpressionTransform from the transformation factory OraExpressionTransform exprXform = m_xformFactory.createExpressionTransform(); //1) Specify log transformation of age attribute and create a new attribute call log_age // that can be used for mining exprXform.addAttributeExpression("log_age", //Expression output attribute name "log(10, age) as log_age", //Expression "power(10, log_age)" //Reverse expression ); //2) Recode 1/0 values of the affinity card attribute with the yes/no values and replace // existing attribute with the new recoded attribute exprXform.addAttributeExpression("affinity_card", //Expression output attribute name "CASE WHEN AFFINITY_CARD = 1 THEN 'YES' ELSE 'NO' END ", null //No explicit reverse expression ); //3) Exclude age attribute from mining exprXform.addAttributeExpression("age", //Expression output attribute name null, //Specify expression as null //to exclude attribute from mining null ); //Create transformation sequence object using expression transformation OraTransformationSequence xformSeq = m_xformFactory.createTransformationSequence( "MINING_DATA_BUILD_V", //Input table exprXform, //Expressions to be defined null //Output transformed view is specified as null as we are trying to //embed the transformations to the model ); //Save transformation sequence object m_dmeConn.saveObject("simpleExprXForm_jdm", xformSeq, true); //Create build Task with transformation sequence BuildTask buildTask = m_buildFactory.create( "inputPDS", //Build data specification "inputBuildSettings", //Mining function settings name "outputModel" //Mining model name ); //Specify transformation sequence as one of the input to enable embedding //of the transformations defined in the sequence with the model //In this example only expression transformations are specified ((OraBuildTask)buildTask).setTransformationSequenceName("simpleExprXForm_jdm"); //Save and execute the build task ... //After successful model build specified transformations are embedded with the model //User can retrieve the transformation details that are embedded with the model by calling //the following function in OraModel OraExpressionTransform modelExmbeededTransforms = ((OraModel)model). GetModelTransformations();
In the previous example, we explored how to embed per attribute simple SQL expression transformations that can be used for trivial business transformations. In this section we will detail, how a complex transformation sequence can be build using OJDM and embed these with the model.
OJDM 10.2 provides typical mining related individual transformations such as binning, normalization and outlier treatment (clipping). In 10.2 users have to maintain these transformations outside the modeling process and do the consistent transformations for the build, apply and test datasets outside the mining operations. This requires significant additional coding and maintenance of the transformation related objects by the end-user applications.
With the model embedded transformations capability, users can embed even complex transformation sequences such as first add business transformations and new attributes using the expression transforms (as discussed in the previous example), second treat outliers with the user specified clipping definitions and lastly normalize the data with the user specified normalization technique.
In OJDM new OraTransformationSequence
object supports ability to specify sequence of transformations and convert these transformations into per attribute SQL expressions to embed them to the model. Example 7-12 illustrates using OJDM API how one can build a transformation sequence which uses the expression transform created in the previous example and extends it with the outlier and normalization data mining transformations and embed these complex transformation sequence with the model.
Example 7-12 Complex Sequence Transformations
//Create a list of transformations to be performed on the input mining data List xformList = new ArrayList(); xfromList.add( exprXform ); //Expression transformation xformList.add( clippingXform ); //Clipping transformation to treat outliers xformList.add( normalizeXform );//Normalization transformation //Create transformation sequence object using list of transformation OraTransformationSequence xformSeq = m_xformFactory.createTransformationSequence( "MINING_DATA_BUILD_V", //Input table xformList, //List of transformations null //Output transformed view is specified as null as we are trying to //embed the transformations to the model ); //Save transformation sequence object m_dmeConn.saveObject("complexXFormSeq_jdm", xformSeq, true); //Create transformation task with the transformation sequence OraTransformationTaskFactory m_xformTaskFactory = (OraTransformationTaskFactory)m_dmeConn.getFactory( "oracle.dmt.jdm.task.OraTransformationTask"); OraTransformationTask xformTask = m_xformTaskFactory .create( "complexXFormSeq_jdm",, false //boolean flag useTransformDefinitionTables ); //Save and execute transformation task to populate transformation sequence with the //SQL expressions necessary before embedding them to the build task . . . . //Create build Task with transformation sequence . . . . ((OraBuildTask)buildTask).setTransformationSequenceName("complexXFormSeq_jdm "); //Save and execute the build task with the embedded transformations ...
Note that in both the examples we specified the output view of the transformation sequence as null to avoid creation of the database view that includes the transformations in the sequence. However, one can specify the view name to create a database view with the transformations and use this view as input to the model build to maintain the transformations outside the modeling process. OJDM API provides flexibility for the applications to choose the approach that best fits the need.
OJDM has oracle.dmt.jdm.task.OraPredictTask
, OraExplainTask
and OraProfileTask
for data mining novice users to get predictions, to explain attributes importance and to discover profiles from the data.
Using OraPredictTask
predictions are computed by just specifying the data location and the target column. This task learns from the known values in the target column and other columns in the input table and fills the unknown values in the target column with the predictions. This task hides all the data mining process done inside the database and produces the predictions and accuracy of the predictions.
Using OraExplainTask
attributes ranking/importance with respect to an explain column. By just specifying the data location and explain column this task produces the attribute ranking table.
Using OraProfileTask
profiles are discovered from the data for a given target attribute. For example, to find the profiles of the customers who respond to a product promotion, give the customers dataset with the customer attributes and promotion response attribute to the profile task. Profile task outputs a table with the profile definitions that applications can display to the users.
Both the tasks do automated data preparation where needed.
Example 7-13 illustrates how to execute predict, explain, and profile tasks.
Example 7-13 Predictive Analytics
//Get Predictive Analytics Task Factory object OraPredictiveAnalyticsTaskFactory m_paFactory = (OraPredictiveAnalyticsTaskFactory)m_dmeConn.getFactory( "oracle.dmt.jdm.task.OraPredictiveAnalyticsTask"); //Predict task //Create predict task object OraPredictTask predictTask = m_paFactory.createPredictTask ( "MINING_DATA_BUILD_V", //Input table "cust_id", //Case id column "affinity_card", //target column "JDM_PREDICTION_RESULTS"); //prediction output table //Save predict task object dmeConn.saveObject("JDM_PREDICT_TASK", predictTask, true); //Execute test task asynchronously in the database ExecutionHandle execHandle1 = dmeConn.execute("JDM_PREDICT_TASK"); //Wait for completion of the task ExecutionStatus execStatus1 = execHandle1.waitForCompletion(Integer.MAX_VALUE); //Explain task //Create explain task object OraExplainTask explainTask = m_paFactory.createExplainTask ( "MINING_DATA_BUILD_V", //Input table "affinity_card", //explain column "JDM_EXPLAIN_RESULTS"); //explain output table //Save predict task object dmeConn.saveObject("JDM_EXPLAIN_TASK", explainTask, true); //Execute test task asynchronously in the database ExecutionHandle execHandle2 = dmeConn.execute("JDM_ EXPLAIN_TASK"); //Wait for completion of the task ExecutionStatus execStatus2 = execHandle2.waitForCompletion(Integer.MAX_VALUE); //Profile task //Create profile task OraProfileTask profileTask = m_paFactory.createProfileTask( "MINING_DATA_BUILD_V", //Input table "affinity_card", //Target column "JDM_PROFILE_RESULTS); //Profile output table //Save predict task object dmeConn.saveObject("JDM_PROFILE_TASK", profileTask, true); //Execute test task asynchronously in the database ExecutionHandle execHandle3 = dmeConn.execute("JDM_PROFILE_TASK"); //Wait for completion of the task ExecutionStatus execStatus3 = execHandle3.waitForCompletion(Integer.MAX_VALUE);
In the Oracle Data Mining Java API, data must be prepared before building, applying, or testing a model. The oracle.dmt.jdm.task.OraTransformationTask
class supports common transformations used in data mining: binning, normalization, clipping, and text transformations. For more information about transformations, see Oracle Data Mining Concepts.
The class diagram in Figure 7-8 illustrates the OraTransformationTask
and its relationship with other objects.
Figure 7-8 OraTransformationTask Class Diagram
Binning is the process of grouping related values together, thus reducing the number of distinct values for an attribute. Having fewer distinct values typically leads to a more compact model and one that builds faster, but it can also lead to some loss in accuracy.
The class diagram in Figure 7-9 illustrates the binning transformation classes.
Figure 7-9 OraBinningTransformation Class Diagram
Here, OraBinningTransformation
contains all the settings required for binning. The Oracle Data Mining Java API supports top-n, custom binning for categorical attributes, and equi-width, quantile and custom binning for numerical attributes. After running the binning transformations, it creates a transformed table and bin boundary tables in the user's schema. The user can specify the bin boundary table names, or the system will generate the names for the bin boundary tables. This facilitates the reusing of the bin boundary tables that are created for binning build data for apply and test data.
The following code illustrates the binning operation on the view MINING_BUILD_DATA_V
//Create binning transformation instance OraBinningTransformFactory binXformFactory = (OraBinningTransformFactory)dmeConn.getFactory( "oracle.dmt.jdm.transform.binning.OraBinningTransform"); OraBinningTransform binTransform = m_binXformFactory.create( "MINING_DATA_BUILD_V", // name of the input data set "BINNED_DATA_BUILD_V", // name of the transformation result true); // result of the transformation is a view // Specify the number of numeric bins binTransform.setNumberOfBinsForNumerical(10); // Specify the number of categoric bins binTransform.setNumberOfBinsForCategorical(8); // Specify the list of excluded attributes String[] excludedList = new String[]{"CUST_ID", "CUST_GENDER"}; binTransform.setExcludeColumnList(excludedList); // Specify the type of numeric binning: equal-width or quantile ( default is quantile ) binTransform.setNumericalBinningType(binningType); // Specify the type of categorical binning as Top-N: by default it is none binTransform.setCategoricalBinningType(OraCategoricalBinningType.top_n); //Create transformation task OraTransformationTask xformTask = m_xformTaskFactory.create(binTransform); //Save transformation task object dmeConn.saveObject("JDM_BINNING_TASK", xformTask, true); //Execute transformation task asynchronously in the database ExecutionHandle execHandle = dmeConn.execute("JDM_ BINNING _TASK"); //Wait for completion of the task ExecutionStatus execStatus = execHandle.waitForCompletion(Integer.MAX_VALUE);
Normalizing converts individual attribute values in such a way that all attribute values lie in the same range. Normally, values are converted to be in the range 0.0 to 1.0 or the range -1 to +1. Normalization ensures that attributes do not receive artificial weighting caused by differences in the ranges that they span.
The class diagram in Figure 7-10 illustrates the normalization transformation classes.
Figure 7-10 OraNormalizeTransformation Class Diagram
Here, OraNormalizeTransformation
contains all the settings required for normalization. The Oracle Data Mining Java API supports z-Score, min-max, and linear scale normalizations. Normalization is required for SVM, NMF, and k-Means algorithms.
The following code illustrates normalization on the view MINING_BUILD_DATA_V
.
//Create OraNormalizationFactory OraNormalizeTransformFactory normalizeXformFactory = (OraNormalizeTransformFactory)m_dmeConn.getFactory( "oracle.dmt.jdm.transform.normalize.OraNormalizeTransform"); //Create OraNormalization OraNormalizeTransform normalizeTransform = m_normalizeXformFactory.create( "MINING_DATA_BUILD_V", // name of the input data set "NORMALIZED_DATA_BUILD_V", // name of the transformation result true, // result of the transformation is a view OraNormalizeType.z_Score, //Normalize type new Integer(6) ); //Rounding number // Specify the list of excluded attributes String[] excludedList = new String[]{"CUST_ID", "CUST_GENDER"}; normalizeTransform.setExcludeColumnList(excludedList); //Create transformation task OraTransformationTask xformTask = m_xformTaskFactory.create(normalizeTransform); //Save transformation task object dmeConn.saveObject("JDM_NORMALIZE_TASK", xformTask, true); //Execute transformation task asynchronously in the database ExecutionHandle execHandle = dmeConn.execute("JDM_NORMALIZE_TASK"); //Wait for completion of the task ExecutionStatus execStatus = execHandle.waitForCompletion(Integer.MAX_VALUE);
Some computations on attribute values can be significantly affected by extreme values. One approach to achieving a more robust computation is to either winsorize or trim the data using clipping transformations.
Winsorizing involves setting the tail values of a particular attribute to some specified value. For example, for a 90% winsorization, the bottom 5% are set equal to the minimum value in the 6th percentile, while the upper 5% are set equal to the value corresponding to the maximum value in the 95th percentile.
Trimming "removes" the tails in the sense that trimmed values are ignored in further values. This is achieved by setting the tails to NULL.
The class diagram in Figure 7-11 illustrates the clipping transformation classes.
Figure 7-11 OraClippingTransformation Class Diagram
Here, OraClippingTransformation
contains all the settings required for clipping. The Oracle Data Mining Java API supports winsorize and trim types of clipping.
The following code illustrates clipping on the view MINING_BUILD_DATA_V
.
//Create OraClippingTransformFactory OraClippingTransformFactory clipXformFactory = (OraClippingTransformFactory)dmeConn.getFactory( "oracle.dmt.jdm.transform.clipping.OraClippingTransform"); //Create OraClippingTransform OraClippingTransform clipTransform = clipXformFactory.create( "MINING_DATA_BUILD_V", // name of the input data set "WINSORISED_DATA_BUILD_V", // name of the transformation result true );// result of the transformation is a view //Specify the list of excluded attributes String[] excludedList = new String[]{"CUST_ID", "CUST_GENDER"}; clipTransform.setExcludeColumnList(excludedList); //Specify the type of clipping clipTransform.setClippingType(OraClippingType.winsorize); // Specify the tail fraction as 3% of values on both ends clipTransform.setTailFraction(0.03); //Create and save transformation task OraTransformationTask xformTask = xformTaskFactory.create(clipTransform); //Save transformation task object dmeConn.saveObject("JDM_CLIPPING_TASK", xformTask, true); //Execute transformation task asynchronously in the database ExecutionHandle execHandle = dmeConn.execute("JDM_CLIPPING_TASK"); //Wait for completion of the task ExecutionStatus execStatus = execHandle.waitForCompletion(Integer.MAX_VALUE);
Text columns need to be transformed to nested table structure to do the mining on text columns. This transformation converts the text columns to nested table columns. A features table is created by text transformation. A model build text data column features table must be used for apply and test tasks to get the correct results.
The class diagram in Figure 7-12 illustrates the text transformation classes.
Figure 7-12 Text Transformation Class Diagram
Here, OraTextTransformation
is used to specify the text columns and the feature tables associated with the text columns.
The following code illustrates clipping on the table MINING_BUILD_TEXT
.
//Create OraTextTransformFactory OraTextTransformFactory textXformFactory = dmeConn.getFactory( "oracle.dmt.jdm.transform.text.OraTextTransform"); //Create OraTextTransform OraTextTransform txtXform = (OraTextTransformImpl)textXformFactory.create( "MINING_BUILD_TEXT", // name of the input data set "NESTED_TABLE_BUILD_TEXT ", // name of the transformation result "CUST_ID", //Case id column new String[] { "COMMENTS" } ); //Text column names ); //Create transformation task OraTransformationTask xformTask = m_xformTaskFactory.create(txtXform); //Save transformation task object dmeConn.saveObject("JDM_TEXTXFORM_TASK", xformTask, true); //Execute transformation task asynchronously in the database ExecutionHandle execHandle = dmeConn.execute("JDM_TEXTXFORM_TASK"); //Wait for completion of the task ExecutionStatus execStatus = execHandle.waitForCompletion (Integer.MAX_VALUE);