Oracle9i Data Mining Administrator's Guide Release 2 (9.2) Part Number A95959-01
Oracle is a registered trademark, and Oracle9i, SQL*Net, and SQL*Plus are trademarks or registered trademarks of Oracle Corporation. Other names may be trademarks of their respective owners.
Copyright © 2002, Oracle Corporation.
All Rights Reserved.
March 2002
This document describes how to install the Oracle9i Data Mining (ODM) software and how to perform other administrative functions common to all ODM administration on both UNIX and Windows platforms.
This administrator's guide is intended for anyone planning to install and run Oracle9i Data Mining -- either a database administrator or a system administrator.
This guide contains the following sections:
The documentation set for Oracle9i Data Mining is part of the Oracle9i Database Documentation Library; the ODM document set consists of the following documents:
For detailed information about the ODM API, see the ODM Javadoc in the directory $ORACLE_HOME/dm/doc on any system where ODM is installed. To prepare the Javadoc for user access, see Section 4.3.
For more information about the Oracle9i database, see:
In this manual, Windows refers to the Windows NT, Windows 2000, and Windows XP operating systems.
The SQL interface to Oracle9i is referred to as SQL. This interface is the Oracle9i implementation of the SQL standard ANSI X3.135-1992, ISO 9075:1992, commonly referred to as the ANSI/ISO SQL standard or SQL92. In examples, an implied carriage return occurs at the end of each line, unless otherwise noted. You must press the Return key at the end of a line of input.
Oracle9i Data Mining (ODM) allows application programmers to perform data mining in the Oracle9i database. All model-building and scoring functions are accessible through a Java-based API. The Oracle9i database provides the infrastructure for application developers to build integrated applications, with complete programmatic control of data mining functions to deliver data mining within the database.
ODM release 2 has many new features; for details, see Oracle9i Data Mining Concepts.
ODM release 2 runs on Real Application Clusters.
This section contains generic ODM requirements, an installation overview, and detailed installation steps.
ODM is an option to Oracle9i Enterprise Edition.
ODM has the following general software requirements:
The java.sql package that is included in JDK 1.2, known as the JDBC 2.0 API
If you plan to run ODM on a RAC, the ODM tablespace must be at least 250 MB. (In a RAC environment, tablespaces are built on a raw device; they are not growable as on UFS.)
This document provides the generic instructions for installing Oracle9i Data Mining.
Before you install ODM, confirm that your system satisfies the software and hardware requirements for Oracle9i Enterprise Edition, as described in the release notes for your platform. You should also ensure that your system contains enough space for the tables that you plan to use during data mining.
There are three common cases for installing ODM:
To install ODM on an Oracle9i Real Application Cluster, see Section 3.3.
If you are installing ODM on a system where the database is not installed, there are two basic ways to install Oracle9i Enterprise Edition:
Oracle provides a seed (preconfigured) database that automatically includes features that make the database highly effective and easier to manage.
Follow these steps to install Oracle9i and ODM:
alter user odm identified by new_ODM_password account unlock;
alter user odm_mtr identified by new_MTR_password account unlock;
where new_ODM_password is the new password for the ODM account and new_MTR_password is the new password for the ODM_MTR account.
exec DM_START_MONITOR
After successful installation, all ODM software is located in the $ORACLE_HOME/dm directory.
Creating a custom database takes longer than creating a seed one, but gives you full control to specify and change all database parameters.
These are the major steps required to install ODM without using a seed database:
Follow these steps to install ODM after you have created your custom database:
exec DM_START_MONITOR
After successful installation, all ODM software is located in the $ORACLE_HOME/dm directory.
If Oracle9i release 1 or earlier is installed on your system, follow these steps:
For information about upgrading the database, see Oracle9i Database Migration. For information about upgrading ODM, see Section 3.4.
The default values of the initialization parameters in an Oracle preconfigured (seed) database are generally sufficient for running ODM. If you are creating a custom database, the following parameter settings can be used as a general guideline for the database. The ODM-specific parameters must be set exactly as shown. We recommend that you tune the other parameters based on your hardware resource capacity, the volume of your input datasets, and other characteristics of your system:
###########################################
# Cache and I/O
###########################################
db_block_size=8192
db_cache_size=67108864
db_file_multiblock_read_count=16

###########################################
# Cursors and Library Cache
###########################################
open_cursors=300

###########################################
# Miscellaneous
###########################################
compatible=9.2.0.0.0

###########################################
# Pools
###########################################
java_pool_size=67108864
large_pool_size=10M
shared_pool_size=67108864

###########################################
# Optimizer
###########################################
hash_join_enabled=TRUE

###########################################
# Processes and Sessions
###########################################
processes=150

###########################################
# Sort, Hash Joins, Bitmap Indexes
###########################################
sort_area_size=5242880
sort_area_retained_size=2097152

###########################################
# ODM Specific
###########################################
aq_tm_processes = 1
job_queue_processes = 10

###########################################
# Parallel setting, adjust according to CPU number
###########################################
parallel_max_servers = 32
parallel_min_servers = 2
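As a rough sanity check, a short script can confirm that a parameter file carries the two ODM-specific values listed above. The following is a minimal sketch in Python, assuming a plain-text parameter file made of name=value lines and # comments. The helper names and the sample text are hypothetical and are not part of ODM or of any Oracle tool.

```python
# Hypothetical helper (not part of ODM): check the ODM-specific settings
# in init.ora-style text made of name=value lines and # comments.
REQUIRED_ODM_PARAMS = {"aq_tm_processes": "1", "job_queue_processes": "10"}


def parse_init_params(text):
    """Parse name=value lines, ignoring blank lines and # comments."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        params[name.strip().lower()] = value.strip()
    return params


def check_odm_params(params):
    """Return (name, expected, actual) tuples for every mismatch."""
    problems = []
    for name, expected in REQUIRED_ODM_PARAMS.items():
        actual = params.get(name)
        if actual != expected:
            problems.append((name, expected, actual))
    return problems


# Sample text assumed for illustration; it reuses values from the guideline.
SAMPLE = """
# ODM Specific
aq_tm_processes = 1
job_queue_processes =10
java_pool_size=67108864
"""

if __name__ == "__main__":
    params = parse_init_params(SAMPLE)
    print("mismatches:", check_odm_params(params))
    # Byte-valued pool sizes are easier to read in megabytes.
    print("java_pool_size in MB:", int(params["java_pool_size"]) // (1024 * 1024))
```

An empty mismatch list means that both ODM-specific parameters match the values that this guide requires; the other parameters remain tunable.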
ODM installation on a Real Application Cluster (RAC) is similar to ODM installation on a non-RAC system. If you use Oracle Universal Installer to create the preconfigured database on RAC, ODM will be installed in this database just as it is in a non-RAC environment.
If you choose to create a custom database on your RAC and install ODM there, we recommend that you configure the ODM tablespace with a raw device partition of at least 250 MB.
ODM upgrade is part of the Oracle database upgrade process. If your ODM 9.0.1 release is installed in an Oracle9i release 1 database and your ODM schema name is ODM, the Database Upgrade Assistant (DBUA) will upgrade ODM from 9.0.1 to 9.2.0 at the same time that the database is upgraded. All ODM 9.2.0 related files will be located in the $ORACLE_HOME/dm directory. When the database upgrade completes, ODM will be at the 9.2.0 release level.
For detailed information about upgrading an Oracle database, see the Oracle9i Database Migration manual.
The ODM upgrade has two parts:
When you upgrade ODM from release 1 to release 2, the ODM schema repository is upgraded from 9.0.1 to 9.2.0. There are many changes in the ODM schema definition for 9.2.0.
After the upgrade, all ODM 9.0.1 models will continue to function in the ODM 9.2.0 environment. Association Rules and Naive Bayes models are fully migrated to ODM 9.2.0.
The database upgrade utility calls ODM upgrade scripts during the database upgrade process; the ODM scripts perform the following actions:
All upgrade scripts are in the $ORACLE_HOME/dm/admin directory. The top-level upgrade script for ODM is $ORACLE_HOME/rdbms/admin/odmdbmig.sql.
All mining models recorded in the schema will be migrated to the 9.2.0 format; that is, all Naive Bayes and Association Rules models are upgraded.
You cannot downgrade ODM Java objects at this release.
If you want to preserve existing mining models, you should not deinstall ODM. To install a newer version of ODM, upgrade as described in Oracle9i Database Migration and in Section 3.4.
If you wish to deinstall ODM, you should stop the ODM Task Monitor as described in Section 4.8. Then you can deinstall ODM using OUI, just as you would any other database component.
This section contains information of interest to ODM administrators.
For information about administering an Oracle9i database, see the Oracle9i Database Administrator's Guide.
To improve ODM performance, enable parallelism by setting the database initialization parameters PARALLEL_MAX_SERVERS and PARALLEL_MIN_SERVERS based on the characteristics of your system.
You should change the ODM default passwords after installation completes. You change ODM passwords just as you change any other database passwords.
Documentation for the ODM API, created using Javadoc, is in the file $ORACLE_HOME/dm/doc/odmjdoc.tar. You should untar this file so that users can display it in a browser.
The following ODM configuration parameters reside in the ODM_CONFIGURATION table. These parameters may require modification for your environment.
Data type is int; default is 10. Specifies the maximum depth of any Network Feature for ABN setting.
Data type is int; default is 10. Specifies the maximum number of Network Features for ABN setting.
Data type is int; default is 5. Specifies maximum number of consecutive pruned Network Features for ABN setting.
Data type is int; default is 50000.
Data type is int; default is 5. Specifies the number of bins used by automated binning for categorical attributes. This value should be >= 2.
Data type is STRING; default is OTHER. Specifies the name of the "Other" bin generated during Top-n categorical binning.
Data type is int; default is 100. Specifies the maximum number of bins allowed for numerical attributes for clustering. Useful values are between 2 and 100. This parameter is used in conjunction with CL_ALG_SETTING_KM_BIN_FACTOR and CL_ALG_SETTING_OC_BIN_FACTOR.
Data type is int; default is 5. Specifies the number of bins used by automated binning for numerical attributes. This value should be >= 2.
Data type is int; default is 50000. Specifies the maximum number of unique sequence IDs per partition used by clustering apply.
Data type is int; default is 50000. Keeps the computations constrained to memory-sized chunks and determines the size of the random sample used for MDL computations (scoring within build). There is no maximum value; this value should not be smaller than 1000.
Data type is int; default is 50000. Constrains the scoring to memory-sized chunks of the data and loops through such chunks. The value for this parameter depends on the sorting area (SA) and the number of clusters. The larger the SA, the larger this parameter can be. A rough formula for this parameter is
CLUSTERING_APPLY_SEQ_PER_PARTITION = SA/(100*Num_Clusters)
where "SA" is the size of the sorting area and "Num_Clusters" is the number of clusters.
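The rough formula above can be worked through in a few lines. The following Python sketch assumes that SA is expressed in bytes (as sort_area_size is in the parameter file) and rounds the result down to an integer; the source does not state the unit or the rounding, so both are assumptions, and the function name is hypothetical.

```python
def clustering_apply_seq_per_partition(sort_area_bytes, num_clusters):
    """Rough estimate SA / (100 * Num_Clusters), rounded down.

    Assumption: the sorting area is given in bytes; the guide does not
    state the unit.
    """
    if num_clusters <= 0:
        raise ValueError("num_clusters must be positive")
    return sort_area_bytes // (100 * num_clusters)


if __name__ == "__main__":
    # Example: a 5242880-byte sort area and 10 clusters.
    print(clustering_apply_seq_per_partition(5242880, 10))
```

Because the estimate is inversely proportional to the number of clusters, doubling the cluster count halves the suggested value for the same sorting area.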
Data type is NUMBER; default is 1.353. Controls the level of statistical significance for O-Cluster to determine if more data is necessary to refine a model.
Data type is NUMBER; default is 2. Factor used in automatic bin number computation for the k-means algorithm. Increasing this value will increase resolution by increasing the number of bins. However, the number of bins is also capped by AUTO_BIN_CL_NUMERICAL_NUM.
Data type is int; default is 10000. Number of rows used by the in-memory buffer used by k-means. For an installation with limited memory, this number should be smaller than the default data size. Summarization is activated for datasets larger than the buffer size.
Data type is NUMBER; default is 20. Controls the number of points produced by data summarization for k-means. The larger the value, the more points. The formula for the number of points is:
Number of Points = CL_ALG_SETTING_KM_FACTOR * Num_Attributes * Num_Clusters
where "Num_Attributes" is the number of attributes and "Num_Clusters" is the number of clusters.
The number of points must be <= 1000. This parameter can be any positive value; however, a small number of summarization points can produce poor accuracy.
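To see how the factor interacts with the 1000-point limit, the formula can be evaluated directly. The following Python sketch uses hypothetical function names and example inputs; only the formula and the limit of 1000 points come from the text above.

```python
MAX_POINTS = 1000  # upper limit on summarization points stated in the text


def km_summary_points(km_factor, num_attributes, num_clusters):
    """Number of Points = CL_ALG_SETTING_KM_FACTOR * Num_Attributes * Num_Clusters."""
    return km_factor * num_attributes * num_clusters


def largest_km_factor(num_attributes, num_clusters):
    """Largest factor that keeps the number of points within MAX_POINTS."""
    return MAX_POINTS / (num_attributes * num_clusters)


if __name__ == "__main__":
    # Default factor of 20 with 5 attributes and 10 clusters: exactly at the limit.
    print(km_summary_points(20, 5, 10))
    # With 10 attributes and 10 clusters the default factor exceeds the limit ...
    print(km_summary_points(20, 10, 10) <= MAX_POINTS)
    # ... so the factor would have to be lowered to this value or less.
    print(largest_km_factor(10, 10))
```

With many attributes or clusters, the default factor of 20 can exceed the limit, so the factor may need to be reduced, bearing in mind that too few summarization points can hurt accuracy.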
Data type is int; default is 10. Controls the minimum number of rows required by O-Cluster to find a cluster. For data tables with a very small number of rows, this number should be set to a value between 2 and 10.
Data type is NUMBER; default is 0.9. Factor used in automatic bin number computation for the O-Cluster algorithm. Increasing this value will increase the number of bins. However, increasing the number of bins may have a negative effect on the statistical significance of the model.
Data type is int; default is 50000. Specifies the number of rows used by the in-memory buffer used by O-Cluster. For an installation with limited memory, this number should be smaller than the default size.
Data type is int; default is 1. Must be 1 or 2. 1 specifies that k-means is a balanced tree; 2 specifies that k-means is an unbalanced tree.
Data type is int; default is 2. Must be 0, 1, 2, or 3. Specifies the type of messages written to the LOG_FILE. 0 means no logging; 1 means write Internal Error, Error, and Warning messages to LOG_FILE; 2 means write all messages for 1 plus Notifications; 3 means write all messages for 2 plus trace information.
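The logging levels described above are cumulative: each level writes everything from the level below plus one more message type. The following Python sketch restates that rule; the function and variable names are hypothetical and do not correspond to any ODM interface.

```python
# Message types added at each logging level, as described in the text.
# Names here are illustrative only.
_LEVEL_ADDS = {
    0: [],
    1: ["Internal Error", "Error", "Warning"],
    2: ["Notification"],
    3: ["Trace"],
}


def messages_logged(level):
    """Return the message types written to LOG_FILE at the given level."""
    if level not in _LEVEL_ADDS:
        raise ValueError("logging level must be 0, 1, 2, or 3")
    logged = []
    # Levels are cumulative, so collect every level up to the requested one.
    for n in range(level + 1):
        logged.extend(_LEVEL_ADDS[n])
    return logged


if __name__ == "__main__":
    for level in range(4):
        print(level, messages_logged(level))
```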
Data type is int; default is 0. Must be 0, 1, 2, or 3. Enables trace for the ODM client. 0 indicates no trace; 1 indicates low; 2 indicates moderate; 3 indicates high.
Data type is STRING; default is ODM. Specifies the owner of the ODM schema repository.
Data type is int; default is 0. Must be 0, 1, 2, or 3. Enables trace for the ODM client. 0 indicates no trace; 1 indicates low; 2 indicates moderate; 3 indicates high.
Data type is int; default is 0. Must be 0, 1, 2, or 3. Enables trace for the ODM client. 0 indicates no trace; 1 indicates low; 2 indicates moderate; 3 indicates high.
You can speed up clustering package (dmcuh, dmcub, dmkmh, dmkmb, dmoch, dmocb) procedures by compiling them into native code residing in shared libraries. The procedures are translated into C code, then compiled with your usual C compiler and linked into the Oracle process. For details on how to compile PL/SQL procedures into native code, see the PL/SQL User's Guide and Reference.
Oracle9i Data Mining is an option to the Oracle9i Enterprise Edition. If ODM is part of your installation, the following query should return the value TRUE:
SELECT value FROM v$option WHERE parameter = 'Oracle Data Mining';
This query is usually run by the DBA logged in as system/manager.
When using ODM in a shared JVM environment, such as when integrated with a servlet application, all connections made to an ODM server (also known as a DMS) must be based on databases with compatible character sets. Otherwise, string length tests conducted in the JVM may not recognize these differences, allowing data to pass to the database, which could result in server-side failures.
To start ODM, log in as the ODM user and start the ODM Monitor with the following SQL*Plus command:
exec DM_START_MONITOR
To stop ODM, log in as the ODM user and stop the ODM Monitor with the following SQL*Plus command:
exec DM_STOP_MONITOR
Executing an ODM method results in the execution of a PL/SQL program in an Oracle9i database. Errors can occur at the Java level or at the PL/SQL level. There are two error tables that you should consult when errors occur in programs that use ODM classes and methods:
Our goal is to make Oracle products, services, and supporting documentation accessible, with good usability, to the disabled community. To that end, our documentation includes features that make information available to users of assistive technology. This documentation is available in HTML format, and contains markup to facilitate access by the disabled community. Standards will continue to evolve over time, and Oracle Corporation is actively engaged with other market-leading technology vendors to address technical obstacles so that our documentation can be accessible to all of our customers. For additional information, visit the Oracle Accessibility Program Web site at http://www.oracle.com/accessibility/.
JAWS, a Windows screen reader, may not always correctly read the code examples in this document. The conventions for writing code require that closing braces should appear on an otherwise empty line; however, JAWS may not always read a line of text that consists solely of a bracket or brace.