Although most organizations have long regarded their stores of data as one of their most valuable corporate assets, how this data is managed and maintained varies enormously. Originally, data was used to help achieve operational goals, run the business, and help identify the future direction and success of the company.
However, new government regulations and guidelines are a key driving force in how and why data is being retained, as they are now requiring organizations to retain and control information for very long periods of time. Consequently, today there are two additional objectives IT managers are trying to satisfy: to store vast quantities of data, for the lowest possible cost; and to meet the new regulatory requirements for data retention and protection.
This chapter discusses the components in the Oracle Database which can be used to build an Information Lifecycle Management (ILM) strategy. This chapter contains the following topics:
Information today comes in a wide variety of types, for example an E-mail message, a photograph, or an order in an Online Transaction Processing System. Once you know the type of data and how it is used, you have an understanding of what its evolution and final destiny are likely to be.
One of the challenges facing each organization is to understand how its data evolves and grows, monitor how its usage changes over time, and decide how long it should survive, while adhering to all the rules and regulations that now apply to that data. Information Lifecycle Management (ILM) is designed to address these issues, with a combination of processes, policies, software, and hardware so that the appropriate technology can be used for each stage in the lifecycle of the data.
The Oracle Database provides the ideal platform for implementing an ILM solution, because it offers:
Application Transparency
Application Transparency is very important in ILM because it means that there is no need to customize applications and it also allows various changes to be made to the data without any impact on the applications that are using that data. Therefore, data can easily be moved at the different stages of its lifecycle and access to the data can be optimized via the database. Another important benefit is that application transparency offers the flexibility required to quickly adapt to any new regulatory requirements, again without any impact on the existing applications.
Fine-grained
Oracle is able to view data at a very fine-grained level as well as group related data together, whereas storage devices only see bytes and blocks.
Low-Cost
With so much data to retain, using low cost storage is a key factor in implementing ILM. Since Oracle can take advantage of many types of storage devices, the maximum amount of data can be held for the lowest possible cost.
Enforceable Compliance Policies
When information is kept for compliance reasons, it is imperative to show to regulatory bodies that data is being retained and managed in accordance with the regulations. Within Oracle, it is possible to define security and audit policies, which enforce and log all access to data.
Information Lifecycle Management is concerned with all data in an organization. This includes not just structured data, such as orders in an OLTP system or a history of sales in a data warehouse, but also unstructured data, such as E-mail, documents, and images.
Although the Oracle Database already supports the storing of unstructured data through the use of BLOBs and Oracle Fast Files (available in Oracle Database 11g), a sophisticated document management system is available in Oracle Content Database, when used in conjunction with the Enterprise Edition. It includes role-based security to ensure that content is only accessed by authorized personnel and policies which describe what happens to the content during its lifetime.
Therefore, if all of the information in your organization is contained in an Oracle database, then you can take advantage of the features and functionality provided by the database to manage and move the data as it evolves during its lifetime, without having to manage multiple types of data stores.
Today, many organizations must retain specific data for a specific period of time. Failure to comply with these regulations could result in organizations having to pay very heavy fines. Therefore, around the world, a number of regulatory requirements, such as Sarbanes-Oxley, HIPAA, DOD5015.2-STD in the US and the European Data Privacy Directive in the European Union, are changing how organizations manage their data. These regulations specify what data must be retained, whether it can be changed, and for how long it must be retained, which could be for a period of 30 years or longer.
These regulations frequently demand that electronic data is secure from unauthorized access and changes, and that there is an audit trail of all changes to data and by whom. The Oracle Database can retain huge quantities of data without impacting application performance. It also contains the features required to restrict access and prevent unauthorized changes to data, and can be further enhanced with Oracle Database Vault and Oracle Audit Vault. The Oracle Database also provides cryptographic functions that can be used to demonstrate that a highly privileged user has not intentionally modified data. Flashback Data Archive can be used to show all the versions of a row during its lifetime.
Building an Information Lifecycle Management solution using the Oracle Database is quite straightforward and can be completed by following these four simple steps, although Step 4 is optional if ILM is not being implemented for compliance:
In order to make effective use of Information Lifecycle Management, the first step is to look at all the data in your organization and determine:
What data is important, where is it stored, and what needs to be retained
How this data flows within the organization
What happens to this data over time and whether it is still needed
The degree of data availability and protection that is needed
Data retention for legal and business requirements
Once there is an understanding of how the data is used, it can then be classified on this basis. The most common type of classification is by age or date, but other types are possible, such as by product or privacy. A hybrid classification could also be used, such as by privacy and age.
In order to treat the data classes differently, the data needs to be physically separated. When information is first created, it is often frequently accessed, but then over time it may be referenced very infrequently. For instance, when a customer places an order, they regularly look at the order to see its status and whether it has been shipped. Once it arrives, they may never reference that order again. This order would also be included in regular reports that are run to see what goods are being ordered, but, over time, would not figure in any of the reports and may only be referenced in the future if someone does a detailed analysis that involves this data. Therefore, orders could be classified by the Financial Quarters Q1, Q2, Q3, and Q4, and as Historical Orders.
The advantage of using this approach is that when the data is grouped at the row level by its class, which in this example would be the date of the order, all orders for Q1 can be managed as a self-contained unit, whereas the orders for Q2 would reside in a different class. This can be achieved by using partitioning. Since partitions are completely transparent to the application, the data is physically separated but the application still sees all of the orders.
Partitioning involves physically placing data according to a data value, and a frequently used technique is to partition information by date. Figure 5-1 illustrates a scenario where the orders for Q1, Q2, Q3, and Q4 are stored in individual partitions and the orders for previous years are stored in other partitions.
Figure 5-1 Allocating Data Classes to a Partition
Oracle offers several different partitioning methods. Range partitioning is one of the most frequently used partitioning methods for ILM. Interval and reference partitioning (introduced in Oracle Database 11g) are also particularly suited for use in an ILM environment.
There are a number of benefits to partitioning data. Partitioning provides an easy way to distribute the data across appropriate storage devices depending on its usage, while still keeping the data online and stored on the most cost-effective device. Since partitioning is completely transparent to anyone accessing the data, no application changes are required, thus partitioning can be implemented at any time. When new partitions are required, they are simply added using the ADD PARTITION clause, or they can be created automatically if interval partitioning is being used.
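For example, a new partition for the next quarter could be added to a range-partitioned table such as the allorders table created in the scripts at the end of this chapter. This is a minimal sketch; the partition and tablespace names allorders_q1_2008 and q1_2008_orders are illustrative only.

REM Add a partition for the next quarter above the current highest bound
ALTER TABLE allorders
  ADD PARTITION allorders_q1_2008
  VALUES LESS THAN (TO_DATE('2008-04-01','YYYY-MM-DD'))
  TABLESPACE q1_2008_orders;

REM With interval partitioning, no ADD PARTITION is needed; partitions are
REM created automatically as rows arrive (see the cc_tran example later)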
Among other benefits, each partition can have its own local index. When the optimizer uses partition pruning, queries will only access the relevant partitions instead of all partitions, thus improving query response times.
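As a minimal sketch, assuming the allorders table from the scripts at the end of this chapter, a local index and a query restricted to one quarter might look as follows; the index name is illustrative.

REM Each partition of a local index is maintained independently
CREATE INDEX allorders_cust_ix ON allorders (cust_id) LOCAL;

REM A predicate on the partitioning key allows the optimizer to prune
REM the scan to the Q1 2007 partition only
SELECT SUM(amount_sold)
FROM   allorders
WHERE  time_id >= TO_DATE('2007-01-01','YYYY-MM-DD')
AND    time_id <  TO_DATE('2007-04-01','YYYY-MM-DD');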
An analysis of your data is likely to reveal that initially, it is accessed and updated on a very frequent basis. As the age of the data increases, its access frequency diminishes until it is almost negligible. Therefore, most organizations find themselves in the situation where many users are accessing current data while very few users are accessing older data, as illustrated in Figure 5-2. Thus, data can be considered to be one of the following: active, less active, historical, or ready to be archived.
With so much data being held, during its lifetime the data should be moved to different physical locations. Depending on where the data is in its lifecycle, it needs to be located on the most appropriate storage device.
Since Oracle Database can take advantage of many different storage options, the next step is to establish the required storage tiers. Although you can create as many storage tiers as you require, the following tiers are a suggested starting point:
High Performance
The high performance storage tier is where all the important and frequently accessed data would be stored, such as the partition holding our Q1 orders. This would utilize smaller, faster disks on high performance storage devices.
Low Cost
The low cost storage tier is where the less frequently accessed data is stored, such as the partitions holding the orders for Q2, Q3, and Q4. This tier would be built using large capacity disks, such as those found in modular storage arrays or low-cost ATA disks, which offer the maximum amount of inexpensive storage.
Online Archive
The online archive storage tier is where all the data that is seldom accessed or modified would be stored. It is likely to be extremely large and to store the maximum quantity of data. Various techniques can be used to compress the data. This tier could be located in the database or it could be in another database, which serves as a central archive database for all information within the enterprise. Stored on low cost storage devices, such as ATA drives, the data would still be online and available, for a cost that is only slightly higher than storing this information on tape, without the disadvantages that come with archiving data to tape. If the Online Archive storage tier is identified as read-only, then it would be impossible to change the data and subsequent backups would not be required after the initial database backup; a sketch of marking an archive tablespace read-only follows the tier descriptions.
Offline Archive (optional)
The offline archive storage tier is an optional tier because it is only used when there is a requirement to remove data from the database and store it in some other format, such as XML on a tape.
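As a sketch of the read-only option mentioned for the Online Archive tier, assuming the old_orders tablespace from the scripts at the end of this chapter:

REM Once data on the online archive tier is no longer expected to change,
REM its tablespace can be made read-only; after one final backup it does
REM not need to be backed up again
ALTER TABLESPACE old_orders READ ONLY;

REM The tablespace can be returned to read-write if a change is ever required
ALTER TABLESPACE old_orders READ WRITE;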
Figure 5-2 illustrates how data is used over a period of time. Using this information, it can be determined that to retain all this information, several storage tiers are required to hold all of the data, which also has the benefit of significantly reducing total storage costs.
Once the storage tiers have been created, the data classes identified in "Step 1: Define the Data Classes" will be physically implemented inside the database using partitions. This approach provides an easy way to distribute the data across the appropriate storage devices depending on its usage, while still keeping the data online and readily available, and stored on the most cost-effective device.
Note:
Automatic Storage Management (ASM) can also be used to manage the data across the storage tiers.

Once the storage tiers have been defined, the data classes (partitions) identified in Step 1 can be assigned to the appropriate storage tiers. This provides an easy way to distribute the data across the appropriate storage devices depending on its usage, keeping the data online and available, and stored on the most cost-effective device. This is illustrated in Figure 5-3. Using this approach, no application changes are required because the application continues to see all of the data.
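A simple way to check how the data classes map to the storage tiers is to look at the tablespace in which each partition resides. The following query is a sketch that assumes the allorders table from the scripts at the end of this chapter.

REM Show which tablespace (and therefore which storage tier) holds each partition
SELECT partition_name, tablespace_name
FROM   user_tab_partitions
WHERE  table_name = 'ALLORDERS'
ORDER  BY partition_position;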
One of the benefits of implementing an ILM strategy is the cost savings that can result from using multiple tiered storage. Assume that we have 3 TB worth of data to store, comprised of: 200 GB on High Performance, 800 GB on Low Cost, and 2 TB on Online Archive. Assume the cost per GB is $72 on the High Performance tier, $14 on the Low Cost tier, and $7 on the Online Archive tier.
Table 5-1 illustrates the possible cost savings using tiered storage, rather than storing all data on one class of storage. As you can see, the cost savings can be quite significant and, if the data is suitable for database compression, then even further cost savings are possible.
Table 5-1 Cost Savings Using Tiered Storage
Storage Tier | Single Tier using High Performance Disks | Multiple Storage Tiers | Multiple Tiers with Database Compression
---|---|---|---
High Performance (200 GB) | $14,400 | $14,400 | $14,400
Low Cost (800 GB) | $57,600 | $11,200 | $11,200
Online Archive (2 TB) | $144,000 | $14,000 | $5,600
Total | $216,000 | $39,600 | $31,200
The next step is to ensure that only authorized users have access to the data and to specify how to move the data during its lifetime. As the data ages, there are a number of techniques that can be used to migrate the data between the storage tiers.
The security of your data is another very important part of Information Lifecycle Management because the access rights to the data may change during its lifetime. In addition, there may be regulatory requirements that place exacting demands on how the data can be accessed.
The data in an Oracle Database can be secured using database features, such as:
Database Security
Views
Virtual Private Database
Virtual Private Database (VPD) defines a very fine-grained level of access to the database. Security policies determine which rows may be viewed and the columns that are visible. Multiple policies can be defined so that different users and applications see different views of the same data. For example, the majority of users could see the information for Q1, Q2, Q3, and Q4, while only authorized users would be able to view the historical data.
A security policy is defined at the database level and is transparently applied to all database users. The benefit of this approach is that it provides a secure and controlled environment for accessing the data, which cannot be overridden and can be implemented without requiring any application changes. In addition, read-only tablespaces can be defined which ensures that the data will not change.
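As a minimal sketch of such a policy (a complete worked example appears in the scripts at the end of this chapter), the following hides data older than 2007 from everyone except one privileged account. The schema, table, function, and policy names are illustrative.

REM Policy function: the privileged user sees everything, others see only 2007 data
CREATE OR REPLACE FUNCTION hide_history (sch IN VARCHAR2, obj IN VARCHAR2)
  RETURN VARCHAR2
AS
BEGIN
  IF SYS_CONTEXT('USERENV','SESSION_USER') = 'SH' THEN
    RETURN '1=1';   -- privileged user sees all data
  ELSE
    RETURN 'time_id >= TO_DATE(''01-01-2007'',''DD-MM-YYYY'')';
  END IF;
END hide_history;
/

REM Attach the policy to the orders table
BEGIN
  DBMS_RLS.ADD_POLICY (
    object_schema   => 'SH',
    object_name     => 'ALLORDERS',
    policy_name     => 'hide_history_policy',
    function_schema => 'SH',
    policy_function => 'hide_history');
END;
/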
During its lifetime, data will need to be moved. This may occur for the following reasons:
For performance, only a limited number of orders are held on high performance disks
Data is no longer frequently accessed and is using valuable high performance storage, and needs to be moved to a low-cost storage device
Legal requirements demand that the information is always available for a given period of time, and it needs to be held safely for the lowest possible cost
There are a number of ways that data can be physically moved in the Oracle Database to take advantage of the different storage tiers. For example, if the data is partitioned, then a partition containing the orders for Q2 could be moved online from the high performance storage tier to the low cost storage tier. Because the move takes place within the database, the physical relocation does not affect the applications that require the data or cause disruption to regular users.
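For example, assuming the allorders table and tablespaces from the scripts at the end of this chapter, the Q2 partition could be relocated to a low cost tier tablespace as follows; the UPDATE INDEXES clause maintains the indexes as part of the operation.

REM Relocate the Q2 2007 partition to a tablespace on the low cost tier
ALTER TABLE allorders
  MOVE PARTITION allorders_q2_2007
  TABLESPACE "2006_ORDERS"
  UPDATE INDEXES;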
Sometimes individual data items, rather than a group of data, must be moved. For example, suppose data was classified according to a level of privacy and a report, which was once secret, is now to be made available to the public. If the classification changed from secret to public and the data was partitioned on its privacy classification, then the row would automatically move to the partition containing public data.
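This behavior requires row movement to be enabled on the table. The following sketch uses a hypothetical reports table partitioned by a classification column; the table and column names are illustrative.

REM Allow rows to migrate between partitions when the partitioning key changes
ALTER TABLE reports ENABLE ROW MOVEMENT;

REM Reclassifying the report relocates the row into the partition for public data
UPDATE reports
SET    classification = 'PUBLIC'
WHERE  report_id = 4711;

COMMIT;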
Whenever data is moved from its original source, it is very important to ensure that the process selected adheres to any regulatory requirements, such as ensuring that the data cannot be altered, is secure from unauthorized access, is easily readable, and is stored in an approved location.
The last step in an Information Lifecycle Management solution is the creation of policies for compliance. When data is decentralized and fragmented, compliance policies have to be defined and enforced in every data location, which could easily result in a compliance policy being overlooked. However, using the Oracle Database to provide a central location for storing data means that it is very easy to enforce compliance policies as they are all managed and enforced from one central location.
When defining compliance policies, consider the following areas:
Data Retention
Immutability
Privacy
Auditing
Expiration
The retention policy describes how the data is to be retained, how long it must be kept, and what happens at the end of its life. An example of a retention policy is that a record must be stored in its original form, with no modifications allowed, kept for seven years, and then deleted. Using Oracle Database security, it is possible to ensure that data remains unchanged and that only authorized processes can remove the data at the appropriate time. Retention policies can also be defined via a lifecycle definition in the ILM Assistant.
Immutability is concerned with proving to an external party that data is complete and has not been modified. Cryptographic or digital signatures can be generated by the Oracle Database and retained either inside or outside of the database, to show that data has not been altered.
The Oracle Database provides several ways to ensure data privacy. Access to data can be strictly controlled through the use of security policies defined using Virtual Private Database (VPD). In addition, individual columns can be encrypted so that anyone looking at the raw data cannot see its contents.
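For example, an individual column can be encrypted with transparent data encryption, as in the following sketch; it assumes an open encryption wallet, and the table and column names are illustrative.

REM Encrypt a sensitive column; the data is stored encrypted on disk
REM but remains transparently usable by authorized applications
ALTER TABLE customers MODIFY (credit_card_no ENCRYPT USING 'AES256');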
The Oracle Database has the ability to track all access and changes to data. These auditing capabilities can be defined either at the table level or through fine-grained auditing, which specifies the criteria for when an audit record is generated. Auditing can be further enhanced using Audit Vault.
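As a sketch, standard object auditing on the orders table could be enabled as follows. This assumes the AUDIT_TRAIL initialization parameter is set so that audit records are written to the database, and the schema name is illustrative; fine-grained auditing is shown later in this chapter.

REM Record every query and change against the orders table
AUDIT SELECT, INSERT, UPDATE, DELETE ON sh.allorders BY ACCESS;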
There usually comes a point during the lifecycle of the data when it is no longer being regularly accessed and is considered eligible for archiving. Traditionally, at this time, the data would have been removed from the database and stored on tape, which can store vast quantities of information for a very low cost. Today it is no longer necessary to archive that data to tape; instead, it can remain in the database or be transferred to a central online archive database. All this information would be stored using low-cost storage devices whose cost per gigabyte is very close to that of tape.
There are a number of benefits to keeping all of the data in a database for archival purposes. The most important benefit is that the data will always be instantly available. Therefore, time is not wasted locating the tapes where the data was archived and determining whether the tape is readable and still in a format that can be loaded into the database.
If the data has been archived for many years, then development time may also be needed to write a program to reload the data into the database from the tape archive. This could prove to be expensive and time consuming, especially if the data is extremely old. If the data is retained in the database, then this is not a problem, because it is already online, and in the latest database format.
Holding the historical data in the database no longer impacts the time required to back up the database or the size of the backup. When RMAN is used to back up the database with incremental backups, only the data that has changed is included. Since historical data is less likely to change, once the data has been backed up, it will not be backed up again.
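As a sketch, RMAN incremental backups combined with backup optimization achieve this; backup optimization skips files, such as read-only tablespaces, that already have sufficient backups.

# An initial level 0 backup is required before level 1 incremental backups
RMAN> BACKUP INCREMENTAL LEVEL 0 DATABASE;
RMAN> CONFIGURE BACKUP OPTIMIZATION ON;
RMAN> BACKUP INCREMENTAL LEVEL 1 DATABASE;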
Another important factor to consider is how the data is to be physically removed from the database, especially if it is to be transferred from a production system to a central database archive. Oracle provides the capability to move this data rapidly between databases by using transportable tablespaces or partitions, which moves the data as a complete unit.
When it is time to remove data from the database, the fastest way is to remove it as a complete set, which is achieved by keeping the data in its own partition. The partition can then be dropped, which is a very fast operation. However, if this approach is not possible because data relationships must be maintained, then a conventional SQL DELETE statement must be issued. You should not underestimate the time required to issue the DELETE statement.
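For example, assuming the allorders table from the scripts at the end of this chapter, dropping the oldest partition removes that data class in a single, fast operation, whereas the equivalent DELETE must process every row.

REM Remove the oldest data class as a single partition-level operation
ALTER TABLE allorders DROP PARTITION allorders_pre_2004 UPDATE INDEXES;

REM The row-by-row alternative, which can take far longer for large volumes
DELETE FROM allorders
WHERE  time_id < TO_DATE('2004-01-01','YYYY-MM-DD');
COMMIT;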
If there is a requirement to remove data from the database and there is a possibility that the data may need to be returned to the database in the future, then consider removing the data in a database format such as a transportable tablespace, or use the XML capability of the Oracle Database to extract the information in an open format.
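A sketch of the transportable tablespace approach, assuming the old_orders tablespace from the scripts at the end of this chapter and an illustrative directory object named dp_dir:

REM Make the tablespace read-only, then export its metadata with Data Pump
ALTER TABLESPACE old_orders READ ONLY;

REM From the operating system:
REM   expdp system DIRECTORY=dp_dir DUMPFILE=old_orders.dmp
REM         TRANSPORT_TABLESPACES=old_orders
REM Copy the dump file and the tablespace data files to the archive database
REM and plug them in with impdp ... TRANSPORT_DATAFILES=...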
Consider an online archive of your data into an Oracle database for the following reasons:
The cost of disk is approaching that of tape, eliminating the time needed to find the tape that contains the data and the cost of restoring that data

Data remains online when needed, providing faster access to meet business requirements

Data that is online can be produced immediately, so a regulatory body cannot impose fines for failing to produce data

The current application can be used to access the data, so there is no need to spend resources building a new application
The Oracle ILM Assistant provides a graphical user interface (GUI) for managing your ILM environment. Figure 5-4 shows the first screen of the ILM Assistant, which lists the outstanding tasks that should be performed.
The ILM Assistant provides the ability to create lifecycle definitions, which are assigned to tables in the database. Using this lifecycle definition, the ILM Assistant advises when it is time to move, archive, or delete data, as shown by the calendar. It will also illustrate the storage requirements and cost savings associated with moving the data.
The ILM Assistant can manage only partitioned tables. For non-partitioned tables, the ILM Assistant generates a script to show how the table could be partitioned, and it also provides the capability to simulate partitioning on a table to view the actions that would arise if the table were partitioned.
The ILM Assistant will not execute any commands for the tasks it recommends to be performed, such as migrating data to different storage tiers. Instead, it generates a script of the commands that need to be executed.
To assist with managing compliance issues, the ILM Assistant shows all Virtual Private Databases (VPD) and Fine-Grained Audit (FGA) policies that have been defined on tables under ILM control. In addition, both Database and FGA audit records can be viewed and digital signatures generated and compared.
The Oracle ILM Assistant requires that Oracle Application Express is installed in the database where the tables to be managed by the ILM Assistant reside.
The ILM Assistant provides capability in the following areas:
Lifecycle Setup
Lifecycle Management
Compliance & Security
Reports
The Lifecycle Setup area of the ILM Assistant is comprised of the following tasks that need to be performed to prepare for managing your data:
Logical Storage Tiers
Lifecycle Definitions
Lifecycle Tables
Preferences
If this is the first time that you have used the ILM Assistant, then it is here where you specify exactly how the data is to be managed by the ILM Assistant. The following steps must be completed before the ILM Assistant is able to give advice on data placement, as illustrated in Figure 5-5.
Define the logical storage tiers
Define the lifecycle definitions
Select tables to be managed by the lifecycle definitions
Figure 5-5 ILM Assistant: Specifying How Data is Managed
Other options available within setup include the ability to:
View partition simulation
View a lifecycle summary of mapped tables and their logical storage tiers and lifecycle definitions
View storage costs
Define policy notes
Customize the ILM Assistant via preferences
A logical storage tier is a name given to a logical group of storage devices; typically all disks of the same type will be identified by that name. For example, the group called High Performance could refer to all the high performance disks. Any number of logical storage tiers may be defined and the devices are identified by the assigned tablespaces, which reside upon them.
The Cost per GB value must be a value greater than zero. The value is used by the ILM Assistant to project storage costs when data is mapped to the tier. It is recommended that you enter a value that represents a reasonably accurate cost of storing data on the tier. This would include the physical purchase price of a device. However, you might also want to consider other associated costs, such as maintenance and running costs.
Each storage tier will have a set of assigned tablespaces that are labeled as a read-write preferred tablespace, read-only preferred tablespace, or a secondary tablespace. If read-write data can be migrated onto the tier, then the read-write preferred tablespace is required. If the storage tier will accept read-only data, then a read-only preferred tablespace must also be identified.
In addition to the preferred tablespaces, one or more secondary tablespaces may be assigned to the tier. Secondary tablespaces are typically located in the same location as the read-write preferred tablespace for the storage tier.
Since the ILM Assistant only supports a single preferred tablespace, any read-write data that must reside on the tier would generate a migration event to move the data to the read-write preferred tablespace. To avoid unnecessary data migration events, the ILM Assistant allows existing data to remain on a secondary tablespace for the storage tier.
A lifecycle definition describes how data migrates across the logical storage tiers during its lifetime. It is comprised of one or more lifecycle stages that select a logical storage tier, data attributes such as compression and read only, and a duration for data residing on that lifecycle stage.
A lifecycle definition is valid if it contains at least one lifecycle stage. There must be a final stage, which is either user specified or automatically generated by the ILM Assistant upon completion of the lifecycle definition process. For the final stage you must specify what happens to data at lifecycle end.
A lifecycle definition is comprised of a number of stages that describes what happens to data during its lifetime. Lifecycle stages are initially created in reverse time order (that is, working backwards in time from the current date). Every stage must have a unique name; an optional description can be supplied.
If the stage is not the final stage, then you must specify how long the data is to remain on this stage and any stage attributes such as whether the data should be compressed or set to read only. Note that it is only possible to specify a read only stage if a preferred read only tablespace has been defined for the logical storage tier for this stage.
The current stage represents the present time but can span any length of time. A lifecycle can only have one current stage. The final stage is required as it describes what happens when data reaches its end-of-life. A lifecycle can only have one final stage and it is automatically created if the user does not create one. Possible actions are:
Purge the data
Archive the data off-line
Allow the data to remain on-line
Stages that store data on-line also permit several attributes to be defined that affect the data. The supported attributes are:
Compress
Compress and Read-Only
Read-Only
Each stage is comprised of the following information:
Stage Type
A stage is classified as a current stage, final stage, or unclassified.
Stage Name
Displays the user-supplied name of the stage.
Stage Description
Displays the user-supplied stage description.
Action
Displays the action performed when data maps to the stage. Possible actions are:
Remain Online
Archive Offline
Purge
Tier Name
Displays the storage tier associated with the stage. For a stage that purges data or moves data offline, a tier is not specified.
Attributes
Displays the optional data attributes that will be applied to data when it maps to the stage. Possible values are:
Compress
Compress and Read-Only
Read-Only
Stage Duration
Displays the length of time the data can remain mapped to the stage.
Stage Start Date
Displays the actual calendar date for the beginning of the stage. The date is computed based on the adjacent stages and the user-specified fiscal start date.
Stage End Date
Displays the actual calendar date for the end of the stage. The date is computed based on the adjacent stages and the user-specified fiscal start date.
The Lifecycle Tables area identifies those tables that may be managed by the ILM Assistant, and it is here where these tables are mapped to a lifecycle definition, as illustrated in Figure 5-6. A database may contain many tables, only some of which you wish to consider as candidates for ILM. A table is automatically eligible if it is range partitioned on a date column. When the table is associated with a lifecycle definition, the ILM Assistant can manage its data. For tables having no partitioning, storage cost savings and storage tier migration can be modeled using a simulated partitioning strategy.
Figure 5-6 ILM Assistant: Lifecycle Tables
If the table is not yet partitioned, then you will be directed to a Partition Simulation page where you can set up a full simulation. Similar to setting up a managed table, a simulation can be previewed and accepted on this page. Upon returning from the simulation page, the table is now eligible for full lifecycle management in simulation mode.
The difference between a managed table and a simulated table is that a managed table contains actual partitions while a simulated table only contains fake partitioning data. All reports and event detection work with both types of lifecycle tables. However, any table upon which partitioning is being simulated will only be seen as being partitioned from within the ILM Assistant. All other tools will continue to see it as a non-partitioned table.
Though the lifecycle tables view shows all accessible tables, the ILM Assistant may not be able to manage every table. In those cases, the table will be marked as ineligible and a link will be provided to explain the exception. Some examples of ineligible tables are:
Tables having no date column
Tables partitioned on non-date columns
Tables partitioned using a partition type other than range
Tables containing a LONG column
Index-organized tables
The display for Lifecycle Tables can be customized to show managed, simulated, candidate, and ineligible tables, and is comprised of the following information:
Table Owner
The Oracle schema that owns the table
Table Name
The table that may allow ILM management
Storage Size
The current estimated size of the table. The value is scaled according to the Size Metric as specified within the Filter Options.
Data Reads
The current sum of all logical and physical reads for the table.
Data Writes
The current sum of all physical writes for the table.
Lifecycle Definition
If the ILM Assistant is managing the table, then the required lifecycle definition is displayed here.
Lifecycle Status
Provides the current status of the table. This will indicate whether the table is eligible, is managed, or is simulated. For tables that are ineligible, the status link provides an explanation regarding its incompatibility with the ILM Assistant.
Table Partitioning
Provides a status of the table partitioning. A table can have partitioning implemented, simulated, or none.
Cost Savings
When the ILM Assistant is managing a table, a total cost-savings value is computed and displayed here.
Partition Map
Indicates that the current table partitioning scheme is compatible with the lifecycle definition. Clicking on the icon displays a detailed report of the table partitions.
For installations having many tables, the ILM Assistant provides a table list caching system to prevent long page waits and possible browser timeouts. The table list is a snapshot of all user tables on the system that should be periodically refreshed to maintain consistency within the ILM Assistant. Typically, the table list should be refreshed when application tables have been added, changed, or removed outside of the ILM Assistant, or when up-to-date table statistics are desired.
By default, a table list refresh operation will attempt to scan for every table defined in the database. For large application environments, this can take a long time to complete. Typically, ILM Assistant management of tables is limited to a small number of tables. To avoid refreshing the table list with the entire set of tables found in the database, filtering may be used to narrow the number of tables to be scanned. For example, if you are only interested in managing tables in the SH schema, the Table Owner Filter can be set to SH. To estimate the time a refresh may take, click Estimate Refresh Statistics. This returns the projected number of tables that match the filters as well as the time it will take to process the data.
Purging unused entries in the cache will clean up a cache that contains any entries that are not currently managed by the ILM Assistant. It will not affect any of the tables that currently match the filters.
As a guideline, the ILM Assistant can refresh the table list at a rate of 300 to 350 tables per minute. The operation may be interrupted from the Lifecycle Tables screen. An interrupt will stop the refresh operation as if it has reached the normal end of the table scan. Because of the nature of the process, an interrupt can take up to 30 seconds to stop the actual scan operation.
The Partition Map column in the Lifecycle Tables Report indicates whether all the partitions in the table will fit inside a stage and do not overlap stages. The Mapping Status indicates the quality of the partition-to-stage relationship. A green checkmark indicates the partition resides completely within the stage without violating date boundaries. A warning icon indicates some type of mismatch. Possible exceptions for the stage mapping are:
Misaligned partitions
A partition can be misaligned when it cannot fit into an entire stage. This can happen if the lifecycle stage duration is smaller than the smallest partition range. To resolve this, either choose a better lifecycle definition to manage the table or adjust the stage duration by editing the lifecycle definition.
Tablespace is not associated with a logical storage tier
This is very common for new ILM Assistant users. In order to perform cost analysis, the ILM Assistant needs to associate all referenced tablespaces with a tier. Typically, the easiest correction is to edit a logical storage tier and add the missing tablespace as a secondary tablespace.
The ILM Assistant provides a comprehensive storage cost and savings report associated with the managed or simulated table, as illustrated in Figure 5-7.
Figure 5-7 ILM Assistant: Partitioning for Simulated Tables
The report is divided into two main areas. The top portion of the report is a rollup showing the totals for the managed or simulated tables. For managed tables, there are two subsections that show data for a non-ILM environment using a single storage tier and an ILM managed, multi-tier environment. For simulated tables, a third section is provided that shows an ILM managed, multi-tier environment that includes the estimated effects of compression.
The bottom section of the storage costs page is the detail section that breaks up the cost areas by logical storage tier:
Single-Tier Size
Displays the total size of the entities. For a lifecycle-based report, the value represents the sum of all table sizes that are assigned the current lifecycle definition. For managed tables, the size is the actual size as indicated by the database storage statistics. For simulated tables, the size is the projected size as calculated by the user-specified number of rows and average row length.
Single-Tier Cost
Displays the single-tier cost, which is calculated by multiplying the single-tier size of the current entities by the cost of storing the data on the most expensive tier within the lifecycle definition.
Cost per GB
Displays the user-specified cost when setting up the storage tier. The value is used to calculate the storage costs for partitions that are assigned to the tier.
Multi-Tier Size
Displays the total size of the entities that reside on that tier. For lifecycles, it represents all table partitions that are associated with the current tier. For a table, it represents the sum of all partitions that are associated with the tier. The size does not include any projected compression.
Multi-Tier Cost
Displays the cost, which is calculated by multiplying the cost per gigabyte for the current tier by the space occupied by the entities. For lifecycles, it represents all table partitions that are associated with the current tier. For a table, it represents the sum of all partitions that are associated with the tier.
Multi-Tier Savings
Displays the savings, which is computed by subtracting the multi-tier cost from the calculated cost of storing the same data using the single-tier approach.
Percent Savings
Displays the ratio of multi-tier savings to the single-tier cost for the same data.
Multi-Tier Compressed Size
Displays the total size of the entities that reside on that tier. For lifecycles, it represents all table partitions that are associated with the current tier. For a table, it represents the sum of all partitions that are associated with the tier. The size includes projected compression based on the estimated compression factor assigned by the user.
This report item is only present when viewing simulated table data.
Multi-Tier Compressed Cost
Displays the cost, which is calculated by multiplying the cost per gigabyte for the current tier by the space occupied by the entities. For lifecycles, it represents all table partitions that are associated with the current tier. For a table, it represents the sum of all partitions that are associated with the tier. The size includes projected compression based on the estimated compression factor assigned by the user.
This report item is only present when viewing simulated table data.
Multi-Tier Compressed Savings
Displays the savings, which is computed by subtracting the multi-tier compressed cost from the calculated cost of storing the same data using the single-tier approach.
This report item is only present when viewing simulated table data.
Percent Savings
Displays the ratio of multi-tier compressed savings to the single-tier cost for the same data.
This report item is only present when viewing simulated table data.
Lifecycle Stages Compressed
When setting up lifecycle stages, the user has the option of requiring the partitions to be compressed when assigned to the stage. This value shows the number of stages assigned to the storage tier that have the compressed attribute set.
Partitions Compressed
Displays the number of partitions on the storage tier that are currently compressed.
Implementing Partitioning is likely to be a major task for any organization and the ILM Assistant allows you to model the impact before actually reorganizing the data. To achieve this, the ILM Assistant requires the following information in simulation mode:
Lifecycle Definition
Select a lifecycle definition that will be used to manage the simulated table. The simulated partitions will be derived from the lifecycle stages defined in the lifecycle. The ILM Assistant will determine the optimal date range based on the stage duration information supplied.
Partitioning Column
Select a suitable date column as the partitioning key. If the current table has only one date column, then the column will automatically be selected and displayed in read-only form.
Partition Date Interval
Displays the optimal partition range interval based on the selected lifecycle definition. The ILM Assistant will compute an interval that will guarantee that all generated partitions will properly align with the lifecycle stages.
Number of Rows
Provide the number of rows in the current table. The default value is retrieved from the current table's database statistics. If the default value is unavailable, or you wish to project future growth, you may enter any value greater than zero.
Average Row Length
Provide the average row length for the table. The default value is retrieved from the current table's database statistics. If the statistics are not valid, then the ILM Assistant will query the table and calculate a maximum row size. If the default value is unsuitable, or you wish to project future growth, then you may enter any value greater than zero.
Estimated Compression Factor
Provide a compression factor. The compression factor is used exclusively by the ILM Assistant to estimate storage costs and savings. The factor is purely an estimate, but it can give you an indication of the potential savings. A value of one indicates no compression is projected. A value greater than one indicates a reduction in space using the formula reduction = 1 / factor. The default value is calculated by sampling a small percentage of the table for compression potential.
An additional option after previewing the simulation is Migration Script generation, as illustrated in Figure 5-7. This allows the user to create a script that can be used to convert the existing non-partitioned table to a partitioned counterpart. It should be noted that the script contains a simple create operation and a command to migrate the existing data; however, parts of the script have been commented out to prevent accidental operation. A conversion of a table to a partitioned table should be carefully planned.
Preferences control various aspects of the ILM Assistant's behavior and display of data (for example, the default date format for most entered values and reports, or the default number of rows to display). The following preferences can be set:
Compression sample block count
Compression sample percent
Date format (Long form)
Date format (Short form)
Demonstration Mode
Specifies a factor that amplifies the actual table sizes. A value of one effectively disables the mode since multiplying a number by one does not change the original value.
Language preference
Lifecycle table view filter
Specifies the default selection to view when visiting the Lifecycle Tables page. Values can be combined to indicate multiple types of tables. For example, 3 indicates that both managed and simulated tables are to be shown. Possible values are:
The default value is 7, which excludes ineligible tables.
Maximum report rows to display
Maximum viewable tables
Refresh rate for progress monitoring
Report column maximum display length
Start page for lifecycle setup
Possible values are:
Logical Storage Tiers
Lifecycle Definitions
Lifecycle Tables
Storage size metric
Specifies the default size metric to be used when viewing storage size values. Possible values are:
The value is case sensitive.
Lifecycle Management is concerned with the tasks that must be performed to move data to the correct place in the Information Lifecycle. Information is available on the following:
Lifecycle Events Calendar
Lifecycle Events
Event Scan History
The Lifecycle Events Calendar shows the calendar of previous, current, and (optionally) future lifecycle events that must be performed to place data at the appropriate place in the information lifecycle, as illustrated in Figure 5-5. You can use the Previous Month with Events button to navigate to previous months containing lifecycle events.
To identify which data must be moved, click on the Scan for Events button which will ask whether to scan for all events up to the current day, or into the future. Additionally, you may choose to evaluate all tables or selected tables. The ILM Assistant will then compare the current location of data with where it should be stored in accordance with the lifecycle definition and recommend the appropriate movement. It will also advise if data should be compressed or set to read only as defined by the lifecycle definition. All the recommendations made by the ILM Assistant are applied to partitions only.
The Lifecycle Events report shows details about data migration events and provides a way to generate scripts to perform their actions. You can select some or all of the displayed events by clicking the checkboxes in the first column. You need to select events to generate scripts or to dismiss events. To generate a script on the selected events, click the Generate Script button. To dismiss the selected events to make them permanently disappear, click the Dismiss Selected Events button.
The event summary shows the following pieces of information:
Recommended Action
Indicates the type of event that was detected by the scan operation. Possible event types are:
MOVE PARTITION
Indicates that a partition should be moved from its current logical storage tier to a new logical storage tier. The movement is achieved by moving the partition from one tablespace to another.
COMPRESSION
Indicates that the partition should have data compression enabled.
READ-ONLY
Indicates that the partition should be set to read-only.
PURGE
Indicates that the partition should be physically deleted.
Partition Name
Describes the affected partition.
Current Tier
Describes the current location of the partition.
Recommended Tier
Describes the target storage tier for move operations.
Cost Savings
Indicates the potential storage cost savings if the event action were to be implemented.
Table Owner and Name
Describes the partition table owner and name.
Event Date
Indicates the date on which the action should be performed. For events that should have been resolved in the past, the single keyword Past is shown; for events in the future, a calendar date is displayed.
Event Details
Provides a link to event details. This area describes lifecycle details that affected the scan operation.
When a partition requires several logical operations such as move and compress, the ILM Assistant displays the operations as separate events. However, in the script, the operations may be combined into a single SQL DDL statement.
The ILM Assistant currently does not have any archive capability. Therefore, selecting archive events generates a script that identifies which partitions should now be archived and lists them as comments.
Any authorized user can invoke event scanning via the Lifecycle Events Calendar. Over time, tracking the scan activity can be quite difficult, so a history is made available.
The history report shows the following pieces of information:
Scan Date
Submitted by User
Lowest Event Date
Highest Event Date
Table Owner and Name
Number of Events
Lifecycle Status
The Compliance & Security area shows everything that can be used to enforce security and help maintain compliance with the numerous regulations from around the world. It provides an area to:
View Current Status
Prove Immutability
View Privacy & Security Policies
View Auditing
Manage Policy Notes
Current status summarizes the status of all the various Compliance & Security features that are available. For example, it advises how many Virtual Private Database (VPD) policies have been defined, when a digital signature was last generated, and when a comparison of digital signatures was last performed.
Some regulations stipulate that it must be shown that data has not changed since it was entered into the database. One of the techniques that can be used to prove that data has not been altered is to generate a digital signature.
Oracle Database provides the capability to generate a digital signature for a SQL result set. This can be generated inside the ILM Assistant and is achieved by creating a named SQL result set which includes the query to describe the collection of records. The digital signature is generated and is initially saved in a text file.
To show that the data records in a query have not been altered, a digital signature can be presented for a previously defined SQL query, and re-generated on today's data and the signatures compared, to show that the data has not changed since the digital signature was originally generated.
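Outside the ILM Assistant, the same idea can be sketched directly with the DBMS_CRYPTO package: build a deterministic, ordered representation of the result set, hash it, store the hash, and later re-generate and compare it. The query below assumes the cc_tran table from the scripts at the end of this chapter and is illustrative only; it requires EXECUTE privilege on DBMS_CRYPTO.

REM Generate a SHA-1 hash over an ordered result set
DECLARE
  l_doc CLOB;
  l_sig RAW(20);
BEGIN
  FOR r IN (SELECT ref_no, cc_no, tran_dt, tran_amt
            FROM   cc_tran
            WHERE  tran_dt < TO_DATE('2007-01-01','YYYY-MM-DD')
            ORDER  BY ref_no) LOOP
    l_doc := l_doc || r.ref_no || '|' || r.cc_no || '|' ||
             TO_CHAR(r.tran_dt,'YYYY-MM-DD') || '|' || r.tran_amt;
  END LOOP;
  l_sig := DBMS_CRYPTO.HASH(l_doc, DBMS_CRYPTO.HASH_SH1);
  -- Store or print the signature; re-run later and compare the two values
  DBMS_OUTPUT.PUT_LINE(RAWTOHEX(l_sig));
END;
/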
The Privacy & Security area enables you to view:
A summary of privacy and security definitions for each ILM table
Virtual Private Database (VPD) policies
Security views on tables managed by the ILM Assistant
Reports on the access privileges granted to users for tables managed by the ILM Assistant
By default, the Lifecycle Table Summary is shown and VPD policies and user access information are available by selecting the appropriate links.
The Lifecycle Table Summary provides an overview for each table as to which features are being used in terms of VPD policies and table grants issued.
Using standard database privileges, it is possible to limit access to a table to certain users. However, such access allows users to read all information in that table. VPD Policies provide a finer level of control over who can access information. Using a VPD Policy, it is possible to write sophisticated functions, which define exactly which data is visible to a user.
For example, a policy could say that certain users can only view the last 12 months of data, while other users can view all of the data. Another policy could say that the only data visible is in the state where the office is located. Therefore, VPD Policies are an extremely powerful tool in controlling access to information. Only VPD policies that have been defined on tables that are being managed by the ILM Assistant are shown on the VPD Policies report.
Some regulations require that an audit trail be maintained of all access and changes to data. In the Oracle Database, two types of auditing are available: database auditing and fine-grained auditing. Each creates its own audit records, which can be viewed as one consolidated report in the ILM Assistant and filtered on several criteria.
Within the auditing area on the ILM Assistant, it is possible to:
View the Current Audit Status
Manage Fine-Grained Audit Policies
View Audit Records
Standard Auditing within the Oracle Database logs all types of access to a table. However, there may be instances when it is desirable to only audit an event when a certain condition is met (for example, the value of the transaction being altered is greater than $10,000). This type of auditing is possible using Fine-Grained Audit policies where an audit condition can be specified and an optional function can be called for more sophisticated processing.
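A sketch of such a policy, using the DBMS_FGA package against the cc_tran table from the scripts at the end of this chapter (the schema and policy names are illustrative):

REM Audit only statements that touch transactions over $10,000
BEGIN
  DBMS_FGA.ADD_POLICY (
    object_schema   => 'SYSTEM',
    object_name     => 'CC_TRAN',
    policy_name     => 'large_tran_audit',
    audit_condition => 'tran_amt > 10000',
    audit_column    => 'TRAN_AMT',
    statement_types => 'SELECT,UPDATE');
END;
/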
It is possible within the ILM Assistant to view both database and fine-grained audit records for tables mapped to Lifecycle Definitions in the ILM Assistant. An icon represents the type of audit record: database (indicated by a disc) or FGA. Use the Filter condition to filter the audit records that are displayed and click on the report heading to sort the data on that column.
By default, the ILM Assistant only displays audit records for the current day. To see audit records for previous days, you must use the filter options to specify a date range of records to display.
Policy notes provide textual documentation of your data management policies or anything that you wish to document with respect to managing data during its lifetime. Policy notes are informational only; they do not affect the tasks performed by the ILM Assistant. They can be used as a central place to describe your policies, as reminders, and as a way to prove that your policies are documented. They can also be used to document SLA (Service Level Agreements) and to document the compliance rules that you are trying to enforce.
The following example illustrates how to manually create storage tiers, partition a table across those storage tiers, and then set up a VPD policy to restrict access to the data in the online archive tier.
REM Setup the tablespaces for the data
REM These tablespaces would be placed on a High Performance Tier
CREATE SMALLFILE TABLESPACE q1_orders DATAFILE 'q1_orders'
  SIZE 2M AUTOEXTEND ON NEXT 1M MAXSIZE UNLIMITED LOGGING
  EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO;

CREATE SMALLFILE TABLESPACE q2_orders DATAFILE 'q2_orders'
  SIZE 2M AUTOEXTEND ON NEXT 1M MAXSIZE UNLIMITED LOGGING
  EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO;

CREATE SMALLFILE TABLESPACE q3_orders DATAFILE 'q3_orders'
  SIZE 2M AUTOEXTEND ON NEXT 1M MAXSIZE UNLIMITED LOGGING
  EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO;

CREATE SMALLFILE TABLESPACE q4_orders DATAFILE 'q4_orders'
  SIZE 2M AUTOEXTEND ON NEXT 1M MAXSIZE UNLIMITED LOGGING
  EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO;

REM These tablespaces would be placed on a Low Cost Tier
CREATE SMALLFILE TABLESPACE "2006_ORDERS" DATAFILE '2006_orders'
  SIZE 5M AUTOEXTEND ON NEXT 10M MAXSIZE UNLIMITED LOGGING
  EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO;

CREATE SMALLFILE TABLESPACE "2005_ORDERS" DATAFILE '2005_orders'
  SIZE 5M AUTOEXTEND ON NEXT 10M MAXSIZE UNLIMITED LOGGING
  EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO;

REM These tablespaces would be placed on the Online Archive Tier
CREATE SMALLFILE TABLESPACE "2004_ORDERS" DATAFILE '2004_orders'
  SIZE 5M AUTOEXTEND ON NEXT 10M MAXSIZE UNLIMITED LOGGING
  EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO;

CREATE SMALLFILE TABLESPACE old_orders DATAFILE 'old_orders'
  SIZE 15M AUTOEXTEND ON NEXT 10M MAXSIZE UNLIMITED LOGGING
  EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO;

REM Now create the Partitioned Table
CREATE TABLE allorders (
  prod_id       NUMBER NOT NULL,
  cust_id       NUMBER NOT NULL,
  time_id       DATE NOT NULL,
  channel_id    NUMBER NOT NULL,
  promo_id      NUMBER NOT NULL,
  quantity_sold NUMBER(10,2) NOT NULL,
  amount_sold   NUMBER(10,2) NOT NULL)
 --
 -- table wide physical specs
 --
 PCTFREE 5 NOLOGGING
 --
 -- partitions
 --
 PARTITION BY RANGE (time_id)
 (partition allorders_pre_2004 VALUES LESS THAN
    (TO_DATE('2004-01-01 00:00:00','SYYYY-MM-DD HH24:MI:SS','NLS_CALENDAR=GREGORIAN'))
    TABLESPACE old_orders,
  partition allorders_2004 VALUES LESS THAN
    (TO_DATE('2005-01-01 00:00:00','SYYYY-MM-DD HH24:MI:SS','NLS_CALENDAR=GREGORIAN'))
    TABLESPACE "2004_ORDERS",
  partition allorders_2005 VALUES LESS THAN
    (TO_DATE('2006-01-01 00:00:00','SYYYY-MM-DD HH24:MI:SS','NLS_CALENDAR=GREGORIAN'))
    TABLESPACE "2005_ORDERS",
  partition allorders_2006 VALUES LESS THAN
    (TO_DATE('2007-01-01 00:00:00','SYYYY-MM-DD HH24:MI:SS','NLS_CALENDAR=GREGORIAN'))
    TABLESPACE "2006_ORDERS",
  partition allorders_q1_2007 VALUES LESS THAN
    (TO_DATE('2007-04-01 00:00:00','SYYYY-MM-DD HH24:MI:SS','NLS_CALENDAR=GREGORIAN'))
    TABLESPACE q1_orders,
  partition allorders_q2_2007 VALUES LESS THAN
    (TO_DATE('2007-07-01 00:00:00','SYYYY-MM-DD HH24:MI:SS','NLS_CALENDAR=GREGORIAN'))
    TABLESPACE q2_orders,
  partition allorders_q3_2007 VALUES LESS THAN
    (TO_DATE('2007-10-01 00:00:00','SYYYY-MM-DD HH24:MI:SS','NLS_CALENDAR=GREGORIAN'))
    TABLESPACE q3_orders,
  partition allorders_q4_2007 VALUES LESS THAN
    (TO_DATE('2008-01-01 00:00:00','SYYYY-MM-DD HH24:MI:SS','NLS_CALENDAR=GREGORIAN'))
    TABLESPACE q4_orders);

ALTER TABLE allorders ENABLE ROW MOVEMENT;

REM Here is another example using INTERVAL partitioning
REM These tablespaces would be placed on a High Performance Tier
CREATE SMALLFILE TABLESPACE cc_this_month DATAFILE 'cc_this_month'
  SIZE 2M AUTOEXTEND ON NEXT 1M MAXSIZE UNLIMITED LOGGING
  EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO;

CREATE SMALLFILE TABLESPACE cc_prev_month DATAFILE 'cc_prev_month'
  SIZE 2M AUTOEXTEND ON NEXT 1M MAXSIZE UNLIMITED LOGGING
  EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO;

REM This tablespace would be placed on a Low Cost Tier
CREATE SMALLFILE TABLESPACE cc_prev_12mth DATAFILE 'cc_prev_12'
  SIZE 2M AUTOEXTEND ON NEXT 1M MAXSIZE UNLIMITED LOGGING
  EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO;

REM This tablespace would be placed on the Online Archive Tier
CREATE SMALLFILE TABLESPACE cc_old_tran DATAFILE 'cc_old_tran'
  SIZE 2M AUTOEXTEND ON NEXT 1M MAXSIZE UNLIMITED LOGGING
  EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO;

REM Credit Card Transactions where new partitions are automatically
REM placed on the high performance tier
CREATE TABLE cc_tran (
  cc_no       VARCHAR2(16) NOT NULL,
  tran_dt     DATE NOT NULL,
  entry_dt    DATE NOT NULL,
  ref_no      NUMBER NOT NULL,
  description VARCHAR2(30) NOT NULL,
  tran_amt    NUMBER(10,2) NOT NULL)
 --
 -- table wide physical specs
 --
 PCTFREE 5 NOLOGGING
 --
 -- partitions
 --
 PARTITION BY RANGE (tran_dt)
 INTERVAL (NUMTOYMINTERVAL(1,'month')) STORE IN (cc_this_month)
 (partition very_old_cc_trans VALUES LESS THAN
    (TO_DATE('1999-07-01 00:00:00','SYYYY-MM-DD HH24:MI:SS','NLS_CALENDAR=GREGORIAN'))
    TABLESPACE cc_old_tran,
  partition old_cc_trans VALUES LESS THAN
    (TO_DATE('2006-07-01 00:00:00','SYYYY-MM-DD HH24:MI:SS','NLS_CALENDAR=GREGORIAN'))
    TABLESPACE cc_old_tran,
  partition last_12_mths VALUES LESS THAN
    (TO_DATE('2007-06-01 00:00:00','SYYYY-MM-DD HH24:MI:SS','NLS_CALENDAR=GREGORIAN'))
    TABLESPACE cc_prev_12mth,
  partition recent_cc_trans VALUES LESS THAN
    (TO_DATE('2007-07-01 00:00:00','SYYYY-MM-DD HH24:MI:SS','NLS_CALENDAR=GREGORIAN'))
    TABLESPACE cc_prev_month,
  partition new_cc_tran VALUES LESS THAN
    (TO_DATE('2007-08-01 00:00:00','SYYYY-MM-DD HH24:MI:SS','NLS_CALENDAR=GREGORIAN'))
    TABLESPACE cc_this_month);

REM Create a Security Policy to allow user SH to see all credit card data,
REM PM to see only this year's data,
REM and all other users to see no credit card data
CREATE OR REPLACE FUNCTION ilm_seehist
  (oowner IN VARCHAR2, ojname IN VARCHAR2)
  RETURN VARCHAR2 AS
  con VARCHAR2(200);
BEGIN
  IF SYS_CONTEXT('USERENV','CLIENT_INFO') = 'SH' THEN
    -- sees all data
    con := '1=1';
  ELSIF SYS_CONTEXT('USERENV','CLIENT_INFO') = 'PM' THEN
    -- sees only data for 2007
    -- (the predicate references tran_dt, the partitioning column of cc_tran)
    con := 'tran_dt > TO_DATE(''31-Dec-2006'',''DD-MON-YYYY'')';
  ELSE
    -- others see nothing
    con := '1=2';
  END IF;
  RETURN (con);
END ilm_seehist;
/

REM Then the policy is added with the DBMS_RLS package as follows:
BEGIN
  DBMS_RLS.ADD_POLICY (
    object_schema     => 'SYSTEM',
    object_name       => 'cc_tran',
    policy_name       => 'ilm_view_history_data',
    function_schema   => 'SYSTEM',
    policy_function   => 'ilm_seehist',
    sec_relevant_cols => 'tran_dt');
END;
/