Oracle® Streams Concepts and Administration 11g Release 1 (11.1) Part Number B28321-01 |
|
|
View PDF |
The following topics contain information about Oracle Streams high availability environments:
Configuring a high availability solution requires careful planning and analysis of failure scenarios. Database backups and physical standby databases provide physical copies of a source database for failover protection. Oracle Data Guard, in SQL apply mode, implements a logical standby database in a high availability environment. Because Oracle Data Guard is designed for a high availability environment, it handles most failure scenarios. However, some environments might require the flexibility available in Oracle Streams, so that they can take advantage of the extended feature set offered by Oracle Streams.
This chapter discusses some of the scenarios that can benefit from an Oracle Streams-based solution and explains Oracle Streams-specific issues that arise in high availability environments. It also contains information about best practices for deploying Oracle Streams in a high availability environment, including hardware failover within a cluster, instance failover within an Oracle Real Application Clusters (Oracle RAC) cluster, and failover and switchover between replicas.
See Also:
Oracle Data Guard Concepts and Administration for more information about Oracle Data Guard
Oracle Real Application Clusters Administration and Deployment Guide
Oracle Real Application Clusters (Oracle RAC) is the preferred method for protecting from an instance or system failure. After a failure, services are provided by a surviving node in the cluster. However, clustering does not protect from user error, media failure, or disasters. These types of failures require redundant copies of the database. You can make both physical and logical copies of a database.
Physical copies are identical, block for block, with the source database, and are the preferred means of protecting data. There are three types of physical copies: database backup, mirrored or multiplexed database files, and a physical standby database.
Logical copies contain the same information as the source database, but the information can be stored differently within the database. Creating a logical copy of your database offers many advantages. However, you should always create a logical copy in addition to a physical copy, not instead of physical copy.
A logical copy has the following benefits:
A logical copy can be open while being updated. This ability makes the logical copy useful for near real-time reporting.
A logical copy can have a different physical layout that is optimized for its own purpose. For example, it can contain additional indexes, and thereby improve the performance of reporting applications that utilize the logical copy.
A logical copy provides better protection from corruptions. Because data is logically captured and applied, it is very unlikely that a physical corruption can propagate to the logical copy of the database.
There are three types of logical copies of a database:
Logical standby databases
Oracle Streams replica databases
Application-maintained copies
Logical standby databases are best maintained using Oracle Data Guard in SQL apply mode. The rest of this chapter discusses Oracle Streams replica databases and application maintained copies.
See Also:
Oracle Database Backup and Recovery User's Guide for more information about database backups and mirroring or multiplexing database files
Oracle Data Guard Concepts and Administration for more information about physical standby databases and logical standby databases
Like Oracle Data Guard in SQL apply mode, Oracle Streams can capture database changes, propagate them to destinations, and apply the changes at these destinations. Oracle Streams is optimized for replicating data. Oracle Streams can capture changes at a source database, and the captured changes can be propagated asynchronously to replica databases. This optimization can reduce the latency and can enable the replicas to lag the primary database by no more than a few seconds.
Nevertheless, you might choose to use Oracle Streams to configure and maintain a logical copy of your production database. Although using Oracle Streams might require additional work, it offers increased flexibility that might be required to meet specific business requirements. A logical copy configured and maintained using Oracle Streams is called a replica, not a logical standby, because it provides many capabilities that are beyond the scope of the normal definition of a standby database. Some of the requirements that can best be met using an Oracle Streams replica are listed in the following sections.
See Also:
Oracle Streams Replication Administrator's Guide for more information about replicating database changes with Oracle StreamsThe greatest difference between a replica database and a standby database is that a replica database can be updated and a standby database cannot. Applications that must update data can run against the replica, including jobs and reporting applications that log reporting activity. Replica databases also allow local applications to operate autonomously, protecting local applications from WAN failures and reducing latency for database operations.
The production and the replica do not need to be running on the exact same platform. This provides more flexibility in using computing assets, and facilitates migration between platforms.
Oracle Streams replicas can use different character sets than the production database. Data is automatically converted from one character set to another before being applied. This ability is extremely important if you have global operations and you must distribute data in multiple countries.
If the replica is used for near real-time reporting, Oracle Streams can lag the production database by no more than a few seconds, providing up-to-date and accurate queries. Changes can be read from the online redo logs as the logs are written, rather than from the redo logs after archiving.
Oracle Streams supports unlimited numbers of replicas. Its flexible routing architecture allows for hub-and-spoke configurations that can efficiently propagate data to hundreds of replicas. This ability can be important if you must provide autonomous operation to many local offices in your organization. In contrast, because standby databases configured with Oracle Data Guard use the LOG_ARCHIVE_DEST_
n initialization parameter to specify destinations, there is a limit of ten copies when you use Oracle Data Guard.
See Also:
Oracle Database 2 Day + Data Replication and Integration Guide for more information about hub-and-spoke replication environments and for instructions that explain how to configure them
Oracle Streams Replication Administrator's Guide for a detailed example of such an environment
Oracle Streams replicas can be open to read/write operations at all times. If a primary database fails, then Oracle Streams replicas are able to instantly resume processing. A small window of data might be left at the primary database, but this data will be automatically applied when the primary database recovers. This ability can be important if you value fast recovery time over no lost data. Assuming the primary database can eventually be recovered, the data is only temporarily unavailable.
In a complex environment, changes need only be captured once. These changes can then be sent to multiple destinations. When a capture process is used to capture changes, this ability enables more efficient use of the resources needed to mine the redo logs for changes.
As mentioned previously, there are scenarios in which you might choose to use Oracle Streams to meet some of your high availability requirements. One of the rules of high availability is to keep it simple. Oracle Data Guard is designed for high availability and is easier to implement than an Oracle Streams-based high availability solution. If you decide to leverage the flexibility offered by Oracle Streams, then you must be prepared to invest in the expertise and planning required to make an Oracle Streams-based solution robust. This means writing scripts to implement much of the automation and management tools provided with Oracle Data Guard.
The best availability can be achieved by designing the maintenance of logical copies of data directly into an application. The application knows what data is valuable and must be immediately moved off-site to guarantee no data loss. It can also synchronously replicate truly critical data, while asynchronously replicating less critical data. Applications maintain copies of data by either synchronously or asynchronously sending data to other applications that manage another logical copy of the data. Synchronous operations are performed using the distributed SQL or remote procedure features of the database. Asynchronous operations are performed using Advanced Queuing. Advanced Queuing is a database message queuing feature that is part of Oracle Streams.
Although the highest levels of availability can be achieved with application-maintained copies of data, great care is required to realize these results. Typically, a great amount of custom development is required. Many of the difficult boundary conditions that have been analyzed and solved with solutions such as Oracle Data Guard and Oracle Streams replication must be reanalyzed and solved by the custom application developers. In addition, standard solutions like Oracle Data Guard and Oracle Streams replication undergo stringent testing both by Oracle and its customers. It will take a great deal of effort before a custom-developed solution can exhibit the same degree of maturity. For these reasons, only organizations with substantial patience and expertise should attempt to build a high availability solution with application maintained copies.
See Also:
Oracle Streams Advanced Queuing User's Guide for more information about developing applications with Advanced QueuingImplementing Oracle Streams in a high availability environment requires consideration of possible failure and recovery scenarios, and the implementation of procedures to ensure Oracle Streams continues to capture, propagate, and apply changes after a failure. Some of the issues that must be examined include the following:
When configuring a solution using Oracle Streams, it is important to anticipate failures and design availability into the architecture. You must examine every database in the distributed system, and design a recovery plan in case of failure of that database. In some situations, failure of a database affects only services accessing data on that database. In other situations, a failure is multiplied, because it can affect other databases.
This section contains these topics:
Configuring Oracle Real Application Clusters with Oracle Streams
Local or Downstream Capture with Oracle Streams Capture Processes
A configuration where each database is directly connected to every other database in the distributed system is the most resilient to failures, because a failure of one database will not prevent any other databases from operating or communicating. Assuming all data is replicated, services that were using the failed database can connect to surviving replicas.
See Also:
Oracle Streams Replication Administrator's Guide for a detailed example of such an environment
Although configurations where each database is directly connected to every other database provide the best high availability characteristics, they can become difficult to manage when the number of databases becomes large. Hub-and-spoke configurations solve this manageability issue by funneling changes from many databases into a hub database, and then to other hub databases, or to other spoke databases. To add a new source or destination, you simply connect it to a hub database, rather than establishing connections to every other database.
A hub, however, becomes a very important node in your distributed environment. Should it fail, all communications flowing through the hub will fail. Due to the asynchronous nature of the messages propagating through the hub, it can be very difficult to redirect a stream from one hub to another. A better approach is to make the hub resilient to failures.
The same techniques used to make a single database resilient to failures also apply to distributed hub databases. Oracle recommends Oracle Real Application Clusters (Oracle RAC) to provide protection from instance and node failures. This configuration should be combined with a "no loss" physical standby database, to protect from disasters and data errors. Oracle does not recommend using an Oracle Streams replica as the only means to protect from disasters or data errors.
See Also:
Oracle Database 2 Day + Data Replication and Integration Guide for more information about hub-and-spoke replication environments and for instructions that explain how to configure them
Oracle Streams Replication Administrator's Guide for a detailed example of such an environment
Using Oracle Real Application Clusters (Oracle RAC) with Oracle Streams introduces some important considerations. When running in an Oracle RAC cluster, a capture process runs on the instance that owns the queue that is receiving the captured logical change records (LCRs). Jobs should be running on all instances, and a propagation job running on an instance will propagate LCRs from any queue owned by that instance to destination queues. An apply process runs on the instance that owns the queue from which the apply process dequeues its messages. That might or might not be the same queue on which capture runs.
Any propagation to the database running Oracle RAC is made over database links. The database links must be configured to connect to the destination instance that owns the queue that will receive the messages.
You might choose to use a cold failover cluster to protect from system failure rather than Oracle RAC. A cold failover cluster is not Oracle RAC. Instead, a cold failover cluster uses a secondary node to mount and recover the database when the first node fails.
Note:
In an Oracle RAC environment, synchronous capture uses an internal mechanism to read changes made by all instances.Oracle Streams capture processes support capturing changes from the redo log on the local source database or at a downstream database at a different site. The choice of local capture or downstream capture has implications for availability. When a failure occurs at a source database, some changes might not have been captured. With local capture, those changes might not be available until the source database is recovered. In the event of a catastrophic failure, those changes might be lost.
Downstream capture at a remote database reduces the window of potential data loss in the event of a failure. Depending on the configuration, downstream capture enables you to guarantee all changes committed at the source database are safely copied to a remote site, where they can be captured and propagated to other databases and applications. Oracle Streams uses the same mechanism as Oracle Data Guard to copy redo data or log files to remote destinations, and supports the same operational modes, including maximum protection, maximum availability, and maximum performance.
Note:
Synchronous capture is always configured at the source database.See Also:
"Local Capture and Downstream Capture"The following sections provide best practices for recovering from failures.
This section contains these topics:
After a failure and restart of a single-node database, or a failure and restart of a database on another node in a cold failover cluster, the capture process automatically returns to the status it was in at the time of the failure. That is, if it was running at the time of the failure, then the capture process restarts automatically.
Similarly, for a capture process running in an Oracle Real Application Clusters (Oracle RAC) environment, if an instance running the capture process fails, then the queue that receives the captured LCRs is assigned to another node in the cluster, and the capture process is restarted automatically. A capture process follows its queue to a different instance if the current owner instance becomes unavailable, and the queue itself follows the rules for primary instance and secondary instance ownership.
See Also:
"Oracle Streams Capture Processes and Oracle Real Application Clusters"
"Queues and Oracle Real Application Clusters" for information about primary and secondary instance ownership for queues
It is important to ensure that a propagation continues to function after a failure of a destination database instance. A propagation job will retry (with increasing delay between retries) its database link sixteen times after a failure until the connection is reestablished. If the connection is not reestablished after sixteen tries, then the propagation schedule is disabled.
If the database is restarted on the same node, or on a different node in a cold failover cluster, then the connection should be reestablished. In some circumstances, the database link could be waiting on a read or write, and will not detect the failure until a lengthy time out expires. The time out is controlled by the TCP_KEEPALIVE_INTERVAL
TCP/IP parameter. In such circumstances, you should drop and re-create the database link to ensure that communication is reestablished quickly.
When an instance in an Oracle Real Application Clusters (Oracle RAC) cluster fails, the instance is recovered by another node in the cluster. Each queue that was previously owned by the failed instance is assigned to a new instance. If the failed instance contained one or more destination queues for propagations, then queue-to-queue propagations automatically failover to the new instance. However, for queue-to-dblink propagations, you must drop and reestablish any inbound database links to point to the new instance that owns a destination queue. You do not need to modify a propagation that uses a re-created database link.
In a high availability environment, you can prepare scripts that will drop and re-create all necessary database links. After a failover, you can execute these scripts so that Oracle Streams can resume propagation.
See Also:
"Configuring an Oracle Streams Administrator" for information about creating database links in an Oracle Streams environment
"Queues and Oracle Real Application Clusters" for more information about database links in an Oracle RAC environment
For messages to be propagated from a source queue to a destination queue, a propagation job must run on the instance owning the source queue. In a single-node database, or cold failover cluster, propagation resumes when the single database instance is restarted.
When running in an Oracle Real Application Clusters (Oracle RAC) environment, a propagation job runs on the instance that owns the source queue from which the propagation job sends messages to a destination queue. If the owner instance for a propagation job goes down, then the propagation job automatically migrates to a new owner instance. You should not alter instance affinity for Oracle Streams propagation jobs, because Oracle Streams manages instance affinity for propagation jobs automatically.
After a failure and restart of a single-node database, or a failure and restart of a database on another node in a cold failover cluster, the apply process automatically returns to the status it was in at the time of the failure. That is, if it was running at the time of the failure, then the apply process restarts automatically.
Similarly, in an Oracle Real Application Clusters (Oracle RAC) cluster, if an instance hosting the apply process fails, then the queue from which the apply process dequeues messages is assigned to another node in the cluster, and the apply process is restarted automatically. An apply process follows its queue to a different instance if the current owner instance becomes unavailable, and the queue itself follows the rules for primary instance and secondary instance ownership.
See Also:
"Oracle Streams Apply Processes and Oracle Real Application Clusters"
"Queues and Oracle Real Application Clusters" for information about primary and secondary instance ownership for queues