Oracle® Database High Availability Architecture and Best Practices
10g Release 1 (10.1)

Part Number B10726-02

Contents

Title and Copyright Information

Send Us Your Comments

Preface

Audience
Documentation Accessibility
Organization
Related Documents
Conventions

Part I Getting Started

1 Overview of High Availability

1.1 Introduction to High Availability
1.2 What is Availability?
1.3 Importance of Availability
1.4 Causes of Downtime
1.5 What Does This Book Contain?
1.6 Who Should Read This Book?

2 Determining Your High Availability Requirements

2.1 Why It Is Important to Determine High Availability Requirements
2.2 Analysis Framework for Determining High Availability Requirements
2.2.1 Business Impact Analysis
2.2.2 Cost of Downtime
2.2.3 Recovery Time Objective
2.2.4 Recovery Point Objective
2.3 Choosing a High Availability Architecture
2.3.1 HA Systems Capabilities
2.3.2 Business Performance, Budget and Growth Plans
2.3.3 High Availability Best Practices

Part II Oracle Database High Availability Features, Architectures, and Policies

3 Oracle Database High Availability Features

3.1 Oracle Real Application Clusters
3.2 Oracle Data Guard
3.3 Oracle Streams
3.4 Online Reorganization
3.5 Transportable Tablespaces
3.6 Automatic Storage Management
3.7 Flashback Technology
3.7.1 Oracle Flashback Query
3.7.2 Oracle Flashback Version Query
3.7.3 Oracle Flashback Transaction Query
3.7.4 Oracle Flashback Table
3.7.5 Oracle Flashback Drop
3.7.6 Oracle Flashback Database
3.8 Dynamic Reconfiguration
3.9 Oracle Fail Safe
3.10 Recovery Manager
3.11 Flash Recovery Area
3.12 Hardware Assisted Resilient Data (HARD) Initiative

4 High Availability Architectures

4.1 Oracle Database High Availability Architectures
4.1.1 "Database Only" Architecture
4.1.2 "RAC Only" Architecture
4.1.3 "Data Guard Only" Architecture
4.1.4 Maximum Availability Architecture
4.1.5 Streams Architecture
4.2 Choosing the Correct HA Architecture
4.3 Assessing Other Architectures

5 Operational Policies for High Availability

5.1 Introduction to Operational Policies for High Availability
5.2 Service Level Management for High Availability
5.3 Planning Capacity to Promote High Availability
5.4 Change Management for High Availability
5.5 Backup and Recovery Planning for High Availability
5.6 Disaster Recovery Planning
5.7 Planning Scheduled Outages
5.8 Staff Training for High Availability
5.9 Documentation as a Means of Maintaining High Availability
5.10 Physical Security Policies and Procedures for High Availability

Part III Configuring a Highly Available Oracle Environment

6 System and Network Configuration

6.1 Overview of System Configuration Recommendations
6.2 Recommendations for Configuring Storage
6.2.1 Ensure That All Hardware Components Are Fully Redundant and Fault-Tolerant
6.2.2 Use an Array That Can Be Serviced Online
6.2.3 Mirror and Stripe for Protection and Performance
6.2.4 Load-Balance Across All Physical Interfaces
6.2.5 Create Independent Storage Areas
6.2.5.1 Storage Recommendations for Specific HA Architectures
6.2.6 Define ASM Disk and Failure Groups Properly
6.2.7 Use HARD-Compliant Storage for the Greatest Protection Against Data Corruption
6.2.8 Storage Recommendation for RAC
6.2.8.1 Protect the Oracle Cluster Registry and Voting Disk From Media Failure
6.3 Recommendations for Configuring Server Hardware
6.3.1 Server Hardware Recommendations for All Architectures
6.3.1.1 Use Fewer, Faster, and Denser Components
6.3.1.2 Use Redundant Hardware Components
6.3.1.3 Use Systems That Can Detect and Isolate Failures
6.3.1.4 Protect the Boot Disk With a Backup Copy
6.3.2 Server Hardware Recommendations for RAC
6.3.2.1 Use a Supported Cluster System to Run RAC
6.3.2.2 Choose the Proper Cluster Interconnect
6.3.3 Server Hardware Recommendations for Data Guard
6.3.3.1 Use Identical Hardware for Every Machine at Both Sites
6.4 Recommendations for Configuring Server Software
6.4.1 Server Software Recommendations for All Architectures
6.4.1.1 Use the Same OS Version, Patch Level, Single Patches, and Driver Versions
6.4.1.2 Use an Operating System That is Fault-Tolerant to Hardware Failures
6.4.1.3 Configure Swap Partitions Appropriately
6.4.1.4 Set Operating System Parameters to Enable Future Growth
6.4.1.5 Use Logging or Journal File Systems
6.4.1.6 Mirror Disks That Contain Oracle and Application Software
6.4.2 Server Software Recommendations for RAC
6.4.2.1 Use Supported Clustering Software
6.4.2.2 Use Network Time Protocol (NTP) On All Cluster Nodes
6.5 Recommendations for Configuring the Network
6.5.1 Network Configuration Best Practices for All Architectures
6.5.1.1 Ensure That All Network Components Are Redundant
6.5.1.2 Use Load Balancers to Distribute Incoming Requests
6.5.2 Network Configuration Best Practices for RAC
6.5.2.1 Classify Network Interfaces Using the Oracle Interface Configuration Tool
6.5.3 Network Configuration Best Practices for Data Guard
6.5.3.1 Configure System TCP Parameters Appropriately
6.5.3.2 Use WAN Traffic Managers to Provide Site Failover Capabilities

7 Oracle Configuration Best Practices

7.1 Configuration Best Practices for the Database
7.1.1 Use Two Control Files
7.1.2 Set CONTROL_FILE_RECORD_KEEP_TIME Large Enough
7.1.3 Configure the Size of Redo Log Files and Groups Appropriately
7.1.4 Multiplex Online Redo Log Files
7.1.5 Enable ARCHIVELOG Mode
7.1.6 Enable Block Checksums
7.1.7 Enable Database Block Checking
7.1.8 Log Checkpoints to the Alert Log
7.1.9 Use Fast-Start Checkpointing to Control Instance Recovery Time
7.1.10 Capture Performance Statistics About Timing
7.1.11 Use Automatic Undo Management
7.1.12 Use Locally Managed Tablespaces
7.1.13 Use Automatic Segment Space Management
7.1.14 Use Temporary Tablespaces and Specify a Default Temporary Tablespace
7.1.15 Use Resumable Space Allocation
7.1.16 Use a Flash Recovery Area
7.1.17 Enable Flashback Database
7.1.18 Set Up and Follow Security Best Practices
7.1.19 Use the Database Resource Manager
7.1.20 Use a Server Parameter File
7.2 Configuration Best Practices for Real Application Clusters
7.2.1 Register All Instances with Remote Listeners
7.2.2 Do Not Set CLUSTER_INTERCONNECTS Unless Required for Scalability
7.3 Configuration Best Practices for Data Guard
7.3.1 Use a Simple, Robust Archiving Strategy and Configuration
7.3.2 Use Multiplexed Standby Redo Logs and Configure Size Appropriately
7.3.3 Enable FORCE LOGGING Mode
7.3.4 Use Real Time Apply
7.3.5 Configure the Database and Listener for Dynamic Service Registration
7.3.6 Tune the Network in a WAN Environment
7.3.7 Determine the Data Protection Mode
7.3.7.1 Determining the Protection Mode
7.3.7.2 Changing the Data Protection Mode
7.3.8 Conduct a Performance Assessment with the Proposed Network Configuration
7.3.9 Use a LAN or MAN for Maximum Availability or Maximum Protection Modes
7.3.10 Use ARCH for the Greatest Performance Throughput
7.3.11 Use the ASYNC Attribute to Control Data Loss
7.3.12 Evaluate SSH Port Forwarding with Compression
7.3.13 Set LOG_ARCHIVE_LOCAL_FIRST to TRUE
7.3.14 Provide Secure Transmission of Redo Data
7.3.15 Set DB_UNIQUE_NAME
7.3.16 Set LOG_ARCHIVE_CONFIG Correctly
7.3.17 Recommendations for the Physical Standby Database Only
7.3.17.1 Tune Media Recovery Performance
7.3.18 Recommendations for the Logical Standby Database Only
7.3.18.1 Use Supplemental Logging and Primary Key Constraints
7.3.18.2 Set the MAX_SERVERS Initialization Parameter
7.3.18.3 Increase the PARALLEL_MAX_SERVERS Initialization Parameter
7.3.18.4 Set the TRANSACTION_CONSISTENCY Initialization Parameter
7.3.18.5 Skip SQL Apply for Unnecessary Objects
7.4 Configuration Best Practices for MAA
7.4.1 Configure Multiple Standby Instances
7.4.2 Configure Connect-Time Failover for Network Service Descriptors
7.5 Recommendations for Backup and Recovery
7.5.1 Use Recovery Manager to Back Up Database Files
7.5.2 Understand When to Use Backups
7.5.2.1 Perform Regular Backups
7.5.2.2 Initial Data Guard Environment Set-Up
7.5.2.3 Recovering from Data Failures Using File or Block Media Recovery
7.5.2.4 Double Failure Resolution
7.5.2.5 Long-Term Backups
7.5.3 Use an RMAN Recovery Catalog
7.5.4 Use the Autobackup Feature for the Control File and SPFILE
7.5.5 Use Incrementally Updated Backups to Reduce Restoration Time
7.5.6 Enable Change Tracking to Reduce Backup Time
7.5.7 Create Database Backups on Disk in the Flash Recovery Area
7.5.8 Create Tape Backups from the Flash Recovery Area
7.5.9 Determine Retention Policy and Backup Frequency
7.5.10 Configure the Size of the Flash Recovery Area Properly
7.5.11 In a Data Guard Environment, Back Up to the Flash Recovery Area on All Sites
7.5.12 During Backups, Use the Target Database Control File as the RMAN Repository
7.5.13 Regularly Check Database Files for Corruption
7.5.14 Periodically Test Recovery Procedures
7.5.15 Back Up the OCR to Tape or Offsite
7.6 Recommendations for Fast Application Failover
7.6.1 Configure Connection Descriptors for All Possible Production Instances
7.6.2 Use RAC Availability Notifications and Events
7.6.3 Use Transparent Application Failover If RAC Notification Is Not Feasible
7.6.3.1 New Connections
7.6.3.2 Existing Connections
7.6.3.3 LOAD_BALANCE Parameter in the Connection Descriptor
7.6.3.4 FAILOVER Parameter in the Connection Descriptor
7.6.3.5 SERVICE_NAME Parameter in the Connection Descriptor
7.6.3.6 RETRIES Parameter in the Connection Descriptor
7.6.3.7 DELAY Parameter in the Connection Descriptor
7.6.4 Configure Services
7.6.5 Configure CRS for High Availability
7.6.6 Configure Service Callouts to Notify Middle-Tier Applications and Clients
7.6.7 Publish Standby or Nonproduction Services
7.6.8 Publish Production Services

Part IV Managing a Highly Available Oracle Environment

8 Using Oracle Enterprise Manager for Monitoring and Detection

8.1 Overview of Monitoring and Detection for High Availability
8.2 Using Enterprise Manager for System Monitoring
8.2.1 Set Up Default Notification Rules for Each System
8.2.2 Use Database Target Views to Monitor Health, Availability, and Performance
8.2.3 Use Event Notifications to React to Metric Changes
8.2.4 Use Events to Monitor Data Guard System Availability
8.3 Managing the HA Environment with Enterprise Manager
8.3.1 Check Enterprise Manager Policy Violations
8.3.2 Use Enterprise Manager to Manage Oracle Patches and Maintain System Baselines
8.3.3 Use Enterprise Manager to Manage Data Guard Targets
8.4 Highly Available Architectures for Enterprise Manager
8.4.1 Recommendations for an HA Architecture for Enterprise Manager
8.4.1.1 Protect the Repository and Processes As Well as the Configuration They Monitor
8.4.1.2 Place the Management Repository in a RAC Instance and Use Data Guard
8.4.1.3 Configure At Least Two Management Service Processes and Load Balance Them
8.4.1.4 Consider Hosting Enterprise Manager on the Same Hardware as an HA System
8.4.1.5 Monitor the Network Bandwidth Between Processes and Agents
8.4.2 Unscheduled Outages for Enterprise Manager
8.5 Additional Enterprise Manager Configuration
8.5.1 Configure a Separate Listener for Enterprise Manager
8.5.2 Install the Management Repository Into an Existing Database

9 Recovering from Outages

9.1 Recovery Steps for Unscheduled Outages
9.1.1 Recovery Steps for Unscheduled Outages on the Primary Site
9.1.2 Recovery Steps for Unscheduled Outages on the Secondary Site
9.2 Recovery Steps for Scheduled Outages
9.2.1 Recovery Steps for Scheduled Outages on the Primary Site
9.2.2 Recovery Steps for Scheduled Outages on the Secondary Site
9.2.3 Preparing for Scheduled Secondary Site Maintenance

10 Detailed Recovery Steps

10.1 Summary of Recovery Operations
10.2 Complete or Partial Site Failover
10.2.1 Complete Site Failover
10.2.2 Partial Site Failover: Middle-Tier Applications Connect to a Remote Database Server
10.3 Database Failover
10.3.1 When to Use Data Guard Failover
10.3.2 When Not to Use Data Guard Failover
10.3.3 Data Guard Failover Using SQL*Plus
10.3.3.1 Physical Standby Failover Using SQL*Plus
10.3.3.2 Logical Standby Failover Using SQL*Plus
10.4 Database Switchover
10.4.1 When to Use Data Guard Switchover
10.4.2 When Not to Use Data Guard Switchover
10.4.3 Data Guard Switchover Using SQL*Plus
10.4.3.1 Physical Standby Switchover Using SQL*Plus
10.4.3.2 Logical Standby Switchover Using SQL*Plus
10.5 RAC Recovery
10.5.1 RAC Recovery for Unscheduled Outages
10.5.1.1 Automatic Instance Recovery for Failed Instances
10.5.1.2 Automatic Service Relocation
10.5.2 RAC Recovery for Scheduled Outages
10.5.2.1 Disabling CRS-Managed Resources
10.5.2.2 Planned Service Relocation
10.6 Apply Instance Failover
10.6.1 Performing an Apply Instance Failover Using SQL*Plus
10.6.1.1 Step 1: Ensure That the Chosen Standby Instance is Mounted
10.6.1.2 Step 2: Verify Oracle Net Connection to the Chosen Standby Host
10.6.1.3 Step 3: Start Recovery on the Chosen Standby Instance
10.6.1.4 Step 4: Copy Archived Redo Logs to the New Apply Host
10.6.1.5 Step 5: Verify the New Configuration
10.7 Recovery Solutions for Data Failures
10.7.1 Detecting and Recovering From Datafile Block Corruption
10.7.1.1 Detecting Datafile Block Corruption
10.7.1.2 Recovering From Datafile Block Corruption
10.7.2 Recovering From Media Failure
10.7.2.1 Determine the Extent of the Media Failure
10.7.2.2 Replace or Move Away From Faulty Hardware
10.7.2.3 Decide Which Recovery Action to Take
10.7.3 Recovery Methods for Data Failures
10.7.3.1 Use RMAN Datafile Media Recovery
10.7.3.2 Use RMAN Block Media Recovery
10.7.3.3 Re-Create Objects Manually
10.7.3.4 Use Data Guard to Recover From Data Failure
10.8 Recovering from User Error with Flashback Technology
10.8.1 Resolving Row and Transaction Inconsistencies
10.8.1.1 Flashback Query
10.8.1.2 Flashback Version Query
10.8.1.3 Flashback Transaction Query
10.8.1.4 Example: Using Flashback Technology to Investigate Salary Discrepancy
10.8.2 Resolving Table Inconsistencies
10.8.2.1 Flashback Table
10.8.2.2 Flashback Drop
10.8.3 Resolving Database-Wide Inconsistencies
10.8.3.1 Flashback Database
10.8.3.2 Using Flashback Database to Repair a Dropped Tablespace
10.9 RAC Rolling Upgrade
10.9.1 Applying a Patch with opatch
10.9.2 Rolling Back a Patch with opatch
10.9.3 Using opatch to List Installed Software Components and Patches
10.9.4 Recommended Practices for RAC Rolling Upgrades
10.10 Upgrade with Logical Standby Database
10.11 Online Object Reorganization
10.11.1 Online Table Reorganization
10.11.2 Online Index Reorganization
10.11.3 Online Tablespace Reorganization

11 Restoring Fault Tolerance

11.1 Restoring Full Tolerance
11.2 Restoring Failed Nodes or Instances in a RAC Cluster
11.2.1 Recovering Service Availability
11.2.2 Considerations for Client Connections After Restoring a RAC Instance
11.3 Restoring the Standby Database After a Failover
11.3.1 Restoring a Physical Standby Database After a Failover
11.3.1.1 Step 1P: Retrieve STANDBY_BECAME_PRIMARY_SCN
11.3.1.2 Step 2P: Flash Back the Previous Production Database
11.3.1.3 Step 3P: Mount New Standby Database From Previous Production Database
11.3.1.4 Step 4P: Archive to New Standby Database From New Production Database
11.3.1.5 Step 5P: Start Managed Recovery
11.3.1.6 Step 6P: Restart MRP After It Encounters the End-of-Redo Marker
11.3.2 Restoring a Logical Standby Database After a Failover
11.3.2.1 Step 1L: Retrieve END_PRIMARY_SCN
11.3.2.2 Step 2L: Flash Back the Previous Production Database
11.3.2.3 Step 3L: Open New Logical Standby Database and Start SQL Apply
11.4 Restoring Fault Tolerance after Secondary Site or Clusterwide Scheduled Outage
11.4.1 Step 1: Start the Standby Database
11.4.2 Step 2: Start Recovery
11.4.3 Step 3: Verify Log Transport Services on Production Database
11.4.4 Step 4: Verify that Recovery is Progressing on Standby Database
11.4.5 Step 5: Restore Production Database Protection Mode
11.5 Restoring Fault Tolerance after a Standby Database Data Failure
11.5.1 Step 1: Fix the Cause of the Outage
11.5.2 Step 2: Restore the Backup of Affected Datafiles
11.5.3 Step 3: Restore Required Archived Redo Log Files
11.5.4 Step 4: Start the Standby Database
11.5.5 Step 5: Start Recovery or Apply
11.5.6 Step 6: Verify Log Transport Services On the Production Database
11.5.7 Step 7: Verify that Recovery or Apply Is Progressing On the Standby Database
11.5.8 Step 8: Restore Production Database Protection Mode
11.6 Restoring Fault Tolerance After the Production Database Has Opened Resetlogs
11.6.1 Scenario 1: SCN on Standby is Behind Resetlogs SCN on Production
11.6.2 Scenario 2: SCN on Standby is Ahead of Resetlogs SCN on Production
11.7 Restoring Fault Tolerance after Dual Failures

A Hardware Assisted Resilient Data (HARD) Initiative

A.1 Preventing Data Corruptions with HARD-Compliant Storage
A.2 Data Corruptions
A.3 Types of Data Corruption Addressed by HARD
A.4 Possible HARD Checks

B Database SPFILE and Oracle Net Configuration File Samples

B.1 SPFILE Samples
B.2 Oracle Net Configuration Files
B.2.1 SQLNET.ORA File Example for All Hosts Using Dynamic Instance Registration
B.2.2 LISTENER.ORA File Example for All Hosts Using Dynamic Instance Registration
B.2.3 TNSNAMES.ORA File Example for All Hosts Using Dynamic Instance Registration

Index