Oracle® Database Administrator's Guide 11g Release 1 (11.1) Part Number B28310-01
The two-phase commit mechanism ensures that all nodes either commit or perform a rollback together. What happens if any of the three phases (prepare, commit, or forget) fails because of a system or network error? The transaction becomes in-doubt.
Distributed transactions can become in-doubt in the following ways:
A server machine running Oracle Database software crashes
A network connection between two or more Oracle Databases involved in distributed processing is disconnected
An unhandled software error occurs
The RECO process automatically resolves in-doubt transactions when the machine, network, or software problem is resolved. Until RECO can resolve the transaction, the data is locked for both reads and writes. The database blocks reads because it cannot determine which version of the data to display for a query.
In the majority of cases, the database resolves the in-doubt transaction automatically. Assume that there are two nodes, local and remote, in the following scenarios. The local node is the commit point site. User scott connects to local and executes and commits a distributed transaction that updates local and remote.
Figure 32-5 illustrates the sequence of events when there is a failure during the prepare phase of a distributed transaction:
The following steps occur:
User scott connects to local and executes a distributed transaction.
The global coordinator, which in this example is also the commit point site, requests all databases other than the commit point site to promise to commit or roll back when told to do so.
The remote database crashes before issuing the prepare response back to local.
The transaction is ultimately rolled back on each database by the RECO process when the remote site is restored.
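The prepare-phase failure above can be sketched as a toy simulation. This is a minimal model under stated assumptions, not how the database implements two-phase commit: the `Node` class, its state names, and the `recover_after_prepare_failure` function (a stand-in for RECO) are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Toy stand-in for one database in the distributed transaction."""
    name: str
    up: bool = True
    state: str = "active"

    def prepare(self) -> None:
        # A crashed node never sends its prepare response.
        if not self.up:
            raise ConnectionError(f"{self.name} did not answer the prepare request")
        self.state = "prepared"


def prepare_phase(participants: list[Node]) -> bool:
    """Return True only if every non-commit-point node answers 'prepared'."""
    for node in participants:
        try:
            node.prepare()
        except ConnectionError:
            return False
    return True


def recover_after_prepare_failure(nodes: list[Node]) -> None:
    """Stand-in for RECO: no commit was ever issued, so roll back everywhere."""
    for node in nodes:
        node.up = True
        node.state = "rolled_back"


local = Node("local")                # global coordinator and commit point site
remote = Node("remote", up=False)    # crashes before sending its prepare response

all_prepared = prepare_phase([remote])
in_doubt = not all_prepared          # the prepare phase failed
recover_after_prepare_failure([local, remote])
print(in_doubt, local.state, remote.state)
```

Because the commit point site never committed, rolling back on every node is the only outcome consistent with the scenario, which is why the recovery function in this sketch does not need to inspect any state.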
Figure 32-6 illustrates the sequence of events when there is a failure during the commit phase of a distributed transaction:
The following steps occur:
User scott connects to local and executes a distributed transaction.
The global coordinator, which in this case is also the commit point site, requests all databases other than the commit point site to promise to commit or roll back when told to do so.
The commit point site receives a prepared message from remote saying that it will commit.
The commit point site commits the transaction locally, then sends a commit message to remote asking it to commit.
The remote database receives the commit message, but cannot respond because of a network failure.
The transaction is ultimately committed on the remote database by the RECO process after the network is restored.
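The commit-phase failure above can be sketched the same way. This is a simplified model, assuming that a node stays in a prepared (in-doubt) state until the outcome is confirmed, and that recovery copies the outcome recorded at the commit point site. The `reco_outcome` function and the state names are hypothetical and only illustrate the reasoning.

```python
def reco_outcome(commit_point_state: str, node_state: str) -> str:
    """Toy stand-in for RECO: a prepared node adopts the commit point site's outcome."""
    if node_state != "prepared":
        return node_state  # nothing in doubt on this node
    return "committed" if commit_point_state == "committed" else "rolled_back"


# "local" is the global coordinator and the commit point site.
states = {"local": "active", "remote": "active"}

states["remote"] = "prepared"    # remote promises to commit or roll back
states["local"] = "committed"    # commit point site commits locally

# The network fails before the outcome is confirmed between the nodes, so the
# model keeps remote in the prepared (in-doubt) state.
in_doubt = [name for name, state in states.items() if state == "prepared"]

# Network restored: resolve every in-doubt node against the commit point site.
for name in in_doubt:
    states[name] = reco_outcome(states["local"], states[name])

print(in_doubt, states)
```

The contrast with the prepare-phase case is the point of the sketch: once the commit point site has committed, the in-doubt node can only be resolved by committing.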
See Also:
"Deciding How to Handle In-Doubt Transactions" for a description of failure situations and how the database resolves intervening failures during two-phase commit
You should only need to resolve an in-doubt transaction manually in the following cases:
The in-doubt transaction has locks on critical data or undo segments.
The cause of the machine, network, or software failure cannot be repaired quickly.
Resolution of in-doubt transactions can be complicated. The procedure requires that you do the following:
Identify the transaction identification number for the in-doubt transaction.
Query the DBA_2PC_PENDING and DBA_2PC_NEIGHBORS views to determine whether the databases involved in the transaction have committed.
If necessary, force a commit using the COMMIT FORCE statement or a rollback using the ROLLBACK FORCE statement.
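As a rough illustration of the procedure, the following sketch holds the two dictionary queries as strings and builds the forcing statement for a given transaction ID. Several things here are assumptions: the column lists selected from the two views, the three-part dotted form of the local transaction ID, the sample ID `1.21.17`, and the helper name `force_statement`. Verify the column names against the data dictionary for your release before using the queries.

```python
# Queries for the second step. The selected columns are assumptions to check
# against the data dictionary of your release.
PENDING_QUERY = (
    "SELECT local_tran_id, global_tran_id, state, mixed, advice "
    "FROM dba_2pc_pending"
)
NEIGHBORS_QUERY = (
    "SELECT local_tran_id, in_out, database, interface "
    "FROM dba_2pc_neighbors"
)


def force_statement(local_tran_id: str, commit: bool) -> str:
    """Build the forcing statement for one in-doubt transaction (hypothetical helper)."""
    # Reject anything that does not look like a dotted numeric ID, so the
    # value cannot break out of the quoted literal.
    parts = local_tran_id.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unexpected transaction ID: {local_tran_id!r}")
    action = "COMMIT" if commit else "ROLLBACK"
    return f"{action} FORCE '{local_tran_id}'"


# Hypothetical ID, as it might appear in DBA_2PC_PENDING.LOCAL_TRAN_ID.
print(force_statement("1.21.17", commit=True))
print(force_statement("1.21.17", commit=False))
```

The helper deliberately leaves the commit-or-rollback decision to the caller, because that decision depends on what the views show for the other databases involved in the transaction.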
See Also:
The following sections explain how to resolve in-doubt transactions:
A system change number (SCN) is an internal timestamp for a committed version of the database. The Oracle Database server uses the SCN clock value to guarantee transaction consistency. For example, when a user commits a transaction, the database records an SCN for this commit in the redo log.
The database uses SCNs to coordinate distributed transactions among different databases. For example, the database uses SCNs in the following way:
An application establishes a connection using a database link.
The distributed transaction commits with the highest global SCN among all the databases involved.
The commit global SCN is sent to all databases involved in the transaction.
SCNs are important for distributed transactions because they function as a synchronized commit timestamp of a transaction, even if the transaction fails. If a transaction becomes in-doubt, an administrator can use this SCN to coordinate changes made to the global database. The global SCN for the transaction commit can also be used to identify the transaction later, for example, in distributed recovery.
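The SCN coordination described above amounts to taking a maximum and distributing it. The following is a minimal sketch of that arithmetic only, assuming each database's current SCN is available as an integer. The function name `coordinate_commit_scn` and the sample SCN values are hypothetical.

```python
def coordinate_commit_scn(current_scns: dict[str, int]) -> dict[str, int]:
    """Return the SCN each database records for the distributed commit."""
    # The transaction commits with the highest SCN among all databases involved.
    commit_scn = max(current_scns.values())
    # That commit SCN is then sent to every database in the transaction.
    return {db: commit_scn for db in current_scns}


# Hypothetical current SCNs on the two nodes before the commit.
scns = {"local": 1040, "remote": 1100}
commit_scns = coordinate_commit_scn(scns)
print(commit_scns)
```

Because every database ends up recording the same commit SCN, that single value can later identify the transaction on any of the nodes.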