Oracle® Database 2 Day + Performance Tuning Guide 11g Release 1 (11.1) Part Number B28275-01 |
|
|
View PDF |
Transient performance problems are short-lived and do not appear in the Automatic Database Diagnostic Monitor (ADDM) analysis. ADDM tries to report the most significant performance problems during an analysis period in terms of their effect on DB time. If a problem lasts for a brief time, then its severity might be averaged out or minimized by other performance problems in the entire analysis period. Therefore, the problem may not appear in the ADDM findings. Whether or not a performance problem is captured by ADDM depends on its duration compared to the interval between the Automatic Workload Repository (AWR) snapshots.
If a performance problem lasts for a significant portion of the time between snapshots, then it will be captured by ADDM. For example, if the snapshot interval is one hour, a performance problem that lasts 30 minutes should not be considered a transient performance problem because its duration represents a significant portion of the snapshot interval and will likely be captured by ADDM.
On the other hand, a performance problem that lasts 2 minutes could be a transient performance problem because its duration represents a small portion of the snapshot interval and will likely not show up in the ADDM findings. For example, if the system was slow between 10:00 p.m. and 10:10 p.m., but the ADDM analysis for the time period between 10:00 p.m. and 11:00 p.m. does not show a performance problem, a transient performance problem may have occurred for only a few minutes of the 10-minute interval reported by the user.
This chapter contains the following sections:
To capture a detailed history of database activity, Oracle Database samples active sessions each second with the Active Session History (ASH) sampler. The Automatic Workload Repository (AWR) snapshot processing collects the sampled data into memory and writes it to persistent storage. ASH is an integral part of the Oracle Database self-management framework and is extremely useful for diagnosing performance problems.
Unlike instance-level statistics gathered by AWR, sampled data is gathered at the session level by ASH. By capturing statistics for only active sessions, a manageable set of data is represented. The size of this data is directly related to the work being performed, rather than to the entire database instance.
Sampled data captured by ASH can be aggregated based on the various dimensions that it captures, including the following:
SQL identifier of a SQL statement
Object number, file number, and block number
Wait event identifier and parameters
Session identifier and session serial number
Module and action name
Client identifier of the session
Service hash identifier
You can run ASH reports to analyze transient performance problems with the database that only occur during specific times. This technique is especially useful when you are trying to do either of the following:
Resolve transient performance problems that may last for only a short period of time, such as why a particular job or session is not responding when the rest of the instance is performing as usual
Perform scoped or targeted performance analysis by various dimensions or their combinations, such as time, session, module, action, or SQL identifier
See Also:
This section describes how to generate ASH reports using Oracle Enterprise Manager (Enterprise Manager).
On the Performance page, under Average Active Sessions, click Run ASH Report.
The Run ASH Report page appears.
Enter the date and time for the start and end of the time period when the transient performance problem occurred.
In this example, database activity increased between 1:45 p.m. and 2:00 p.m., so an ASH report needs to be created for that time period.
Click Generate Report.
The Processing: View Report page appears while the report is being generated.
After the report is generated, the ASH report appears under Report Results on the Run ASH Report page.
Optionally, click Save to File to save the report in HTML for future analysis.
You can use an ASH report to identify the source of transient performance problems. The report is divided into titled sections. The following sections of the ASH report are a useful place to begin the investigation:
See Also:
Oracle Database Performance Tuning Guide for more detailed information about the ASH report
The Top Events section of the report describes the top wait events of the sampled session activity categorized by user, background, and priority. Use this information to identify the wait events that may be the cause of the transient performance problem.
The Top Events section of the report contains the following subsections:
The Top User Events subsection of the report lists the top wait events from user processes that accounted for the highest percentages of sampled session activity.
The example in Figure 7-1 shows that 84 percent of database activity is consumed by the CPU + Wait for CPU
event. In this example, the Load Profile section should be examined next to determine the type of activity that is causing this wait event.
The Top Background Events subsection of the report lists the top wait events from the background events that accounted for the highest percentages of sampled session activity.
The example in Figure 7-2 shows that 17.65 percent of sampled session activity is consumed by the CPU + Wait for CPU
event.
The Load Profile section of the report describes the load analyzed in the sampled session activity. Use the information in this section to identify the service, client, or SQL command type that may be the cause of the transient performance problem. The Top Service/Module subsection lists the services and modules that accounted for the highest percentages of sampled session activity.
The example in Figure 7-3 shows that 81 percent of database activity is consumed by the SYS$USERS
service running the SQL*Plus module. In this example, it appears that the user is running a high-load SQL statement that is causing the performance problem indicated in Figure 7-1. The Top SQL section of the report should be analyzed next to determine whether a particular type of SQL statement makes up the load.
The Top SQL section of the report describes the top SQL statements of the sampled session activity. Use this information to identify high-load SQL statements that may be the cause of the transient performance problem. One useful subsection is Top SQL with Top Events, which lists the SQL statements that accounted for the highest percentages of sampled session activity. The Sampled # of Executions column shows how many distinct executions of a particular SQL statement were sampled. To view the text of the SQL statements, click the SQL ID link.
The example in Figure 7-4 shows that 75 percent of database activity is consumed by a particular SELECT
statement. This statement was executed in the SQL*Plus module shown in Figure 7-3. It appears that this high-load SQL statement is causing the performance problem. The Top Sessions section should be analyzed to identify the session running this SQL statement.
See Also:
The Top Sessions section lists the sessions that were waiting for the wait event that accounted for the highest percentages of sampled session activity. Use this information to identify the sessions that accounted for the highest percentages of sampled session activity, which may be the cause of the performance problem.
The example in Figure 7-5 shows that 81 percent of database activity is used by the user SH
with the session ID of 147. Thus, it appears that this user was running the high-load SQL statement identified in Figure 7-4. You should investigate this session to determine whether it is performing a legitimate operation and tune the SQL statement if possible. If tuning the SQL is not possible and the session is causing an unacceptable performance impact on the system, consider terminating the session.
The Top DB Objects subsection lists the database objects (such as tables and indexes) that accounted for the highest percentages of sampled session activity.
The example in Figure 7-6 shows that the objects accounting for the most session activity are in the SYSTEM
and SYSAUX
tablespaces. In each row of the table, the event is db file sequential read
, which signifies that a user process is reading a buffer into the system global area (SGA) buffer cache and is waiting for a physical I/O call to return.
The Top DB Files subsection lists the database files that accounted for the highest percentages of sampled session activity.
The example in Figure 7-7 shows that most of the session activity involves the datafile in the SYSTEM
tablespace. This information is consistent with Figure 7-6, which shows that the objects accounting for the most session activity are located in the SYSTEM
and SYSAUX
tablespaces.
The Activity Over Time section of the ASH report is particularly useful for longer time periods because it provides in-depth details about activities and workload profiles during the analysis period. The Activity Over Time section is divided into multiple time slots.
Each of the time slots contains information regarding that particular time slot, as described in Table 7-1.
Table 7-1 Activity Over Time
Column | Description |
---|---|
Slot Time (Duration) |
Duration of the slot |
Slot Count |
Number of sampled sessions in the slot |
Event |
Top three wait events in the slot |
Event Count |
Number of ASH samples waiting for the wait event |
% Event |
Percentage of ASH samples waiting for wait events in the entire analysis period |
All inner slots are 2 minutes each and can be compared to each other. The first and last slots, which are also called the outer slots, are odd-sized because they are the only slots that do not have a fixed slot time.
When comparing the inner slots, perform a skew analysis by identifying spikes in the Event Count and Slot Count columns. A spike in the Event Count column indicates an increase in the number of sampled sessions waiting for a particular event. A spike in the Slot Count column indicates an increase in active sessions, because ASH data is sampled from active sessions only and a relative increase in database workload. Typically, when the number of active session samples and the number of sessions associated with a wait event increase, the slot may be the cause of the transient performance problem.
The example in Figure 7-8 indicates that the number of sampled sessions rose sharply in the first inner slot and fell sharply in the last inner slot. The slot count and event count peaked in the 13:54 p.m. time slot.