Oracle Intelligent Agent User's Guide Release 9.2.0.2 Part Number A96676-02
This chapter covers generic troubleshooting strategies in the event your Intelligent Agent does not function properly. The following topics are discussed:
Under most circumstances, the Intelligent Agent itself requires very little in the way of configuration. In order to function properly, however, the Agent must be able to communicate with the managing host and managed services. If you are familiar with Oracle and your operating system, using the following abbreviated checklists will likely solve problems that can interfere with Agent operation.
The following checklists cover the areas most likely to affect Agent operation. They are divided according to the two most common platforms on which the Agent is run: Windows NT and UNIX. The checklists are abbreviated and assume knowledge of Oracle, the operating system, and related communication protocols. Specific troubleshooting procedures are covered in detail later in this chapter.
If you are running an Agent on a Windows NT system, use the following checklist.
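Check whether snmp_ro.ora, snmp_rw.ora, and services.ora were created by the Agent on startup. The snmp_ro.ora and snmp_rw.ora files are located in the ORACLE_HOME\network\admin directory, and services.ora is in the ORACLE_HOME\network\agent directory.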
Compare the services listed with the services which are available on the machine. Please refer to Appendix A, "Agent Configuration Files" for valid sample files.
If services are missing, check the following files for inconsistency or corruption:
If you are trying to do backups, you must run backupts.sql with the dbsnmp/dbsnmp account.
The Agent is a service and runs by default as SYSTEM. It also needs DLLs from the ORACLE_HOME/BIN directory. If you need mapped drives in your path, you MUST NOT set them in the SYSTEM path.
To set your own path:
dbsnmp.trace_level=admin (or 16 if you want maximum information)
dbsnmp.trace_directory=<any directory in which the Oracle user has write privileges>
dbsnmp.trace_file=<name of the trace output file>
dbsnmp.trace_unique=true/false (When TRUE, this parameter generates a unique trace file each time.)
Check the log files in the ORACLE_HOME/network/log directory.
DBSNMP.LOG should show general Agent problems.
DBSNMP.NOHUP should show any errors related to the Agent's "watchdog" dbsnmpwd process.
NMICONF.LOG should show problems with auto-discovery.
If you are running an Agent on a UNIX system, use the following checklist.
agentctl status
Alternatively, you can check to see if the Intelligent Agent is running by entering the following command:
ps -eaf | grep dbsnmp
These checks should show that a "dbsnmp" process is running and/or "dbsnmpwd" watchdog script is running.
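The process check above can be wrapped in a small shell function. This is a minimal sketch, not part of the Agent software: the function name agent_running is hypothetical, and it only looks for the string "dbsnmp" in a process listing supplied on standard input.

```shell
#!/bin/sh
# Hypothetical helper: reports whether a process listing contains the Agent.
# Reads `ps -eaf` style output on stdin, so it also works on a saved listing.
agent_running() {
    # The [d] bracket expression matches "dbsnmp" and "dbsnmpwd" but keeps
    # the grep command's own entry in a live listing from matching.
    grep -q '[d]bsnmp'
}

# Example usage on a live system:
#   if ps -eaf | agent_running; then echo "Agent process found"; fi
```

The function returns 0 when a matching process line is present and non-zero otherwise, so it can be used directly in an if statement.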
Check the ORACLE_HOME/network/log/dbsnmp*.log file for errors (nmiconf.log for discovery).
Compare the services listed with the services which are available on the machine. Please refer to Appendix A, "Agent Configuration Files" for valid sample files.
If services are missing, check the following files for inconsistency or corruption:
If you are trying to do backups, you must run backupts.sql with the dbsnmp/dbsnmp account.
If after going through the quick checks your Intelligent Agent still is not functioning correctly, use the following section to cover other areas of Agent operation that are less probable causes of Agent operating problems. In addition, many of the steps in the checklists are covered in greater detail for those users who may be less familiar with Oracle and/or the operating system on which the Agent is running. The following questions are covered in this section:
One of the most common problems that prevents the Agent from starting is TCP/IP configuration. To check whether your TCP/IP setup is configured correctly, issue the following commands at the command line:
telnet <hostname>
If these files have never been used, only sample files will exist in the directory. Either rename or copy the .sam files to just the file name with no extension.
(UNIX) Log in as root and edit the /etc/hosts file.
NIS/DNS setups may require modification at the NIS/DNS level to correct TCP/IP configuration problems.
Example: (Windows NT)
HOSTS file:
122.111.111.111 myHost
LMHOSTS file:
122.111.111.111 myHost #PRE
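To confirm that a host name actually appears in a hosts file, a check along the following lines may help. It is a sketch only: the function name hosts_has_entry is hypothetical, the comparison is case-sensitive, and it assumes the usual layout of an IP address followed by one or more names, with "#" starting a comment.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if host name $2 is listed in hosts file $1.
hosts_has_entry() {
    awk -v host="$2" '
        { sub(/#.*/, "") }    # drop comments, including tags such as "#PRE"
        { for (i = 2; i <= NF; i++) if ($i == host) found = 1 }
        END { exit found ? 0 : 1 }
    ' "$1"
}

# Example usage on UNIX:
#   hosts_has_entry /etc/hosts myHost || echo "myHost is missing from /etc/hosts"
```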
$ORACLE_HOME\network\admin\snmp_rw.ora files.
Before Release 8.0.4 of the Agent, the NT Agent required the DNS Hostname and the Computer Name to be identical. These parameters can be checked/changed from the following Windows NT Control Panel property sheets.
To verify the computer name:
To verify the DNS Name:
In addition to proper network configuration, which allows nodes in your network to communicate, components of your Oracle environment must also be able to communicate with each other. Oracle Net provides the session and data communication medium between client machines and Oracle servers, or between Oracle servers. For this reason, proper Oracle Net configuration is a prerequisite for Agent communication. This section covers the most common problems that can occur when Agent communication fails.
Oracle Net configuration files are found in either the $TNS_ADMIN location or the $ORACLE_HOME/network/admin directory (both UNIX and NT).
Primary configuration files are:
See Appendix A, "Agent Configuration Files" for information and examples of the above files.
TNS_ADMIN variable usage during Agent Discovery
All versions of the UNIX discovery script allow the use of the TNS_ADMIN variable to locate input files (listener.ora and tnsnames.ora). Only Agent versions 7.3.4 and above correctly write the output files (snmp_ro.ora and snmp_rw.ora) into TNS_ADMIN, if set.
Beginning with version 8.0.5, the discovery script also reads the TNS_ADMIN value from the NT Registry.
The Agent also uses the TNS alias information found in the listener.ora file. The Agent does so even within an Oracle Names environment. This behavior is intentional, since an Oracle Names server may be temporarily unavailable and the Agent needs to be able to resolve names at all times. Check the following to make sure the local translation of the TNS alias takes place:
Do not activate the listener on port 1748, since the Agent listens on this port. (This is the reason you can use TNSPING against the Agent; TNSPING cannot differentiate between a listener and an Agent.)
If your Oracle Net configuration is correct and you are still unable to contact the Agent, the next step is to determine whether services in your Oracle Net network can be reached. You can use the TNSPING utility on each database you want to access by entering the following at the command prompt:
tnsping <network service name>
If you can connect successfully from a client to a server (or from a server to a server) using TNSPING, the command will return an estimate of the round trip time (in milliseconds) it takes to reach the Oracle Net service. This indicates Oracle Net is functioning properly.
You can also use the TNSPING utility to check whether an Agent is up. Enclose the address in quotes so that the shell does not interpret the parentheses:
$ tnsping "(address=(protocol=tcp)(host=<hostname>)(port=1748))"
If there is a successful connection, you will see a message similar to the following:
Attempting to contact (address=(protocol=tcp)(host=<hostname>)(port=1748)) OK (750 msec)
If a host which is not in the invited_nodes list is trying to contact the node, the following error will be shown in the output:
Attempting to contact (address=(protocol=tcp)(host=<hostname>)(port=1748)) TNS-12547: TNS: lost contact
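When TNSPING is run from a script, its output can be classified using the two message forms shown above. The following is a rough sketch: the function name tnsping_result is hypothetical, and it relies only on the "OK (" and "TNS-nnnnn" patterns that appear in the sample messages, not on any documented TNSPING exit status.

```shell
#!/bin/sh
# Hypothetical helper: interprets TNSPING output read from stdin.
# Returns 0 if the output reports OK, 1 if it contains a TNS- error code,
# and 2 if neither pattern is found.
tnsping_result() {
    out=$(cat)
    case $out in
        *"TNS-"[0-9]*) return 1 ;;
        *"OK ("*)      return 0 ;;
        *)             return 2 ;;
    esac
}

# Example usage (myHost is a placeholder host name):
#   tnsping "(address=(protocol=tcp)(host=myHost)(port=1748))" | tnsping_result
```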
To check whether the Agent process is running, issue the following command:
agentctl status
If the Agent did not start up, use any of the hints listed in the following table:
UNIX | Windows NT |
---|---|
Check the $ORACLE_HOME/network/log/dbsnmp*.log file for errors. | Check for messages written to the NT Event Viewer (under Administrative Tools), since this is where the NT Agent writes any problems associated with startup. |
Check the $ORACLE_HOME/network/log/nmiconf.log file for errors. | Check the $ORACLE_HOME/network/log/nmiconf.log file for errors. |
Check that the Oracle user has write permissions to the following directory: $ORACLE_HOME/network/log | Check the properties of the Agent Service to verify the OS account used by the Agent (default is 'System'). Check that the Agent user has write permissions to the following directory: $ORACLE_HOME/network/log |
Check snmp_ro.ora, snmp_rw.ora, and services.ora for the entries created by the Agent. The snmp_ro.ora and snmp_rw.ora files are located in the $ORACLE_HOME/network/admin directory, and services.ora is in the $ORACLE_HOME/network/agent directory. | Check if snmp_ro.ora, snmp_rw.ora, and services.ora are created by the Agent on startup. The snmp_ro.ora and snmp_rw.ora files are located in the $ORACLE_HOME\network\admin directory, and services.ora is located in the $ORACLE_HOME\network\agent directory. |
Compare the services listed with the services which are available on the machine. See Appendix A for valid sample files. If services are missing, check the following files for inconsistency or corruption: | Compare the services listed with the services which are available on the machine. See Appendix A for valid sample files. If services are missing, check the following files for inconsistency or corruption: |
Check if you have TCP/IP installed. TCP/IP is a requirement. See Is TCP/IP configured and running correctly? | Check if you have TCP/IP installed. TCP/IP is a requirement. See Is TCP/IP configured and running correctly? |
If you still do not know why the Agent did not start, turn on tracing. (see Tracing the Intelligent Agent) | Check that you DO NOT have a system path variable containing external drives. The Agent is a service and runs by default as SYSTEM. It also needs DLLs from the $ORACLE_HOME/bin directory. If you need external mapped drives in your path, you MUST NOT set them in the SYSTEM path. To set your own path: |
 | Check the $ORACLE_HOME/network/log/AGENTSRVC.log file. This file will show startup errors that occurred when the Agent service was started. |
 | If you still do not know why the Agent did not start, turn on tracing. For more information on setting up Agent tracing, see "Tracing the 9i Agent". |
For both UNIX and Windows NT systems check:
$ORACLE_HOME/network/log/dbsnmp.nohup
To test whether an Agent can connect to the database(s) it monitors on a given node, try connecting to each database with the following connect string:
dbsnmp/dbsnmp@address_list
You must perform this test on the node where the Agent resides.
To verify whether the Agent has the correct user permissions, see "Installing the Intelligent Agent".
An OS user needs to be specified for the node and must have the following permissions:
(Windows NT) Check the NT EVENT VIEWER -> APPLICATIONS -> LOG for any errors starting the DBSNMP process.
(Windows NT and UNIX) Check the $ORACLE_HOME/network/log/nmiconf.log file for discovery errors.
For both UNIX and Windows NT systems check the following file for additional errors:
$ORACLE_HOME/network/log/dbsnmp.nohup
The following error messages and resolutions are categorized by operating system. Situations that apply to all systems are listed under "Generic Agent."
In order for the Agent to execute jobs on a managed node, the following conditions must be met:
This usually happens if you have a database prior to 7.3.3 on the machine. From V7.3.3 onwards, a script called CATSNMP.SQL is included in the CATALOG.SQL dictionary script. This script is responsible for creating the DBSNMP user the Agent needs to connect. Older databases do not include this script.
Verify if the user 'DBSNMP' exists. If not, run the catsnmp.sql script.
This message comes from the discovery script, nmiconf.tcl. Make sure the ORACLE_HOME environment variable is set to the ORACLE_HOME of the Agent, and re-start the Agent.
If you have more than one database on a single node, then you need to make sure that each instance has a unique database name by specifying one of the following:
This error can occur if the Agent cannot write to $ORACLE_HOME\network\admin. Refer to the $ORACLE_HOME\network\log\nmiconf.log for errors. For more information on Agent startup problems, see "Did the Agent startup successfully?".
Check the services.ora file to determine which services have been discovered.
All the database services the Agent finds on a machine must be defined in the relevant SQL*Net/Oracle Net configuration files. If the service(s) are not defined, service discovery will fail and, in the worst case, the Agent will hang or return errors.
Windows NT: Beginning with version 8.0.4, the Agent searches for service names that begin with 'OracleService' or 'OracleService<SID>'. Every entry beginning with 'OracleService' is considered to be a database running on this machine. Every SID encountered by the Agent must be defined in the relevant SQL*Net/Oracle Net files.
UNIX: The oratab file is used to determine which SIDs are present. For 7.3.3 Agents and earlier, discovery fails if it encounters a SID that is not accurate (like in a Developer 2000 environment). To work around this problem, the environment variable $ORATAB can be used to access an alternate oratab file which contains only the databases you wish the Agent to see.
For the remaining databases, check the oratab file, and the SQL*Net/Oracle Net files to see if these files exist and that all definitions are present. Make sure that all of the databases are listed in the listener.ora file. For more information, see "Are the Oracle Net Configuration files correct?" and "Is Oracle Net functioning properly?" .
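The alternate oratab workaround described above for UNIX can be sketched as a filter that keeps only the SIDs the Agent should see. The function name filter_oratab, the SIDs PROD and TEST, and the output location are illustrative assumptions; the sketch relies on the standard oratab line format of SID:ORACLE_HOME:Y|N with "#" introducing comments.

```shell
#!/bin/sh
# Hypothetical helper: prints the oratab entries whose SID is in the keep list.
# $1 = source oratab file, remaining arguments = SIDs to keep
filter_oratab() {
    src=$1; shift
    keep=" $* "
    awk -F: -v keep="$keep" '
        /^[ \t]*#/ || NF < 2 { next }        # skip comments and malformed lines
        index(keep, " " $1 " ") { print }    # keep the line if its SID is listed
    ' "$src"
}

# Example usage (paths and SIDs are placeholders):
#   filter_oratab /etc/oratab PROD TEST > $ORACLE_HOME/network/agent/oratab
#   ORATAB=$ORACLE_HOME/network/agent/oratab; export ORATAB
```

Writing the filtered copy to a separate file leaves the system oratab untouched, so other Oracle tools continue to see every entry.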
OR
These errors are usually seen when the services on the Enterprise Manager Console and the services discovered by the Agent are out of sync. For example, if you have an event registered against TESTDB and someone changes the name of the database to PRODDB, the Agent and Console are out of sync.
To fix this, start by removing all job and event registrations from this service and dropping the node where the services exist from the console. Rediscover the node from the console using the auto-discovery wizard.
NOTE: With 7.3.2 the aliases are case sensitive.
If you have an NT Agent, please refer to 'Invalid service name' while registering a job or event.
You may receive this error while executing a TCL script using the oratcl verb oralogon. "Oralogin failed in orlon" means that the connect string is wrong or that the account used cannot log on to the database. To debug this error, turn on Tcl tracing.
Invalid username/password errors may occur when starting up the Agent on UNIX systems from an X-terminal. This problem can occur because the Data Gatherer (pre-9i) cannot connect to the Capacity Planner repository to upload collected data. This message will repeat every couple of minutes.
See Oracle Intelligent Agent - Windows Event Log Messages for information on Windows NT-Agent error message cross-referencing.
There are two hostname definitions on NT. One is the NetBios name, used for NT's internal Named Pipes protocol, which is always installed. The other is the TCP/IP hostname, which is configurable only when you install TCP/IP on NT.
To find the NT NetBios hostname:
To find the TCP/IP hostname:
On an NT server, you can 'ping' the two names, even if they are configured differently. Other clients, however, only 'ping' real TCP/IP hostnames. If the Agent is using local IPC connections, it uses Named Pipes, thus it uses the NetBios name. All external connections will use the TCP/IP name.
A mismatch in these names leads to 'unable to contact Agent', or forever pending jobs in the console. Therefore, make sure that the NetBios and the TCP/IP hostname are identical.
The Windows NT user that you created for the Agent (see Agent Configuration, Configuration Guide) needs read/write permissions to the $ORACLE_HOME\network\agent directory (and the TEMP directory, for some applications) and read permissions to the SYSTEM32 directory.
Verify that the NT user has these permissions.
First check that all of the SQL*Net/Oracle Net files are present and correctly defined. You can then debug discovery by editing your oratab file so that it contains only a valid SID with a listener running. After you get this working, you can add the remaining entries back to the oratab file to see which entry is causing the problem.
Check the $ORACLE_HOME/network/log/nmiconf.log files for errors.
There are two possible causes for this error:
Run only one Agent on a machine.
To confirm whether the port is being used by another process:
netstat -a | grep 1748
If any line of the output ends in "LISTENING", the port is in use.
If the Agent still fails to start, go through the steps again, but before re-starting the Agent, do the following.
This will re-start the Agent and remove all of the job and event queues (.q files) it was using in the past.
If all else fails, re-booting the machine will free up the port.
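A plain grep for 1748 also matches unrelated ports such as 11748 and connections that are not in a listening state. The sketch below narrows the netstat check described above; the function name port_listening is hypothetical, and it assumes netstat prints the local address as host:port or host.port and marks listening sockets with LISTEN or LISTENING.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if netstat output on stdin shows a socket
# listening on port $1.
port_listening() {
    awk -v port="$1" '
        /LISTEN/ {
            # Accept the port only at the end of an address field,
            # preceded by ":" or "." (for example 0.0.0.0:1748 or *.1748).
            for (i = 1; i <= NF; i++)
                if ($i ~ ("[.:]" port "$")) found = 1
        }
        END { exit found ? 0 : 1 }
    '
}

# Example usage:
#   netstat -a | port_listening 1748 && echo "port 1748 is in use"
```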
This message indicates that the SNMP Master Agent (the process on UNIX that controls the SNMP protocol) could not be contacted. By default the Agent listens and works over SQL*Net or Oracle Net, but the Agent can also work over SNMP on UNIX systems.
This message can safely be ignored unless you are trying to communicate with a Master Agent.
Events registered with the Agent for monitoring a 9i version of the database will not work because the database account is locked.
Under these conditions, an Enterprise Manager database up-down event will always indicate that the database is down. The Agent's log file dbsnmp.log will contain a NMS-00207 error message indicating the dbsnmp user account for the database is locked.
To resolve this problem, you must log into the database and perform the following:
ALTER USER dbsnmp ACCOUNT UNLOCK;
ALTER USER dbsnmp IDENTIFIED BY <password>;
SNMP.CONNECT.<service_name>.PASSWORD=<password>
where service_name is the name of the seed database as discovered by the Agent in snmp_ro.ora/snmp_rw.ora.
Run the catsnmp.sql script for that database with either the SYS or INTERNAL accounts.
The 'dbsnmp' user could not be located.
Run the catsnmp.sql script for that database with either the SYS or INTERNAL accounts.
This happens if there are mismatches between the IDs in the '*.q' files in the $ORACLE_HOME/network/agent directory. This condition can be caused by the following:
Delete all the '*.q' files in the $ORACLE_HOME/network/agent directory. Rebuild your repository. Restart the Agent.
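Because deleting the queue files discards all registered jobs and events, it can be worth listing them before removing anything. The following sketch does that; the function name clean_agent_queues and its "delete" argument are hypothetical, and it assumes the Agent has already been stopped.

```shell
#!/bin/sh
# Hypothetical helper: lists, or with "delete" removes, the Agent's .q files.
# $1 = agent directory (for example $ORACLE_HOME/network/agent)
# $2 = "delete" to remove the files; any other value only lists them
clean_agent_queues() {
    dir=$1
    mode=${2:-list}
    for f in "$dir"/*.q; do
        [ -f "$f" ] || continue    # nothing to do when no .q files exist
        if [ "$mode" = delete ]; then
            rm -f "$f" && echo "removed $f"
        else
            echo "would remove $f"
        fi
    done
}

# Example usage:
#   clean_agent_queues "$ORACLE_HOME/network/agent"          # dry run
#   clean_agent_queues "$ORACLE_HOME/network/agent" delete   # remove the files
```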
Tracing and logging of the Intelligent Agent allows tracking of all communication between the Intelligent Agent and Management Server(s) as well as Agent startup and discovery information. To turn on tracing for the 9i Intelligent Agent, you will need to modify the Agent's snmp_rw.ora file. This file is normally in the $ORACLE_HOME\network\admin directory. The snmp_rw.ora is created the first time the Agent process is started. If the file is not created and you need to trace the startup process, manually create a text file and add the necessary tracing parameters to the file.
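Adding the tracing parameters by hand can be sketched as follows. The parameter names are the ones listed in the checklists earlier in this chapter; the function name enable_agent_trace, the choice of trace level 16, and the refusal to touch a file that already sets a trace level are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: appends tracing parameters to snmp_rw.ora.
# $1 = path to snmp_rw.ora (created if it does not exist)
# $2 = trace directory writable by the Oracle user
# $3 = name of the trace output file
enable_agent_trace() {
    file=$1
    # Leave the file alone if a trace level is already present,
    # so that duplicate entries are not added.
    if [ -f "$file" ] && grep -qi '^dbsnmp\.trace_level' "$file"; then
        echo "trace_level already set in $file" >&2
        return 1
    fi
    cat >> "$file" <<EOF
dbsnmp.trace_level=16
dbsnmp.trace_directory=$2
dbsnmp.trace_file=$3
dbsnmp.trace_unique=true
EOF
}

# Example usage (directory and file name are placeholders):
#   enable_agent_trace "$ORACLE_HOME/network/admin/snmp_rw.ora" /tmp/agent_trace agent.trc
```

The Agent reads these parameters at startup, so it has to be shut down and re-started after the file is changed.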
The log file, $ORACLE_HOME/network/log/dbsnmp.log, is written by the Agent on every startup, even if tracing is not turned on. It contains the name and version of the Agent and the name and location of the Agent's configuration files. If tracing is turned on, it also contains problems encountered with the database and listener connections.
The log file, $ORACLE_HOME/network/log/nmiconf.log, is created upon first start up of the Agent and appended upon subsequent Agent startups. The auto discovery is done by the Tcl script, nmiconf.tcl (hence, the log file name). This file is written to only during startup.
$ORACLE_HOME/agentbin/ORATCLSH is a special-purpose TCL shell that supports all standard TCL verbs (supported in TCL82) plus a large subset (not all) of the ORATCL verbs supported by the Intelligent Agent. ORATCLSH is not a general purpose utility and may only be used in combination with the Intelligent Agent as it depends on files and data structures maintained by the Agent.
There is no documentation of ORATCLSH as it has never been part of the supported feature set of the Intelligent Agent. It is provided strictly as a debugging tool to help Oracle customers and developers in developing Enterprise Manager job and event scripts. The executable ORATCLSH is provided for debugging your TCL scripts. Before executing ORATCLSH, set the environment variable TCL_LIBRARY to point to $ORACLE_HOME/network/agent/tcl, the location of the init.tcl file.
By default the following log files are created under the Agent's ORACLE_HOME/network/log directory:
Setting various tracing and logging parameters in the snmp_rw.ora file allows you to monitor the following areas:
The following tables organize the tracing and logging parameters according to their respective functional areas.
Because the data collection service (formerly the Data Gatherer) functionality has been integrated into the 9i Intelligent Agent, data collection-based tracing can be turned on using one of the following procedures (according to platform).
> setenv VP_DEBUG 1
> agentctl start agent
Any collection activity will be logged in $ORACLE_HOME/network/log/dbsnmp.nohup.
>set VP_DEBUG=1
> dbsnmp -agent_name Oracleora920Agent > stdout.log 2> odg.log
>set VP_DEBUG 1
> dbsnmp -agentname Oracleora920Agent > stdout.log 2> odg.log
If there are multiple ORACLE_HOMEs on the same machine, perform the following.
e:/920/bin/ > set VP_DEBUG=1
> agentctl start agent
You can also turn on Event tracing by setting the dbsnmp.trace_level parameter in the snmp_rw.ora file to a level greater than or equal to one (dbsnmp.trace_level >= 1). You must shut down and re-start the Agent for these parameters to take effect. Tcl tracing creates a file, oratcl.trc, in the ORACLE_HOME/network/trace directory. Every time an event is triggered, an entry is added to the oratcl.trc file.
When the Oracle Intelligent Agent service fails, or fails to start, with a pre-determined error code, it calls the Windows ReportEvent function to write an entry to the event log. Windows NT then passes the parameters to the event-logging service, which in turn uses the information to write a log record to the event log. Other errors are reported to the nmi.log and nmiconf.log.
When the Windows NT event viewer application starts it uses the OpenEventLog function to open the event log for an event source. The event viewer can then use the ReadEventLog function to read event records from the log. ReadEventLog returns a buffer containing an EVENTLOGRECORD structure and additional information that describes a logged event.
When the Intelligent Agent service fails to start, the Windows Service Manager will return the underlying error code, but because it is not able to interpret the Oracle Event Message it incorrectly returns the Windows NT Win32 message text. The correct message, however, will appear in the NT event log.
The following table defines the events that the Intelligent Agent displays in the Windows NT Event viewer and the associated Win32 error text:
The Intelligent Agent software is delivered with the RDBMS server software. However, this does not mean that the Intelligent Agent software must be installed together with the database. It is quite possible to install the Agent alone, in a dedicated "$ORACLE_HOME", separate from the rest of the Oracle software.
The only thing the Intelligent Agent needs is SQL*Net or Oracle Net, to make connections to the databases it needs to monitor, and to be able to communicate with the rest of the Enterprise Manager framework. The SQL*Net or Oracle Net product, and the underlying 'Common Libraries' are products that are installed automatically whenever the Agent is installed on UNIX.
It is not necessary to have a SQL*Net listener running in the same "$ORACLE_HOME" as the Agent, and it also does not have to be of the same base version as the Agent.
As soon as the software is installed from the installation media, the Intelligent Agent is relinked using the current system libraries present on the system.
If a change is made to the UNIX kernel, or the system libraries, it is advised to relink the Intelligent Agent, using the following commands:
$ cd $ORACLE_HOME/network/lib
$ make -f ins_oemagent.mk install
This will create two new executables: dbsnmp and oratclsh, both created in the "$ORACLE_HOME/bin" directory. If a version of these executables already exists, the old ones will be renamed to "dbsnmp0" and "oratclsh0". As soon as the new software has been tested, these safety copies can be removed.
Relinking database software, or making changes to the installed SQL*Net protocol drivers, requires a shutdown of all databases of that version!
As soon as the agent is relinked however, either the "root.sh" file needs to be executed again, or a manual intervention is needed to adjust the dbsnmp executable.
If the "root.sh" file is meanwhile adjusted or overwritten by a newer file, the following commands have to be executed manually, as the 'root' user:
$ cd $ORACLE_HOME/bin
$ chmod 6751 dbsnmp
$ chown root dbsnmp
These steps are essential for the proper working of the agent. If the agent does not have the 'setuid' permissions (given by the chmod '6751' command), or is not owned by root ('chown root' command), the discovery can fail, and jobs will not get executed properly.
Also, if the agent has already been started on the machine, some of its files will be owned by 'root'. After a relink that changes ownership back to the Oracle owner, the agent may then fail to start or may update the wrong files.
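After running root.sh or the manual commands, the result can be verified with a check like the one below. It is a sketch only: the function name check_dbsnmp_perms is hypothetical, and it tests just the set-user-ID and set-group-ID bits and the owner, not the full 6751 mode. Verifying after both commands is worthwhile because some systems clear the set-ID bits when the owner of a file is changed.

```shell
#!/bin/sh
# Hypothetical helper: checks the dbsnmp executable's set-ID bits and owner.
# $1 = path to the dbsnmp executable
# Returns 0 if both checks pass, 1 if one fails, 2 if the file is missing.
check_dbsnmp_perms() {
    f=$1
    [ -f "$f" ] || { echo "$f: not found" >&2; return 2; }
    # -u and -g test the set-user-ID and set-group-ID bits,
    # both of which chmod 6751 turns on.
    [ -u "$f" ] && [ -g "$f" ] || { echo "$f: set-ID bits missing" >&2; return 1; }
    # The third field of ls -l output is the owner name.
    owner=$(ls -ld "$f" | awk '{ print $3 }')
    [ "$owner" = root ] || { echo "$f: owned by $owner, not root" >&2; return 1; }
}

# Example usage:
#   check_dbsnmp_perms "$ORACLE_HOME/bin/dbsnmp" && echo "dbsnmp permissions look correct"
```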
Whenever a program runs on UNIX under the 'root' permissions, the environment variable "LD_LIBRARY_PATH" is not read for security reasons. This means that all shared libraries linked dynamically to the executable will have to be referenced using their absolute location on the disk.
You can check the linked shared libraries using the 'ldd' command. For example:
$ ldd dbsnmp
To avoid problems with the shared libraries, there are three options:
If the libraries are statically linked in, the 'ld' will not consider them 'shared', solving the problem. Downside here is that the executable will be a lot larger in size.
An easy and straightforward trick is to create a symbolic link to the needed shared library in "/usr/lib". Since this directory is hardcoded into the library loader, all shared objects placed in this directory will always be found.
Potential problems can arise in this case if several versions of software are installed on the same machine. Since the "/usr/lib" directory is the same for the entire machine, any file copied or linked in this directory will be used throughout the machine, meaning that other versions of software might be picking up the wrong shared object.
This is the best solution, but also the hardest, as it involves modifying the make file to add the full hardcoded path to the libraries used. After the modification, relink the software to make sure the libraries are found correctly.
Specifically for the Agent, the following errors can be encountered if there is a problem with the shared libraries:
ld: Can not map libclntsh.so.1.0
ld: Error opening libclntsh.so.1.0
ld: Can not find libnet.so
If an error is generated when the agent starts, perform the following:
$ make -f ins_oemagent.mk install
$ chmod 6751 dbsnmp
$ chown root dbsnmp
Important: This section only applies to pre-9i Intelligent Agents. The data collection services provided by the Data Gatherer are now integrated with the 9.x Intelligent Agent.
The Oracle Data Gatherer is a daemon process which manages the collection of performance statistics from the Oracle database and from the host operating system for use by Enterprise Manager tools, such as the Oracle Performance Manager and the Oracle Capacity Planner. As mentioned above, this functionality is now an integral part of 9.x Agent. This section only applies to older versions of the Agent/Data Gatherer.
The Data Gatherer collects the following types of data:
You may not be able to collect operating system data if the Data Gatherer is not available for that particular operating system. To collect operating system data the Data Gatherer must be installed on the same host as the OS. It may not be possible to install and configure the Oracle Data Gatherer on a particular host if both of the following requirements are not met:
This is the assumed configuration for using the Oracle Data Gatherer. The Data Gatherer is installed separately as part of the database server installation. In 8.0.5 or higher, the Data Gatherer is installed along with the Intelligent Agent (but not an integral part of the Agent, as is the case for 9.x versions of the Agent). The Data Gatherer can be used to monitor database statistics for any database on that host, and also can be used to monitor OS statistics for the host itself.
It is possible to install the Data Gatherer on the same host where the client will be run from, provided the client is Windows NT 4.0 (this is not supported on Windows 95 or Windows 98). In this configuration, you will be able to monitor the database statistics for remote databases, but will not be able to monitor the operating system statistics on the remote host.
The Data Gatherer and clients are installed and run assuming the use of a well-known (IANA registered) port (1808), which is used for communication between clients and the Data Gatherer server. You may use an alternate port; however, this configuration is not supported.
It is possible to install the Oracle Data Gatherer in an environment with multiple Oracle homes; however, there are two issues to keep in mind if you attempt to do this:
The Data Gatherer is configured to save the state of all current historical collections, such that when it restarts it will create recovery threads to restart these collections from the state files. If the Data Gatherer is configured to start automatically when the system reboots, then collections should be able to continue.
If the database from which data is being collected is cycled (e.g. shutdown each evening for backups) then the Data Gatherer is designed to continue collecting data from the database when it restarts. The Data Gatherer attempts to reconnect to the database at the specified collection interval until it becomes available.
UNIX: vppcntl -start
NT: From the Control Panel Services applet start the OracleDataGatherer service.
UNIX: vppcntl -stop
NT: From the Control Panel Services applet stop the OracleDataGatherer service.
vppcntl -status
This status check will result in one of two messages:
Clean starting the Agent involves clearing all existing job and event definitions. This should only be necessary when the Enterprise Manager environment needs to be reinitialized, or upon specific request from Customer Support. Actions will need to be performed from both the Console and the Agent node.
To clean start the Agent:
On UNIX, issue the following command:
$ agentctl stop agent
After the stop command has been issued, use the 'ps' command to verify that the 'dbsnmp' processes have been stopped. If the Intelligent Agent cannot be stopped, use the 'ps' command to obtain the process ID's of the dbsnmp processes, and use the 'kill -9' command to terminate these processes.
On Windows NT, use the Control Panel / Services applet to turn the Agent service off, or issue the command line option:
C:\> agentctl stop agent
When the agent is stopped, use the Task Manager to verify that the 'dbsnmp' process has been stopped.
Information about the host on which the agent is running is also stored in these files. If the name or the IP address of the agent machine changes, these files need to be recreated.
This file should never be edited by a user. During startup, this file is read if it exists, and then recreated again with the new discovery information.
Some of the information in this file can be edited by an administrator to provide more information about the discovered services.
Upon Agent startup, this file is read if it exists, and is written again after discovery. Information is only added to this file.
The following files, located in the "$ORACLE_HOME/network/log" directory, are used during startup of the agent:
You should also remove all files from the $ORACLE_HOME/network/agent/library directory.
snmp.visibleservices = ()
dbsnmp.trace_level=16
dbsnmp.trace_unique=true
After the agent has started, verify the "SERVICES.ORA" file first. If this file contains all the services on the machine, then check the "SNMP_RO.ORA" and "SNMP_RW.ORA" files. Discovery problems can be found in the file "NMICONF.LOG".
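When NMICONF.LOG is long, it can help to pull out only the warning lines. The following sketch assumes, based on the sample messages in this chapter, that discovery warnings begin with the word "Warning". The function name is hypothetical.

```python
def discovery_warnings(log_text):
    """Return the lines of an NMICONF.LOG-style text that start with 'Warning'."""
    warnings = []
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Warning"):
            warnings.append(stripped)
    return warnings

# Example using a message quoted in this chapter.
sample = "Warning : No Listener found for SID ORCL. ORCL will be skipped\n"
for message in discovery_warnings(sample):
    print(message)
```

To check a real log, read the file contents into a string and pass that string to discovery_warnings().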
The discovery process on UNIX involves the following actions:
1 file per $ORACLE_HOME
Located in one of the following locations (searched in this order):
1 per Intelligent Agent
Located in $ORACLE_HOME/network/log
1 per Intelligent Agent
Located in $ORACLE_HOME/network/agent/config
1 per Intelligent Agent
Located in $ORACLE_HOME/network/agent/config
Only 1 per machine
Located in either /etc or /var/opt/oracle
1 per Intelligent Agent
Located in $ORACLE_HOME/network/agent
1 per Intelligent Agent
Located in $TNS_ADMIN or $ORACLE_HOME/network/admin
1 per Intelligent Agent
Located in $TNS_ADMIN or $ORACLE_HOME/network/admin
1 file per $ORACLE_HOME
Located in either of the following (searched in this order):
1 file per $ORACLE_HOME
Located in either of the following (searched in this order):
The ORATAB file is located in either the /etc, or the /var/opt/oracle directory.
You should consult your OS-specific documentation to see which directory is used to store the configuration files.
ORATAB=$ORACLE_HOME/network/agent/oratab; export ORATAB
The result of the ORATAB parsing is two lists:
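As an illustration only, the sketch below parses ORATAB-style text into two lists. It assumes the common entry format SID:ORACLE_HOME:Y|N with '#' starting a comment, and it assumes the two lists are the distinct Oracle homes and the SIDs paired with their homes. The function name is hypothetical, and the Agent's own parsing may differ.

```python
def parse_oratab(text):
    """Parse ORATAB-style text into (homes, sids).

    homes: distinct ORACLE_HOME paths, in order of first appearance.
    sids:  (SID, ORACLE_HOME) pairs, one per valid entry.
    Assumed entry format: SID:ORACLE_HOME:Y|N, with '#' starting a comment.
    """
    homes = []
    sids = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split(":")
        if len(fields) < 2 or not fields[0] or not fields[1]:
            continue  # not a usable SID:ORACLE_HOME entry
        sid, home = fields[0], fields[1]
        sids.append((sid, home))
        if home not in homes:
            homes.append(home)
    return homes, sids

# Example entry; the home path mirrors the sample path used in this chapter.
homes, sids = parse_oratab("ORCL:/oracle/815:Y\n")
print(homes, sids)
```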
For every ORACLE_HOME on the system, the Agent looks for the SQL*Net or Oracle Net files. It needs the SQLNET.ORA and LISTENER.ORA files first, to obtain the database service definitions. Sometimes, in the case of missing information, the TNSNAMES.ORA file is also required.
The Agent searches for the SQL*Net files in the following order:
Once the SQL*Net configuration directory is established, the actual reading of the information can begin.
Only one parameter is read from the SQLNET.ORA file: the names.default_domain parameter.
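To check what value the Agent is likely to pick up, the parameter can be extracted from the file contents with a few lines of code. This is a sketch under simple assumptions: one parameter per line, parameter names are not case-sensitive, and lines beginning with '#' are comments. The function name is hypothetical.

```python
import re

# Matches a line such as "NAMES.DEFAULT_DOMAIN = world".
_DOMAIN = re.compile(r"^\s*names\.default_domain\s*=\s*(\S+)", re.IGNORECASE)

def default_domain(sqlnet_text):
    """Return the names.default_domain value from SQLNET.ORA text, or None."""
    for line in sqlnet_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        match = _DOMAIN.match(line)
        if match:
            return match.group(1)
    return None
```

A result of None means the parameter is not set in the text that was supplied.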
Using the same SQL*Net configuration directory, the information from the listener.ora file is read.
This contains two parts:
A message is logged in the NMICONF.LOG.
Example:
Warning : Multiple Listeners found for SID ORCL.
If duplicate service names are encountered, the Agent constructs a new unique service name for this database. A message appears in the NMICONF.LOG warning about the newly constructed name.
The end result is a list of listeners and, for each listener, the list of SIDs it serves. Each SID in these lists in turn has a list of the details needed for that database service.
As soon as all the files are parsed and processed, and all services are found, the Agent verifies that all the information is present and valid.
Example:
Warning : Listener LISTENER defined in /oracle/815/network/admin/listener.ora will be ignored.
A message appears in the NMICONF.LOG saying the SID will be skipped.
Example:
Warning : No Listener found for SID ORCL. ORCL will be skipped
All remaining information is considered 'discovered' and is placed in the discovery files SNMP_RO.ORA, SNMP_RW.ORA and SERVICES.ORA.
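The listener checks described above can be sketched as follows. The warning texts are the ones quoted in this chapter; the function name and the input shapes (a list of SIDs and a mapping from listener name to the SIDs it serves) are hypothetical. Keeping every matching listener for a SID with multiple listeners is an assumption, since this chapter does not say how the Agent resolves that case.

```python
def check_listeners(sids, listeners):
    """Match SIDs to listeners and collect discovery-style warnings.

    sids:      list of SID names.
    listeners: dict mapping listener name -> list of SIDs it serves.
    Returns (discovered, warnings), where discovered maps each kept SID
    to the listeners serving it.
    """
    discovered = {}
    warnings = []
    for sid in sids:
        serving = [name for name, served in listeners.items() if sid in served]
        if not serving:
            # A SID without a listener is skipped, as described above.
            warnings.append(
                "Warning : No Listener found for SID %s. %s will be skipped"
                % (sid, sid)
            )
            continue
        if len(serving) > 1:
            warnings.append("Warning : Multiple Listeners found for SID %s." % sid)
        discovered[sid] = serving  # assumption: keep all matching listeners
    return discovered, warnings
```

Only the SIDs left in discovered would be candidates for the discovery files in this model.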
The discovery process on Windows NT involves the following actions:
1 file per $ORACLE_HOME
Located in either of the following (searched in this order):
1 per Intelligent Agent
Located in:
1 per Intelligent Agent
Located in:
1 per Intelligent Agent
Located in:
1 per Intelligent Agent
Located in:
1 per Intelligent Agent
Located in either (in order according to search priority):
1 per Intelligent Agent
Located in either (in order according to search priority):
1 file per $ORACLE_HOME
Located in either:
1 file per $ORACLE_HOME
Located in either (ordered according to search priority):
The registry is scanned for database services. For each 'OracleService' NT service found, a potential database service entry is created, and the corresponding ORACLE_HOME is determined.
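The service-name filter can be illustrated with the sketch below. It works on a plain list of NT service names rather than reading the registry, and it assumes the SID is whatever follows the 'OracleService' prefix, which is the usual naming convention but is not stated in this chapter. The function name is hypothetical.

```python
PREFIX = "OracleService"

def sids_from_services(service_names):
    """Return the SID suffix of every service named OracleService<SID>."""
    sids = []
    for name in service_names:
        # Require at least one character after the prefix.
        if name.startswith(PREFIX) and len(name) > len(PREFIX):
            sids.append(name[len(PREFIX):])
    return sids
```

The comparison here is case-sensitive for simplicity. Determining the corresponding ORACLE_HOME for each entry is a separate step that this sketch does not attempt.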
Points to note:
on UNIX.
Scanning the registry generates two lists:
For every ORACLE_HOME on the system, the Agent looks for the SQL*Net/Oracle Net files. It needs the SQLNET.ORA and LISTENER.ORA files first, to get the database service definitions. Sometimes, in case of missing information, the TNSNAMES.ORA file is also required.
The Agent looks for the SQL*Net files in this order:
Once the SQL*Net/Oracle Net configuration directory is established, the actual reading of the information can begin.
Only one parameter is read from the SQLNET.ORA file: the names.default_domain parameter.
Using the same SQL*Net/Oracle Net configuration directory, the information from the listener.ora file is read.
This contains two parts:
Example:
Warning : Multiple Listeners found for SID ORCL.
If duplicate service names are encountered, the Agent constructs a new unique service name for this database. A message appears in the NMICONF.LOG warning about the newly constructed name.
The end result is a list of listeners and, for each listener, the list of SIDs it serves. Each SID in these lists in turn has a list of the details needed for that database service.
As soon as all the files are parsed and processed, and all services are found, the Agent verifies that all the information is present and valid.
Example:
Warning : Listener LISTENER defined in C:\ORA920\network\admin\listener.ora will be ignored
Example:
Warning : No Listener found for SID ORCL. ORCL will be skipped
Select username from v$session where username = 'DBSNMP'
If no rows are returned while the Agent is running, check that the DBSNMP user exists and can connect successfully.