Fault and Failure Terminology


All Sun servers show two operational states that you can view and monitor using ALOM: ok, and failed (also reported as failure). Some servers have an additional operational state: fault. This section explains the differences between the fault state and the failed state.

Fault State

A fault indicates that a device is operating in a degraded state but is still able to perform its primary function. Because of this degradation, the device might not be as reliable as a device that does not show a fault.

For example, a power supply shows a fault state when an internal fan has failed. However, the power supply can still provide regulated power as long as its temperature does not exceed the critical threshold. In this fault state, the power supply might not be able to function indefinitely, depending on the temperature, load, and efficiency. Therefore, it is not as reliable as a non-faulted power supply.

Failed State

A failure indicates that a device is no longer operational as required by the system. A device fails due to some critical fault condition or combination of fault conditions. When a device enters a failed state, it ceases to function and is no longer available as a system resource.

In the power supply example, the power supply is considered failed when it ceases to provide regulated power.
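
The distinction between the three states can be summarized in a minimal sketch based on the power supply example. This is an illustration only, not ALOM code: the names OperationalState, classify_power_supply, and is_available are hypothetical, and the two boolean inputs stand in for whatever sensor readings a real server reports.

```python
from enum import Enum


class OperationalState(Enum):
    """The three operational states described in this section."""
    OK = "ok"
    FAULT = "fault"
    FAILED = "failed"


def classify_power_supply(fan_working: bool,
                          provides_regulated_power: bool) -> OperationalState:
    """Hypothetical classifier mirroring the power supply example.

    Assumption: the state depends only on these two inputs. A real
    device may consider other conditions as well.
    """
    # Failed: the supply no longer performs its primary function.
    if not provides_regulated_power:
        return OperationalState.FAILED
    # Fault: degraded (internal fan failed) but still supplying power.
    if not fan_working:
        return OperationalState.FAULT
    return OperationalState.OK


def is_available(state: OperationalState) -> bool:
    """A faulted device is still a system resource; a failed one is not."""
    return state is not OperationalState.FAILED
```

For example, classify_power_supply(fan_working=False, provides_regulated_power=True) yields the fault state, and is_available still returns True for it. Once the supply stops providing regulated power, the result is the failed state regardless of the fan.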