EMS Hardware Monitors (logo)

EMS Hardware Monitors: Overview

This overview introduces the EMS Hardware Event Monitors. In addition to this summary page, see the following overview topics: Benefits, Installing, How They Work, and Glossary.

For more information, see the EMS hardware monitor documents at Diagnostics HOME.


NEW! (May 01): A two-page brochure on EMS monitors titled "Protect Against Hardware Failures on HP 9000 Computers."

Protects Against System Hardware Failures

EMS Hardware Monitors provide a high level of protection against hardware failures that could interrupt system operation or cause data loss.

Hardware Monitoring Defined

Hardware monitoring is the process of watching hardware resources, such as disks, for the occurrence of any unusual activity, called an event. When an event occurs, it is reported using a variety of notification methods (such as email). Event detection and notification are handled automatically with minimal involvement on the user's part.

Simple block diagram of EMS hardware monitors (14K)

Integration With Other Applications

Hardware monitoring can be integrated with other applications responsible for maintaining system availability, such as MC/ServiceGuard. It is vital that these applications be alerted to hardware problems immediately so they can take the necessary action to avoid system interruption. Hardware monitoring is easily integrated with MC/ServiceGuard, and the necessary notification methods are provided for communication with other applications such as HP OpenView and CA Unicenter TNG.


Summary
Benefits
Installing
How they work
Glossary
All the files
Diagnostics HOME

URL: http://docs.hp.com/hpux/onlinedocs/diag/ems/emo_summ.htm
Last updated: Tue May 8 11:23:49 PDT 2001

EMS Hardware Monitors: Benefits

Hardware monitoring provides the following benefits:


URL: http://docs.hp.com/hpux/onlinedocs/diag/ems/emo_bene.htm
Last updated: Tue May 8 11:23:49 PDT 2001

EMS Hardware Monitors: Installing

You can get hardware monitoring installed and working in minutes.

System requirements:

Procedure:
  1. Install the Support Tools Manager (STM) from the Support Plus Media (CD-ROM) as described in Chapter 5 of the Support Plus: Diagnostics User's Guide. The EMS hardware monitors are automatically installed when STM is installed.

  2. Examine the list of supported products to see if any of your devices have special requirements in order to be monitored.

    For example, if you are monitoring FC-AL hubs, edit the file /var/stm/config/tools/monitor/dm_fc_hub. For complete instructions, see the data sheet for the FC-AL Hub Monitor in the "EMS Hardware Monitor User's Guide" available from the Diagnostics HOME.

  3. Enable hardware event monitoring (April 1999 and earlier releases only):
    1. Run the monitoring request manager by typing: /etc/opt/resmon/lbin/monconfig
    2. From the main menu selection prompt, enter E(nable Monitoring)
    As of the June 1999 release, this step is not necessary.

  4. Add or modify monitoring requests to customize the monitoring configuration for your system.

  5. (Recommended) Verify that the monitors are operating correctly, for example, by simulating a hardware failure or event.

For complete instructions, see Chapter 2 (Installing and Using EMS Hardware Monitors) in the "EMS Hardware Monitor User's Guide" available from the Diagnostics HOME.

The default hardware monitoring configuration should meet most monitoring requirements. By default, messages regarding major warning, serious and critical events are:

All events are also stored in /var/opt/resmon/log/event.log.

(For versions of the EMS hardware monitors dated April 1999 and earlier, major warning, serious, and critical messages were also sent to the system console by default.)

If you find that the default monitoring should be customized, you can always return later and add or modify monitoring requests as needed.


URL: http://docs.hp.com/hpux/onlinedocs/diag/ems/emo_inst.htm
Last updated: Tue May 8 11:23:49 PDT 2001

EMS Hardware Monitors: How It Works

Hardware monitors are implemented as special processes (daemons) running on the computer system.

Block diagram of EMS HW Monitor Operation (6k)

The typical hardware monitoring process works as follows:
  1. A hardware event monitor detects abnormal behavior in one of the hardware resources (devices) it is monitoring.

  2. The hardware event monitor creates the appropriate event message, which includes suggested corrective action, and passes it to the Event Monitoring Service (EMS).

  3. EMS sends the event message to the system administrator using the notification method specified in the monitoring request (for example: email, message to the console, entry in a system log).

  4. The system administrator (or Hewlett-Packard service provider) receives the messages, corrects the problem, and returns the hardware to its normal operating condition.

  5. If the Peripheral Status Monitor (PSM) has been properly configured, events are also processed by the PSM. The PSM changes the device status to DOWN if the event is serious enough. The change in device status is passed to EMS, which in turn alerts MC/ServiceGuard. The DOWN status causes MC/ServiceGuard to fail over any package associated with the failed hardware resource.
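
Steps 1 through 3 above can be sketched in a few lines of Python. This is only an illustrative model, not the EMS implementation: the class names (Severity, EventMessage, MonitoringRequest, EventService), the numeric severity values, and the two lowest severity levels are assumptions made for the example, and in-memory lists stand in for the email and log notification methods.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, List


class Severity(IntEnum):
    # Assumption: the text names only major warning, serious, and critical;
    # the two lower levels and all numeric values are illustrative.
    INFORMATION = 1
    MINOR_WARNING = 2
    MAJOR_WARNING = 3
    SERIOUS = 4
    CRITICAL = 5


@dataclass
class EventMessage:
    resource: str            # hardware resource that experienced the event
    severity: Severity
    description: str
    corrective_action: str   # step 2: the message includes a suggested action


@dataclass
class MonitoringRequest:
    min_severity: Severity                   # lowest severity of interest
    notify: Callable[[EventMessage], None]   # notification method to use


class EventService:
    """Hypothetical stand-in for EMS (step 3): routes messages to requests."""

    def __init__(self) -> None:
        self.requests: List[MonitoringRequest] = []

    def add_request(self, request: MonitoringRequest) -> None:
        self.requests.append(request)

    def report(self, message: EventMessage) -> int:
        """Deliver the message to every matching request; return the count."""
        delivered = 0
        for request in self.requests:
            if message.severity >= request.min_severity:
                request.notify(message)
                delivered += 1
        return delivered


# Two in-memory notification targets standing in for email and a log file.
email_outbox: List[EventMessage] = []
event_log: List[EventMessage] = []

ems = EventService()
ems.add_request(MonitoringRequest(Severity.MAJOR_WARNING, email_outbox.append))
ems.add_request(MonitoringRequest(Severity.INFORMATION, event_log.append))

# Steps 1 and 2: a monitor detects a problem and builds the event message.
disk_failure = EventMessage(
    resource="10_12_5.0.0",
    severity=Severity.CRITICAL,
    description="Disk is not responding",
    corrective_action="Replace the disk",
)
tape_empty = EventMessage(
    resource="10_12_6.0.0",
    severity=Severity.INFORMATION,
    description="No tape loaded",
    corrective_action="Load a tape",
)

ems.report(disk_failure)   # reaches both the email outbox and the log
ems.report(tape_empty)     # reaches only the log
```

In this sketch the critical disk failure is delivered to both targets, while the informational tape event goes only to the log, mirroring the way a monitoring request ties severity levels to a notification method.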

The Difference Between Hardware Event Monitoring and Hardware Status Monitoring

Hardware event monitoring is the detection of events experienced by a hardware resource. It is the task of the EMS Hardware Monitors to detect hardware events. Events are temporary in the sense that the monitor detects them but does not remember them. Of course, the event itself may not be temporary: a failed disk will likely remain failed until it is replaced.

Hardware status monitoring is an extension of event monitoring that converts an event to a change in device status. This conversion, performed by the Peripheral Status Monitor, provides a mechanism for remembering the occurrence of an event by storing the resultant status. This persistence provides compatibility with applications such as MC/ServiceGuard, which require a change in device status to manage high availability packages.
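
The difference between a transient event and a remembered status can be illustrated with a small sketch. It is a simplified model under stated assumptions, not the PSM itself: the class name PeripheralStatusSketch, the numeric severity values, and the choice of "serious" as the threshold for DOWN are all hypothetical, and the sketch does not model how a device returns to UP.

```python
from typing import Dict

# Assumption: numeric severity values; the source gives only the names.
MAJOR_WARNING, SERIOUS, CRITICAL = 3, 4, 5


class PeripheralStatusSketch:
    """Hypothetical model of the PSM: turns events into persistent status."""

    def __init__(self, down_threshold: int = SERIOUS) -> None:
        self.down_threshold = down_threshold
        self.status: Dict[str, str] = {}   # resource instance -> "UP"/"DOWN"

    def process_event(self, resource: str, severity: int) -> bool:
        """Record the event; return True if the device status changed."""
        previous = self.status.setdefault(resource, "UP")
        if severity >= self.down_threshold and previous != "DOWN":
            self.status[resource] = "DOWN"
            return True
        return False


psm = PeripheralStatusSketch()
changes = [
    psm.process_event("10_12_5.0.0", MAJOR_WARNING),  # below the threshold
    psm.process_event("10_12_5.0.0", CRITICAL),       # status becomes DOWN
    psm.process_event("10_12_5.0.0", MAJOR_WARNING),  # status stays DOWN
]
```

Only the second call reports a status change. The stored DOWN status outlives the event that caused it, which is the persistence that an application such as MC/ServiceGuard relies on.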



URL: http://docs.hp.com/hpux/onlinedocs/diag/ems/emo_work.htm
Last updated: Tue May 8 11:23:49 PDT 2001

EMS Hardware Monitors: Glossary

The following terms are important for understanding hardware event monitors.

asynchronous event detection
The ability to detect an event at the time it occurs. When an event occurs the monitor is immediately aware of it. This method provides quicker notification response than polling.

default monitoring requests
The default monitoring configuration created when the EMS Hardware Monitors are installed. The default requests ensure that a complete level of protection is automatically provided for all supported hardware resources.

Event Monitoring Service (EMS)
The application framework used for monitoring system resources on HP-UX 10.20 and 11.0. Hardware monitoring uses the EMS framework for reporting events and creating PSM monitoring requests. A collection of EMS system monitors is available at additional cost and is not included with the hardware monitoring software. For more information on the EMS monitors, refer to Using EMS HA Monitors, which can be downloaded from http://docs.hp.com/hpux/ha/

event severity level
Each event that occurs within the hardware is assigned a severity level, which reflects the impact the event may have on system operation. The severity levels provide the mechanism for directing event notification. For example, you may choose a notification method for critical events that will alert you immediately to their occurrence, and direct less important events to a log file for examination at your convenience. Also, when severity levels are used with MC/ServiceGuard to determine failover criteria, serious and critical events cause failover.

hardware event
Any unusual or notable activity experienced by a hardware resource. For example, a disk drive that is not responding, or a tape drive that does not have a tape loaded. When any such activity occurs, the occurrence is reported as an event to the event monitor.

hardware event monitor
A monitor daemon that gathers information on the operational status of hardware resources. Each monitor is responsible for watching a specific group or type of hardware resources. For example, the tape monitor handles all tape devices on the system. The monitor may use polling or asynchronous event detection for tracking events. Unlike a status monitor, an event monitor does not "remember" the occurrence of an event. It simply detects and reports the event. An event can be converted into a more permanent status condition using the Peripheral Status Monitor.

hardware resource
A hardware device used in system operation. Resources supported by hardware monitoring include mass storage devices such as disks and tapes, connectivity devices such as hubs and multiplexors, and device adapters.

MC/ServiceGuard
Hewlett-Packard's application for creating and managing high availability clusters of HP 9000 Series 800 computers. A high availability computer system allows application services to continue in spite of a hardware or software failure. Hardware monitoring integrates with MC/ServiceGuard to ensure that hardware problems are detected and reported immediately, allowing MC/ServiceGuard to take the necessary action to maintain system availability. MC/ServiceGuard is available at additional cost.

monitoring request
A group of settings that define how events for a specific monitor are handled by EMS. A monitoring request identifies the severity levels of interest and the type of notification method to use when an event occurs. A monitoring request is applied to each hardware device (or instance) supported by the monitor. Monitoring requests are created for hardware events using the Hardware Monitoring Request Manager. Monitoring requests are created for changes in hardware status using the EMS GUI.

Peripheral Status Monitor (PSM)
Included with the hardware event monitors, the PSM is a monitor daemon that acts as a hardware status monitor by converting events to changes in hardware resource status. This provides compatibility with MC/ServiceGuard, which uses changes in status to manage cluster resources. The PSM is also used to create hardware status monitoring requests through the EMS GUI.

polling
The process of connecting to a hardware resource at regular intervals to determine its status. Any events that occur between polling intervals will not be detected until the next poll, unless the monitor supports asynchronous event detection.

resource instance
A specific hardware device. The resource instance is the last element of the resource path and is typically the hardware path to the resource (e.g., 10_12_5.0.0), but it may also be a product ID as in the case of AutoRAID disk arrays. There may be multiple instances for a monitor, each one representing a unique hardware device for which the monitor is responsible.

resource path
Hardware event monitors are organized into classes (and subclasses) for creating monitoring requests. These classes identify the unique path to each hardware resource supported by the monitor. Two similar resource paths exist for each hardware resource: an event path used for creating event monitoring requests, and a status path used for creating PSM monitoring requests.
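
As a rough illustration of the resource path and resource instance entries, the sketch below splits a path into its class portion and its instance. The two path strings are assumptions chosen for the example; only the instance value 10_12_5.0.0 comes from this glossary, and the helper name split_resource_path is hypothetical.

```python
from typing import Tuple


def split_resource_path(path: str) -> Tuple[str, str]:
    """Split a resource path into (class path, resource instance).

    The resource instance is taken to be the last element of the path.
    """
    class_path, _, instance = path.rstrip("/").rpartition("/")
    return class_path, instance


# Assumed example paths: one for event requests, one for PSM status requests.
EVENT_PATH = "/storage/events/disks/default/10_12_5.0.0"
STATUS_PATH = "/storage/status/disks/default/10_12_5.0.0"

event_class, event_instance = split_resource_path(EVENT_PATH)
status_class, status_instance = split_resource_path(STATUS_PATH)
```

Both paths end in the same instance, since they describe the same device; they differ only in the class portion that distinguishes event monitoring from status monitoring.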


URL: http://docs.hp.com/hpux/onlinedocs/diag/ems/emo_glos.htm
Last updated: Tue May 8 11:23:49 PDT 2001

emo_all.htm: created Tue May 8 11:23:52 PDT 2001