This page explains how to control and learn more about individual hardware monitors by using the diagnostics website (http://docs.hp.com/hpux/diag). The "CPU monitor" (previously named the LPMC monitor) is used as an example.
Getting Started - the Website and the Manual
Getting Details about a Monitor - Data Sheets
Two Names for Each Monitor
Seeing What the Monitor Can Report -- Event Descriptions
History of Changes to the Monitor -- Release Notes
More Info on the Monitor
Verifying the Operation of a Monitor
Controlling Monitors (monconfig and .sapcfg)
Changing the Monitor Configuration
Controlling Individual Events
Getting Started - the Website and the Manual
The first step in learning about the monitors is to look around in the "EMS Hardware Monitors" section of the diagnostics website (http://docs.hp.com/hpux/diag). Most questions about the monitors can be answered by looking at the pages in this section.
For a quick start, see the Overview of EMS Hardware Monitors.
For complete background on the monitors, see the manual "EMS Hardware Monitors User's Guide", available in PDF in the "EMS Hardware Monitors" section. The manual contains chapters on:
- An Introduction to the Monitors
- Installing and Using Monitors
- Detailed Description
- Using the Peripheral Status Monitor
- Hardware Monitor Configuration
For specific information on individual monitors, consult the other web pages described below.
Getting Details about a Monitor - Data Sheets
Key information about each monitor is contained in the monitor data sheets.
As an example, look at the CPU monitor data sheet (previously named "LPMC monitor").
The data sheet for a monitor tells:
- What the monitor does and how it operates
- When the monitor was released or underwent major changes (Release History)
- Firmware files, OS versions, etc. required to operate the monitor (Special Requirements)
- Resource Path for the monitor, for example, /system/events/cpu/lpmc
- Whether the monitor supports automatic PSM state control. (PSM stands for the "Peripheral Status Monitor". This feature is used by MC/ServiceGuard in conjunction with the hardware monitors to control package failover.)
- Monitor name. In the case of the CPU monitor, the full path to the binary (program) file is given as /usr/sbin/stm/uut/bin/tools/monitor/lpmc_em.
- Locations, names, and default values for all configuration files. (Sample: Configuration files for CPU monitor.) Configuring monitors is discussed later in this document.
An alternate way to get basic information about a monitor is from the HP-UX man page. At the HP-UX prompt, enter:
man MONITOR_NAME
where MONITOR_NAME is the binary name of the monitor. For example, for the CPU monitor (lpmc_em), you would enter:
man lpmc_em
Two Names for Each Monitor
From the data sheet, you learn that each monitor has two names:
- A human-readable monitor name, used in the documentation (for example, "Disk monitor" or "CPU monitor").
- A computer-friendly name (for example, disk_em or lpmc_em). This name is used to form filenames, directory names, etc. Often this name will contain the string _em for event monitor or dm_ for device monitor.
For a cross-reference of the two types of monitor names, see the list of monitor data sheets.
The two monitor names may or may not resemble each other. In the case of the Disk monitor (disk_em), the names obviously resemble each other. However, in the case of the CPU monitor (lpmc_em), they do not.
Originally, lpmc_em monitored only for Low Priority Machine Checks (LPMCs) and was called the LPMC monitor. When the monitor was enhanced to verify floating point functionality, the name was changed from "LPMC monitor" to "CPU monitor."
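As a small illustration of the two-name scheme, the sketch below keeps a lookup from the human-readable name to the binary name and guesses the monitor type from the naming convention. The table holds only the two pairs mentioned above, and the helper names are made up for this example; the list of monitor data sheets remains the authoritative cross-reference.

```python
# Name pairs taken from the examples in the text; a real table would be
# filled in from the list of monitor data sheets.
MONITOR_NAMES = {
    "Disk monitor": "disk_em",
    "CPU monitor": "lpmc_em",
}


def binary_name(display_name: str) -> str:
    """Return the computer-friendly name for a human-readable monitor name."""
    return MONITOR_NAMES[display_name]


def monitor_kind(binary: str) -> str:
    """Guess the monitor type from the naming convention.

    The text says names *often* contain "_em" or "dm_", so this is only a
    heuristic and returns "unknown" when neither marker is present.
    """
    if "_em" in binary:
        return "event monitor"
    if "dm_" in binary:
        return "device monitor"
    return "unknown"
```

Note that the lookup cannot be replaced by string manipulation: nothing in "CPU monitor" hints at lpmc_em.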
Seeing What the Monitor Can Report -- Event Descriptions
A good way to learn what a monitor does is to look at the list of events it can report.
If you click on the event listing for the CPU monitor, you will see the different events which can be reported by the monitor. A sample event is shown below:
Event 100701
- Severity: SERIOUS
- Event Summary: Error(s) detected [on Processor nn |.]
- Problem Description: Floating point test failed on this processor. To prevent Data Corruption, the monitor will try to deactivate the processor and/or mark it for deconfiguration. Another event will be generated to inform the result of the operation(s).
- Probable Cause/Recommended Action: Contact your HP support representative to have the processor checked before a catastrophic failure occurs.
- Event Generation Threshold: When a Floating point test fails on a processor
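If you want to work with event descriptions in a script, for example to filter them by severity, the fields can be collected into a dictionary. The sketch below assumes the plain-text layout shown in the sample above (an "Event NNN" line followed by "- Field: value" lines); it is not a parser for any actual EMS log format, and the function name is made up for this example.

```python
SAMPLE = """\
Event 100701
- Severity: SERIOUS
- Event Summary: Error(s) detected [on Processor nn |.]
- Event Generation Threshold: When a Floating point test fails on a processor
"""


def parse_event(text: str) -> dict:
    """Turn an event description in the layout above into a dictionary."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # First line looks like "Event 100701"; keep the number as a string.
    event = {"Event": lines[0].split()[1]}
    for ln in lines[1:]:
        # Drop the leading "- " and split at the first ": " only, so that
        # colons inside the value are preserved.
        key, _, value = ln.lstrip("- ").partition(": ")
        event[key] = value
    return event


event = parse_event(SAMPLE)
print(event["Event"], event["Severity"])
```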
History of Changes to the Monitor -- Release Notes
The Release Notes for EMS hardware monitors show every significant customer-visible change to the EMS hardware monitors, for each release.
For example, the Release Notes for the June 2002 release describe the extensive changes made to the CPU monitor in that release.
On that Web page, go to the section named Changes to Individual Monitors and find the entry for "CPU monitor". You'll find text that begins:
- CPU monitor (lpmc_em).
- The LPMC monitor has been enhanced for this release:
- Triggering of the Dynamic Processor Resilience (DPR) action has been improved. For detailed information, see the white paper Dynamic Processor Deallocation and Dynamic Processor Resilience available on this web site.
There are four types of Cache errors - ICache Data, ICache Tag, DCache Data, and DCache Tag. Currently, the monitor will take the DPR action when the total Threshold number of these errors occur, in any combination. Starting with HWE0206 release, the monitor will bucket these errors, and the DPR will kick in when the Threshold number of the SAME type of error occurs. ....
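The difference between the two triggering rules in the release note excerpt can be illustrated with a short sketch. It models only the counting rule: the threshold value is a parameter chosen by the caller, and anything else the real monitor may take into account (per-processor tracking, time windows, and so on) is not modeled. The function names are made up for this example.

```python
from collections import Counter

# The four cache error types named in the release notes.
CACHE_ERROR_TYPES = ("ICache Data", "ICache Tag", "DCache Data", "DCache Tag")


def dpr_triggered_total(errors, threshold):
    """Older rule: cache errors of any type, in any combination, count
    toward a single threshold."""
    return len(errors) >= threshold


def dpr_triggered_bucketed(errors, threshold):
    """Rule starting with HWE0206: errors are bucketed by type, and the
    threshold must be reached by errors of the SAME type."""
    counts = Counter(errors)
    return any(counts[t] >= threshold for t in CACHE_ERROR_TYPES)


# Three errors of three different types, with an illustrative threshold of 3.
mixed = ["ICache Data", "ICache Tag", "DCache Data"]
print(dpr_triggered_total(mixed, 3), dpr_triggered_bucketed(mixed, 3))
```

With mixed error types the older rule reaches the threshold while the bucketed rule does not; with three errors of one type, both rules trigger.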
More Info on the Monitor
Other information about monitors on the diagnostics website:
- Requirements and Supported Products for EMS monitors
Lists any firmware, OS versions, etc. required to operate the different monitors. It also lists the devices supported by the monitors.
- Multiple-View (Predictive-Enabled) Monitors
Describes an enhancement made to most of the monitors since June 2000, whereby a monitor can be configured to send different sorts of data to different targets. Also explains the role of the .clcfg configuration file, which allows configuration of individual events.
- Frequently Asked Questions (FAQs)
Contains both general and specific questions about the monitors.
Verifying the Operation of a Monitor
The FAQs have important information on verifying the operation of monitors:
How do I know if EMS hardware monitors are functioning?
How can I verify that the EMS hardware monitors are working?
Verifying EMS Hardware Monitors is a separate document that explains the process more fully.
Controlling Monitors (monconfig and .sapcfg)
You can control a monitor by using the monconfig utility. As root, enter:
/etc/opt/resmon/lbin/monconfig
The menu is as follows:
Select:
   (S)how monitoring requests configured via monconfig
   (C)heck detailed monitoring status
   (L)ist descriptions of available monitors
   (A)dd a monitoring request
   (D)elete a monitoring request
   (M)odify an existing monitoring request
   (E)nable Monitoring
   (K)ill (disable) monitoring
   (H)elp
   (Q)uit
For details on how to use monconfig, choose (H)elp.
With the monconfig utility, you create and modify "monitoring requests". In a monitoring request, you specify the notification methods for different severity levels of a monitor. In the example below, events with a severity level of INFORMATION or higher are sent to a text log for all the monitors listed.
1) Send events generated by monitors
      /storage/events/disk_arrays/AutoRAID
      /storage/events/disks/default
      ...
      /system/events/cpu/lpmc
      /adapters/events/scsi123_em
      /system/events/system_status
   with severity >= INFORMATION to TEXTLOG /var/opt/resmon/log/event.log
Other notification methods include emailing the error information and generating an SNMP trap.
The monconfig utility is actually modifying the start-up configuration file (.sapcfg file) for the monitor.
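To make the idea of a monitoring request concrete, the sketch below models one as a set of resource paths, a minimum severity, and a notification target, and checks whether a given event would be routed to that target. This is only a conceptual model, not the .sapcfg file format. The severity ordering is an assumption: the text above names only INFORMATION and SERIOUS, and the full list used here should be checked against the User's Guide.

```python
from dataclasses import dataclass

# ASSUMPTION: severity levels from lowest to highest. Only INFORMATION and
# SERIOUS appear in the text; verify the full list in the User's Guide.
SEVERITIES = ["INFORMATION", "MINOR WARNING", "MAJOR WARNING", "SERIOUS", "CRITICAL"]


@dataclass
class MonitoringRequest:
    resources: list      # resource paths the request applies to
    min_severity: str    # events at this level or higher are sent
    method: str          # notification method, e.g. "TEXTLOG"
    target: str          # e.g. the log file path

    def matches(self, resource: str, severity: str) -> bool:
        """Return True if an event from `resource` with `severity` would be
        sent to this request's target."""
        return (
            resource in self.resources
            and SEVERITIES.index(severity) >= SEVERITIES.index(self.min_severity)
        )


request = MonitoringRequest(
    resources=["/system/events/cpu/lpmc", "/storage/events/disks/default"],
    min_severity="INFORMATION",
    method="TEXTLOG",
    target="/var/opt/resmon/log/event.log",
)
print(request.matches("/system/events/cpu/lpmc", "SERIOUS"))
```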
Changing the Monitor Configuration
Each monitor has several configuration files you can modify to control the operation of the monitor. In general, it is recommended that you NOT change the default configuration unless you fully understand the implications of doing so. The default configuration has been designed to meet the needs of most users.
Config file name, type, and function:
- MON_NAME.sapcfg (e.g., lpmc_em.sapcfg): startup configuration file. Defines monitoring requests. Don't modify this file by hand. Instead, use monconfig (described above) to make or change monitoring requests.
- MON_NAME.cfg (e.g., lpmc_em.cfg): monitor configuration file. Defines general behavior of monitors, for example, the polling interval. On Predictive-Enabled monitors, some parameters are now controlled by the .clcfg file.
- MON_NAME.psmcfg (e.g., lpmc_em.psmcfg): PSM configuration file. Required for a monitor to have its state monitored by the Peripheral Status Monitor (PSM), which is used by MC/ServiceGuard to control package failover.
- default_MON_NAME.clcfg (e.g., default_lpmc_em.clcfg): default client configuration file. Used to control the text sent in messages to specific targets (for example, to Predictive Support) on monitors that are Predictive-Enabled (Multiple-View). An important use for .clcfg files is to control the reporting of INDIVIDUAL EVENTS. For each event, you can control the severity, enable flag, suppression time, threshold, etc.
default_MON_NAME.clcfg applies to messages sent to all targets. Other .clcfg files control text sent only to specific targets.
To change the configuration:
- Look on the appropriate data sheet for the name and location of the desired configuration file.
(In practice, the configuration files all live in the same directory: /var/stm/config/tools/monitor.)
- For the .cfg, .psmcfg, or .clcfg files, edit the file by hand with an ASCII editor like vi. The file itself contains detailed information on syntax, etc.
- For the .sapcfg file, make changes by using the monconfig utility described above.
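The naming rules and editing steps above can be summarized in a short sketch that builds the expected file paths for a monitor and reports how each file should be changed. It is a convenience illustration only: the function names are made up, it does not check that the files exist, and a given monitor may not have every file (the data sheet is the authority).

```python
import posixpath

# Directory given in the text for all monitor configuration files.
CONFIG_DIR = "/var/stm/config/tools/monitor"


def config_files(mon_name: str) -> dict:
    """Build the expected configuration file paths for a monitor binary
    name such as "lpmc_em", following the naming patterns above."""
    return {
        "startup": posixpath.join(CONFIG_DIR, f"{mon_name}.sapcfg"),
        "monitor": posixpath.join(CONFIG_DIR, f"{mon_name}.cfg"),
        "psm": posixpath.join(CONFIG_DIR, f"{mon_name}.psmcfg"),
        "default client": posixpath.join(CONFIG_DIR, f"default_{mon_name}.clcfg"),
    }


def edit_method(path: str) -> str:
    """Return how the file should be changed: .sapcfg files go through
    monconfig, the others are edited by hand."""
    return "monconfig" if path.endswith(".sapcfg") else "text editor"


for kind, path in config_files("lpmc_em").items():
    print(f"{kind}: {path} (change with: {edit_method(path)})")
```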
For more information on monitor configuration, see:
- Chapter 5 ("Hardware Monitor Configuration") in the "EMS Hardware Monitors User's Guide" available in PDF on http://docs.hp.com/hpux/diag.
- The configuration files themselves.
- Multiple-View (Predictive-Enabled) Monitors
Controlling Individual Events
For Predictive-Enabled monitors, you can control the way a monitor reports individual events by modifying the .clcfg file(s) for the monitor. For each event, you can control the severity, enable flag, suppression time, threshold, etc.
See "Changing the Monitor Configuration" above.
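To give a feel for how the per-event settings might interact, the sketch below models them as a small record and decides whether an occurrence would be reported. It does not reproduce .clcfg syntax, and the semantics are assumptions made for illustration: "threshold" is taken to mean a minimum number of occurrences, and "suppression time" a quiet period after the last report. The comments inside the .clcfg file itself describe the actual meaning of each setting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EventSetting:
    """Per-event controls named in the text. Field names are hypothetical
    and do not mirror the keywords used in a .clcfg file."""
    severity: str
    enabled: bool = True
    suppression_seconds: int = 0   # ASSUMED: quiet period after a report
    threshold: int = 1             # ASSUMED: occurrences needed to report


def should_report(setting: EventSetting, occurrences: int,
                  seconds_since_last_report: Optional[int] = None) -> bool:
    """Decide whether to report, under the assumed semantics above."""
    if not setting.enabled:
        return False
    if occurrences < setting.threshold:
        return False
    if (seconds_since_last_report is not None
            and seconds_since_last_report < setting.suppression_seconds):
        return False
    return True


setting = EventSetting("SERIOUS", suppression_seconds=3600, threshold=2)
print(should_report(setting, occurrences=2))
```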