These release notes cover the March 2004 release of the Support Tools (diagnostics) for HP-UX 11i V2.0.
- Overview
- Configuring Hardware Monitoring
- Documentation
- Changes
- Known Problems
- Monitors Provided
- Monitor Dependencies
- Defect Reporting
- SD Product Structure
Overview
The OnlineDiag bundle of support tools includes the EMS Hardware Monitors, an important tool for maintaining system availability. The EMS hardware monitors allow you to monitor the operation of a wide variety of hardware products and be alerted immediately if any failure or other unusual event occurs.
Hardware event monitoring provides a high level of protection against system hardware failure. By using hardware event monitoring, you can eliminate most undetected hardware failures that could interrupt system operation or cause data loss.
Configuring Hardware Monitoring
The EMS Hardware Monitors are installed at the same time as the Support Tools Manager. Once the monitoring software is installed, monitoring is automatically enabled.
By default, messages regarding major warning, serious, and critical events that occur on monitored hardware are:
- Written to /var/adm/syslog/syslog.log
- Sent by e-mail to the address root
All events are stored in /var/opt/resmon/log/event.log.
To configure, enable, or disable hardware event monitoring, run the monitoring request manager: /etc/opt/resmon/lbin/monconfig .
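The default routing described above can be summarized in a short Python sketch. The two file paths and the root e-mail recipient come from these notes. The names of the two lowest severity levels, the numeric values, and the default_targets helper are assumptions made for illustration and are not part of monconfig.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Numeric values and the two lowest level names are assumed for illustration.
    INFORMATION = 1
    MINOR_WARNING = 2
    MAJOR_WARNING = 3
    SERIOUS = 4
    CRITICAL = 5


EVENT_LOG = "/var/opt/resmon/log/event.log"
SYSLOG = "/var/adm/syslog/syslog.log"
EMAIL_ROOT = "email:root"  # hypothetical label for "e-mail to root"


def default_targets(severity):
    """Return where an event of this severity goes under the default settings."""
    targets = [EVENT_LOG]  # every event is stored in the event log
    if severity >= Severity.MAJOR_WARNING:
        # major warning, serious, and critical events are also reported
        targets += [SYSLOG, EMAIL_ROOT]
    return targets
```

Under these assumptions, a minor warning reaches only the event log, while a serious event reaches the event log, syslog, and the root mailbox.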
The Peripheral Status Monitor (PSM) and the Kernel Resource Monitor (krmond) are configured differently: they use the EMS GUI. See: http://docs.hp.com/hpux/onlinedocs/diag/ems/ems_gui.htm
Documentation
For the latest and most complete information on EMS Hardware Monitors and the Support Tools Manager (STM), see the Diagnostics section of Hewlett-Packard's online documentation Web site at:
http://docs.hp.com/hpux/diag/
At this site, you will find Overviews, Tutorials, Quick Reference Cards, Frequently Asked Questions (FAQs), and much other material.
For complete information on installing and using EMS hardware monitors, as well as a list of supported hardware, refer to the "EMS Hardware Monitors User's Guide" available at the above site.
For the most current information on HP-UX 11i V2.0 diagnostics, see the following Web pages at the Diagnostics site:
- "DIAGNOSTICS.readme for HP-UX 11i V2.0 (March 2004)" at:
http://docs.hp.com/hpux/onlinedocs/diag/st/str_1123.htm
- "Release Notes for STM on HP-UX 11i V2.0 (March 2004)" at:
http://docs.hp.com/hpux/onlinedocs/diag/stm/str_1123.htm
- "Release Notes for EMS Hardware Monitors (HP-UX 11i V2.0, March 2004)" at:
http://docs.hp.com/hpux/onlinedocs/diag/ems/emr_1123.htm
For 11i V2.0, the EMS hardware monitors use version A.03.30 of the EMS platform. HP-UX 11i V2.0 does not support the full functionality of the EMS platform. However, all EMS functionality required by the hardware monitors is provided.
The "SNMP" notification method, which could be configured for EMS hardware monitors in previous releases, will probably NOT be available to monitors running on HP-UX 11i V2.0. Check the latest Web page version of the EMS Release Notes for the most current information.
Memory Page Deallocation (MPD) and the memlogd daemon are not implemented on the RX 4610 computer.
Changes in the EMS Hardware Monitors for the March 2004 release include:
- Changes to Multiple Monitors
- Changes to Individual Monitors
- Changes to Platform and Interface
- Customer-Visible Interface Changes
Changes to Multiple Monitors
- N/A
Changes to Individual Monitors
Changes to each monitor are described below. (Monitors are listed in alphabetical order.)
- Chassis Code Monitor (dm_chassis).
- N/A
- CMC Monitor (cmc_em).
- JAGaf01829
The threshold for deactivating a monitor has been changed to 10.
- JAGae91385
Currently, the Event Threshold window in the default_*.clcfg files for CMC monitors is set to ANY for certain events. The intention is to change it to 24 hours, so that these events are generated for 'threshold' occurrences in a 24-hour period, rather than at ANY time.
- Core Hardware for Itanium (ia64_corehw).
- JAGae99136; JAGae99146; JAGae99148; JAGae99150; JAGae99222
These defects affect PA8800 processor-based (PA) non-cellular systems only. Previously, the ia64_corehw monitor could not distinguish PA non-cellular systems from PA cellular systems. It therefore treated a non-cellular system as a cellular system and, as a result, could not:
- generate events for type 0x02 SEL entries on PA non-cellular systems
- clear the SEL when it became full
With this release, the monitor identifies these systems correctly. Code was changed to improve detection of the following system types, so that the monitor can take appropriate action on each: PA non-cellular, PA cellular, IPF non-cellular, IPF cellular.
- The ia64_corehw monitor was writing the following debug message in syslog:
PID=<nnnn> : Calling set wd timer
With this submittal, the monitor no longer logs this message in syslog.
- JAGae86338
The WatchDogTimer settings of the HP version of the monitor were being overridden by the non-HP version on HP systems, because the non-HP version set its actions before deciding whether or not to run on the current (HP) system. The opposite case could also occur on non-HP systems. This has been fixed.
- JAGae90398
The e-mail message from EMS said that the Baseboard Management Controller (BMC) clock was not initialized. This has been fixed.
- JAGae95681
When the OnlineDiag bundle was removed and re-installed, the monitor started to write to the log file with the '00' extension. This has been fixed.
- Core Hardware Monitor -- Asama (ipfcorehw_asama).
- JAGae89421
The WatchDogTimer settings of the HP version of the monitor were being overridden by the non-HP version on HP systems, because the non-HP version set its actions before deciding whether or not to run on the current (HP) system. The opposite case could also occur on non-HP systems. This has been fixed.
- Core Hardware Monitor -- Hitachi (ipfcorehw_hitachi).
- JAGae89420
The WatchDogTimer settings of the HP version of the monitor were being overridden by the non-HP version on HP systems, because the non-HP version set its actions before deciding whether or not to run on the current (HP) system. The opposite case could also occur on non-HP systems. This has been fixed.
- CPE Monitor (cpe_em).
- This release of the monitor has the following new features:
- Support to decode Corrected Platform Errors on crossbar controller for cellular system.
- JAGae99310 : explain error_status only if validation_bit is set for it
- JAGaf00845 : monitor generates 6 events for one set of CPE data.
- JAGae91385
Currently, the Event Threshold window in the default_*.clcfg files for CPE monitors is set to ANY for certain events. The intention is to change it to 24 hours, so that these events are generated for 'threshold' occurrences in a 24-hour period, rather than at ANY time.
- CPU Monitor -- Hitachi (cmc_em_hitachi).
- N/A
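The 24-hour Event Threshold window described for the CMC and CPE monitors (JAGae91385) can be pictured as a sliding-window counter. The following Python sketch is only an illustration of that idea: the ThresholdWindow class, its parameters, and the way old occurrences expire are assumptions, not the monitors' actual implementation or the .clcfg syntax.

```python
from collections import deque

DAY_SECONDS = 24 * 60 * 60


class ThresholdWindow:
    """Count occurrences and report when 'threshold' fall inside the window."""

    def __init__(self, threshold, window_seconds=DAY_SECONDS):
        self.threshold = threshold
        self.window_seconds = window_seconds  # None models the old ANY setting
        self._times = deque()

    def record(self, timestamp):
        """Record one occurrence; return True if an event should be generated."""
        self._times.append(timestamp)
        if self.window_seconds is not None:
            # Drop occurrences that fall outside the window ending at timestamp.
            while self._times and timestamp - self._times[0] >= self.window_seconds:
                self._times.popleft()
        return len(self._times) >= self.threshold
```

With a threshold of 3, three occurrences within two hours would generate an event, whereas three occurrences spread over more than 24 hours would not. Passing window_seconds=None reproduces the ANY behavior, in which occurrences never expire.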
- Disk Array FC60 Monitor (fc60mon).
- JAGae97568; JAGae90990; JAGae82991
The modified catalog file lists the steps to kill and restart AM60Srvr. Previously, only one event was generated regardless of the number of GHS disks enabled or disabled. The configuration file change enables reporting of all GHS events, whether they enable or disable GHS disks in FC60 storage devices. Whenever a new FC60 device is connected to the server, first check that AM60Srvr is running, then issue the amdsp -R command so that the server identifies the device. Do this before the monitor starts monitoring the device.
- JAGae90990
Previously, only one event was being generated for any number of GHS enabled/disabled disks. The change made in the configuration file enables reporting of all the GHS events, either enabling or disabling GHS disks in the FC60 storage device.
- JAGaf01421
The /etc/opt/resmon/log directory is deleted 24 hours after the fc60mon monitor starts. This problem is fixed.
- Disk Monitor (disk_em).
- JAGae98960
disk_em reported a large number of errors corrected with delay. This has been fixed in this release.
- Fibre Channel Adapter (ql_adapter)
- N/A
- Fibre Channel Adapter Model A5158 Monitor (dm_TL_adapter).
- N/A
- Fibre Channel SCSI Multiplexer (dm_fc_scsi_mux).
- N/A
- Fibre Channel Switch (dm_fc_sw).
- N/A
- High Availability Disk Array Monitor (ha_disk_array).
- N/A
- High Availability Storage System (dm_ses_enclosure)
- N/A
- iSCSI Driver Subsystem Monitor (dm_iscsi_adapter)
- N/A
- Kernel Resource Monitor (krmond)
- N/A
- Memory IA64 (memory_ia64)
- JAGae72495
The IPF Memory Monitor could run into a corner-case timeout problem (given the timing limitations of EMS) if it took a long time to retrieve the memory configuration information on IPF systems, which happens only on systems with very large configurations.
The fix was to have the IPF Memory Monitor fork a child process that retrieves the memory configuration information and stores it in a temporary file (/var/tmp/memconfig). The IPF Memory Monitor later reads the memconfig file, and then removes it, when restoring the memory configuration information.
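The general pattern behind the JAGae72495 fix (hand the slow retrieval to a child process, pass the result through a temporary file, then read and remove the file) can be sketched in Python as follows. This is not the monitor's code: the function names, the JSON file format, and the retrieve callback are hypothetical, and the sketch assumes a POSIX system where os.fork is available.

```python
import json
import os


def start_retrieval(path, retrieve):
    """Fork a child that runs retrieve() and stores the result in path."""
    pid = os.fork()
    if pid == 0:  # child process
        status = 1
        try:
            with open(path, "w") as f:
                json.dump(retrieve(), f)
            status = 0
        finally:
            os._exit(status)  # never return into the parent's code path
    return pid  # the parent continues immediately


def restore_config(path, pid):
    """Wait for the child, then read and remove the temporary file."""
    _, wait_status = os.waitpid(pid, 0)
    if not (os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0):
        raise RuntimeError("child failed to retrieve the configuration")
    with open(path) as f:
        config = json.load(f)
    os.remove(path)
    return config
```

Because start_retrieval returns as soon as the child is forked, the parent can continue with time-critical work and call restore_config only when it actually needs the data.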
- JAGae89571
Problem Description:
Incomplete DIMM Location can be displayed by the STM Logtool's vd/vda commands, and by the Memory Information Tool for multi-bit errors (MBE) in the PDT, on hp server rx2600, hp server rx1600, and hp server rx4640.
If the DIMMs are loaded in quads (in sets of fours) on hp server rx2600, hp server rx1600, and hp server rx4640, the DIMM Location displayed by the STM Logtool's vd/vda commands, and by the Memory Information Tool for MBE in the PDT, will only indicate the DIMM pair (first two DIMMs in that RANK), instead of the DIMM quad (all four DIMMs in that RANK). This is because when DIMMs are loaded in quads, a RANK is a quad. When DIMMs are loaded in pairs, a RANK is a pair.
Fix to JAGae89571:
The fix was to display the DIMM Location of the MBE in the PDT (as displayed in the STM Logtool vd/vda commands and the STM Memory Information Tool) on ALL HP Integrity Servers with the RANK number, which should be used to determine the DIMMs contributing to the MBE, depending on whether the DIMMs are loaded in pairs or in quads. When DIMMs are loaded in quads, there are four DIMMs in that RANK. When DIMMs are loaded in pairs, there are two DIMMs in that RANK.
Most systems will fall under the Typical/Normal Cases, and very few systems will actually fall under the Special Cases. If unsure, please refer to the System's User Guide for information on how DIMMs should be loaded.
*** Typical/Normal Cases:
- On hp workstation zx6000, hp workstation zx2000, DIMMs are always loaded in pairs:
RANK 0 => DIMM 0A/0B
RANK 1 => DIMM 1A/1B
.....
- On hp server rx5670, hp server rx5630, hp server rx4640, DIMMs are always loaded in quads:
RANK 0 => DIMM 0A/0B/0C/0D
RANK 1 => DIMM 1A/1B/1C/1D
.....
- On hp superdome server SD16A, hp superdome server SD32A, hp superdome server SD64A, hp server rx8620, hp server rx7620, DIMMs are always loaded in pairs:
RANK 0 => DIMM 0A/0B
RANK 1 => DIMM 1A/1B
.....
*** Special Cases:
- On hp server rx2600, DIMMs must be loaded in quads:
RANK 0 => DIMM 0A/0B/1A/1B
RANK 2 => DIMM 2A/2B/3A/3B
RANK 4 => DIMM 4A/4B/5A/5B
- On hp server rx1600, DIMMs can be loaded in pairs or in quads.
If DIMMs are loaded in pairs:
RANK 0 => DIMM 0A/0B
RANK 1 => DIMM 1A/1B
.....
If DIMMs are loaded in quads:
RANK 0 => DIMM 0A/0B/1A/1B
RANK 2 => DIMM 2A/2B/3A/3B
To determine whether DIMMs are actually loaded in pairs or in quads, check the number of events (with alert level 5) sent by the firmware for each MBE occurrence. If the DIMMs are loaded in pairs, firmware will send out 2 events (one for each DIMM in the pair). If the DIMMs are loaded in quads, firmware will send out 4 events (one for each DIMM in the quad).
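The RANK-to-DIMM tables above can be expressed as a small lookup. The Python sketch below is only an aid for reading those tables: the function names and the layout labels ("pairs", "quads", "paired_quads") are hypothetical and do not correspond to any STM or EMS interface.

```python
from typing import List


def dimms_for_rank(rank: int, layout: str) -> List[str]:
    """Return the DIMM slots that make up a RANK.

    layout (hypothetical labels):
      "pairs"        RANK n => nA/nB
      "quads"        RANK n => nA/nB/nC/nD (as listed for rx5670)
      "paired_quads" RANK n => nA/nB/(n+1)A/(n+1)B, n even (as listed for rx2600)
    """
    if layout == "pairs":
        return [f"{rank}A", f"{rank}B"]
    if layout == "quads":
        return [f"{rank}{slot}" for slot in "ABCD"]
    if layout == "paired_quads":
        if rank % 2:
            raise ValueError("only even RANK numbers are listed for this layout")
        return [f"{rank}A", f"{rank}B", f"{rank + 1}A", f"{rank + 1}B"]
    raise ValueError(f"unknown layout: {layout}")


def loading_from_event_count(events_per_mbe: int) -> str:
    """Infer pair or quad loading from the alert-level-5 events sent per MBE."""
    if events_per_mbe == 2:
        return "pairs"
    if events_per_mbe == 4:
        return "quads"
    raise ValueError("expected 2 or 4 events per MBE")
```

For example, dimms_for_rank(2, "paired_quads") yields 2A, 2B, 3A, and 3B, which matches the RANK 2 row listed for the rx2600.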
- Memory Monitor -- Hitachi (ipfmemory_hitachi).
- N/A
- MSA1000 Storage Disk Array Monitor (msamon)
- JAGae98401
A new monitor has been added to monitor HP MSA 1000 Storage Disk Arrays on the HP-UX 11i V2 release.
- Starting with the HP-UX 11i V2 March 2004 release of OnlineDiag, msamon monitors the HP StorageWorks Modular SAN Array 1000, supported on the following cards:
- A6795A (2 Gbps, 64-bit 66 MHz PCI)
- A6826A (2 Gbps, 64-bit PCI Dual Channel)
- Peripheral Status Monitor (psmmon).
- N/A
- RAID Adapter (dm_raid_adapter)
- N/A
- Remote Monitor (RemoteMonitor).
- N/A
- SCSI Disk Monitor (scsi_disk).
- N/A
- SCSI Tape Monitor (dm_stape).
- JAGae96244
Code has been modified to fix the SIGSEGV error for dm_stape.
- JAGae94035
Duplicate entries for event 100772 have been removed from the default_dm_stape.clcfg file.
- System Status Monitor (sysstat_em)
- N/A
- UPS Monitor (dm_ups).
- JAGaf02440
The UPS monitor has been modified to generate the test event which verifies monitor-EMS communication.
Changes to Platform and Interface
- JAGae46500
When the monconfig utility was used to add a monitoring request, it printed an irrelevant error after the help output. The code has been modified to correct this problem.
- A new monitor, msamon, for the HP MSA 1000, has been added.
- The following new monitors have been added:
- ql_adapter
- dm_raid_adapter
Customer-Visible Interface Changes
- N/A
Known Problems
PROBLEM: Memory Page Deallocation (MPD) is not available on the RX 4610 computer.
Memory Page Deallocation (MPD), which runs on most current HP-UX computer systems, does not work on the RX 4610 computer. If you look in the activity log for memlogd, you will see a message saying "unsupported device."
MPD cannot be implemented on RX 4610 because of the design of that system. The memlogd daemon cannot run on it.
PROBLEM: dm_fc_hub and dm_fc_sw monitors not functional.
In the June 2003 release of 11i V2.0, the Fibre Channel Arbitrated Loop Hub monitor (dm_fc_hub) and the Fibre Channel Switch monitor (dm_fc_sw) are probably not functional, because these monitors depend on SNMP functionality which may not be included in this release. Check the latest version of the Web page for the EMS Release Notes for the most current information.
Monitors Provided
For the March 2004 release of HP-UX 11i V2.0, the following monitors are scheduled to be available:
- CMC Monitor (cmc_em)
- Core Hardware for Itanium (ia64_corehw)
- Core Hardware Monitor -- Asama (ipfcorehw_asama)
- Core Hardware Monitor -- Hitachi (ipfcorehw_hitachi)
- CPE Monitor (cpe_em)
- CPU Monitor -- Hitachi (cmc_em_hitachi)
- Disk (disk_em)
- Disk Array FC60 (fc60mon)
- Fibre Channel Adapter (ql_adapter)
- Fibre Channel Adapter Model A5158 (dm_TL_adapter)
- Fibre Channel Arbitrated Loop Hub (dm_fc_hub)
- Fibre Channel SCSI Multiplexer (dm_fc_scsi_mux)
- Fibre Channel Switch (dm_fc_sw)
- Forward Progress Log (FPL) Monitor (fpl_em)
- High Availability Disk Array (ha_disk_array)
- High Availability Storage System (dm_ses_enclosure)
- iSCSI Driver Subsystem Monitor (dm_iscsi_adapter)
- Kernel Resource Monitor (krmond)
- Memory IA64 (memory_ia64)
- Memory Monitor -- Hitachi (ipfmemory_hitachi)
- MSA1000 Storage Disk Array Monitor (msamon)
- Peripheral Status Monitor (psmmon)
- RAID Adapter (dm_raid_adapter)
- Remote Monitor (RemoteMonitor)
- SCSI Disk Monitor (scsi_disk)
- SCSI Tape Devices (dm_stape)
- System Status (sysstat_em)
- UPS (dm_ups)
The following monitors are NOT provided:
- dm_core_hw: replaced by ia64_corehw
- dm_FCMS_adapter
- fw_disk_array: hardware not supported on system
- lpmc_em: replaced by cmc_em
- scsi123_em: hardware not supported on system
For detailed information concerning which products are supported by which monitors and additional dependencies, check the "Diagnostics" section of Hewlett-Packard's online documentation web site: http://docs.hp.com/hpux/diag/ .
Several of the monitors have special requirements, such as patches or certain versions of firmware. In particular:
- The Fibre Channel Arbitrated Loop Hub Monitor and the Fibre Channel Switch Monitor require special configuration, which is described in their data sheets in the "EMS Hardware Monitors User's Guide" (chapter 6). A patch is also required.
- A patch is required if your system includes an HP SureStore E Disk Array FC60. This patch is required to run the EMS hardware monitor (fc60mon) or STM tools for this device.
For a list of the current required patches, see the DIAGNOSTIC.readme file for this release.
Current monitor requirements are described in the "Supported Products" page under "EMS Hardware Monitors" at http://docs.hp.com/hpux/diag . Requirements are also listed in chapter 2 of the manual "EMS Hardware Monitors User's Guide".
Defect Reporting
Use CHART to report defects in the EMS Hardware monitors. The project name is diag.hw_mon.hpux. If you don't have access to CHART, contact an HP representative to enter a defect for you.
SD Product Structure
The EMS hardware monitors are installed as part of the OnlineDiag bundle (product number B4708AA). In addition, they utilize the EMS framework, product number B7609BA.
For information on the STM product, refer to the STM release notes file /usr/sbin/stm/Rel_NOTES.STM.
SD Bundle: OnlineDiag
Description: On-line Diagnostic System (Series 800/700)
    SD PRODUCT: Sup-Tool-Mgr
    Description: Support Tools Manager for HP-UX Systems
        SD SUB-PRODUCT: Manuals
        Description: Support Tools Manager Manual Pages
            FILESET: RELEASE_NOTES
            Description: HPUX STM Release Notes
            FILESET: STM-MAN
            Description: HPUX STM Manual Pages
        SD SUB-PRODUCT: Runtime
        Description: STM Manual Runtime
            FILESET: STM-CATALOGS
            Description: HPUX STM Shared Libraries
            FILESET: STM-SHLIBS
            Description: HPUX STM Shared Libraries
            FILESET: STM-UI-RUN
            Description: HPUX STM User Interface
            FILESET: STM-UUT-RUN
            Description: HPUX STM Unit Under Test Runtime
    SD PRODUCT: EMS-Config
    Description: EMS Config
        FILESET: EMS-GUI
        Description: Event Monitoring Service Graphical User Interface
    SD PRODUCT: EMS-Core
    Description: EMS Core Product
        FILESET: EMS-CORE
        Description: Event Monitoring Service Core Files