These release notes cover the June 2004 release of Support Plus for HP-UX 11i.
- Overview
- Configuring Hardware Monitoring
- Documentation
- Changes
- Known Problems
- Monitors Provided
- Monitor Dependencies
- Defect Reporting
- SD Product Structure
NOTE: The HP StorageWorks SDLT 160/320 GB Tape Drive and HP Ultrium 460 External Tape Drive are NOT SUPPORTED by the OnlineDiag product. Even though some STM tools may function with these devices, they are still NOT SUPPORTED. The diagnostic tools and utilities that DO SUPPORT these devices are the HP StorageWorks Library and Tape Tools (L&TT). These tools can be downloaded, free of charge, at:
http://www.hp.com/support/tapetools
Included on the Support Plus CD-ROM are the EMS Hardware Monitors - an important tool for maintaining system availability. The EMS hardware monitors allow you to monitor the operation of a wide variety of hardware products and be alerted immediately if any failure or other unusual event occurs. Hardware event monitoring is available to users running HP-UX 11i and 11.00 (IPR 9902 and later).
Hardware event monitoring provides a high level of protection against system hardware failure. By using hardware event monitoring, you can virtually eliminate undetected hardware failures that could interrupt system operation or cause data loss.
Configuring Hardware Monitoring
The EMS Hardware Monitors are installed at the same time as the Support Tools Manager. Once the monitoring software is installed, monitoring is automatically enabled.
By default, messages regarding major warning, serious and critical events that occur on hardware being monitored will be:
- Written to /var/adm/syslog/syslog.log
- Sent to EMAIL address root
All events will be stored in /var/opt/resmon/log/event.log.
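The default routing described above can be pictured with a small Python sketch. It is illustrative only and is not how the monitors are implemented: the function name, the "email:root" label, and the two lower severity names ("information", "minor warning") and their ordering are assumptions that do not come from these notes.

```python
# Illustrative sketch of the default event routing; not the monitors' own code.
# Assumption: severities are ordered from least to most severe as listed here.
SEVERITIES = ["information", "minor warning", "major warning", "serious", "critical"]

EVENT_LOG = "/var/opt/resmon/log/event.log"
SYSLOG = "/var/adm/syslog/syslog.log"
EMAIL_ROOT = "email:root"  # hypothetical label for "email to root"


def default_destinations(severity):
    """Return the default destinations for an event of the given severity."""
    rank = SEVERITIES.index(severity.lower())
    destinations = [EVENT_LOG]  # every event is stored in event.log
    if rank >= SEVERITIES.index("major warning"):
        destinations.append(SYSLOG)      # major warning and above go to syslog
        destinations.append(EMAIL_ROOT)  # ... and are mailed to root
    return destinations
```

Under these assumptions, a "serious" event is routed to all three destinations, while a lower-severity event is only stored in event.log.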
To configure, enable, or disable hardware event monitoring, run the monitoring request manager: /etc/opt/resmon/lbin/ .
The Peripheral Status Monitor (PSM) and the Kernel Resource Monitor (krmond) are configured differently. They use the EMS GUI. See: http://docs.hp.com/hpux/onlinedocs/diag/ems/ems_gui.htm
For the latest and most complete information on EMS Hardware Monitors and the Support Tools Manager (STM), see the Web page "Diagnostics": http://docs.hp.com/hpux/diag/
At this site, you will find Overviews, Tutorials, Quick Reference Cards, Frequently Asked Questions (FAQs), and much other material.
For complete information on installing and using EMS hardware monitors, as well as a list of supported hardware, refer to the "EMS Hardware Monitors User's Guide" available at the above site. An electronic copy of this book is also included on the Support Plus CD-ROM in the <mount_point>/DIAGNOSTICS directory.
Changes in the EMS Hardware Monitors for the June 2004 release include:
- Changes to Multiple Monitors
- Changes to Individual Monitors
- Changes to Platform and Interface
- Customer-Visible Interface Changes
- JAGaf12089
aplsrv IDs were too long and ambiguous in some cases. These IDs have been modified.
Changes to Individual Monitors
Changes to each monitor are described below. (Monitors are listed in alphabetical order.)
- AutoRAID Disk Array (armmon).
- N/A
- Chassis Code Monitor (dm_chassis).
- JAGaf11912
Changes were made to prevent event 200 (cclogd not running) from being emitted during system and diagnostic shutdown.
- Core Hardware Monitor (dm_core_hw)
- JAGae96050; JAGae96205
When the dm_core_hw monitor was killed using "kill -2 <monitor's pid>", and when the monitor was started again by using `registar`, it was found that the monitor was NOT registered with diaglogd for the SCAN type of error. The problem has been rectified, and now, after the dm_core_hw monitor is terminated, it will receive the EV_SCAN_EVENT_ENTRY type of errors when brought back again.
- Chassis Event Monitor (ia64_corehw).
- JAGaf12894
Reworded event 104010 to cover situations where redundancy was regained, or where redundancy is observed to be present.
Reworded event 104011 to cover situations where redundancy was lost, or where redundancy is not observed to be present.
- JAGae97660
This version of the monitor fixes the code, so that the monitor will not log the following debug message in the syslog file:
ia64_corehw monitor: PID=<> Calling set WD timer
- JAGae95681
When the OnlineDiag bundle is removed and re-installed, the monitor starts to write to the log file with '00' extension. This has been fixed.
- CPU Monitor (lpmc_em).
Note: As of the June 2002 release, the LPMC Monitor (lpmc_em) was renamed to "CPU Monitor". The binary name is still lpmc_em. The name was changed to reflect the monitor's enhancement to check floating-point functionality in the CPU.
- N/A
- Disk Array FC60 Monitor (fc60mon).
- JAGae97568
Recommended actions reported by fc60mon in Events 4 and 37, and improper event descriptions reported by fc60mon in Event 9, have been modified to enhance clarity.
- Fast Wide SCSI Disk Array (fw_disk_array)
- N/A
- Disk Monitor (disk_em).
- JAGaf16966
On machines supporting hot-swappable disks, events with a SERIOUS severity level were generated at boot because the disks were not spinning automatically. This problem has been fixed.
- Fibre Channel Adapters (dm_FCMS_adapter).
- N/A
- Fibre Channel Adapter Model A5158 Monitor (dm_TL_adapter).
- N/A
- Fibre Channel Adapter Model A6826A Monitor (dm_QL_adapter)
- N/A
- Fibre Channel SCSI Multiplexer (dm_fc_scsi_mux).
- N/A
- Fibre Channel Switch (dm_fc_sw).
- N/A
- Forward Progress Log Monitor (fpl_em)
- The fpl_em monitor was enhanced to have the ability to display related IPMI events in a single EMS event.
- JAGaf06110
The fpl_em monitor was enhanced to display additional information in all of the EMS events for the fpl_em monitor. The enhancements are:
- Display the Reporting Entity ID as:
Reporting Entity ID: 0 ( Cab 0 Cell 0 CPU 0 )
Note: If the device (Cab, Cell or CPU) is invalid, it is not displayed. For example, if the CPU is invalid, the output would be ( Cab 0 Cell 0 ).
- Display the raw IPMI event as:
IPMI event: 0x0123456789abcdef 0x0123456789abcdef
Note: This will force all leading zeros to be displayed.
- Display the IPMI event ID as:
IPMI Event ID: 1234 (0x4d2)
Note: The IPMI Event ID is also being displayed as a hex value.
All these enhancements will be visible on all systems on which the fpl_em monitor is supported.
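The three display formats listed for the fpl_em monitor can be reproduced with ordinary string formatting. The Python sketch below is illustrative and is not the monitor's own code. The function names are hypothetical. It also assumes that the raw IPMI event is held as two 64-bit words and that an invalid device is passed as None.

```python
# Illustrative sketch of the fpl_em display formats; not the monitor's code.

def format_reporting_entity(entity_id, cab=None, cell=None, cpu=None):
    """Format the Reporting Entity ID line, omitting invalid (None) devices."""
    parts = []
    for label, value in (("Cab", cab), ("Cell", cell), ("CPU", cpu)):
        if value is not None:
            parts.append(f"{label} {value}")
    return f"Reporting Entity ID: {entity_id} ( {' '.join(parts)} )"


def format_raw_ipmi_event(word0, word1):
    """Format the raw event as two 16-digit hex words, keeping leading zeros."""
    return f"IPMI event: 0x{word0:016x} 0x{word1:016x}"


def format_ipmi_event_id(event_id):
    """Format the event ID in decimal, followed by the same value in hex."""
    return f"IPMI Event ID: {event_id} ({event_id:#x})"
```

For example, format_reporting_entity(0, cab=0, cell=0) leaves out the CPU field and yields "Reporting Entity ID: 0 ( Cab 0 Cell 0 )".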
- High Availability Disk Array Monitor (ha_disk_array).
- N/A
- High Availability Storage System (dm_ses_enclosure)
- N/A
- iSCSI Device Adapter (dm_iscsi_adapter)
- N/A
- Kernel Resource Monitor (krmond)
- N/A
- LPMC Monitor (lpmc_em).
As of the June 2002 release, the LPMC Monitor (lpmc_em) was renamed to "CPU Monitor". The binary name is still lpmc_em. The name was changed to reflect the monitor's enhancement to check floating-point functionality in the CPU. For more information, see CPU Monitor (lpmc_em).
- Memory Monitor (dm_memory).
- JAGaf18838
When a single bit error was encountered, memlogd aborted due to unsatisfied symbols. This has now been fixed.
- JAGaf18850
The serial number displayed by dm_memory monitor was incorrect. This has been fixed.
- JAGae44296
The Serial Number and Part Number of the DIMM will be reported by the memory monitor for all Single-Bit-Error-related events. The Serial Number and Part Number of the DIMMs cannot be retrieved by the dm_memory monitor in vPar environments running revisions prior to A.03.02 on the following platforms:
SD16A, SD32A, SD64A, rp7420 and rp8420.
Starting with the HP-UX 11i June 2004 release of OnlineDiag, this functionality is available in vPar environments running revision A.03.02 or later.
- JAGae57879
When the /etc/protocols file was missing, the dm_memory monitor looped, causing high CPU usage. This has been fixed. The monitor now uses the default protocol number for the protocol.
- Peripheral Status Monitor (PSM/psmmon).
- JAGae60715
psmmon monitoring requests were not persistent like the monitoring requests of the hardware monitors. If the monitor rejected a monitoring request, the request was lost. Also, psmmon shut down when monitoring was brought down. Both issues have been fixed by enhancing psmmon:
- To continue operations when monitoring is shut down, and psmctd and diagmond are not running. psmmon will continue to accept requests. It will return the "last known" state of the hardware resources as long as psmctd is down.
- To not reject any monitoring request. It only indicates a "not ready" state when it is not in a position to accept requests. This prevents the loss of monitoring requests.
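The two psmmon behaviors described above (answer "not ready" instead of rejecting, and serve the last known state while the status source is down) can be sketched generically in Python. This is a conceptual illustration only and does not reflect psmmon's implementation. The class, its attributes, the return strings other than "not ready", and the resource name used in the example are all hypothetical.

```python
# Conceptual sketch only; not psmmon's implementation.

class StatusResponder:
    def __init__(self):
        self.ready = False      # hypothetical: can requests be accepted yet?
        self.source_up = False  # hypothetical stand-in for "psmctd is running"
        self.last_known = {}    # resource name -> most recently observed state
        self.requests = []      # accepted monitoring requests

    def submit(self, resource):
        """Never reject a request: report "not ready" so the caller can retry."""
        if not self.ready:
            return "not ready"
        if resource not in self.requests:
            self.requests.append(resource)
        return "accepted"

    def status(self, resource, read_live):
        """Return live state when the source is up, else the last known state."""
        if self.source_up:
            self.last_known[resource] = read_live(resource)
        return self.last_known.get(resource, "unknown")
```

In this sketch, a caller that receives "not ready" keeps its request and submits it again later, so no request is lost. A status query made while the source is down returns whatever state was recorded last.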
- MSA 1000 Storage Disk Array (msamon).
- JAGae98401
A new monitor has been added to monitor the MSA 1000 Storage Array and the MSA 30 Storage Enclosure on HP-UX 11.11.
- Remote Monitor (RemoteMonitor).
- N/A
- RAID adapters (dm_raid_adapter)
- JAGaf08549
The error message provided only the reason for the error, not the error description. This has been fixed.
- SCSI Card Monitor (scsi123_em).
- JAGae51900
scsi123_em monitor events 103097, 103103, and 103113 have been updated, so that the new description will give more accurate information about failures.
- SCSI Cascade Monitor (scsi_cascade).
- N/A
- SCSI Disk (scsi_disk).
- N/A
- SCSI Tape Monitor (dm_stape).
- N/A
- System Status Monitor (sysstat_em).
- N/A
- UPS Monitor (dm_ups).
- JAGae71543
The UPS monitor has been modified to generate a test event to verify monitor-to-EMS communication.
Changes to Platform and Interface
- JAGaf06008
The 'time_window' value in the clcfg file for hardware monitors is not being used, when events are generated. This was noticed with dm_memory, where event 4000 was triggered after a TOTAL of 20 occurrences of an SBE on a DIMM, NOT after 20 within 24 hours. This problem appears only during the next start up of diagnostics, after the cfg has been modified and the diagnostics have been brought down. This problem is applicable to all the hardware monitors.
- JAGae46500
When the monconfig utility was used to add a monitoring request, it printed an irrelevant error after the help output. The code has been modified to correct this problem.
Customer-Visible Interface Changes
- N/A
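The 'time_window' issue described under JAGaf06008 comes down to the difference between counting occurrences in total and counting them within a sliding time window. The Python sketch below illustrates the windowed behavior, using the 20-occurrences-in-24-hours figures from that entry. It is not the monitors' code. The class name and its interface are hypothetical, and timestamps are assumed to be seconds supplied in non-decreasing order.

```python
from collections import deque


class WindowedThreshold:
    """Signal only when `threshold` occurrences fall within `window_seconds`."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._times = deque()  # timestamps of occurrences still in the window

    def record(self, timestamp):
        """Record one occurrence; return True if the threshold is now met."""
        self._times.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Discard occurrences that have aged out of the window.
        while self._times and self._times[0] <= cutoff:
            self._times.popleft()
        return len(self._times) >= self.threshold
```

With WindowedThreshold(20, 24 * 3600), twenty occurrences spaced two hours apart never meet the threshold, whereas a plain running total would reach 20 on the last one. Twenty occurrences spaced one minute apart meet the threshold on the twentieth.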
CAUTION: UPS Monitor May Need a Patch
In some cases, the UPS monitor (dm_ups) will not function and will instead generate event 45 (formerly event 42) with the text:
Probable Cause / Recommended Action: The monitor was unable to locate the fifo pipe that should have been created by ups_mond. Therefore, information about the ups cannot be sent to the monitor. You need version (80.1.2.3) of ups_mond or greater. To update your system with the correct version of ups_mond, install the following patch: HPUX 11.11 : PHCO_23832
To fix the problem, load the indicated patch or load the HWE patch bundle which contains this patch. For HP-UX 11i, the ups_mond patch PHCO_23832 is also distributed on the Sept 01 OE.
This problem will affect most systems with a UPS when the September 2001 diagnostics are installed. The only systems not affected will be those which are being updated from certain versions of the diagnostics (September 2000 through March 2001) and which do not have patch PHCO_19040 (HP-UX 11.00) installed.
CAUTION: Monitoring Changes for disc30, sdisk and disk array devices
As of IPR 9902 (Feb 99 release), there has been a change to the way that monitoring is done for disc30, sdisk and the HA Disk Array Models 10, 20, and 30FC.
Formerly, the "diaglogd exec" programs (pdisc30_exec and psdisk_exec) handled driver error entries for these devices.
As of IPR 9902, these programs have been deleted and their functionality is now provided by the EMS Hardware Monitors.
If you had customized the configuration files for the diaglogd exec programs (disk30_exec.cfg and sdisk_exec.cfg) you may wish to re-configure the EMS Hardware Monitors to achieve the same results.
CAUTION: Compatibility Problem with EMS-Related Products (ServiceGuard, HA Monitors, etc.)
If you install the OnlineDiag bundle (Dec 99 or later) onto a computer running older revisions of EMS-related products, these products may experience compatibility problems. Affected products include MC/ServiceGuard, ServiceGuard OPS Edition and High Availability Monitors. The only critical problems occur with the following versions:
MC/ServiceGuard A.10.10, A.11.01, A.11.03
ServiceGuard OPS Edition A.11.02, A.11.03
Support Tools and the EMS hardware monitors are not affected. For complete information, see EMS Incompatibility Problem.
Monitors are provided to support the following:
- AutoRAID Disk Array (armmon)
- Chassis Code Monitor (dm_chassis)
- Core Hardware (dm_core_hw)
- Chassis Event Monitor (ia64_corehw)
- CPU (lpmc_em)
- Disk (disk_em)
- Disk Array FC60 (fc60mon)
- Fast Wide SCSI Disk Array (fw_disk_array)
- Fibre Channel Adapters (dm_FCMS_adapter)
- Fibre Channel Adapter Model A5158 (dm_TL_adapter)
- Fibre Channel Adapter Model A6826A Monitor (dm_QL_adapter)
- Fibre Channel Arbitrated Loop Hub (dm_fc_hub)
- Fibre Channel SCSI Multiplexer (dm_fc_scsi_mux)
- Fibre Channel Switch (dm_fc_sw)
- Forward Progress Log Monitor (fpl_em)
- High Availability Disk Array (ha_disk_array)
- High Availability Storage System (dm_ses_enclosure)
- iSCSI Device Adapter (dm_iscsi_adapter)
- Kernel Resource (krmond)
- LPMC (lpmc_em) renamed to "CPU Monitor" as of the June 02 release
- Memory (dm_memory)
- MSA 1000 Storage Disk Array (msamon)
- Remote (RemoteMonitor)
- RAID adapters (dm_raid_adapter)
- SCSI Card (scsi123_em)
- SCSI Cascade (scsi_cascade)
- SCSI Disk (scsi_disk)
- SCSI Tape Devices (dm_stape)
- System Status (sysstat_em)
- UPS (dm_ups)
In addition, the Peripheral Status Monitor (PSM) is provided to monitor the current status of the products supported by the above list.
For detailed information concerning which products are supported by which monitors and additional dependencies, check the "Diagnostics" section of Hewlett-Packard's online documentation web site: http://docs.hp.com/hpux/diag/ .
Several of the monitors have special requirements, such as patches or certain versions of firmware. In particular:
- The Fibre Channel Arbitrated Loop Hub Monitor and the Fibre Channel Switch Monitor require special configuration which is described in their data sheets in the "EMS Hardware Monitors User's Guide" (chapter 6). A patch is also required.
- A patch is required if your system includes an HP SureStore E Disk Array FC60. This patch is required to run the EMS hardware monitor (fc60mon) or STM tools for this device.
For a list of the current required patches, see the DIAGNOSTIC.readme file for this release.
Current monitor requirements are described in the "Supported Products" page under "EMS Hardware Monitors" at http://docs.hp.com/hpux/diag . Requirements are also listed in chapter 2 of the manual "EMS Hardware Monitors User's Guide".
Use CHART to report defects in the EMS Hardware monitors. The project name is diag. If you don't have access to CHART, contact an HP representative to enter a defect for you.
The EMS hardware monitors are installed as part of the OnlineDiag bundle (product number B4708AA). In addition, they utilize the EMS framework, product number B7609BA.
Note: EMS Hardware Monitors are installed as part of the STM-UUT-RUN Fileset. However, the EMS Hardware Monitors are dependent on the EMS-Core and EMS-Config products and additional filesets in the Sup-Tool-Mgr Product.
For information on the STM product, refer to the STM release notes file /usr/sbin/stm/Rel_NOTES.STM.
SD Bundle: OnlineDiag
Description: On-line Diagnostic System (Series 800/700)
  SD PRODUCT: Sup-Tool-Mgr
  Description: Support Tools Manager for HP-UX Systems
    SD SUB-PRODUCT: Manuals
    Description: Support Tools Manager Manual Pages
      FILESET: RELEASE_NOTES
      Description: HPUX STM Release Notes
      FILESET: STM-MAN
      Description: HPUX STM Manual Pages
    SD SUB-PRODUCT: Runtime
    Description: STM Manual Runtime
      FILESET: STM-CATALOGS
      Description: HPUX STM Shared Libraries
      FILESET: STM-SHLIBS
      Description: HPUX STM Shared Libraries
      FILESET: STM-UI-RUN
      Description: HPUX STM User Interface
      FILESET: STM-UUT-RUN
      Description: HPUX STM Unit Under Test Runtime
  SD PRODUCT: EMS-Config
  Description: EMS Config
      FILESET: EMS-GUI
      Description: Event Monitoring Service Graphical User Interface
  SD PRODUCT: EMS-Core
  Description: EMS Core Product
      FILESET: EMS-CORE
      Description: Event Monitoring Service Core Files