These release notes cover the September 2001 release of Support Plus for HP-UX 11i/11.00/10.20 running on S800/S700 systems.
- Overview
- Configuring Hardware Monitoring
- Documentation
- Changes
- Known Problems
- Monitors Provided
- Monitor Dependencies
- Defect Reporting
- SD Product Structure
NOTE: As of the September 1999 release, the name of the Diagnostic/IPR Media has been changed to Support Plus. In addition, the format has changed so that there is a separate CD-ROM for each version of the operating system (HP-UX 11i, 11.00 and 10.20).
Included on the Support Plus CD-ROM are the EMS Hardware Monitors - an important tool for maintaining system availability. The EMS hardware monitors allow you to monitor the operation of a wide variety of hardware products and be alerted immediately if any failure or other unusual event occurs. Hardware event monitoring is available to users running HP-UX 11i, 11.00, or 10.20 (IPR 9902 and later).
Hardware event monitoring provides a high level of protection against system hardware failure. By using hardware event monitoring, you can virtually eliminate undetected hardware failures that could interrupt system operation or cause data loss.
Configuring Hardware Monitoring
The EMS Hardware Monitors are installed at the same time as the Support Tools Manager. Once the monitoring software is installed, monitoring is automatically enabled.
By default, messages regarding major warning, serious and critical events that occur on hardware being monitored will be:
- Written to /var/adm/syslog/syslog.log
- Sent to EMAIL address root
All events will be stored in /var/opt/resmon/log/event.log.
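The default behaviour described above can be pictured as a simple severity filter. The following Python sketch is only an illustration of that idea: the severity names and their ordering, the DEFAULT_THRESHOLD constant, and the route_event function are assumptions made for this example and are not part of the monitoring software. Only the three destinations come from the text above.

```python
# Illustrative model of the default notification routing; not HP code.
# The severity names and their ordering are assumptions made for this sketch.
SEVERITIES = ["INFORMATION", "MINOR_WARNING", "MAJOR_WARNING", "SERIOUS", "CRITICAL"]

EVENT_LOG = "/var/opt/resmon/log/event.log"
SYSLOG = "/var/adm/syslog/syslog.log"
EMAIL_ROOT = "email:root"

# Hypothetical name for the default cut-off (major warning and above).
DEFAULT_THRESHOLD = "MAJOR_WARNING"


def route_event(severity, threshold=DEFAULT_THRESHOLD):
    """Return the destinations that receive an event of the given severity."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    destinations = [EVENT_LOG]  # every event is stored in event.log
    if SEVERITIES.index(severity) >= SEVERITIES.index(threshold):
        # major warning, serious and critical events are also reported
        destinations += [SYSLOG, EMAIL_ROOT]
    return destinations


if __name__ == "__main__":
    for name in SEVERITIES:
        print(name, "->", route_event(name))
```

In this model a minor warning reaches only event.log, while a serious event reaches all three destinations. The actual thresholds and targets are whatever has been configured with monconfig.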
To configure, enable, or disable hardware event monitoring, run the monitoring request manager: /etc/opt/resmon/lbin/monconfig
The Peripheral Status Monitor (PSM) and the Kernel Resource Monitor (krmond) are configured differently: they use the EMS GUI. See: http://docs.hp.com/hpux/onlinedocs/diag/ems/ems_gui.htm
For the latest and most complete information on EMS Hardware Monitors and the Support Tools Manager (STM), see the Web page "Diagnostics":
http://docs.hp.com/hpux/diag/
At this site, you will find Overviews, Tutorials, Quick Reference Cards, Frequently Asked Questions (FAQs), and much other material.
For complete information on installing and using EMS hardware monitors, as well as a list of supported hardware, refer to the "EMS Hardware Monitors User's Guide" available at the above site. An electronic copy of this book is also included on the Support Plus CD-ROM in the <mount_point>/DIAGNOSTICS directory.
Changes in the EMS Hardware Monitors for the September 2001 release include:
- Changes to Multiple Monitors
- Changes to Individual Monitors
- Changes to Platform and Interface
- Customer-Visible Interface Changes
Changes to Multiple Monitors
- Changed event text that said "Replace component XXX" to text that suggests "Contact HP support to check component XXX." This change was made to a library, so multiple monitors are affected.
- Fixed a problem whereby there is a remote chance that an error log will not be made available to a monitor. The error log would still be in the raw log for Logtool to decode, but the monitor would not be aware of it. The odds of this happening are very remote.
- Fixed a problem with false reporting of connection failures by audit system. There was a debug facility in the diagnostics and monitors that would attempt to connect to a port to output debug trace information. This caused the audit system (audsys) to see lots of connection failures. This change eliminates the connection attempts when debugging is not going on.
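The audit-system fix in the last item above amounts to guarding the debug connection behind an "is debugging enabled" check. The Python sketch below illustrates that pattern only; the DebugTrace class, the listener address, and the injectable connect parameter are hypothetical names chosen for this example and do not reflect the actual diagnostics code.

```python
import socket


class DebugTrace:
    """Sends trace text to a debug listener, connecting only when enabled."""

    def __init__(self, enabled, address=("127.0.0.1", 9999),
                 connect=socket.create_connection):
        self.enabled = enabled
        self.address = address    # hypothetical debug listener address
        self._connect = connect   # injectable so the sketch can be tried offline
        self._sock = None
        self.attempts = 0         # number of connection attempts made

    def write(self, text):
        """Return True if the text was sent, False otherwise."""
        if not self.enabled:
            # Debugging is off: make no connection attempt at all, so there
            # is no failed connection for an audit system to record.
            return False
        if self._sock is None:
            self.attempts += 1
            try:
                self._sock = self._connect(self.address, timeout=1.0)
            except OSError:
                return False
        self._sock.sendall(text.encode())
        return True
```

With enabled set to False, write returns immediately and attempts stays at zero, which corresponds to the behaviour after the fix. With the guard removed, every trace call would try to connect and fail whenever no listener is present.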
Changes to Individual Monitors
Changes to each monitor are described below. (Monitors are listed in alphabetical order.)
- AutoRAID Disk Array (armmon).
N/A
- Chassis Code Monitor (dm_chassis).
- Added support for chassis codes reported by hp server rp8400 (9000/800/S16K-A, "Keystone"). Also added support for several new Superdome chassis codes.
- JAGad63718
Eliminated confusing text that appeared in the summary for all dm_chassis events: "Unknown device at hardware path :."
- CMC Monitor (cmc_em).
N/A
- Core Hardware Monitor (dm_core_hw)
Changed the text of multiple event messages.
- Event 2 previously had the description:
The system has recovered from these errors but the system board should be replaced before a catastrophic error occurs.
Event 2 now has the following description:
The system has recovered from these errors but the system board should be checked before a catastrophic error occurs.
- All occurrences of text that suggests replacing a component have been changed to "Contact your HP support representative to check" the component.
- Some system names have been changed to add the new names: "N-Class" now adds "rp7400"; "L-Class" now adds "rp5430, rp5450, and rp5470."
- Core Hardware for Itanium (ia64_corehw).
N/A
- Disk Array FC60 Monitor (fc60mon).
N/A
- Disk Monitor (disk_em).
- Modified STM platform to map NEC iStorage 4000/2000/1000 disks as an NEC Array, so they will be ignored by the SCSI Disk Monitor (disk_em). In addition, added an entry in the xref file so that no tools will be shown to be available for these devices.
- Fixed a problem with disk_em, whereby the monitor would perform polling when it was inappropriate, thereby potentially causing performance problems. For example, the monitor would poll even if the diaglogd daemon were not operational or when the monitor was first started. This problem even occurred if the polling interval was set to 0.
- Fixed a problem with disk_em, whereby the monitor would log the following message into the api.log file every time an event is generated. With this fix, the message is no longer generated.
Process ID:3843(/usr/sbin/stm/uut/bin/tools/.../disk_em) Log Level: Error
Severity field in the following entry in client configuration file (/var/stm/config/tools/monitor/default_disk_em.clcfg) is invalid:
(EQ:100031:MAJOR_WARNIN:TRUE:1440:ANY:1:NONE:NO_OP:NO_OP:NONE)
Possible Causes/Recommended Action: Fix entry in client configuration file.
- Modified disk_em so that the monitor will no longer try to get the serial number of a device if the device previously indicated that it did not support the command.
- JAGad67463
Modified disk_em so that it will not get defect logs for devices with non-HP firmware.
- Changed event text that said "Replace component XXX" to text that suggests "Contact HP support to check component XXX."
- JAGad34366
Reset event history when FRU is replaced/removed.
- Fix to prevent monitor from going into loop and logging messages in api.log file if poll interval is zero and diaglogd is down.
- Enhanced info in polling events. The event details will now have the information about CDB data that was used in the polling side of the monitor.
- Fibre Channel Adapters (dm_FCMS_adapter).
Fixed a problem with dm_FCMS_adapter, whereby the monitor would perform polling when it was inappropriate, thereby potentially causing performance problems. For example, the monitor would poll even if the diaglogd daemon were not operational or when the monitor was first started. This problem even occurred if the polling interval was set to 0.
- Fibre Channel Adapter Model A5158 Monitor (dm_TL_adapter).
Fixed a problem with dm_TL_adapter, whereby the monitor would perform polling when it was inappropriate, thereby potentially causing performance problems. For example, the monitor would poll even if the diaglogd daemon were not operational or when the monitor was first started. This problem even occurred if the polling interval was set to 0.
- Fibre Channel SCSI Multiplexer (dm_fc_scsi_mux).
N/A
- High Availability Disk Array Monitor (ha_disk_array).
N/A
- High Availability Storage System (dm_ses_enclosure)
N/A
- Kernel Resource Monitor (krmond)
N/A
- LPMC Monitor (lpmc_em).
- Improved the text for events 100, 101, and 103.
- Fix to prevent monitor from going into loop and logging messages in api.log file if poll interval is zero and diaglogd is down.
- Memory Monitor (dm_memory).
N/A
- Peripheral Status Monitor (PSM).
N/A
- Remote Monitor (RemoteMonitor).
Added new events #212, #221, #257.
- SCSI Card Monitor (scsi123_em).
- Deleted extraneous event information from .cfg configuration file.
- JAGad63861 and JAGad56487.
- Fixed code to convert the value of POLL_INTERVAL to seconds. The prior version of the monitor used the value before converting it to seconds; as a result, the monitor polled (incorrectly) every 60 seconds rather than every 60 minutes.
- Changed event text that said "Replace component XXX" with text that suggested "Contact HP support to check component XXX." (Cause/action texts for events 101081 and 102057).
- Fix to prevent monitor from going into loop and logging messages in api.log file if poll interval is zero and diaglogd is down.
- The default severity for event 101081 will be Minor Warning rather than Critical.
- Product info (model string) will now appear in component data.
- SCSI Tape Monitor (dm_stape).
JAGad64273
Modified perform_polling function so that it will read Inquiry and Log Sense data from the stape driver cache instead of issuing SCSI commands to get this data. If the driver does not support generating this cache, it will revert back to sending SCSI commands to get this data.
- System Status Monitor (sysstat_em)
- Modified the sysstat_em so that event 100009 and 100010 will be generated every hour that the condition persists.
- Fix to prevent monitor from going into loop and logging messages in api.log file if poll interval is zero and diaglogd is down.
- UPS Monitor (dm_ups).
Enhanced to support two new events that can be delivered from the most recent version of ups_mond and generated by a PowerTrust II UPS with updated firmware:
- Event #45 - At least one or more components of the UPS subsystem configured through the device file (/dev/tty0p1) has lost communications.
- Event #46 - The UPS configured through device file (/dev/tty0p1) reports the UPS "Lost Communication" alarm condition has cleared.
These events can only be reported if ups_mond patch PHCO_24152, PHCO_24153, PHCO_24172, or PHCO_24173 has been installed on the system; the UPS is a PowerTrust II; and the firmware has been updated to support the new "s" and "k" commands.
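Several of the fixes listed in this section concern polling: the POLL_INTERVAL units fix in scsi123_em, and the fixes that stop monitors from polling (or looping) when the poll interval is zero or diaglogd is down. The Python sketch below shows one way to read that logic. It assumes POLL_INTERVAL is expressed in minutes, as the scsi123_em note implies, and that a zero interval means polling is disabled. The function names and the diaglogd_running flag are hypothetical and are not taken from the monitor source.

```python
SECONDS_PER_MINUTE = 60


def poll_interval_seconds(poll_interval_minutes):
    """Convert a POLL_INTERVAL value given in minutes to seconds."""
    if poll_interval_minutes < 0:
        raise ValueError("POLL_INTERVAL must not be negative")
    return poll_interval_minutes * SECONDS_PER_MINUTE


def should_poll(poll_interval_minutes, diaglogd_running):
    """Poll only when the interval is non-zero and diaglogd is operational."""
    return poll_interval_minutes > 0 and diaglogd_running


if __name__ == "__main__":
    # Using the raw value 60 as seconds would poll once a minute;
    # converting first gives the intended hourly poll.
    print(poll_interval_seconds(60))   # 3600
    print(should_poll(0, True))        # False
    print(should_poll(60, False))      # False
```

Keeping the unit conversion in one small function makes the minutes-versus-seconds mistake described for scsi123_em harder to reintroduce, and the separate guard ensures that a zero interval never leads to a polling loop.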
Changes to Platform and Interface
- Modified the psmctd daemon to consider a missing HWE_SHUTDOWN file as monitoring enabled, not disabled. Previously, if this file were removed, the monitors and monconfig considered monitoring enabled, but psmctd did not. As a result, psmctd would shut down and log an error indicating that monitoring needed to be enabled.
- (HP-UX 10.20 only). Fixed psmctd so that if its sleep during boot is interrupted, it restarts the sleep with the time remaining, not the total time requested. In the previous release, an interrupted sleep restarted with the total time requested, which could cause psmctd to sleep longer than requested. psmctd sleeps after reboot to avoid a timing problem in which PSM requests are removed during reboot. Sleeping longer than requested is not a problem, and the sleep will eventually complete. This change affects only reboots on the 10.20 release, because the sleep is not interrupted on 11.00 and higher releases.
- JAGad68646
Fixed a problem whereby delays could occur in monitor startup. On a few systems, monitors would call the swlist program and the program would hang. With this change, the monitors call the what program instead of swlist.
Customer-Visible Interface Changes
CAUTION: UPS Monitor May Need a Patch
In some cases, the UPS monitor (dm_ups) will not function and will instead generate event 45 (formerly event 42) with the text:
Probable Cause / Recommended Action: The monitor was unable to locate the fifo pipe that should have been created by ups_mond. Therefore, information about the ups cannot be sent to the monitor. You need version (80.1.2.3) of ups_mond or greater. To update your system with the correct version of ups_mond, install one of the following patches:
HPUX 10.20/s800 : PHCO_23830
HPUX 11.00 : PHCO_23831
HPUX 11.11 : PHCO_23832
To fix the problem, load the indicated patch or load the HWE patch bundle which contains this patch. For HP-UX 11i, the ups_mond patch PHCO_23832 is also distributed on the Sept 01 OE.
This problem will affect most systems with a UPS when the September 2001 diagnostics are installed. The only systems not affected will be those which are being updated from certain versions of the diagnostics (September 2000 through March 2001) and which do not have patch PHCO_19031 (HP-UX 10.20) or PHCO_19040 (HP-UX 11.00) installed.
CAUTION: Monitoring Changes for disc30, sdisk and disk array devices
As of IPR 9902 (Feb 99 release), there has been a change to the way that monitoring is done for disc30, sdisk and the HA Disk Array Models 10, 20, and 30FC.
Formerly, the "diaglogd exec" programs (pdisc30_exec, pharaymon_exec, and psdisk_exec) handled driver error entries for these devices.
As of IPR 9902, these programs have been deleted and their functionality is now provided by the EMS Hardware Monitors.
If you had customized the configuration files for the diaglogd exec programs (disk30_exec.cfg, sdisk_exec.cfg, and haraymon_exec.cfg) you may wish to re-configure the EMS Hardware Monitors to achieve the same results.
CAUTION: Compatibility Problem with EMS-Related Products (ServiceGuard, HA Monitors, etc.)
If you install the OnlineDiag bundle (Dec 99 or later) onto a computer running older revisions of EMS-related products, these products may experience compatibility problems. Affected products include MC/ServiceGuard, ServiceGuard OPS Edition and High Availability Monitors. The only critical problems occur with the following versions:
MC/ServiceGuard A.10.10, A.11.01, A.11.03
ServiceGuard OPS Edition A.11.02, A.11.03
Support Tools and the EMS hardware monitors are not affected. For complete information, see EMS Incompatibility Problem.
Monitors are provided to support the following:
- AutoRAID Disk Array (armmon)
- Chassis Code Monitor (dm_chassis)
- CMC Monitor (cmc_em)
- Core Hardware (dm_core_hw)
- Core Hardware for Itanium (ia64_corehw)
- Disk (disk_em)
- Disk Array FC60 (fc60mon)
- Fast Wide SCSI Disk Array (fw_disk_array)
- Fibre Channel Adapters (dm_FCMS_adapter)
- Fibre Channel Adapter Model A5158 (dm_TL_adapter)
- Fibre Channel Arbitrated Loop Hub (dm_fc_hub)
- Fibre Channel SCSI Multiplexer (dm_fc_scsi_mux)
- Fibre Channel Switch (dm_fc_sw)
- High Availability Disk Array (ha_disk_array)
- High Availability Storage System (dm_ses_enclosure)
- Kernel Resource (krmond)
- LPMC (lpmc_em)
- Memory (dm_memory)
- Remote (RemoteMonitor)
- SCSI Card (scsi123_em)
- SCSI Tape Devices (dm_stape)
- System Status (sysstat_em)
- UPS (dm_ups)
In addition, the Peripheral Status Monitor (PSM) is provided to monitor the current status of the products supported by the above list.
For detailed information concerning which products are supported by which monitors and additional dependencies, check the "Diagnostics" section of Hewlett-Packard's online documentation web site: http://docs.hp.com/hpux/diag/ .
Several of the monitors have special requirements, such as patches or certain versions of firmware. In particular:
- The Fibre Channel Arbitrated Loop Hub Monitor and the Fibre Channel Switch Monitor require special configuration which is described in their data sheets in the "EMS Hardware Monitors User's Guide" (chapter 6). A patch is also required.
- A patch is required if your system includes an HP SureStore E Disk Array FC60. This patch is required to run the EMS hardware monitor (fc60mon) or STM tools for this device.
For a list of the current required patches, see the DIAGNOSTIC.readme file for this release.
Current monitor requirements are described in the "Supported Products" page under "EMS Hardware Monitors" at http://docs.hp.com/hpux/diag . Requirements are also listed in chapter 2 of the manual "EMS Hardware Monitors User's Guide".
Use CHART to report defects in the EMS Hardware monitors. The project name is diag.hw_mon.hpux. If you don't have access to CHART, contact an HP representative to enter a defect for you.
The EMS hardware monitors are installed as part of the OnlineDiag bundle (product number B4708AA). In addition, they utilize the EMS framework, product number B7609BA.
Note: EMS Hardware Monitors are installed as part of the STM-UUT-RUN Fileset. However, the EMS Hardware Monitors are dependent on the EMS-Core and EMS-Config products and additional filesets in the Sup-Tool-Mgr Product.
For information on the STM product, refer to the STM release notes file /usr/sbin/stm/Rel_NOTES.STM.
SD Bundle: OnlineDiag
  Description: On-line Diagnostic System (Series 800/700)
  SD PRODUCT: Sup-Tool-Mgr
    Description: Support Tools Manager for HP-UX Systems
    SD SUB-PRODUCT: Manuals
      Description: Support Tools Manager Manual Pages
      FILESET: RELEASE_NOTES
        Description: HPUX STM Release Notes
      FILESET: STM-MAN
        Description: HPUX STM Manual Pages
    SD SUB-PRODUCT: Runtime
      Description: STM Manual Runtime
      FILESET: STM-CATALOGS
        Description: HPUX STM Shared Libraries
      FILESET: STM-SHLIBS
        Description: HPUX STM Shared Libraries
      FILESET: STM-UI-RUN
        Description: HPUX STM User Interface
      FILESET: STM-UUT-RUN
        Description: HPUX STM Unit Under Test Runtime
  SD PRODUCT: EMS-Config
    Description: EMS Config
    FILESET: EMS-GUI
      Description: Event Monitoring Service Graphical User Interface
  SD PRODUCT: EMS-Core
    Description: EMS Core Product
    FILESET: EMS-CORE
      Description: Event Monitoring Service Core Files