These release notes cover the September 2004 release of the Support Tools (diagnostics) for HP-UX 11.23.
- Overview
- Configuring Hardware Monitoring
- Documentation
- Changes
- Known Problems
- Monitors Provided
- Monitor Dependencies
- Defect Reporting
- SD Product Structure
Overview
Included with the OnlineDiag bundle of support tools are the EMS Hardware Monitors - an important tool for maintaining system availability. The EMS hardware monitors allow you to monitor the operation of a wide variety of hardware products and be alerted immediately if any failure or other unusual event occurs.
Hardware event monitoring provides a high level of protection against system hardware failure. By using hardware event monitoring, you can eliminate most undetected hardware failures that could interrupt system operation or cause data loss.
Configuring Hardware Monitoring
The EMS Hardware Monitors are installed at the same time as the Support Tools Manager. Once the monitoring software is installed, monitoring is automatically enabled.
By default, messages regarding major warning, serious, and critical events that occur on monitored hardware will be:
- Written to /var/adm/syslog/syslog.log
- Sent to the email address root
All events will be stored in /var/opt/resmon/log/event.log.
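The default routing described above can be sketched as a simple severity filter. This is only an illustration: the severity names and the three destinations come from these notes, while the numeric ordering, the `Severity` class, and the `default_targets` function are hypothetical names introduced for the sketch and are not part of the EMS software.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Names are taken from these release notes; the numeric values are
    # illustrative and only express relative ordering.
    INFORMATION = 1
    MINOR_WARNING = 2
    MAJOR_WARNING = 3
    SERIOUS = 4
    CRITICAL = 5


EVENT_LOG = "/var/opt/resmon/log/event.log"
SYSLOG = "/var/adm/syslog/syslog.log"
EMAIL_ROOT = "email:root"  # hypothetical label for "send email to root"


def default_targets(severity):
    """Return the default destinations for an event of the given severity."""
    targets = [EVENT_LOG]  # every event is stored in event.log
    if severity >= Severity.MAJOR_WARNING:
        # major warning, serious, and critical events are also
        # written to syslog and mailed to root
        targets += [SYSLOG, EMAIL_ROOT]
    return targets
```

Under these assumptions, an informational event is stored only in event.log, while a serious event goes to all three destinations.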
To configure, enable, or disable hardware event monitoring, run the monitoring request manager: /etc/opt/resmon/lbin/monconfig
The Peripheral Status Monitor (PSM) and the Kernel Resource Monitor (krmond) are configured differently: they use the EMS GUI. See: http://docs.hp.com/hpux/onlinedocs/diag/ems/ems_gui.htm
Documentation
For the latest and most complete information on EMS Hardware Monitors and the Support Tools Manager (STM), see the Diagnostics section of Hewlett-Packard's online documentation Web site at:
http://docs.hp.com/hpux/diag/
At this site, you will find Overviews, Tutorials, Quick Reference Cards, Frequently Asked Questions (FAQs), and much other material.
For complete information on installing and using EMS hardware monitors, as well as a list of supported hardware, refer to the "EMS Hardware Monitors User's Guide" available at the above site.
For the most current information on HP-UX 11.23 diagnostics, see the following Web pages at the Diagnostics site:
- "DIAGNOSTICS.readme for HP-UX 11.23 (September 2004)" at:
http://docs.hp.com/hpux/onlinedocs/diag/st/str_1123pi.htm
- "Release Notes for STM on HP-UX 11.23 (September 2004)" at:
http://docs.hp.com/hpux/onlinedocs/diag/stm/str_1123pi.htm
- "Release Notes for EMS Hardware Monitors on HP-UX 11.23 (September 2004)" at:
http://docs.hp.com/hpux/onlinedocs/diag/ems/emr_1123pi.htm
Changes
For 11.23, the EMS hardware monitors use version A.04.00.01 of the EMS platform.
Memory Page Deallocation (MPD) and the memlogd daemon are not implemented on the RX 4610 computer.
Changes in the EMS Hardware Monitors for the September 2004 release include:
- Changes to Multiple Monitors
- Changes to Individual Monitors
- Changes to Platform and Interface
- Customer-Visible Interface Changes
Changes to Multiple Monitors
The following changes are common to both HP Integrity Servers and Intel® Itanium®-based workstations (IA systems), and to HP 9000 Servers and RISC architecture-based workstations (PA systems):
- JAGaf18401
Manpages for the following monitors have been updated:
- dm_ups
- sysstat_em
- dm_fc_sw
- JAGaf18370
Manpages for the following monitors now include the .clcfg (Client Configuration file) path:
- msamon
- dm_ses_enclosure
- dm_stape
- fc60mon
- disk_em
- JAGaf12089
The aplsrv IDs were too long and ambiguous in some cases. Those IDs have been modified.
- JAGaf18375
Manpages for the following monitors now include the .clcfg files:
- dm_FCMS_adapter
- dm_TL_adapter
- dm_iscsi_adapter
- dm_ql_adapter
- dm_raid_adapter
- JAGaf45718
Manpages for the following monitors now include the .clcfg file:
- dm_TL_adapter
- dm_raid_adapter
Changes to Individual Monitors
Changes to each monitor are described below. (Monitors are listed in alphabetical order.)
- Chassis Code Monitor (dm_chassis).
The following applies to PA systems only:
- JAGaf11912
The configuration file entry for Event 200 (cclogd not running) has been modified so that one event is generated for every 10 detections of the absence of cclogd (the chassis code logging daemon). The threshold has been increased because the absence of cclogd might last only a very short time, and does not warrant a MAJOR WARNING event every time it occurs.
- CMC Monitor (cmc_em).
The following applies to IA systems only:
- The severity of event 100701, generated by the CMC monitor, has been changed from "Information" to "Minor Warning", because some user action may be required, if and when the event is generated. More information about the potential actions to be taken is included in the event itself.
- JAGaf19005; JAGaf17492
- JAGaf17492 : CMC monitor processes Ext Bus errors on non-cellular systems.
On non-cellular systems, the monitor processes the SBEs on the External Bus - which are NOT the cache errors. Therefore, these errors are incorrectly counted towards the DPR threshold.
- JAGaf19005 : CMC monitor needs to be enhanced for Enhanced Thermal Mode.
The current version of the monitor generates event 100701, when it receives Thermal-CMC data indicating that the processor power is reduced, because the processor is running hot.
When the monitor receives Thermal-CMC data indicating that the processor power is restored (because the processor has cooled down), the monitor generates event 100702.
The monitor needs to be improved to look for a trend toward these conditions, rather than reacting to individual instances of these conditions. Because of the trending, the event 100702 will be eliminated.
Fix: The monitor has been enhanced to generate event 100701 when it receives ETM-Entry data 12 times in a 24-hour period. Under the new thresholding mechanism, ETM-Exit data received without prior ETM-Entry data is also counted towards the threshold.
Also, if the processor stays in ETM for the equivalent of 12 transitions in the 15-minute O/S poll intervals, the event will be generated.
- JAGaf18376
The manpage for the CMC monitor has been enhanced to refer to the cmc_em.cfg and default_cmc_em.clcfg files.
- As part of the enhancement of the monitor to consider the Enhanced Thermal Mode (ETM) of the processor, event 100701 will now be generated if the monitor detects that the processor goes into ETM 12 times in a 24-hour period. Event 100702 will no longer be generated.
- JAGaf17492
The CMC monitor processed Ext Bus errors on non-cellular systems. On these systems, the monitor should not have counted the errors on the external bus towards the DPR action. This problem has now been fixed.
- JAGaf05954; JAGaf07290
The CMC monitor needed to ignore inconsistent rx4640 data and datalength: this fix has been implemented.
The CMC monitor was generating incorrect events for some rx4640 CPEs.
The text for Event 100701 was changed, to better explain the Thermal CMC.
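The ETM thresholding described above for event 100701 can be pictured as a sliding-window counter. The following is a minimal sketch, not the monitor's actual implementation: the threshold of 12 and the 24-hour period come from these notes, while the `EtmThreshold` class, its method names, the use of seconds as the time unit, and the decision to reset the counter after an event fires are assumptions made for illustration. The additional rule about a processor staying in ETM across 15-minute poll intervals is not modelled here.

```python
from collections import deque

ETM_THRESHOLD = 12               # from the notes: 12 ETM entries
WINDOW_SECONDS = 24 * 60 * 60    # from the notes: within a 24-hour period


class EtmThreshold:
    """Hypothetical sliding-window counter for ETM transitions."""

    def __init__(self, threshold=ETM_THRESHOLD, window=WINDOW_SECONDS):
        self.threshold = threshold
        self.window = window
        self.hits = deque()   # timestamps of counted transitions
        self.in_etm = False   # True after an entry, until the next exit

    def record(self, timestamp, kind):
        """Record 'entry' or 'exit' data; return True if event 100701 is due.

        Timestamps are assumed to be non-decreasing.
        """
        if kind == "entry":
            counted = True
            self.in_etm = True
        elif kind == "exit":
            # An exit with no prior entry still counts towards the threshold.
            counted = not self.in_etm
            self.in_etm = False
        else:
            raise ValueError("kind must be 'entry' or 'exit'")
        if not counted:
            return False
        self.hits.append(timestamp)
        # Drop transitions that have fallen out of the window.
        while timestamp - self.hits[0] >= self.window:
            self.hits.popleft()
        if len(self.hits) >= self.threshold:
            self.hits.clear()  # assumption: start counting afresh after firing
            return True
        return False
```

With this sketch, twelve entries spaced an hour apart trigger the event on the twelfth entry, whereas entries spaced three hours apart never accumulate twelve hits inside one window.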
The following applies to PA systems only:
- JAGaf04478
After completion of fp tests on each CPU, the lpmc_em monitor is now free to run on any processor.
- Core Hardware (dm_core_hw).
The following applies to PA systems only:
- JAGaf12285
The aplsrv ID for this monitor has been restricted to five letters (i.e., "CorHW") to prevent possible confusion with other monitors.
- JAGaf14904
In the event descriptions for events 37 and 38, generated by this monitor, the partition was misspelled: this has been corrected.
- JAGaf22356
Events 69, 70, and 71 of the dm_core_hw EMS monitor were printing warnings in the event text, even if everything was correct in the event text. This behaviour has been corrected.
- Core Hardware for Itanium (ia64_corehw).
The following changes are common to both HP Integrity Servers and Intel® Itanium®-based workstations (IA systems), and to HP 9000 Servers and RISC architecture-based workstations (PA systems):
- JAGaf29389
The ia64_corehw.dict file has been updated to reflect the fact that the monitor now runs on both IA-64 and PA systems, and that it gathers the input required for the fpl_em monitor.
- JAGaf04964
The monitor has been corrected to stop generating multiple processes, and to eliminate defunct processes on HP systems.
- JAGaf18393
The monitor manpage has been updated to include descriptions of the ia64_corehw.cfg and default_ia64_corehw.clcfg files.
- JAGaf05703
The monitor has been corrected to generate events 101011 and 101012, instead of events 101001 and 101002, for temperature-related follow-up events on HP systems. Events 101011 and 101012 have also been reworded to clarify that the action taken will be that specified in the envd config file.
- JAGaf12894
Event 104010 was reworded to cover situations where redundancy was regained, or where redundancy was observed to be present. Also, Event 104011 was reworded to cover situations where redundancy was lost, or where redundancy was observed to be absent.
The following applies to IA systems only:
- JAGaf12894
Event 104010 was reworded to cover situations where redundancy was regained, or where redundancy was observed to be present. Also, Event 104011 was reworded to cover situations where redundancy was lost, or where redundancy was observed to be absent.
- Core Hardware Monitor -- Asama (ipfcorehw_asama).
- N/A
- Core Hardware Monitor -- Hitachi (ipfcorehw_hitachi).
- N/A
- CPE Monitor (cpe_em).
The following applies to IA systems only:
- JAGaf18376
The manpage for the CPE monitor has been enhanced to refer to the cpe_em.cfg and default_cpe_em.clcfg files.
- JAGaf06496; JAGaf06269; JAGaf08063
Fixes were made for the following defects:
JAGaf06496 : CPE monitor should ignore Platform-specific Data for rx4640 errors.
JAGaf06269 : CPE monitor needs to ignore inconsistent rx4640 data and datalength.
JAGaf08063 : CPE monitor misinterprets value of Error Type.
- CPU Monitor (lpmc_em)
The following applies to PA systems only:
- JAGaf11023
The CPU monitor was stopping and restarting, due to possible overflow of open file descriptors.
- JAGae99200
The CPU monitor was reporting the error as a cache error, in the event description for floating point errors. This has been fixed.
- JAGaf20319
The CPU monitor has been fixed, so that it reports the correct serial number in the event error details.
- JAGaf18666
The CPU monitor has been fixed, so that it works even when Shared Cache is absent.
- CPU Monitor -- Hitachi (cmc_em_hitachi).
- N/A
- Disk Array FC60 Monitor (fc60mon).
The following changes are common to both HP Integrity Servers and Intel® Itanium®-based workstations (IA systems), and to HP 9000 Servers and RISC architecture-based workstations (PA systems):
- JAGae97568
Unclear recommended actions reported by fc60mon in Events 4 and 37, and improper event descriptions reported by fc60mon in Event 9, have been corrected.
- Disk Monitor (disk_em).
- rx9610 is replaced by Nike.
- JAGaf16966
On machines supporting hot-swappable disks, events with a SERIOUS severity level were being generated at boot time, because the disks were not spinning automatically.
- Fibre Channel Adapter (ql_adapter)
- N/A
- Fibre Channel Adapter Model A5158 Monitor (dm_TL_adapter).
- JAGaf06925
Support is required for the new event code ECC_FCP_LESS_INQ_DATA 1012. Support is provided for the fcp cdio event message ECC_FCP_LESS_INQ_DATA.
- JAGaf17992
Event message TLOG_RESUME_FAILED needs change.
- Fibre Channel SCSI Multiplexer (dm_fc_scsi_mux).
- N/A
- Fibre Channel Switch (dm_fc_sw).
- N/A
- Forward Progress Log (FPL) Monitor (fpl_em)
The following changes are common to both HP Integrity Servers and Intel® Itanium®-based workstations (IA systems), and to HP 9000 Servers and RISC architecture-based workstations (PA systems):
- JAGaf20670
The monitor was incorrectly logging a set of error messages to the /var/opt/resmon/log/api.log file. The monitor continued to function properly after logging these messages. This problem has been fixed: the monitor no longer logs faulty error messages to the api.log file.
- JAGaf18378
The fpl_em monitor's manpage was updated to reflect the correct path for the fpl_em.sapcfg and api.log files. Also, additional text was included for the fpl_em.cfg and default_fpl_em.clcfg files.
- High Availability Disk Array Monitor (ha_disk_array).
- N/A
- High Availability Storage System (dm_ses_enclosure)
- N/A
- iSCSI Driver Subsystem Monitor (dm_iscsi_adapter)
- JAGaf22601
Event messages generated by the monitor need to be modified.
- Kernel Resource Monitor (krmond)
- N/A
- Memory (dm_memory)
The following applies to PA systems only:
- JAGaf18850
The serial number displayed by the dm_memory monitor was incorrect.
- JAGae44296
The Serial Number and Part Number of the DIMM will be reported for all Single Bit Error related events by the memory monitor.
- Memory IA64 (memory_ia64)
The following applies to IA systems only:
- JAGaf25908
The term "RANK" should be replaced with the term "ECHELON", when displaying the DIMM Location of a multi-bit error (MBE) in the STM Logtool's vd/vda commands, and the Memory Information Tool on HP Integrity Servers that are cellular systems (hp superdome server SD16A, hp superdome server SD32A, hp superdome server SD64A, hp server rx8620, hp server rx7620). This is because the term "RANK" means something different on these systems, and the correct terminology to use on these systems is "ECHELON".
Consequently, the memory_ia64 monitor has been modified to replace the term "RANK" with the term "ECHELON", when displaying the DIMM Location of MBE on HP Integrity Servers that are cellular systems. With this change, both the Logtool's vd/vda commands and the Memory Information Tool will reflect the use of the term "ECHELON", when displaying the DIMM Location of MBE on HP Integrity Servers that are cellular systems. The term "RANK" is still used, however, on HP Integrity Servers that are non-cellular systems.
- JAGaf18397
The IPF Memory Monitor (memory_ia64) has been updated to include information about the memory_ia64.cfg and default_memory_ia64.clcfg files in the "Files" section of the IPF Memory Monitor's manpage.
- Memory Monitor -- Hitachi (ipfmemory_hitachi).
- N/A
- MSA1000 Storage Disk Array Monitor (msamon)
The following applies to PA and IA systems:
- JAGae98401
A new monitor is being added to monitor the HP MSA 1000 Storage Disk Array.
- From HP-UX 11i V2 March 2004 release of OnlineDiag, msamon monitors HP Storage Works Modular SAN Array 1000 supported on the following cards:
- A6795A
- A6826A
- From HP-UX 11i June 2004 release and HP-UX 11i V2 September 2004 release of OnlineDiag, msamon also monitors HP Storage Works Modular SAN Array 1000 supported on the following cards:
- A9782A (2 Gbps, 64-bit PCI Dual Channel combo)
- A9784A (2 Gbps, 64-bit PCI Dual Channel combo)
- Peripheral Status Monitor (psmmon).
- N/A
- RAID Adapter (dm_raid_adapter)
- JAGaf08549
Error message provides only the cause of the error and does not provide the error description.
- Remote Monitor (RemoteMonitor).
- N/A
- SCSI Disk Monitor (scsi_disk).
- N/A
- SCSI Tape Monitor (dm_stape).
- N/A
- System Status Monitor (sysstat_em)
- N/A
- UPS Monitor (dm_ups).
- N/A
Changes to Platform and Interface
The following changes are common to both HP Integrity Servers and Intel® Itanium®-based workstations (IA systems), and to HP 9000 Servers and RISC architecture-based workstations (PA systems):
- JAGaf32486
The monconfig utility was displaying the following error when monitoring was disabled or killed:
____ERROR__________
usage: grep [-E|-F] [-c|-l|-q] [-bhinsvx] -e pattern_list... [-f pattern_file...] [file...]
usage: grep [-E|-F] [-c|-l|-q] [-bhinsvx] [-e pattern_list...] -f pattern_file... [file...]
usage: grep [-E|-F] [-c|-l|-q] [-bhinsvx] pattern [file...]
____________ERROR______________
The problem has been fixed, so that these errors are no longer displayed.
- JAGaf30395
The send_test_event test utility was not finding the monitor name in the .sapcfg file associated with the monitor when more than 256 entries were found across all the .sapcfg files together. The following error was displayed:
send_test_event: Failed to find monitor name in sapcfg files.
The send_test_event test utility has now been fixed to operate as expected when more than 256 entries are found across all .sapcfg files taken together. The utility now supports up to 8192 entries across all .sapcfg files taken together.
- JAGaf23077
Monconfig was sometimes corrupting Monitoring Requests while adding or modifying them, when more than 256 entries were found across all sapcfg files taken together. Monconfig has been fixed to allow Monitoring Requests to be added or modified with up to 8192 entries across all sapcfg files taken together.
The following error is displayed once there is no more space available to accommodate the monitoring entries:
ERROR: Monconfig cannot add this Monitoring Request. Buffer holding Monitoring Entries has no more space. Delete or modify existing Monitoring Request(s) in order to add this Monitoring Request successfully.
- JAGaf22321
While updating the monitor's .cfg files for configuration verbs, the user would see the following error in the api.log file:
-------------------Start Event--------------------
User event occurred at Mon May 3 17:37:55.709202 2004
Process ID: 5520 (/usr/sbin/stm/uut/bin/.../memory_ia64)
Log Level: Error
The event (100140) specified on DEFINE EVENT verb in the monitor's configuration file (/var/stm/config/tools/monitor/memory_ia64.cfg) or the Global configuration file could not be configured because of an internal memory error.
Possible Causes/Recommended Action:
A maximum of 1000 events can be configured for a monitor. Internal Application error.
-------------------End Event----------------------
Any changes made to configuration verbs are not effective when this problem occurs. This error was being displayed even if the user had fewer than 1000 events configured for a monitor. The problem is now corrected.
- JAGaf06008
The 'time_window' value in the clcfg file for hardware monitors is not used when events are generated. This was noticed with dm_memory, where event 4000 was triggered after 20 total occurrences of an SBE on a DIMM, not 20 within 24 hours. This problem appears only during the next startup of diagnostics after the cfg is modified and the diagnostics are brought down. This problem is applicable to all the hardware monitors.
The following changes apply to IA systems only:
- JAGae94038
The set_fixed -L command, used to display all resources in the DOWN state, dumps core on IA systems running 11iV2.
Customer-Visible Interface Changes
- N/A
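The JAGaf06008 entry under Changes to Platform and Interface turns on the difference between counting all occurrences and counting occurrences inside a time window. The sketch below contrasts the two readings. It is purely illustrative: the figures of 20 occurrences and 24 hours come from the dm_memory example in that entry, while the function names, the use of seconds as the unit, and the representation of occurrences as a list of timestamps are assumptions and do not reflect the clcfg file format or the monitor code.

```python
HOUR = 60 * 60


def total_count_trips(timestamps, threshold):
    """Reading in which the time window is ignored: only the total matters."""
    return len(timestamps) >= threshold


def windowed_count_trips(timestamps, threshold, time_window):
    """Reading in which `threshold` occurrences must fall within `time_window`."""
    ts = sorted(timestamps)
    start = 0
    for end, t in enumerate(ts):
        # Advance the left edge until the span ts[start]..t fits in the window.
        while t - ts[start] >= time_window:
            start += 1
        if end - start + 1 >= threshold:
            return True
    return False


# Twenty single-bit errors, one every two days (hypothetical data).
sparse = [i * 48 * HOUR for i in range(20)]
# Twenty single-bit errors, one every hour (hypothetical data).
dense = [i * HOUR for i in range(20)]
```

Under these assumptions, the sparse series trips the total-count check but not the windowed check with a 24-hour window, while the dense series trips both.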
Known Problems
If the SysFaultMgmt product is installed on your system, you must take the following steps, before and after updating the OnlineDiag product:
1. Shut down the SysFaultMgmt subsystem via the command line:
/sbin/init.d/cimserver stop
2. Perform the OnlineDiag update.
3. After the update is completed, restart the SysFaultMgmt subsystem via the command line:
/sbin/init.d/cimserver start
NOTE: Should you need to kill monitoring during normal system operation, you must use steps 1 and 3, above, to avoid an automatic restart of monitoring; step 2 is the action you need to perform while monitoring is turned off.
Future updates to SysFaultMgmt and OnlineDiag will resolve this problem.
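The stop, update, restart sequence above could be wrapped in a small helper so that the order is not forgotten. This is a sketch only: the cimserver commands are the ones given in these notes, while the function name `update_with_sysfaultmgmt_stopped`, the `update_action` callback, and the injectable `run` parameter are hypothetical and exist so the sequence can be rehearsed without touching a live system.

```python
import subprocess

CIMSERVER = "/sbin/init.d/cimserver"  # path as given in these notes


def update_with_sysfaultmgmt_stopped(update_action, run=subprocess.check_call):
    """Stop SysFaultMgmt, perform the given action, then restart SysFaultMgmt.

    update_action: a callable that performs the OnlineDiag update (or any
                   other work that needs monitoring turned off).
    run:           a callable that executes a command given as a list;
                   defaults to subprocess.check_call.
    """
    run([CIMSERVER, "stop"])    # step 1: shut down the subsystem
    update_action()             # step 2: the work done while it is down
    run([CIMSERVER, "start"])   # step 3: restart the subsystem
```

Note that in this sketch the restart is skipped if the action raises an exception, which leaves the subsystem stopped so the failure can be investigated first. Passing a recording function as `run` gives a dry run that only lists the commands.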
PROBLEM: Memory Page Deallocation (MPD) is not available on the RX 4610 computer.
The Memory Page Deallocation (MPD), which runs on most current HP-UX computer systems, does not work on the RX 4610 computer. If you look in the activity log for memlogd, you will see a message saying, "unsupported device."
MPD cannot be implemented on RX 4610 because of the design of that system. The memlogd daemon cannot run on it.
PROBLEM: dm_fc_hub and dm_fc_sw monitors not functional.
In the June 2003 release of 11i V2.0, the Fibre Channel Arbitrated Loop Hub monitor (dm_fc_hub) and the Fibre Channel Switch monitor (dm_fc_sw) are probably not functional, because these monitors depend on SNMP functionality which may not be included in this release (check the latest version of the Web page for the EMS Release Notes for the most current information).
Monitors Provided
For the September 2004 release of HP-UX 11.23, the following monitors are scheduled to be available:
- CMC Monitor (cmc_em)
- Core Hardware (dm_core_hw)
- Core Hardware for Itanium (ia64_corehw)
- Core Hardware Monitor -- Asama (ipfcorehw_asama)
- Core Hardware Monitor -- Hitachi (ipfcorehw_hitachi)
- CPE Monitor (cpe_em)
- CPU Monitor (lpmc_em)
- CPU Monitor -- Hitachi (cmc_em_hitachi)
- Disk (disk_em)
- Disk Array FC60 (fc60mon)
- Fibre Channel Adapter (ql_adapter)
- Fibre Channel Adapter Model A5158 (dm_TL_adapter)
- Fibre Channel Arbitrated Loop Hub (dm_fc_hub)
- Fibre Channel SCSI Multiplexer (dm_fc_scsi_mux)
- Fibre Channel Switch (dm_fc_sw)
- Forward Progress Log (FPL) Monitor (fpl_em)
- High Availability Disk Array (ha_disk_array)
- High Availability Storage System (dm_ses_enclosure)
- iSCSI Driver Subsystem Monitor (dm_iscsi_adapter)
- Kernel Resource Monitor (krmond)
- Memory (dm_memory)
- Memory IA64 (memory_ia64)
- Memory Monitor -- Hitachi (ipfmemory_hitachi)
- MSA1000 Storage Disk Array Monitor (msamon)
- Peripheral Status Monitor (psmmon)
- RAID Adapter (dm_raid_adapter)
- Remote Monitor (RemoteMonitor)
- SCSI Disk Monitor (scsi_disk)
- SCSI Tape Devices (dm_stape)
- System Status (sysstat_em)
- UPS (dm_ups)
The following monitors are NOT provided:
- dm_FCMS_adapter
- fw_disk_array: hardware not supported on system
- scsi123_em: hardware not supported on system
For detailed information concerning which products are supported by which monitors and additional dependencies, check the "Diagnostics" section of Hewlett-Packard's online documentation web site: http://docs.hp.com/hpux/diag/
Monitor Dependencies
Several of the monitors have special requirements, such as patches or certain versions of firmware. In particular:
- The Fibre Channel Arbitrated Loop Hub Monitor and the Fibre Channel Switch Monitor require special configuration which is described in their data sheets in the "EMS Hardware Monitors User's Guide" (chapter 6). A patch is also required.
- A patch is required if your system includes an HP SureStore E Disk Array FC60. This patch is required to run the EMS hardware monitor (fc60mon) or STM tools for this device.
For a list of the current required patches, see the DIAGNOSTIC.readme file for this release.
Current monitor requirements are described in the "Supported Products" page under "EMS Hardware Monitors" at http://docs.hp.com/hpux/diag and are also listed in chapter 2 of the manual "EMS Hardware Monitors User's Guide".
Defect Reporting
Use CHART to report defects in the EMS Hardware monitors. The project name is diag.hw_mon.hpux. If you don't have access to CHART, contact an HP representative to enter a defect for you.
SD Product Structure
The EMS hardware monitors are installed as part of the OnlineDiag bundle (product number B4708AA). In addition, they utilize the EMS framework, product number B7609BA.
For information on the STM product, refer to the STM release notes file /usr/sbin/stm/Rel_NOTES.STM.
SD Bundle: OnlineDiag Description: On-line Diagnostic System (Series 800/700)
  SD PRODUCT: Sup-Tool-Mgr Description: Support Tools Manager for HP-UX Systems
    SD SUB-PRODUCT: Manuals Description: Support Tools Manager Manual Pages
      FILESET: RELEASE_NOTES Description: HPUX STM Release Notes
      FILESET: STM-MAN Description: HPUX STM Manual Pages
    SD SUB-PRODUCT: Runtime Description: STM Manual Runtime
      FILESET: STM-CATALOGS Description: HPUX STM Shared Libraries
      FILESET: STM-SHLIBS Description: HPUX STM Shared Libraries
      FILESET: STM-UI-RUN Description: HPUX STM User Interface
      FILESET: STM-UUT-RUN Description: HPUX STM Unit Under Test Runtime
  SD PRODUCT: EMS-Config Description: EMS Config
    FILESET: EMS-GUI Description: Event Monitoring Service Graphical User Interface
  SD PRODUCT: EMS-Core Description: EMS Core Product
    FILESET: EMS-CORE Description: Event Monitoring Service Core Files