These release notes cover the June 2002 release of the Support Tools (diagnostics) for HP-UX 11i V1.6.
- Overview
- Configuring Hardware Monitoring
- Documentation
- Changes
- Known Problems
- Monitors Provided
- Monitor Dependencies
- Defect Reporting
- SD Product Structure
Included with the OnlineDiag bundle of support tools are the EMS Hardware Monitors - an important tool for maintaining system availability. The EMS hardware monitors allow you to monitor the operation of a wide variety of hardware products and be alerted immediately if any failure or other unusual event occurs.
Hardware event monitoring provides a high level of protection against system hardware failure. By using hardware event monitoring, you can eliminate most undetected hardware failures that could interrupt system operation or cause data loss.
Configuring Hardware Monitoring
The EMS Hardware Monitors are installed at the same time as the Support Tools Manager. Once the monitoring software is installed, monitoring is automatically enabled.
By default, messages regarding major warning, serious, and critical events that occur on monitored hardware will be:
- Written to /var/adm/syslog/syslog.log
- Sent to EMAIL address root
All events will be stored in /var/opt/resmon/log/event.log.
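The default routing described above can be pictured with a small sketch. This is only an illustration of the rule, not code from the monitors: the `Severity` enum, the two lower severity names, and the `default_targets` helper are assumptions introduced here.

```python
from enum import IntEnum
from typing import List


class Severity(IntEnum):
    # Only "major warning", "serious" and "critical" are named in these
    # notes; the two lower levels are assumed here for illustration.
    INFORMATION = 1
    MINOR_WARNING = 2
    MAJOR_WARNING = 3
    SERIOUS = 4
    CRITICAL = 5


EVENT_LOG = "/var/opt/resmon/log/event.log"
SYSLOG = "/var/adm/syslog/syslog.log"
EMAIL_ROOT = "email:root"  # hypothetical label for "EMAIL address root"


def default_targets(severity: Severity) -> List[str]:
    """Return the default destinations for an event of this severity."""
    targets = [EVENT_LOG]  # every event is stored in the event log
    if severity >= Severity.MAJOR_WARNING:
        # major warning, serious and critical events are also
        # written to syslog and mailed to root
        targets += [SYSLOG, EMAIL_ROOT]
    return targets
```

Under these assumptions, a lower-severity event would appear only in event.log, while a serious event would reach all three destinations.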
To configure, enable, or disable hardware event monitoring, run the monitoring request manager: /etc/opt/resmon/lbin/monconfig
The Peripheral Status Monitor (PSM) and the Kernel Resource Monitor (krmond) are configured differently: they use the EMS GUI. See: http://docs.hp.com/hpux/onlinedocs/diag/ems/ems_gui.htm
For the latest and most complete information on EMS Hardware Monitors and the Support Tools Manager (STM), see the Diagnostics section of Hewlett-Packard's online documentation Web site at:
http://docs.hp.com/hpux/diag/
At this site, you will find Overviews, Tutorials, Quick Reference Cards, Frequently Asked Questions (FAQs), and much other material.
For complete information on installing and using EMS hardware monitors, as well as a list of supported hardware, refer to the "EMS Hardware Monitors User's Guide" available at the above site.
For the most current information on HP-UX 11i V1.6 diagnostics, see the following Web pages at the Diagnostics site:
- "DIAGNOSTICS.readme for HP-UX 11i V1.6 (June 2002)" at:
http://docs.hp.com/hpux/onlinedocs/diag/st/str_1122.htm
- "Release Notes for STM on HP-UX 11i V1.6 (June 2002)" at:
http://docs.hp.com/hpux/onlinedocs/diag/stm/str_1122.htm
- "Release Notes for EMS Hardware Monitors (HP-UX 11i V1.6, June 2002)" at:
http://docs.hp.com/hpux/onlinedocs/diag/ems/emr_1122.htm
For 11i V1.6, the EMS hardware monitors use version A.03.30 of the EMS platform. HP-UX 11i V1.6 does not support the full functionality of the EMS platform. However, all EMS functionality required by the hardware monitors is provided.
The "SNMP" notification method that could be configured for the EMS hardware monitors in previous releases will probably NOT be available to monitors running on HP-UX 11i V1.6. (Check the latest Web page version of the EMS Release Notes for the most current information.)
Memory Page Deallocation (MPD) and the memlogd daemon are not implemented on the RX 4610 computer.
Changes in the EMS Hardware Monitors for the June 2002 release include:
- Changes to Multiple Monitors
- Changes to Individual Monitors
- Changes to Platform and Interface
- Customer-Visible Interface Changes
Changes to Multiple Monitors
N/A
Changes to Individual Monitors
Changes to each monitor are described below. (Monitors are listed in alphabetical order.)
- AutoRAID Disk Array (armmon).
- JAGad51518; JAGad75425; JAGad73813; JAGad58875
The following JAGS have been fixed in this armmon release:
- JAGad51518: armmoncfg.clcfg file contains wrong information.
- JAGad75425: Template versions of monitor should eventually be used on an 11.x OS.
- JAGad73813: The catalog for armmon and fc60mon refer to SCSI-TEMPLATE.
- JAGad58875: EMS monitors fc60mon and armmon are started even if no array is connected.
- Chassis Code Monitor (dm_chassis).
N/A
- CMC Monitor (cmc_em).
- Updated CMC monitor.
- Core Hardware for Itanium (ia64_corehw).
- JAGae32973
The monitor was enhanced for the 11.22 release to identify the platform (Ironman, Everest, ...) so that it can take different actions accordingly. In the code that checks the box_id of the platform, AzusA was not listed, and therefore the monitor would not run on that platform. This submittal fixes the problem. The CHART info is: JAGae32973 - CoreHW monitor does not recognize AzusA boxes.
- JAGae28704
The CoreHW monitor is now enhanced to briefly explain the cause of a Critical Interrupt on IA-64 systems. Following is the current content of the Event Details for events 114000-114999:
Event Details :
Event Date .............: Thu Jun 6 16:17:50 2002
Sensor Number ..........: 0xfb
Sensor Type ............: Critical Interrupt
Sensor Class ...........: Sensor specific
Sensor Reading/Offset...: 0000 (Offset)
Event Type.............: Assertion
Entity ID ..............: Unknown
Entity FRU Id Info......: Unknown
The new content of the Event Details will include a brief explanation of the Sensor Reading/Offset value, as below:
Event Details :
Event Date .............: Thu Jun 6 16:17:50 2002
Sensor Number ..........: 0xfb
Sensor Type ............: Critical Interrupt
Sensor Class ...........: Sensor specific
Sensor Reading/Offset...: 0000 (Offset) Diagnostic Interrupt/FrontPanel NMI
Event Type.............: Assertion
Entity ID ..............: Unknown
Entity FRU Id Info......: Unknown
An important thing to remember is that the text above explains just one value of the Sensor Reading/Offset. There are ten (10) possible values for the Critical Sensor, and their meanings are:
0x00 = Diagnostic Interrupt / FrontPanel NMI
0x01 = Bus Timeout
0x02 = I/O channel check NMI
0x03 = Software NMI
0x04 = PCI PERR
0x05 = PCI SERR
0x06 = EISA Fail Safe Timeout
0x07 = Correctable Bus Error
0x08 = Uncorrectable Bus Error
0x09 = Fatal NMI (port 61h, bit 7)
- JAGae27752
A fix was made for JAGae27752. Prior to the fix, as per the original design, when the monitor on an 11.22 system received a SEL event that it did not understand, it shut the system down. It would also shut the system down for an overtemp condition. The monitor's design has been changed so that it does NOT shut the system down when it does not understand a SEL event. After the change, the monitor behaves as follows: On Ironman and AzusA systems, when the monitor receives a SEL event indicating that the critical temperature limit is being reached, it shuts the system down. On LongsPeak and Everest systems, the BMC is still responsible for shutting the system down in critical conditions, including overtemp conditions.
- JAGae22162
The monitor now checks whether a sensor is ENABLED before reading its value.
- Disk Array FC60 Monitor (fc60mon).
N/A
- Disk Monitor (disk_em).
- JAGae09750; JAGae31385
JAGae09750: disk_em will monitor fixed disks with an HP-supported firmware version, and also specific products if they are fixed disks. After this change, the disk_em monitor works as follows:
a) The monitor ignores any removable-medium disk devices, such as MO and CD-ROM.
b) The monitor monitors a specific set of product numbers, if they are FIXED disks.
The disk_em monitor determines the set of disks to monitor by getting the following information about each device:
1. Does the disk have an HP-supported firmware version?
2. Does the disk belong to the specific set of product numbers?
3. Is the disk of type FIXED?
4. Is the device of type removable medium?
Based on the above information, the monitor determines whether to monitor the device, as shown in the table below:
CRITERIA         MONITOR   NOT-MONITOR
1 AND 3          Yes       -----
2 AND 3          Yes       -----
1 AND 2 AND 3    Yes       -----
1 AND 4          -----     Yes
2 AND 4          -----     Yes
1 AND 2 AND 4    -----     Yes
The specific set of product numbers will be updated on the following web page: http://wojo.rose.hp.com/hpux/onlinedocs/diag/ems/emd_disk.htm
JAGae31385: Implemented a defect table in the disk_em monitor that specifies the maximum number of allowable defects based on the disk capacity. The disk_em monitor checks the current defects of the drive against this table to determine whether to generate an event 4 message.
HDD Capacity       P+G List Threshold   G List Only Threshold
2 GB and <2 GB     1024                 NA
4 GB               2048                 NA
9 GB and Above     NA                   8190
NOTE: The 9 GB drive with model number ST19171 is treated as belonging to the 4 GB category; hence, the maximum allowed defects for this drive is 2048 (P+G), because this drive supports a P+G list of at most 2900. This information will be updated on the following web page: http://wojo.rose.hp.com/hpux/onlinedocs/diag/ems/emd_disk.htm
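The two disk_em rules above (which disks are monitored, and when the defect count is too high) can be expressed as a short sketch. It is an illustration only, not the monitor's code. All function and key names are hypothetical. The sketch also assumes that capacities between the listed classes fall into the 4 GB class, that the ST19171 model is matched by prefix, and that "too many defects" means strictly more than the threshold.

```python
def should_monitor(supported_firmware, in_product_list, is_fixed, is_removable):
    """Apply the monitoring criteria table: removable media are never
    monitored; fixed disks are monitored when they have supported
    firmware or appear in the product number list."""
    if is_removable:
        return False
    return bool(is_fixed and (supported_firmware or in_product_list))


# (P+G list threshold, G-list-only threshold); None stands for "NA".
DEFECT_THRESHOLDS = {
    "2GB_or_less": (1024, None),
    "4GB": (2048, None),
    "9GB_or_more": (None, 8190),
}


def capacity_class(capacity_gb, model=""):
    """Pick the threshold row for a drive (boundaries are assumptions)."""
    if model.startswith("ST19171"):
        return "4GB"  # 9 GB drive treated as 4 GB class, per the NOTE
    if capacity_gb <= 2:
        return "2GB_or_less"
    if capacity_gb < 9:
        return "4GB"  # assumption: in-between sizes use the 4 GB row
    return "9GB_or_more"


def exceeds_defect_limit(capacity_gb, p_list, g_list, model=""):
    """Return True when the drive's defects exceed the table threshold."""
    pg_limit, g_limit = DEFECT_THRESHOLDS[capacity_class(capacity_gb, model)]
    if pg_limit is not None:
        return p_list + g_list > pg_limit
    return g_list > g_limit
```

For example, under these assumptions `exceeds_defect_limit(9, 2000, 100, model="ST19171")` is evaluated against the 2048 P+G threshold rather than the 8190 G-list threshold.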
- JAGae20019; JAGae28126
A fix was made to stop the disk_em monitor from generating error messages (event 4) for external disk drives (SC-10 and DS2100) connected through a QLogic 12160 Ultra3 SCSI card (HBA) when the devices are in a good state.
- Fibre Channel Adapter Model A5158 Monitor (dm_TL_adapter).
- JAGae22655; JAGae26505
Added support for 2 Gb Fibre Channel Adapter A6795A.
- Fibre Channel SCSI Multiplexer (dm_fc_scsi_mux).
N/A
- High Availability Disk Array Monitor (ha_disk_array).
- JAGae11549
Whenever a PCI card is suspended, the ha_disk_array monitor will not generate events for the devices connected to the suspended PCI card. This has been fixed.
- JAGae05958
A fix was made for JAGae05958 (ha_disk_array can potentially fill up the file system).
- High Availability Storage System (dm_ses_enclosure)
- JAGae11549
Whenever a PCI card is suspended, the dm_ses_monitor will not generate events for the devices connected to the suspended PCI card.
The dm_ses_enclosure monitor is Predictive Enabled (multi-view) for the 11.22 release.
- JAGae05962
Fixed JAGae05962 (there is a possibility of the dm_ses_enclosure monitor filling up the file system).
- Kernel Resource Monitor (krmond)
N/A
- Memory IA64 (memory_ia64)
- Added support for an EMS memory monitor on IA-64 to support the following IA-64 systems:
System Name: System Model String
Everest: ia64 hp server rx2600
Everest DC-: ia64 hp server rx5630
Long's Peak: ia64 hp workstation zx6000, ia64 hp server rx2600
Wilson Peak: ia64 hp workstation zx2000
- Peripheral Status Monitor (psmmon).
- Fixed psmmon to recover from a failure to determine whether psmctd is running. Prior to the fix, if psmmon could not determine whether psmctd was running, it returned RM_ERROR_TYPE on all requests for device state, and it continued to report this error even after it was later able to determine that psmctd was running.
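The general idea behind this kind of fix is to re-check the dependency on every request instead of remembering an earlier failure. The sketch below illustrates that pattern only; it is not psmmon's implementation. The `DeviceStateSource` class and both callables are hypothetical, and the `RM_ERROR_TYPE` value is a placeholder for the name used in these notes.

```python
from typing import Callable

RM_ERROR_TYPE = "RM_ERROR_TYPE"  # placeholder value for illustration


class DeviceStateSource:
    """Answers device-state requests, depending on a daemon being up."""

    def __init__(self,
                 daemon_confirmed_running: Callable[[], bool],
                 read_state: Callable[[str], str]) -> None:
        # daemon_confirmed_running returns True only when the check
        # succeeded AND the daemon is running (hypothetical probe).
        self._probe = daemon_confirmed_running
        self._read_state = read_state

    def get_state(self, device: str) -> str:
        # Probe on every request: a failed check affects only this
        # request and is never latched for later ones.
        if not self._probe():
            return RM_ERROR_TYPE
        return self._read_state(device)
```

With a probe that fails once and then succeeds, the first request returns the error value and the next request returns the real device state.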
- Remote Monitor (RemoteMonitor).
- Initial submittal of the Remote Monitor for HP-UX 11.20.
- SCSI Disk Monitor (scsi_disk).
- JAGad96956; JAGae11549
Fixes were made to:
- Stop the monitor from getting info from Raid 4si disks.
- Support PCI suspend/resume in the monitor.
- JAGae96420; JAGad95550; JAGad94427; JAGad94330
Fixes were made to stop polling devices such as Magneto Optical disks, Mitsumi CD-ROM, and HP DVD ROM 305, and to handle statuses returned by tlscsidev such as STE_SCSI_DEV_INCOMPLETE.
- SCSI Tape Monitor (dm_stape).
- JAGae28998
The error code STE_SCSI_DEV_INIT_FAILED was not being handled within perform_polling.c (prior to A.25 this error code was handled as a default condition, but the default error handling was changed in perform_polling.c revision 1.29). The code changes required were the addition of this error code to the switch statement within perform_polling.c, as well as the inclusion of this error code in the table within perform_polling.h.
- System Status Monitor (sysstat_em)
N/A
- UPS Monitor (dm_ups).
- JAGae14473
If the dm_ups EMS monitor is running and the /etc/ups_conf file contains a "upstty" entry that does not end in ":SOLA", then dm_ups will erroneously generate event #43. This bug has been fixed.
- JAGae05950; JAGab67905
Customers have complained about the temporary format file /var/tmp/dm_ups.fmt filling up their disk. This version of dm_ups removes the dm_ups.fmt file after each event is formatted and logged.
Changes to Platform and Interface
- JAGad99498
Increased the maximum number of disabled instances that can be in the /var/stm/data/tools/monitor/disabled_instances file to 1024.
Modified moncheck, toggle_switch, and startmon_client (used by monconfig to check monitoring requests, disable monitoring, and enable monitoring, respectively) and psmctd to use the new rm_service_up routine in the EMS library to check whether the EMS services are available before attempting to connect to EMS.
If the service is still unavailable when the 5 minutes expires, the code displays an error message to the user indicating that the registrar service has not been started: "EMS Registrar inetd service not started. Start registrar and retry".
This message will be displayed in monconfig when the user selects the K)ill or C)heck command. It is not displayed for an E)nable command as there is no communication between monconfig and the program that actually enables the monitors. However, if the service is not available and the user selects E)nable, the command will complete but the monitors will NOT be enabled and the state displayed in monconfig will indicate monitors are NOT enabled. No errors can be logged into the EMS error logs as logging requires connection to the registrar, which isn't started.
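The availability check with a 5-minute limit described above follows a common poll-until-deadline pattern, sketched below. This is not the EMS library code: `service_up` is a hypothetical stand-in for whatever the rm_service_up routine reports (its real signature is not given in these notes), and the polling interval is an assumption.

```python
import time

REGISTRAR_ERROR = ("EMS Registrar inetd service not started. "
                   "Start registrar and retry")


def wait_for_service(service_up, timeout_s=300.0, poll_s=5.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll service_up() until it returns True or timeout_s elapses.

    Returns True if the service became available, False on timeout.
    clock and sleep are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout_s
    while True:
        if service_up():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


def connect_or_report(service_up, **kwargs):
    """Return None on success, or the error text to show the user."""
    if wait_for_service(service_up, **kwargs):
        return None
    return REGISTRAR_ERROR
```

A caller in the position of the K)ill or C)heck commands would display the returned text when it is not None, instead of attempting the connection.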
For psmctd, an error will be logged into the System Activity log indicating that psmctd exited due to an initialization error, as shown below:
Wed Feb 27 15:29:12 2002: Daemon process (psmctd) with process identifier (26984) exited.
Wed Feb 27 15:29:12 2002: Daemon process completed with exit_status SYS_INIT_FAILED_EXIT (100) indicating the process exited because it could not perform basic initialization. Possible Causes/Recommended Action: Process internal error. For a remap hardware process, check the map log for more information.
Wed Feb 27 15:29:12 2002: Daemon process (psmctd) will not be restarted as restart attempts have exceeded the maximum allowed (5). Start daemon process manually using User Interface.
- JAGad94752
Added a description of what a monitoring request is to the text displayed by monconfig.
- Increased number of hardware paths allowed in disabled_instances file to 1024.
- Enhanced monconfig to perform error checking on the input of the client configuration file. In addition, monconfig will not allow a user to input a client configuration file if more than one monitor is selected in the Add or Modify command. This is because client configuration files are monitor specific and you cannot select one file and have it apply to multiple monitors. If the file name input is not in the correct directory, the following error will be displayed:
ERROR: File name must be in the directory /var/stm/config/tools/monitor/. Please re-enter: []
If the file name input is not in the correct format, the following error message will be displayed:
ERROR: File name must be of the format *_sysstat_em.clcfg. Please re-enter: []
If the file doesn't exist, the following warning will be displayed:
WARNING: File doesn't exist. Events will not be sent for this monitoring request until the file exists.
Customer-Visible Interface Changes
N/A
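As an illustration of the monconfig client configuration file checks listed under Changes to Platform and Interface, the sketch below applies the same three tests in order (directory, name format, existence). It is not monconfig's code. The `check_client_config` helper is hypothetical, and generalizing the `*_sysstat_em.clcfg` pattern to `*_<monitor>.clcfg` for other monitors is an assumption.

```python
import fnmatch
import os
import posixpath

CONFIG_DIR = "/var/stm/config/tools/monitor/"


def check_client_config(path, monitor, exists=os.path.exists):
    """Return None if the file is acceptable, else the message to show.

    path    -- file name entered by the user
    monitor -- name of the single selected monitor, e.g. "sysstat_em"
    exists  -- injectable existence check (defaults to os.path.exists)
    """
    directory, name = posixpath.split(path)
    if posixpath.normpath(directory) != posixpath.normpath(CONFIG_DIR):
        return "ERROR: File name must be in the directory %s." % CONFIG_DIR
    pattern = "*_%s.clcfg" % monitor  # assumed per-monitor name format
    if not fnmatch.fnmatchcase(name, pattern):
        return "ERROR: File name must be of the format %s." % pattern
    if not exists(path):
        return ("WARNING: File doesn't exist. Events will not be sent for "
                "this monitoring request until the file exists.")
    return None
```

Note that the last case is a warning rather than an error: per the notes, the request is accepted and events simply are not sent until the file exists.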
PROBLEM: Memory Page Deallocation (MPD) is not available on the RX 4610 computer.
Memory Page Deallocation (MPD), which runs on most current HP-UX computer systems, does not work on the RX 4610 computer. If you look in the activity log for memlogd, you will see a message saying, "unsupported device."
MPD cannot be implemented on RX 4610 because of the design of that system. The memlogd daemon cannot run on it.
PROBLEM: dm_fc_hub and dm_fc_sw monitors not functional.
In the June 2002 release of 11i V1.6, the Fibre Channel Arbitrated Loop Hub monitor (dm_fc_hub) and the Fibre Channel Switch monitor (dm_fc_sw) are probably not functional, because these monitors depend on SNMP functionality which may not be included in this release. (Check the latest version of the Web page for the EMS Release Notes for the most current information.)
For the June 2002 release of HP-UX 11i V1.6, the following monitors are scheduled to be available:
- AutoRAID Disk Array (armmon)
- CMC Monitor (cmc_em)
- Core Hardware for Itanium (ia64_corehw)
- Disk (disk_em)
- Disk Array FC60 (fc60mon)
- Fibre Channel Adapter Model A5158 (dm_TL_adapter)
- Fibre Channel Arbitrated Loop Hub (dm_fc_hub)
- Fibre Channel SCSI Multiplexer (dm_fc_scsi_mux)
- Fibre Channel Switch (dm_fc_sw)
- High Availability Disk Array (ha_disk_array)
- High Availability Storage System (dm_ses_enclosure)
- Kernel Resource Monitor (krmond)
- Memory IA64 (memory_ia64)
- Peripheral Status Monitor (psmmon)
- Remote Monitor (RemoteMonitor)
- SCSI Disk Monitor (scsi_disk)
- SCSI Tape Devices (dm_stape)
- System Status (sysstat_em)
- UPS (dm_ups)
The following monitors are NOT provided:
- dm_core_hw: replaced by ia64_corehw
- dm_FCMS_adapter
- fw_disk_array: hardware not supported on system
- lpmc_em: replaced by cmc_em
- scsi123_em: hardware not supported on system
For detailed information concerning which products are supported by which monitors and additional dependencies, check the "Diagnostics" section of Hewlett-Packard's online documentation web site: http://docs.hp.com/hpux/diag/
Several of the monitors have special requirements, such as patches or certain versions of firmware. In particular:
- The Fibre Channel Arbitrated Loop Hub Monitor and the Fibre Channel Switch Monitor require special configuration, which is described in their data sheets in the "EMS Hardware Monitors User's Guide" (chapter 6). A patch is also required.
- A patch is required if your system includes an HP SureStore E Disk Array FC60. This patch is required to run the EMS hardware monitor (fc60mon) or STM tools for this device.
For a list of the current required patches, see the DIAGNOSTIC.readme file for this release.
Current monitor requirements are described in the "Supported Products" page under "EMS Hardware Monitors" at http://docs.hp.com/hpux/diag . Requirements are also listed in chapter 2 of the manual "EMS Hardware Monitors User's Guide".
Use CHART to report defects in the EMS Hardware monitors. The project name is diag.hw_mon.hpux. If you don't have access to CHART, contact an HP representative to enter a defect for you.
The EMS hardware monitors are installed as part of the OnlineDiag bundle (product number B4708AA). In addition, they utilize the EMS framework, product number B7609BA.
For information on the STM product, refer to the STM release notes file /usr/sbin/stm/Rel_NOTES.STM.
SD Bundle: OnlineDiag
Description: On-line Diagnostic System (Series 800/700)
    SD PRODUCT: Sup-Tool-Mgr
    Description: Support Tools Manager for HP-UX Systems
        SD SUB-PRODUCT: Manuals
        Description: Support Tools Manager Manual Pages
            FILESET: RELEASE_NOTES
            Description: HPUX STM Release Notes
            FILESET: STM-MAN
            Description: HPUX STM Manual Pages
        SD SUB-PRODUCT: Runtime
        Description: STM Manual Runtime
            FILESET: STM-CATALOGS
            Description: HPUX STM Shared Libraries
            FILESET: STM-SHLIBS
            Description: HPUX STM Shared Libraries
            FILESET: STM-UI-RUN
            Description: HPUX STM User Interface
            FILESET: STM-UUT-RUN
            Description: HPUX STM Unit Under Test Runtime
    SD PRODUCT: EMS-Config
    Description: EMS Config
        FILESET: EMS-GUI
        Description: Event Monitoring Service Graphical User Interface
    SD PRODUCT: EMS-Core
    Description: EMS Core Product
        FILESET: EMS-CORE
        Description: Event Monitoring Service Core Files