These release notes cover the December 2002 release of Support Plus for HP-UX 11i (11.11) running on S800/S700 systems.
The Support Tools Manager (STM) provides a complete set of online support tools for HP-UX systems, enabling you to verify and troubleshoot PA-RISC system hardware, and to examine system logs.
STM offers several tool types, including information tools, verifiers, exercisers, expert tools, firmware update tools, diagnostics and utilities.
Installed with STM (as of IPR 9902) are the EMS Hardware Monitors, important tools for maintaining system availability. The EMS Hardware Monitors allow you to monitor the operation of a wide variety of hardware products and to be alerted immediately if a failure or other unusual event occurs. For more information, see /usr/sbin/stm/Rel_NOTES.HWE.
For the latest and most complete information on STM and EMS Hardware Event Monitors, see the Web page "Diagnostics":
http://docs.hp.com/hpux/diag/
At this site, you will find Overviews, Tutorials, Quick Reference Cards, Frequently Asked Questions (FAQs), and much other material.
The online Support Tools Manager (STM) was enhanced and updated for the current release.
Changes to User Interface and Platform
Removed the variable address of type int, because an int cannot hold a value of type unsigned long long. This removes the cause of the address truncation.
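The truncation described above can be illustrated with a small sketch. It is not the STM code: the helper names and the sample address are hypothetical, and the signed result assumes the usual two's-complement behavior of a 32-bit C int.

```python
# Sketch: what happens when a 64-bit address is stored in a 32-bit signed int.
# The helper names and the sample address are hypothetical.

def to_int32(value):
    """Keep the low 32 bits and reinterpret them as a signed 32-bit int."""
    low = value & 0xFFFFFFFF
    return low - 0x100000000 if low & 0x80000000 else low

def to_uint64(value):
    """Keep the low 64 bits, as an unsigned long long would."""
    return value & 0xFFFFFFFFFFFFFFFF

address = 0x12188DED0  # hypothetical physical address above 4 GB

print(hex(to_int32(address)))   # upper bits are lost
print(hex(to_uint64(address)))  # full address is preserved
```

Any address at or above 4 GB loses its upper bits in the 32-bit variable, which is why the wider type is needed.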
Set the page status of PDT entries to reflect only that the entry into the PDT was successful; at that point it cannot be determined whether the OS has deallocated the page yet. This fixes the problem that not all entries in the PDT are automatically deallocated. It also handles the scenario where the page status is Active, but the memory page containing the memory error (which met the "2 within 24 hour" threshold) could not be entered into the PDT (for example, because the PDT is full), and the request to the OS to deallocate the page could not be made either.
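As a rough illustration of a "2 within 24 hour" threshold, the sketch below checks whether any two errors on a page fall inside a 24-hour window. The function name, parameters, and timestamps are hypothetical, and the exact rule memlogd applies is an assumption here.

```python
from datetime import datetime, timedelta

def meets_threshold(timestamps, count=2, window=timedelta(hours=24)):
    """Return True if `count` errors occurred within `window` of each other."""
    ordered = sorted(timestamps)
    for i in range(len(ordered) - count + 1):
        # Compare the first and last error of each run of `count` errors.
        if ordered[i + count - 1] - ordered[i] <= window:
            return True
    return False

# Hypothetical error times for one memory page.
close_errors = [datetime(2002, 9, 10, 23, 33, 39), datetime(2002, 9, 10, 23, 36, 41)]
far_errors = [datetime(2002, 9, 10, 23, 33, 39), datetime(2002, 9, 12, 23, 36, 41)]

print(meets_threshold(close_errors))  # True
print(meets_threshold(far_errors))    # False
```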
... PDC Version (core cell)....:??.??
This field was labeled incorrectly: the value is the PDC firmware date code, not the PDC version. This revision of the system info tool replaces the incorrect label, and the field is now displayed as follows:
... PDC Firmware Date Code.....: 4228 (yyww 1960+yy=year;ww=Week of year)
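The yyww encoding shown in the label can be decoded with simple arithmetic. The following is a minimal sketch; the function name is hypothetical and is not part of STM.

```python
def decode_pdc_date_code(code):
    """Split a yyww date code into (year, week), using year = 1960 + yy."""
    yy, ww = divmod(code, 100)
    return 1960 + yy, ww

year, week = decode_pdc_date_code(4228)
print(year, week)  # 2002 28
```

For the value 4228, yy is 42 and ww is 28, so the firmware dates from week 28 of 2002.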
Once the invalid error address of -1 is entered into the PDT, the memory information tool cannot run to completion. In addition, each time memlogd restarts, it logs error messages to the memlogd activity log file indicating that it cannot convert the invalid error address (-1) to its physical location.
Symptoms:
On both non-vPar-Superdome and vPar-Superdome systems:
.... <hostname> : <IP address> ....
-- Information Tool Activity Log for MEMORY on path <hardware_path> --
Log creation time:
<timestamp>
<timestamp>: Information tool (memory) starting on path (<hardware_path>).
<timestamp>: The routine std_phys_addr_info received an unexpected
failure condition while making a PDC call to
determine the physical location for the address
ffffffffffffffc1, PDC return status =
fffffffffffffff6.
Possible Causes/Recommended Action:
Internal application error.
<timestamp>: The attempt to convert a PDT entry to determine the
FRU's physical location failed unexpectedly.
Possible Cause(s)/Recommended Action(s):
Internal application error.
<timestamp>: The library call to tlmem.sl failed. The tool was
trying to convert the phyiscal address to identify
the memory array.
Possible Causes:
Internal application error.
Recommended Action:
1. Rerun the tool and if tool fails again, report
the error.
<timestamp>: The tool was unable to access the Page Deallocation
Table (PDT).
<timestamp>: The tool was unable to get memory information.
<timestamp>: Tool completed with exit_status MOD_INCOMPLETE (5)
indicating the tool was started but could not
properly complete execution.
* The following error messages are logged in the memlogd activity log
file:
.... <hostname> : <IP address> ....
-- Memlogd Activity Log on the host <hostname> --
Log creation time:
<timestamp>
Log was last reset by user (root) on host (<hostname>).
<timestamp>: Memlogd diagnostic logging daemon shut down.
<timestamp>: Memlogd diagnostic logging daemon started.
<timestamp>: The routine std_phys_addr_info received an unexpected
failure condition while making a PDC call to
determine the physical location for the address
ffffffffffffffc1, PDC return status =
fffffffffffffff6.
Possible Causes/Recommended Action:
Internal application error.
<timestamp>: The attempt to convert a PDT entry to determine the
FRU's physical location failed unexpectedly.
Possible Cause(s)/Recommended Action(s):
Internal application error.
Fix to JAGae37088:
To fix this, mt_memlogd has been modified to work around the -1 error address returned by the firmware: it no longer handles any memory error that has an error address of -1.
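The idea behind the workaround can be sketched as a filter that drops errors whose address is -1 before any further handling. This is only an illustration under stated assumptions: the record layout, names, and the 64-bit representation of -1 are hypothetical and do not come from the mt_memlogd source.

```python
# Assumption: -1 is seen as an all-ones 64-bit value.
INVALID_ERROR_ADDRESS = 0xFFFFFFFFFFFFFFFF

def has_valid_address(error):
    """Return False for a memory error whose address is -1."""
    return (error["address"] & 0xFFFFFFFFFFFFFFFF) != INVALID_ERROR_ADDRESS

def errors_to_handle(errors):
    """Keep only the errors that should be processed (for example, entered into the PDT)."""
    return [e for e in errors if has_valid_address(e)]

# Hypothetical error records.
reported = [{"address": 0x2188DED0}, {"address": -1}]
print(errors_to_handle(reported))
```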
SAL
Thu May 9 16:18:27 2002: Attempt to rename the system map file from
/var/stm/data/uut_status_tmp to
/var/stm/data/uut_status failed with errno 2.
EACCES (13), EBUSY (16), EDQUOT (69), EEXIST (17),
EFAULT(14) errnos returned from a rename system call
indicate that one of the file paths was not a valid
path to a file.
Possible Causes/Recommended Action:
Correct the permission or other indicated file
system problem.
Thu May 9 16:18:27 2002: Re-map hardware configuration process with process
identifier (5772), initiated by user request,
completed.
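On most UNIX systems, errno 2 is ENOENT ("No such file or directory"), which rename returns when the source path does not exist. The sketch below reproduces that condition in a temporary directory; the file names only mirror those in the log, and the helper name is hypothetical.

```python
import errno
import os
import tempfile

def rename_errno(src, dst):
    """Attempt a rename and return the errno on failure, or 0 on success."""
    try:
        os.rename(src, dst)
    except OSError as exc:
        return exc.errno
    return 0

with tempfile.TemporaryDirectory() as workdir:
    missing = os.path.join(workdir, "uut_status_tmp")  # never created
    target = os.path.join(workdir, "uut_status")
    code = rename_errno(missing, target)

print(code, errno.errorcode.get(code))
```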
UIAL
Fri May 10 09:06:04 2002: User Name: root, UI Process ID: 6704
The UUT status file
(/var/tmp/stm6704/hpdst325/data/uut_status)
representing the new device map from the Unit Under
Test (UUT) could not be successfully loaded into
memory.
Fri May 10 09:06:04 2002: User Name: root, UI Process ID: 6704
The most recent device map for the Unit Under Test
(UUT) could not be built successfully. This means
operations apparently available, based on this old
map, may not be, and might fail.
Please refer to the Map Log and/or the System
Activity Log on that system for more details.
Changed the code to display the software ID of the system correctly. Previously, a negative software ID was displayed.
A No Write Memory Test was incorporated into the Memory Information Tool. The Memory Information Tool's Activity Log displays whether the Hardware Memory Protection Mechanism test passed or failed. If the test fails, the log also reports the address at which it failed.
Example of suspended message:
.... hpdst268.cup.hp.com : 15.244.81.93 ....
-- Information Tool Log for PCI SCSI Interface on path 0/10/0/0 --
Log creation time: Wed Aug 28 15:51:15 2002
Hardware path: 0/10/0/0
The pci path (0/10/0/0) is currently suspended and no info data can be
retrieved until it is resumed.
Example of non-suspended message:
.... hpdst268.cup.hp.com : 15.244.81.93 ....
-- Information Tool Log for PCI SCSI Interface on path 0/10/0/0 --
Log creation time: Wed Aug 28 15:52:12 2002
Hardware path: 0/10/0/0
Product ID: PCI SCSI Interface
Device ID: 0x000f
Revision ID 0x0001
Vendor ID: 0x1000 ( Symbios Logic Inc.)
Class Code: 0x010000
Base Class: 0x01 ( Mass Storage Controller. )
Sub-Class/Interface: 00/00 ( SCSI bus controller )
Device Status: 0x0200
Bit 9-10: DEVSEL timing 01 - medium
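The Class Code and Device Status values in the log above follow the standard PCI configuration-space layout, so they can be decoded by hand. The sketch below shows the bit arithmetic; the function names are hypothetical and this is not how the Info Tool itself is implemented.

```python
# DEVSEL timing encodings for bits 9-10 of the PCI Status register.
DEVSEL_TIMING = {0b00: "fast", 0b01: "medium", 0b10: "slow"}

def decode_class_code(class_code):
    """Split a 24-bit class code into (base class, sub-class, interface)."""
    base = (class_code >> 16) & 0xFF
    sub = (class_code >> 8) & 0xFF
    interface = class_code & 0xFF
    return base, sub, interface

def decode_devsel_timing(status):
    """Extract bits 9-10 of the status word and name the timing."""
    return DEVSEL_TIMING.get((status >> 9) & 0b11, "reserved")

print(decode_class_code(0x010000))   # (1, 0, 0)
print(decode_devsel_timing(0x0200))  # medium
```

With the values from the log, the class code 0x010000 yields base class 0x01 with sub-class and interface 00/00, and the status word 0x0200 yields the medium DEVSEL timing.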
When many A5236As (Transformers) are connected, to use this tool follow these steps:
The multiple update tool downloads the firmware to all the controllers.
This problem is related to the memory expert tool. Whichever test was selected (Read, Write, or Read-Write), the UI would select only the default, that is, the Read-Write test. As a result, customers could not run the Read test or the Write test on its own.
RESOLUTION:
The code has been modified to handle the selection made through the UI correctly and to perform the expected test for each available option.
On non-Superdome-class systems, memlogd reports "marked for deallocation" pages with a page status of "Deallocated: page is no longer in use" instead of "Pending: page could not be obtained" when the memlog file is displayed via Logtool.
For example, a solid single-bit error (SBE) that memlogd has entered into the Page Deallocation Table (PDT) may continue to occur. Although memlogd has requested the OS to set the page containing this SBE to bad to prevent further access, the page can only be marked for deallocation and is still active. If the page status shows "Deallocated: page is no longer in use" while errors on the page continue to occur, the page status is incorrect. This often confuses the user, who asks: "The page is in the PDT and the page status shows that it is deallocated, so why am I still getting memory errors on this page?"
In this case, the following is an example of the behavior that will be seen with this problem:
*** Display of the memlog file via Logtool: a memory entry with a count of 2 and a page status of "Deallocated: page is no longer in use."
Memory Controller in Slot EXT0
==========================================================
Slot:            0a
Error Type:      Single/hard: solid, repeatable single-bit error.
Page Status:     Deallocated: page is no longer in use.
Bit Num / Bank:  27 / 0
Logged By:       Memlogd
First Detected:  Tue Sep 10 23:33:39 2002
Last Detected:   Tue Sep 10 23:36:41 2002
Error Count:     2
Error Addr:      0x2188ded0
==========================================================
*** Display of the memlog file via Logtool: some time later, the count of the memory error has increased, even though the page status of this memory error was already "Deallocated: page is no longer in use":
Memory Controller in Slot EXT0
==========================================================
Slot:            0a
Error Type:      Single/hard: solid, repeatable single-bit error.
Page Status:     Deallocated: page is no longer in use.
Bit Num / Bank:  27 / 0
Logged By:       Memlogd
First Detected:  Tue Sep 10 23:33:39 2002
Last Detected:   Tue Sep 10 23:37:42 2002
Error Count:     3
Error Addr:      0x2188ded0
==========================================================
Fix to JAGad96824:
To fix this, memlogd has been modified to record "marked for deallocation" pages in the memlog file with a page status of "Pending: page could not be obtained".
Info Tool Causes HPMCs on K- and T-Class (HP-UX 11i only)
K-Class and T-Class computers running HP-UX 11i will experience HPMCs or Data Page Faults if you run the STM Info Tool to retrieve configuration information from HP-PB SE SCSI adapters (JAGad88317, JAGad97126).
The root cause is a problem in the HP-UX 11i kernel, which is exposed when the Info Tool is run as described above. To correct the problem and avoid potential HPMCs, load patch PHKL_25552 or its successor. This is a kernel patch and requires a system reboot.
STM only exposes this problem if the Info Tool is run on HP-PB SE SCSI adapters. STM does not expose the problem if the Info Tool is run against other I/O devices.
Use CHART to report defects in STM. The project name is diag.stm.tools.hpux for individual tools, and diag.stm.ui.hpux for the user interface. If you don't have access to CHART, contact an HP representative to enter a defect for you.
The product number for STM is B4708AA.
SD PRODUCT: Sup-Tool-Mgr
Description: On-line Diagnostic System (Series 800/700)
SD SUB-PRODUCT: Manuals
Description: Support Tools Manager Manual Pages
FILESET: STM-MAN
Description: S800/S700 STM Manual Pages
FILESET: STM-SHLIBS
Description: S800/S700 STM Shared Libraries
FILESET: STM-UI-RUN Corequisite Filesets: STM-SHLIBS
Description: S800/S700 STM User Interface
FILESET: STM-UUT-RUN Corequisite Filesets: STM-SHLIBS
Description: S800/700 STM Unit Under Test Runtime