These release notes cover the Special vPar Release (June 2002) of Support Plus for HP-UX 11i (11.11) running on S800 systems.
The Support Tools Manager (STM) provides a complete set of online support tools for HP-UX systems, enabling you to verify and troubleshoot PA-RISC system hardware, and to examine system logs.
STM offers several tool types, including information tools, verifiers, exercisers, expert tools, firmware update tools, diagnostics and utilities.
Installed with STM (as of IPR 9902) are the EMS Hardware Monitors, an important set of tools for maintaining system availability. The EMS Hardware Monitors allow you to monitor the operation of a wide variety of hardware products and to be alerted immediately if a failure or other unusual event occurs. For more information, see /usr/sbin/stm/Rel_NOTES.HWE.
For the latest and most complete information on STM and EMS Hardware Event Monitors, see the Web page "Diagnostics":
http://docs.hp.com/hpux/diag/
At this site, you will find Overviews, Tutorials, Quick Reference Cards, Frequently Asked Questions (FAQs), and much other material.
The online Support Tools Manager (STM) was enhanced and updated for the current release.
Changes to User Interface and Platform
Hence, users can see the same memory error logged by memlogd in different virtual partitions. For example, suppose memory error 0xadb2074 occurs in memory allocated to the virtual partition monitor and is detected by memlogd A on virtual partition A, which logs the error in its memlog file. Later, the same error occurs again and is detected by memlogd B on virtual partition B, which logs it in its own memlog file. A user viewing the memlog file in partition A sees memory error 0xadb2074, and a user viewing the memlog file in partition B sees it as well. As a result, the EMS HW Memory Monitor on each virtual partition can alert users about the same faulty memory component, because the alerts result from the same memory errors. Whether a given partition raises an alert depends on how the memory monitor's configuration file, default_dm_memory.clcfg, is set up on that partition.
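The duplication described above can be spotted by comparing the error addresses logged in each partition. The following is a minimal sketch in Python, assuming the addresses have already been extracted from each partition's memlog file into plain lists. The function name and the input layout are hypothetical and are not part of STM.

```python
from collections import defaultdict

def shared_memory_errors(errors_by_partition):
    """Return {address: sorted partition names} for addresses seen in 2+ partitions.

    errors_by_partition is a hypothetical mapping of partition name to the
    list of error addresses taken from that partition's memlog file.
    """
    seen = defaultdict(set)
    for partition, addresses in errors_by_partition.items():
        for address in addresses:
            seen[address].add(partition)
    # Keep only addresses reported by more than one partition.
    return {addr: sorted(parts) for addr, parts in seen.items() if len(parts) > 1}

# Sample data modeled on the example above; 0x1000 is an invented extra entry.
logs = {
    "A": [0xadb2074, 0x1000],
    "B": [0xadb2074],
}
for address, partitions in shared_memory_errors(logs).items():
    print(hex(address), "logged in partitions", ", ".join(partitions))
```

An address that appears for several partitions most likely points to a single faulty component, so the corresponding alerts can be treated as one event.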
This page status indicates that the following page could not be entered into the page deallocation table (PDT), either because the PDT is full or because memlogd encountered an error when it tried to enter the page into the PDT.
This page status, logged by memlogd in the memlog file, applies only to pages that do not belong to the virtual partition in which memlogd is running. It indicates that the page has been entered into the PDT, but memlogd cannot determine whether the OS has deallocated the page.
Keystone (Perfpak)    hversion 0x5eb
Matterhorn (Leone)    hversion 0x5eb
Marcato W+ DC-        hversion 0x5da
Superdome (Pacu)      hversion 0x5ea
Failed to convert HPA (0xfffffffffce7a000) to spu_number. It may be possible that the CPU is deconfigured or the CPU is not in the Current partition.
Although this looks like an error, it is not one: the message refers to a CPU outside the partition.
In a non-vPar environment, the CPU (LPMC) monitor deactivates a CPU and activates one of the iCOD CPUs if any are available. In a vPar environment, iCOD CPUs are not visible to the partition, so the monitor cannot activate one. The user is advised to use the iCOD commands to find out whether iCOD CPUs are available on the system and to activate one of them. To determine the number of iCOD CPUs that are instantly activatable for the local vPar, use the following command:
icod_stat -i
The command prints a single number. Alternatively, the user can run icod_stat without options and look for the information on unassigned processors that can be assigned.
To activate an iCOD processor, use the command:
icod_modify -a 1 \
<[description]:user_name:manager_name:manager_email:manager_phone>
For more information on the iCOD commands, refer to the iCOD documentation.
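The two commands above can be combined into a short check-then-activate helper. The following is a minimal sketch in Python, assuming that icod_stat -i prints a single integer and that the contact fields are joined with colons as in the placeholder shown above. The function names and the sample contact values are hypothetical.

```python
import subprocess

def available_icod_cpus(run=subprocess.run):
    """Return the number printed by `icod_stat -i` for the local vPar.

    The `run` parameter exists only so the call can be replaced in tests.
    """
    result = run(["icod_stat", "-i"], capture_output=True, text=True, check=True)
    return int(result.stdout.strip())

def activation_command(description, user_name, manager_name,
                       manager_email, manager_phone):
    """Build the `icod_modify -a 1` argument list for one processor."""
    contact = ":".join([description, user_name, manager_name,
                        manager_email, manager_phone])
    return ["icod_modify", "-a", "1", contact]

# Hypothetical contact details, shown only to illustrate the argument layout.
command = activation_command("replace failed CPU", "jdoe", "A. Manager",
                             "manager@example.com", "555-0100")
print(" ".join(command))
```

On a real system, the command would be run only when available_icod_cpus() returns a value greater than zero.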
In a vPar environment, when the CPU (LPMC) monitor marks the last CPU in the partition for deconfiguration, the user is strongly advised to reboot the whole system rather than only the partition. If only the partition is rebooted, the faulty CPU remains active. If the system is rebooted after all the partitions have been brought down, the faulty processor is no longer visible to the system. The goal of the CPU (LPMC) monitor is to remove the faulty component, and this goal can be achieved only by rebooting the system.
hpfcs214.cup.hp.com
Dev Last Last Op
Num Path Product Active Tool Status
=== ==================== ========================= =========== ===== ========
1 10 Bus Adapter (582)
2 10/0 PCI Bus Adapter (782)
3 10/0/12/0 Core PCI 100BT Interface
.
.
.
25 255/0/0.0 iSCSI Virtual Node (iSCSI
26 255/0/0.0.0.0 SCSI Disk (SEAGATEST39103
27 255/0/0.0.0.1 SCSI Disk (SEAGATEST39103
28 255/0/1.0 iSCSI Virtual Node (iSCSI
29 255/0/1.0.0.0 SCSI Disk (SEAGATEST39103
30 255/0/1.0.0.1 SCSI Disk (SEAGATEST39103
31 255/0/2.0 iSCSI Virtual Node (iSCSI
32 255/0/2.0.0.0 SCSI Disk (SEAGATEST39103
Example of STM map after filtering out iSCSI devices:
hpfcs214.cup.hp.com
Dev Last Last Op
Num Path Product Active Tool Status
=== ==================== ========================= =========== ===== ========
1 10 Bus Adapter (582)
2 10/0 PCI Bus Adapter (782)
3 10/0/12/0 Core PCI 100BT Interface
.
.
.
28 255/0/0.0 iSCSI Virtual Node (iSCSI
29 255/0/1.0 iSCSI Virtual Node (iSCSI
30 255/0/2.0 iSCSI Virtual Node (iSCSI
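The effect of the filter shown in the two maps above can be sketched as follows. This is a minimal illustration in Python and not STM's actual implementation. It assumes that a device is hidden when its hardware path extends the path of an iSCSI virtual node, and it uses a hypothetical list of (path, product) tuples as input.

```python
def filter_iscsi_children(entries):
    """Drop devices that sit below an iSCSI virtual node, keeping the node itself.

    entries is a list of (path, product) tuples in map order.
    """
    iscsi_nodes = [path for path, product in entries
                   if product.startswith("iSCSI Virtual Node")]
    return [
        (path, product)
        for path, product in entries
        # A child path is the node path followed by "." and more components.
        if not any(path.startswith(node + ".") for node in iscsi_nodes)
    ]

# A shortened map using paths from the example above.
stm_map = [
    ("10/0/12/0", "Core PCI 100BT Interface"),
    ("255/0/0.0", "iSCSI Virtual Node"),
    ("255/0/0.0.0.0", "SCSI Disk"),
    ("255/0/0.0.0.1", "SCSI Disk"),
    ("255/0/1.0", "iSCSI Virtual Node"),
    ("255/0/1.0.0.0", "SCSI Disk"),
]
for path, product in filter_iscsi_children(stm_map):
    print(path, product)
```

Matching on the node path plus a trailing "." keeps the virtual nodes themselves and avoids hiding unrelated devices whose paths merely share a prefix.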
An unexpected return came from the stable store function. This is informational only and does not materially affect the results.
An attempt to obtain exclusive access to a memory controller via the vpmonitor driver failed. This error is unexpected. The memory expert tool can not continue with the memory test. Possible Cause(s)/Recommended Action(s): Internal application error.
The memory subsystem is not recognized by this tool to support virtual partitions. The memory expert tool can not continue with the memory test. Possible Cause(s)/Recommended Action(s): Internal application error.
An attempt to read syndrome error information was not successful. The memory expert tool can not continue with the memory test. Possible Cause(s)/Recommended Action(s): A process (possibly memlogd) in another virtual partition has taken the virtual partition lock away from the memory expert tool. Rerun the tool later.
An attempt to release the virtual partition lock failed. This error is unexpected. The memory expert tool can not continue with the memory test. Possible Cause(s)/Recommended Action(s): Internal application error.
-- Error -- Hostname (sshapd1.rose.hp.com) could not be connected due to an invalid user login name or password. Please Refer to the UI Activity Log for more details.
The current workarounds are to:
1) log in as a different user.
2) change the user password.
CAUTION! RAID 0 DOES NOT PROVIDE DATA REDUNDANCY. THE FAILURE OF ANY DISK WITHIN A RAID 0 LUN WILL RESULT IN THE LOSS OF ALL DATA IN THAT LUN. RAID 0 SHOULD ONLY BE USED FOR NON CRITICAL DATA THAT COULD BE LOST IN THE EVENT OF A HARDWARE FAILURE. RAID 0 SHOULD ONLY BE USED IN SITUATIONS WHERE HIGH PERFORMANCE IS IMPORTANT THAT DATA PROTECTION.
to:
CAUTION! RAID 0 does not provide data redundancy. The failure of any
disk within a RAID 0 LUN will result in the loss of all data in that
LUN. RAID 0 should only be used for non-critical data since it might
                                       ^              ^^^^^^^^^^^^^^
be lost in the event of a hardware failure. RAID 0 should only be used
^^
in situations where high performance is more important than data
                                        ^^^^           ^^^^
protection.
where the ^ characters mark the changes in the message text.
The child process with process ID xxxxx (0xhhhhh) exited with unexpected status (125 (0x7d)). This child process was launched to exercise floating point coprocessor associated with processor x (0xh) at path xxxxxxxxxxx (at HPA 0hhhhhhhhhhhhhhhh). Possible Causes/Recommended Action: Internal application error.
Address: 0x00000000000001 Page: 0 Page Status: RESERVED by operating system Board: CELL 1/4 0a/b/c/d/b/c/d Physical Bank: N/A Logical Bank: N/A
whereas now, that same request returns
Address: 0x00000000000001 Page: 0 Page Status: RESERVED by operating system Board: CELL 1/4 0a/b/c/d Physical Bank: N/A Logical Bank: N/A
(Note that the "Board:" field no longer displays /b/c/d twice.)
Info Tool Causes HPMCs on K- and T-Class (HP-UX 11i only)
K-Class and T-Class computers running HP-UX 11i will experience HPMCs or Data Page Faults if you run the STM Info Tool to retrieve configuration information from HP-PB SE SCSI adapters (JAGad88317, JAGad97126).
The root cause is a problem in the HP-UX 11i kernel, which is exposed when the Info Tool is run as described above. To correct the problem and avoid potential HPMCs, load patch PHKL_25552 or its successor. This is a kernel patch and requires a system reboot.
STM only exposes this problem if the Info Tool is run on HP-PB SE SCSI adapters. STM does not expose the problem if the Info Tool is run against other I/O devices.
Use CHART to report defects in STM. The project name is diag.stm.tools.hpux for individual tools, and diag.stm.ui.hpux for the user interface. If you don't have access to CHART, contact an HP representative to enter a defect for you.
The product number for STM is B4708AA.
SD PRODUCT: Sup-Tool-Mgr
Description: On-line Diagnostic System (Series 800/700)
SD SUB-PRODUCT: Manuals
Description: Support Tools Manager Manual Pages
FILESET: STM-MAN
Description: S800/S700 STM Manual Pages
FILESET: STM-SHLIBS
Description: S800/S700 STM Shared Libraries
FILESET: STM-UI-RUN Corequisite Filesets: STM-SHLIBS
Description: S800/S700 STM User Interface
FILESET: STM-UUT-RUN Corequisite Filesets: STM-SHLIBS
Description: S800/700 STM Unit Under Test Runtime