Release Notes for STM on HP-UX 11i (11.11) (September 2001)
These release notes cover the September 2001 release of Support
Plus for HP-UX 11i (11.11) running on S800/S700 systems.
Overview
The Support Tools Manager (STM) provides a complete set of online
support tools for HP-UX systems, enabling you to verify and
troubleshoot PA-RISC system hardware, and to examine system logs.
STM offers several tool types, including information tools,
verifiers, exercisers, expert tools, firmware update tools,
diagnostics and utilities.
Installed with STM (as of IPR 9902) are the EMS Hardware Monitors,
an important tool for maintaining system availability. The EMS
hardware monitors allow you to monitor the operation of a wide
variety of hardware products and be alerted immediately if any
failure or other unusual event occurs. For more information, see
/usr/sbin/stm/Rel_NOTES.HWE.
Documentation
For the latest and most complete information on STM and EMS
Hardware Event Monitors, see the Web page "Diagnostics":
http://docs.hp.com/hpux/diag/
At this site, you will find Overviews, Tutorials, Quick Reference
Cards, Frequently Asked Questions (FAQs), and much other
material.
Changes
The online Support Tools Manager (STM) was enhanced and updated
for the current release.
Changes to User Interface and Platform
- Added support for VPAR (virtual partitions), an optional
feature initially offered on N-Class and L-Class computers. If
virtual partitions are installed, memlogd and some tools will
behave differently from normal. For more information, see "STM
and Virtual Partitions" at:
http://docs.hp.com/hpux/onlinedocs/diag/stm/stm_vpar.htm
- Fixed a problem that occurred on the March 01 and June 01
diagnostic releases, whereby messages from the user interface and
the tools were not being displayed.
- JAGad7422
Fixed a problem whereby customer-licensed tools did not work on the hp
server rp8400 (S16K-A) and some new Superdome systems.
Background:
On the June 2001 release of diagnostics for HP-UX 11i, there is a
problem with the customer-licensed diagnostic tools for:
- The hp server rp8400 (9000/800/S16K-A). This system may
also be known as "Keystone" or S-Class. Both 650 and 750 MHz
systems are affected.
- New versions of Superdome: 9000/800/SD16000, SD32000 and
SD64000. These systems may also be known as "Caribe" 16-way,
32-way, and 64-way.
The following types of tools may require a customer license
and hence may be affected by this problem:
- Expert tools
- Firmware update tools
- Diagnose tools
- Several utilities (MOutil, copyutil, and modmutil)
- Offline (ODE) tools:
ACMEDIAG AR60DIAG AR60DIAG2 ARDIAG ARDIAG2 ASTRODIAG ASTRODIAG2
CASARRY CIODIAG DINOTEST DISKEXPT DISKEXPT2 EDBC EDPROC
IKEDIAG2 JAVADIAG KEYDIAG L2DIAG LASIDIAG LDIAG MEMTEST MEM2
MULTIDIAG NIKEARRY NIKEARRY2 PDIAG REODIAG SIODIAG TDIAG
TIMIDIAG TOGODIAG U2TEST UDIAG VADIAG VADIAG2 VXITEST WDIAG
WAXTEST
This problem is fixed in the September 2001 release of
diagnostics. You can install the OnlineDiag bundle and then run
xstm, mstm, or cstm to install the class license, or you can run
the offline diagnostics from the SupportPlus media to install the
class license. Once the class license is installed, whether by
online or offline tools, it remains in effect for all online
and offline programs.
The problem occurs only with customer licenses; CE licenses
are NOT affected. It also occurs only with the hp server
rp8400 and the newer versions of Superdome; it does not
occur with any other server.
If you try to run a customer-licensed online diagnostic tool
on one of these systems, you will see an error message similar to
one of the following:
-- Error --
The Install license command could not be successfully completed.
An unexpected failure occurred in the Support Tool system.
OR
The password requested to be installed is a valid
machine level license yet license could not be
installed on the system due to unexpected errors.
For customer-licensed offline tools, the error will look like
this:
Entered password is not recognized, status = xxx, try again.
where xxx is a number.
- Fixed a problem with the STM User Interface (UI), whereby user
files created through the UI were created with rw-rw-rw-
permissions. For example, if the user saved a log file from the
file viewer, it would be created with these permissions. Now these
files are saved with rw-rw-r-- permissions. This problem was
temporarily fixed for the June 01 release; this is a more global
fix.
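For reference, rw-rw-rw- and rw-rw-r-- correspond to octal modes 0666 and 0664. The Python sketch below is illustrative only, not STM code: it renders both modes with the standard `stat.filemode` function and shows one common way to obtain the new mode, namely creating files with 0666 under a umask of 002. Whether STM uses a umask or an explicit creation mode is an assumption; these notes do not say.

```python
import stat

OLD_MODE = 0o666  # rw-rw-rw-: any user on the system can modify the file
NEW_MODE = 0o664  # rw-rw-r--: owner and group may write, others may only read


def perm_string(mode: int) -> str:
    """Render permission bits as an ls-style string, without the file-type character."""
    # stat.filemode() expects a full st_mode value, so mark it as a regular
    # file and then drop the leading '-' that denotes the file type.
    return stat.filemode(stat.S_IFREG | mode)[1:]


def apply_umask(requested: int, umask: int) -> int:
    """Return the mode a new file gets when created with `requested` under `umask`."""
    return requested & ~umask & 0o777


if __name__ == "__main__":
    print(perm_string(OLD_MODE))                      # rw-rw-rw-
    print(perm_string(apply_umask(OLD_MODE, 0o002)))  # rw-rw-r--
```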
- Fixed a problem with the display of the graphics device icon on the
xstm system map. Now any graphics device on the system, including
hp-visualize fx5/10 (aka Rockwood, Lego), is shown with a graphics
icon on the system map.
- Fixed a problem with mstm shortcut help, whereby mstm would
occasionally exit with a SIGSEGV error.
- Modified the STM platform to map NEC iStorage 4000/2000/1000 disks
as an NEC Array, so they will be ignored by the SCSI Disk Monitor
(disk_em). In addition, added an entry in the xref file so that no
tools are shown as available for these devices.
- JAGad73874
Fixed a problem with xstm, whereby xstm would report a SIGSEGV error
and exit when the Filter->Set command was selected in text map
mode.
- Fixed a problem with false reporting of connection failures by the
audit system. A debug facility in the diagnostics and monitors
attempted to connect to a port to output debug trace information,
which caused the audit system (audsys) to see many connection
failures. The diagnostics and monitors no longer make these
connection attempts when debugging is not in use.
- Fixed a problem with cclogd (chassis code log daemon) that
occurred on the March 01 and June 01 diagnostic releases, whereby
messages from this daemon were not displayed and an error message
would report: correct version of message catalog does not
exist. The cclogd daemon is only supported on A-Class, N-Class,
and Superdome computers, so only those machines are affected by
this problem.
- JAGad72005
Changed the "within 24 hours (or deallocation time range)" criterion
from 13.5111 hours to 24 hours in the legacy memlogd and
multi-threaded memlogd in HP-UX 11i.
Intended behavior of memlogd: when a single-bit error re-occurs
within 24 hours, memlogd deallocates the page on which the
error is located (by entering it into the PDT table, if the
system has one, and requesting that the OS not use that
page).
Before the fix: when memlogd detected an error re-occurring
within 13.5111 hours, memlogd deallocated the page on which
the error was located, but it did not deallocate the page if
the error re-occurred after 13.5111 hours.
After the fix: when memlogd detects an error re-occurring within
24 hours, memlogd deallocates the page on which the error is
located.
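The deallocation rule in JAGad72005 amounts to a time-window check on a repeated single-bit error. The Python sketch below is a hypothetical illustration, not memlogd source: `should_deallocate` and its parameters are invented names, and treating an error exactly 24 hours later as "within" the window is an assumption.

```python
from datetime import datetime, timedelta

# Re-occurrence windows described in JAGad72005.
OLD_WINDOW = timedelta(hours=13.5111)  # effective window before the fix
NEW_WINDOW = timedelta(hours=24)       # intended window after the fix


def should_deallocate(first_seen: datetime, seen_again: datetime,
                      window: timedelta = NEW_WINDOW) -> bool:
    """Return True if a repeated single-bit error on a page falls within `window`.

    Hypothetical helper: the boundary is treated as inclusive (an assumption).
    """
    elapsed = seen_again - first_seen
    return timedelta(0) <= elapsed <= window
```

With this sketch, an error that repeats 20 hours after its first occurrence is deallocated under the 24-hour window but not under the old 13.5111-hour one.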
- JAGad67440
Rewrote the error message that memlogd on non-SuperDome systems
logs in the memlogd activity log when the PDT table is full. The
new message is more accurate and helpful.
BEFORE:
Attempt to add entry to the Page Deallocation Table failed.
Possible Cause(s)/Recommended Action(s)
The Page Deallocation Table (PDT) may not be enabled. In most cases,
the PDT should be enabled (Inform your HP representative). However,
the PDT should only be enabled if the system is on HP-UX 10.0 or above,
otherwise, it won't be able to handle double-bit errors.
Internal application error.
AFTER:
The Page Deallocation Table (PDT) is full.
Possible Cause(s)/Recommended Action(s)
The Page Deallocation Table has reached its threshold for the maximum
number of pages deallocated. Troubleshoot for failed memory component.
(Inform your HP representative).
Internal application error.
- JAGad67514
Fixed a problem with memlogd on non-SuperDome systems, whereby
not all of the errors in the Page Deallocation Table (PDT) were
inserted into the memlog file during memlogd initialization.
(Note: PDT entries may not isolate an error down to a single
DIMM.)
Background:
When the PDT contains new entries (entered by the firmware during
selftest) that are not currently in the memlog file, memlogd is
supposed to enter these PDT entries into its memlog file during
initialization. However, with the previous memlogd, if an entry
in the PDT matched an entry in the memlog file, none of the new
PDT entries following that match were entered into the memlog
file. The fix is to add all the new entries in the PDT that are
not already in the memlog file into the memlog file during
memlogd initialization.
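The difference between the old and the fixed initialization logic can be sketched as follows. This is a hypothetical Python illustration, not memlogd source; PDT and memlog entries are simplified to page addresses (integers), whereas real entries carry more information.

```python
from typing import List


def merge_pdt_old(pdt: List[int], memlog: List[int]) -> List[int]:
    """Pre-fix behavior: stop adding PDT entries once one matches the memlog file."""
    merged = list(memlog)
    for addr in pdt:
        if addr in memlog:
            break  # the defect: every PDT entry after a match was skipped
        merged.append(addr)
    return merged


def merge_pdt_fixed(pdt: List[int], memlog: List[int]) -> List[int]:
    """Post-fix behavior: add every PDT entry not already in the memlog file."""
    merged = list(memlog)
    known = set(memlog)
    for addr in pdt:
        if addr not in known:
            merged.append(addr)
            known.add(addr)
    return merged
```

For a PDT holding pages 0x1000, 0x2000, and 0x3000 and a memlog file that already lists 0x2000, the old logic adds only 0x1000, while the fixed logic adds both 0x1000 and 0x3000.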
- JAGad55912
Fixed a problem whereby memlogd was not updating the time values on
Superdome systems. If you run the logtool utility and do a "view
detail," the header shows a start time, check time, and time
interval. Previously, memlogd on Superdome did not change these
time values. Code was added in memlogd so that the check time
changes according to the specified interval and also when an
error is detected. The start time and check time are reset to the
current time when you clear the log to start over.
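A minimal sketch of the corrected header behavior, assuming the check time advances by one interval per periodic check. `MemlogHeader` and its methods are hypothetical names for illustration, not part of memlogd or logtool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class MemlogHeader:
    """Hypothetical model of the start time, check time, and interval shown by logtool."""
    start_time: datetime
    check_time: datetime
    interval: timedelta

    def on_interval_elapsed(self) -> None:
        # Assumption: each periodic check advances the check time by one interval.
        self.check_time += self.interval

    def on_error_detected(self, now: datetime) -> None:
        # Detecting an error also moves the check time to the time of detection.
        self.check_time = now

    def clear_log(self, now: datetime) -> None:
        # Clearing the log restarts both timestamps from the current time.
        self.start_time = now
        self.check_time = now
```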
- JAGad67534
Fixed problems with page states/status reported by memlogd on
non-SuperDome systems.
- Enhanced memlogd to distinguish more accurately between the page
states it displays in the memlog file.
Background:
In the previous memlogd, there was no page state to indicate
whether an error was successfully entered into the PDT table:
if an error was NOT entered into the PDT table and the OS could
not deallocate the page, the page status in the memlog file
would show: "pending: page could not be obtained"
However, if an error WAS entered into the PDT table but the OS
could not deallocate the page, the page status in the memlog
file would also show: "pending: page could not be obtained"
Hence, one could not tell whether the PDT entry succeeded unless
one looked at the PDT entries from the memory information
module. So, a new page state was added to indicate the case
when an error is not entered into the PDT table: "pending:
page could not be entered into PDT"
- Enhanced memlogd to show page status more accurately.
Background:
In the previous memlogd, if an error was entered into the PDT
table and memlogd was restarted (either by restarting memlogd
or by a system reboot), memlogd assumed that those entries were
OS-deallocated. This may not be true: pages that belong to the
kernel cannot be OS-deallocated, and memlogd cannot tell whether
the OS was rebooted or memlogd was merely restarted. Hence, a
page status check was added, which is executed each time
memlogd restarts.
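The effect of the new page state can be sketched as a small decision function. This hypothetical Python illustration uses the two status strings quoted in these notes; the wording memlogd uses for a page the OS did deallocate is not given here, so the sketch returns `None` for that case. `page_status` and its parameters are invented names.

```python
from typing import Optional

NOT_IN_PDT = "pending: page could not be entered into PDT"  # new page state
NOT_OBTAINED = "pending: page could not be obtained"        # existing page state


def page_status(entered_in_pdt: bool, os_deallocated: bool,
                new_states: bool = True) -> Optional[str]:
    """Return the pending status for a page, or None if the OS deallocated it.

    new_states=False models the previous memlogd, which reported the same
    status whether or not the PDT entry succeeded.
    """
    if os_deallocated:
        return None  # actual wording for this case is not given in these notes
    if new_states and not entered_in_pdt:
        return NOT_IN_PDT
    return NOT_OBTAINED
```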
- JAGad45063, JAGad69327
This enhancement was partially implemented in the June 01
release. In the Sept 01 release, the enhancement is fully
implemented.
Enhanced diagmond so that it can be configured to only accept
connections and requests from the local system -- any requests
from a remote system will be rejected. The new configuration
parameter is in the /var/stm/config/sys/diagmond.cfg file. It is
called LOCAL_ONLY_ENABLE. If it is set to 1, only local
connections are allowed. If it is set to 0, local and remote
connections are allowed. By default, it is set to 0.
NOTE: After changing the file, go into the UI and run the
RereadUUTConfigFile command so that the new values are
re-read.
If a remote connection is attempted while diagmond is configured
in this manner, a message will be logged in the system activity log,
as follows:
Tue Feb 27 15:31:09 2001:
System is configured only to accept requests from
the local host. Rejecting a request from a non-local
host. Message was sent from system name
(magnumpi.rose.hp.com) at IP address (15.8.134.7)
with system port number (52175).
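A sketch of the accept/reject decision that LOCAL_ONLY_ENABLE controls. These notes do not say how diagmond decides that a request is local; this hypothetical Python function assumes "local" means a loopback address or one of the host's own interface addresses. `accept_request` and the sample addresses are illustrative only, and the function does not parse diagmond.cfg, whose exact syntax is not shown here.

```python
import ipaddress
from typing import AbstractSet


def accept_request(local_only_enable: int, peer_ip: str,
                   local_ips: AbstractSet[str]) -> bool:
    """Decide whether to serve a request under the LOCAL_ONLY_ENABLE semantics.

    local_only_enable: 1 = local connections only, 0 = local and remote (default).
    local_ips: addresses assigned to this host's own interfaces (assumed input).
    """
    if local_only_enable == 0:
        return True  # default: accept both local and remote requests
    peer = ipaddress.ip_address(peer_ip)
    own = {ipaddress.ip_address(ip) for ip in local_ips}
    # Assumption: a request is "local" if it comes from loopback or our own address.
    return peer.is_loopback or peer in own
```

For example, with a hypothetical host address of 192.0.2.10 and LOCAL_ONLY_ENABLE set to 1, a request from 15.8.134.7 (the address in the sample message) is rejected, while requests from 127.0.0.1 or 192.0.2.10 are accepted.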
The remote UI will behave exactly as if diagmond itself were not
running on the system to which it is trying to connect. The only
difference is that the message displayed (and logged in the UI
activity log) lists an additional possible cause: the remote system
being configured not to accept remote connections. The modified
message is listed below:
An unexpected error was encountered while attempting to retrieve the
host info for hostname (XXXXX).
This could be due to either of the following conditions:
1) The support tool daemon "diagmond" may not be running on that system.
Use the STM Startup command (in the administration menu under the
file menu.)
2) The support tool daemon "diagmond" on that system may be configured
to only allow local connections. Check the value of the configuration
parameter LOCAL_ONLY_ENABLE in the /var/stm/config/sys/diagmond.cfg
file on that system.
3) The IP address for the system may be invalid or may not be associated
with a valid host. Use a valid IP address.
4) Networking may be incorrectly configured on one of the systems involved.
Verify networking by comparing 'nslookup `hostname`' with the output
of ifconfig of the LANs identified by lanscan.
More details may be available in the System Activity Log and in the
syslog on that system.
It may be necessary to access these using the Local Unit Under Test (UUT)
logs (in the administration menu under the file menu.)
General Changes to Tools
Changes to Specific Tools
- Information tools:
- JAGad76532
Fixed a problem with the System Info tool for Superdome and hp
server rp8400 (9000/800/S16K-A, "Keystone") systems.
Previously, the System Info tool would not complete execution
on these systems, and would not provide complete information
about the system. In the activity log, several messages would
appear, such as:
Failed to execute the PDC_PAT_COMPLEX PDC
call by performing an ioctl call through
the diag2 pseudo driver.
- JAGad49061
Fixed a problem with the Information tool, which would not
complete successfully when run on the 1000 Base-xx LAN
interface card (A4926A). Sample error message:
Attempt to retrieve identification information
from the device failed due to the device not
supporting the request.
- JAGad69409
Fixed a problem whereby the Info tool for Virtual Array 7100
and Virtual Array 7400 ("Cassini" and "Cronus") did not work.
The problem only occurred on the March and June 01 diagnostic
releases for HP-UX 11i.
- Exerciser tools:
- JAGad55415
Fixed a problem with the SCSI Disk Exercise tool, whereby the
tool would experience a segmentation fault, signal(11), and
finish unsuccessfully when it was run on an SC10 external
disk.
- Verifier tools:
- Diagnose tools:
- Firmware update tools:
- Expert tools:
- JAGad63532
Fixed a problem with the Expert Tool for SCSI CD, whereby the
tool would fail when the verify function was selected. Sample
error text:
Attempt to retrieve user input for the test options FAILED
due to an unknown and unexpected problem (!).
Verify command completed with errors (FAILED).
- JAGad64024
Fixed an incorrect message logged by the CPU Expert tool to
its activity log after a PID (process ID) assignment had been
successfully made.
BEFORE:
PID Assignment encountered an internal error.
The get_assign_options routine returned an
UNSUCCESSFUL status. The PID Assignment was
UNSUCCESSFUL.
AFTER:
The PID Assignment was SUCCESSFUL.
- JAGad48507
Fixed a problem with the Expert Tool for FC60, whereby an
existing option to the tool to "automatically fix parity
errors" during the FC60 parity scan could corrupt otherwise
good data.
Fix: Removed the "Repair Parity Scan" option from the STM UI
(Expert Tool->Tests->Parity Scan).
Background: If data and parity do not match with this option,
the parity will always be changed to match the data (according
to LSI), when in fact we do not know which one is in error.
Thus, there is a 50/50 chance of corrupting otherwise good
data.
STM does provide a warning, stating that parity will be changed
to match the data, but it was decided that this does not
sufficiently capture the potential seriousness of the
situation. Instead, it was decided to eliminate the option.
Then, if a parity error is found, the customer should be
alerted to call HP for support and/or restore the affected LUN
from backup.
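Why a blind parity repair is risky can be shown with XOR parity, assuming the LUN uses an XOR-parity RAID level such as RAID 5 (these notes do not say). In this Python sketch, which is illustrative only and not the FC60 algorithm, a corrupted data block makes the stripe inconsistent. Rewriting parity to match the data makes the scan pass, but it discards the only information that could have reconstructed the good block.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (RAID-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_consistent(data: List[bytes], parity: bytes) -> bool:
    """A parity scan reports an error when the data no longer XORs to the parity."""
    return xor_blocks(data) == parity


def repair_parity(data: List[bytes]) -> bytes:
    """What the removed option effectively did: rewrite parity to match the data."""
    return xor_blocks(data)


if __name__ == "__main__":
    good = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f"]
    parity = xor_blocks(good)                    # written with the stripe
    corrupted = [good[0], b"\x11\x20", good[2]]  # block 1 silently went bad
    print(stripe_consistent(corrupted, parity))  # False: mismatch, but which block?
    new_parity = repair_parity(corrupted)
    print(stripe_consistent(corrupted, new_parity))  # True: the bad data is now "valid"
```

The mismatch alone does not identify which block is wrong; if block 1 is known to be bad, the original parity can rebuild it, but after the repair the same reconstruction simply returns the corrupted bytes.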
- JAGad48007
Fixed a problem with the Expert Tool for FC60 whereby, during a
parity scan of a LUN, the Logical Block Addresses reported for
detected errors were not correct.
- Logtool:
- JAGad65444
Enhanced logtool to report rank, memory subsystem, and error
syndrome on Superdome systems.
- JAGad48760
Fixed two problems with Logtool's reporting of memory logs:
- Previously, double-bit errors were reported as
single-bit errors.
- Previously, the memory error log incorrectly identified
a single DIMM for entries placed in the PDT by hardware
(PDC), whereas it should have reported a rank of 4
DIMMs.
Both the Logtool Utility: View Memory Report and the
Memory Information: Memory Error Log Summary incorrectly
reported the error to a single DIMM, 0A. The Memory
Information: PDT information correctly identifies the
rank.
Background:
Since the PDT only keeps the cacheline address and no
syndrome information, the best FRU isolation possible for
PDT entries placed into the PDT by hardware / PDC is to a
rank of DIMMs. On Superdome a rank is 4 DIMMs, 0A/0B/0C/0D
for example.
For memory errors detected by the memory monitor (memlogd),
we do have the syndrome information and we can identify the
error to a DIMM.
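The isolation rule in JAGad48760 can be sketched as follows. The slot naming (a rank is slots A through D sharing one leading identifier) is generalized from the single example 0A/0B/0C/0D and is an assumption, as are the function and parameter names.

```python
from typing import List, Optional

RANK_SLOTS = "ABCD"  # assumption: slots A-D form one rank, as in 0A/0B/0C/0D


def fru_candidates(rank: str, slot: Optional[str] = None) -> List[str]:
    """Return the DIMMs that could hold a memory error on Superdome.

    slot is known only when syndrome information is available (errors detected
    by memlogd). Entries placed in the PDT by hardware/PDC keep only the
    cacheline address, so slot is None and the whole rank is returned.
    """
    if slot is not None:
        return [rank + slot]
    return [rank + s for s in RANK_SLOTS]
```

Under this sketch, a PDC-placed PDT entry in rank 0 yields all four DIMMs (0A, 0B, 0C, 0D), whereas a memlogd entry with syndrome information for slot B yields only 0B.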
- Utilities:
- JAGad48375
Fixed a problem with the cstm version of copyutil, whereby a
backup device could not be selected. The problem does not occur
in xstm.
- New utility: mca (machine check analyzer). This utility
analyzes the contents of an HPMC tombstone file generated from
a Superdome or hp server rp8400 (9000/800/S16K-A, "Keystone")
system. It is primarily intended for HP support
personnel.
The mca utility can be run on HP-UX 11.00 or 11i. It is a port
of the Windows HPMC Superdome Analyzer currently available to
HP support personnel. For more information, see the mca man
page.
NOTE: This utility is not part of the STM platform, but is a
standalone utility located in /usr/sbin/diag/contrib.
- New utility: Management Processor (MP) chassis code
decoder. The decoder is composed of three tools: cc_translator,
cclogview, ccux2hex. It is primarily targeted for HP support
personnel. For more information, see the man pages on these
tools.
NOTE: These tools are not part of the STM platform, but are
standalone programs located in /usr/sbin/diag/contrib.
Known Problems
CAUTION:
Info Tool Causes HPMCs on K- and T-Class (HP-UX 11i only)
K-Class and T-Class computers running HP-UX 11i will experience
HPMCs or Data Page Faults if you run the STM Info Tool to retrieve
configuration information from HP-PB SE SCSI adapters (JAGad88317,
JAGad97126).
The root cause is a problem in the HP-UX 11i kernel, which is
exposed when the Info Tool is run as described above. To correct the
problem and avoid potential HPMCs, load patch PHKL_25552 or its
successor. This is a kernel patch and requires a system reboot.
STM only exposes this problem if the Info Tool is run on HP-PB SE
SCSI adapters. STM does not expose the problem if the Info Tool is
run against other I/O devices.
Defect Reporting
Use CHART to report defects in STM. The project name is
diag.stm.tools.hpux for individual tools, and diag.stm.ui.hpux for
the user interface. If you don't have access to CHART, contact an HP
representative to enter a defect for you.
SD Product Structure
The product number for STM is B4708AA.
SD PRODUCT: Sup-Tool-Mgr
Description: On-line Diagnostic System (Series 800/700)
SD SUB-PRODUCT: Manuals
Description: Support Tools Manager Manual Pages
FILESET: STM-MAN
Description: S800/S700 STM Manual Pages
FILESET: STM-SHLIBS
Description: S800/S700 STM Shared Libraries
FILESET: STM-UI-RUN Corequisite Filesets: STM-SHLIBS
Description: S800/S700 STM User Interface
FILESET: STM-UUT-RUN Corequisite Filesets: STM-SHLIBS
Description: S800/S700 STM Unit Under Test Runtime
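The corequisite relationships listed above can be expressed as a small lookup. This Python sketch only illustrates the structure of the Sup-Tool-Mgr filesets; it is not how SD resolves dependencies during installation, and `missing_corequisites` is a hypothetical name.

```python
from typing import Dict, Set

# Corequisites as listed in the SD product structure for Sup-Tool-Mgr (B4708AA).
COREQUISITES: Dict[str, Set[str]] = {
    "STM-MAN": set(),
    "STM-SHLIBS": set(),
    "STM-UI-RUN": {"STM-SHLIBS"},
    "STM-UUT-RUN": {"STM-SHLIBS"},
}


def missing_corequisites(selected: Set[str]) -> Dict[str, Set[str]]:
    """Map each selected fileset to any of its corequisites absent from the selection."""
    missing: Dict[str, Set[str]] = {}
    for fileset in sorted(selected):
        absent = COREQUISITES.get(fileset, set()) - selected
        if absent:
            missing[fileset] = absent
    return missing
```

Selecting STM-UI-RUN alone, for example, reports STM-SHLIBS as missing; adding STM-SHLIBS to the selection satisfies both runtime filesets.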
URL: http://docs.hp.com/hpux/onlinedocs/diag/stm/str_0109_11i.htm
Last updated: Thu Jan 3 01:27:56 PDT 2002