Release Notes for STM on HP-UX 11.00 (September 2001)
These release notes cover the September 2001 release of Support
Plus for HP-UX 11.00 running on S800/S700 systems.
NOTE: As of the September 1999 release, the name of the
Diagnostic/IPR Media has been changed to Support Plus. In addition,
the format has changed so that there is a separate CD-ROM for each
version of the operating system (HP-UX 10.20 and HP-UX
11.0).
Overview
The Support Tools Manager (STM) provides a complete set of online
support tools for HP-UX systems, enabling you to verify and
troubleshoot PA-RISC system hardware, and to examine system logs.
STM offers several tool types, including information tools,
verifiers, exercisers, expert tools, firmware update tools,
diagnostics and utilities.
Installed with STM (as of IPR 9902) are the EMS Hardware Monitors,
an important tool for maintaining system availability. The EMS
hardware monitors allow you to monitor the operation of a wide
variety of hardware products and be alerted immediately if any
failure or other unusual event occurs. For more information, see
/usr/sbin/stm/Rel_NOTES.HWE.
Documentation
For the latest and most complete information on STM and EMS
Hardware Event Monitors, see the Web page "Diagnostics":
http://docs.hp.com/hpux/diag/
At this site, you will find Overviews, Tutorials, Quick Reference
Cards, Frequently Asked Questions (FAQs), and other material.
Changes
The online Support Tools Manager (STM) was enhanced and updated
for the current release.
Changes to User Interface and Platform
- Fixed a problem with the STM User Interface (UI), whereby user
files created from within the UI were given rw-rw-rw- permissions.
For example, if the user saved a log file from the file viewer, it
would be created with these permissions. These files are now saved
with rw-rw-r-- permissions. This problem was temporarily fixed for
the June 01 release; this release provides a more global fix.
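In octal, rw-rw-rw- is mode 0666 and rw-rw-r-- is mode 0664. These notes do not say how STM now creates its files. The following is a minimal Python sketch, assuming a POSIX system, of one way to guarantee rw-rw-r-- files regardless of the process umask. The helper name save_user_file is hypothetical.

```python
import os
import stat
import tempfile


def save_user_file(path: str, data: bytes) -> int:
    """Hypothetical helper: write `data` to `path` with rw-rw-r-- (0o664).

    The umask can only remove permission bits at creation time, so the
    mode is also set explicitly with fchmod to get exactly 0o664.
    Returns the resulting permission bits.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o664)
    with os.fdopen(fd, "wb") as f:        # fdopen takes ownership and closes fd
        os.fchmod(f.fileno(), 0o664)      # independent of the caller's umask
        f.write(data)
    return stat.S_IMODE(os.stat(path).st_mode)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        mode = save_user_file(os.path.join(d, "viewer.log"), b"saved log\n")
        print(stat.filemode(stat.S_IFREG | mode))  # -rw-rw-r--
```

The sketch uses fchmod on the open descriptor rather than changing the umask, because the umask is process-wide state shared by every thread.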
- Fixed a problem with mstm shortcut help, whereby mstm would
occasionally exit with a SIGSEGV error.
- Modified the STM platform to map NEC iStorage 4000/2000/1000 disks
as an NEC Array, so that they are ignored by the SCSI Disk Monitor
(disk_em). In addition, added an entry in the xref file so that no
tools are shown as available for these devices.
- JAGad73874
Fixed a problem with xstm, whereby xstm would report a SIGSEGV error
and exit when the Filter->Set command was selected in text map
mode.
- Fixed a problem with false reporting of connection failures by the
audit system. A debug facility in the diagnostics and monitors would
attempt to connect to a port to output debug trace information, even
when debugging was not active. This caused the audit system (audsys)
to record many connection failures. This change eliminates the
connection attempts when debugging is not active.
- Fixed a problem with cclogd (chassis code log daemon) that
occurred on the March 01 and June 01 diagnostic releases, whereby
messages from this daemon were not displayed and an error message
reported "correct version of message catalog does not exist". The
cclogd daemon is supported only on A-Class, N-Class, and Superdome
computers, so only those machines are affected by this problem.
- JAGad72005
Changed the "within 24 hours (or deallocation time range)" criterion
in the memlogd daemon from 13.5111 hours to 24 hours.
Intended behavior of memlogd: when a single-bit error re-occurs
within 24 hours, memlogd deallocates the page on which the error is
located (by entering it into the PDT table, if there is a PDT table
on the system, and requesting that the OS not use that page).
Before the fix: memlogd deallocated the page on which the error was
located when the error re-occurred within 13.5111 hours, but did not
deallocate it if the error re-occurred after 13.5111 hours, even if
that was still within 24 hours.
After the fix: when memlogd detects an error re-occurring within 24
hours, it deallocates the page on which the error is located.
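The corrected re-occurrence rule amounts to a time-window test. The Python sketch below is only illustrative and is not memlogd's code. The function name should_deallocate is hypothetical, and treating exactly 24 hours as "within" the window is an assumption. Passing the old effective window of 13.5111 hours reproduces the pre-fix behavior.

```python
from datetime import datetime, timedelta

# Intended window per these notes; the old code effectively used ~13.5111 hours.
DEALLOCATION_WINDOW = timedelta(hours=24)
OLD_EFFECTIVE_WINDOW = timedelta(hours=13.5111)


def should_deallocate(previous_error: datetime, current_error: datetime,
                      window: timedelta = DEALLOCATION_WINDOW) -> bool:
    """Hypothetical check: True if a single-bit error re-occurring at
    `current_error` (after one at `previous_error` on the same page) falls
    inside `window`, meaning the page should be deallocated.
    The inclusive upper bound is an assumption."""
    elapsed = current_error - previous_error
    return timedelta(0) <= elapsed <= window


if __name__ == "__main__":
    first = datetime(2001, 9, 3, 8, 0)
    repeat = first + timedelta(hours=20)
    print(should_deallocate(first, repeat))                        # True (fixed rule)
    print(should_deallocate(first, repeat, OLD_EFFECTIVE_WINDOW))  # False (old behavior)
```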
- JAGad67440
Rewrote the error message that memlogd logs in the memlogd
activity log when the PDT table is full. The new message is more
accurate and helpful.
BEFORE:
Attempt to add entry to the Page Deallocation Table failed.
Possible Cause(s)/Recommended Action(s)
The Page Deallocation Table (PDT) may not be enabled. In most cases,
the PDT should be enabled (Inform your HP representative). However,
the PDT should only be enabled if the system is on HP-UX 10.0 or above,
otherwise, it won't be able to handle double-bit errors.
Internal application error.
AFTER:
The Page Deallocation Table (PDT) is full.
Possible Cause(s)/Recommended Action(s)
The Page Deallocation Table has reached its threshold for the maximum
number of pages deallocated. Troubleshoot for failed memory component.
(Inform your HP representative).
Internal application error.
- JAGad67514
Fixed a problem with memlogd, whereby not all the errors in the Page
Deallocation Table (PDT) were inserted into the memlog file during
memlogd initialization. (Note: it may not be possible to isolate PDT
entries down to the DIMM.)
Background:
When the PDT contains new entries (entered by the firmware during
selftest) that are not currently in the memlog file, memlogd is
supposed to enter these PDT entries into its memlog file during
initialization. However, with the previous memlogd, if an entry in
the PDT matched an entry in the memlog file, none of the new PDT
entries following that match were entered into the memlog file. The
fix is to add all new PDT entries that are not already in the memlog
file to the memlog file during memlogd initialization.
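The difference between the old and the fixed initialization can be sketched in Python as follows. PDT and memlog entries are modeled here as plain page addresses, and the function names are hypothetical. These notes do not describe the real memlogd file formats.

```python
from typing import List


def merge_old(pdt: List[int], memlog: List[int]) -> List[int]:
    """Pre-fix behavior as described: copy new PDT entries into the memlog,
    but stop at the first PDT entry that is already in the memlog."""
    merged = list(memlog)
    for entry in pdt:
        if entry in merged:
            break  # defect: later new PDT entries are never examined
        merged.append(entry)
    return merged


def merge_fixed(pdt: List[int], memlog: List[int]) -> List[int]:
    """Fixed behavior: add every PDT entry that is not already in the memlog."""
    merged = list(memlog)
    seen = set(memlog)
    for entry in pdt:
        if entry not in seen:
            merged.append(entry)
            seen.add(entry)
    return merged


if __name__ == "__main__":
    pdt = [0x1000, 0x2000, 0x3000]
    memlog = [0x2000]
    print([hex(a) for a in merge_old(pdt, memlog)])    # ['0x2000', '0x1000']
    print([hex(a) for a in merge_fixed(pdt, memlog)])  # ['0x2000', '0x1000', '0x3000']
```

In this example, the old merge never reaches 0x3000 because it stops at the matching entry 0x2000.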
- JAGad67534
Fixed problems with page states/status reported by memlogd.
- Enhanced memlogd to distinguish more precisely between the page
states it records in the memlog file.
Background:
In the previous memlogd, no page state indicated whether an error
had been successfully entered into the PDT table. If an error was NOT
entered into the PDT table and the OS could not deallocate the page,
the page status in the memlog file would show: "pending: page could
not be obtained"
However, if an error WAS entered into the PDT table but the OS could
not deallocate the page, the page status in the memlog file would
also show: "pending: page could not be obtained"
Hence, one could not tell whether the PDT entry had succeeded unless
one looked at the PDT entries in the memory information module. So, a
new page state was added to indicate the case when an error is not
entered into the PDT table:
"pending: page could not be entered into PDT"
- Enhanced memlogd to show page status more accurately.
Background:
In the previous memlogd, if an error had been entered into the PDT
table and memlogd was restarted (either by restarting memlogd or by a
system reboot), memlogd assumed that those pages had been
OS-deallocated. This may not be true: pages that belong to the kernel
cannot be OS-deallocated, and memlogd cannot tell whether the OS was
rebooted or only memlogd was restarted. Hence, a page status check was
added that is executed each time memlogd restarts.
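The first enhancement can be summarized as a mapping from two facts to a status string: whether the error was entered into the PDT, and whether the OS deallocated the page. In the hypothetical Python sketch below, the two "pending" strings are the ones quoted in these notes. The label for a successfully deallocated page is a placeholder, because its actual memlog wording is not given here. Per the second enhancement, the deallocation fact should come from a fresh check at each memlogd restart rather than being assumed.

```python
from enum import Enum


class PageState(Enum):
    # Placeholder label: the memlog wording for a deallocated page is not given here.
    DEALLOCATED = "deallocated"
    # The two "pending" strings below are quoted from these release notes.
    PENDING_NOT_OBTAINED = "pending: page could not be obtained"
    PENDING_NOT_IN_PDT = "pending: page could not be entered into PDT"


def page_state(entered_in_pdt: bool, os_deallocated: bool) -> PageState:
    """Hypothetical status mapping after the fix.

    `os_deallocated` is meant to come from a fresh check (for example at each
    memlogd restart), not from assuming that every PDT entry was deallocated.
    """
    if os_deallocated:
        return PageState.DEALLOCATED
    if not entered_in_pdt:
        return PageState.PENDING_NOT_IN_PDT  # state added by this fix
    return PageState.PENDING_NOT_OBTAINED    # in PDT, but e.g. a kernel page
```

Before the fix, both cases where the page was not deallocated produced the same "could not be obtained" status.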
- JAGad45063, JAGad69327
This enhancement was partially implemented in the June 01
release. In the Sept 01 release, the enhancement is fully
implemented.
Enhanced diagmond so that it can be configured to only accept
connections and requests from the local system -- any requests
from a remote system will be rejected. The new configuration
parameter is in the /var/stm/config/sys/diagmond.cfg file. It is
called LOCAL_ONLY_ENABLE. If it is set to 1, only local
connections are allowed. If it is set to 0, local and remote
connections are allowed. By default, it is set to 0.
NOTE: Once the file is changed, the user must
go into the UI and run the RereadUUTConfigFile command to cause
the new values to be re-read.
If a remote connection is attempted while diagmond is configured
in this manner, a message will be logged in the system activity log,
as follows:
Tue Feb 27 15:31:09 2001:
System is configured only to accept requests from
the local host. Rejecting a request from a non-local
host. Message was sent from system name
(magnumpi.rose.hp.com) at IP address (15.8.134.7)
with system port number (52175).
The remote UI will behave exactly as if diagmond itself were not
running on the system to which it is trying to connect. The only
difference is that the message displayed, and logged in the UI
activity log, lists an additional possible cause: the remote system
being configured not to accept remote connections. The modified
message is listed below:
An unexpected error was encountered while attempting to retrieve the
host info for hostname (XXXXX).
This could be due to either of the following conditions:
1) The support tool daemon "diagmond" may not be running on that system.
Use the STM Startup command (in the administration menu under the
file menu.)
2) The support tool daemon "diagmond" on that system may be configured
to only allow local connections. Check the value of the configuration
parameter LOCAL_ONLY_ENABLE in the /var/stm/config/sys/diagmond.cfg
file on that system.
3) The IP address for the system may be invalid or may not be associated
with a valid host. Use a valid IP address.
4) Networking may be incorrectly configured on one of the systems involved.
Verify networking by comparing 'nslookup `hostname`' with the output
of ifconfig of the LANs identified by lanscan.
More details may be available in the System Activity Log and in the
syslog on that system.
It may be necessary to access these using the Local Unit Under Test (UUT)
logs (in the administration menu under the file menu.)
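The LOCAL_ONLY_ENABLE decision can be sketched in Python as follows. This is not diagmond's implementation. The "NAME = value" line format with "#" comments is an assumption about diagmond.cfg, and the function names are hypothetical. Treating a loopback peer or one of the host's own addresses as "local" is also an assumption.

```python
import ipaddress
from typing import Iterable


def read_local_only(cfg_text: str) -> bool:
    """Return True if LOCAL_ONLY_ENABLE is set to 1.

    Assumes 'NAME = value' lines with '#' comments; the real diagmond.cfg
    syntax may differ. A missing parameter means 0, the documented default.
    """
    value = "0"
    for raw in cfg_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        name, sep, val = line.partition("=")
        if sep and name.strip() == "LOCAL_ONLY_ENABLE":
            value = val.strip()
    return value == "1"


def accept_connection(peer_ip: str, local_ips: Iterable[str],
                      local_only: bool) -> bool:
    """Accept every peer when local_only is False; otherwise accept only
    loopback peers or this host's own addresses (an assumed notion of 'local')."""
    if not local_only:
        return True
    peer = ipaddress.ip_address(peer_ip)
    return peer.is_loopback or peer in {ipaddress.ip_address(a) for a in local_ips}


if __name__ == "__main__":
    # Re-reading the text mirrors running RereadUUTConfigFile after an edit.
    local_only = read_local_only("LOCAL_ONLY_ENABLE = 1\n")
    host_ips = ["15.8.134.20"]  # hypothetical address of the local system
    print(accept_connection("15.8.134.7", host_ips, local_only))  # False
    print(accept_connection("127.0.0.1", host_ips, local_only))   # True
```

As in the notes, an edited value has no effect until it is re-read. In the sketch, that corresponds to calling read_local_only again.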
General Changes to Tools
Changes to Specific Tools
- Information tools:
- Exerciser tools:
- JAGad55415
Fixed a problem with the SCSI Disk Exercise tool, whereby the
tool would experience a segmentation fault, signal(11), and
finish unsuccessfully when the tool is run on an SC10 external
disk.
- Verifier tools:
- Diagnose tools:
- Firmware update tools:
- Expert tools:
- JAGad63532
Fixed a problem with the Expert Tool for SCSI CD, whereby the
tool would fail when the verify function was selected. Sample
error text:
Attempt to retrieve user input for the test options FAILED
due to an unknown and unexpected problem (!).
Verify command completed with errors (FAILED).
- JAGad64024
Fixed an incorrect message logged by the CPU Expert tool to
its activity log after a PID (process ID) assignment had been
successfully made.
BEFORE:
PID Assignment encountered an internal error.
The get_assign_options routine returned an
UNSUCCESSFUL status. The PID Assignment was
UNSUCCESSFUL.
AFTER:
The PID Assignment was SUCCESSFUL.
- Fixed a problem with the CPU expert tool, whereby the tool
could de-activate a CPU marked for deconfiguration but could not
re-activate it.
- JAGad48507
Fixed a problem with the Expert Tool for FC60, whereby an
existing tool option to "automatically fix parity errors" during
the FC60 parity scan could corrupt otherwise good data.
Fix: Removed the "Repair Parity Scan" option from the STM UI
(Expert Tool->Tests->Parity Scan).
Background: If data and parity do not match with this option,
the parity will always be changed to match the data (according
to LSI), when in fact we do not know which one is in error.
Thus, there is a 50/50 chance of corrupting otherwise good
data.
STM does provide a warning, stating that parity will be changed
to match the data, but it was decided that this does not
sufficiently capture the potential seriousness of the
situation. Instead, it was decided to eliminate the option.
Then, if a parity error is found, the customer should be
alerted to call HP for support and/or restore the affected LUN
from backup.
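A small example shows why a parity mismatch cannot be repaired safely. These notes do not describe the FC60's parity scheme. This Python sketch assumes simple XOR parity across the data blocks of a stripe, as in RAID 5, and the function names are hypothetical. A nonzero XOR result shows that the stripe is inconsistent. However, a single-block correction to any data block, or to the parity block, restores consistency equally well.

```python
from functools import reduce
from operator import xor
from typing import List


def syndrome(data: List[int], parity: int) -> int:
    """XOR of the parity block and all data blocks; 0 means the stripe is consistent."""
    return reduce(xor, data, parity)


def candidate_repairs(data: List[int], parity: int) -> List[str]:
    """Every single-block change that would make the stripe consistent again.

    A nonzero syndrome shows *that* the stripe is inconsistent, not *where*:
    XOR-ing the syndrome into any one block, data or parity, fixes the check.
    """
    s = syndrome(data, parity)
    if s == 0:
        return []
    repairs = [f"data[{i}] := {d ^ s:#04x}" for i, d in enumerate(data)]
    repairs.append(f"parity := {parity ^ s:#04x}")
    return repairs


if __name__ == "__main__":
    good = [0x12, 0x34, 0x56]
    parity = syndrome(good, 0)        # 0x70: XOR of the good data blocks
    damaged = [0x12, 0x35, 0x56]      # one bit flipped in data[1]
    for option in candidate_repairs(damaged, parity):
        print(option)
```

Here the damaged stripe yields four equally consistent fixes: data[0] := 0x13, data[1] := 0x34, data[2] := 0x57, and parity := 0x71. A repair that always rewrites parity picks the last one and leaves the bad data in place.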
- JAGad48007
Fixed a problem with the Expert Tool for FC60, whereby during a
parity scan of a LUN, the Logical Block Addresses of detected
errors were reported incorrectly.
- Logtool:
- Utilities:
- JAGad48375
Fixed a problem with the cstm version of copyutil, whereby a
backup device could not be selected. The problem does not occur
in xstm.
- New utility: mca (machine check analyzer). This utility
analyzes the contents of an HPMC tombstone file generated from
a Superdome or hp server rp8400 (900/800/S16K-A, "Keystone")
system. It is primarily targeted for HP support
personnel.
The mca utility can be run on HP-UX 11.00 or 11i. It is a port
of the Windows HPMC Superdome Analyzer currently available to
HP support personnel. For more information, see the man page on
"mca".
NOTE: This utility is not part of the STM platform, but is a
standalone utility located in the /usr/sbin/diag/contrib directory.
- New utility: Management Processor (MP) chassis code
decoder. The decoder is composed of three tools: cc_translator,
cclogview, ccux2hex. It is primarily targeted for HP support
personnel. For more information, see the man pages on these
tools.
NOTE: These tools are not part of the STM platform, but are
standalone programs located in the /usr/sbin/diag/contrib directory.
Known Problems
CAUTION: Monitoring Changes for disc30, sdisk and disk array devices
As of IPR 9902 (Feb 99 release), there has been a change to the
way that monitoring is done for disc30, sdisk and the HA Disk Array
Models 10, 20, and 30FC.
Formerly, the "diaglogd exec" programs (pdisc30_exec,
pharaymon_exec, and psdisk_exec) handled driver error entries for
these devices.
As of IPR 9902, these programs have been deleted and their
functionality is now provided by the EMS Hardware Monitors.
If you had customized the configuration files for the diaglogd
exec programs (disk30_exec.cfg, sdisk_exec.cfg, and
haraymon_exec.cfg) you may wish to re-configure the EMS Hardware
Monitors to achieve the same results.
Defect Reporting
Use CHART to report defects in STM. The project name is
diag.stm.tools.hpux for individual tools, and diag.stm.ui.hpux for
the user interface. If you don't have access to CHART, contact an HP
representative to enter a defect for you.
SD Product Structure
The product number for STM is B4708AA.
SD PRODUCT: Sup-Tool-Mgr
Description: On-line Diagnostic System (Series 800/700)
SD SUB-PRODUCT: Manuals
Description: Support Tools Manager Manual Pages
FILESET: STM-MAN
Description: S800/S700 STM Manual Pages
FILESET: STM-SHLIBS
Description: S800/S700 STM Shared Libraries
FILESET: STM-UI-RUN Corequisite Filesets: STM-SHLIBS
Description: S800/S700 STM User Interface
FILESET: STM-UUT-RUN Corequisite Filesets: STM-SHLIBS
Description: S800/700 STM Unit Under Test Runtime
URL: http://docs.hp.com/hpux/onlinedocs/diag/stm/str_0109_11.htm
Last updated: Mon Jul 16 10:56:00 PDT 2001