Contents: General STM / Installing STM / Starting STM / Device map/Available tools / General tool / Specific tool / CSTM scripts
There are three user interfaces (all in /usr/sbin):
xstm - The X Windows interface
mstm - The menu interface
cstm - The command line interface
xstm: click on device icon
mstm: move cursor over device and hit the SPACE key.
Tools --> <tool> --> Run
Tools --> <tool> --> <log>
Tools --> Information
Tools --> Verify
Tools --> Exercise
Logtool is used to view OS Error Logs
Tools --> Utility --> Run...
Select logtool
Copyutil is used to copy disk data to another disk (possibly via tape), in order to replace a bad disk
Tools --> Utility --> Run...
Select copyutil
Select device(s)
Device --> Current Device Status
STM provides several different logs for different situations:
Select device
Tools --> <tool> --> Failure Log
Select device(s)
Tools --> <tool> --> Activity Log
Select device(s)
Tools --> Information --> Information Log
System --> System Activity Log
Note: If the UI cannot connect to the UUT, use instead:
File --> Administration --> Local UUT Logs --> System Activity Log
System --> Map Log
Note: If the UI cannot connect to the UUT, use instead:
File --> Administration --> Local UUT Logs --> Map Log
The Map Log contains information and errors logged while scanning the system hardware.
File --> UI Activity Log
The UI Activity Log contains errors logged by the UI.
File --> Administration --> Local UUT Logs --> syslog
The Syslog contains information and errors logged by a variety of HP-UX programs.
Tools --> Utility --> Run...
Select logtool
When any of the following releases are installed, SYSDIAG will be removed from the system:
10.30
IPR9707
T600 Patch
Gamma (Workstation ACE)
SYSDIAG is removed because the functionality provided in STM in these releases is a superset of SYSDIAG, completing the transition to STM. All STM releases from this point on will eliminate SYSDIAG if it is found on the system.
HP-UX 9.x systems have a program called STM. (This is NOT the same program as the STM released with HP-UX 10.x and 11.x.) This program offers a limited number of verifiers and exercisers, and can invoke the 'sysdiag' diagnostic system for some limited diagnostics. In general, however, HP-UX 9.x and MPE systems rely on the 'sysdiag' diagnostics for their support tools. A port of STM to MPE/iX is currently planned. There are no plans at the present time to port the new STM to 9.x.
A new STM utility called logtool is used to look at system device and memory logs. See the logtool tutorial for more information on this utility.
You may have a "multi-homed" machine. To find out whether your machine is "multi-homed", issue the command:
nslookup `hostname`
If this command reports "Addresses: " and then lists more than one IP address associated with the host name, then your machine is multi-homed.
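The check described above can be sketched as a small shell function that counts the addresses listed for the host. This is only an illustration: the function name count_host_addresses is hypothetical, and it assumes the host's addresses appear after the "Name:" line, either on one "Addresses:" line separated by commas or on several "Address:" lines. The exact nslookup output layout varies between systems.

```shell
# Count the addresses reported for the host in nslookup-style output on stdin.
# Assumption: the host's addresses follow the "Name:" line, either on one
# "Addresses:" line separated by commas or on several "Address:" lines.
count_host_addresses() {
    awk '
        /^Name:/ { seen = 1; next }
        seen && /^Address(es)?:/ {
            sub(/^Address(es)?:[ \t]*/, "")   # drop the label
            sub(/[ \t]+$/, "")                # drop trailing blanks
            if ($0 != "") n += split($0, parts, /[ \t,]+/)
        }
        END { print n + 0 }
    '
}

# Usage (hypothetical wrapper around the command from the text):
#   nslookup "$(hostname)" | count_host_addresses
```

A result greater than 1 suggests that the machine is multi-homed.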
STM has known problems with this configuration, which also affect the Predictive software. The problems are related to the fact that the Domain Name (DNS) server uses a round-robin method which returns a different IP address for the hostname each time it is called. STM expects a given hostname to resolve to the same IP address, and gets confused when this is not the case.
The STM development team is currently working on a fix for these problems.
Workaround to Run the Support Tool Manager:
Check whether the diagnostic daemon diagmond is running by executing the command:
ps -ef | grep diagmond
If diagmond is running, but the UI is reporting an invalid password, then you should be able to get it to run by giving it a particular IP address. Use the System -> Sel System to Test -> Select Current System... command. Place the IP address (or a hostname which is associated with just one IP address) in the "add" field (and, in mstm, hit return). Then hit OK in the name / password screen. This should cause the User Interface (UI) to connect to the local system's diagnostic daemon.
Workarounds to Run STM and psconfig:
The following workaround was reported:
# /etc/nsswitch.conf
#
#
# Using files 1st for hostname lookup to workaround multi-homed problem
# with predictive.
#
hosts: files [NOTFOUND=continue] dns nis

so that /etc/hosts is checked first, then the DNS.
This method worked after stopping and restarting diagmond. The psconfig command brought up the menu normally with no errors and predictive was able to run for the first time with no errors.
The round-robin feature of DNS is only disabled for the system on which the /etc/nsswitch.conf file has been modified. Furthermore, the round-robin feature is only disabled when looking up a host with an entry in the /etc/hosts file.
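To confirm that a system's lookup order matches the reported workaround, a check along the following lines could be used. It is a minimal sketch: the function name hosts_files_first is hypothetical, and it only looks at whether "files" appears before "dns" on the "hosts:" line.

```shell
# Succeed if the "hosts:" line of an nsswitch.conf-style file names
# "files" before "dns" (or names "files" without "dns" at all).
hosts_files_first() {
    awk '
        $1 == "hosts:" {
            for (i = 2; i <= NF; i++) {
                if ($i == "files" && !dns) ok = 1
                if ($i == "dns") dns = 1
            }
        }
        END { exit(ok ? 0 : 1) }
    ' "$1"
}

# Usage:
#   hosts_files_first /etc/nsswitch.conf && echo "/etc/hosts is consulted first"
```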
Diagmond is the DIAGnostic MONitor Daemon. (The final "d" is a UNIX convention; all daemon names end with "d".)
Diagmond is the only diagnostic daemon which is started directly by the initialization scripts. It in turn launches the other diagnostic daemons: diaglogd, memlogd and psmond. Diaglogd (DIAGnostic LOGging Daemon) is the daemon that does the OS logging.
If you shut down diagmond, it shuts down those other diagnostic daemons. So... if diagmond fails to start for some reason, then diaglogd won't be running. Of course, if diagmond is running, it is still possible for diaglogd to fail for some reason (like the diag2 driver not being in the kernel.)
For the latest information, go to the diagnostics web site: http://docs.hp.com/hpux/diag/ . Under the heading "Online Diagnostics: Support Tools Manager (STM)", click on "Release Notes". At this site, you can also find Release Notes for EMS Hardware Monitors.
In IPR 9909, there is a problem whereby the diagnostic daemon "diagmond" uses a large percentage (between 10% and 15%) of the CPU whenever a User Interface (UI) for STM is not running.
To fix this problem, load one of the following patches (available from the HP ITResource Center, http://itresourcecenter.hp.com):
PHSS_20005: s700 10.20 STM panic, disk_em,diagmond,tlscsidev
PHSS_20006: s800 10.20 STM panic, disk_em,diagmond,tlscside
PHSS_20007: s700_800 11.00 STM panic, disk_em,diagmond,tlscsidev
Alternately, there is a workaround: start the STM user interface (UI) (for example, cstm) and leave it running. As long as a UI is running, the CPU usage by diagmond will be normal.
Symptoms: In xstm (the graphical interface for online diagnostics), no online help appears when you enter a command for online help. The diagnostics themselves are still functional.
Cause/Action: Online help for xstm requires that the Netscape browser be installed on the computer. The xstm interface invokes the Netscape browser to display .htm files located in the diagnostics sub-directories on disk. (The mstm and cstm interfaces do not use the Netscape browser.)
The easiest way to install the Netscape browser is from the AR (Application Release) Media supplied with the computer.
Alternative: Using any computer, you can view online help for xstm, mstm, and cstm on the Internet at: http://docs.hp.com/hpux/onlinedocs/diag/stm/sth_summ.htm
A blade server is a chassis which contains several server blades and other blades. A server blade is a "computer on a card" that runs its own OS. Thus, the server blade is the computer system.
When STM tools and hardware event monitors refer to the computer system, they will give the HP-UX hostname of the server blade. For example, when hardware event monitors report events, they name the system on which the event occurs in a format such as:
hpdst313 sent Event Monitor notification information
where "hpdst313" is the HP-UX hostname of the blade reporting the event. As of May, 2002, the support tools do not directly name the slot and chassis in which the server blade is located.
To make it easier to find the location of a particular blade, given its hostname, you can:
We suggest giving the slot number in the MP hostname. (If you move the server blade to a new slot, it is easier to change the MP hostnames than to change the hostnames for server blades.)
Do not install patches PHSS_14401 through PHSS_14407, unless specifically instructed to do so by support personnel. These patches will cause an earlier version of STM to be loaded. PHSS_14401 (10.01 S800) and PHSS_14402 (10.01 S700) load version A.09.00. All other PHSS_1440x patches load version A.10.00.
The BEST SOLUTION is not to load this patch, but rather to load the latest version of the diagnostics. These are available from the Support Plus Media (IPR 9909 or later), Diagnostic/IPR Media (IPR 9906 or earlier), or from HP's Software Depot at http://www.hp.com/go/softwaredepot. (Look under "Enhancement Releases" for the bundle titled "Support Tools for the HP 9000".)
If you do load the patch, and run into problems, here's the procedure to get rid of it:
Patches PHSS_17884, PHSS_17885, PHSS_17886, PHSS_17887 and PHSS_17888 contain the entire diagnostic system, rather than just changes to the diagnostics already loaded on your system.
These patches are the same as patches PHSS_14401 - PHSS_14406, but with a script file in the patch depot which verifies that you are not backdating your diagnostics by loading these patches on top of a more recent version.
If, however, the diagnostics that are on your system are very old (STM version A.03.00 and earlier,) or if there is not a version of STM loaded on your system, then the patch cannot determine the revision of the existing diagnostics and concludes that it should not be loaded. The checkinstall script will fail, and the /var/adm/sw/swagent.log will contain a message something like:
ERROR: The patch PHSS_1788x is trying to update a system which already contains a OnlineDiag bundle version newer than is required for this patch or there is no OnlineDiag bundle installed. Do not override this install, doing so may cause STM diagnotics to improperly operate.
If you experience this problem:
-x enforce_scripts=false
(or unset the "Enforce script failures" option in the swinstall
options menu.)
It is typically not necessary to do an swremove before installing the Support Tools on a system with an older version of Support Tools.
On HP-UX 10.20, if the old version of Support Tools is earlier than May 1997 (IPR 9705), then it is advisable (but not necessary) to do an swremove.
(If you do an swremove of the Support Tools on HP-UX 10.20, it will cause the system to reboot after the removal takes place.)
Do not load the OnlineDiag product from the June 1999 Diagnostic/IPR CD-ROM if your system includes a V-Class computer, a DLT tape library, or Predictive Support. For problem description and fix, see the customer letter (PDF).
There was an swinstall problem that occurred when certain releases of Support Tools were used to update a system with a previous release. This problem occurs only on HP-UX 10.20 on the following releases: June 99, Dec 99, Mar 00.
If a file was being delivered via /usr/newconfig and it had
changed from the version on the system, then swinstall would not
actually install the file. Instead, it would leave it in
/usr/newconfig/*. It would also log a message like the following to
/var/adm/sw/swagent.log :
NOTE: A new version of "/var/stm/config/tools/utility/os_decode_xref" has been placed on the system. The new version is located at "/usr/newconfig/var/stm/config/tools/utility/os_decode_xref". The contents of the newly installed file differ from the contents of "/var/stm/config/tools/utility/os_decode_xref", and the previously delivered file is not available for comparison. Therefore "/var/stm/config/tools/utility/os_decode_xref" is not being overwritten. The System Administrator should resolve this situation manually.
In this example, you could solve the problem by entering the following command:
cp /usr/newconfig/var/stm/config/tools/utility/os_decode_xref \
   /var/stm/config/tools/utility/os_decode_xref
Note any similar lines in swagent.log about other files and perform a similar copy for them.
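When several files are affected, the copy commands can be derived from the log text itself. The following is a sketch only: the function name newconfig_copy_commands is hypothetical, it assumes each affected file is quoted in the log as a path under /usr/newconfig (as in the NOTE above), and it prints the commands for review rather than running them.

```shell
# Read swagent.log-style text on stdin and print one "cp" command for every
# quoted path under /usr/newconfig. The destination is the same path with the
# /usr/newconfig prefix removed. Paths containing blanks are not handled.
newconfig_copy_commands() {
    tr '"' '\n' | grep '^/usr/newconfig/' | sort -u |
    while IFS= read -r src; do
        printf 'cp %s %s\n' "$src" "${src#/usr/newconfig}"
    done
}

# Usage:
#   newconfig_copy_commands < /var/adm/sw/swagent.log
```

Review the printed commands before executing any of them, since the log may also mention files from other products.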
This problem is almost the same as another FAQ: When I try to run logtool, I get an error message about a missing log decoding program.
For the latest information, go to the diagnostics web site: http://docs.hp.com/hpux/diag/ . Under the heading "Diagnostics (Support Tools): General Information", click on "DIAGNOSTICS.readme files".
As of the September 1999 release, the name of the Diagnostic/IPR Media has been changed to Support Plus Media.
You might have used the "match_target=true" swinstall option while updating the diagnostics. The default value of match_target is "false", and in general, you should leave the swinstall options set at their default values when installing diagnostics.
To see if the "match_target=true" swinstall option caused the problem, look in /var/adm/sw/swagent.log at your most recent installation of OnlineDiags. Within this session, you should see a line which begins (depending on your system type):
On 10.20 S700:
* Installing fileset "Sup-Tool-Mgr-700.STM-CATALOGS,r=
On 10.20 S800:
* Installing fileset "Sup-Tool-Mgr-800.STM-CATALOGS,r=
On 11.00:
* Installing fileset "Sup-Tool-Mgr.STM-CATALOGS,r=
If this line is not present, then it is likely that an older installation of diagnostics was updated with the "match_target" option set to true. To fix the problem, re-install the diagnostics. This time, leave the swinstall options set to their default values, with the exception of the option to install filesets even if the same revision exists; this option should be set to true ("reinstall=true").
Detailed explanation:
When updating the OnlineDiags, do not set the "match_target"
swinstall option to "true". The "match_target" option causes only the
filesets found on your system to be updated. "Filesets" are used by
swinstall to contain the actual files which make up the software.
Software bundles consist of multiple filesets. Over time, new
filesets have been added to the OnlineDiag bundle to accommodate new
functionality, and files previously delivered in existing filesets
have been moved to new filesets. Using the "match_target" option means
that only those filesets which were part of the diagnostics at the
time of your last installation will be updated. This can cause the
diagnostic system to fail in multiple ways. Message catalogs were
moved to a new fileset, to accommodate the downloading of message
catalogs when the UI connects to a remote system. Therefore, one
failure is that messages are missing or the message catalogs are
found to be out of date. Also, the Event Monitoring Service (EMS)
product was added to the bundle in IPR 9902. If a system with a release
earlier than IPR 9902 was updated with "match_target=true" set, then EMS will
not be loaded. This will cause the psmctd daemon to abort with a
signal 6. This can be seen in the system activity log.
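The search for the STM-CATALOGS line in swagent.log can be sketched as follows. The function name catalogs_fileset_logged is hypothetical. Note also that this simple version searches the whole log file rather than only the most recent installation session, so an older session could produce a match.

```shell
# Succeed if the log contains an "Installing fileset" line for the
# STM-CATALOGS fileset in any of the three forms listed above
# (Sup-Tool-Mgr-700, Sup-Tool-Mgr-800 or Sup-Tool-Mgr).
catalogs_fileset_logged() {
    grep -q 'Installing fileset "Sup-Tool-Mgr[-0-9]*\.STM-CATALOGS,r=' \
        "${1:-/var/adm/sw/swagent.log}"
}

# Usage:
#   catalogs_fileset_logged || echo "STM-CATALOGS was not installed"
```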
This is usually because the machine address cannot be resolved or because the diagmond daemon is not running on the machine that is being connected to.
It is also possible that the networking is not configured correctly. In this case, there will have probably been a long delay before the "Waiting to obtain host information..." dialog is displayed. To test the network configuration:
If this is a first time install, it is possible that the installation/configuration did not complete properly.
If this is an update, it is possible that the installation/configuration did not complete properly.
File->Administration->Local Log Viewing->Local Map Log
If you use 'kill -9' to kill diagmond, you will not be able to restart it. This is because it was not able to release its socket port when it was killed and the port is therefore not available. If you need to kill the diagnostic daemon, use 'kill -2' or use the function File->Administration->STM Shutdown.
X Windows displays can only show a limited number of colors at any one time. Some applications, particularly Netscape, use so many colors that there are not enough colors left over for xstm to display the system map properly. This may result in icons being drawn in black, with the writing on them also in black. This problem can be fixed by closing the applications which are using a large number of colors, and restarting xstm. Netscape can also be run with the "-install" option, which causes it to release its colors when the pointing device is not pointing within the netscape window.
STM requires that the Motif and X Windows libraries be present on the system. If they are not, error messages similar to the following are displayed:
/usr/lib/dld.sl: Can't find path for shared library: libXm.4
/usr/lib/dld.sl: No such file or directory
/usr/sbin/mstm[24]: 15845 Abort(coredump)
If you see a message like this (the particular version of the library, which is ".4" in this case, may vary), load the Motif libraries from the HP-UX OS media.
STM uses sockets to communicate between processes. When networking is incorrectly configured, STM may hang indefinitely. To help users identify this problem, STM incorporates a check to verify that the networking was correctly configured. Unfortunately, this check falsely reports bad configurations in certain cases, including when an ATM network card is the primary LAN connection.
This problem was corrected in the IPR 9802 release. On prior releases, the following work-around can be used (alternately, you can update your copy of STM to a more recent version).
export DONT_CHECK_LAN=1
/sbin/init.d/diagnostic start
export DONT_CHECK_LAN=1
In the future, you must enter "export DONT_CHECK_LAN=1" once in a session before starting xstm, mstm or cstm in that session.
(This problem is fixed in IPR 9806.)
STM can experience problems as a result of extra files being left on the system by the software update utility swinstall and the frecover utility.
The directory /usr/sbin/stm/uut/bin/sys should contain only the three files: diagmond, diaglogd, and memlogd. If there are other files in this directory, they should be removed (or moved out of this directory). Files to be removed typically have names like EMYa01203 (the exact name will vary). Other files to be removed have names like diagmond.1284 (the string "diagmond," "diaglogd," or "memlogd", with a number appended).
/sbin/init.d/diagnostic stop
/sbin/init.d/diagnostic start
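The directory check described above can be sketched as a shell function that lists every file other than the three expected daemons. The function name list_unexpected_files is hypothetical, and the function only reports files; it does not remove or move anything.

```shell
# Print every entry in the daemon directory that is not one of the three
# expected files (diagmond, diaglogd, memlogd).
list_unexpected_files() {
    dir=${1:-/usr/sbin/stm/uut/bin/sys}
    for f in "$dir"/*; do
        [ -e "$f" ] || continue              # directory empty or missing
        case ${f##*/} in
            diagmond|diaglogd|memlogd) ;;    # expected daemons
            *) printf '%s\n' "$f" ;;
        esac
    done
}

# Usage:
#   list_unexpected_files
```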
If you use sam(1M) or swcluster(1M) to install the IPR 9902 or IPR 9904 version of diagnostics on an NFS Diskless Cluster on HP-UX 10.20, some of the required files are not installed in the /var directory of the clients. (If this is the case, a message stating that /var/stm/data/id_mod_xref could not be accessed will be found in the local map log visible using command File -> Administration -> Local UUT Logs -> Map Log .)
To fix this problem, log onto the cluster server, and issue the following command for each client. (Replace the string <client_name> with the name of the client in the directory /export/private_roots/ .)
cp -p -r /export/shared_roots/OS_700/var/stm/* \
   /export/private_roots/<client_name>/var/stm/
cp -p -r /export/shared_roots/OS_700/etc/opt/resmon/* \
   /export/private_roots/<client_name>/etc/opt/resmon/
Then, on each client, type the following to restart the diagnostic daemon:
/sbin/init.d/diagnostic start
An alternate way to restart the diagnostic daemon is to reboot the client.
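On a cluster with many clients, the two copy commands can be generated for each client directory. This is a sketch under the assumption that every subdirectory of /export/private_roots is a client; the function name print_client_copy_commands is hypothetical, and the commands are printed for review rather than executed.

```shell
# Print the two copy commands from the text for every client directory
# found under the private roots directory (default /export/private_roots).
print_client_copy_commands() {
    private_roots=${1:-/export/private_roots}
    shared=/export/shared_roots/OS_700
    for client_dir in "$private_roots"/*/; do
        [ -d "$client_dir" ] || continue     # no client directories found
        client_dir=${client_dir%/}
        printf 'cp -p -r %s/var/stm/* %s/var/stm/\n' "$shared" "$client_dir"
        printf 'cp -p -r %s/etc/opt/resmon/* %s/etc/opt/resmon/\n' "$shared" "$client_dir"
    done
}

# Usage (on the cluster server):
#   print_client_copy_commands
```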
When using sam (1M) or swcluster (1M) to install OnlineDiag (IPR 9902 or IPR 9904) on the diskless systems, one of the install scripts incorrectly checks for the presence of the correct patches on the server, rather than the clients. To satisfy this check, the patches (or their S800 equivalent, if the server is a series 800) should be installed on the server.
On IPR 9902 (STM version A.14.00,) there is a problem which can occasionally corrupt a temporary data file. To fix this problem, do the following:
rm /var/stm/data/uut_status       # Remove the corrupt file
/sbin/init.d/diagnostic start     # Restart the diagnostics daemon
The diagnostic system should now work normally. Note that it can take several minutes for the daemon to map the system, and the user interface will wait until this is complete. To see if the daemon is running, type:
ps -ef | grep diagmond
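Note that the grep command itself usually appears in the output of this pipeline, which can look like a match. One way to avoid that is to compare only the command name. The following is a sketch: the function name daemon_listed is hypothetical, and it assumes that the command column starts directly under the "CMD" or "COMMAND" heading of the ps output.

```shell
# Read "ps -ef"-style text on stdin and succeed only if some process has the
# given name as the last path component of its command. Arguments of other
# commands (such as "grep diagmond") are ignored.
daemon_listed() {
    awk -v name="$1" '
        NR == 1 {
            col = index($0, "CMD")
            if (col == 0) col = index($0, "COMMAND")
            next
        }
        col > 0 {
            split(substr($0, col), words, " ")   # words[1] is the command
            n = split(words[1], parts, "/")
            if (n > 0 && parts[n] == name) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Usage:
#   ps -ef | daemon_listed diagmond && echo "diagmond is running"
```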
The reason CSTM appears to be hung is that the diagnostics daemons need to be re-started. To re-start daemons:
The diagmond daemon will create a system map which can take several minutes depending on the I/O configuration.
The causes for an "unknown" device can be one of the following:
This is usually because the "device type" and "qualifier type" that were selected don't both match a device in the system. For example, selecting a type of "Disk" and a qualifier of "SCSI" will not select any devices because SCSI disks use qualifiers of "Hard, Floppy, etc." Use the Device-->Current Device Status command to determine the valid type and qualifier that apply to a specific device.
When diagnostics are installed, the newly supported drivers for new hardware and new tools for hardware are all part of configuration files which are installed with the diagnostics system.
However, if the configuration files were modified after the previous version of the diagnostic system was installed, these configuration files will not be overwritten with new versions but placed in the /usr/newconfig directory.
Check the /var/adm/sw/swagent.log file to see if it indicates that the prod_op_xref and id_mod_xref files were not installed but copies were placed in /usr/newconfig/var/stm/data/prod_op_xref and /usr/newconfig/var/stm/data/id_mod_xref.
If the files were not installed but copies saved in /usr/newconfig:
xstm: System->Remap System (old command: System->Rescan HW)
mstm: system->REMAP SYSTEM (old command: system->RESCAN HW)
cstm: remapsystem (old command: rescanhw)
Known problem (JAGad36990). Running xstm on very large configurations yields a display where most of the devices overlap each other so that you cannot select a single device to perform any tasks. This has been seen on systems with 400 or more disks.
The problem can occur on Dec 00 and previous diagnostic releases (HP-UX 10.20, 11.00, and 11i). A fix for this problem is planned for an upcoming release of the diagnostics.
The workaround for this problem is to display the map in tabular format. Select the Options-->Map command and click on the "Display xstm Device Map in Text Format" button. All operations can be carried out as with the graphical map.
As an alternative, use MSTM or CSTM. These interfaces display the entire map correctly.
STM monitors the progress of all running tools and expects each tool to send a "heartbeat" every minute or so. If these heartbeat indications are not received within a two minute window, the tool state is changed to "hung". The causes for a hung tool can be one of the following:
This is due primarily to delays attributable to messaging traffic between tools and STM, which is particularly heavy when tools are first run. One or more tools may even enter a "hung" state during this initialization time, but this should clear up once all of the tools have gotten through this phase.
This problem is fixed with version A.06.00 (and above) of STM.
This was a feature of some SYSDIAG tools, but with the exception of some disk arrays, this feature is not available in STM. Instead, you can run multiple expert tools simultaneously. To do this in MSTM:
Start the first expert tool.
Hit CTRL C.
Hit the ESCAPE TO UI key.
Start another expert tool on another device.
To re-connect to the original tool:
Hit CTRL C.
Hit the ESCAPE TO UI key.
Hit tools -> tool mgmt -> attach
Select the tool to attach to and hit OK.
There are three problems which may occur when connecting from particular revisions of the UI to particular revisions of the UUT. These problems may cause the tool (or its help) to fail outright with a message indicating that a message catalog with a particular revision (or a help volume) could not be found. If you encounter one of these problems, the work-around is to log in directly to the system on which you want to run STM, rather than using the Select System To Test dialog to select a remote system.
This is usually due to the fact that the drive being tested is an older drive, which did not implement these commands in the form they are being used. If this is the case, the cause/action text in the log message will suggest this as a possible cause of the problem.
Running exercisers on more than two graphics devices
simultaneously may result in one or more of the exercisers
terminating due to problems getting resources such as semaphores or
shared memory. It is also possible that the graphics exercisers will
execute correctly but other exercisers such as the CPU exerciser will
fail if started while more than two graphics exercisers are running.
This problem can be corrected by modifying the following kernel
parameters and rebuilding the kernel:
- The semmni parameter should be increased by 64
- The semmns parameter should be increased by 128
In addition, the /etc/X11/X<server>screens file (where
<server> is the number of the X server -- e.g. 0, 1, 2, or 3)
should be modified to add the following:
ServerOptions
GraphicsSharedMemorySize 0xc00000
These lines should be added just above the "Screen /dev/crt" line. If
a "ServerOptions" line already exists in the file, another
"ServerOptions" should not be added, but, rather, the
"GraphicsSharedMemorySize" line should simply be added to the
existing entries under "ServerOptions".
After modifying these files, the X server must be restarted.
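The edit to the X<server>screens file can be sketched with awk. This is an illustration only: the function name add_graphics_shm_option is hypothetical, the four-space indentation is an assumption, and the result is written to standard output so that it can be reviewed before replacing the real file.

```shell
# Print a copy of an X*screens file with the GraphicsSharedMemorySize entry
# added: under an existing "ServerOptions" line if there is one, otherwise in
# a new "ServerOptions" section just above the "Screen /dev/crt" line.
add_graphics_shm_option() {
    if grep -q 'GraphicsSharedMemorySize' "$1"; then
        cat "$1"            # already configured: pass the file through
        return
    fi
    awk '
        /^[ \t]*ServerOptions/ && !added {
            print
            print "    GraphicsSharedMemorySize 0xc00000"
            added = 1
            next
        }
        /^[ \t]*Screen[ \t]+\/dev\/crt/ && !added {
            print "ServerOptions"
            print "    GraphicsSharedMemorySize 0xc00000"
            added = 1
        }
        { print }
    ' "$1"
}

# Usage:
#   add_graphics_shm_option /etc/X11/X0screens > /tmp/X0screens.new
```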
The graphics exerciser will exit with an INCOMPLETE status when
the X server is configured with multiple graphics heads in a single
logical screen.
The following message will be logged into the tool activity log:
"The graphics exerciser does not support testing of hardware
devices that are part of a single logical screen.
Possible Causes/Recommended Action:
Modify the 'X%screens' file so that the device under test is not part
of a single logical screen, restart the X server, and rerun the
graphics exerciser.
To work around this problem, modify the 'X%screens' file so that the
device under test is not part of a single logical screen, restart the
X server, and rerun the graphics exerciser."
You can use the new Logtool to view old (pre DART 34 and / or pre 10.30) log files. To manually convert system log files from the old sysdiag format to the new STM raw log format:
/usr/sbin/conlog /var/adm/diag/ /var/stm/logs/os/
When diagnostics are installed, the monitor for the High Availability SCSI Disk Arrays is part of configuration files which are installed with the diagnostics system.
However, if the configuration files were modified after the previous version of the diagnostic system was installed, these configuration files will not be overwritten with new versions but placed in the /usr/newconfig directory.
Check the /var/adm/sw/swagent.log file to see if it indicates that the diaglogd.cfg file was not installed but a copy was placed in /usr/newconfig/var/stm/config/sys/diaglogd.cfg.
If the file was not installed but a copy saved in /usr/newconfig:
The "Nike" monitor should now launch when a High Availability SCSI Disk Array encounters an error and generates the event.
Some support tools fail when run on the DDS Media Changer (Autoloader)
The problem occurs only on Models C1553A and C1557A. As of February 1999, there is no workaround to this problem, but a fix is planned for an upcoming release of STM.
The problem occurs when the device has a non-zero Logical Unit Number (LUN) in its hardware path. For example:
13   8/16/5.3.0   SCSI Tape (HPC1557A)
14   8/16/5.3.1   SCSI Media Changer (HPC1553A)
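To spot affected devices in a map listing, the LUN can be read from the hardware path. The sketch below assumes, as in the example above, that the LUN is the last dot-separated field of the path. The function names hw_path_lun and has_nonzero_lun are hypothetical.

```shell
# Print the last dot-separated field of a hardware path, taken here to be
# the LUN (for example "1" for 8/16/5.3.1).
hw_path_lun() {
    printf '%s\n' "${1##*.}"
}

# Succeed if that field is anything other than "0". A path containing no
# dot at all is also reported as non-zero by this simple test.
has_nonzero_lun() {
    [ "$(hw_path_lun "$1")" != "0" ]
}

# Usage:
#   has_nonzero_lun 8/16/5.3.1 && echo "non-zero LUN: tools may fail"
```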
The Information tool fails with a WARNING, with the following text in the Tool Activity Log:
Mon Jan 25 09:42:18 1999: The following sense data was returned by the device:
Sense Key: 0x05 (ILLEGAL REQUEST)
Additional Sense Code/Qualifier: 0x25/00
Error Description:
The logical unit is not supported.
Mon Jan 25 09:42:18 1999: The standard SCSI LOG SENSE command was being
performed when the failure occurred.
Mon Jan 25 09:42:18 1999: Tool completed with exit_status
MOD_WARNING (1) indicating tool completed
with a warning.
The Firmware Update tool won't be able to find a valid firmware update file, even if one exists.
If the user runs the Expert tool and selects any of the commands, the following error message will appear in the Expert Tool window:
The sense key indicates an Illegal Request.
===============
(0x2500) An invalid LUN was specified.
On IPR 9904 for HP-UX 10.20, there was a problem with the STM memory information tool. The tool completes successfully, but generates an output with no data. The problem only exists in IPR 9904 for HP-UX 10.20 and has been corrected for later releases.
In release IPR 9902 and IPR 9904, in systems with large configurations you may find that ioscan is using a large amount of CPU time. The CPU time used is considerably worse on HP-UX 11.00 systems, although 10.20 systems may also experience an undesirable amount of CPU use. This problem has been corrected in IPR 9906.
For IPR 9902 and IPR 9904, patches are available that solve the problem:
The decoder (disk_em) on IPR 9904 release decodes only the 'sdisk' driver data. Because of the same problem, the Event Monitoring System also will not be able to generate events for this driver. In both cases, the decoder may abort with SIGBUS.
The fix is provided in IPR 9906 release.
Within the dialog box for the 'Format Raw...' command, you may see text similar to the following:
The Format Raw operation is currently in progress.
Entries processed is 1 of 13 total entries; entries formatted is 1.
Either an appropriate log decoding program name could not be located
for one or more entries in the raw log file, or the decode routine
itself failed to execute. The data portion of the entry(s) will be
formatted for display in hex.
See the completed formatted log file or the Test Activity Log for
specific information.
Entries processed is 13 of 13 total entries; entries formatted is 13.
The Format Raw operation completed successfully. The following raw log
file(s) were formatted into /var/stm/logs/os/test_it.fmt1:
/tmp/test_it.raw
In the activity log for Logtool, you will see an entry similar to the following:
Thu Apr 29 11:34:20 1999: Decode routine (disk_em) starting on path
(32.4.0).
Thu Apr 29 11:34:20 1999: The internal tables for managing the data from
configuration-file(s) are not yet initilized..
Possible Causes/Recommended Action:
Make sure that the monitor calls ev_monitor_init()
before calling one of the routines to access the
configuration values.
Thu Apr 29 11:34:20 1999: Tool is exiting due to receipt of an unexpected
signal (10).
SIGBUS (10) signal indicates a bus error.
Possible Causes/Recommended Action:
Internal Application error. Tool attempted to
reference an invalid address. Usually a NULL or
bad pointer.
The user reported that "when I select FILE ... Select raw ... there are no files in /var/stm/logs/os."
There are two possible reasons why Logtool hasn't generated any log files.
First and most likely scenario: If no errors have been detected, no log files will have been created.
This behavior is different than logtool in "Sherlock", the diagnostic system previous to STM. The old logtool always created a log file, even though it didn't have anything to log.
If you would like to verify that Logtool is working correctly, you can purposely cause an error to generate a log. A good way to cause an event to be generated is to put a bad tape in a tape drive, then try to read from it.
Second scenario (unlikely): If "diagmond", the diagnostic daemon, or "diaglogd", the logging daemon, are not running, then no log files will be created. To see if the daemons are running, enter:
ps -ef | grep diagmond
ps -ef | grep diaglogd
If the diagmond daemon is not running, you can start it again with the command:
/sbin/init.d/diagnostic start
Support tools for SCSI devices may terminate with a status of INCOMPLETE, if you loaded Diagnostics from the June 99 (IPR 9906) or September 99 (IPR 9909) releases onto an older system, such as T-Class or "Nova" (F,G,H,I Class; xx7 Family). The problem does not occur on newer systems such as D-Class, K-Class, N-Class or V-Class. The problem has been reported on both HP-UX 10.20 and 11.00.
If the system has Predictive Support, you may see "SCSISCAN 500" errors reported.
To fix the problem, issue the following commands (as root):
cd /dev
insf -e # re-creates the diagnostic device files
The problem may occur when an Information, Expert or Firmware Download Tool is run on SCSI devices on systems with the SIO bus. An entry in the tool's Activity Log will report the error "/dev/diag/diag0 not found."
The problem is caused by an error in a SCSI library which removes /dev/diag/diag0 if a call to get access to the SIO passthru driver fails. The error will be fixed in a future release.
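A quick way to confirm whether the device file named in that Activity Log message is present is sketched below. The function name check_diag_device is invented for illustration; only the path /dev/diag/diag0 and the "insf -e" fix come from the text above.

```shell
#!/bin/sh
# Report whether a diagnostic device file exists.
# The path defaults to the file named in the Activity Log error.
check_diag_device() {
    dev="${1:-/dev/diag/diag0}"
    if [ -e "$dev" ]; then
        echo "present: $dev"
        return 0
    fi
    echo "missing: $dev (as root: cd /dev; insf -e)"
    return 1
}

# Typical use:
#   check_diag_device
```

Because the SCSI library error can remove the file again, it may be worth re-running the check after each Information, Expert or Firmware Download run.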
This problem could occur after updating the diagnostics, if the format of the previous version of the os_decode_xref file is different from the new version of the file.
This problem occurs only on HP-UX 10.20 on the following releases: June 99, Dec 99, Mar 00.
To verify this problem, look in the test activity log for logtool (e.g.: Tool | Utility | Activity Log | logtool ). If this is the problem, you will see an entry like:
Wed Feb 2 18:01:37 2000: A corresponding log decoding program name could not be located for sdisk.
Wed Feb 2 18:01:37 2000: Syntax error in the os_decode_xref file. Failed while parsing (dc_flex) in the entry (disc2 dc_flex).
etc.
Look in the file /var/adm/sw/swagent.log. If this problem exists, you will see an entry like:
NOTE: A new version of "/var/stm/config/tools/utility/os_decode_xref" has been placed on the system. The new version is located at "/usr/newconfig/var/stm/config/tools/utility/os_decode_xref". The contents of the newly installed file differ from the contents of "/var/stm/config/tools/utility/os_decode_xref", and the previously delivered file is not available for comparison. Therefore "/var/stm/config/tools/utility/os_decode_xref" is not being overwritten. The System Administrator should resolve this situation manually.
To fix the problem, follow the directions in the NOTE by executing the command:
cp /usr/newconfig/var/stm/config/tools/utility/os_decode_xref \
   /var/stm/config/tools/utility/os_decode_xref
Note any similar lines in swagent.log about other files and perform a similar copy for them.
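Finding the other affected files by eye can be tedious in a long swagent.log. The sketch below lists a candidate cp command for each quoted /usr/newconfig path in the log. The function name suggest_newconfig_copies is invented for illustration, and the sketch assumes that every such quoted path comes from a NOTE like the one above, so review the output before running any of the commands.

```shell
#!/bin/sh
# Print a candidate cp command for every quoted /usr/newconfig path in a log.
# Nothing is copied; the commands are only printed for review.
suggest_newconfig_copies() {
    log="${1:-/var/adm/sw/swagent.log}"
    # Put each double-quoted string on its own line, keep the newconfig
    # paths, de-duplicate, then derive the live path by stripping the prefix.
    tr '"' '\n' < "$log" |
        grep '^/usr/newconfig/' |
        sort -u |
        while read -r src; do
            echo "cp $src ${src#/usr/newconfig}"
        done
}

# Typical use:
#   suggest_newconfig_copies /var/adm/sw/swagent.log
```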
This problem is almost the same as another FAQ: When I update a previous version of the Support Tools, the swagent.log file reports that some files are not being correctly installed.
"CELL 1/5" means #1 cabinet, #5 cell. On Superdome systems with multiple cabinets, each cabinet can contain cells 0 through 7. To specify a given cell, both the cabinet and the cell ID must be specified.
We've received reports of this problem on fully-configured Superdome, N-Class, and V-Class systems. The problem occurs when the user selects multiple CPUs and multiple memory controllers, and tries to run exercisers on all of these.
The workaround is simple: instead of selecting multiple CPUs and memory controllers, JUST SELECT ONE CPU AND ONE MEMORY CONTROLLER. When you select the exercise tool, all CPUs and all memory controllers will be exercised.
Normally, if you select multiple CPUs and memory controllers and then select the exercise tool, exercisers will start on all these modules. Each exercise process then tries to start exercise processes on all the other CPU and memory controller modules. Since an exerciser only needs to be run on one CPU or memory controller to exercise all the CPUs or memory controllers, the extra exerciser processes are killed. Only one exerciser process is left running on a CPU and one on a memory controller.
This expected behavior does not occur on some fully configured Superdome, N-Class and V-Class systems. In these cases, all exerciser processes are killed and no exerciser processes are left running.
On the June 2001 release of diagnostics for HP-UX 11i, there is a problem with the customer-licensed diagnostic tools for:
The following types of tools may require a customer license and hence may be affected by this problem:
This problem is fixed in the September 2001 release of diagnostics. You can install the OnlineDiag bundle and then run xstm, mstm, or cstm to install the class license, or you can run the offline diagnostics from the SupportPlus media to install the class license. Once the class license is installed either by online or by offline tools, it will stay effective for all online and offline programs.
The problem only occurs with customer-licenses; CE licenses are NOT affected. The problem only occurs with the hp server rp8400 and the newer versions of Superdome; the problem does not occur with any other server.
If you try to run a customer-licensed online diagnostic tool on one of these systems, you will see an error message similar to one of the following:
-- Error -- The Install license command could not be successfully completed. An unexpected failure occurred in the Support Tool system.
OR
The password requested to be installed is a valid machine level license yet license could not be installed on the system due to unexpected errors.
For customer-licensed offline tools, the error will look like this:
Entered password is not recognized, status = xxx, try again.
where xxx is a number.
There is a problem with the System Info tool for Superdome and hp server rp8400 (900/800/S16K-A, "Keystone") systems. On the June 2001 diagnostics release for HP-UX 11i, the System Info tool does not complete execution on these systems, and does not provide complete information about the system. Several messages appear in the activity log, such as:
Failed to execute the PDC_PAT_COMPLEX PDC call by performing an ioctl call through the diag2 pseudo driver.
The problem has been fixed on the Sept 2001 diagnostics release for HP-UX 11i.
K-Class and T-Class computers running HP-UX 11i will experience HPMCs or Data Page Faults if you run the STM Info Tool to retrieve configuration information from HP-PB SE SCSI adapters (JAGad88317, JAGad97126).
The root cause is a problem in the HP-UX 11i kernel, which is exposed when the Info Tool is run as described above. To correct the problem and avoid potential HPMCs, load patch PHKL_25552 or its successor. This is a kernel patch and requires a system reboot.
STM only exposes this problem if the Info Tool is run on HP-PB SE SCSI adapters. STM does not expose the problem if the Info Tool is run against other I/O devices.
On both Superdome (see JAGae02288) and legacy (see JAGad56511) systems, a problem has been encountered when using Logtool to view the memory log. When an error occurs in the Page Deallocation Table (PDT), the status appears in the Logtool view of the memory log as "pending." A "pending" status generally means that the page will be deallocated when the system is rebooted. However, if the "pending" status still appears for that page after a reboot, the page is in the kernel and cannot be deallocated (our code does not allow pages in the kernel to be deallocated).
You can use the following sample CSTM scripts by highlighting them and pasting them at the HP-UX prompt (using your own e-mail address where appropriate).
This script runs all information tools and sends the results to a specified email address:
echo "selall;info;wait;infolog
view
done
"|cstm|mail user_name@xxx.xxx.xxx.com
This script sends the System Activity Log, Map Log, and UI Activity Log to a specified email address:
echo "sysact
view
done
maplog
view
done
uiact
view
done
"|cstm|mail user_name@xxx.xxx.xxx.com