Managing HP-UX Software With SD-UX: HP 9000 Computers > Appendix B Troubleshooting SD

Problems


This section presents a selection of problems you might encounter and how to resolve them.

Cannot Contact Target Host Daemon/Agent

If you see the following error message:

ERROR:	Could not contact host <hostname>. Make sure the hostname
is correct.

it means that the hostname you specified could not be found in the hosts database. Make sure you have typed the hostname correctly (you can use the nslookup(1) command to verify hostnames). If the target hostname is not in the hosts database, but you know its network address, you can use it (in standard "dot" notation) in place of the hostname.

If you see this error message:

ERROR:		A Remote Procedure Call to a daemon has failed.  Could not
start a management session for <target>. Make sure the
host is accessible from the network, and that its daemon,
swagentd, is running. If the daemon is running see the
daemon logfile on this target for more information.

it means SD could not contact the daemon program on a specific target system. Note that this may occur even if you haven't specified any targets, for example, if the daemon on your local host is not running and the select_local option is set to "true."

Resolution:

If the SD daemon/agent is not installed on a given target system, you must install it before you can begin managing the system with SD.

If you've verified that the daemon/agent component has been installed on a target system and you still have trouble contacting it, check to see that the daemon is running:

  1. On the target system, type:

    ps -e | grep swagentd

  2. If the daemon does not appear to be running, you can start it by typing (as root on the target system):

    /usr/sbin/swagentd

  3. If you attempt to start a daemon when one is already running, you will see a message about the other daemon; this is harmless.

    You can also kill and restart a currently running daemon by typing:

    /usr/sbin/swagentd -r
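The check in steps 1 and 2 above can also be scripted. The following is a minimal sketch, not part of SD: the function names are hypothetical, and it assumes that ps -e prints the command name as the last field of each line.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not SD commands.

# Succeeds if the process listing on stdin has a line whose last
# field is exactly "swagentd".
swagentd_in_listing() {
    awk '$NF == "swagentd" { found = 1 } END { exit !found }'
}

# Report the daemon state on the local system.
report_swagentd() {
    if ps -e | swagentd_in_listing; then
        echo "swagentd is running"
    else
        echo "swagentd is not running; as root, start it with /usr/sbin/swagentd"
    fi
}
```

Matching the whole last field, rather than using a plain grep, avoids counting the grep process itself or a command whose name merely contains the string.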

Other possible causes for this problem are listed in the section “Connection Timeouts and Other WAN Problems”.

Tip:

An easy way to determine if a target system has the SD daemon installed and running is to type:

/usr/sbin/swlist -l depot @ <one or more target hostnames>

which will attempt to contact each target to get a list of registered depots. Those targets which have the SD daemon installed will report either:

# Initializing...
# Target <hostname> has the following depot(s):
<list of depots>
.
.
.

or

# Initializing...
WARNING: No depot was found for <hostname>.

For more information on daemon activity, see the daemon logfile in /var/adm/sw/swagentd.log.
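If you check many targets, the two responses shown in the tip above can be told apart in a script. This is a rough sketch: classify_swlist_output is a hypothetical helper, it matches only the message text quoted above, and it assumes that warnings may be written to standard error (hence the 2>&1 in the usage comment).

```shell
#!/bin/sh
# Sketch only: classify_swlist_output is a hypothetical helper.
# It reads the output of "swlist -l depot @ <hostname>" on stdin
# and prints one word describing what the target reported.
#
# Possible usage (assumes warnings may go to standard error):
#   /usr/sbin/swlist -l depot @ <hostname> 2>&1 | classify_swlist_output
classify_swlist_output() {
    awk '
        index($0, "has the following depot(s):") > 0 { state = "depots" }
        index($0, "WARNING: No depot was found") > 0 {
            if (state == "") state = "no-depots"
        }
        END { print (state == "" ? "no-response" : state) }
    '
}
```

Both "depots" and "no-depots" mean that the daemon answered. "no-response" means only that neither message was seen, so check the error output and the daemon logfile before drawing conclusions.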

Access To An Object Is Denied

There are a number of things that can cause denial of access to SD objects.

  • ACL Permissions

    • The effects of ACL modifications

    • Modifying ACLs without using swacl

  • Inter-host Secrets

  • Working with "image" copies of depots

Resolution:

Generally, when you are denied access to an SD object, the system tells you that you do not have the required access permission for the object. Sometimes it may be unclear which object is not accessible. For example, when using swcopy to copy a product from system A to a depot, a number of ACLs may be checked:

  1. If the destination depot does NOT exist, the host ACL is checked to verify that the user has "insert" permission.

  2. If the destination depot does exist, the depot ACL is checked to verify that the user has "write" permission.

  3. The source depot's ACL is checked to make sure the user has "read" permission on the source depot.

  4. The source product's ACL is also checked to make sure that the user and the destination system both have "read" access to the product.

If any of these access permissions is absent, the whole operation is disallowed, and the error message is your main clue to the cause. To see more about what type of security or access problems exist, see the daemon log file on the target system: /var/adm/sw/swagentd.log

The Effects of ACL Modifications

The default SD ACLs make it fairly easy to administer ACLs, but do not always give the desired level of access control. When restricting access, especially by removing the any_other "read" permission, it is fairly easy to restrict access in unexpected ways. Remember that "host" entries are required for any destination systems for swcopy and swinstall operations.

Review Chapter 7 “Modifying IPD or Catalog Contents” for a full discussion of the access tests performed by SD for each operation.

The Effects Of Modifying ACL Files Without swacl

Since ACLs in SD are stored in the file system as plain text files, it may be tempting to edit them with a conventional editor. This can lead to unexpected corruption of the ACL. Most cases of this corruption simply result in a message indicating the corruption, but adding entries to the ACL file without updating the num_entries value can cause unreported problems whose only symptom is denial of access. A common failure occurs when a user entry is inserted, pushing the "any_other" entry down beyond the num_entries limit. The SD ACL manager then never reads the any_other entry, and access that depends on it is denied. The best guard against this is to always use the swacl command to manipulate ACLs.

Inter-host Secrets

The default /var/adm/sw/security/secrets file shipped with SD contains a single entry:

default      -sdu-

The "-sdu-" should be replaced by a different default secret, or the entire entry eliminated if you wish to explicitly name all hosts from which controllers can be run. See Chapter 7 “Modifying IPD or Catalog Contents” for a thorough discussion of the secrets file.

The controller (for swinstall, swcopy, etc.) looks up the secret for the system on which it runs and passes it in an encrypted form to its agent. The agent receiving a request from the controller looks up the secret for the host from which the call comes, encrypts it and compares the encryption to that provided by the controller. If the two secrets do not match, access is denied. If you have problems with this mechanism, make sure that all systems have matching entries. You can also revert to the original secrets file (/etc/newconfig/sd/secrets on 9.x and /usr/newconfig/var/adm/sw/security/secrets on 10.x) on all hosts, or simply copy a single secrets file to all hosts.
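When you compare secrets files across systems, a small script can help. The sketch below is illustrative only. The helper names are hypothetical, and it assumes that each entry is a host name (or the word default) followed by a secret, separated by white space, as in the shipped entry shown above. It also assumes that the default entry applies to any host without an entry of its own.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, and the file layout
# "<host-or-default> <secret>" is an assumption based on the
# shipped entry.

# Succeeds if the file still contains the shipped "default -sdu-" entry.
has_shipped_default_secret() {
    awk '$1 == "default" && $2 == "-sdu-" { found = 1 } END { exit !found }' "$1"
}

# Print the secret that applies to host $2 in secrets file $1:
# the host's own entry if present, otherwise the default entry.
# Fails if neither exists.
secret_for_host() {
    awk -v host="$2" '
        $1 == host      { own = $2; have_own = 1 }
        $1 == "default" { dflt = $2; have_dflt = 1 }
        END {
            if (have_own)       print own
            else if (have_dflt) print dflt
            else                exit 1
        }
    ' "$1"
}
```

Running secret_for_host against the secrets file on both the controller and the target, with the controller's hostname as the second argument, shows whether the two systems would agree.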

Working With Depot "Images"

There is a problem in using cp, tar, cpio, dd, and other commands to copy images of depots for use on other systems. The problem is that depot and product ACLs in the image have built-in knowledge of the host on which the depot originated. In particular, the ACL's default realm will be wrong and local users will be confused with users on the originating host. For example, attempts to add local users to the access list will, in fact, grant access to remote users.

Other problems can also arise. Since there is no way to alter the default realm of an ACL from that set when it is created, this is an intractable problem. Another common problem with such images occurs when they are imported to systems which cannot resolve all the hostnames (see resolver(4) and nslookup(1)) which exist in the ACLs.

If your purpose is to create a "staged" installation, use swcopy to propagate the depot. This will create new ACLs, based on local templates, for each instance of the depot.

If the sole intent of a depot is for such image distribution, you may wish to set the swpackage.create_target_acls option to false to prevent ACL creation on the depot and products during the swpackage operation. This is the option used to create tape and CD-ROM images. ACL-less depots and products grant the local superuser all privileges, while all other users and systems have read access. Note that when you copy or install this ACL-less depot with swcopy or swinstall, the copies (installations) are automatically protected by ACLs based on templates on the destination host.

Slow Network Performance

When using swinstall or swcopy in an environment where network bandwidth is the "bottleneck," the file transfer rate between source and target(s) can become very slow.

Resolution:

The compress_files=true option compresses files transferred from a source depot to a target. This can reduce network usage by approximately 50%; the exact amount of compression depends on the type of files. Binary files compress less than 50%; text files generally compress more.

The greatest throughput improvements are seen when transfers are across a slow network (approximately 50 kbytes/sec or less), and the source depot server is serving a few target hosts at a time.

NOTE: This option should be set to "true" only when network bandwidth is clearly restricting total throughput. If it is used with a fast network or with a depot server simultaneously connected to many target hosts, it can actually reduce overall throughput or performance, unless the source depot is already compressed.

If it is not clear that this option will help in your situation, compare the throughput of a few install or copy tasks (both with and without compression) before changing this option value.

Connection Timeouts and Other WAN Problems

Low-throughput, wide-area networks can cause SD to encounter time-out problems when establishing and maintaining network connections with remote agents on other systems.

If you see the following messages:

ERROR:	A Remote Procedure Call to a daemon has failed.  Could
not start a management session for <target>. Make sure the
host is accessible from the network, and that its daemon,
swagentd, is running. If the daemon is running see the
daemon logfile on this target for more information.

or

ERROR:   Could not perform the requested operation for <target>,
possibly due to a network communications failure. Check
that the host is still accessible from the network.

and you have verified that the system is up and the daemon program (swagentd) is running on it, it may be that network delays are causing the connection to time out.

Resolution:

Increase the time-out value used by SD when performing Remote Procedure Calls (RPCs) by specifying a higher value for the rpc_timeout option, either via the command line or in the defaults file. RPC time-out values range from 0 to 9, with 9 being the longest time-out. The SD default RPC time-out value is 5. Note that these values do not represent any specific time units. See Appendix A “Default Options and Keywords” for more information on the rpc_timeout option.

Increasing the rpc_timeout can also help in situations where the target agents in an install or copy session are timing out when trying to contact the source agent. This problem is indicated by the following error messages in the agent log file:

ERROR:	Could not open remote depot/root <path> due to an RPC
or network I/O error.
ERROR: Cannot open source. Check above for errors, as well as the
daemon logfile on the source host (default location:
/var/adm/sw/swagentd.log).
ERROR: Cannot continue the Analysis Phase until the previous
errors are corrected.

Another factor that can affect RPC timeouts on a slow network is the choice of network protocol. SD supports both UDP- and TCP-based communication (the default is UDP). TCP communication can be more reliable on a WAN because it is connection-based. If you are still having time-out problems on a slow-throughput WAN, you can tell SD to use the TCP-based communication instead, via the rpc_binding_info option:

rpc_binding_info=ncacn_ip_tcp:[2121]

As with any controller option, you can specify this option on the command line using the -x option or by editing the defaults file. Note that the daemon program (swagentd) listens for both UDP- and TCP-based RPCs by default. See Appendix A “Default Options and Keywords” for more information on the rpc_binding_info option.
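To apply both WAN-related options to a single command, you could wrap the command as sketched below. The wrapper with_wan_options is hypothetical, not part of SD. It checks that the rpc_timeout value is in the range 0 to 9 described above, then passes both options to the named command with -x, followed by whatever other arguments you supply.

```shell
#!/bin/sh
# Sketch only: with_wan_options is a hypothetical wrapper.
#   $1      rpc_timeout value (0 to 9)
#   $2      the controller command to run (for example, swinstall)
#   $3...   remaining arguments, passed through unchanged
#
# Possible usage:
#   with_wan_options 8 swinstall -s <source> <software> @ <target>
with_wan_options() {
    case "$1" in
        [0-9]) ;;
        *) echo "rpc_timeout must be a single digit from 0 to 9" >&2
           return 1 ;;
    esac
    rpc_timeout=$1
    sd_command=$2
    shift 2
    # The binding value is quoted so the shell does not treat the
    # square brackets as a file name pattern.
    "$sd_command" -x "rpc_timeout=$rpc_timeout" \
        -x 'rpc_binding_info=ncacn_ip_tcp:[2121]' "$@"
}
```

Quoting the rpc_binding_info value matters on the command line as well: unquoted, [2121] can be expanded by the shell if a matching file name exists in the current directory.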

A final WAN-related issue may arise when using the interactive GUI. During the analysis and execution phases of an interactive SD session, each target agent is periodically "polled" for up-to-date status information. The polling_interval option can be used to control the number of seconds that elapse between successive status polls of a given target system. On networks where even this minor data transfer is a problem, you can increase this polling interval, thus decreasing the frequency of polling, and reducing an interactive session's overall demands on the network. See Appendix A “Default Options and Keywords” for more information on the polling_interval option.

Disk Space Analysis Is Incorrect

Your installation or copy operation runs out of space even though the disk space analysis succeeded.

Upon further checking, you find that the results of the disk space analysis differ from the actual space available.

Resolution:

Possible causes of this problem:

  • A control script associated with the installation has consumed disk space by creating or copying additional files that aren't accounted for during analysis.

  • Your target systems were not idle when the analysis was done, and some other activity (either unrelated to SD or from another active SD session) was consuming disk space.

  • The depot from which the product was installed or copied was created by swpackage with the package_in_place option set to true, and source files have been modified since the product was packaged. The swverify command can be used to diagnose this problem.

The Packager Fails

A swpackage operation may fail because of the incorrect use of the end keyword in the Product Specification File (PSF).

Resolution:

The end keyword marks the end of a depot, vendor, product, subproduct or fileset specification in a PSF. It requires no value and is optional. However, if you use it and place it incorrectly, the swpackage operation will fail. If you use the end keyword, make sure that every object specification (especially the last one) has one.
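A quick way to spot a missing end keyword is to count keywords in the PSF. The following sketch is a rough heuristic, not a PSF parser: check_psf_end is a hypothetical helper that counts only lines whose first word is one of the five object keywords named above, and it cannot tell you where an end keyword is misplaced.

```shell
#!/bin/sh
# Sketch only: check_psf_end is a hypothetical helper and a rough
# heuristic. It counts lines whose first word opens an object
# (depot, vendor, product, subproduct, fileset) and lines whose
# first word is "end", prints both counts, and fails if end is used
# but the counts differ.
check_psf_end() {
    awk '
        $1 ~ /^(depot|vendor|product|subproduct|fileset)$/ { objects++ }
        $1 == "end" { ends++ }
        END {
            printf "objects=%d end=%d\n", objects, ends
            # end is optional, so zero is acceptable; otherwise
            # expect one end per object.
            bad = (ends != 0 && ends != objects)
            exit bad
        }
    ' "$1"
}
```

A PSF that omits end entirely passes this check, because the keyword is optional. A mismatch only tells you to inspect the file by hand.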

Truncating The Daemon Logfile

If you want to shorten (truncate) the SD daemon logfile because it is getting too long, follow this procedure:

Resolution:

If the daemon is currently running, DO NOT remove its logfile. The running daemon continues to write to the removed file, so any subsequent log messages are lost. Also, the disk space used by the logfile will not be freed as long as the daemon is running.

Instead, truncate the logfile by typing (as root):

echo > /var/adm/sw/swagentd.log

which replaces the existing contents with a single empty line.

If you inadvertently remove the daemon logfile while the daemon is running, you must kill and restart the daemon if you want to see subsequent daemon log messages and free up the disk space used by the logfile. You can stop (kill) a daemon by typing:

/usr/sbin/swagentd -k

You can also kill and restart a currently running daemon by typing:

/usr/sbin/swagentd -r
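If you truncate the logfile regularly, you may want to keep a copy of the old contents first. The sketch below shows one way to do this. The helper truncate_log is hypothetical, not part of SD. It empties the file in place rather than removing it, so a running daemon keeps logging to the same file.

```shell
#!/bin/sh
# Sketch only: truncate_log is a hypothetical helper.
#   $1  the logfile to empty (for example, /var/adm/sw/swagentd.log)
#   $2  optional: a file that receives a copy of the old contents
truncate_log() {
    [ -f "$1" ] || { echo "no such file: $1" >&2; return 1; }
    if [ -n "${2:-}" ]; then
        # Save the old contents; give up if the copy fails.
        cp "$1" "$2" || return 1
    fi
    # Empty the file without removing it.
    : > "$1"
}
```

The ": >" form leaves the file at zero bytes, whereas "echo >" writes a single newline. Either form keeps the file in place. Messages logged between the copy and the truncation are lost.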

Cannot Read a Tape Depot

If you are trying to access a tape depot and see the following error message in the daemon logfile, it means that the tape is either corrupt or is not in SD format.

ERROR:	The INDEX file on the source did not exist or could not be
read.
ERROR: The target <depot_path> could not be opened.

Resolution:

Make sure that you have correctly specified the tape device and that the correct tape is in the drive. SD will only read tapes that are in SD format (for example, SD does not read update(1m) format tapes).

Installation Fails

If an installation fails part way through the install, SD lets you easily restart the operation (either by just re-executing the same command from the command line, or by recalling the session file swinstall.last that was automatically saved for you).

By default, SD checkpoints to the fileset level, meaning that a restarted operation resumes transferring files at the last fileset attempted. If you set the reinstall_files option to false, the distribution and installation of files resumes at the last file attempted. SD does not support checkpointing below the file level. You can override all checkpointing by setting both the reinstall and reinstall_files options to true.

See Appendix A “Default Options and Keywords” for more information on these options.

© 1997 Hewlett-Packard Development Company, L.P.