HP XC System Software: Installation Guide > Chapter 12 Troubleshooting

Troubleshooting the Imaging Process


This section provides hints for troubleshooting the imaging process.

System imaging and node configuration information is stored in the following log files:

  • /hptc_cluster/adm/logs/imaging.log

  • /var/log/systemimager/rsyncd

  • /hptc_cluster/adm/logs/startsys.log

Table 12-1 lists problems you might encounter as the golden image is being propagated to client nodes and describes how to diagnose and resolve each one.

Table 12-1 Diagnosing System Imaging Problems

Symptom: A node boots to local disk and runs through the node configuration phase (nconfigure) instead of imaging.
How To Diagnose: An nconfig starting entry appears in the imaging.log file.
Possible Solution: Verify BIOS settings to ensure that the node is set to network boot and that the correct network adapter is at the top of the boot order.

Symptom: A node hangs while imaging.
How To Diagnose: You can determine when a node hangs during imaging by monitoring the imaging.log file, which is described in “How To Monitor An Imaging Session”. For further inspection, set the correct console parameter in the /tftpboot/pxelinux.cfg/default file before booting.
Possible Solution:
  • Retry the imaging operation.

  • Verify that the network is functioning properly.

Symptom: A node is dropped out of the imaging process.
How To Diagnose: You can determine when a node drops out of the imaging process by monitoring the imaging.log file.
Possible Solution: The reason the node dropped out might be that the speed of the node dropped below the acceptable range.

The ethtool utility was added to the imaging environment. It queries the speed of the network connection with the head node and drops a node from the imaging process if the speed is less than 1000 Mb per second.

Configure the minimum acceptable speed by adding ETHSPEED=n to the kernel command line. If the reported speed of the network device is greater than n, imaging proceeds. Setting ETHSPEED=0 forces imaging to occur unconditionally.

Symptom: Disk device not found.
How To Diagnose: Identified by monitoring the imaging.log file or watching the console.
Possible Solution: Ensure that the disk is working correctly and is properly seated in the node.

Symptom: The node configuration phase (nconfig) fails, and the system is left in single-user mode.
How To Diagnose: Identified by monitoring the imaging.log file. The system will completely boot, but the sinfo command will not show the node as available.
Possible Solution: Correct the cluster configuration using the cluster_config utility. Then use the startsys command to reimage, or rerun the nconfigure phase:

# service nconfig nconfigure

Symptom: A node spontaneously reboots during imaging.
How To Diagnose: Verified by multiple “starting imaging” messages in the rsyncd log file.
Possible Solution: Verify hardware, BIOS, and kernel boot option settings.

Symptom: The network boot times out.
How To Diagnose: The system boots from local disk and runs nconfigure. You can verify this by checking messages written to the imaging.log file.
Possible Solution:
  • Verify DHCP settings and the status of the DHCP daemon.

  • Verify network status and connections.

  • Monitor the /var/log/dhcpd.log file for DHCPREQUEST messages from the client node MAC address.

  • Check boot order and BIOS settings.

  • Rerun imaging/booting operations with fewer nodes.

Symptom: A node configuration (nconfigure) operation fails while attempting to access the configuration and management database on the head node.
How To Diagnose: The system is placed in single-user mode.
Possible Solution:
  • Ensure that the mysqld daemon is running on the head node.

    # service mysqld status
  • Verify network connections.

  • Boot fewer nodes in a single operation.

Symptom: An imaged node boots correctly, but the node hangs in the autoinstall script waiting for the first multicast operation.
How To Diagnose: Verify that the node has started imaging by looking for “imaging_started” messages in the rsyncd log file. Verify that no “finished” messages are in the imaging.log file.
Possible Solution:
  • Ensure that startsys was used to image the nodes.

  • Check for instances of flamethrower running on the head node.

    # ps -aef | fgrep flamethrower

Symptom: Multicast operation fails.
How To Diagnose: Verify that the imaging operation has failed by examining the imaging.log file and looking for multiple retries of flamethrower.
Possible Solution:
  • Verify that the network is quiet. A very busy network can cause dropped multicast UDP packets.

  • Try this:

    1. Stop the imaging operation.

    2. Verify that no flamethrower daemons are running.

    3. Open the /etc/systemimager/flamethrower.conf file.

    4. Comment out the line with FEC =

    5. Save the changes to the file and exit the text editor.

    6. Retry the imaging operation.
 

/hptc_cluster File System Does Not Mount

It is possible to experience a mount failure when nodes image, boot, and attempt to NFS mount the /hptc_cluster file system. The nodes receive a Permission denied error message from the attempt to mount the /hptc_cluster file system.

Run the following commands on the head node to restart NFS, rerun the node configuration scripts, and restart services on all other nodes:

# service nfs restart
# pdsh -a touch /var/hptc/nconfig.1st
# stopsys
# startsys
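
After the nodes come back up, you might want to confirm that the mount succeeded. The following is a minimal sketch of such a check, assuming a Linux node where the mount table is available in /proc/mounts. The function name is hypothetical and is not an HP XC command.

```shell
# Hypothetical helper: succeed if /hptc_cluster appears as an NFS mount
# in a mounts table ($1, default /proc/mounts).
hptc_cluster_mounted() {
    awk '$2 == "/hptc_cluster" && $3 ~ /^nfs/ { found = 1 }
         END { exit !found }' "${1:-/proc/mounts}"
}
```

On a client node, `hptc_cluster_mounted && echo mounted` prints "mounted" only when the file system is present as an NFS mount.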

Client Node or Nodes Do Not Network Boot

The following message displays on a client node if the per-node symbolic links to the elilo.efi file are lost:

TSize..
PXE-E23: Client received TFTP error from server.
PXE-E98: Code: 1h  File not found
Load of Netboot failed: Not Found

Enter the following command on the affected node to fix the network boot problem:

setnode --resync node_name

How To Monitor An Imaging Session

To monitor an imaging operation, use the tail -f command in another terminal window to view the imaging log files.
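
The monitoring step, and the log checks mentioned in Table 12-1, can be sketched as follows. The log paths and message strings come from this section. The function names are hypothetical, and the exact format of the log lines is not documented here, so the counter simply matches a fixed string.

```shell
# Log locations named in this section.
IMAGING_LOG=/hptc_cluster/adm/logs/imaging.log
RSYNCD_LOG=/var/log/systemimager/rsyncd

# Follow both logs in one terminal (hypothetical wrapper; Ctrl-C to stop).
watch_imaging_logs() {
    tail -f "$IMAGING_LOG" "$RSYNCD_LOG"
}

# Print how many lines of file $2 contain the fixed string $1.
# Prints nothing if the file cannot be read.
count_messages() {
    grep -c -F -- "$1" "$2" || true
}
```

For example, `count_messages "starting imaging" "$RSYNCD_LOG"` reports how many times imaging was started, which can help spot a node that reboots during imaging, and `count_messages DHCPREQUEST /var/log/dhcpd.log` shows whether any DHCP requests were logged.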

You can also watch an installation through the remote serial console. To do so, edit the /tftpboot/pxelinux.cfg/default file before the installation begins and add the correct serial console device to the APPEND line. If you do this, disable the cmfd services and image a smaller group of nodes at any one time, because the network traffic caused by the serial console can adversely affect the imaging operation.
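
The APPEND-line edit can be sketched as a small helper that prints a modified copy of the file for review instead of changing it in place. The function name is hypothetical, and the console value shown in the usage note (console=ttyS0,115200) is only an example of the Linux console= kernel parameter. Use the serial device and speed that match your hardware.

```shell
# Hypothetical helper: print a copy of a pxelinux configuration file ($1)
# with the kernel argument $2 added to the end of every APPEND line.
# The input file is not modified.
add_append_arg() {
    awk -v arg="$2" '
        /^[ \t]*(APPEND|append)[ \t]/ { print $0 " " arg; next }
        { print }
    ' "$1"
}
```

For example, `add_append_arg /tftpboot/pxelinux.cfg/default console=ttyS0,115200 > /tmp/default.new` writes the edited configuration to a scratch file that you can inspect before copying it over the original.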

© 2003 Hewlett-Packard Development Company, L.P.