| A node boots to local disk and
runs through the node configuration phase (nconfigure) instead of
imaging. | An nconfig starting entry appears
in the imaging.log file. | Verify BIOS settings to ensure that the node is set to network boot
and that the correct network adapter is at the top of the boot order. |
| A node hangs while imaging. | Determine when a node hangs during imaging
by monitoring the imaging.log file, which is
described in “How To Monitor An Imaging Session”. For further inspection,
set the appropriate console parameter in the /tftpboot/pxelinux.cfg/default file before booting the node. | Retry the imaging operation. Verify that the network is functioning properly. |
| A node is dropped from
the imaging process. | Determine when a node drops out of the imaging process by monitoring
the imaging.log file. A possible reason
is that the network speed of the node fell below
the acceptable range. The imaging environment includes ethtool, which
queries the speed of the
network connection to the head node; a node is dropped from the imaging
process if the speed is less than 1000 Mb per second. | Set the speed threshold by adding ETHSPEED=n to the kernel command
line. If the reported speed of the network device is greater than n, imaging proceeds. Setting ETHSPEED=0 forces imaging to occur unconditionally. |
| Disk device not found. | Monitor the imaging.log file or watch the console. | Ensure that the
disk is working correctly and is properly seated in the node. |
| The node configuration phase (nconfig)
fails, and the system is left in single-user mode. | Monitor the imaging.log file. The system boots completely, but the sinfo command does not
show the node as available. | Correct the cluster configuration using the cluster_config utility. Then either use the startsys command to reimage the node or rerun the nconfigure
phase: # service nconfig nconfigure |
| A node spontaneously reboots during
imaging. | Look for multiple “starting
imaging” messages in the rsyncd log file. | Verify hardware, BIOS, and kernel boot option settings. |
| The network boot times out. | The system boots from local disk and runs nconfigure.
Verify this by checking the messages written to the imaging.log file. | Verify the DHCP settings and the status of the DHCP daemon. Verify network status and connections. Monitor the /var/log/dhcpd.log file for DHCPREQUEST messages from the MAC address
of the client node. Check the boot order and BIOS settings. Rerun the imaging or booting operation with fewer nodes. |
| A node configuration (nconfigure)
operation fails while attempting to access the configuration and management
database on the head node. | The system is
placed in single-user mode. | Ensure that the mysqld daemon is
running on the head node. Verify network connections. Boot fewer nodes in a single operation.
|
| An imaged node boots correctly,
but the node hangs in the autoinstall script waiting for the first
multicast operation. | Verify that the node
has started imaging by looking for “imaging_started” messages
in the rsyncd log file. Verify that no “finished”
messages are in the imaging.log file. | Ensure that startsys was used
to image the nodes. Check for instances of flamethrower running on the
head node: # ps -aef \| fgrep flamethrower |
| Multicast operation fails. | Verify that the imaging operation has failed by
examining the imaging.log file and looking for multiple
retries of flamethrower. | Verify that the network is quiet. A very busy network
can cause dropped multicast UDP packets. Try the following steps in order: Stop the imaging operation. Verify that no flamethrower daemons are running. Open the /etc/systemimager/flamethrower.conf file. Comment out the line that contains FEC =. Save the changes to the file and exit the text editor. Retry the imaging operation. |
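
Some of the checks in the table above can be wrapped in small shell helpers. The following is a minimal sketch and not part of the product: the function names are hypothetical, the default log path is the one named in the table, and the sketch assumes that flamethrower.conf treats lines beginning with # as comments and that the DHCP daemon writes DHCPREQUEST and the client MAC address on the same log line.

```shell
#!/bin/sh
# Illustrative helpers only; the function names are hypothetical.

# Print any running flamethrower processes on this host.
# Returns non-zero when none are found. The [f] bracket keeps
# grep from matching its own command line.
list_flamethrower() {
    ps -aef | grep '[f]lamethrower'
}

# Comment out every uncommented "FEC =" line in the given file.
# A backup copy is kept next to it with a .bak suffix.
# Assumption: the file uses # as its comment character.
comment_out_fec() {
    conf=$1
    sed -i.bak 's/^\([[:space:]]*\)\(FEC[[:space:]]*=\)/\1#\2/' "$conf"
}

# Print DHCPREQUEST log lines that mention the given MAC address.
# The MAC comparison is case-insensitive. The second argument is
# optional and defaults to /var/log/dhcpd.log.
dhcp_requests_for_mac() {
    mac=$1
    log=${2:-/var/log/dhcpd.log}
    grep 'DHCPREQUEST' "$log" | grep -i -F "$mac"
}
```

After stopping the imaging operation, one possible sequence is to run list_flamethrower and confirm that it prints nothing, then run comment_out_fec /etc/systemimager/flamethrower.conf and retry imaging. The original file remains available as flamethrower.conf.bak if the change needs to be reverted.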