The
first time the entire system is started with the startsys command,
power to each node is turned on, each node boots from its network adapter,
and the SystemImager automatic installation environment is downloaded. This
environment automatically installs and configures each node from the golden
image. On large-scale systems, the startsys command may take several minutes
to power on the nodes.
The number of nodes to be installed influences the amount
of time it takes to complete the process. After all nodes are installed, they
automatically reboot to the login prompt. This process can take between two
and three hours on a system with 1024 compute nodes.
This release uses multicast file transfer technology
to download software to client nodes during their image installation. Multicast
file transfer technology provides a fast and scalable method of installing
systems. Multicast imaging sends data simultaneously to many nodes
that have previously been set up to listen to a multicast from the designated
image server. Compared to other file transfer technologies, multicast imaging
places very little load on the image server and therefore
allows systems of all sizes to be installed relatively quickly.
Multicast imaging uses the udpcast open source package
and the flamethrower functionality of SystemImager. A series of udp-sender
daemons runs on the image server, and each client node runs a series of
udp-receiver daemons during the imaging operation. The udp-sender daemons
are managed by the startsys command, which
starts these daemons when the --image_only or --image_and_boot option
is entered on the command line and shuts them down after the
imaging operation is complete. Therefore, you must use startsys when
performing a full installation through the imaging operation.
Follow this procedure to start the system and propagate
the golden image to all nodes; the command-line options depend on the number
of nodes in the system:
Make sure the XC.lic license key file is located in the following directory:
# ls /opt/hptc/etc/license
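
The same check can be scripted so that a missing key file is caught before imaging begins. The following is a minimal sketch, not part of the product: the function name check_license is hypothetical, and only the directory and file name are taken from this step.

```shell
# Hypothetical helper: succeed only if XC.lic exists in the given directory.
# The default directory is the one named in this step.
check_license() {
    dir="${1:-/opt/hptc/etc/license}"
    if [ -f "$dir/XC.lic" ]; then
        echo "found: $dir/XC.lic"
    else
        echo "missing: $dir/XC.lic" >&2
        return 1
    fi
}
```

Calling check_license with no argument tests the default directory; a nonzero return status indicates that the file was not found.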
Use the startsys command
to turn on power to all nodes, image the nodes, and boot the nodes. As shown
in Table 3-9, the command-line
options for the initial system image and boot depend upon the size of the
system. See startsys(8) for the complete list of command options.
Table 3-9 The startsys Command-Line Options for Initial System Image and Boot
| startsys Command-Line Option | Description | Use on Systems with Fewer Than 300 Nodes? | Use on Systems with More Than 300 Nodes? |
|---|---|---|---|
| --image_and_boot | Images and then reboots all nodes so they complete their per-node configuration phase, thus completing the installation on the nodes. This option applies only for nodes that have previously been set up to network boot. | Yes | No |
| --image_only | Completes the imaging phase and then halts the nodes before their per-node configuration phase. For large-scale systems, booting while imaging is not recommended. | No | Yes |
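
The choice described in Table 3-9 can be expressed as a small shell function. This is only an illustrative sketch: the function name choose_startsys_option is hypothetical, and because the table does not say what to do for a system with exactly 300 nodes, the sketch assumes the large-system path for that case.

```shell
# Hypothetical helper: print the initial startsys option for a node count,
# following Table 3-9. The table does not cover exactly 300 nodes; this
# sketch assumes the large-system option (--image_only) for that case.
choose_startsys_option() {
    nodes="$1"
    if [ "$nodes" -lt 300 ]; then
        echo "--image_and_boot"
    else
        echo "--image_only"
    fi
}

choose_startsys_option 16     # prints --image_and_boot
choose_startsys_option 1024   # prints --image_only
```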
Use this startsys command on systems with fewer
than 300 nodes to image and boot all nodes. For larger hardware configurations,
use the command shown in step 4.
# startsys --image_and_boot
Use these startsys commands on systems with
more than 300 nodes. The image and boot process is completed in two phases:
Image the nodes:
# startsys --image_only
Boot the nodes:
# startsys --boot_group_delay=240
NOTE: Use the --boot_group_delay=240 option
only the first time nodes are booted after being imaged; the value 240 specifies
the number of seconds to wait between groups of nodes as they are booting.
For more information about this value, see startsys(8).
If you want to watch as the startsys command
images and powers on nodes, open a second terminal window and issue a tail command
to view the following log files:
/hptc_cluster/adm/logs/imaging.log
/hptc_cluster/adm/logs/startsys.log
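
One way to follow both files from a single terminal is sketched below. The function name watch_imaging_logs is hypothetical; the paths are the two log files listed above, and the sketch assumes a tail implementation that accepts several files with -f, as GNU tail does.

```shell
# Hypothetical helper: follow both log files in one terminal.
# Press Ctrl-C to stop. The default directory is the one listed above.
watch_imaging_logs() {
    logdir="${1:-/hptc_cluster/adm/logs}"
    tail -f "$logdir/imaging.log" "$logdir/startsys.log"
}
```

Run watch_imaging_logs in the second terminal window while startsys runs in the first.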
Command output on a small, 16-node configuration is similar to the following:
# startsys --image_and_boot
Thu Sep 28 08:49:10 2006 Enabled nodes: 16 nodes -> n[1-16]
Thu Sep 28 08:49:12 2006 Removing the execution node: n16
Thu Sep 28 08:49:12 2006 Boot hierarchy of specified nodes is: n15 n[1-14]
Thu Sep 28 08:49:15 2006 Initial power test - please wait.
Thu Sep 28 08:49:24 2006 Nodes that will image: 15 nodes -> n[1-15]
You must manually power on the following nodes:
n1
Press enter after applying power to these nodes.
continuing ........
Thu Sep 28 08:49:29 2006 Powering on for image: 14 nodes -> n[2-15]
Thu Sep 28 08:50:34 2006 Retrying power --on command: 3 nodes -> n[2-3,15]
*** Thu Sep 28 08:52:19 2006 Current statistics:
Imaging: 15 nodes -> n[1-15]
Progress:
Flamethrower started: nodes waiting: 15 nodes -> n[1-15]
*** Thu Sep 28 08:55:19 2006 Current statistics:
Imaging: 15 nodes -> n[1-15]
Progress:
*** Thu Sep 28 08:58:19 2006 Current statistics:
Imaging: 15 nodes -> n[1-15]
Progress:
Thu Sep 28 08:58:34 2006 Imaging completed; will be powered off: 2 nodes -> n[1-2]
You must manually power off the following nodes:
n1
Press enter after removing power from these nodes.
continuing ........
Thu Sep 28 08:59:02 2006 Powering off: 1 node -> n2
Thu Sep 28 08:59:48 2006 Imaging completed; will be powered off: 9 nodes -> n[4-10,12,14]
Thu Sep 28 08:59:48 2006 Powering off: 9 nodes -> n[4-10,12,14]
Thu Sep 28 09:00:04 2006 Imaging completed; will be powered off: 3 nodes -> n[11,13,15]
Thu Sep 28 09:00:04 2006 Powering off: 3 nodes -> n[11,13,15]
Thu Sep 28 09:00:52 2006 Imaging completed; will be powered off: 1 node -> n3
Thu Sep 28 09:00:52 2006 Powering off: 1 node -> n3
Thu Sep 28 09:01:07 2006 Retrying power --off command: 1 node -> n15
*** Thu Sep 28 09:01:22 2006 Current statistics:
Waiting for hierarchy to boot: 15 nodes -> n[1-15]
Progress:
Thu Sep 28 09:01:22 2006 Powering on for boot: 1 node -> n15
Thu Sep 28 09:02:33 2006 Retrying power --on command: 1 node -> n15
Thu Sep 28 09:04:18 2006 Processing completed for: 1 node -> n15
*** Thu Sep 28 09:04:33 2006 Current statistics:
Booted and available: 1 node -> n15
Waiting for hierarchy to boot: 14 nodes -> n[1-14]
Progress:
You must manually power on the following nodes:
n1
Press enter after applying power to these nodes.
continuing ........
Thu Sep 28 09:04:37 2006 Powering on for boot: 13 nodes -> n[2-14]
Thu Sep 28 09:05:33 2006 Retrying power --on command: 12 nodes -> n[2-6,8-14]
Thu Sep 28 09:06:48 2006 Processing completed for: 1 node -> n1
Thu Sep 28 09:07:03 2006 Processing completed for: 1 node -> n7
Thu Sep 28 09:07:18 2006 Processing completed for: 9 nodes -> n[4-5,8-14]
*** Thu Sep 28 09:07:33 2006 Current statistics:
Booted and available: 15 nodes -> n[1-15]
Progress:
Thu Sep 28 09:07:33 2006 Processing completed for: 3 nodes -> n[2-3,6]
*** Thu Sep 28 09:07:33 2006 Current statistics:
Booted and available: 15 nodes -> n[1-15]
Progress:
Thu Sep 28 09:07:34 2006 startsys process exiting with code 0
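
The last line of the output reports the exit code of the startsys process. A wrapper script could extract that code from saved output, as in the following minimal sketch. The function name startsys_exit_code is hypothetical, and the sketch assumes that the output has been saved to a file and that its final status line has the format shown above.

```shell
# Hypothetical helper: print the exit code from the last
# "startsys process exiting with code N" line in a saved output file.
startsys_exit_code() {
    sed -n 's/.*startsys process exiting with code \([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1
}
```

If the file contains no such line, the function prints nothing, which a calling script can treat as an incomplete run.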
See “Troubleshoot the Imaging Process” if
you encounter problems imaging nodes.
Proceed to “Task 12: Perform Postconfiguration Tasks for the InfiniBand Interconnect”.