The first time the entire system is started with the startsys command, power to each node is turned on, each node boots from its network adapter, and the SystemImager automatic installation environment is downloaded. This environment automatically installs and configures each node from the golden image. On large-scale systems, the startsys command might take several minutes just to power on the nodes. The number of nodes to be installed affects how long the whole process takes. After all nodes are installed, they automatically reboot to the login prompt. This process can take between two and three hours on a system with 1024 compute nodes.

This release uses multicast file transfer technology to download software to client nodes during image installation. Multicast file transfer provides a fast and scalable method of installing systems: multicast imaging sends data simultaneously to many nodes that have previously been set up to listen to a multicast from the designated image server. Compared with other file transfer technologies, multicast imaging places very little load on the image server and therefore allows systems of all sizes to be installed relatively quickly.

Multicast imaging uses the udpcast open source package and the flamethrower functionality of SystemImager. A series of udp-sender daemons runs on the image server, and each client node runs a series of udp-receiver daemons during the imaging operation. The udp-sender daemons are managed by the startsys command: startsys starts these daemons when the --image_only or --image_and_boot option is entered on the command line, and shuts them down after the imaging operation is complete. Therefore, you must use startsys when performing a full installation
through the imaging operation.

Startup Procedure

Follow this procedure to start the system and propagate the golden image to all nodes. The command-line options depend on the number of nodes in the system.

Determine whether you want to override files delivered in the golden image. This step is optional and typically is not necessary during an initial system installation and configuration. However, be aware that you can modify the files delivered in the golden image; the HP XC System Software Administration Guide describes how to do so.

NOTE: If the hardware configuration is not homogeneous (that is, not all nodes are the same hardware model) and contains one or more HP ProLiant DL145 G2 nodes that use InfiniBand PCI-X cards, add the kernel boot option noapic to the grub.conf file as an override to the system image.
Make sure the XC.lic license key file is located in the /opt/hptc/etc/license directory:

# ls /opt/hptc/etc/license
Ensure that power is off on all nodes except the head node.

Use the startsys command to turn on power to all nodes, image the nodes, and boot the nodes. The command-line options for the initial system image and boot are listed in Table 3-9; they depend on the number of nodes, whether the hardware configuration contains HP server blades and enclosures, and the size of the disks.

Table 3-9 startsys Command-Line Options Based on Hardware Configuration

Fewer than 300 nodes
For small-scale hardware configurations, nodes are imaged and rebooted in one operation. The nodes then complete their per-node configuration phase, which completes the installation. This option applies only to nodes that have previously been set up to network boot. Enter the following command to image and boot all nodes in one step:

# startsys --image_and_boot

More than 300 nodes
For large-scale hardware configurations, booting nodes while imaging is not recommended. Instead, issue the following commands to image and boot the nodes in two separate steps.

Propagate the golden image to all nodes:

# startsys --image_only

Boot all nodes:

# startsys --boot_group_delay=240

NOTE: Use the --boot_group_delay=240 option only the first time nodes are booted after being imaged. The value 240 specifies the number of seconds to wait between groups of nodes as they boot.

Contains HP server blades and enclosures
If the hardware configuration contains HP server blades, booting nodes while imaging is not recommended, and additional options are required on the command line. Issue the following commands to image and boot all nodes.

Propagate the golden image to all nodes:

# startsys --image_only --flame_sync_wait=480 \
--power_control_wait=90 \
--image_timeout=90

Boot all nodes when the imaging process is complete:

# startsys --power_control_wait=90 \
--boot_group_delay=45 \
--max_at_once=50

Nodes have disks that are 250 GB or larger in size or have SATA disks
If the hardware configuration contains disks that are 250 GB or larger, or contains SATA disks, you might need to increase the 45-minute default that the startsys command allows for imaging. For example, on systems with disks larger than 250 GB, increase the image timeout limit to 60 minutes, as follows:

# startsys --image_and_boot --image_timeout=60

You might need a different value, depending on the size of the disks in the system.
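The selection logic in Table 3-9 can be summarized in a short Python sketch. It is illustrative only and is not an HP XC tool: HardwareConfig and startsys_commands are hypothetical names, and the function only builds command strings; it does not run startsys. Three behaviors are assumptions rather than documented rules: a system of exactly 300 nodes is treated as large-scale (the table covers only "fewer than" and "more than" 300), the large-scale imaging step is a plain startsys --image_only, and the 60-minute disk timeout is appended only when the imaging command does not already set --image_timeout. Check startsys(8) for the authoritative option values.

```python
from dataclasses import dataclass
from typing import List

# From the text: the default imaging allowance is 45 minutes; 60 minutes is
# the example value for disks of 250 GB or larger, or SATA disks.
LARGE_DISK_IMAGE_TIMEOUT = 60


@dataclass
class HardwareConfig:  # hypothetical description of the hardware
    node_count: int
    has_server_blades: bool = False
    large_or_sata_disks: bool = False  # 250 GB or larger, or SATA


def startsys_commands(cfg: HardwareConfig) -> List[str]:
    """Return the startsys command lines to run, in order (imaging first)."""
    if cfg.has_server_blades:
        # Blades: image first, then boot, with the blade-specific options.
        image = ("startsys --image_only --flame_sync_wait=480 "
                 "--power_control_wait=90 --image_timeout=90")
        boot = ("startsys --power_control_wait=90 "
                "--boot_group_delay=45 --max_at_once=50")
        commands = [image, boot]
    elif cfg.node_count < 300:
        # Small-scale: image and boot in one operation.
        commands = ["startsys --image_and_boot"]
    else:
        # Assumption: 300 nodes or more uses the two-step large-scale path.
        commands = ["startsys --image_only",
                    "startsys --boot_group_delay=240"]

    # Assumption: raise the imaging timeout only if it is not already set.
    if cfg.large_or_sata_disks and "--image_timeout" not in commands[0]:
        commands[0] += f" --image_timeout={LARGE_DISK_IMAGE_TIMEOUT}"
    return commands


if __name__ == "__main__":
    for line in startsys_commands(HardwareConfig(node_count=16)):
        print("# " + line)
```

Keeping the imaging command first in the returned list means the disk-size adjustment always lands on the step that images the nodes, never on the boot step.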
For more information about startsys command-line options and option values, see startsys(8).

If you want to watch as the startsys command turns on power to and images the nodes, open a second terminal window and issue a tail command (for example, tail -f) to view these log files:

/hptc_cluster/adm/logs/imaging.log
/hptc_cluster/adm/logs/startsys.log

Command output on a small, 16-node configuration is similar to the following:
Fri Jul 06 08:49:10 2007 Enabled nodes: 16 nodes -> n[1-16]
Fri Jul 06 08:49:12 2007 Removing the execution node: n16
Fri Jul 06 08:49:12 2007 Boot hierarchy of specified
nodes is: n15 n[1-14]
Fri Jul 06 08:49:15 2007 Initial power test - please wait.
Fri Jul 06 08:49:24 2007 Nodes that will image:
15 nodes -> n[1-15]
You must manually power on the following nodes:
n1
Press enter after applying power to these nodes.
continuing ........
Fri Jul 06 08:49:29 2007 Powering on for image:
14 nodes -> n[2-15]
Fri Jul 06 08:50:34 2007 Retrying power --on command:
3 nodes -> n[2-3,15]
*** Fri Jul 06 08:52:19 2007 Current statistics:
Imaging: 15 nodes -> n[1-15]
Progress:
Flamethrower started: nodes waiting: 15 nodes -> n[1-15]
*** Fri Jul 06 08:55:19 2007 Current statistics:
Imaging: 15 nodes -> n[1-15]
Progress:
*** Fri Jul 06 08:58:19 2007 Current statistics:
Imaging: 15 nodes -> n[1-15]
Progress:
Fri Jul 06 08:58:34 2007 Imaging completed; will be powered off:
2 nodes -> n[1-2]
You must manually power off the following nodes: 1
n1
Press enter after removing power from these nodes.
continuing ........
Fri Jul 06 08:59:02 2007 Powering off: 1 node -> n2
Fri Jul 06 08:59:48 2007 Imaging completed; will be powered off:
9 nodes -> n[4-10,12,14]
Fri Jul 06 08:59:48 2007 Powering off: 9 nodes -> n[4-10,12,14]
Fri Jul 06 09:00:04 2007 Imaging completed; will be powered off:
3 nodes -> n[11,13,15]
Fri Jul 06 09:00:04 2007 Powering off: 3 nodes -> n[11,13,15]
Fri Jul 06 09:00:52 2007 Imaging completed; will be powered off:
1 node -> n3
Fri Jul 06 09:00:52 2007 Powering off: 1 node -> n3
Fri Jul 06 09:01:07 2007 Retrying power --off command: 1 node -> n15
*** Fri Jul 06 09:01:22 2007 Current statistics:
Waiting for hierarchy to boot: 15 nodes -> n[1-15]
Progress:
Fri Jul 06 09:01:22 2007 Powering on for boot: 1 node -> n15
Fri Jul 06 09:02:33 2007 Retrying power --on command: 1 node -> n15
Fri Jul 06 09:04:18 2007 Processing completed for: 1 node -> n15
*** Fri Jul 06 09:04:33 2007 Current statistics:
Booted and available: 1 node -> n15
Waiting for hierarchy to boot: 14 nodes -> n[1-14]
Progress:
You must manually power on the following nodes: 2
n1
Press enter after applying power to these nodes.
continuing ........
Fri Jul 06 09:04:37 2007 Powering on for boot:
13 nodes -> n[2-14]
Fri Jul 06 09:05:33 2007 Retrying power --on command:
12 nodes -> n[2-6,8-14]
Fri Jul 06 09:06:48 2007 Processing completed for: 1 node -> n1
Fri Jul 06 09:07:03 2007 Processing completed for: 1 node -> n7
Fri Jul 06 09:07:18 2007 Processing completed for:
9 nodes -> n[4-5,8-14]
*** Fri Jul 06 09:07:33 2007 Current statistics:
Booted and available: 15 nodes -> n[1-15]
Progress:
Fri Jul 06 09:07:33 2007 Processing completed for:
3 nodes -> n[2-3,6]
*** Fri Jul 06 09:07:33 2007 Current statistics:
Booted and available: 15 nodes -> n[1-15]
Progress:
Fri Jul 06 09:07:34 2007 startsys process exiting with code 0 3
1. Watch the screen carefully for the message “You must manually power off the following nodes.” This message means that you must go to the specified node and press the power button to turn off power to the node. After doing so, return to the screen and press the Enter key.
2. Watch the screen carefully for the message “You must manually power on the following nodes.” This message means that you must go to the specified node and press the power button to turn on power to the node. After doing so, return to the screen and press the Enter key.
3. The message “exiting with code 0” indicates successful completion.
See “Troubleshooting the Imaging Process” if you
encounter problems during the node imaging process.
Proceed to “Task 13: Perform Postconfiguration Tasks for the InfiniBand
Interconnect”.