The first time the entire system is started with the startsys command, power to each node is turned on, each node boots from its network adapter, and the SystemImager automatic installation environment is downloaded. This environment automatically installs and configures each node from the golden image. On large-scale systems, the startsys command may take several minutes to power on all the nodes.
The time required to complete the process depends on the number of nodes being installed. After all nodes are installed, they automatically reboot to the login prompt. This process can take two to three hours on a system with 1024 compute nodes.
This release uses multicast file transfer technology to download software to client nodes during their image installation. Multicast file transfer provides a fast and scalable method of installing systems. Multicast imaging sends data simultaneously to many nodes that have been previously set up to listen to a multicast from the designated image server. Compared with other file transfer technologies, multicast imaging places very little load on the image server and therefore allows systems of all sizes to be installed relatively quickly.
Multicast imaging uses the udpcast open source package and the flamethrower functionality of SystemImager. A series of udp-sender daemons runs on the image server, and each client node runs a series of udp-receiver daemons during the imaging operation. The startsys command manages the udp-sender daemons: it starts them when the --image_only or --image_and_boot option is entered on the command line and shuts them down after the imaging operation is complete. Therefore, you must use startsys when performing a full installation through the imaging operation.
Follow this procedure to start the system and propagate the golden image to all nodes; the command line options depend on the number of nodes in your system:
Make sure the XC.lic license key file is located in the following directory:
# ls /opt/hptc/etc/license
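The check in this step could be scripted so that it runs before startsys is invoked. The following is a minimal sketch only: the function name `check_license` is hypothetical and is not part of the HP XC software, while the directory and file name are the ones given in this step.

```shell
#!/bin/sh
# Hypothetical helper (not an HP XC command): succeed only if the XC.lic
# license key file is present in the license directory.
# The default directory is the one named in this procedure.
check_license() {
    license_dir="${1:-/opt/hptc/etc/license}"
    if [ -f "${license_dir}/XC.lic" ]; then
        echo "License key file found: ${license_dir}/XC.lic"
        return 0
    fi
    echo "License key file not found: ${license_dir}/XC.lic" >&2
    return 1
}
```

Calling `check_license` with no argument tests the default directory; a nonzero return status indicates that the file should be put in place before continuing.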
Use the startsys command to turn on power to all nodes, image the nodes, and boot the nodes. As shown in Table 4-7, the command line options depend upon the size of your system.
Table 4-7 The startsys Command Line Options
| startsys Command Line Option | Use on Systems with Fewer Than 300 Nodes? | Use on Systems with More Than 300 Nodes? |
|---|---|---|
| --image_and_boot images and then reboots all nodes so they complete their per-node configuration phase, thus completing the installation on the nodes. This option applies only to nodes that have previously been set up to network boot. | Yes | No |
| --image_only completes the imaging phase and then halts the nodes before their per-node configuration phase. For large-scale systems, booting while imaging is not recommended. | No | Yes |
Enter one of the following commands depending upon the size of your system:
On systems with fewer than 300 nodes, enter the following command to image and boot all nodes:
# startsys --image_and_boot
On systems with more than 300 nodes, enter the following command to image all nodes. Then, proceed to step 3 to boot the nodes after they are imaged:
# startsys --image_only
Command output on a small system looks similar to the following:
Enabled nodes: n[14-16]
Removing the execution node: n16
Boot hierarchy of specified nodes is: n14 n15
Nodes requiring a power off -> n[14-15]
Powering off -> n[14-15]
Nodes that will image -> n[14-15]
Powering on for image -> n[14-15]
Nodes set up for flamethrow -> n[14-15]
Current statistics:
Currently processing -> n[14-15]
Current statistics:
Currently processing -> n[14-15]
Current statistics:
Currently processing -> n[14-15]
Power down required after image load -> n[14-15]
Nodes requiring a power off -> n[14-15]
Powering off -> n[14-15]
Powering on for boot -> n14
Current statistics:
Currently processing -> n[14-15]
Waiting to boot -> n15
Nodes with a valid loaded image -> n[14-15]
Current statistics:
Currently processing -> n[14-15]
Waiting to boot -> n15
Nodes with a valid loaded image -> n[14-15]
Current statistics:
Currently processing -> n[14-15]
Waiting to boot -> n15
Nodes with a valid loaded image -> n[14-15]
Processing completed for -> n14
Powering on for boot -> n15
Current statistics:
Currently processing -> n15
Nodes with a valid loaded image -> n[14-15]
No further processing required for -> n14
Current statistics:
Currently processing -> n15
Nodes with a valid loaded image -> n[14-15]
No further processing required for -> n14
Current statistics:
Currently processing -> n15
Nodes with a valid loaded image -> n[14-15]
No further processing required for -> n14
Processing completed for -> n15
Current statistics:
Nodes with a valid loaded image -> n[14-15]
No further processing required for -> n[14-15]
startsys process exiting with code 0
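The final line of the output above reports the exit code of the startsys process. If the output has been captured to a file, that line can be checked with a short script. This is a sketch under stated assumptions: the function name `startsys_exit_code` is hypothetical, the matched text is taken from the sample output above, and a return status of 2 for a missing status line is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper (not an HP XC command): read captured startsys output
# on standard input and print the exit code reported in its status line.
# Assumption: the status line has the form shown in the sample output,
# "startsys process exiting with code N".
startsys_exit_code() {
    code=$(sed -n 's/^startsys process exiting with code \([0-9][0-9]*\).*$/\1/p' | tail -n 1)
    if [ -z "$code" ]; then
        # No status line was found; 2 is an arbitrary status for this sketch.
        echo "no exit status line found" >&2
        return 2
    fi
    echo "$code"
}
```

For example, if the output was saved to a file of your choosing, `startsys_exit_code < saved-output-file` prints `0` for the run shown above.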
On systems with more than 300 nodes, the nodes were not booted during the imaging operation. Enter the following command to boot the client nodes:
# startsys --boot_group_delay=240
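The choice of startsys commands in steps 2 and 3 depends only on the node count, so it can be summarized in a short script. The following sketch prints the command sequence without running anything. The function name `startsys_plan` is hypothetical, and treating a system of exactly 300 nodes as large is an assumption, because Table 4-7 covers only systems with fewer than or more than 300 nodes.

```shell
#!/bin/sh
# Hypothetical helper (not an HP XC command): print the startsys command
# sequence for a given node count. Nothing is executed.
# Assumption: exactly 300 nodes is treated as a large system, because the
# table covers only "fewer than" and "more than" 300 nodes.
startsys_plan() {
    node_count="$1"
    if [ "$node_count" -lt 300 ]; then
        # Small system: image and boot in one operation.
        echo "startsys --image_and_boot"
    else
        # Large system: image first, then boot in a separate step.
        echo "startsys --image_only"
        echo "startsys --boot_group_delay=240"
    fi
}
```

For example, `startsys_plan 1024` prints the two-command sequence for a large system, and `startsys_plan 64` prints the single image-and-boot command.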
Run the SLURM postconfiguration utility to update the slurm.conf file with the correct processor count and memory size. Skip this step if you installed standard LSF.
# /opt/hptc/sbin/spconfig
Detected the Quadrics Elan interconnect; configuring SLURM support...
The new 'elanhosts' file contains entries for 3 nodes...
Configured nodes n[1-15] with 2 CPUs and 8108 MB of total memory...
Configured node n16 with 4 CPUs and 9549 MB of total memory...
Restarting SLURM...
SLURM Post-Configuration Done.
Complete this step for systems installed with an InfiniBand interconnect; skip this step for all other interconnect types.
Create a list of all nodes in the HP XC system. The InfiniBand diagnostic tools require this list of nodes to function properly:
# shownode all > /usr/voltaire/scripts/HCA400-Checks/node-list
Proceed to “Task 10: Confirm Compute Resources”.