HP XC System Software : Installation Guide > Chapter 4 Configuring and Imaging the System

Task 9: Run the startsys Utility To Start the System and Propagate the Golden Image

The first time the entire system is started with the startsys command, power to each node is turned on, each node boots from its network adapter, and the SystemImager automatic installation environment is downloaded. This environment automatically installs and configures each node from the golden image. On large-scale systems, the startsys command may take several minutes just to power on the nodes.

The time needed to complete the process depends on the number of nodes to be installed. After all nodes are installed, they automatically reboot to the login prompt. On a system with 1024 compute nodes, this process can take two to three hours.

This release uses multicast file transfer technology to download software to client nodes during their image installation. Multicast file transfer provides a fast and scalable method of installing systems. Multicast imaging sends data simultaneously to many nodes that were previously set up to listen to a multicast from the designated image server. Compared with other file transfer technologies, multicast imaging places very little load on the image server and therefore allows systems of all sizes to be installed relatively quickly.
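To make the scaling argument concrete, the following shell sketch compares how much data the image server would send under an idealized unicast transfer (one full copy per node) with an idealized multicast transfer (one copy shared by all listening nodes). The function names and the 2000 MB image size used in the example are hypothetical, and the model ignores retransmissions and protocol overhead, so treat the numbers as illustrative only.

```shell
#!/bin/sh
# Idealized model only: ignores retransmissions, protocol overhead, and
# any splitting of the image into several multicast streams.

# unicast_total_mb NODES IMAGE_MB
# Data sent by the server when every node receives its own copy.
unicast_total_mb() {
    echo $(( $1 * $2 ))
}

# multicast_total_mb NODES IMAGE_MB
# Data sent by the server when all listening nodes share one stream.
# NODES is accepted for symmetry but does not affect the result.
multicast_total_mb() {
    echo "$2"
}
```

For example, `unicast_total_mb 1024 2000` prints 2048000, while `multicast_total_mb 1024 2000` prints 2000.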

Multicast imaging uses the udpcast open source package and the flamethrower functionality of SystemImager. A series of udp-sender daemons runs on the image server, and each client node runs a series of udp-receiver daemons during the imaging operation. The startsys command manages the udp-sender daemons: it starts them when the --image_only or --image_and_boot option is entered on the command line and shuts them down after the imaging operation is complete. Therefore, you must use startsys when performing a full installation through the imaging operation.
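Because startsys is expected to stop the udp-sender daemons when imaging finishes, one way to double-check on the image server is to count the processes with that name. This is a minimal sketch that uses only the standard ps and grep commands; the function name count_udp_senders is hypothetical, and the sketch assumes that the daemons appear in the process table under the command name udp-sender.

```shell
#!/bin/sh
# count_udp_senders
# Prints the number of running processes whose command name is exactly
# "udp-sender" (assumed process name; hypothetical helper).
count_udp_senders() {
    # grep -c prints the count but exits non-zero when the count is 0,
    # so "|| true" keeps the function's exit status at success.
    ps -e -o comm= 2>/dev/null | grep -c -x 'udp-sender' || true
}
```

A result of 0 after startsys exits suggests that no sender daemons were left running.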

Follow this procedure to start the system and propagate the golden image to all nodes; the command line options depend on the number of nodes in your system:

  1. Make sure the XC.lic license key file is located in the following directory:

    # ls /opt/hptc/etc/license
    Caution:

    You cannot continue if the license key file is not present in this directory. See “Task 6: Have the License Key File Ready” and “Put the License Key File in the Correct Location (Required)” for more information about obtaining and positioning the license key file if you have not already done so.

  2. Use the startsys command to turn on power to all nodes, image the nodes, and boot the nodes. As shown in Table 4-7, the command line options depend upon the size of your system.

    Table 4-7 The startsys Command Line Options

    startsys Option     Use with Fewer Than 300 Nodes?   Use with More Than 300 Nodes?
    --image_and_boot    Yes                              No
    --image_only        No                               Yes

    --image_and_boot images and then reboots all nodes so they complete their per-node configuration phase, thus completing the installation on the nodes. This option applies only for nodes that have previously been set up to network boot.

    --image_only completes the imaging phase and then halts the nodes before their per-node configuration phase. For large-scale systems, booting while imaging is not recommended.


    Enter one of the following commands depending upon the size of your system:

    • On systems with fewer than 300 nodes, enter the following command to image and boot all nodes:

      # startsys --image_and_boot
    • On systems with more than 300 nodes, enter the following command to image all nodes. Then, proceed to step 3 to boot the nodes after they are imaged:

      # startsys --image_only

    Command output on a small system looks similar to the following:

    Enabled nodes: n[14-16]
    Removing the execution node: n16
    Boot hierarchy of specified nodes is: n14 n15
    Nodes requiring a power off -> n[14-15]
    Powering off -> n[14-15]
    Nodes that will image -> n[14-15]
    Powering on for image -> n[14-15]
    Nodes set up for flamethrow -> n[14-15]
    Current statistics:
      Currently processing -> n[14-15]
    Current statistics:
      Currently processing -> n[14-15]
    Current statistics:
      Currently processing -> n[14-15]
    Power down required after image load -> n[14-15]
    Nodes requiring a power off -> n[14-15]
    Powering off -> n[14-15]
    Powering on for boot -> n14
    Current statistics:
      Currently processing -> n[14-15]
      Waiting to boot -> n15
      Nodes with a valid loaded image -> n[14-15]
    Current statistics:
      Currently processing -> n[14-15]
      Waiting to boot -> n15
      Nodes with a valid loaded image -> n[14-15]
    Current statistics:
      Currently processing -> n[14-15]
      Waiting to boot -> n15
      Nodes with a valid loaded image -> n[14-15]
    Processing completed for -> n14
    Powering on for boot -> n15
    Current statistics:
      Currently processing -> n15
      Nodes with a valid loaded image -> n[14-15]
      No further processing required for -> n14
    Current statistics:
      Currently processing -> n15
      Nodes with a valid loaded image -> n[14-15]
      No further processing required for -> n14
    Current statistics:
      Currently processing -> n15
      Nodes with a valid loaded image -> n[14-15]
      No further processing required for -> n14
    Processing completed for -> n15
    Current statistics:
      Nodes with a valid loaded image -> n[14-15]
      No further processing required for -> n[14-15]
    startsys process exiting with code 0
  3. On systems with more than 300 nodes, enter the following command to boot the client nodes, which were not booted during the imaging operation:

    # startsys --boot_group_delay=240
    Note:

    The --boot_group_delay=240 option is used only the first time nodes are booted after being imaged; the value 240 specifies the number of seconds to wait between groups of nodes as they boot.

  4. Run the SLURM postconfiguration utility to update the slurm.conf file with the correct processor count and memory size. Skip this step if you installed standard LSF.

    # /opt/hptc/sbin/spconfig
    Detected the Quadrics Elan interconnect; configuring SLURM support...
    The new 'elanhosts' file contains entries for 3 nodes...
    Configured nodes n[1-15] with 2 CPUs and 8108 MB of total memory...
    Configured node n16 with 4 CPUs and 9549 MB of total memory...
    Restarting SLURM...
    SLURM Post-Configuration Done.
    
    Note:

    In this example, the spconfig utility was run on a system with a QsNetII interconnect. Your output varies depending upon the type of interconnect in use.

    If your system is using a QsNetII interconnect, ensure that the number of node entries in the /opt/hptc/libelanhosts/etc/elanhosts file matches the expected number of operational nodes in the cluster. If the numbers do not match, verify that all nodes are up and running, and then rerun the spconfig utility.

    Output from the spconfig utility looks similar to the following for all other interconnect types:

    Configured unknown node n14 with 1 CPU and 4872 MB of total memory...
    Restarting SLURM...
    SLURM Post-Configuration Done.
  5. Complete this step for systems installed with an InfiniBand interconnect; skip this step for all other interconnect types.

    Create a list of all nodes in the HP XC system. The InfiniBand diagnostic tools require this list of nodes in order to function properly:

    # shownode all > /usr/voltaire/scripts/HCA400-Checks/node-list
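The checks in the procedure above can be sketched as a few small shell helpers. This is only an illustration under stated assumptions: the function names are hypothetical and are not part of HP XC, the helpers do not run startsys or spconfig themselves, and the output patterns are taken from the sample output shown in steps 2 and 4 rather than from a real system. Because Table 4-7 does not say which option applies to a system with exactly 300 nodes, the sketch reports that case as an error instead of guessing.

```shell
#!/bin/sh
# Hypothetical helpers that mirror steps 1, 2, and 4 of the procedure.

# Step 1: succeed only if XC.lic exists in the given directory
# (default: the directory named in the procedure).
license_present() {
    [ -f "${1:-/opt/hptc/etc/license}/XC.lic" ]
}

# Step 2: print the startsys option that Table 4-7 gives for a node count.
# The table does not cover exactly 300 nodes, so that case is an error.
startsys_option() {
    if [ "$1" -lt 300 ]; then
        echo "--image_and_boot"
    elif [ "$1" -gt 300 ]; then
        echo "--image_only"
    else
        echo "node count 300 is not covered by Table 4-7" >&2
        return 1
    fi
}

# Step 2: read startsys output on stdin and print the exit code from the
# last "startsys process exiting with code N" line, if one is present.
startsys_exit_code() {
    sed -n 's/^ *startsys process exiting with code \([0-9][0-9]*\) *$/\1/p' | tail -n 1
}

# Step 4: read spconfig output on stdin and print the node count reported
# in the "'elanhosts' file contains entries for N nodes" line.
elanhosts_node_count() {
    sed -n "s/.*'elanhosts' file contains entries for \([0-9][0-9]*\) node.*/\1/p"
}
```

With these helpers, a wrapper could first call `license_present`, pass the result of `startsys_option` to startsys, and then compare the values printed by `startsys_exit_code` and `elanhosts_node_count` with what is expected for the cluster.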

Proceed to “Task 10: Confirm Compute Resources.”

© 2003 Hewlett-Packard Development Company, L.P.