HP XC System Software: Installation Guide > Chapter 3 Configuring and Imaging the System

Task 17: Finalize the Configuration of Compute Resources

Perform the following tasks only if you configured SLURM, LSF, or both. Skip this task if neither SLURM nor LSF is configured.

Perform SLURM Postconfiguration Tasks

If SLURM was configured, follow this procedure to finish configuring SLURM:

  1. Begin this procedure as the root user on the head node.

  2. Use the SLURM postconfiguration utility to update the slurm.conf file with the correct processor count and memory size. If a QsNetII interconnect switch is present, an elanhosts file is created with an entry for each compute node. The elanhosts file is part of the QsNetII switch support that is provided by SLURM.

    # spconfig
    Detected the Quadrics Elan interconnect; configuring SLURM support...
      The new 'elanhosts' file contains entries for 16 nodes...
      Configured nodes n[1-15] with 2 CPUs and 8108 MB of total memory...
      Configured node n16 with 4 CPUs and 9549 MB of total memory...
      Restarting SLURM...
      SLURM Post-Configuration Done.

    In this example, the spconfig utility was run on a system with a QsNetII interconnect. For all other interconnect types, the first two lines of the command output are not displayed.

    NOTE: If a compute node did not boot up, the spconfig utility configures the node as follows:
    Configured unknown node n14 with 1 CPU and 1 MB of total memory... 

    After the node has been booted up, re-run the spconfig utility to configure the correct settings.

  3. If the system is using a QsNetII interconnect, ensure that the number of node entries in the /opt/hptc/libelanhosts/etc/elanhosts file matches the expected number of operational nodes in the cluster. If the number does not match, verify that all nodes are up and running, and then re-run the spconfig utility. A mismatch can also indicate that the QsNetII network card on the missing node is not fully operational.
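
    The checks in steps 2 and 3 can be partly scripted. The following is a minimal sketch, assuming that the spconfig output was saved to a file and that its message wording matches the sample output shown in step 2. The function names unknown_nodes and elanhosts_count and the /tmp/spconfig.out path are illustrative and are not part of HP XC.

    ```shell
    # Sketch only: the message formats matched here are copied from the sample
    # output in this guide and might differ on other releases.

    # Print the name of every node that spconfig configured as "unknown".
    # $1 is a file that contains saved spconfig output.
    unknown_nodes() {
        sed -n 's/^ *Configured unknown node \([^ ]*\) with .*/\1/p' "$1"
    }

    # Print the node count that spconfig reported for the elanhosts file.
    # Prints nothing if the elanhosts message is absent (no QsNetII interconnect).
    elanhosts_count() {
        sed -n "s/^.*'elanhosts' file contains entries for \([0-9][0-9]*\) nodes.*/\1/p" "$1"
    }

    # Hypothetical usage, as root on the head node:
    #   spconfig | tee /tmp/spconfig.out
    #   unknown_nodes /tmp/spconfig.out     # nodes to boot before re-running spconfig
    #   elanhosts_count /tmp/spconfig.out   # compare with the expected node count
    ```

    Any node name printed by unknown_nodes identifies a node that must be booted before spconfig is run again.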

  4. Edit the SLURM configuration file to add any required customizations, and then run the scontrol reconfig command to update the SLURM daemons with the new information.

    Assigning features to nodes is a common customization that is useful if the compute resources of the cluster are not consistent.

    For example, if compute nodes n[1-20] have significantly more memory than the rest of the nodes, assign a bigmem feature to nodes n[1-20] and a lowmem feature to the rest of the nodes. To do this, add a Features setting to each NodeName entry, similar to the following:

    NodeName=n[1-20]  Procs=2 RealMemory=4096 Features=bigmem
    NodeName=n[21-46] Procs=2 RealMemory=2048 Features=lowmem

    Another SLURM customization you might want to make is to create two distinguishable sets of nodes, such as a prod set of nodes for production use and a test set of nodes for testing use.

    In general, after you assign features to SLURM nodes, the features can be used in job submissions to request that a job run on a specific set of nodes. These features are recognized by LSF-HPC with SLURM and can be used in LSF queue definitions as well as in user job submissions. For more information, see the HP XC System Software Administration Guide and the HP XC System Software User's Guide.
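
    To review which node sets carry a given feature, the NodeName entries can be filtered on their Features setting. The following is a minimal sketch, assuming that each NodeName entry sits on one line of space-separated key=value pairs, as in the bigmem and lowmem example, and that multiple features are separated by commas. The function name nodes_with_feature is illustrative.

    ```shell
    # Print the NodeName value of each entry whose Features list includes $2.
    # $1 is the path to a slurm.conf-style file.
    nodes_with_feature() {
        awk -v want="$2" '
            $1 ~ /^NodeName=/ {
                name = ""; feats = ""
                for (i = 1; i <= NF; i++) {
                    # "NodeName=" and "Features=" are both 9 characters long.
                    if ($i ~ /^NodeName=/) name  = substr($i, 10)
                    if ($i ~ /^Features=/) feats = substr($i, 10)
                }
                n = split(feats, list, ",")
                for (j = 1; j <= n; j++)
                    if (list[j] == want) print name
            }' "$1"
    }

    # Hypothetical usage:
    #   nodes_with_feature /hptc_cluster/slurm/etc/slurm.conf bigmem
    ```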

    A second common customization to the SLURM configuration is to create additional SLURM partitions. By default, an lsf partition is created with settings that prevent non-root users from requesting node allocations directly from SLURM. You might want to move a small set of nodes from the lsf partition into a new, more user-accessible partition.

    For more information about configuring SLURM partitions, see slurm.conf(8) and the comments in the slurm.conf file.
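
    Before moving nodes between partitions, it can help to list the partitions that are already defined. The following is a minimal sketch, assuming that each partition is a single-line PartitionName entry of key=value pairs and that RootOnly is the setting that restricts direct allocations to root. The function name list_partitions and the debug partition in the comment are hypothetical.

    ```shell
    # Print "<partition> <nodes> RootOnly=<value>" for each PartitionName entry.
    # $1 is the path to a slurm.conf-style file. RootOnly is reported as
    # "unset" when the entry does not specify it.
    list_partitions() {
        awk '
            $1 ~ /^PartitionName=/ {
                name = ""; nodes = ""; rootonly = "unset"
                for (i = 1; i <= NF; i++) {
                    split($i, kv, "=")
                    if (kv[1] == "PartitionName") name = kv[2]
                    if (kv[1] == "Nodes")         nodes = kv[2]
                    if (kv[1] == "RootOnly")      rootonly = kv[2]
                }
                print name, nodes, "RootOnly=" rootonly
            }' "$1"
    }

    # Hypothetical entries, for illustration only:
    #   PartitionName=lsf   RootOnly=YES Nodes=n[1-40]
    #   PartitionName=debug RootOnly=NO  Nodes=n[41-46]
    ```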

    In summary, follow this procedure to make SLURM customizations:

    1. Use the text editor of your choice to edit the SLURM configuration file:

       /hptc_cluster/slurm/etc/slurm.conf

    2. Use the guidelines already described in this step to customize the SLURM configuration according to your requirements.

    3. If you make changes to the slurm.conf file, save your changes.

    4. Update the SLURM daemons with this new information.

      # scontrol reconfig
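
    The summary procedure can be wrapped in a short session. The following is a minimal sketch that adds one precaution this guide does not require: keeping a timestamped copy of slurm.conf before editing it. The function name backup_conf, the SLURM_CONF variable, and the backup file naming are assumptions made for illustration.

    ```shell
    # Path to the SLURM configuration file named in this procedure.
    SLURM_CONF=${SLURM_CONF:-/hptc_cluster/slurm/etc/slurm.conf}

    # Copy file $1 to a timestamped backup next to it and print the backup path.
    backup_conf() {
        backup="$1.$(date +%Y%m%d%H%M%S).bak"
        cp -p "$1" "$backup" && printf '%s\n' "$backup"
    }

    # Hypothetical session, as root on the head node:
    #   backup_conf "$SLURM_CONF"
    #   ${EDITOR:-vi} "$SLURM_CONF"
    #   scontrol reconfig
    ```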

Perform LSF Postconfiguration Tasks

Follow this procedure to set up the LSF environment and enable LSF failover (if you assigned the resource_management role to two or more nodes). Bypass this task if you did not configure LSF.

  1. Begin this procedure as the root user on the head node.

  2. Set up the LSF environment by sourcing the LSF profile file:

    # . /opt/hptc/lsf/top/conf/profile.lsf

  3. Verify that the LSF profile file has been sourced by locating an LSF command:

    # which lsid
    /opt/hptc/lsf/top/6.2/linux2.6-glibc2.3-x86_64-slurm/bin/lsid

    This sample output was obtained from an HP ProLiant server. Thus, the directory name linux2.6-glibc2.3-x86_64-slurm is included in the path (the string x86_64 signifies a Xeon- or Opteron-based architecture). The string ia64 is included in the directory name for HP Integrity servers. The string slurm exists in the path only if LSF-HPC with SLURM is configured.
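
    The path components described above can also be read by a script. The following is a minimal sketch, assuming that the directory name always contains one of the architecture strings mentioned in this step. The function names lsf_arch and lsf_flavor are illustrative.

    ```shell
    # Classify an lsid path by the architecture string in its directory name.
    lsf_arch() {
        case "$1" in
            *x86_64*) echo "x86_64" ;;
            *ia64*)   echo "ia64" ;;
            *)        echo "unknown" ;;
        esac
    }

    # Report whether the path indicates LSF-HPC with SLURM or standard LSF.
    lsf_flavor() {
        case "$1" in
            *slurm*) echo "LSF-HPC with SLURM" ;;
            *)       echo "standard LSF" ;;
        esac
    }

    # Hypothetical usage after sourcing profile.lsf:
    #   lsid_path=$(command -v lsid) && lsf_arch "$lsid_path" && lsf_flavor "$lsid_path"
    ```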

    NOTE: The remainder of this procedure applies only to LSF-HPC with SLURM. If standard LSF is configured, skip the remaining steps.

  4. If you assigned the resource_management role to two or more nodes and want to enable LSF failover, enter the following command; otherwise, proceed to step 5.

    # controllsf enable failover

  5. Determine the node on which the LSF daemons are running:

    # controllsf show current
    LSF is currently running on node n32, and assigned to node n32

  6. If LSF is not running on the head node, log in to the node that is running LSF. If LSF is running on the head node, skip this step.

    # ssh n32
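
    Steps 5 and 6 can be combined in a script. The following is a minimal sketch, assuming that the controllsf show current output matches the wording of the sample in step 5 and that the short host name of the head node is the same as its node name. The function name lsf_running_node is illustrative.

    ```shell
    # Extract the node name that follows "running on node" in the text $1.
    lsf_running_node() {
        printf '%s\n' "$1" | sed -n 's/^.*running on node \([^, ]*\).*$/\1/p'
    }

    # Hypothetical usage, as root on the head node:
    #   node=$(lsf_running_node "$(controllsf show current)")
    #   [ "$node" = "$(hostname -s)" ] || ssh "$node"
    ```
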
  7. Restart the LIM daemon:

    # lsadmin limrestart
    Checking configuration files ...
       No errors found.
       
       Restart LIM on <lsfhost.localdomain> ...... done

    Restarting the LIM daemon is required because LSF-HPC with SLURM is licensed when the LIM daemon starts. The LIM daemon is therefore licensed only for the processors that are available at that time, which might be fewer than the total number of processors available after all of the nodes have been imaged and are up and running.

© 2003 Hewlett-Packard Development Company, L.P.