Perform the following tasks only if you configured SLURM, LSF, or both; omit them if neither SLURM nor LSF is configured.

Perform SLURM Postconfiguration Tasks
If SLURM was configured, follow this procedure to finish the configuration process:

Begin this procedure as the root user on the head node.

Use the SLURM postconfiguration utility to update the slurm.conf file with the correct processor count and memory size. If a QsNetII interconnect switch is present, an elanhosts file is created with an entry for each compute node. The elanhosts file is part of the QsNetII switch support that is provided by SLURM.

# spconfig
Detected the Quadrics Elan interconnect; configuring SLURM support...
The new 'elanhosts' file contains entries for 16 nodes...
Configured nodes n[1-15] with 2 CPUs and 8108 MB of total memory...
Configured node n16 with 4 CPUs and 9549 MB of total memory...
Restarting SLURM...
SLURM Post-Configuration Done.
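
As a cross-check on output like the sample above, the per-node lines can be totaled and compared with the node count reported for the elanhosts file. The following is a minimal sketch and not part of the spconfig utility: the function name summarize_spconfig and the saved-output file name are hypothetical, and the parsing assumes the lines have exactly the format shown in the sample (a single node name or one bracketed range).

```shell
# Summarize "Configured node(s) ..." lines from spconfig output read on stdin.
# Assumption: each line names either one node (n16) or one bracketed
# range (n[1-15]); comma-separated lists are not handled.
summarize_spconfig() {
    awk '
        /^Configured nodes? / {
            spec = $3            # node name or range, e.g. n[1-15] or n16
            cpus = $5            # CPUs per node
            count = 1
            if (match(spec, /\[[0-9]+-[0-9]+\]/)) {
                # Strip the brackets, leaving "first-last"
                range = substr(spec, RSTART + 1, RLENGTH - 2)
                split(range, b, "-")
                count = b[2] - b[1] + 1
            }
            nodes += count
            total_cpus += count * cpus
        }
        END { printf "nodes=%d cpus=%d\n", nodes, total_cpus }
    '
}

# Example (spconfig.log is a hypothetical file holding saved spconfig output):
#   summarize_spconfig < spconfig.log
```

For the sample output, this prints nodes=16 cpus=34, which agrees with the 16 entries reported for the elanhosts file.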
In the preceding spconfig example, the utility was run on a system with a QsNetII interconnect. For all other interconnect types, the first two lines of the command output are not displayed.

If the system is using a QsNetII interconnect, ensure that the number of node entries in the /opt/hptc/libelanhosts/etc/elanhosts file matches the expected number of operational nodes in the cluster. If the number does not match, the QsNetII network card on the missing node might not be fully operational. Verify the status of the nodes to ensure that they are all up and running, and rerun the spconfig utility.

Follow this procedure to add customizations to the SLURM configuration file:

Go to Appendix J to determine the type of customizations that are available or required. For instance, if you installed and configured SVA, SVA requires certain SLURM customizations.

Use the text editor of your choice to edit the SLURM configuration file:

/hptc_cluster/slurm/etc/slurm.conf
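
Before editing, it can be useful to keep a copy of the current file so that changes are easy to review or undo. This is a minimal sketch and not a documented step of this procedure; the function name backup_conf and the backup suffix are assumptions chosen for illustration.

```shell
# Copy a configuration file to a timestamped backup next to it and print
# the backup path. The ".bak.<timestamp>" suffix is an arbitrary choice.
backup_conf() {
    src=$1
    dest="$src.bak.$(date +%Y%m%d%H%M%S)"
    # -p preserves the mode and timestamps of the original file
    cp -p "$src" "$dest" && printf '%s\n' "$dest"
}

# Example:
#   backup_conf /hptc_cluster/slurm/etc/slurm.conf
```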
Use the information in Appendix J to customize the SLURM configuration according to your requirements.

If you make changes to the slurm.conf file, save your changes and exit the text editor.

Update the SLURM daemons with this new information:

If some nodes are reported as being in the down state, see “Troubleshooting SLURM” for more information.

Perform LSF Postconfiguration Tasks
Follow this procedure to set up the LSF environment and enable LSF failover (if you assigned the resource_management role to two or more nodes). Omit this task if you did not configure LSF.

Begin this procedure as the root user on the head node.

Set up the LSF environment by sourcing the LSF profile file:

# . /opt/hptc/lsf/top/conf/profile.lsf
Verify that the LSF profile file has been sourced by finding an LSF command:

# which lsid
/opt/hptc/lsf/top/6.2/linux2.6-glibc2.3-x86_64-slurm/bin/lsid
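
The platform directory in a path like the one above indicates the architecture and whether the SLURM variant of LSF is in use. The following sketch reads those two facts from the path; the function name describe_lsf_platform is hypothetical, and the code assumes the platform directory is the path component directly above bin, as in the sample output.

```shell
# Describe the platform directory found in the path to an LSF command.
# Assumption: the platform directory is the component just above "bin".
describe_lsf_platform() {
    platform=$(basename "$(dirname "$(dirname "$1")")")
    case "$platform" in
        *x86_64*) arch="x86_64 (Xeon or Opteron)" ;;
        *ia64*)   arch="ia64 (HP Integrity)" ;;
        *)        arch="unknown" ;;
    esac
    case "$platform" in
        *slurm*) slurm="yes" ;;
        *)       slurm="no" ;;
    esac
    printf 'platform=%s arch=%s slurm=%s\n' "$platform" "$arch" "$slurm"
}

# Example, after sourcing the LSF profile file:
#   describe_lsf_platform "$(which lsid)"
```

For the sample path, this prints platform=linux2.6-glibc2.3-x86_64-slurm arch=x86_64 (Xeon or Opteron) slurm=yes.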
The sample which lsid output was obtained from an HP ProLiant server. Thus, the directory name linux2.6-glibc2.3-x86_64-slurm is included in the path (the string x86_64 signifies a Xeon- or Opteron-based architecture). The string ia64 is included in the directory name for HP Integrity servers. The string slurm exists in the path only if LSF-HPC with SLURM is configured.

If you assigned the resource_management role to two or more nodes and want to enable LSF failover, enter the following command; otherwise, proceed to step 5, which determines the node on which the LSF daemons are running.

# controllsf enable failover
Determine the node on which the LSF daemons are running:

# controllsf show current
LSF is currently running on node n32, and assigned to node n32
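
If this check is scripted, the node name can be pulled out of that output line. The following is a minimal sketch that assumes the output has exactly the form shown in the sample; the function name lsf_running_node is hypothetical, and the commented usage further assumes that node names match short host names.

```shell
# Print the node on which LSF is running, given the output of
# "controllsf show current" on stdin.
# Assumption: the output matches the sample line shown above.
lsf_running_node() {
    sed -n 's/^LSF is currently running on node \([^, ]*\),.*$/\1/p'
}

# Example (assumes node names match the output of "hostname -s"):
#   node=$(controllsf show current | lsf_running_node)
#   [ "$node" = "$(hostname -s)" ] || echo "LSF is running on $node; log in there"
```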
Log in to the node that is running LSF if it is not running on the head node. If LSF is running on the head node, omit this step.

Restart the LIM daemon:

# lsadmin limrestart
Checking configuration files ...
No errors found.
Restart LIM on <lsfhost.localdomain> ...... done

Restarting the LIM daemon is required because the licensing of LSF-HPC with SLURM occurs when the LIM daemon is started. This means that the LIM daemon is licensed only for the processors that are actually available at that time, which might be fewer than the total number of processors available after all of the nodes have been imaged and are up and running.

Update the LSF batch system with the latest resource information reported by SLURM:

# badmin reconfig
Checking configuration files ...
No errors found.
Reconfiguration initiated
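
Both lsadmin limrestart and badmin reconfig report the result of a configuration check in the sample output above. When the output of either command is saved for later review, a small helper can confirm that the check was clean. This is a sketch under the assumption that a clean check always prints the text "No errors found." as in the samples; the function name config_check_ok and the log file name are hypothetical.

```shell
# Succeed if LSF command output on stdin reports a clean configuration check.
# Assumption: a clean check prints "No errors found." as in the samples.
config_check_ok() {
    grep -qF 'No errors found.'
}

# Example (badmin-reconfig.log is a hypothetical file of saved output):
#   config_check_ok < badmin-reconfig.log && echo "configuration check passed"
```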