Follow this procedure to verify that compute resources are functioning properly:
Begin this procedure as the root user on the head node.
Set up the LSF environment by sourcing the LSF profile file: # . /opt/hptc/lsf/top/conf/profile.lsf
Verify that the LSF profile file has been sourced by finding an LSF command: # which lsid
/opt/hptc/lsf/top/6.1/linux2.6-glibc2.3-amd64-slurm/bin/lsid
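The check in this step can be scripted. The following is a minimal sketch and not part of the HP XC software: the function name `require_lsf_command` is hypothetical, and the profile path is the one used in this procedure.

```shell
#!/bin/sh
# Hypothetical helper (not an LSF or HP XC command): succeed only if the
# named command can be found on PATH, which is what "which lsid" confirms.
require_lsf_command() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "$1 not found; source the LSF profile file first" >&2
    return 1
}

# Source the profile only if it is readable at the path used in this procedure.
LSF_PROFILE=/opt/hptc/lsf/top/conf/profile.lsf
if [ -r "$LSF_PROFILE" ]; then
    . "$LSF_PROFILE"
fi
```

After sourcing, `require_lsf_command lsid` returns 0 when the LSF commands are available and prints a reminder otherwise.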
If SLURM is configured, verify that the lsf partition exists: # sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
lsf up infinite 3 idle n[14-16]
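If you prefer to test for the partition in a script instead of reading the sinfo output, a sketch such as the following may help. The function name `has_lsf_partition` is hypothetical; it only inspects the first column of the output layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: read sinfo output on stdin and succeed if a
# partition named "lsf" is listed. SLURM marks the default partition
# with a trailing "*", so that form is accepted as well.
has_lsf_partition() {
    awk '$1 == "lsf" || $1 == "lsf*" { found = 1 }
         END { exit !found }'
}
```

For example, `sinfo | has_lsf_partition && echo "lsf partition present"` prints the message only when the partition exists.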
Wait a few seconds for the LSF daemons to stabilize, then run the following commands to confirm that licensing is correct, that the number of available processors listed matches the number of processors in the lsf partition, and that the status of the system is shown as ok. Command output differs depending on which type of LSF is installed and configured:
LSF-HPC with SLURM:
Verify that LSF-HPC with SLURM is running: # lsid
Platform LSF HPC 6.1 for SLURM, LSF_build_date
Copyright 1992-2005 Platform Computing Corporation
My cluster name is hptclsf
My master name is lsfhost.localdomain
Verify the static resource information: # lshosts
HOST_NAME type model cpuf ncpus maxmem maxswp server RESOURCES
lsfhost.loc SLINUX6 Opteron8 60.0 6 1M - Yes (slurm)
Verify the dynamic resource information: # bhosts
HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
lsfhost.localdomai ok - 6 0 0 0 0 0
See the troubleshooting information in the HP XC System Software Administration Guide if you do not receive a status of ok from the bhosts command.
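The processor count reported by lshosts can also be totaled in a script so that it is easier to compare with the number of processors you expect in the lsf partition. This is a sketch only: `total_ncpus` is a hypothetical name, and the function assumes the column layout shown in the lshosts output above, where ncpus is the fifth field.

```shell
#!/bin/sh
# Hypothetical helper: read lshosts output on stdin and print the sum of
# the ncpus column (fifth field). The header line is skipped, and rows
# whose ncpus value is not a plain number are ignored.
total_ncpus() {
    awk 'NR > 1 && $5 ~ /^[0-9]+$/ { sum += $5 }
         END { print sum + 0 }'
}
```

For example, `lshosts | total_ncpus` prints a single number that you can compare against the expected processor count.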
Standard LSF:
Verify that standard LSF is running: # lsid
Platform LSF 6.1, LSF_build_date
Copyright 1992-2005 Platform Computing Corporation
My cluster name is hptclsf
My master name is n13
Verify the static resource information: # lshosts
HOST_NAME type model cpuf ncpus maxmem maxswp server RESOURCES
n13 LINUX64 Itanium2 16.0 2 4036M 6144M Yes ()
n16 LINUX64 Itanium2 16.0 2 4036M 6143M Yes ()
n1 LINUX64 Itanium2 16.0 2 3012M 6144M Yes ()
n2 LINUX64 Itanium2 16.0 2 3012M 6144M Yes ()
n3 LINUX64 Itanium2 16.0 2 3012M 6144M Yes ()
n4 LINUX64 Itanium2 16.0 2 3012M 6144M Yes ()
n5 LINUX64 Itanium2 16.0 2 3012M 6144M Yes ()
n6 LINUX64 Itanium2 16.0 2 3012M 6144M Yes ()
n7 LINUX64 Itanium2 16.0 2 3012M 6144M Yes ()
n8 LINUX64 Itanium2 16.0 2 3012M 6144M Yes ()
n9 LINUX64 Itanium2 16.0 2 3012M 6144M Yes ()
n10 LINUX64 Itanium2 16.0 2 977M 6144M Yes ()
n11 LINUX64 Itanium2 16.0 2 3012M 6144M Yes ()
n12 LINUX64 Itanium2 16.0 2 3012M 6144M Yes ()
Verify the dynamic resource information: # bhosts
HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
n1 ok - 2 0 0 0 0 0
n10 ok - 2 0 0 0 0 0
n11 ok - 2 0 0 0 0 0
n12 ok - 2 0 0 0 0 0
n13 ok - 2 0 0 0 0 0
n16 ok - 2 0 0 0 0 0
n2 ok - 2 0 0 0 0 0
n3 ok - 2 0 0 0 0 0
n4 ok - 2 0 0 0 0 0
n5 ok - 2 0 0 0 0 0
n6 ok - 2 0 0 0 0 0
n7 ok - 2 0 0 0 0 0
n8 ok - 2 0 0 0 0 0
n9 ok - 2 0 0 0 0 0
See the troubleshooting information in the HP XC System Software Administration Guide if you do not receive a status of ok from the bhosts command.
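On a system with many nodes, scanning the bhosts output by eye for a status other than ok is error-prone. The following sketch lists only the hosts that need attention. The function name `hosts_not_ok` is hypothetical, and the function assumes the bhosts column layout shown above, where STATUS is the second field.

```shell
#!/bin/sh
# Hypothetical helper: read bhosts output on stdin and print the name of
# every host whose STATUS column is not "ok". The exit status is non-zero
# if at least one such host was found.
hosts_not_ok() {
    awk 'NR > 1 && NF >= 2 && $2 != "ok" { print $1; bad = 1 }
         END { exit bad + 0 }'
}
```

For example, `bhosts | hosts_not_ok` prints nothing and returns 0 when every host reports ok.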
For more information about where to find LSF-HPC with SLURM or standard LSF documentation, see the Preface at the beginning of this document.