HP XC System Software : Administration Guide > Chapter 15 Managing LSF

Monitoring and Controlling LSF-HPC with SLURM Jobs

All the standard LSF commands for monitoring a job are supported. The bjobs command reports the status of a job. The following is an example of the bjobs command:

$ bjobs
  JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME SUBMIT_TIME
  116     lsfadmi RUN   normal     lsfhost.loc 8*lsfhost.l * sleep 50 date time

You can use the -l (long) option to obtain detailed information about a job, as shown in this example:

$ bjobs -l 116
  Job <116>, User <lsfadmin>, Project default, Status <RUN>, Queue <normal>, Co
             mmand <srun sleep 50>
  date time: Submitted from host <lsfhost.localdomain>, CWD <$HOME>, Ou
             tput File <./>, 8 Processors Requested;
  date time: Started on 8 Hosts/Processors <8*lsfhost.localdomain>, Exe
             cution Home <hptc_cluster/hptc_cluster>, Execution CWD <hptc/hptc;
             _cluster/lsf/home>;
  date time: slurm_id=7;ncpus=8;slurm_alloc=n[1-4];

   SCHEDULING PARAMETERS:
             r15s   r1m  r15m   ut      pg    io   ls    it    tmp   swp   mem
   loadSched   -     -     -     -       -     -    -     -     -     -     -  
   loadStop    -     -     -     -       -     -    -     -     -     -     -  

Note the line that reports the SLURM job ID (slurm_id) and the SLURM allocation (slurm_alloc):

date time: slurm_id=7;ncpus=8;slurm_alloc=n[1-4];
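To use the SLURM job ID in a script, it can be pulled out of the bjobs -l output. The following is a minimal sketch, not part of the product: extract_slurm_id is a hypothetical helper name, and the pattern assumes that slurm_id=<digits>; appears unbroken on a single line, which may not hold if bjobs wraps the line.

```shell
# Print the SLURM job ID found in `bjobs -l` output supplied on stdin.
# Assumption: the text "slurm_id=<digits>;" is not split across lines.
extract_slurm_id() {
    sed -n 's/.*slurm_id=\([0-9][0-9]*\);.*/\1/p' | head -n 1
}

# Sample line copied from the output above. On a live system, use:
#   bjobs -l 116 | extract_slurm_id
sample='date time: slurm_id=7;ncpus=8;slurm_alloc=n[1-4];'
slurm_id=$(printf '%s\n' "$sample" | extract_slurm_id)
echo "$slurm_id"
```

With the sample line, the function prints 7. If no slurm_id field is present, it prints nothing.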

You can use the SLURM job ID (the slurm_id value) with various SLURM commands. For example, use the squeue command to view information about jobs in the SLURM scheduling queue, and use the scontrol show job command to display the state of the job:

$ squeue -j 7
    JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST
        7       lsf hptclsf@ lsfadmin   R       0:14      4 n[1-4]
$ scontrol show job 7
  JobId=7 UserId=lsfadmin(502) GroupId=lsfadmin(503)
     Name=LSFclustername@LSF_JOBID JobState=RUNNING
     Priority=4294901755 Partition=lsf BatchFlag=0
     AllocNode:Sid=n16:27450 TimeLimit=UNLIMITED
     StartTime=10/11-17:54:05 EndTime=NONE
     NodeList=n[1-4] NodeListIndecies=0,3,-1
     ReqProcs=0 MinNodes=0 Shared=0 Contiguous=0
     MinProcs=0 MinMemory=0 Features=(null) MinTmpDisk=0
     ReqNodeList=(null) ReqNodeListIndecies=-1
     ExcNodeList=(null) ExcNodeListIndecies=-1

The Name= field in the scontrol show job output contains the name of the LSF cluster (the installation default is hptclsf) and the LSF-HPC with SLURM job ID, separated by the at sign (@). The NAME column of the squeue output shows the same value, truncated to fit the column width.
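Going the other way, from a SLURM job back to its LSF job, means splitting that name at the at sign. The following sketch assumes the Name= field starts its line, as in the sample output above. The helper name split_slurm_job_name and the sample value hptclsf@116 (default cluster name, LSF job 116) are illustrative only.

```shell
# Print "<cluster> <lsf_jobid>" from `scontrol show job` output on stdin.
# Returns non-zero if no Name=<cluster>@<jobid> field is found.
split_slurm_job_name() {
    name=$(sed -n 's/^[[:space:]]*Name=\([^[:space:]]*\).*/\1/p' | head -n 1)
    case $name in
        *@*) printf '%s %s\n' "${name%%@*}" "${name#*@}" ;;
        *)   return 1 ;;
    esac
}

# Illustrative input line. On a live system, use:
#   scontrol show job 7 | split_slurm_job_name
sample='     Name=hptclsf@116 JobState=RUNNING'
result=$(printf '%s\n' "$sample" | split_slurm_job_name)
echo "$result"
```

The second value printed can then be passed to bjobs or bhist to look up the LSF job.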

The bhist command reports the history of a job.

After you have gathered information about a job, you can control LSF-HPC with SLURM jobs with the following LSF commands: bkill, bstop, and bresume.

The bkill command kills a running job. This command uses the SLURM scancel command.

The bstop command suspends the execution of a running job.

The bresume command resumes the execution of a suspended job.
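As a sketch of how these three commands might be wrapped in a script, the hypothetical helper below (lsf_control_cmd is not an LSF command) maps an action name to the corresponding command line and prints it rather than running it. It assumes plain numeric job IDs such as 116.

```shell
# Print the LSF command line for an action on a job; nothing is executed.
# Assumption: job IDs are plain decimal numbers.
lsf_control_cmd() {
    action=$1
    jobid=$2
    case $jobid in
        ''|*[!0-9]*) echo "invalid job ID: $jobid" >&2; return 1 ;;
    esac
    case $action in
        suspend) echo "bstop $jobid" ;;
        resume)  echo "bresume $jobid" ;;
        kill)    echo "bkill $jobid" ;;
        *)       echo "unknown action: $action" >&2; return 1 ;;
    esac
}

lsf_control_cmd suspend 116
lsf_control_cmd resume 116
lsf_control_cmd kill 116
```

Printing the command first makes it easy to review what would run. On a cluster, the equivalent direct invocations are bstop 116, bresume 116, and bkill 116.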

For more information, see bkill(1), bstop(1), and bresume(1).

© 2003 Hewlett-Packard Development Company, L.P.