HP XC System Software: User's Guide > Chapter 10 Using LSF-HPC

Getting Information About Jobs


There are several ways to get information about a specific job after it has been submitted to LSF-HPC integrated with SLURM. This section briefly describes some of the commands available for that purpose. It is intended to introduce the commonly used commands and to describe any differences in their operation in the HP XC environment; it is not a complete reference. See the LSF manpages for full information about the commands described in this section.

This section describes the following LSF commands: bjobs and bhist.

Getting Job Allocation Information

Before a job runs, LSF-HPC integrated with SLURM allocates SLURM compute nodes based on job resource requirements.

After LSF-HPC integrated with SLURM allocates nodes for a job, it attaches allocation information to the job.

The bjobs -l command provides job allocation information for a running job. The bhist -l command provides job allocation information for a finished job. For details about using these commands, see the LSF manpages.

A job allocation information string resembles the following:

slurm_id=slurm_jobid;ncpus=slurm_nprocs;slurm_alloc=node_list

This allocation string has the following values:

slurm_id

The value of the SLURM_JOBID environment variable. This is the SLURM allocation ID, which associates the LSF-HPC job with its SLURM-allocated resources.

ncpus

The value of the SLURM_NPROCS environment variable. This is the actual number of allocated cores. Under node-level allocation scheduling, this number may be larger than the number the job requested.

slurm_alloc

A comma-separated list of allocated nodes. Consecutive nodes can appear as a range, as in n[5-8] in the examples that follow.

LSF-HPC integrated with SLURM sets the SLURM_JOBID and SLURM_NPROCS environment variables when it starts a job.
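If you post-process allocation information in a script, the allocation string can be split into its three values. The following Python sketch is illustrative only: the function name parse_allocation is hypothetical and not part of LSF-HPC or SLURM, and the sketch assumes the string has exactly the key=value;key=value form shown above.

```python
def parse_allocation(alloc: str) -> dict:
    """Split 'slurm_id=...;ncpus=...;slurm_alloc=...' into a dictionary."""
    fields = {}
    # Drop surrounding whitespace and the trailing semicolon, then split.
    for part in alloc.strip().strip(";").split(";"):
        if not part:
            continue
        key, _, value = part.partition("=")
        fields[key.strip()] = value.strip()
    # slurm_id and ncpus are numeric; slurm_alloc stays a string.
    for key in ("slurm_id", "ncpus"):
        if key in fields:
            fields[key] = int(fields[key])
    return fields


info = parse_allocation("slurm_id=22;ncpus=8;slurm_alloc=n[5-8];")
print(info)  # {'slurm_id': 22, 'ncpus': 8, 'slurm_alloc': 'n[5-8]'}
```

The node list is kept as a single string because it can itself contain commas.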

Example 10-3 illustrates how to use the bjobs -l command to obtain job allocation information about a running job:

Example 10-3 Job Allocation Information for a Running Job

$ bjobs -l 24
Job <24>, User <lsfadmin>, Project <default>,
                     Status <RUN>, Queue <normal>,
                     Interactive pseudo-terminal shell mode,
                     Extsched <SLURM[nodes=4]>, Command </bin/bash>

date and time stamp: Submitted from host <n2>, CWD <$HOME>,
                     4 Processors Requested, Requested Resources <type=any>;
date and time stamp: Started on 4 Hosts/Processors <4*lsfhost.localdomain>;
date and time stamp: slurm_id=22;ncpus=8;slurm_alloc=n[5-8];

SCHEDULING PARAMETERS:
           r15s  r1m  r15m  ut    pg  io  ls  it  tmp  swp  mem
loadSched   -    -     -    -      -   -   -   -    -   -    - 
loadStop    -    -     -    -      -   -   -   -    -   -    -
  
EXTERNAL MESSAGES:
     MSG_ID  FROM       POST_TIME     MESSAGE          ATTACHMENT
     0        -             -             -             -
     1       lsfadmin   date and time stamp  SLURM[nodes=4]    N

In particular, note the node and job allocation information provided in the above output:

date and time stamp: Started on 4 Hosts/Processors <4*lsfhost.localdomain>;
date and time stamp: slurm_id=22;ncpus=8;slurm_alloc=n[5-8];
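The slurm_alloc value in this output, n[5-8], is a compact range rather than a list of individual names. The following Python sketch expands such a value into node names. It is a simplified illustration: expand_nodes is a hypothetical helper, it handles only a single prefix followed by one bracketed group, and it does not preserve zero padding. On a system with SLURM installed, the scontrol show hostnames command is the more complete way to expand a node list.

```python
import re


def expand_nodes(node_list: str) -> list:
    """Expand a compact node list such as 'n[5-8]' or 'n[1-2,7]' into names.

    Anything that is not 'prefix[ranges]' is treated as a plain
    comma-separated list and returned unchanged.
    """
    match = re.fullmatch(r"([^\[\],]+)\[([\d,\-]+)\]", node_list.strip())
    if not match:
        return [name for name in node_list.split(",") if name]
    prefix, body = match.groups()
    nodes = []
    for item in body.split(","):
        if not item:
            continue
        if "-" in item:
            # A range such as 5-8 includes both end points.
            start, end = item.split("-")
            nodes.extend(f"{prefix}{i}" for i in range(int(start), int(end) + 1))
        else:
            nodes.append(f"{prefix}{item}")
    return nodes


print(expand_nodes("n[5-8]"))  # ['n5', 'n6', 'n7', 'n8']
```

The four names returned match the nodes=4 request shown in the Extsched field of the example.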

Example 10-4 illustrates how to use the bhist -l command to obtain job allocation information about a job that has run:

Example 10-4 Job Allocation Information for a Finished Job

$ bhist -l 24
Job <24>, User <lsfadmin>, Project <default>,
                     Interactive pseudo-terminal shell mode,
                     Extsched <SLURM[nodes=4]>, Command </bin/bash>

date and time stamp: Submitted from host <n2>, to Queue <normal>, CWD <$HOME>,
                     4 Processors Requested, Requested Resources <type=any>;
  
date and time stamp: Dispatched to 4 Hosts/Processors
                     <4*lsfhost.localdomain>;
date and time stamp: slurm_id=22;ncpus=8;slurm_alloc=n[5-8];
date and time stamp: Starting (Pid 4785);
date and time stamp: Done successfully. The CPU time used is 0.1 seconds;
date and time stamp: Post job process done successfully;
 
Summary of time in seconds spent in various states by date and time stamp
  PEND     PSUSP    RUN      USUSP    SSUSP    UNKWN   TOTAL
  11       0        220      0        0        0       231

In particular, note the node and job allocation information provided in the above output:

date and time stamp: Dispatched to 4 Hosts/Processors 
                     <4*lsfhost.localdomain>;
date and time stamp: slurm_id=22;ncpus=8;slurm_alloc=n[5-8];
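When the long output of bjobs -l or bhist -l has been captured as text, the allocation record can be located with a regular expression. The following Python sketch is an assumption-laden illustration: find_allocation and ALLOC_PATTERN are hypothetical names, and the pattern assumes that the record appears on one line in the form shown in these examples and is not wrapped.

```python
import re

# Matches a record such as: slurm_id=22;ncpus=8;slurm_alloc=n[5-8];
ALLOC_PATTERN = re.compile(r"slurm_id=(\d+);ncpus=(\d+);slurm_alloc=([^;\s]+);")


def find_allocation(output: str):
    """Return (slurm_id, ncpus, node_list) from long output, or None."""
    match = ALLOC_PATTERN.search(output)
    if match is None:
        return None
    slurm_id, ncpus, nodes = match.groups()
    return int(slurm_id), int(ncpus), nodes


sample = (
    "date and time stamp: Dispatched to 4 Hosts/Processors\n"
    "                     <4*lsfhost.localdomain>;\n"
    "date and time stamp: slurm_id=22;ncpus=8;slurm_alloc=n[5-8];\n"
)
print(find_allocation(sample))  # (22, 8, 'n[5-8]')
```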

Examining the Status of a Job

Once a job is submitted, you can use the bjobs command to track the job's progress. The bjobs command reports the status of a job submitted to LSF-HPC. By default, bjobs lists only the user's jobs that have not finished or exited.

The following examples show bjobs command usage and the output the command produces on an HP XC system. For more information about the bjobs command and its output, see the LSF-HPC manpages.

Example 10-5 provides abbreviated output of the bjobs command.

Example 10-5 Using the bjobs Command (Short Output)

$ bjobs 24
JOBID USER   STAT QUEUE  FROM_HOST EXEC_HOST           JOB_NAME  SUBMIT_TIME
24    msmith RUN  normal n16       lsfhost.localdomain /bin/bash date and time

As shown in the previous output, the bjobs command returns information that includes the job ID, user name, job status, queue name, submitting host, executing host, job name, and submit time. In this example, the output shows that the job /bin/bash was submitted from node n16 and launched on the execution host (lsfhost.localdomain).
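The short output can be turned into one record per job by pairing each value with its column heading. The following Python sketch illustrates the idea using the output from Example 10-5. The name parse_bjobs is hypothetical, and the sketch assumes that no field before SUBMIT_TIME contains spaces, which may not hold for every job name.

```python
BJOBS_OUTPUT = """\
JOBID USER   STAT QUEUE  FROM_HOST EXEC_HOST           JOB_NAME  SUBMIT_TIME
24    msmith RUN  normal n16       lsfhost.localdomain /bin/bash date and time
"""


def parse_bjobs(text: str) -> list:
    """Return one dictionary per job, keyed by the column headings."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    jobs = []
    for line in lines[1:]:
        # Limit the split so the submit time, which contains spaces,
        # stays together in the last field.
        values = line.split(None, len(header) - 1)
        jobs.append(dict(zip(header, values)))
    return jobs


jobs = parse_bjobs(BJOBS_OUTPUT)
print(jobs[0]["JOBID"], jobs[0]["STAT"], jobs[0]["EXEC_HOST"])
# 24 RUN lsfhost.localdomain
```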

Example 10-6 provides extended output of the bjobs command.

Example 10-6 Using the bjobs Command (Long Output)

$ bjobs -l 24
Job <24>, User <msmith>, Project <default>, Status <RUN>,
                     Queue <normal>, Interactive pseudo-terminal shell
                     mode, Extsched <SLURM[nodes=4]>, Command </bin/bash>
date and time stamp: Submitted from host <n16>, CWD <$HOME>,
                     4 Processors Requested, Requested Resources <type=any>;

date and time stamp: Started on 4 Hosts/Processors
                     <4*lsfhost.localdomain>;
date and time stamp: slurm_id=22;ncpus=8;slurm_alloc=n[5-8];
 
SCHEDULING PARAMETERS:
           r15s  r1m  r15m   ut  pg  io  ls  it  tmp  swp  mem 
loadSched   -     -    -     -    -   -   -   -   -    -    - 
loadStop    -     -    -     -    -   -   -   -   -    -    - 

EXTERNAL MESSAGES:
      MSG_ID  FROM       POST_TIME     MESSAGE          ATTACHMENT
      0        -             -             -             -
      1       lsfadmin   date and time stamp  SLURM[nodes=4]    N

Viewing the Historical Information for a Job

The LSF bhist command is a good tool for tracking the lifetime of a job within LSF-HPC. It provides detailed information about running, pending, and suspended jobs, such as the amount of time a job spent in each state, along with in-depth information about a job's progress while it was under LSF-HPC control.

See the LSF bhist manpage for more information about this command, its options, and its output.

The bhist command provides a brief summary of a finished job, as shown in Example 10-7. The summary reports the amount of time that the job spent in each state.

Example 10-7 Using the bhist Command (Short Output)

$ bhist 24
Summary of time in seconds spent in various states: 
JOBID  USER   JOB_NAME  PEND  PSUSP RUN  USUSP SSUSP UNKWN TOTAL
24     smith  /bin/bash 11     0    220   0    0     0     231

The information in the output provided by this example is explained in Table 10-2.

Table 10-2 Output Provided by the bhist Command

Field      Description
JOBID      The job ID that LSF-HPC assigned to the job.
USER       The user who submitted the job.
JOB_NAME   The job name assigned by the user.
PEND       The total waiting time, excluding user suspended time, before the job is dispatched.
PSUSP      The total user suspended time of a pending job.
RUN        The total run time of the job.
USUSP      The total user suspended time after the job is dispatched.
SSUSP      The total system suspended time after the job is dispatched.
UNKWN      The total unknown time of the job.
TOTAL      The total time that the job has spent in all states.
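Because TOTAL is the time spent in all states, it should equal the sum of the other time columns. The following Python sketch checks this against the values from Example 10-7 (11 + 0 + 220 + 0 + 0 + 0 = 231). The names parse_bhist_summary and STATE_COLUMNS are hypothetical, and the sketch assumes that the job name contains no spaces.

```python
STATE_COLUMNS = ("PEND", "PSUSP", "RUN", "USUSP", "SSUSP", "UNKWN")


def parse_bhist_summary(header: str, row: str) -> dict:
    """Pair the summary headings with their values and convert times to int."""
    record = dict(zip(header.split(), row.split()))
    for column in STATE_COLUMNS + ("TOTAL",):
        record[column] = int(record[column])
    return record


header = "JOBID  USER   JOB_NAME  PEND  PSUSP RUN  USUSP SSUSP UNKWN TOTAL"
row = "24     smith  /bin/bash 11     0    220   0    0     0     231"

record = parse_bhist_summary(header, row)
time_in_states = sum(record[column] for column in STATE_COLUMNS)
print(time_in_states, record["TOTAL"])  # 231 231
```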

 

For detailed information about a finished job, add the -l option to the bhist command, as shown in Example 10-8. The -l option requests the long format.

Example 10-8 Using the bhist Command (Long Output)

$ bhist -l 24
Job <24>, User <lsfadmin>, Project <default>,
Interactive pseudo-terminal shell mode, 
Extsched <SLURM[nodes=4]>, Command </bin/bash>
date and time stamp: Submitted from host <n2>,
to Queue <normal>, CWD <$HOME>, 
4 Processors Requested, Requested Resources <type=any>;

date and time stamp: Dispatched to 4 Hosts/Processors
                     <4*lsfhost.localdomain>;
date and time stamp: slurm_id=22;ncpus=8;slurm_alloc=n[5-8];
date and time stamp: Starting (Pid 4785); 

Summary of time in seconds spent in various states by
date and time stamp
  PEND     PSUSP    RUN     USUSP   SSUSP   UNKWN   TOTAL
  11       0        124     0       0       0       135
© 2003 Hewlett-Packard Development Company, L.P.