HP XC System Software: User's Guide > Chapter 2 Using the System

Overview of Launching and Managing Jobs

This section briefly describes some of the ways to launch jobs, manage jobs, and get information about jobs on an HP XC system. It is intended only as a quick overview of the basic ways to run and manage jobs. Full details about the HP XC job launch environment are provided in “Using SLURM” and “Using LSF-HPC” in this document.

Introduction

As described in “Run-Time Environment”, SLURM and LSF-HPC cooperate to run and manage jobs on the HP XC system, combining LSF-HPC's powerful and flexible scheduling functionality with SLURM's scalable parallel job-launching capabilities.

SLURM is the low-level resource manager and job launcher, and performs core allocation for jobs. LSF-HPC gathers information about the cluster from SLURM. When a job is ready to be launched, LSF-HPC creates a SLURM node allocation and dispatches the job to that allocation.

Although you can launch jobs directly using SLURM, HP recommends that you use LSF-HPC to take advantage of its scheduling and job management capabilities. You can add SLURM options to the LSF-HPC job launch command line to further define job launch requirements. Use the HP-MPI mpirun command and its options within LSF-HPC to launch jobs that require MPI's high-performance message-passing capabilities.
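As a minimal sketch of the launch style described above, the following shell function submits an MPI program through LSF-HPC and can optionally pass a SLURM node count. It assumes the LSF-HPC external scheduler syntax -ext "SLURM[nodes=N]" and the HP-MPI -srun option of mpirun as used on HP XC systems; check both against “Using LSF-HPC” for your release. The function name submit_mpi_job, the program ./hello_world, and the DRY_RUN variable are hypothetical and serve only as illustration. Setting DRY_RUN=echo prints the command instead of submitting it.

```shell
#!/bin/sh
# Hypothetical helper: submit an MPI program through LSF-HPC.
# Usage: submit_mpi_job NTASKS NODES PROGRAM [ARGS...]
#   NTASKS  number of cores requested from LSF-HPC (bsub -n)
#   NODES   node count passed to SLURM through -ext; 0 lets SLURM decide
# Set DRY_RUN=echo to print the command line instead of running it.
submit_mpi_job() {
    ntasks=$1
    nodes=$2
    shift 2
    if [ "$nodes" -gt 0 ]; then
        # SLURM options are handed to LSF-HPC as -ext "SLURM[...]"
        ${DRY_RUN:-} bsub -n "$ntasks" -ext "SLURM[nodes=$nodes]" -I mpirun -srun "$@"
    else
        ${DRY_RUN:-} bsub -n "$ntasks" -I mpirun -srun "$@"
    fi
}
```

For example, submit_mpi_job 8 2 ./hello_world requests 8 cores on 2 nodes and runs the program as an interactive batch job (-I), so its output appears at the terminal.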

When the HP XC system is installed, a SLURM partition of nodes is created to contain LSF-HPC jobs. This partition is called the lsf partition.

When a job is submitted to LSF-HPC, the LSF-HPC scheduler prioritizes the job and waits until the required resources (compute nodes from the lsf partition) are available.

When the requested resources are available for the job, LSF-HPC creates a SLURM allocation of nodes on behalf of the user, sets the SLURM JobID for the allocation, and dispatches the job with the LSF-HPC JOB_STARTER script to the first allocated node.
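Because the job is dispatched into a SLURM allocation that LSF-HPC has already created, SLURM commands run from inside the job operate on the allocated nodes. The sketch below assumes that the allocation's ID is visible to the job in the standard SLURM environment variable SLURM_JOBID. The function show_allocation is hypothetical, and setting DRY_RUN=echo prints the srun command instead of running it.

```shell
#!/bin/sh
# Hypothetical job script body: report the SLURM allocation that LSF-HPC
# created for this job, then run a command inside that allocation.
# Assumes the job environment exports SLURM_JOBID (standard SLURM variable).
# Set DRY_RUN=echo to print the srun command instead of running it.
show_allocation() {
    if [ -z "${SLURM_JOBID:-}" ]; then
        echo "not running inside a SLURM allocation" >&2
        return 1
    fi
    echo "SLURM allocation: $SLURM_JOBID"
    # srun uses SLURM_JOBID to attach to the existing allocation
    ${DRY_RUN:-} srun hostname
}
```

A script that calls show_allocation could be submitted with, for example, bsub -n 4 -I ./myjob.sh, where myjob.sh is a hypothetical script name.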

A detailed explanation of how SLURM and LSF-HPC interact to launch and manage jobs is provided in “How LSF-HPC and SLURM Launch and Manage a Job”.

Getting Information About Queues

The LSF bqueues command lists the configured job queues in LSF-HPC. By default, bqueues returns the following information about all queues:

  • Queue name

  • Queue priority

  • Queue status

  • Job slot statistics

  • Job state statistics

To get information about queues, enter the bqueues command as follows:

$ bqueues

For more information about using this command and a sample of its output, see “Examining System Queues”.
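The default bqueues output can be post-processed with standard tools. This sketch assumes the default layout, one header line followed by one line per queue with the queue name in the first column, and then uses bqueues -l to print the long description of each queue. The function names queue_names and show_all_queues are hypothetical.

```shell
#!/bin/sh
# queue_names: read default bqueues output on stdin and print the queue
# names, assuming one header line and the queue name in the first column.
queue_names() {
    awk 'NR > 1 { print $1 }'
}

# show_all_queues: print the long (-l) description of every configured queue.
show_all_queues() {
    for q in $(bqueues | queue_names); do
        bqueues -l "$q"
    done
}
```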

Getting Information About Resources

The LSF bhosts, lshosts, and lsload commands are quick ways to get information about system resources. LSF-HPC daemons run on only one node in the HP XC system, so the bhosts and lshosts commands list a single host, which represents all the resources of the HP XC system. The total number of cores for that host should equal the total number of cores assigned to the SLURM lsf partition.

  • The LSF bhosts command provides a summary of the jobs on the system and information about the current state of LSF-HPC.

    $ bhosts

    For more information about using this command and a sample of its output, see “Examining System Core Status”.

  • The LSF lshosts command displays machine-specific information for the LSF execution host node.

    $ lshosts

    For more information about using this command and a sample of its output, see “Getting Information About the LSF Execution Host Node”.

  • The LSF lsload command displays load information for the LSF execution host node.

    $ lsload

    For more information about using this command and a sample of its output, see “Getting Host Load Information”.
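The expectation that the single LSF host reports as many cores as the lsf partition contains can be checked roughly with the following sketch. It assumes that sinfo -p lsf -N -h -o "%c" prints one line per node containing that node's core count, that the fourth column (MAX) of the bhosts output gives the job slots of the LSF host, and that one job slot is configured per core. All function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers comparing the LSF-HPC view of the system with SLURM's.

# Sum the numbers in column $1 of stdin (prints 0 for empty input).
sum_column() {
    awk -v col="$1" '{ total += $col } END { print total + 0 }'
}

# Total job slots from bhosts output on stdin: skip the header line and
# sum the MAX column (assumed to be column 4).
bhosts_max() {
    awk 'NR > 1 { print $4 }' | sum_column 1
}

# Total cores in the lsf partition: one line per node (-N), no header (-h),
# printing only each node's core count (%c).
lsf_partition_cores() {
    sinfo -p lsf -N -h -o "%c" | sum_column 1
}

# Report both totals; return success only if they match.
check_core_counts() {
    slurm_cores=$(lsf_partition_cores)
    lsf_slots=$(bhosts | bhosts_max)
    echo "lsf partition cores: $slurm_cores, LSF host MAX slots: $lsf_slots"
    [ "$slurm_cores" -eq "$lsf_slots" ]
}
```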

Getting Information About System Partitions

You can view information about system partitions with the SLURM sinfo command. The sinfo command reports the state of all partitions and nodes managed by SLURM and provides a wide variety of filtering, sorting, and formatting options. It displays a summary of available partition and node (not job) information, such as partition names, nodes per partition, and cores per node.

$ sinfo

For more information about using the sinfo command and a sample of its output, see “Getting Information About the lsf Partition”.
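The filtering and formatting options of sinfo can narrow the report to the lsf partition. The sketch below uses the -p (partition), -N (node-oriented), -t (node state), and -o (output format) options; the format fields %P, %a, %D, %c, and %t print the partition name, availability, node count, cores per node, and compact node state. The wrapper names are hypothetical, and setting DRY_RUN=echo prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical wrappers around sinfo; the partition defaults to lsf.
# Set DRY_RUN=echo to print the command line instead of running it.

# Summary lines for a partition (one line per node state): name,
# availability, node count, cores per node, state.
partition_summary() {
    ${DRY_RUN:-} sinfo -p "${1:-lsf}" -o "%P %a %D %c %t"
}

# Node-oriented listing of the idle nodes in a partition.
idle_nodes() {
    ${DRY_RUN:-} sinfo -p "${1:-lsf}" -N -t idle
}
```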

Launching Jobs

To launch a job on an HP XC system, use the LSF bsub command. The bsub command submits batch jobs or interactive batch jobs to an LSF-HPC queue for execution.

See “Submitting Jobs” for full information about launching jobs.
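The two submission modes differ mainly in how the job's input and output are handled. In this sketch, which uses hypothetical wrapper names, a batch job writes its standard output to a file whose name contains the %J placeholder (replaced by the job ID), while an interactive batch job (-I) keeps the terminal attached to the job. Setting DRY_RUN=echo prints the command instead of submitting it.

```shell
#!/bin/sh
# Hypothetical wrappers contrasting the two bsub submission modes.
# Set DRY_RUN=echo to print the command line instead of submitting.

# Batch job: runs unattended; standard output goes to job.<jobid>.out
# (LSF-HPC replaces %J with the job ID).
submit_batch() {
    ncores=$1; shift
    ${DRY_RUN:-} bsub -n "$ncores" -o "job.%J.out" "$@"
}

# Interactive batch job: -I connects the job's I/O to the terminal.
submit_interactive() {
    ncores=$1; shift
    ${DRY_RUN:-} bsub -n "$ncores" -I "$@"
}
```

For example, submit_interactive 4 srun hostname runs srun hostname inside a 4-core allocation and prints its output at the terminal.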

Getting Information About Your Jobs

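Use the LSF bjobs and bhist commands to obtain information about your running or completed jobs. The bjobs command reports the current status of your jobs, and the bhist command displays their history.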

You can view the components of the actual SLURM allocation command with the LSF bjobs -l and bhist -l commands.
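To inspect a job you first need its job ID, which bsub reports when the job is submitted. The sketch assumes the usual LSF confirmation message of the form Job <1234> is submitted to default queue <normal>. (the job ID and queue name here are examples) and then runs bjobs -l and bhist -l, whose long output includes the SLURM allocation details. The function names are hypothetical, and setting DRY_RUN=echo prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a job.

# Extract the job ID from the confirmation line printed by bsub, e.g.
#   Job <1234> is submitted to default queue <normal>.
job_id_from_bsub() {
    sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p'
}

# Show detailed status (bjobs -l) and history (bhist -l) of one job.
# Set DRY_RUN=echo to print the commands instead of running them.
job_details() {
    ${DRY_RUN:-} bjobs -l "$1"
    ${DRY_RUN:-} bhist -l "$1"
}
```

For example, id=$(bsub -n 4 -o job.%J.out ./myscript.sh | job_id_from_bsub) captures the ID of a newly submitted batch job, and job_details "$id" then displays its status and history.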

Stopping and Suspending Jobs

You can suspend or terminate your jobs with the bstop and bkill commands:

  • Use the bstop command to stop or suspend an LSF-HPC job.

  • Use the bkill command to kill an LSF-HPC job.
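Both commands accept one or more job IDs. The hypothetical helper below maps a descriptive action name to the corresponding LSF command, so a script can suspend or kill a list of jobs in one call. Setting DRY_RUN=echo prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: apply an LSF control action to one or more jobs.
#   control_jobs suspend 1234 1235   runs  bstop 1234 1235
#   control_jobs kill 1234           runs  bkill 1234
# Set DRY_RUN=echo to print the command line instead of running it.
control_jobs() {
    action=$1; shift
    case $action in
        suspend) cmd=bstop ;;
        kill)    cmd=bkill ;;
        *) echo "unknown action: $action" >&2; return 2 ;;
    esac
    ${DRY_RUN:-} "$cmd" "$@"
}
```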

Resuming Suspended Jobs

Use the LSF bresume command to resume a stopped or suspended job.
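A common use of bstop and bresume together is to pause a job temporarily. The hypothetical function below suspends a job, waits for a given number of seconds, and then resumes it; setting DRY_RUN=echo prints the LSF commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: suspend a job, wait, then resume it.
# Usage: pause_job_for JOBID SECONDS
# Set DRY_RUN=echo to print the LSF commands instead of running them.
pause_job_for() {
    jobid=$1
    seconds=$2
    # Stop here if the job could not be suspended.
    ${DRY_RUN:-} bstop "$jobid" || return
    sleep "$seconds"
    ${DRY_RUN:-} bresume "$jobid"
}
```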

© 2003 Hewlett-Packard Development Company, L.P.