HP XC System Software: User's Guide, Chapter 10: Using LSF-HPC

How LSF-HPC and SLURM Launch and Manage a Job


This section describes what happens in the HP XC system when a job is submitted to LSF-HPC. Figure 10-1 illustrates this process. The numbered steps in the text correspond to the numbered steps in the illustration.

Consider the HP XC system configuration shown in Figure 10-1, in which lsfhost.localdomain is the virtual IP name assigned to the LSF execution host, node n16 is the login node, and nodes n[1-10] are compute nodes in the lsf partition. Each compute node contains two cores, so the lsf partition provides 20 cores for use by LSF-HPC jobs.
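The node range n[1-10] is written in SLURM's compact host range notation, and the 20-core total is the number of compute nodes multiplied by the cores per node (10 x 2). The following minimal sketch makes that arithmetic concrete. The function names expand_hostrange and count_cores are hypothetical, the sketch handles only the simple prefix[start-end] form, and it assumes every node has the same number of cores.

```shell
#!/bin/sh
# Hypothetical helper: expand a simple host range such as "n[1-10]"
# into one host name per line. Only the prefix[start-end] form is
# handled; comma-separated lists and multiple ranges are not.
expand_hostrange() {
    prefix=${1%%\[*}        # text before "[", e.g. "n"
    range=${1#*\[}          # text after "[", e.g. "1-10]"
    range=${range%\]}       # drop the trailing "]", e.g. "1-10"
    start=${range%-*}       # e.g. "1"
    end=${range#*-}         # e.g. "10"
    i=$start
    while [ "$i" -le "$end" ]; do
        echo "${prefix}${i}"
        i=$((i + 1))
    done
}

# Hypothetical helper: total cores = number of nodes x cores per node.
# Assumes a homogeneous partition ($2 cores on every node).
count_cores() {
    cores_per_node=$2
    nodes=$(expand_hostrange "$1" | wc -l)
    echo $((nodes * cores_per_node))
}
```

For the configuration above, count_cores 'n[1-10]' 2 prints 20.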

Figure 10-1 How LSF-HPC and SLURM Launch and Manage a Job

  1. A user logs in to login node n16.

  2. The user executes the following LSF bsub command on login node n16:

    $ bsub -n4 -ext "SLURM[nodes=4]" -o output.out ./myscript

    This bsub command requests four cores (the -n4 option) spread across four nodes (the -ext "SLURM[nodes=4]" option) and writes the job's standard output to the file output.out (the -o option). The job runs on the allocated cores. The job script, myscript, is shown here:

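    #!/bin/sh
    # Runs locally on the first node of the allocation
    hostname
    # Runs hostname on each compute node in the SLURM allocation
    srun hostname
    # Launches the MPI program hellompi on the allocation through srun
    mpirun -srun ./hellompi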

  3. LSF-HPC schedules the job and monitors the state of the resources (compute nodes) in the SLURM lsf partition. When the LSF-HPC scheduler determines that the required resources are available, LSF-HPC allocates those resources in SLURM and obtains a SLURM job identifier (jobID) that corresponds to the allocation.

    In this example, four cores spread over four nodes (n1, n2, n3, n4) are allocated for myscript, and SLURM job ID 53 is assigned to the allocation.

  4. LSF-HPC prepares the user environment for the job on the LSF execution host node and dispatches the job with the job_starter.sh script. This user environment includes standard LSF environment variables and two SLURM-specific environment variables: SLURM_JOBID and SLURM_NPROCS.

    SLURM_JOBID is the SLURM job ID of the job; it is not the same as the LSF-HPC job ID. “Translating SLURM and LSF-HPC JOBIDs” describes the relationship between the SLURM_JOBID and the LSF-HPC JOBID.

    SLURM_NPROCS is the number of processes allocated.

    These environment variables are intended for use by the user's job, either explicitly (user scripts can read these variables as necessary) or implicitly (the srun commands in the user's job use these variables to determine their allocation of resources).

    In this example, SLURM_NPROCS is 4 and SLURM_JOBID is 53.

  5. The user job myscript begins execution on compute node n1.

    The first line in myscript is the hostname command. It executes locally and returns the name of the node, n1.

  6. The second line in the myscript script is the srun hostname command. The srun command in myscript inherits SLURM_JOBID and SLURM_NPROCS from the environment and executes the hostname command on each compute node in the allocation.

  7. The output of the hostname tasks (n1, n2, n3, and n4) is aggregated back to the srun launch command (shown as dashed lines in Figure 10-1) and is ultimately returned to the srun command in the job starter script, where it is collected by LSF-HPC.

  8. The last line in myscript is the mpirun -srun ./hellompi command. The srun command inside the mpirun command inherits the SLURM_JOBID and SLURM_NPROCS environment variables from the environment and executes hellompi on each compute node in the allocation: n1, n2, n3, and n4.

  9. The output of the hellompi tasks is aggregated back to the srun launch command, where it is collected by LSF-HPC.

  10. When the job finishes, LSF-HPC cancels the SLURM allocation, which frees the compute nodes for use by another job.
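Step 4 notes that a job script can read SLURM_JOBID and SLURM_NPROCS explicitly. The following minimal sketch shows one way a script such as myscript might do so; the function name report_allocation is hypothetical and is not part of LSF-HPC or SLURM. It assumes that the variables are unset when the script is run outside an allocation (for example, directly on the login node instead of through bsub) and reports an error in that case.

```shell
#!/bin/sh
# Hypothetical helper: report the SLURM allocation that LSF-HPC placed
# in the job's environment (SLURM_JOBID and SLURM_NPROCS, see step 4).
report_allocation() {
    # Assumption: outside an LSF-HPC job these variables are unset.
    if [ -z "${SLURM_JOBID:-}" ] || [ -z "${SLURM_NPROCS:-}" ]; then
        echo "SLURM_JOBID/SLURM_NPROCS not set: not inside an allocation" >&2
        return 1
    fi
    echo "SLURM job ${SLURM_JOBID}: ${SLURM_NPROCS} processes allocated"
}
```

With the values from this example (SLURM_JOBID=53, SLURM_NPROCS=4), report_allocation prints SLURM job 53: 4 processes allocated.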
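In steps 6 and 7, srun hostname runs one hostname task per allocated process and aggregates the output, one line per task. The following sketch summarizes that aggregated output as a task count per host, which makes it easy to check how the allocation was spread across nodes. The function name tasks_per_host is hypothetical, and because task output is not assumed to arrive in any particular order, the sketch sorts host names lexically before counting.

```shell
#!/bin/sh
# Hypothetical helper: read "srun hostname" output (one host name per
# task) on stdin and print "host task-count" pairs, sorted by host name.
tasks_per_host() {
    sort | uniq -c | awk '{ print $2, $1 }'
}
```

Inside myscript, a line such as srun hostname | tasks_per_host would be expected to report one task on each of n1, n2, n3, and n4 for the allocation in this example.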

© 2003 Hewlett-Packard Development Company, L.P.