Submitting a Parallel Job
When submitting a parallel job, you can specify the use of HP-MPI. You can also opt to schedule the job by using SLURM. Read the section that corresponds to the submission method you choose.

To submit a parallel job that does not make use of HP-MPI to an LSF-HPC node allocation (compute nodes), use the LSF bsub command with the parameters described below. An LSF-HPC node allocation is created by the -n num-procs parameter, which specifies the minimum number of cores the job requests.
The bsub command submits the job to LSF-HPC.

The -n num-procs parameter, which is required for parallel jobs, specifies the number of cores requested for the job. The num-procs parameter may be expressed as minprocs[,maxprocs], where minprocs specifies the minimum number of cores and the optional value maxprocs specifies the maximum number of cores.

The SLURM srun command is required to run jobs on an LSF-HPC node allocation. The srun command is the user job launched by the LSF bsub command. SLURM launches the jobname in parallel on the reserved cores in the lsf partition.

The jobname parameter is the name of an executable file or command to be run in parallel.

Example 5-5 illustrates a non-MPI parallel job submission. The job output shows that the job “srun hostname” was launched from the LSF execution host lsfhost.localdomain, and that it ran on 4 cores from the compute nodes n1 and n2.

Example 5-5 Submitting a Non-MPI Parallel Job
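A submission consistent with the description of Example 5-5 might look like the following sketch. It is not the original listing: the core count and the "srun hostname" job come from the text above, while the -I (interactive) option, the DRY_RUN switch, and the submit helper are assumptions added for illustration. By default the script only prints the command so that it can be reviewed first.

```shell
# Sketch only. NPROCS and the job "srun hostname" follow the description of
# Example 5-5; -I, DRY_RUN, and submit() are illustrative assumptions.
NPROCS=4
DRY_RUN=${DRY_RUN:-1}

submit() {
    # Print the command in dry-run mode; otherwise execute it as given.
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# bsub requests the cores; srun is the job that bsub launches, and srun in
# turn starts "hostname" in parallel on the reserved cores.
submit bsub -n "$NPROCS" -I srun hostname
```

Setting DRY_RUN=0 on a system where LSF-HPC is installed would run the command instead of printing it.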
You can use the LSF-SLURM external scheduler to specify additional SLURM options on the command line. As shown in Example 5-6, it can be used to submit a job to run one task per compute node (on SMP nodes).

To submit a parallel job that makes use of HP-MPI, use the LSF bsub command with the parameters described below.
The bsub command submits the job to LSF-HPC. The -n num-procs parameter, which is required for parallel jobs, specifies the number of cores requested for the job.

The mpijob argument is an mpirun command line. See the mpirun(1) manpage for more information on this command.

The mpirun command's -srun option is required if the MPI_USESRUN environment variable is not set or if you want to use additional srun options to execute your job.

The srun command, used by the mpirun command to launch the MPI tasks in parallel in the lsf partition, determines the number of tasks to launch from the SLURM_NPROCS environment variable that was set by LSF-HPC; the value of this variable equals the number provided by the -n option of the bsub command.

Any additional SLURM srun options are job-specific, not allocation-specific.

The mpi-jobname is the executable file to be run. The mpi-jobname must be compiled with the appropriate HP-MPI compilation utility. For more information, see the section titled Compiling applications in the HP-MPI User's Guide.

Example 5-7 shows an MPI job that runs a hello world program on 4 cores on 2 compute nodes.

Example 5-7 Submitting an MPI Job
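The rule for the -srun option can be sketched as a small helper that builds the mpijob part of the command line. This is an illustration rather than the original listing for Example 5-7: the helper name mpijob, the executable name ./hello_world, and the -I option are assumptions, and the script only prints the resulting commands.

```shell
# Sketch only. "./hello_world" is a hypothetical executable, assumed to have
# been built with an HP-MPI compilation utility.
mpijob() {
    # Add -srun only when MPI_USESRUN is not set, following the rule above.
    if [ -n "${MPI_USESRUN+x}" ]; then
        echo "mpirun $1"
    else
        echo "mpirun -srun $1"
    fi
}

# Without MPI_USESRUN, the -srun option is included.
unset MPI_USESRUN
echo "bsub -n 4 -I $(mpijob ./hello_world)"

# With MPI_USESRUN set, the option can be left out.
MPI_USESRUN=1
echo "bsub -n 4 -I $(mpijob ./hello_world)"
```

Additional srun options would still require the explicit -srun form, which this helper does not cover.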
You can use the LSF-SLURM External Scheduler option to add capabilities at the job level and queue level by including several SLURM options in the command line. For example, you can use it to submit a job to run one task per node, or to submit a job to run on specific nodes. “LSF-SLURM External Scheduler” discusses this option.

Example 5-8 shows an MPI job that uses the LSF-SLURM External Scheduler option to run the same hello world program on each of 4 compute nodes.

Example 5-8 Submitting an MPI Job with the LSF-SLURM External Scheduler Option
Some preprocessing may need to be done:
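One way to carry out this kind of preprocessing in a batch script is to read the LSB_MCPU_HOSTS environment variable, which LSF sets to a space-separated sequence of host names and core counts. The sketch below splits that string into one "host count" pair per line. The helper name list_hosts and the sample value are hypothetical; inside a real job the variable is already set and the sample default is not used.

```shell
# Sample value, made up for illustration; inside a real job the variable is
# already set and this default is ignored.
: "${LSB_MCPU_HOSTS:=n1 2 n2 2}"

list_hosts() {
    # Split the "host count host count ..." string into positional parameters.
    # The expansion is left unquoted on purpose so that it is word-split.
    set -- $LSB_MCPU_HOSTS
    while [ "$#" -ge 2 ]; do
        echo "$1 $2"     # one "host count" pair per line
        shift 2
    done
}

list_hosts
```

A batch script could feed these pairs to whatever task launcher it uses in place of srun.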
This preprocessing should determine the node host names to which mpirun's standard task launcher should launch the tasks. In such scenarios, you need to write a batch script; there are several methods available for determining the nodes in an allocation. One method is to use the SLURM_JOBID environment variable with the squeue command to query the nodes. Another method is to use LSF-HPC environment variables such as LSB_HOSTS and LSB_MCPU_HOSTS, which are prepared by the HP XC job starter script.

LSF-SLURM External Scheduler

The LSF-SLURM external scheduler option provides additional capabilities at the job level and queue level by allowing the inclusion of several SLURM options in the LSF-HPC command line. With LSF-HPC integrated with SLURM, you can use the LSF-SLURM External Scheduler to specify SLURM options that set the minimum number of nodes required for the job, specific nodes for the job, and so on. The format of this option is -ext "SLURM[slurm-arguments]".
The bsub command format to submit a parallel job to an LSF-HPC allocation of compute nodes using the external scheduler option is as follows:

bsub -n num-procs -ext "SLURM[slurm-arguments]" [bsub-options]

The slurm-arguments parameter can be one or more of the srun options described in Table 5-1, separated by semicolons.

Table 5-1 Arguments for the SLURM External Scheduler
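As a sketch of how several slurm-arguments combine into a single -ext value, the helper below joins its arguments with semicolons and wraps them in SLURM[...]. The helper name ext_option is hypothetical, and the argument names nodes and exclude are assumed to mirror the srun options of the same names. The script only prints the resulting strings.

```shell
ext_option() {
    # Join all arguments with ";" and wrap the result in SLURM[...].
    joined=""
    for arg in "$@"; do
        joined="${joined:+$joined;}$arg"
    done
    echo "SLURM[$joined]"
}

# A single argument, then two arguments separated by a semicolon.
ext_option nodes=4
ext_option nodes=4 exclude=n3

# The value must be quoted on the bsub command line, because the brackets
# and the semicolon are special characters to the shell.
echo "bsub -n 4 -ext \"$(ext_option nodes=4 exclude=n3)\" -I srun hostname"
```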
The Platform Computing LSF-HPC documentation provides more information on general external scheduler support.

Consider an HP XC system configuration in which lsfhost.localdomain is the LSF execution host and nodes n[1-10] are compute nodes in the lsf partition. Each node contains two cores, providing 20 cores for use by LSF-HPC jobs.

Example 5-9 shows one way to submit a parallel job to run on a specific node or nodes.

Example 5-9 Using the External Scheduler to Submit a Job to Run on Specific Nodes
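The core count for this configuration, and a submission that targets two named nodes, can be sketched as follows. The command is a guess at a form consistent with the description of Example 5-9, not the original listing: the nodelist argument name and the -I option are assumptions, and the script prints the command rather than running it.

```shell
NODES=10            # compute nodes n[1-10] in the lsf partition
CORES_PER_NODE=2
TOTAL_CORES=$((NODES * CORES_PER_NODE))
echo "cores available to LSF-HPC jobs: $TOTAL_CORES"

# Four cores fit on two named nodes: 2 nodes x 2 cores = 4.
# "nodelist" is assumed to be the external scheduler argument for this.
echo 'bsub -n 4 -ext "SLURM[nodelist=n6,n8]" -I srun hostname'
```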
In Example 5-9, the job output shows that the job was launched from the LSF execution host lsfhost.localdomain, and that it ran on four cores on the specified nodes, n6 and n8.

Example 5-10 shows one way to submit a parallel job to run one task per node.

Example 5-10 Using the External Scheduler to Submit a Job to Run One Task per Node
In Example 5-10, the job output shows that the job was launched from the LSF execution host lsfhost.localdomain, and that it ran on four cores on four different nodes (one task per node).

Example 5-11 shows one way to submit a parallel job to avoid running on a particular node.

Example 5-11 Using the External Scheduler to Submit a Job That Excludes One or More Nodes
Example 5-11 runs the job exactly as in Example 5-10 “Using the External Scheduler to Submit a Job to Run One Task per Node”, but additionally requests that node n3 not be used to run the job. This command could also have been written to exclude additional nodes.

Example 5-12 launches the hostname command once on each of the nodes n1 through n10 (n[1-10]).

Example 5-12 Using the External Scheduler to Launch a Command in Parallel on Ten Nodes
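The bracketed range n[1-10] is shorthand for the ten host names n1 through n10. The sketch below spells that expansion out. The helper expand_range is hypothetical and is not an LSF or SLURM tool, and the commented command is only an assumed form of the submission, with nodelist taken to be the relevant external scheduler argument.

```shell
expand_range() {
    # $1 = name prefix, $2 = first index, $3 = last index
    i=$2
    while [ "$i" -le "$3" ]; do
        echo "$1$i"
        i=$((i + 1))
    done
}

# Print the ten host names that n[1-10] stands for, one per line.
expand_range n 1 10

# Assumed form of a submission that uses the range directly:
#   bsub -n 10 -ext "SLURM[nodelist=n[1-10]]" srun hostname
```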
Example 5-13 launches the hostname command on 10 cores on nodes that have the dualcore SLURM feature assigned to them.

Example 5-13 Using the External Scheduler to Constrain Launching to Nodes with a Given Feature
You can use the bqueues command to determine the SLURM scheduler options that apply to jobs submitted to a specific LSF-HPC queue.