HP-MPI User's Guide > Chapter 3 Understanding HP-MPI

CPU binding


The mpirun option -cpu_bind binds a rank to an ldom (locality domain) so that the process cannot move to a different ldom after startup. The binding occurs before the MPI application is executed.

To accomplish this, a shared library is loaded at startup which does the following for each rank:

  • Spins for a short time in a tight loop to let the operating system distribute processes evenly across CPUs. The duration of this initial loop is controlled by the MPI_CPU_SPIN environment variable. Default is 3 seconds.

  • Determines the current CPU and ldom.

  • Checks for oversubscription against the other ranks of the MPI job on the same host, using a "shm" segment created by mpirun and a lock to communicate with them. If no other rank has reserved the current CPU, the process is locked to the ldom of that CPU. If a rank has already reserved the current CPU, a new CPU is chosen from the least loaded free CPUs and the process is locked to the ldom of that CPU.
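
The reservation step described above can be sketched in Python. This is only an illustration of the logic, not HP-MPI's implementation: the function names, the load metric, and the CPU-to-ldom table are all hypothetical, and a plain loop stands in for the shared memory segment and lock. The text does not say what happens when no free CPU remains, so the sketch simply raises an error in that case.

```python
def choose_cpu(current_cpu, reserved, load):
    """Return the CPU a rank should reserve (hypothetical helper).

    current_cpu: CPU the scheduler placed the rank on after the spin phase.
    reserved:    set of CPUs already claimed by other ranks on this host.
    load:        dict mapping CPU number -> load estimate (assumed metric).
    """
    if current_cpu not in reserved:
        # No oversubscription on the current CPU: keep it.
        return current_cpu
    free = [cpu for cpu in load if cpu not in reserved]
    if not free:
        # Assumption: behavior for a fully reserved host is not documented.
        raise RuntimeError("no free CPU left on this host")
    # Pick the least loaded free CPU; ties go to the lowest CPU number.
    return min(free, key=lambda cpu: (load[cpu], cpu))


def bind_ranks(placements, load, cpu_to_ldom):
    """Simulate the ranks of one host taking the lock one at a time."""
    reserved = set()
    binding = {}
    for rank, current_cpu in placements:
        cpu = choose_cpu(current_cpu, reserved, load)
        reserved.add(cpu)
        binding[rank] = cpu_to_ldom[cpu]  # the rank is locked to the ldom
    return binding


# Made-up host: 4 CPUs in 2 ldoms; ranks 0 and 1 both start on CPU 0.
cpu_to_ldom = {0: 0, 1: 0, 2: 1, 3: 1}
load = {0: 0.9, 1: 0.5, 2: 0.1, 3: 0.3}
placements = [(0, 0), (1, 0), (2, 3)]
binding = bind_ranks(placements, load, cpu_to_ldom)
print(binding)  # {0: 0, 1: 1, 2: 1}
```

Rank 1 finds CPU 0 already reserved by rank 0, so it moves to CPU 2, the least loaded free CPU, and is bound to ldom 1.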

Similar results can be achieved with "mpsched", but the procedure outlined above has the advantage of being a more load-based distribution, and it works well in psets and across multiple machines.

HP-MPI supports CPU binding with a variety of binding strategies (see below). The option -cpu_bind is supported in appfile, command line, and srun modes.

% mpirun -cpu_bind[_mt]=[v,][option][,v] -np 4 a.out

where _mt selects thread-aware CPU binding; "v," and ",v" request verbose information about how threads are bound to CPUs; and [option] is one of:

rank — Schedule ranks on CPUs according to packed rank id.

map_cpu — Schedule ranks on CPUs in cyclic distribution through MAP variable.

mask_cpu — Schedule ranks on CPU masks in cyclic distribution through MAP variable.

ll — Least loaded (ll). Bind each rank to the CPU it is currently running on.

For NUMA-based systems, the following options are also available:

ldom — Schedule ranks on ldoms according to packed rank id.

cyclic — Cyclic distribution on each ldom according to packed rank id.

block — Block distribution on each ldom according to packed rank id.

rr — Round robin (rr). Same as cyclic, but considers ldom load average.

fill — Same as block, but considers ldom load average.

packed — Bind all ranks to the same ldom as the lowest rank.

slurm — SLURM binding.

ll — Least loaded (ll). Bind each rank to the ldom it is currently running on.

map_ldom — Schedule ranks on ldoms in cyclic distribution through MAP variable.
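
The difference between the cyclic and block strategies can be illustrated with a small Python sketch. It uses the generic meaning of the two terms (cyclic deals ranks out one ldom at a time, block fills one ldom before moving to the next). The function names are hypothetical, and the exact placement HP-MPI produces on a given system may differ, for example when ldoms have unequal CPU counts.

```python
def cyclic_ldom(rank, n_ldoms):
    """Cyclic: consecutive ranks go to consecutive ldoms, wrapping around."""
    return rank % n_ldoms


def block_ldom(rank, n_ranks, n_ldoms):
    """Block: consecutive ranks share an ldom until it has its quota."""
    per_ldom = -(-n_ranks // n_ldoms)  # ceiling division
    return rank // per_ldom


n_ranks, n_ldoms = 8, 4
cyclic = [cyclic_ldom(r, n_ldoms) for r in range(n_ranks)]
block = [block_ldom(r, n_ranks, n_ldoms) for r in range(n_ranks)]
print("cyclic:", cyclic)  # cyclic: [0, 1, 2, 3, 0, 1, 2, 3]
print("block: ", block)   # block:  [0, 0, 1, 1, 2, 2, 3, 3]
```

Under these assumptions, neighboring ranks share an ldom with block and sit on different ldoms with cyclic.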

To list the currently supported options:

% mpirun -cpu_bind=help ./a.out

Environment variables for CPU binding:

  • MPI_BIND_MAP allows specification of the integer CPU numbers, ldom numbers, or CPU masks. The value is a comma-separated list of integers.

  • MPI_CPU_AFFINITY is an alternative to using -cpu_bind on the command line for specifying the binding strategy. The possible settings are LL, RANK, MAP_CPU, MASK_CPU, LDOM, CYCLIC, BLOCK, RR, FILL, PACKED, SLURM, and MAP_LDOM.

  • MPI_CPU_SPIN allows selection of the spin value. The default is 2 seconds. During this period the processes busy-spin so that the operating system schedules them onto processors. The processes then bind themselves to the processor, core, or ldom as appropriate.

    For example, the following selects a 4 second spin period to allow 32 MPI ranks (processes) to settle into place and then bind to the appropriate processor/core/ldom.

    % mpirun -e MPI_CPU_SPIN=4 -cpu_bind -np 32 ./linpack

  • MPI_FLUSH_FCACHE can be set to a threshold percentage of memory (0-100). If the file cache currently in use meets or exceeds this threshold, a flush attempt is initiated after binding and essentially before the user’s MPI program starts. Refer to MPI_FLUSH_FCACHE for more information.

  • MPI_THREAD_AFFINITY controls thread affinity. Possible values are:

    none — Schedule threads to run on all cores/ldoms. This is the default.

    cyclic — Schedule threads on ldoms in cyclic manner starting after parent.

    cyclic_cpu — Schedule threads on cores in cyclic manner starting after parent.

    block — Schedule threads on ldoms in block manner starting after parent.

    packed — Schedule threads on same ldom as parent.

    empty — No changes to thread affinity are made.

  • MPI_THREAD_IGNSELF, when set to 'yes', excludes the parent from consideration when threads are scheduled across the remaining cores/ldoms. This method of thread control can be used for explicit pthreads or OpenMP threads.

Three -cpu_bind options require the specification of a map/mask description. This allows for very explicit binding of ranks to processors. The three options are map_ldom, map_cpu, and mask_cpu.

Syntax:
-cpu_bind=[map_ldom,map_cpu,mask_cpu] [:<settings>, =<settings>, -e MPI_BIND_MAP=<settings>]

Examples:

-cpu_bind=MAP_LDOM -e MPI_BIND_MAP=0,2,1,3
# map rank 0 to ldom 0, rank 1 to ldom 2, rank 2 to ldom 1 and rank 3 to ldom 3.

-cpu_bind=MAP_LDOM=0,2,3,1
# map rank 0 to ldom 0, rank 1 to ldom 2, rank 2 to ldom 3 and rank 3 to ldom 1.

-cpu_bind=MAP_CPU:0,6,5
# map rank 0 to cpu 0, rank 1 to cpu 6, rank 2 to cpu 5.

-cpu_bind=MASK_CPU:1,4,6
# map rank 0 to cpu 0 (0001), rank 1 to cpu 2 (0100), rank 2 to cpu 1 or 2 (0110).
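
To make the mask notation in the last example concrete, the following Python sketch decodes a mask into CPU numbers and looks up the map entry for a rank. It is an illustration only: the function names are hypothetical, the masks are taken as plain integers with bit 0 standing for CPU 0, and wrapping around when there are more ranks than map entries is an assumption based on the "cyclic distribution through MAP variable" wording.

```python
def mask_to_cpus(mask):
    """Return the CPU numbers whose bits are set in mask (bit 0 = CPU 0)."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def entry_for_rank(rank, bind_map):
    """Pick the map entry for a rank, wrapping cyclically (assumed)."""
    return bind_map[rank % len(bind_map)]


masks = [1, 4, 6]  # the values from -cpu_bind=MASK_CPU:1,4,6
for rank in range(3):
    mask = entry_for_rank(rank, masks)
    print(f"rank {rank}: mask {mask:04b} -> cpus {mask_to_cpus(mask)}")
# rank 0: mask 0001 -> cpus [0]
# rank 1: mask 0100 -> cpus [2]
# rank 2: mask 0110 -> cpus [1, 2]
```

The same lookup applies to MAP_CPU and MAP_LDOM, where each entry is a single CPU or ldom number and needs no decoding.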

On a clustered system, rank binding uses the number of ranks together with the number of nodes to determine the CPU binding. Cyclic or blocked launch is taken into account.

On a cell-based system with multiple users, the LL strategy is recommended rather than RANK. LL allows the operating system to schedule the computational ranks first. The -cpu_bind capability then locks each rank to the CPU selected by the operating system scheduler.

© 1979-2007 Hewlett-Packard Development Company, L.P.