This section introduces the methods to run your HP-MPI application on HP-UX and Linux. You must use one of the mpirun methods. The examples below demonstrate six basic methods. Refer to “mpirun” for all the mpirun command line options.

HP-MPI includes -mpi32 and -mpi64 options for the launch utility mpirun on Opteron and Intel®64. Use these options to indicate the bitness of the application to be invoked, so that the HP-MPI utilities mpirun and mpid can properly determine the availability of interconnect libraries. The default is -mpi64.

There are six methods you can use to start your application, depending on what kind of system you are using:

Use mpirun with the -np # option and the name of your program. For example,

% $MPI_ROOT/bin/mpirun -np 4 hello_world

starts an executable file named hello_world with four processes. This is the recommended method to run applications on a single host with a single executable file.

Use mpirun with an appfile. For example,

% $MPI_ROOT/bin/mpirun -f appfile

where -f appfile specifies a text file (appfile) that is parsed by mpirun and contains process counts and a list of programs. Although you can use an appfile when you run a single executable file on a single host, it is best used when a job is to be run across a cluster of machines that does not have its own dedicated launching method such as srun or prun (which are described below), or when using multiple executables. For details about building your appfile, refer to “Creating an appfile”.

Use mpirun with -prun using the Quadrics Elan communication processor on Linux. For example,

% $MPI_ROOT/bin/mpirun [mpirun options] -prun \
  <prun options> <program> <args>

This method is only supported when linking with shared libraries. Some features, such as mpirun -stdio processing, are unavailable. Rank assignments within HP-MPI are determined by the way prun chooses mapping at runtime. The -np option is not allowed with -prun.
The following mpirun options are allowed with -prun:

% $MPI_ROOT/bin/mpirun [-help] [-version] [-jv] [-i <spec>] [-universe_size=#] [-sp <paths>] [-T] [-prot] [-spawn] [-1sided] [-tv] [-e var[=val]] -prun <prun options> <program> [<args>]

For more information on prun usage:

% man prun

The following examples assume the system has the Quadrics Elan interconnect and is a collection of 2-CPU nodes.

% $MPI_ROOT/bin/mpirun -prun -N4 ./a.out

will run a.out with 4 ranks, one per node; ranks are cyclically allocated.

n00 rank1
n01 rank2
n02 rank3
n03 rank4

% $MPI_ROOT/bin/mpirun -prun -n4 ./a.out

(assuming nodes have 2 processors/cores each) will run a.out with 4 ranks, 2 ranks per node; ranks are block allocated. Two nodes are used.

n00 rank1
n00 rank2
n01 rank3
n01 rank4
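The two layouts above differ only in how consecutive ranks are mapped to nodes. The following is a minimal sketch, not part of HP-MPI, that prints the same style of node/rank table for a given number of ranks and nodes. The function name rank_map is hypothetical, and the real placement is always decided by prun (or srun) at runtime.

```shell
# rank_map MODE NRANKS NNODES   (hypothetical helper, illustration only)
# Prints one "nXX rankN" line per rank, numbering ranks from 1 as in the
# tables above. MODE "cyclic" deals ranks round-robin across the nodes;
# MODE "block" fills each node before moving on to the next one.
rank_map() {
    mode=$1 nranks=$2 nnodes=$3
    # Ranks per node when filling nodes in blocks (rounded up).
    per_node=$(( (nranks + nnodes - 1) / nnodes ))
    r=0
    while [ "$r" -lt "$nranks" ]; do
        if [ "$mode" = block ]; then
            node=$(( r / per_node ))
        else
            node=$(( r % nnodes ))
        fi
        printf 'n%02d rank%d\n' "$node" "$(( r + 1 ))"
        r=$(( r + 1 ))
    done
}
```

For example, rank_map cyclic 4 4 reproduces the -N4 table, and rank_map block 4 2 reproduces the -n4 table on 2-CPU nodes.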
Other forms of usage include allocating the nodes you wish to use, which creates a subshell. Jobsteps can then be launched within that subshell until the subshell is exited.

% $MPI_ROOT/bin/mpirun -prun -A -N6

This allocates 6 nodes and creates a subshell.

% $MPI_ROOT/bin/mpirun -prun -n4 -m block ./a.out

This uses 4 ranks on 4 nodes from the existing allocation. Note that we asked for block.

n00 rank1
n00 rank2
n02 rank3
n03 rank4
Use mpirun with -srun on HP XC clusters. For example,

% $MPI_ROOT/bin/mpirun <mpirun options> -srun \
  <srun options> <program> <args>

Some features, such as mpirun -stdio processing, are unavailable. The -np option is not allowed with -srun. The following options are allowed with -srun:

% $MPI_ROOT/bin/mpirun [-help] [-version] [-jv] [-i <spec>] [-universe_size=#] [-sp <paths>] [-T] [-prot] [-spawn] [-tv] [-1sided] [-e var[=val]] -srun <srun options> <program> [<args>]

For more information on srun usage:

% man srun

The following examples assume the system has the Quadrics Elan interconnect, SLURM is configured to use Elan, and the system is a collection of 2-CPU nodes.

% $MPI_ROOT/bin/mpirun -srun -N4 ./a.out

will run a.out with 4 ranks, one per node; ranks are cyclically allocated.

n00 rank1
n01 rank2
n02 rank3
n03 rank4

% $MPI_ROOT/bin/mpirun -srun -n4 ./a.out

will run a.out with 4 ranks, 2 ranks per node; ranks are block allocated. Two nodes are used.

Other forms of usage include allocating the nodes you wish to use, which creates a subshell. Jobsteps can then be launched within that subshell until the subshell is exited.

% srun -A -n4

This allocates 2 nodes with 2 ranks each and creates a subshell.

% $MPI_ROOT/bin/mpirun -srun ./a.out

This runs on the previously allocated 2 nodes cyclically.

n00 rank1
n00 rank2
n01 rank3
n01 rank4
Use XC LSF and HP-MPI

HP-MPI jobs can be submitted using LSF. LSF uses the SLURM srun launching mechanism. Because of this, HP-MPI jobs need to specify the -srun option whether LSF is used or srun is used.

% bsub -I -n2 $MPI_ROOT/bin/mpirun -srun ./a.out

LSF creates an allocation of 2 processors and srun attaches to it.

% bsub -I -n12 $MPI_ROOT/bin/mpirun -srun -n6 \
  -N6 ./a.out

LSF creates an allocation of 12 processors and srun uses 1 CPU per node (6 nodes). Here, we assume 2 CPUs per node.

LSF jobs can be submitted without the -I (interactive) option.

An alternative mechanism for achieving one rank per node uses the -ext option to LSF:

% bsub -I -n3 -ext "SLURM[nodes=3]" \
  $MPI_ROOT/bin/mpirun -srun ./a.out

The -ext option can also be used to specifically request a node. The command line would look something like the following:

% bsub -I -n2 -ext "SLURM[nodelist=n10]" mpirun -srun \
  ./hello_world
Job <1883> is submitted to default queue <interactive>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
Hello world! I'm 0 of 2 on n10
Hello world! I'm 1 of 2 on n10
Including and excluding specific nodes can be accomplished by passing arguments to SLURM as well. For example, to make sure a job includes a specific node and excludes others, use something like the following. In this case, n9 is a required node and n10 is specifically excluded:

% bsub -I -n8 -ext "SLURM[nodelist=n9;exclude=n10]" \
  mpirun -srun ./hello_world
Job <1892> is submitted to default queue <interactive>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
Hello world! I'm 0 of 8 on n8
Hello world! I'm 1 of 8 on n8
Hello world! I'm 6 of 8 on n12
Hello world! I'm 2 of 8 on n9
Hello world! I'm 4 of 8 on n11
Hello world! I'm 7 of 8 on n12
Hello world! I'm 3 of 8 on n9
Hello world! I'm 5 of 8 on n11
In addition to displaying interconnect selection information, the mpirun -prot option can be used to verify that application ranks have been allocated in the desired manner:

% bsub -I -n12 $MPI_ROOT/bin/mpirun -prot -srun \
  -n6 -N6 ./a.out
Job <1472> is submitted to default queue <interactive>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
Host 0 -- ip 172.20.0.8 -- ranks 0
Host 1 -- ip 172.20.0.9 -- ranks 1
Host 2 -- ip 172.20.0.10 -- ranks 2
Host 3 -- ip 172.20.0.11 -- ranks 3
Host 4 -- ip 172.20.0.12 -- ranks 4
Host 5 -- ip 172.20.0.13 -- ranks 5

 host | 0    1    2    3    4    5
======|===============================
    0 : SHM  VAPI VAPI VAPI VAPI VAPI
    1 : VAPI SHM  VAPI VAPI VAPI VAPI
    2 : VAPI VAPI SHM  VAPI VAPI VAPI
    3 : VAPI VAPI VAPI SHM  VAPI VAPI
    4 : VAPI VAPI VAPI VAPI SHM  VAPI
    5 : VAPI VAPI VAPI VAPI VAPI SHM

Hello world! I'm 0 of 6 on n8
Hello world! I'm 3 of 6 on n11
Hello world! I'm 5 of 6 on n13
Hello world! I'm 4 of 6 on n12
Hello world! I'm 2 of 6 on n10
Hello world! I'm 1 of 6 on n9
Use LSF on non-XC systems

On non-XC systems, to invoke the Parallel Application Manager (PAM) feature of LSF for applications where all processes execute the same program on the same host:

% bsub <lsf_options> pam -mpi mpirun \
  <mpirun_options> program <args>

In this case, LSF assigns a host to the MPI job. For example:

% bsub pam -mpi $MPI_ROOT/bin/mpirun -np 4 compute_pi

requests a host assignment from LSF and runs the compute_pi application with four processes.

The load-sharing facility (LSF) allocates one or more hosts to run an MPI job. In general, LSF improves resource utilization for MPI jobs that run in multihost environments. LSF handles the job scheduling and the allocation of the necessary hosts, and HP-MPI handles the task of starting up the application's processes on the hosts selected by LSF.

By default, mpirun starts the MPI processes on the hosts specified by the user, in effect handling the direct mapping of host names to IP addresses. When you use LSF to start MPI applications, the host names, whether specified to mpirun or implicit when the -h option is not used, are treated as symbolic variables that refer to the IP addresses that LSF assigns. Use LSF to do this mapping by specifying a variant of mpirun to execute your job.

To invoke LSF for applications that run on multiple hosts:

% bsub [lsf_options] pam -mpi mpirun [mpirun_options] -f appfile [-- extra_args_for_appfile]

In this case, each host specified in the appfile is treated as a symbolic name, referring to the host that LSF assigns to the MPI job. For example:

% bsub pam -mpi $MPI_ROOT/bin/mpirun -f my_appfile

runs an appfile named my_appfile and requests host assignments for all remote and local hosts specified in my_appfile. If my_appfile contains the following items:

-h voyager -np 10 send_receive
-h enterprise -np 8 compute_pi

host assignments are returned for the two symbolic names voyager and enterprise.

When requesting a host from LSF, you must ensure that the path to your executable file is accessible by all machines in the resource pool.
More information about appfile runs
This example teaches you to run the hello_world.c application that you built in “Examples of building on HP-UX and Linux” (above) using two hosts to achieve four-way parallelism. For this example, the local host is named jawbone and a remote host is named wizard. To run hello_world.c on two hosts, use the following procedure, replacing jawbone and wizard with the names of your machines:

Edit the .rhosts file on jawbone and wizard. Add an entry for wizard in the .rhosts file on jawbone and an entry for jawbone in the .rhosts file on wizard. In addition to the entries in the .rhosts file, ensure that the correct commands and permissions are set up on all hosts so that you can start your remote processes. Refer to “Setting shell” for details.

Ensure that the executable is accessible from each host, either by placing it in a shared directory or by copying it to a local directory on each host.

Create an appfile. An appfile is a text file that contains process counts and a list of programs. In this example, create an appfile named my_appfile containing the following two lines:

-h jawbone -np 2 /path/to/hello_world
-h wizard -np 2 /path/to/hello_world
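If you prefer to generate the appfile rather than type it, a small shell function can write one line per host. This is a sketch that assumes every host runs the same executable with the same process count; make_appfile is a hypothetical name, not an HP-MPI utility.

```shell
# make_appfile FILE NP EXECUTABLE HOST...   (hypothetical helper)
# Writes one "-h HOST -np NP EXECUTABLE" line per HOST to FILE,
# overwriting any existing contents.
make_appfile() {
    file=$1 np=$2 exe=$3
    shift 3
    : > "$file"                      # create or truncate FILE
    for host in "$@"; do
        printf '%s\n' "-h $host -np $np $exe" >> "$file"
    done
}
```

For example, make_appfile my_appfile 2 /path/to/hello_world jawbone wizard writes the two lines shown above.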
The appfile should contain a separate line for each host. Each line specifies the name of the executable file and the number of processes to run on the host. The -h option is followed by the name of the host where the specified processes must be run. Instead of using the host name, you may use its IP address.

Run the hello_world executable file:

% $MPI_ROOT/bin/mpirun -f my_appfile

The -f option specifies that the filename following it is an appfile. mpirun parses the appfile, line by line, for the information to run the program. In this example, mpirun runs the hello_world program with two processes on the local machine, jawbone, and two processes on the remote machine, wizard, as dictated by the -np 2 option on each line of the appfile.

Analyze the hello_world output. HP-MPI prints the output from running the hello_world executable in non-deterministic order. The following is an example of the output:

Hello world! I'm 2 of 4 on wizard
Hello world! I'm 0 of 4 on jawbone
Hello world! I'm 3 of 4 on wizard
Hello world! I'm 1 of 4 on jawbone
Notice that processes 0 and 1 run on jawbone, the local host,
while processes 2 and 3 run on wizard. HP-MPI guarantees that the
ranks of the processes in MPI_COMM_WORLD are assigned
and sequentially ordered according to the order the programs appear
in the appfile. The appfile in this example, my_appfile, describes
the local host on the first line and the remote host on the second
line.
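Because ranks follow appfile order, you can work out which ranks land on which host before launching. The awk-based sketch below does this for simple appfiles. appfile_ranks is a hypothetical helper; it assumes one entry per physical line (no backslash continuation) with all options placed before the program name, as in the examples in this section.

```shell
# appfile_ranks FILE   (hypothetical helper, simplified appfile parsing)
# Prints "FIRST-LAST HOST PROGRAM" for each appfile entry, numbering ranks
# sequentially in the order the entries appear. -np defaults to 1 and a
# missing -h is reported as "local".
appfile_ranks() {
    awk '
        /^#/ || NF == 0 { next }                 # skip comments and blank lines
        {
            host = "local"; np = 1; prog = ""
            for (i = 1; i <= NF; i++) {
                if ($i == "-h")       { host = $(++i) }
                else if ($i == "-np") { np = $(++i) }
                else if ($i == "-e" || $i == "-l" || $i == "-sp") { i++ }  # skip option value
                else                  { prog = $i; break }
            }
            printf "%d-%d %s %s\n", first, first + np - 1, host, prog
            first += np                          # next entry starts after this one
        }' "$1"
}
```

Run against my_appfile, it prints 0-1 jawbone /path/to/hello_world and 2-3 wizard /path/to/hello_world, which matches the host names in the output above.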
Running MPMD applications
A multiple program multiple data (MPMD) application uses two or more separate programs to functionally decompose a problem. This style can be used to simplify the application source and reduce the size of spawned processes. Each process can execute a different program.

To run an MPMD application, the mpirun command must reference an appfile that contains the list of programs to be run and the number of processes to be created for each program. A simple invocation of an MPMD application looks like this:

% $MPI_ROOT/bin/mpirun -f appfile

where appfile is the text file parsed by mpirun and contains a list of programs and process counts.

Suppose you decompose the poisson application into two source files: poisson_master (uses a single master process) and poisson_child (uses four child processes). The appfile for the example application contains the two lines shown below (refer to “Creating an appfile” for details).

-np 1 poisson_master
-np 4 poisson_child

To build and run the example application, use the following command sequence:

% $MPI_ROOT/bin/mpicc -o poisson_master poisson_master.c
% $MPI_ROOT/bin/mpicc -o poisson_child poisson_child.c
% $MPI_ROOT/bin/mpirun -f appfile

See “Creating an appfile” for more information about using appfiles.

prun also supports running applications with MPMD using procfiles. Please refer to the prun documentation at http://www.quadrics.com.

MPMD is not directly supported with srun. However, users can write custom wrapper scripts for their application to emulate this functionality. This can be accomplished by using the environment variables SLURM_PROCID and SLURM_NPROCS as keys to selecting the appropriate executable.

Modules on Linux
Modules are a convenient tool for managing environment settings for various packages. HP-MPI for Linux provides an hp-mpi module at /opt/hpmpi/modulefiles/hp-mpi, which sets MPI_ROOT and adds to PATH and MANPATH. To use it, either copy this file to a system-wide module directory, or append /opt/hpmpi/modulefiles to the MODULEPATH environment variable.

Some useful module-related commands are:

- % module avail
List the modules that can be loaded.
- % module load hp-mpi
Load the hp-mpi module.
- % module list
List the currently loaded modules.
- % module unload hp-mpi
Unload the hp-mpi module.
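To append /opt/hpmpi/modulefiles to MODULEPATH as described above, the following sketch adds the directory only if it is not already listed and avoids a leading colon when MODULEPATH was previously unset. It assumes a POSIX shell and a modules installation that honors MODULEPATH; after running it, module load hp-mpi should be able to find the module.

```shell
# Make the HP-MPI modulefiles visible to the module command.
hpmpi_moddir=/opt/hpmpi/modulefiles
case ":${MODULEPATH:-}:" in
    *":$hpmpi_moddir:"*) ;;                                   # already listed
    *) MODULEPATH="${MODULEPATH:+$MODULEPATH:}$hpmpi_moddir" ;;
esac
export MODULEPATH
```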
Modules are only supported on Linux.

NOTE: On XC Linux, the HP-MPI module is named mpi/hp/default and can be abbreviated 'mpi'.
Runtime utility commands
HP-MPI provides a set of utility commands to supplement the MPI library routines. These commands are described in the following sections.

This section includes a discussion of mpirun syntax formats, mpirun options, appfiles, the multipurpose daemon process, and generating multihost instrumentation profiles.

The HP-MPI start-up utility mpirun requires that MPI be installed in the same directory on every execution host. The default is the location from which mpirun is executed. This can be overridden with the MPI_ROOT environment variable. Set the MPI_ROOT environment variable prior to starting mpirun. See “Configuring your environment”.

mpirun syntax has six formats.

To run on a single host, the -np option to mpirun can be used. For example:

% $MPI_ROOT/bin/mpirun -np 4 ./a.out

will run 4 ranks on the local host.
For applications that consist of multiple programs or that run on multiple hosts, here is a list of the most common options. For a complete list, see the mpirun man page:

mpirun [-help] [-version] [-djpv] [-ck] [-t spec] [-i spec] [-commd] [-tv] -f appfile [-- extra_args_for_appfile]

where -- extra_args_for_appfile specifies extra arguments to be applied to the programs listed in the appfile, as a space-separated list of arguments. Use this option at the end of your command line to append extra arguments to each line of your appfile. Refer to the example in “Adding program arguments to your appfile” for details. These extra arguments also apply to spawned applications if specified on the mpirun command line.

In this case, each program in the application is listed in a file called an appfile. Refer to “Appfiles” for more information. For example:

% $MPI_ROOT/bin/mpirun -f my_appfile

runs using an appfile named my_appfile, which might have contents such as:

-h hostA -np 2 /path/to/a.out
-h hostB -np 2 /path/to/a.out

which specify that two ranks are to run on hostA and two on hostB.
Use the -prun option for applications that run on the Quadrics Elan interconnect. When using the -prun option, mpirun sets environment variables and invokes prun utilities. Refer to “Runtime environment variables” for more information about prun environment variables.

The -prun argument to mpirun specifies that the prun command is to be used for launching. All arguments following -prun are passed unmodified to the prun command.

% $MPI_ROOT/bin/mpirun <mpirun options> -prun \
  <prun options>

The -np option is not allowed with prun. Some features, such as mpirun -stdio processing, are unavailable.

% $MPI_ROOT/bin/mpirun -prun -n 2 ./a.out

launches a.out on two processors.

% $MPI_ROOT/bin/mpirun -prot -prun -n 6 -N 6 ./a.out

turns on the print protocol option (-prot is an mpirun option, and therefore is listed before -prun) and runs on 6 machines, one CPU per node.

For more details about using prun, refer to “Running applications on HP-UX and Linux”.

HP-MPI also provides implied prun mode. The implied prun mode allows the user to omit the -prun argument from the mpirun command line by setting the environment variable MPI_USEPRUN. For more information about the implied prun mode, see Appendix C “mpirun using implied prun or srun”.
Applications that run on XC clusters require the -srun option. Startup directly from srun is not supported. When using this option, mpirun sets environment variables and invokes srun utilities. Refer to “Runtime environment variables” for more information about srun environment variables.

The -srun argument to mpirun specifies that the srun command is to be used for launching. All arguments following -srun are passed unmodified to the srun command.

% $MPI_ROOT/bin/mpirun <mpirun options> -srun \
  <srun options>

The -np option is not allowed with srun. Some features, such as mpirun -stdio processing, are unavailable.

% $MPI_ROOT/bin/mpirun -srun -n 2 ./a.out

launches a.out on two processors.

% $MPI_ROOT/bin/mpirun -prot -srun -n 6 -N 6 ./a.out

turns on the print protocol option (-prot is an mpirun option, and therefore is listed before -srun) and runs on 6 machines, one CPU per node.

For more details about using srun, refer to “Running applications on HP-UX and Linux”.

HP-MPI also provides implied srun mode. The implied srun mode allows the user to omit the -srun argument from the mpirun command line by setting the environment variable MPI_USESRUN. For more information about the implied srun mode, see Appendix C “mpirun using implied prun or srun”.
HP-MPI jobs can be submitted using LSF. LSF uses the SLURM srun launching mechanism. Because of this, HP-MPI jobs need to specify the -srun option whether LSF is used or srun is used.

% bsub -I -n2 $MPI_ROOT/bin/mpirun -srun ./a.out

For more details on using LSF on XC systems, refer to “Running applications on HP-UX and Linux”.

On non-XC systems, to invoke the Parallel Application Manager (PAM) feature of LSF for applications where all processes execute the same program on the same host:

% bsub <lsf_options> pam -mpi mpirun \
  <mpirun_options> program <args>

For more details on using LSF on non-XC systems, refer to “Running applications on HP-UX and Linux”.

An appfile is a text file that contains process counts and a list of programs. When you invoke mpirun with the name of the appfile, mpirun parses the appfile to get information for the run.

The format of entries in an appfile is line oriented. Lines that end with the backslash (\) character are continued on the next line, forming a single logical line. A logical line starting with the pound (#) character is treated as a comment. Each program, along with its arguments, is listed on a separate logical line.

The general form of an appfile entry is:

[-h remote_host] [-e var[=val] [...]] [-l user] [-sp paths] [-np #] program [args]

where

- -h remote_host
Specifies the remote host where a remote executable file is stored. The default is to search the local host. remote_host is either a host name or an IP address.
- -e var=val
Sets the environment variable var for the program and gives it the value val. The default is not to set environment variables. When you use -e with the -h option, the environment variable is set to val on the remote host.
- -l user
Specifies the user name on the target host. The default is the current user name.
- -sp paths
Sets the target shell PATH environment variable to paths. Search paths are separated by a colon. Both -sp path and -e PATH=path do the same thing. If both are specified, the -e PATH=path setting is used.
- -np #
Specifies the number of processes to run. The default value for # is 1.
- program
Specifies the name of the executable to run. mpirun searches for the executable in the paths defined in the PATH environment variable.
- args
Specifies command line arguments to the program. Options following a program name in your appfile are treated as program arguments and are not processed by mpirun.
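The continuation and comment rules above can be illustrated with a short awk sketch that prints one logical line per entry. It is an approximation written for the examples in this guide, not HP-MPI's own parser: appfile_logical_lines is a hypothetical name, only a logical line whose first character is # is treated as a comment, and quoting is not handled.

```shell
# appfile_logical_lines FILE   (hypothetical helper)
# Joins each line ending in a backslash with the line that follows it,
# drops logical lines that start with '#', and prints the remaining
# logical lines, one per output line.
appfile_logical_lines() {
    awk '
        {
            line = pending $0
            if (line ~ /\\$/) {                     # continued on the next line
                pending = substr(line, 1, length(line) - 1)
                next
            }
            pending = ""
            if (line !~ /^#/) print line            # skip comment lines
        }
        END {
            if (pending != "" && pending !~ /^#/) print pending
        }' "$1"
}
```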
Adding program arguments to your appfile

When you invoke mpirun using an appfile, arguments for your program are supplied on each line of your appfile. Refer to “Creating an appfile”. HP-MPI also provides an option on your mpirun command line to provide additional program arguments to those in your appfile. This is useful if you wish to specify extra arguments for each program listed in your appfile, but do not wish to edit your appfile.

To use an appfile when you invoke mpirun, use one of the following as described in “mpirun”:

mpirun [mpirun_options] -f appfile \
  [-- extra_args_for_appfile]

bsub [lsf_options] pam -mpi mpirun [mpirun_options] -f appfile \
  [-- extra_args_for_appfile]

The -- extra_args_for_appfile option is placed at the end of your command line, after appfile, to add options to each line of your appfile.

CAUTION: Arguments placed after -- are treated as program arguments, and are not processed by mpirun. Use this option when you want to specify program arguments for each line of the appfile, but want to avoid editing the appfile.
For example, suppose your appfile contains

-h voyager -np 10 send_receive arg1 arg2
-h enterprise -np 8 compute_pi

If you invoke mpirun using the following command line:

mpirun -f appfile -- arg3 -arg4 arg5

The send_receive command line for machine voyager becomes:

send_receive arg1 arg2 arg3 -arg4 arg5

The compute_pi command line for machine enterprise becomes:

compute_pi arg3 -arg4 arg5
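To preview the effect of -- extra_args_for_appfile without launching anything, the sketch below prints, for each appfile entry, the host and the program command line that results from appending the extra arguments. program_cmdlines is a hypothetical helper; it assumes one entry per physical line and simply skips the -h, -np, -e, -l and -sp options together with their values, so it only approximates what mpirun does.

```shell
# program_cmdlines FILE [EXTRA_ARG...]   (hypothetical helper)
# For each appfile entry, prints "HOST: PROGRAM [ARGS] [EXTRA_ARG...]".
program_cmdlines() {
    file=$1; shift
    awk -v extra="$*" '
        /^#/ || NF == 0 { next }
        {
            host = "local"
            # Skip the options that mpirun consumes, remembering the host.
            for (i = 1; i <= NF; i++) {
                if ($i == "-h") { host = $(++i) }
                else if ($i == "-np" || $i == "-e" || $i == "-l" || $i == "-sp") { i++ }
                else { break }
            }
            # Everything from the program name onward goes to the program.
            cmd = ""
            for (; i <= NF; i++) cmd = cmd (cmd == "" ? "" : " ") $i
            if (extra != "") cmd = cmd " " extra
            print host ": " cmd
        }' "$file"
}
```

With the appfile above, program_cmdlines appfile arg3 -arg4 arg5 prints the two command lines shown, each prefixed with its host name.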
When you use the -- extra_args_for_appfile option, it must be specified at the end of the mpirun command line.

Assigning ranks and improving communication

The ranks of the processes in MPI_COMM_WORLD are assigned and sequentially ordered according to the order the programs appear in the appfile. For example, if your appfile contains

-h voyager -np 10 send_receive
-h enterprise -np 8 compute_pi
HP-MPI assigns ranks 0 through 9 to the 10 processes running send_receive and ranks 10 through 17 to the 8 processes running compute_pi.

You can use this sequential ordering of process ranks to your advantage when you optimize for performance on multihost systems. You can split process groups according to communication patterns to reduce or remove interhost communication hot spots. For example, suppose you have the following:

- A multi-host run of four processes
- Two processes per host on two hosts
- Higher communication traffic between ranks 0 and 2, and between ranks 1 and 3

You could use an appfile that contains the following:

-h hosta -np 2 program1
-h hostb -np 2 program2

However, this places processes 0 and 1 on hosta and processes 2 and 3 on hostb, resulting in interhost communication between the ranks identified as having high communication traffic.

A better appfile for this example would be

-h hosta -np 1 program1
-h hostb -np 1 program2
-h hosta -np 1 program1
-h hostb -np 1 program2
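For larger runs, an interleaved appfile like this can be generated instead of written by hand. The sketch below emits one -np 1 entry per rank and cycles through host:program pairs, so rank r is assigned to pair r mod N. interleave_appfile is a hypothetical helper, and the layout only helps under the assumption, true in this example, that the communication hot spots are between ranks that differ by a multiple of N.

```shell
# interleave_appfile NRANKS HOST:PROGRAM...   (hypothetical helper)
# Prints NRANKS appfile lines of the form "-h HOST -np 1 PROGRAM", cycling
# through the HOST:PROGRAM pairs so consecutive ranks alternate between hosts.
interleave_appfile() {
    nranks=$1; shift
    [ "$#" -gt 0 ] || return 1          # need at least one HOST:PROGRAM pair
    r=0
    while [ "$r" -lt "$nranks" ]; do
        for pair in "$@"; do
            [ "$r" -lt "$nranks" ] || break
            printf '%s\n' "-h ${pair%%:*} -np 1 ${pair#*:}"
            r=$(( r + 1 ))
        done
    done
}
```

Running interleave_appfile 4 hosta:program1 hostb:program2 > appfile produces the four-line appfile shown above.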
This places ranks 0 and 2 on hosta and ranks 1 and 3 on hostb. This placement allows intrahost communication between ranks that are identified as communication hot spots. Intrahost communication yields better performance than interhost communication.

Generating multihost instrumentation profiles

When you enable instrumentation for multihost runs, and invoke mpirun either on a host where at least one MPI process is running, or on a host remote from all your MPI processes, HP-MPI writes the instrumentation output file (prefix.instr) to the working directory on the host that is running rank 0.

We recommend using the mpirun launch utility. However, for HP-UX and PA-RISC systems, HP-MPI provides a self-contained launch utility, mpirun.all, that allows HP-MPI to be used without installing it on all hosts. The restrictions for mpirun.all include:

- Applications must be linked statically.
- TotalView® is unavailable to executables launched with mpirun.all.
- Files will be copied to a temporary directory on target hosts.
- The remote shell must accept stdin.

mpirun.all is not available on HP-MPI for Linux or Windows.

The MPI-2 standard defines mpiexec as a simple method to start MPI applications. It supports fewer features than mpirun, but it is portable. mpiexec syntax has three formats.

mpiexec offers arguments similar to an MPI_Comm_spawn call, with arguments as shown in the following form:

mpiexec [-n maxprocs] [-soft ranges] [-host host] [-arch arch] [-wdir dir] [-path dirs] [-file file] command-args

For example:

% $MPI_ROOT/bin/mpiexec -n 8 ./myprog.x 1 2 3

creates an 8-rank MPI job on the local host consisting of 8 copies of the program myprog.x, each with the command line arguments 1, 2, and 3.

It also allows arguments like an MPI_Comm_spawn_multiple call, with a colon-separated list of arguments, where each component is like the form above. For example:

% $MPI_ROOT/bin/mpiexec -n 4 ./myprog.x : -host host2 -n \
  4 /path/to/myprog.x

creates an MPI job with 4 ranks on the local host and 4 on host2.

Finally, the third form allows the user to specify a file containing lines of data like the arguments in the first form.

mpiexec [-configfile file]

For example:

% $MPI_ROOT/bin/mpiexec -configfile cfile

gives the same results as in the second example, but using the -configfile option (assuming the file cfile contains the following lines):

-n 4 ./myprog.x
-host host2 -n 4 -wdir /some/path ./myprog.x
where mpiexec options are:

- -n maxprocs
Create maxprocs MPI ranks on the specified host.
- -soft range-list
Ignored in HP-MPI.
- -host host
Specifies the host on which to start the ranks.
- -arch arch
Ignored in HP-MPI.
- -wdir dir
Working directory for the created ranks.
- -path dirs
PATH environment variable for the created ranks.
- -file file
Ignored in HP-MPI.
- -configfile file
Specify a file of lines containing the above options. This option is used separately from the options above.
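Because the second and third mpiexec forms are equivalent, a colon-separated argument list can be converted into configfile lines by starting a new line at each standalone colon. The sketch below does this; colon_to_configfile is a hypothetical helper, and it assumes no individual argument contains whitespace, since the words are rejoined with single spaces.

```shell
# colon_to_configfile ARG...   (hypothetical helper)
# Splits an MPMD mpiexec argument list on standalone ":" words and prints
# one configfile line per component.
colon_to_configfile() {
    line=
    for arg in "$@"; do
        if [ "$arg" = ":" ]; then
            printf '%s\n' "$line"           # finish the current component
            line=
        else
            line="${line:+$line }$arg"      # append, separated by one space
        fi
    done
    [ -z "$line" ] || printf '%s\n' "$line"
}
```

For example, colon_to_configfile -n 4 ./myprog.x : -host host2 -n 4 /path/to/myprog.x > cfile writes a two-line file that can then be passed to mpiexec -configfile cfile.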
mpiexec does not support prun or srun startup. mpiexec is not available on HP-MPI V1.0 for Windows.

mpijob lists the HP-MPI jobs running on the system. mpijob can only be used for jobs started in appfile mode. Invoke mpijob on the same host as you initiated mpirun. mpijob syntax is shown below:

mpijob [-help] [-a] [-u] [-j id [id id ...]]

where

- -help
Prints usage information for the utility.
- -a
Lists jobs for all users.
- -u
Sorts jobs by user name.
- -j id
Provides process status for job id. You can list a number of job IDs in a space-separated list.

When you invoke mpijob, it reports the following information for each job:

- JOB
HP-MPI job identifier.
- USER
User name of the owner.
- NPROCS
Number of processes.
- PROGNAME
Program names used in the HP-MPI application.

By default, your jobs are listed by job ID in increasing order. However, you can specify the -a and -u options to change the default behavior. An mpijob output using the -a and -u options is shown below, listing jobs for all users and sorting them by user name.

JOB     USER     NPROCS  PROGNAME
22623   charlie  12      /home/watts
22573   keith    14      /home/richards
22617   mick     100     /home/jagger
22677   ron      4       /home/wood
When you specify the -j option, mpijob reports the following for each job:

- RANK
Rank for each process in the job.
- HOST
Host where the job is running.
- PID
Process identifier for each process in the job.
- LIVE
Indicates whether the process is running (an x is used) or has been terminated.
- PROGNAME
Program names used in the HP-MPI application.

mpijob does not support prun or srun startup. mpijob is not available on HP-MPI V1.0 for Windows.

mpiclean kills processes in HP-MPI applications started in appfile mode. Invoke mpiclean on the host on which you initiated mpirun.

The MPI library checks for abnormal termination of processes while your application is running. In some cases, application bugs can cause processes to deadlock and linger in the system. When this occurs, you can use mpijob to identify hung jobs and mpiclean to kill all processes in the hung application.

mpiclean syntax has two forms:

mpiclean [-help] [-v] -j id [id id ...]
mpiclean [-help] [-v] -m

where

- -help
Prints usage information for the utility.
- -v
Turns on verbose mode.
- -m
Cleans up your shared-memory segments.
- -j id
Kills the processes of job number id. You can specify multiple job IDs in a space-separated list. Obtain the job ID using the -j option when you invoke mpirun.

You can only kill jobs that are your own.

The second syntax is used when an application aborts during MPI_Init, and the termination of processes does not destroy the allocated shared-memory segments.

mpiclean does not support prun or srun startup. mpiclean is not available on HP-MPI V1.0 for Windows.

Interconnect support
HP-MPI supports a variety of high-speed interconnects. On HP-UX and Linux, HP-MPI will attempt to identify and use the fastest available high-speed interconnect by default. On Windows, the selection must be made explicitly by the user.

On HP-UX and Linux, the search order for the interconnect is determined by the environment variable MPI_IC_ORDER (which is a colon-separated list of interconnect names), and by command line options, which take higher precedence.

Table 3-3 Interconnect command line options

| command line option | protocol specified | applies to OS |
|---|---|---|
| -ibv / -IBV | IBV: OpenFabrics InfiniBand | Linux |
| -vapi / -VAPI | VAPI: Mellanox Verbs API | Linux |
| -udapl / -UDAPL | uDAPL: InfiniBand and some others | Linux |
| -psm / -PSM | PSM: QLogic InfiniBand | Linux |
| -mx / -MX | MX: Myrinet | Linux |
| -gm / -GM | GM: Myrinet | Linux |
| -elan / -ELAN | Quadrics Elan3 or Elan4 | Linux |
| -itapi / -ITAPI | ITAPI: InfiniBand | HP-UX |
| -ibal / -IBAL | IBAL: Windows IB Access Layer | Windows |
| -TCP | TCP/IP | All |
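The selection rules for these options can be summarized as follows: a command-line choice is prepended to MPI_IC_ORDER, the first available entry in the list is used, and an unavailable uppercase entry makes HP-MPI abort (the paragraphs below give the details). The sketch models those rules only; pick_interconnect is a hypothetical helper, and where HP-MPI really tests availability with dlopen/shl_load and the loaded-module check, the sketch substitutes a caller-supplied list of available names.

```shell
# pick_interconnect ORDER AVAILABLE   (hypothetical helper, illustration only)
#   ORDER      colon-separated search list, e.g. "ibv:vapi:udapl:psm:mx:gm:elan:tcp";
#              a command-line option is modeled by prepending it, e.g. "MX:ibv:...".
#   AVAILABLE  space-separated lowercase names assumed usable on this host.
# Prints the selected interconnect. Returns 1 if an uppercase (mandatory)
# entry is unavailable, or if nothing in ORDER is available.
pick_interconnect() {
    order=$1 available=" $2 "
    saved_ifs=$IFS
    IFS=:
    set -- $order                        # split ORDER on colons
    IFS=$saved_ifs
    for ic in "$@"; do
        lower=$(printf '%s' "$ic" | tr '[:upper:]' '[:lower:]')
        case $available in
            *" $lower "*) printf '%s\n' "$lower"; return 0 ;;
        esac
        if [ "$ic" != "$lower" ]; then   # uppercase entry: required, so abort
            printf 'interconnect %s unavailable, aborting\n' "$ic" >&2
            return 1
        fi
    done
    return 1
}
```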
The interconnect names used in MPI_IC_ORDER are
like the command line options above, but without the dash. On Linux,
the default value of MPI_IC_ORDER is ibv:vapi:udapl:psm:mx:gm:elan:tcp If command line options from the above table are used, the
effect is that the specified setting is implicitly prepended to
the MPI_IC_ORDER list, thus taking higher precedence
in the search. Interconnects specified in the command line or in the MPI_IC_ORDER variable
can be lower or upper case. Lower case means the interconnect will
be used if available. Upper case instructs HP-MPI to abort if the specified
interconnect is unavailable. The availability of an interconnect is determined based on
whether the relevant libraries can be dlopened / shl_loaded, and on whether a recognized module is loaded in Linux.
If either condition is not met, the interconnect is determined to be unavailable.

On Linux, the names and locations of the libraries to be opened, and the names of the recognized interconnect modules, are specified by a collection of environment variables contained in $MPI_ROOT/etc/hpmpi.conf. The hpmpi.conf file can be used for any environment variables, but arguably its most important use is to consolidate the environment variables related to interconnect selection. The default value of MPI_IC_ORDER is specified there, along with a collection of variables of the form:

    MPI_ICLIB_XXX__YYY
    MPI_ICMOD_XXX__YYY

where XXX is one of the interconnects (IBV, VAPI, etc.) and YYY is an arbitrary suffix. The MPI_ICLIB_* variables specify the names of libraries to be dlopened. The MPI_ICMOD_* variables specify regular expressions for the names of modules to search for. For example, the following two pairs of variables are defined for PSM:

    MPI_ICLIB_PSM__PSM_MAIN = libpsm_infinipath.so.1
    MPI_ICMOD_PSM__PSM_MAIN = "^ib_ipath "

and

    MPI_ICLIB_PSM__PSM_PATH = /usr/lib64/libpsm_infinipath.so.1
    MPI_ICMOD_PSM__PSM_PATH = "^ib_ipath "

The suffixes PSM_MAIN and PSM_PATH are
arbitrary, and represent two different attempts that will be made when determining whether the PSM interconnect is available. The list of suffixes is contained in the variable MPI_IC_SUFFIXES, which is also set in the hpmpi.conf file. So, when HP-MPI is determining the availability of the PSM interconnect, it will first look at

    MPI_ICLIB_PSM__PSM_MAIN
    MPI_ICMOD_PSM__PSM_MAIN

for the library to dlopen and the module name to look for. Then, if that fails, it will continue on to the next pair

    MPI_ICLIB_PSM__PSM_PATH
    MPI_ICMOD_PSM__PSM_PATH

which in this case specifies a full path to the PSM library.

The MPI_ICMOD_* variables allow relatively complex values to specify which module names will be considered as evidence that the specified interconnect is available. Consider the example

    MPI_ICMOD_VAPI__VAPI_MAIN = \
        "^mod_vapi " || "^mod_vip " || "^ib_core "

This means any of those three names will be accepted as evidence that VAPI is available. Each of the three strings is an individual regular expression that will be grepped for in the output from /sbin/lsmod.

In many cases, if a system has a high-speed interconnect that is not found by HP-MPI due to changes in library names and locations or module names, the problem can be fixed by simple edits to the hpmpi.conf file. Contacting HP-MPI support for assistance is encouraged.

Protocol-specific options and information

This section briefly describes the available interconnects and illustrates some of the more frequently used interconnect options. The environment variables and command line options mentioned below are described in more detail in “mpirun options” and “List of runtime environment variables”.

TCP/IP is supported on many types of cards. Machines often
have more than one IP address, and a user may wish to specify which interface is to be used to get the best performance. HP-MPI does not inherently know which IP address corresponds to the fastest available interconnect card. By default, IP addresses are selected based on the list returned by gethostbyname(). The mpirun option -netaddr can be used to gain more explicit control over which interface is used.

IBAL is only supported on Windows. Lazy deregistration is not supported with IBAL.

HP-MPI supports OpenFabrics V1.0 and V1.1. OpenFabrics is not supported on Itanium2 platforms.

In order to use OpenFabrics on Linux, the memory size for locking must be specified. It is controlled by the /etc/security/limits.conf file for Red Hat and the /etc/syscntl.conf file for SuSE:

    * soft memlock 524288
    * hard memlock 524288

The example above sets the maximum locked-in-memory address space in KB (524288 KB is 512 MB). The recommendation is to set the value to half of the physical memory.

Machines can have multiple InfiniBand cards. By default each
HP-MPI rank selects one card for its communication, and the ranks cycle through the available cards on the system, so the first rank uses the first card, the second rank uses the second card, and so on. The environment variable MPI_IB_CARD_ORDER can be used to control which card the different ranks select. Alternatively, for increased potential bandwidth and greater traffic balance between cards, each rank can be instructed to use multiple cards by using the variable MPI_IB_MULTIRAIL.

Lazy deregistration is a performance enhancement used by HP-MPI on several of the high-speed interconnects on Linux. This option is turned on by default, and requires the application to be linked in such a way that HP-MPI is able to intercept calls to malloc, munmap, etc. Most applications are linked that way, but if one is not, HP-MPI's lazy deregistration can be turned off with the command line option -ndd.

Some applications do not link directly against libmpi, and instead link against a wrapper library that is in turn linked against libmpi. In this case it is still possible for HP-MPI's malloc etc. interception to be used by supplying the --auxiliary option to the linker when creating the wrapper library, by using a compiler flag such as -Wl,--auxiliary,libmpi.so.

Note that dynamic linking is required with all InfiniBand use on Linux.

HP-MPI does not use the Connection Manager (CM) library with OpenFabrics.

The MPI_IB_CARD_ORDER card selection option and the -ndd option described above for IBV apply to VAPI.

The -ndd option described above for IBV applies to uDAPL.

The -ndd option described above for IBV applies to GM.

HP-MPI supports the Elan3 and Elan4 protocols for Quadrics. By default HP-MPI uses Elan collectives for broadcast and
barrier. If messages are outstanding at the time the Elan collective is entered, and the other side of the message enters a completion routine on the outstanding message before entering the collective call, the application can hang due to lack of message progression while inside the Elan collective. This is a rather uncommon situation in real applications, but if such hangs are observed, the use of Elan collectives can be disabled by setting the environment variable MPI_USE_LIBELAN=0.

On HP-UX, InfiniBand is available by using the ITAPI protocol, which requires MLOCK privileges. When setting up InfiniBand on an HP-UX system, all users (other than root) who wish to use InfiniBand need to have their group ID in the /etc/privgroup file, and the permissions for access must be enabled via:

    % setprivgrp -f /etc/privgroup

The above may be done automatically at boot time, but should also be performed once manually after setup of the InfiniBand drivers
to ensure access. For example:

    % grep user /etc/passwd
    user:UJqaKNCCsESLo,O.fQ:836:1007:User Name:/home/user:/bin/tcsh
    % grep 1007 /etc/group
    % cat /etc/privgroup
    ibusers MLOCK #add entries to /etc/privgroup
    % setprivgrp -f /etc/privgroup

A one-time setting can also be done using:

    /usr/sbin/setprivgrp <group> MLOCK

The above setting will not survive a reboot.

Interconnect selection examples

The default MPI_IC_ORDER will generally result in the fastest available protocol being used. The following example uses the default ordering and also supplies a -netaddr setting, in case TCP/IP is the only interconnect available.

    % echo $MPI_IC_ORDER
    ibv:vapi:udapl:psm:mx:gm:elan:tcp
    % export MPIRUN_SYSTEM_OPTIONS="-netaddr 192.168.1.0/24"
    % export MPIRUN_OPTIONS="-prot"
    % $MPI_ROOT/bin/mpirun -srun -n4 ./a.out

The command line for the above will appear to mpirun as

    $MPI_ROOT/bin/mpirun -netaddr 192.168.1.0/24 -prot -srun -n4 ./a.out

and the interconnect decision will look for IBV, then VAPI, etc. down to TCP/IP. If TCP/IP is chosen, it will use the 192.168.1.* subnet.

If TCP/IP is desired on a machine where other protocols are available, the -TCP option can be used. This example is like the previous one, except that TCP is searched for and found first (TCP should always be available), so TCP/IP is used instead of IBV, Elan, etc.

    % $MPI_ROOT/bin/mpirun -TCP -srun -n4 ./a.out

The following example output shows three runs on an Elan system;
first using Elan as the protocol, then using TCP/IP over GigE, then using TCP/IP over the Quadrics card.

This runs on Elan:

    [user@opte10 user]$ bsub -I -n3 -ext "SLURM[nodes=3]" $MPI_ROOT/bin/mpirun -prot -srun ./a.out
    Job <59304> is submitted to default queue <normal>.
    <<Waiting for dispatch ...>>
    <<Starting on lsfhost.localdomain>>
    Host 0 -- ELAN node 0 -- ranks 0
    Host 1 -- ELAN node 1 -- ranks 1
    Host 2 -- ELAN node 2 -- ranks 2

     host | 0    1    2
    ======|================
        0 : SHM  ELAN ELAN
        1 : ELAN SHM  ELAN
        2 : ELAN ELAN SHM

    Hello world! I'm 0 of 3 on opte6
    Hello world! I'm 1 of 3 on opte7
    Hello world! I'm 2 of 3 on opte8
This runs on TCP/IP over the GigE network configured as 172.20.x.x on eth0:

    [user@opte10 user]$ bsub -I -n3 -ext "SLURM[nodes=3]" $MPI_ROOT/bin/mpirun -prot -TCP -srun ./a.out
    Job <59305> is submitted to default queue <normal>.
    <<Waiting for dispatch ...>>
    <<Starting on lsfhost.localdomain>>
    Host 0 -- ip 172.20.0.6 -- ranks 0
    Host 1 -- ip 172.20.0.7 -- ranks 1
    Host 2 -- ip 172.20.0.8 -- ranks 2

     host | 0    1    2
    ======|================
        0 : SHM  TCP  TCP
        1 : TCP  SHM  TCP
        2 : TCP  TCP  SHM

    Hello world! I'm 0 of 3 on opte6
    Hello world! I'm 1 of 3 on opte7
    Hello world! I'm 2 of 3 on opte8
This uses TCP/IP over the Elan subnet, using the -TCP option in combination with the -netaddr option for the Elan interface 172.22.x.x:

    [user@opte10 user]$ bsub -I -n3 -ext "SLURM[nodes=3]" $MPI_ROOT/bin/mpirun -prot -TCP -netaddr 172.22.0.10 -srun ./a.out
    Job <59307> is submitted to default queue <normal>.
    <<Waiting for dispatch ...>>
    <<Starting on lsfhost.localdomain>>
    Host 0 -- ip 172.22.0.2 -- ranks 0
    Host 1 -- ip 172.22.0.3 -- ranks 1
    Host 2 -- ip 172.22.0.4 -- ranks 2

     host | 0    1    2
    ======|================
        0 : SHM  TCP  TCP
        1 : TCP  SHM  TCP
        2 : TCP  TCP  SHM

    Hello world! I'm 0 of 3 on opte2
    Hello world! I'm 1 of 3 on opte3
    Hello world! I'm 2 of 3 on opte4
Elan interface:

    [user@opte10 user]$ /sbin/ifconfig eip0
    eip0      Link encap:Ethernet  HWaddr 00:00:00:00:00:0F
              inet addr:172.22.0.10  Bcast:172.22.255.255  Mask:255.255.0.0
              UP BROADCAST RUNNING MULTICAST  MTU:65264  Metric:1
              RX packets:38 errors:0 dropped:0 overruns:0 frame:0
              TX packets:6 errors:0 dropped:3 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:1596 (1.5 Kb)  TX bytes:252 (252.0 b)

GigE interface:

    [user@opte10 user]$ /sbin/ifconfig eth0
    eth0      Link encap:Ethernet  HWaddr 00:00:1A:19:30:80
              inet addr:172.20.0.10  Bcast:172.20.255.255  Mask:255.0.0.0
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:133469120 errors:0 dropped:0 overruns:0 frame:0
              TX packets:135950325 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:24498382931 (23363.4 Mb)  TX bytes:29823673137 (28442.0 Mb)
              Interrupt:31
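The subnet arithmetic behind the third run can be checked with Python's standard ipaddress module. This sketch only parses the eip0 line shown above and tests which of the reported rank addresses fall inside that interface's subnet; it does not reproduce how mpirun itself interprets the -netaddr value, and `interface_network` is a hypothetical helper.

```python
import ipaddress
import re

# Relevant line of the eip0 (Elan) ifconfig output shown above.
EIP0 = "inet addr:172.22.0.10  Bcast:172.22.255.255  Mask:255.255.0.0"

# Rank addresses reported by -prot for the run using -TCP -netaddr 172.22.0.10.
RANK_ADDRESSES = ["172.22.0.2", "172.22.0.3", "172.22.0.4"]


def interface_network(ifconfig_line):
    """Build the IPv4 network from the 'inet addr:' and 'Mask:' fields."""
    addr = re.search(r"inet addr:(\S+)", ifconfig_line).group(1)
    mask = re.search(r"Mask:(\S+)", ifconfig_line).group(1)
    # ip_interface accepts a dotted netmask after the slash.
    return ipaddress.ip_interface(f"{addr}/{mask}").network


net = interface_network(EIP0)
print(net)  # 172.22.0.0/16
for ip in RANK_ADDRESSES:
    print(ip, ipaddress.ip_address(ip) in net)  # True for all three
```

All three rank addresses from the -netaddr run lie in 172.22.0.0/16, the subnet of the Elan interface eip0, whereas the 172.20.0.x addresses from the GigE run do not.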