The following sections provide definitions of mpirun options and runtime environment variables.
mpirun options
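This section describes the specific options included in <mpirun_options> for all of the preceding examples. They
are listed by category: interconnect selection, local host communication method, launching specifications, application bitness specification, debugging and informational, MPI-2 functionality, environment control, and special HP-MPI mode.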
Interconnect selection options
- -elan/-ELAN
Explicit command line interconnect selection to
use Quadrics Elan (available on Linux only). The lower case option
is taken as advisory and indicates that the interconnect should
be used if it is available. The upper case option is taken as mandatory
and instructs MPI to abort if the interconnect is unavailable. The
interaction between these options and the related MPI_IC_ORDER variable
is that any command line interconnect selection here is implicitly
prepended to MPI_IC_ORDER.
- -gm/-GM
Explicit command line interconnect selection to
use Myrinet GM (available on Linux only). The lower and upper case
options are analogous to the Elan options (explained above).
- -ibal/-IBAL
Explicit command line interconnect selection to
use the Windows IB Access Layer (available on Windows only). The
lower and upper case options are analogous to the Elan options (explained
above).
- -ibv/-IBV
Explicit command line interconnect selection to
use OpenFabrics InfiniBand (available on Linux only). The lower
and upper case options are analogous to the Elan options (explained
above).
- -itapi/-ITAPI
Explicit command line interconnect selection to
use ITAPI (available on HP-UX only). The lower and upper case options
are analogous to the Elan options (explained above).
- -mx/-MX
Explicit command line interconnect selection to
use Myrinet MX (available on Linux only). The lower and upper case
options are analogous to the Elan options (explained above).
- -psm/-PSM
Explicit command line interconnect selection to
use QLogic InfiniBand (available on Linux only). The lower and upper
case options are analogous to the Elan options (explained above).
- -TCP
Specifies that TCP/IP should be used instead of another
high-speed interconnect. If you have multiple TCP/IP interconnects,
use -netaddr to specify which one to use. Use -prot to
see which one was selected. Example:
% $MPI_ROOT/bin/mpirun -TCP -srun -N8 ./a.out
- -udapl/-UDAPL
Explicit command line interconnect selection to
use uDAPL (available on Linux only). The lower and upper case options
are analogous to the Elan options (explained above). Dynamic linking is required with uDAPL. Do not link -static.
- -vapi/-VAPI
Explicit command line interconnect selection to
use Mellanox Verbs API (available on Linux only). The lower and
upper case options are analogous to the Elan options (explained
above). Dynamic linking is required with VAPI. Do not link -static.
- -commd
Routes all off-host communication through daemons rather
than between processes.
Local host communication method
- -intra=mix
Use shared memory for small messages. The default size limit for small messages is 256k bytes, or the value set by MPI_RDMA_INTRALEN.
For larger messages, the interconnect is used for better bandwidth. This option does not work with TCP, Elan, MX, or PSM. The same functionality is available through the environment
variable MPI_INTRA, which can be set to shm, nic, or mix.
- -intra=nic
Use the interconnect for all intra-host data transfers. (Not
recommended for high performance solutions.)
- -intra=shm
Use shared memory for all intra-host data transfers. This
is the default.
- -netaddr
This option is similar to -subnet, but allows finer control of the selection process for TCP/IP connections. MPI
has two main sets of connections: those between ranks and/or daemons, where all the real message traffic occurs, and those between mpirun and the daemons, where little traffic occurs (but which are
still necessary).
The -netaddr option can be used to specify a single IP/mask to use for both of these purposes, or to specify them
individually. The latter might be needed if mpirun is run on a remote machine that doesn't have
access to the same Ethernet network as the rest of the cluster.
To specify both, the syntax is -netaddr IP-specification[/mask]. To specify them individually, it is -netaddr mpirun:spec,rank:spec. The string launch: can be used in place of mpirun:.
The IP-specification can be a numeric IP address like 172.20.0.1 or a hostname. If a hostname is used, the value will be
the first IP address returned by gethostbyname().
The optional mask can be specified as a dotted quad, or as a number representing how many bits are to be
matched. For example, a mask of "11" is equivalent to a mask of "255.224.0.0".
If an IP and mask are given, it is expected that one and only one IP will match at each lookup. An error or warning is
printed as appropriate if there are no matches, or too many. If no mask is specified, the IP matching is done by
the longest matching prefix.
This functionality can also be accessed using the environment variable MPI_NETADDR.
- -subnet
Allows the user to select which interconnect is used by default for TCP/IP communication. The interconnect is chosen
by using the subnet associated with the hostname or IP address specified
with -subnet.
% $MPI_ROOT/bin/mpirun -subnet <hostname-or-IP-address>
This option will be deprecated in favor of -netaddr in
a future release.
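The two mask forms accepted by -netaddr (a bit count or a dotted quad) can be illustrated with a short sketch. This is a minimal Python illustration built on the standard ipaddress module, not the HP-MPI implementation; the helper names prefix_to_dotted_quad and matches are hypothetical and exist only for this example.

```python
import ipaddress


def prefix_to_dotted_quad(bits: int) -> str:
    """Convert a bit count such as 11 into a dotted-quad mask."""
    network = ipaddress.ip_network(f"0.0.0.0/{bits}", strict=False)
    return str(network.netmask)


def matches(candidate: str, spec: str, mask: str) -> bool:
    """Return True if candidate falls inside spec/mask.

    mask may be a bit count ("11") or a dotted quad ("255.224.0.0").
    """
    # strict=False allows spec to be a host address rather than a network base.
    network = ipaddress.ip_network(f"{spec}/{mask}", strict=False)
    return ipaddress.ip_address(candidate) in network


print(prefix_to_dotted_quad(11))                            # 255.224.0.0
print(matches("172.20.5.9", "172.20.0.1", "11"))            # True
print(matches("172.20.5.9", "172.20.0.1", "255.224.0.0"))   # True
print(matches("10.1.2.3", "172.20.0.1", "11"))              # False
```

Both mask spellings select the same set of addresses, which is why either form can be given after the slash.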
Launching specifications options
Options for LSF users
These options launch ranks as they are in appfile mode on
the hosts specified in the environment variable.
- -lsb_hosts
Launches the same executable across multiple hosts. Uses
the list of hosts in the environment variable $LSB_HOSTS.
Can be used with the -np option.
- -lsb_mcpu_hosts
Launches the same executable across multiple hosts. Uses
the list of hosts in the environment variable $LSB_MCPU_HOSTS.
Can be used with the -np option.
Options for prun users
- -prun
Enables start-up with Elan usage. Only supported when
linking with shared libraries. Some features like mpirun -stdio processing are unavailable. The -np option
is not allowed with -prun. Any arguments on the mpirun command line that follow -prun are passed down to the prun command.
Options for SLURM users
- -srun
Enables start-up on XC clusters. Some features like mpirun -stdio processing are unavailable. The -np option
is not allowed with -srun. Any arguments on the mpirun command line that follow -srun are passed to the srun command. Start-up directly from the srun command is not supported.
- -f appfile
Specifies the appfile that mpirun parses to get program and process count information
for the run. Refer to “Creating
an appfile” for details about setting up your appfile.
- -hostfile <filename>
Launches the same executable across multiple hosts. Filename
is a text file with hostnames separated by spaces or new lines.
Can be used with the -np option.
- -hostlist <list>
Launches the same executable across multiple hosts. Can
be used with the -np option. This hostlist may
be delimited with spaces or commas. Hosts can be followed with an
optional rank count, which is delimited from the hostname with either
a space or colon. If spaces are used as delimiters anywhere in the hostlist,
it may be necessary to place the entire hostlist inside quotes to
prevent the command shell from interpreting it as multiple options.
- -h host
Specifies a host on which to start the processes
(default is local_host). Only applicable when running in single host mode
(mpirun -np ...). Refer to the -hostlist option which
provides more flexibility.
- -l user
Specifies the username on the target host (default
is local username). -l is not available on HP-MPI for Windows.
- -np #
Specifies the number of processes to run. Generally used
in single host mode, but also valid with -hostfile, -hostlist, -lsb_hosts,
and -lsb_mcpu_hosts.
- -stdio=[options]
Specifies standard IO options. Refer to “External
input and output” for more information on standard
IO, as well as a complete list of stdio options. This applies to
appfiles only.
- -cpu_bind
Binds a rank to an ldom to prevent a process from moving
to a different ldom after startup. Refer to “CPU
binding” for details on how to use this option.
Application bitness specification
- -mpi32
Option for running on Opteron and Intel®64. Should be used to indicate
the bitness of the application to be invoked so that the availability
of interconnect libraries can be properly determined by the HP-MPI utilities mpirun and mpid. The default is -mpi64.
- -mpi64
Option for running on Opteron and Intel®64. Should be used to indicate
the bitness of the application to be invoked so that the availability
of interconnect libraries can be properly determined by the HP-MPI utilities mpirun and mpid. The default is -mpi64.
Debugging and informational options
- -help
Prints usage information for mpirun.
- -version
Prints the major and minor version numbers.
- -prot
Prints the communication protocol between each host (e.g.
TCP/IP or shared memory). The exact format and content presented
by this option is subject to change as new interconnects and communication
protocols are added to HP-MPI.
- -ck
Behaves like the -p option, but supports two additional checks of
your MPI application; it checks if the specified host machines and
programs are available, and also checks for access or permission
problems. This option is only supported when using appfile mode.
- -d
Debug mode. Prints additional information about application
launch.
- -j
Prints the HP-MPI job ID.
- -p
Turns on pretend mode. That is, the system goes through
the motions of starting an HP-MPI application but does not create
processes. This is useful for debugging and checking whether the
appfile is set up correctly. This option is for appfiles only.
- -v
Turns on verbose mode.
- -i spec
Enables runtime instrumentation profiling for all processes. spec specifies options used when profiling. The options
are the same as those for the environment variable MPI_INSTR. For example, the following is a valid command
line:
% $MPI_ROOT/bin/mpirun -i mytrace:l:nc -f appfile
Refer to “MPI_INSTR” for an explanation of -i options.
- -T
Prints user and system times for each MPI rank.
- -tv
Specifies that the application runs with the TotalView® debugger for LSF-launched
applications. TV is only supported on XC systems.
- -dd
Use deferred deregistration when registering and deregistering
memory for RDMA message transfers. The default is to use deferred
deregistration. Note that using this option also produces a statistical
summary of the deferred deregistration activity when MPI_Finalize is
called. The option is ignored if the underlying interconnect does
not use an RDMA transfer mechanism, or if the deferred deregistration
is managed directly by the interconnect library.
Occasionally deferred deregistration is incompatible with
a particular application or negatively impacts performance. Use -ndd to
disable this feature if necessary.
Deferred deregistration of memory on RDMA networks is not
supported on HP-MPI for Windows.
- -ndd
Disables the use of deferred deregistration. Refer
to the -dd option for more information.
- -rdma
Specifies the use of envelope pairs for short message transfer.
The amount of pre-pinned memory increases continuously with the job size.
- -srq
Specifies use of the shared receiving queue protocol when
OpenFabrics, Myrinet GM, ITAPI, Mellanox VAPI or uDAPL V1.2 interfaces
are used. This protocol uses less pre-pinned memory for short message transfers.
For more information, refer to “Scalability”.
MPI-2 functionality options
- -1sided
Enables one-sided communication. Extends the communication
mechanism of HP-MPI by allowing one process to specify all communication
parameters, both for the sending side and for the receiving side.
The best performance is achieved if an RDMA-enabled interconnect,
like InfiniBand, is used. With this interconnect, the memory for
the one-sided windows can come from MPI_Alloc_mem or
from malloc. If TCP/IP is used, the performance will be lower, and
in that case the memory for the one-sided windows must come from MPI_Alloc_mem.
- -spawn
Enables dynamic processes. See “Dynamic
Processes” for more information.
Environment control options
- -e var[=val]
Sets the environment variable var for the program and gives it the value val if provided. Environment variable substitutions (for
example, $FOO) are supported in the val argument. To append additional settings
to an existing variable, %VAR can be used as in the example in “Setting
remote environment variables”.
- -sp paths
Sets the target shell PATH environment variable
to paths. Search paths are separated by a colon.
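The %VAR append form of -e can be pictured as a textual substitution. The sketch below is a minimal Python model of that behavior, not HP-MPI code: the function name expand_percent_vars, the sample existing settings, and the choice to treat an unset variable as an empty string are all assumptions made for illustration.

```python
import re

# A %NAME reference ends at the first character that cannot be part of a name,
# so "%MPI_FLAGS,y" and "%LD_LIBRARY_PATH:/x" both resolve cleanly.
_PERCENT_VAR = re.compile(r"%([A-Za-z_][A-Za-z0-9_]*)")


def expand_percent_vars(value: str, existing: dict) -> str:
    """Replace each %NAME in value with existing[NAME] (empty if unset)."""
    return _PERCENT_VAR.sub(lambda m: existing.get(m.group(1), ""), value)


# Hypothetical settings already in effect, for example from an appfile.
existing = {"MPI_FLAGS": "z", "LD_LIBRARY_PATH": "/usr/lib"}

print(expand_percent_vars("%MPI_FLAGS,y", existing))
# z,y
print(expand_percent_vars("%LD_LIBRARY_PATH:/path/to/third/party/lib", existing))
# /usr/lib:/path/to/third/party/lib
```

Without the %VAR reference, the new value simply replaces the old one.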
Special HP-MPI mode option
- -ha
Eliminates a teardown when ranks exit abnormally. Further
communications involving ranks that are unreachable return error
class MPI_ERR_EXITED, but do not force the application
to tear down, as long as the MPI_Errhandler is set
to MPI_ERRORS_RETURN. Some restrictions apply:
- Communication is done via TCP/IP (shared memory is not used for intranode communication).
- Cannot be used with the diagnostic library.
- Cannot be used with the -i option.
The following are specific mpirun command line options for Windows CCP users.
- -ccp
Indicates that the job is being submitted through
the Windows CCP job scheduler/launcher. This is the recommended
method for launching jobs. Required when the user doesn’t
provide an appfile.
- -ccperr <filename>
Assigns the job’s standard error file to
the given filename when starting a job through the Windows CCP automatic
job scheduler/launcher feature of HP-MPI. This flag has no effect
if used for an existing CCP job.
- -ccpin <filename>
Assigns the job’s standard input file to
the given filename when starting a job through the Windows CCP automatic
job scheduler/launcher feature of HP-MPI. This flag has no effect
if used for an existing CCP job.
- -ccpout <filename>
Assigns the job’s standard output file
to the given filename when starting a job through the Windows CCP
automatic job scheduler/launcher feature of HP-MPI. This flag has
no effect if used for an existing CCP job.
- -ccpwait
Causes the mpirun command to wait for the CCP job to finish before returning
to the command prompt when starting a job through the automatic job
submittal feature of HP-MPI. By default, mpirun automatic jobs do not wait. This flag has no effect
if used for an existing CCP job.
- -headnode <headnode>
This option is used on Windows CCP to indicate the headnode
to which the mpirun job is submitted. If omitted, localhost is used. This option can
only be used as a command line option when using the mpirun automatic submittal functionality.
- -hosts
This option, used on Windows CCP, allows you to specify a
node list to HP-MPI. Ranks are scheduled according to the host list.
The nodes in the list must be in the job allocation or a scheduler
error will occur. The HP-MPI program %MPI_ROOT%\bin\mpi_nodes.exe returns a string in the proper -hosts format with scheduled
job resources.
- -jobid <job-id>
This flag used on Windows CCP will schedule
an HP-MPI job as a task to an existing job. It will submit the command
as a single CPU mpirun task to the existing job indicated by the parameter
job-id. This option can only be used as a command line option when using
the mpirun automatic submittal functionality.
- -nodex
Used on Windows CCP in addition to -ccp to
indicate that only one rank is to be used per node, regardless of the
number of CPUs allocated with each host.
Runtime environment variables
Environment variables are used to alter the way HP-MPI executes
an application. The variable settings determine how an application
behaves and how an application allocates internal resources at runtime.
Many applications run without setting any environment variables. However,
applications that use a large number of nonblocking messaging requests,
require debugging support, or need to control process placement
may need a more customized configuration.
Launching methods influence how environment variables are propagated.
To ensure that environment variables are propagated to remote hosts, specify
each variable in an appfile using the -e option. See “Creating
an appfile” for more information.
Setting environment variables on the command line for HP-UX and Linux
Environment variables can be set globally on the mpirun command line. Command line options take precedence
over environment variables. For example, on HP-UX and Linux:
% $MPI_ROOT/bin/mpirun -e MPI_FLAGS=y -f appfile
In the above example, if some MPI_FLAGS setting
was specified in the appfile, then the global setting on the command
line would override the setting in the appfile. To add to an environment
variable rather than replacing it, use %VAR as in the following command:
% $MPI_ROOT/bin/mpirun -e MPI_FLAGS=%MPI_FLAGS,y -f appfile
In the above example, if the appfile specified MPI_FLAGS=z,
then the resulting MPI_FLAGS seen by the application
would be z,y.
% $MPI_ROOT/bin/mpirun -e LD_LIBRARY_PATH=%LD_LIBRARY_PATH:/path/to/third/party/lib -f appfile
In the above example, the user is appending to LD_LIBRARY_PATH.
List of runtime environment variables
The environment variables that affect the behavior of HP-MPI
at runtime are described in the following sections, categorized by
function (for example, general, CPU bind, miscellaneous, and diagnostic/debug).
All environment variables are listed below in alphabetical order.
General environment variables
MPIRUN_OPTIONS is a mechanism for specifying additional
command line arguments to mpirun. If this environment variable is set, then any mpirun command will behave as if the arguments in MPIRUN_OPTIONS had
been specified on the mpirun command line. For example:
% export MPIRUN_OPTIONS="-v -prot"
% $MPI_ROOT/bin/mpirun -np 2 /path/to/program.x
would be equivalent to running
% $MPI_ROOT/bin/mpirun -v -prot -np 2 /path/to/program.x
When settings are supplied on the command line, in the MPIRUN_OPTIONS variable,
and in an hpmpi.conf file, the resulting command line is as if the hpmpi.conf settings had appeared first, followed by the MPIRUN_OPTIONS settings,
followed by the actual command line. Because the settings are
parsed left to right, the later settings have higher
precedence than the earlier ones.
MPI_FLAGS modifies the
general behavior of HP-MPI. The MPI_FLAGS syntax is a comma separated list as follows:
[edde,][exdb,][egdb,][eadb,][ewdb,][l,][f,][i,] [s[a|p][#],][y[#],][o,][+E2,][C,][D,][E,][T,][z]
where
- edde
Starts the application under the dde debugger. The debugger
must be in the command search path. See “Debugging
HP-MPI applications” for more information.
- exdb
Starts the application under the xdb debugger. The debugger
must be in the command search path. See “Debugging
HP-MPI applications” for more information.
- egdb
Starts the application under the gdb debugger. The debugger
must be in the command search path. See “Debugging
HP-MPI applications” for more information.
- eadb
Starts the application under adb, the absolute debugger.
The debugger must be in the command search path. See “Debugging
HP-MPI applications” for more information.
- ewdb
Starts the application under the wdb debugger. The debugger
must be in the command search path. See “Debugging
HP-MPI applications” for more information.
- epathdb
Starts the application under the path debugger.
The debugger must be in the command search path. See “Debugging
HP-MPI applications” for more information.
- l
Reports memory leaks caused by not freeing memory allocated
when an HP-MPI job is run. For example, when you create a new communicator
or user-defined datatype after you call MPI_Init, you must free the memory allocated to these objects
before you call MPI_Finalize. In C, this is analogous to making calls to malloc()
and free() for each object created during program execution.
Setting the l option may decrease application performance.
- f
Forces MPI errors to be fatal. Using the f option
sets the MPI_ERRORS_ARE_FATAL error handler, ignoring
the programmer’s choice of error handlers. This option
can help you detect nondeterministic error problems in your code.
If your code has a customized error handler that does not
report that an MPI call failed, you will not know that a failure
occurred. Thus your application could be catching an error with
a user-written error handler (or with MPI_ERRORS_RETURN)
which masks a problem.
- i
Turns on language interoperability concerning the MPI_BOTTOM constant.
MPI_BOTTOM language interoperability: Previous versions
of HP-MPI were not compliant with Section 4.12.6.1 of the MPI-2
Standard, which requires that sends/receives based at MPI_BOTTOM on
a data type created with absolute addresses must access the same data
regardless of the language in which the data type was created. If
compliance with the standard is desired, set MPI_FLAGS=i
to turn on language interoperability concerning the MPI_BOTTOM constant. Compliance with the standard can break source compatibility
with some MPICH code.
- s[a|p][#]
Selects signal and maximum time delay for guaranteed message
progression. The sa option selects SIGALRM. The sp option selects SIGPROF. The # option is the number of seconds to wait before issuing
a signal to trigger message progression. The default value for the MPI
library is sp0, which never issues a progression related signal.
If the application uses both signals for its own purposes, you cannot
enable the heart-beat signals.
This mechanism may be used to guarantee message progression
in applications that use nonblocking messaging requests followed
by prolonged periods of time in which HP-MPI routines are not called.
Generating a UNIX signal introduces a performance penalty
every time the application processes are interrupted. As a result,
while some applications will benefit from it, others may experience
a decrease in performance. As part of tuning the performance of
an application, you can control the behavior of the heart-beat signals
by changing their time period or by turning them off. This is accomplished
by setting the time period of the s option in the MPI_FLAGS environment
variable (for example: s600). Time is in seconds.
You can use the s[a][p]# option with the thread-compliant library as well
as the standard non thread-compliant library. Setting s[a][p]# for
the thread-compliant library has the same effect as setting MPI_MT_FLAGS=ct when you use a value greater than 0 for #. The
default value for the thread-compliant library is sp0. MPI_MT_FLAGS=ct takes priority over the default MPI_FLAGS=sp0.
Refer to “MPI_MT_FLAGS” and “Thread-compliant
library” for additional information.
Set MPI_FLAGS=sa1 to guarantee that MPI_Cancel works
for canceling sends.
To use gprof on XC systems, set two environment variables:
MPI_FLAGS=s0
GMON_OUT_PREFIX=/tmp/app/name
These options are ignored on HP-MPI for Windows.
- y[#]
Enables spin-yield logic. # is the spin value and is an integer between zero
and 10,000. The spin value specifies the number of milliseconds
a process should block waiting for a message before yielding the
CPU to another process.
How you apply spin-yield logic depends on how well synchronized
your processes are. For example, if you have a process that wastes
CPU time blocked, waiting for messages, you can use spin-yield to
ensure that the process relinquishes the CPU to other processes.
Do this in your appfile, by setting y[#] to y0 for the process in question. This specifies zero
milliseconds of spin (that is, immediate yield).
If you are running an application stand-alone on a dedicated
system, the default setting, MPI_FLAGS=y,
allows MPI to busy spin, thereby improving latency. To avoid unnecessary
CPU consumption when using more ranks than cores, consider using
a setting such as MPI_FLAGS=y40.
Specifying y without a spin value is equivalent to MPI_FLAGS=y10000, which is the default.
NOTE: Except when using srun or prun to launch, if the ranks under a single mpid exceed the
number of CPUs on the node and a value of MPI_FLAGS=y is not specified, the default is changed to MPI_FLAGS=y0.
If the time a process is blocked waiting for messages is short,
you can possibly improve performance by setting a spin value (between
0 and 10,000) that ensures the process does not relinquish the CPU
until after the message is received, thereby reducing latency.
The system treats a nonzero spin value as a recommendation
only. It does not guarantee that the value you specify is used.
- o
The o option writes an optimization report to stdout. MPI_Cart_create and MPI_Graph_create optimize the mapping of processes onto the virtual
topology only if rank reordering is enabled (set reorder=1).
In the declaration statement below, see reorder=1:
int numtasks, rank, source, dest, outbuf, i, tag=1,
    inbuf[4]={MPI_PROC_NULL,MPI_PROC_NULL,MPI_PROC_NULL,MPI_PROC_NULL,},
    nbrs[4], dims[2]={4,4}, periods[2]={0,0}, reorder=1, coords[2];
For example:
/opt/mpi/bin/mpirun -np 16 -e MPI_FLAGS=o ./a.out
Reordering ranks for the call
MPI_Cart_create(comm(size=16), ndims=2, dims=[4 4], periods=[false false], reorder=true)
Default mapping of processes would result communication paths
between hosts : 0
between subcomplexes : 0
between hypernodes : 0
between CPUs within a hypernode/SMP: 24
Reordered mapping results communication paths
between hosts : 0
between subcomplexes : 0
between hypernodes : 0
between CPUs within a hypernode/SMP: 24
Reordering will not reduce overall communication cost.
Void the optimization and adopted unreordered mapping.
rank= 2 coords= 0 2 neighbors(u,d,l,r)= -1 6 1 3
rank= 0 coords= 0 0 neighbors(u,d,l,r)= -1 4 -1 1
rank= 1 coords= 0 1 neighbors(u,d,l,r)= -1 5 0 2
rank= 10 coords= 2 2 neighbors(u,d,l,r)= 6 14 9 11
rank= 2 inbuf(u,d,l,r)= -1 6 1 3
rank= 6 coords= 1 2 neighbors(u,d,l,r)= 2 10 5 7
rank= 7 coords= 1 3 neighbors(u,d,l,r)= 3 11 6 -1
rank= 4 coords= 1 0 neighbors(u,d,l,r)= 0 8 -1 5
rank= 0 inbuf(u,d,l,r)= -1 4 -1 1
rank= 5 coords= 1 1 neighbors(u,d,l,r)= 1 9 4 6
rank= 11 coords= 2 3 neighbors(u,d,l,r)= 7 15 10 -1
rank= 1 inbuf(u,d,l,r)= -1 5 0 2
rank= 14 coords= 3 2 neighbors(u,d,l,r)= 10 -1 13 15
rank= 9 coords= 2 1 neighbors(u,d,l,r)= 5 13 8 10
rank= 13 coords= 3 1 neighbors(u,d,l,r)= 9 -1 12 14
rank= 15 coords= 3 3 neighbors(u,d,l,r)= 11 -1 14 -1
rank= 10 inbuf(u,d,l,r)= 6 14 9 11
rank= 12 coords= 3 0 neighbors(u,d,l,r)= 8 -1 -1 13
rank= 8 coords= 2 0 neighbors(u,d,l,r)= 4 12 -1 9
rank= 3 coords= 0 3 neighbors(u,d,l,r)= -1 7 2 -1
rank= 6 inbuf(u,d,l,r)= 2 10 5 7
rank= 7 inbuf(u,d,l,r)= 3 11 6 -1
rank= 4 inbuf(u,d,l,r)= 0 8 -1 5
rank= 5 inbuf(u,d,l,r)= 1 9 4 6
rank= 11 inbuf(u,d,l,r)= 7 15 10 -1
rank= 14 inbuf(u,d,l,r)= 10 -1 13 15
rank= 9 inbuf(u,d,l,r)= 5 13 8 10
rank= 13 inbuf(u,d,l,r)= 9 -1 12 14
rank= 15 inbuf(u,d,l,r)= 11 -1 14 -1
rank= 8 inbuf(u,d,l,r)= 4 12 -1 9
rank= 12 inbuf(u,d,l,r)= 8 -1 -1 13
rank= 3 inbuf(u,d,l,r)= -1 7 2 -
- +E2
Sets -1 as the value of .TRUE. and 0 as the value for .FALSE.
when returning logical values from HP-MPI routines called within
Fortran 77 applications.
- C
Disables ccNUMA support. Allows you to treat the system as a symmetric multiprocessor (SMP).
- D
Dumps shared memory configuration information. Use this option to get
shared memory values that are useful when you want to set the MPI_SHMEMCNTL flag.
- E[on|off]
Function parameter error checking is turned off by default.
It can be turned on by setting MPI_FLAGS=Eon.
- T
Prints the user and system times for each MPI rank.
- z
Enables zero-buffering mode. Set this flag to convert MPI_Send and MPI_Rsend calls in your code to MPI_Ssend, without rewriting your code.
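The default rules for the y[#] spin value described above can be summarized in a few lines. This is a minimal Python sketch of those rules as stated in this section, not the logic HP-MPI actually runs; the function name effective_spin_ms and its parameters are hypothetical, and it assumes that "a value of MPI_FLAGS=y is not specified" means no y token appears in MPI_FLAGS at all.

```python
import re


def effective_spin_ms(mpi_flags, ranks_on_node, cpus_on_node, srun_or_prun=False):
    """Return the spin value (in milliseconds) implied by an MPI_FLAGS string."""
    for token in (mpi_flags or "").split(","):
        m = re.fullmatch(r"y(\d*)", token.strip())
        if m:
            # A bare "y" is equivalent to y10000.
            spin = int(m.group(1)) if m.group(1) else 10000
            if spin > 10000:
                raise ValueError("spin value must be between 0 and 10000")
            return spin
    # No y setting given: an oversubscribed node falls back to immediate
    # yield (y0), except when launching with srun or prun.
    if not srun_or_prun and ranks_on_node > cpus_on_node:
        return 0
    return 10000


print(effective_spin_ms("y40", 8, 8))        # 40
print(effective_spin_ms("l,f,y", 16, 8))     # 10000
print(effective_spin_ms(None, 16, 8))        # 0
print(effective_spin_ms(None, 16, 8, True))  # 10000
```

As noted for the y option, a nonzero result is only a recommendation to the system.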
MPI_MT_FLAGS controls runtime options when you use the thread-compliant
version of HP-MPI. The MPI_MT_FLAGS syntax is a comma separated list as follows:
[ct,][single,][fun,][serial,][mult]
where
- ct
Creates a hidden communication thread for each rank in
the job. When you enable this option, be careful not to oversubscribe
your system. For example, if you enable ct for a 16-process application running on a 16-way
machine, the result will be a 32-way job.
- single
Asserts that only one thread executes.
- fun
Asserts that a process can be multithreaded, but
only the main thread makes MPI calls (that is, all calls are funneled
to the main thread).
- serial
Asserts that a process can be multithreaded, and multiple
threads can make MPI calls, but calls are serialized (that is, only
one call is made at a time).
- mult
Asserts that multiple threads can call MPI at any
time with no restrictions.
Setting MPI_MT_FLAGS=ct has the same effect as setting MPI_FLAGS=s[a][p]# when the value of # is greater than 0. MPI_MT_FLAGS=ct takes priority over the default MPI_FLAGS=sp0 setting. Refer to “MPI_FLAGS”.
The single, fun, serial, and mult options are mutually exclusive. For example, if
you specify the serial and mult options in MPI_MT_FLAGS, only the last option
specified is processed (in this case, the mult option). If no runtime option is specified, the
default is mult.
For more information about using MPI_MT_FLAGS with the thread-compliant library, refer
to “Thread-compliant
library”.
MPI_ROOT indicates the location of the HP-MPI tree. If
you move the HP-MPI installation directory from its default location
in /opt/mpi for HP-UX and /opt/hpmpi for Linux, set the MPI_ROOT environment
variable to point to the new location. See “Directory
structure for HP-UX and Linux” for more information.
MPI_WORKDIR changes the
execution directory. This variable is ignored when srun or prun is used.
CPU Bind environment variables
MPI_BIND_MAP allows specification of the integer
CPU numbers, ldom numbers, or CPU masks. These are a list of integers separated
by commas (,).
MPI_CPU_AFFINITY is an alternative method to using
-cpu_bind on the command line for specifying binding strategy. The possible
settings are LL, RANK, MAP_CPU, MASK_CPU, LDOM, CYCLIC, BLOCK, RR,
FILL, PACKED, SLURM, and MAP_LDOM.
MPI_CPU_SPIN allows selection of spin value.
The default is 2 seconds.
MPI_FLUSH_FCACHE clears the file-cache (buffer-cache).
Add "-e MPI_FLUSH_FCACHE[=x]" to the mpirun command line and the file-cache will be flushed before
the code starts, where =x is an optional percent
of memory at which to flush. If the memory in the file-cache is
greater than x, the memory is flushed. The default value is 0 (in
which case a flush is always performed). Only the lowest rank# on
each host flushes the file-cache; limited to one flush per host/job.
Setting this environment variable saves time if, for example,
the file-cache is currently using 8% of the memory and =x is
set to 10. In this case, no flush is performed.
Example output:
MPI_FLUSH_FCACHE set to 0, fcache pct = 22, attempting to flush fcache on host opteron2
MPI_FLUSH_FCACHE set to 10, fcache pct = 3, no fcache flush required on host opteron2
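The threshold rule behind the two output lines above can be written as a one-line decision. This is a minimal Python sketch of the rule as described for MPI_FLUSH_FCACHE, not HP-MPI source code; the function name should_flush and its parameter names are hypothetical.

```python
def should_flush(fcache_pct: int, threshold_pct: int = 0) -> bool:
    """Decide whether the file-cache would be flushed on a host.

    fcache_pct    -- percent of memory currently held by the file-cache
    threshold_pct -- the optional =x value; 0 (the default) always flushes
    """
    return threshold_pct == 0 or fcache_pct > threshold_pct


print(should_flush(22, 0))   # True  (matches "attempting to flush fcache")
print(should_flush(3, 10))   # False (matches "no fcache flush required")
print(should_flush(8, 10))   # False (the 8% example in the text)
```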
During the flush, memory is allocated with mmap, then munmap'd afterwards.
MP_GANG enables
gang scheduling on HP-UX systems only. Gang scheduling improves
the latency for synchronization by ensuring that all runnable processes
in a gang are scheduled simultaneously. Processes waiting at a barrier,
for example, do not have to wait for processes that are not currently
scheduled. This proves most beneficial for applications with frequent
synchronization operations. Applications with infrequent synchronization,
however, may perform better if gang scheduling is disabled.
Process priorities for gangs are managed identically to timeshare policies.
The timeshare priority scheduler determines when to schedule a gang
for execution. While it is likely that scheduling a gang will preempt one
or more higher priority timeshare processes, the gang-schedule policy
is fair overall. In addition, gangs are scheduled for a single time slice,
which is the same for all processes in the system.
MPI processes are allocated statically at the beginning of
execution. As an MPI process creates new threads, they are all added
to the same gang if MP_GANG is enabled.
The MP_GANG syntax is as follows:
[ON|OFF]
where
- ON
Enables gang scheduling.
- OFF
Disables gang scheduling.
For multihost configurations, you need to set MP_GANG for each appfile entry. Refer to the -e option in “Creating
an appfile”.
You can also use the HP-UX utility
mpsched to enable gang scheduling. Refer to the HP-UX gang_sched
and mpsched man pages for more information.
NOTE: The MP_GANG feature will be deprecated
in a future release.
Miscellaneous environment variables

Point-to-point bcopy() is disabled by setting MPI_2BCOPY to
1. Valid on PA-RISC only.

MPI_MAX_WINDOW is used for one-sided applications.
It specifies the maximum number of windows a rank can have at the
same time. It tells HP-MPI to allocate enough table entries. The
default is 5.

% export MPI_MAX_WINDOW=10

The above example allows 10 windows to be established for
one-sided communication.

Diagnostic/debug environment variables

MPI_DLIB_FLAGS controls runtime options when you use the diagnostics
library. The MPI_DLIB_FLAGS syntax is a comma separated list as follows:

[ns,][h,][strict,][nmsg,][nwarn,][dump:prefix,][dumpf:prefix][xNUM]

where
- ns
Disables message signature analysis.
- h
Disables default behavior in the diagnostic library
that ignores user specified error handlers. The default considers
all errors to be fatal.
- strict
Enables MPI object-space corruption detection. Setting this
option for applications that make calls to routines in the MPI-2
standard may produce false error messages.
- nmsg
Disables detection of multiple buffer writes during receive
operations and detection of send buffer corruptions.
- nwarn
Disables the warning messages that the diagnostic library
generates by default when it identifies a receive that expected
more bytes than were sent.
- dump:prefix
Dumps (unformatted) all sent and received messages
to prefix.msgs.rank where rank is the rank of a specific process.
- dumpf:prefix
Dumps (formatted) all sent and received messages
to prefix.msgs.rank where rank is the rank of a specific process.
- xNUM
Defines a type-signature packing size. NUM is an unsigned integer that specifies the number
of signature leaf elements. For programs with diverse derived datatypes
the default value may be too small. If NUM is too small, the diagnostic library issues a warning during
the MPI_Finalize operation.
Refer to “Using
the diagnostics library” for
more information.

On PA-RISC systems, a stack trace is printed when the following
signals occur within an application: In the event one of these signals is not caught by a user
signal handler, HP-MPI will display a brief stack trace that can
be used to locate the signal in the code.

Signal 10: bus error
PROCEDURE TRACEBACK:
(0) 0x0000489c bar + 0xc [././a.out]
(1) 0x000048c4 foo + 0x1c [././a.out]
(2) 0x000049d4 main + 0xa4 [././a.out]
(3) 0xc013750c _start + 0xa8 [/usr/lib/libc.2]
(4) 0x0003b50 $START$ + 0x1a0 [././a.out]
This feature can be disabled for an individual signal
by declaring a user-level signal handler for that signal. To disable
it for all signals, set the environment variable MPI_NOBACKTRACE:

% setenv MPI_NOBACKTRACE

See “Backtrace functionality” for more information.

MPI_INSTR enables counter instrumentation for
profiling HP-MPI applications. The MPI_INSTR syntax is a colon-separated list (no spaces between
options) as follows:

prefix[:l][:nc][:off]

where
- prefix
Specifies the instrumentation output file prefix.
The rank zero process writes the application’s measurement
data to prefix.instr in ASCII. If the prefix does not represent
an absolute pathname, the instrumentation output file is opened
in the working directory of the rank zero process when MPI_Init is called.
- l
Locks ranks to CPUs and uses the CPU’s
cycle counter for less invasive timing. If used with gang scheduling, the
:l is ignored.
- nc
Specifies no clobber. If the instrumentation output
file exists, MPI_Init aborts.
- off
Specifies counter instrumentation is initially turned
off and only begins after all processes collectively call MPIHP_Trace_on.
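The MPI_INSTR syntax above can be checked before a job is launched. The following is only a sketch in Python: the helper name parse_mpi_instr and the returned dictionary keys are hypothetical and not part of HP-MPI, and it assumes the prefix itself contains no colon.

```python
def parse_mpi_instr(value):
    """Split an MPI_INSTR value of the form prefix[:l][:nc][:off].

    Hypothetical helper for illustration; HP-MPI does its own parsing.
    """
    prefix, *options = value.split(":")
    known = {"l", "nc", "off"}
    unknown = [opt for opt in options if opt not in known]
    if not prefix or unknown:
        raise ValueError(f"bad MPI_INSTR value: {value!r}")
    return {
        "prefix": prefix,
        "output_file": prefix + ".instr",  # written by the rank zero process
        "lock_cpus": "l" in options,       # :l
        "no_clobber": "nc" in options,     # :nc
        "start_off": "off" in options,     # :off
    }


settings = parse_mpi_instr("compute_pi:nc:off")
print(settings["output_file"], settings["no_clobber"], settings["start_off"])
```

A wrapper script could run such a check and then pass the validated string to mpirun with -e MPI_INSTR=<value>.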
Refer to “Using counter instrumentation” for more information.

Even though you can specify profiling options through the MPI_INSTR environment variable, the recommended approach
is to use the mpirun command with the -i option instead. Using mpirun to specify profiling options guarantees that multihost
applications do profiling in a consistent manner. Refer to “mpirun” for more information.

Counter instrumentation and trace-file generation are mutually exclusive
profiling techniques.

NOTE: When you enable instrumentation for multihost runs,
and invoke mpirun either on a host where at least one MPI process
is running, or on a host remote from all your MPI processes, HP-MPI
writes the instrumentation output file (prefix.instr) to the working directory on the host that
is running rank 0.
When you use the TotalView debugger,
HP-MPI uses your PATH variable to find TotalView. You can also set
the absolute path and TotalView specific options in the TOTALVIEW
environment variable. This environment variable is used by mpirun.

% setenv TOTALVIEW /opt/totalview/bin/totalview

Interconnect selection environment variables

MPI_IC_ORDER is an environment variable whose default
contents are "ibv:vapi:udapl:psm:mx:gm:elan:itapi:TCP" and instructs
HP-MPI to search in a specific order for the presence of an interconnect.
Lowercase selections imply use if detected, otherwise keep searching.
An uppercase option demands that the interconnect option be used,
and if it cannot be selected the application will terminate with
an error. For example:

% export MPI_IC_ORDER="ibv:vapi:udapl:psm:mx:gm:elan:itapi:TCP"
% export MPIRUN_OPTIONS="-prot"
% $MPI_ROOT/bin/mpirun -srun -n4 ./a.out

The command line for the above will appear to mpirun as $MPI_ROOT/bin/mpirun -prot -srun -n4 ./a.out and the interconnect decision will look for the presence
of Elan and use it if found. Otherwise, interconnects will be tried
in the order specified by MPI_IC_ORDER.

The following is an example of using TCP over GigE, assuming
GigE is installed and 192.168.1.1 corresponds to the ethernet interface
with GigE. Note the implicit use of -netaddr 192.168.1.1
is required to effectively get TCP over the proper subnet.

% export MPI_IC_ORDER="ibv:vapi:udapl:psm:mx:gm:elan:itapi:TCP"
% export MPIRUN_SYSTEM_OPTIONS="-netaddr 192.168.1.1"
% $MPI_ROOT/bin/mpirun -prot -TCP -srun -n4 ./a.out

On an XC system, the cluster installation will define the
MPI interconnect search order based on what is present on the system.

When HP-MPI is determining the availability of a given interconnect
on Linux, it tries to open libraries and find loaded modules based
on a collection of variables. This is described in more detail in “Interconnect
support”.

The use of interconnect environment variables MPI_ICLIB_ELAN, MPI_ICLIB_GM, MPI_ICLIB_ITAPI, MPI_ICLIB_MX, MPI_ICLIB_UDAPL, MPI_ICLIB_VAPI,
and MPI_ICLIB_VAPIDIR has been deprecated. Refer to “Interconnect
support” for more information
on interconnect environment variables.

MPI_COMMD routes all off-host communication through daemons
rather than between processes. The MPI_COMMD syntax is as follows:

out_frags,in_frags

where
- out_frags
Specifies the number of 16Kbyte fragments available
in shared memory for outbound messages. Outbound messages are sent
from processes on a given host to processes on other hosts using
the communication daemon. The default value for out_frags is 64. Increasing the number of fragments for applications
with a large number of processes improves system throughput.
- in_frags
Specifies the number of 16Kbyte fragments available
in shared memory for inbound messages. Inbound messages are sent
from processes on one or more hosts to processes on a given host
using the communication daemon. The default value for in_frags is 64. Increasing the number of fragments for applications
with a large number of processes improves system throughput.
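To judge what a given out_frags,in_frags pair costs in shared memory, the fragment counts can be multiplied by the fragment size. This sketch assumes that "16Kbyte" means 16384 bytes and ignores any bookkeeping overhead; the function name commd_shared_memory is hypothetical.

```python
FRAGMENT_BYTES = 16 * 1024  # assumed meaning of the "16Kbyte" fragment size


def commd_shared_memory(out_frags=64, in_frags=64):
    """Estimate shared memory consumed by daemon communication fragments."""
    outbound = out_frags * FRAGMENT_BYTES
    inbound = in_frags * FRAGMENT_BYTES
    return {
        "setting": f"{out_frags},{in_frags}",  # value to give to MPI_COMMD
        "outbound_bytes": outbound,
        "inbound_bytes": inbound,
        "total_bytes": outbound + inbound,
    }


print(commd_shared_memory())          # defaults: 64 fragments each way
print(commd_shared_memory(256, 256))  # a larger setting for many processes
```

With the default of 64 fragments in each direction this comes to 1 MB outbound and 1 MB inbound.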
MPI_COMMD only works with the -commd option. When -commd is used, MPI_COMMD specifies
daemon communication fragments.

InfiniBand environment variables

MPI_IB_CARD_ORDER defines mapping of ranks to IB cards.

% setenv MPI_IB_CARD_ORDER <card#>[:port#]

Where:
- card#
ranges from 0 to N-1
- port#
ranges from 0 to 1
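When several card entries are listed, the multi-card examples in this section hand them out to the local ranks of each host in turn, wrapping around at the end of the list. The sketch below reproduces that round-robin pattern; assign_cards and its arguments are hypothetical names, and the real assignment is performed inside HP-MPI.

```python
def assign_cards(card_order, ranks_per_host):
    """Map each rank to a card[:port] entry, cycling through the list per host.

    card_order     -- comma separated MPI_IB_CARD_ORDER value, e.g. "0,1,2"
    ranks_per_host -- dict of host name -> list of ranks placed on that host
    """
    entries = card_order.split(",")
    mapping = {}
    for host, ranks in ranks_per_host.items():
        for local_index, rank in enumerate(ranks):
            mapping[rank] = (host, entries[local_index % len(entries)])
    return mapping


# Block placement: ranks 0-3 on host0, ranks 4-7 on host1.
block = assign_cards("0,1,2", {"host0": [0, 1, 2, 3], "host1": [4, 5, 6, 7]})
# Cyclic placement: ranks alternate between host0 and host1.
cyclic = assign_cards("0,1,2", {"host0": [0, 2, 4, 6], "host1": [1, 3, 5, 7]})

for rank in sorted(block):
    print(rank, block[rank], cyclic[rank])
```

In both layouts the fourth local rank on a host wraps back to card 0.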
Card:port can be a comma separated list which drives the assignment
of ranks to cards and ports within the cards.

Note that HP-MPI numbers the ports on a card from 0 to N-1,
whereas utilities such as vstat display ports numbered 1 to N.

Examples:

To use the 2nd IB card:

% mpirun -e MPI_IB_CARD_ORDER=1 ...

To use the 2nd port of the 2nd card:

% mpirun -e MPI_IB_CARD_ORDER=1:1 ...

To use the 1st IB card:

% mpirun -e MPI_IB_CARD_ORDER=0 ...

To assign ranks to multiple cards:

% mpirun -e MPI_IB_CARD_ORDER=0,1,2

This assigns the local ranks per node in order to each card.

% mpirun -hostlist "host0 4 host1 4"

creates ranks 0-3 on host0 and ranks 4-7 on host1. With the card order above, rank 0 is assigned to card 0, rank 1 to
card 1, rank 2 to card 2, and rank 3 to card 0, all on host0. Rank 4 is assigned to
card 0, rank 5 to card 1, rank 6 to card 2, and rank 7 to card 0, all
on host1.

% mpirun -np 8 -hostlist "host0 host1"

creates ranks 0 through 7 alternately on
host0, host1, host0, host1, etc. Rank 0 is assigned to card 0, rank
2 to card 1, rank 4 to card 2, and rank 6 to card 0, all on host0. Rank 1 is assigned
to card 0, rank 3 to card 1, rank 5 to card 2, and
rank 7 to card 0, all on host1.

HP-MPI supports IB partitioning via Mellanox VAPI and OpenFabrics
Verbs API. By default, HP-MPI will search for the unique full membership
partition key in the port partition key table used. If no such
pkey is found, an error is issued. If multiple pkeys are found,
all such pkeys are printed and an error message is issued.

If the environment variable MPI_IB_PKEY has
been set to a value, either in hex or decimal, the value is treated
as the pkey, and the pkey table is searched for the same pkey. If
the pkey is not found, an error message is issued.

When a rank selects a pkey to use, a check is made to make
sure all ranks are using the same pkey. If ranks are not using the
same pkey, an error message is issued.

MPI_IBV_QPPARAMS=a,b,c,d,e specifies QP settings
for IBV where:
- a
Time-out value for IBV retry if no response from
target. Minimum is 1. Maximum is 31. Default is 18.
- b
The retry count after time-out before error is issued. Minimum
is 0. Maximum is 7. Default is 7.
- c
The minimum Receiver Not Ready (RNR) NAK timer. After
this time, an RNR NAK is sent back to the sender. Values: 1 (0.01ms)
- 31 (491.52ms); 0 (655.36ms). The default is 24 (40.96ms).
- d
RNR retry count before error is issued. Minimum
is 0. Maximum is 7. Default is 7 (infinite).
- e
The max inline data size. Default is 128 bytes.
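The ranges listed for a through d can be checked before a value is handed to mpirun. This is a minimal sketch: the field names and the function parse_ibv_qpparams are hypothetical labels chosen for readability, and no bounds are enforced on e because none are stated.

```python
# (minimum, maximum) for fields a-d as listed above.
LIMITS = {
    "timeout": (1, 31),     # a
    "retry_count": (0, 7),  # b
    "rnr_timer": (0, 31),   # c: 1-31, plus the special value 0
    "rnr_retry": (0, 7),    # d: 7 means infinite
}
FIELD_NAMES = ["timeout", "retry_count", "rnr_timer", "rnr_retry", "max_inline"]
DEFAULTS = "18,7,24,7,128"


def parse_ibv_qpparams(value=DEFAULTS):
    """Parse and range-check an MPI_IBV_QPPARAMS string 'a,b,c,d,e'."""
    fields = [int(item) for item in value.split(",")]
    if len(fields) != len(FIELD_NAMES):
        raise ValueError("expected five comma separated integers")
    params = dict(zip(FIELD_NAMES, fields))
    for name, (low, high) in LIMITS.items():
        if not low <= params[name] <= high:
            raise ValueError(f"{name}={params[name]} outside {low}..{high}")
    return params


print(parse_ibv_qpparams())               # documented defaults
print(parse_ibv_qpparams("20,7,24,7,0"))  # longer time-out, no inline data
```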
MPI_VAPI_QPPARAMS=a,b,c,d specifies time-out settings
for VAPI where:
- a
Time-out value for VAPI retry if no response from target.
Minimum is 1. Maximum is 31. Default is 18.
- b
The retry count after time-out before error is issued. Minimum
is 0. Maximum is 7. Default is 7.
- c
The minimum Receiver Not Ready (RNR) NAK timer. After
this time, an RNR NAK is sent back to the sender. Values: 1 (0.01ms)
- 31 (491.52ms); 0 (655.36ms). The default is 24 (40.96ms).
- d
RNR retry count before error is issued. Minimum
is 0. Maximum is 7. Default is 7 (infinite).
Memory usage environment variables

MPI_GLOBMEMSIZE=e, where e is the total bytes
of shared memory of the job. If the job size is N, then each rank
has e/N bytes of shared memory. 12.5% is used as generic. 87.5%
is used as fragments. The only way to change this ratio is to use MPI_SHMEMCNTL.

Set MPI_NO_MALLOCLIB to avoid using HP-MPI’s ptmalloc
implementation and instead use the standard libc implementation
(or perhaps a malloc implementation contained in the application). See “Improved
deregistration via ptmalloc (Linux only)” for more
information.

MPI_PAGE_ALIGN_MEM causes the HP-MPI library to
page align and page pad memory. This is for multi-threaded InfiniBand
support.

% export MPI_PAGE_ALIGN_MEM=1

MPI_PHYSICAL_MEMORY allows the user to specify the
amount of physical memory in kilobytes available on the system. MPI
normally attempts to determine the amount of physical memory for the
purpose of determining how much memory to pin for RDMA message transfers
on InfiniBand and Myrinet GM. The value determined by HP-MPI can
be displayed using the -dd option. If HP-MPI determines
an incorrect value for physical memory, this environment variable
can be used to specify the value explicitly:

% export MPI_PHYSICAL_MEMORY=1048576

The above example specifies that the system has 1GB of physical memory.

MPI_PIN_PERCENTAGE and MPI_PHYSICAL_MEMORY are
ignored unless InfiniBand or Myrinet GM is in use.

MPI_RANKMEMSIZE=d, where d is the total bytes
of shared memory of the rank. Specifies the shared memory for each
rank. 12.5% is used as generic. 87.5% is used as fragments. The
only way to change this ratio is to use MPI_SHMEMCNTL. MPI_RANKMEMSIZE differs from MPI_GLOBMEMSIZE,
which is the total shared memory across all the ranks on the host. MPI_RANKMEMSIZE takes
precedence over MPI_GLOBMEMSIZE if both are set.
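The arithmetic behind these two variables can be sketched as follows. The function rank_shared_memory is hypothetical; it applies the stated 12.5%/87.5% split and the stated precedence of MPI_RANKMEMSIZE, and it deliberately ignores any space taken by mailboxes and envelopes.

```python
def rank_shared_memory(globmemsize=None, nranks=1, rankmemsize=None):
    """Return (per_rank, generic, fragments) in bytes for one rank.

    globmemsize -- MPI_GLOBMEMSIZE value, shared by nranks ranks
    rankmemsize -- MPI_RANKMEMSIZE value, wins if both are given
    """
    if rankmemsize is not None:
        per_rank = rankmemsize
    elif globmemsize is not None:
        per_rank = globmemsize // nranks  # e/N bytes per rank
    else:
        raise ValueError("set globmemsize or rankmemsize")
    generic = per_rank // 8           # 12.5% generic region
    fragments = per_rank - generic    # remaining 87.5% for fragments
    return per_rank, generic, fragments


# 64 MB shared by 8 ranks.
print(rank_shared_memory(globmemsize=64 * 2**20, nranks=8))
# MPI_RANKMEMSIZE takes precedence when both are set.
print(rank_shared_memory(globmemsize=64 * 2**20, nranks=8, rankmemsize=1600))
```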
Both MPI_RANKMEMSIZE and MPI_GLOBMEMSIZE are
mutually exclusive with MPI_SHMEMCNTL. If MPI_SHMEMCNTL is
set, then the user cannot set the other two, and vice versa.

MPI_PIN_PERCENTAGE communicates the maximum
percentage of physical memory (see MPI_PHYSICAL_MEMORY) that
can be pinned at any time. The default is 20%.

% export MPI_PIN_PERCENTAGE=30

The above example permits the HP-MPI library to pin (lock
in memory) up to 30% of physical memory. The pinned memory is shared
between ranks of the host that were started as part of the same mpirun invocation. Running multiple MPI applications on the
same host can cumulatively cause more than one application’s MPI_PIN_PERCENTAGE to be
pinned. Increasing MPI_PIN_PERCENTAGE can improve
communication performance for communication intensive applications
in which nodes send and receive multiple large messages at a time,
such as is common with collective operations. Increasing MPI_PIN_PERCENTAGE allows
more large messages to be progressed in parallel using RDMA transfers; however,
pinning too much physical memory may negatively impact computation
performance.

MPI_PIN_PERCENTAGE and MPI_PHYSICAL_MEMORY are
ignored unless InfiniBand or Myrinet GM is in use.

MPI_SHMEMCNTL controls
the subdivision of each process’s shared memory for the
purposes of point-to-point and collective communications. It cannot
be used in conjunction with MPI_GLOBMEMSIZE. The MPI_SHMEMCNTL syntax is a comma separated list as follows:

nenv, frag, generic

where
- nenv
Specifies the number of envelopes per process pair.
The default is 8.
- frag
Denotes the size in bytes of the message-passing fragments
region. The default is 87.5 percent of shared memory after mailbox
and envelope allocation.
- generic
Specifies the size in bytes of the generic-shared memory
region. The default is 12.5 percent of shared memory after mailbox
and envelope allocation. The generic region is typically used for
collective communication.

MPI_SHMEMCNTL=a,b,c where:
- a
The number of envelopes for shared memory communication.
The default is 8.
- b
The bytes of shared memory to be used as fragments for
messages.
- c
The bytes of shared memory for other generic use,
such as MPI_Alloc_mem() call.
MPI_USE_MALLOPT_AVOID_MMAP instructs the underlying malloc implementation to avoid mmaps
and instead use sbrk() to get all the memory used. The default is MPI_USE_MALLOPT_AVOID_MMAP=0.

Connection related environment variables

MPI_LOCALIP specifies the host IP address that is assigned
throughout a session. Ordinarily, mpirun determines the IP address of the host it is running
on by calling gethostbyaddr. However, when a host uses a SLIP or PPP protocol,
the host’s IP address is dynamically assigned only when
the network connection is established. In this case, gethostbyaddr may not return the correct IP address. The MPI_LOCALIP syntax is as follows:

xxx.xxx.xxx.xxx

where xxx.xxx.xxx.xxx specifies the host IP address.

MPI_MAX_REMSH=N

HP-MPI includes a startup scalability
enhancement when using the -f option to mpirun. This enhancement allows a large number of HP-MPI daemons
(mpid) to be created without requiring mpirun to maintain a large number of remote shell connections.

When running with a very large number of nodes, the number
of remote shells normally required to start all of the daemons can
exhaust the available file descriptors. To create the necessary
daemons, mpirun uses the remote shell specified with MPI_REMSH to
create up to 20 daemons only, by default. This number can be changed
using the environment variable MPI_MAX_REMSH. When
the number of daemons required is greater than MPI_MAX_REMSH, mpirun will create only MPI_MAX_REMSH number
of remote daemons directly. The directly created daemons will then
create the remaining daemons using an n-ary tree, where n is the value
of MPI_MAX_REMSH. Although this process is generally
transparent to the user, the new startup requires that each node
in the cluster is able to use the specified MPI_REMSH command
(e.g. rsh, ssh) to each node in the cluster without a password.
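To get a feel for how quickly the n-ary tree covers a cluster, the sketch below counts launch levels under a simplifying assumption: mpirun starts n daemons, and every daemon on one level starts up to n daemons on the next. The exact tree shape HP-MPI builds is not specified here, and launch_levels is a hypothetical helper.

```python
def launch_levels(daemons_needed, max_remsh=20):
    """Count tree levels needed to start daemons_needed daemons.

    Assumes each launcher (mpirun, then each daemon) starts at most
    max_remsh daemons; this is an illustration, not HP-MPI's algorithm.
    """
    if daemons_needed < 0 or max_remsh < 1:
        raise ValueError("daemons_needed must be >= 0 and max_remsh >= 1")
    started = 0
    level_width = 1  # mpirun itself is the single root launcher
    levels = 0
    while started < daemons_needed:
        level_width *= max_remsh  # daemons started on this level
        started += level_width
        levels += 1
    return levels


for hosts in (20, 21, 420, 421, 5000):
    print(hosts, launch_levels(hosts))
```

Under this assumption the default of 20 reaches 420 daemons in two levels and 8420 in three.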
The value of MPI_MAX_REMSH is used on a per-world
basis. Therefore, applications which spawn a large number of worlds
may need to use a small value for MPI_MAX_REMSH. MPI_MAX_REMSH is
only relevant when using the -f option to mpirun. The default value is 20.

Allows control of the selection process for TCP/IP connections.
The same functionality can be accessed by using the -netaddr option
to mpirun. See “mpirun options” for more information.

By default, HP-MPI attempts to use ssh on Linux and remsh on HP-UX. On Linux, we recommend that ssh users set StrictHostKeyChecking=no in their ~/.ssh/config.

To use rsh on Linux instead, the following script needs to be run
as root on each node in the cluster:

% /opt/hpmpi/etc/mpi.remsh.default

Or, to use rsh on Linux, use the alternative method of manually populating
the files /etc/profile.d/hpmpi.csh and /etc/profile.d/hpmpi.sh with the following settings respectively:

setenv MPI_REMSH rsh
export MPI_REMSH=rsh

On HP-UX, MPI_REMSH specifies a command other than the default remsh to start remote processes. The mpirun, mpijob, and mpiclean utilities support MPI_REMSH. For example, you can set the environment variable
to use a secure shell:

% setenv MPI_REMSH /bin/ssh

HP-MPI allows users to specify the remote execution tool to
use when HP-MPI needs to start processes on remote hosts. The tool
specified must have a call interface similar to that of the standard
utilities: rsh, remsh and ssh.

An alternate remote execution tool, such as ssh, can be used on HP-UX by setting the environment variable MPI_REMSH to
the name or full path of the tool to use:

% export MPI_REMSH=ssh
% $MPI_ROOT/bin/mpirun <options> -f <appfile>

HP-MPI also supports setting MPI_REMSH using
the -e option to mpirun:

% $MPI_ROOT/bin/mpirun -e MPI_REMSH=ssh <options> -f <appfile>

This release also supports setting MPI_REMSH to
a command which includes additional arguments:

% $MPI_ROOT/bin/mpirun -e MPI_REMSH="ssh -x" <options> -f <appfile>

When using ssh, first ensure that it is possible to use ssh from the host where mpirun is executed to the other nodes without ssh requiring any interaction from the user.

RDMA tunable environment variables

-e MPI_RDMA_INTRALEN=262144

Specifies
the size (in bytes) of the transition from shared memory to interconnect
when -intra=mix is used. For messages less than
or equal to the specified size, shared memory will be used. For
messages greater than that size, the interconnect will be used.
TCP/IP, Elan, MX, and PSM do not have mixed mode.

MPI_RDMA_MSGSIZE=a,b,c specifies
message protocol length where:
- a
Short message protocol threshold. If the message length
is bigger than this value, middle or long message protocol is used.
The default is 16384 bytes, but on HP-UX 32768 bytes is used.
- b
Middle message protocol. If the message length is
less than or equal to b, consecutive short messages are used to
send the whole message. By default, b is set to 16384 bytes, the
same as a, to effectively turn off middle message protocol. On IBAL,
the default is 131072 bytes.
- c
Long message fragment size. If the message is greater than
b, the message is fragmented into pieces up to c in length (or actual
length if less than c) and the corresponding piece of the user’s
buffer is pinned directly. The default is 4194304 bytes, but on
Myrinet GM and IBAL the default is 1048576 bytes.
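The way the three thresholds interact can be sketched as a small decision function. This is an illustration only: choose_protocol is a hypothetical name, the defaults are the general (non HP-UX, non IBAL) values given above, and the actual selection happens inside the HP-MPI library.

```python
def choose_protocol(length, a=16384, b=16384, c=4194304):
    """Pick a message protocol for a message of `length` bytes.

    Returns (protocol, long_fragments), where long_fragments lists the
    piece sizes used by the long protocol and is empty otherwise.
    """
    if length <= a:
        return "short", []
    if length <= b:
        return "middle", []  # sent as consecutive short messages
    full_pieces, remainder = divmod(length, c)
    fragments = [c] * full_pieces + ([remainder] if remainder else [])
    return "long", fragments


print(choose_protocol(1024))               # short
print(choose_protocol(20000))              # long: middle is off by default
print(choose_protocol(20000, b=131072))    # middle with the IBAL value of b
print(choose_protocol(10 * 2**20))         # long, split into three pieces
```

Because b defaults to the same value as a, no message length selects the middle protocol unless b is raised.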
MPI_RDMA_NENVELOPE=N specifies
the number of short message envelope pairs for each connection if
RDMA protocol is used, where N is the number of envelope pairs.
The default is between 8 and 128 depending on the number of ranks.

MPI_RDMA_NFRAGMENT=N specifies
the number of long message fragments that can be concurrently pinned
down for each process, either sending or receiving. The max number
of fragments that can be pinned down for a process is 2*N. The default
value of N is 128.

MPI_RDMA_NONESIDED=N specifies
the number of one-sided operations that can be posted concurrently
for each rank, no matter the destination. The default is 8.

MPI_RDMA_NSRQRECV=K specifies
the number of receiving buffers used when the shared receiving queue
is used, where K is the number of receiving buffers. If N is the
number of offhost connections from a rank, then the default value
can be calculated as:

The smaller of the values Nx8 and 2048.

In the above example, the number of receiving buffers is
calculated as 8 times the number of offhost connections. If this
number is greater than 2048, then 2048 is used as the maximum number.

prun/srun environment variables

Allows prun options to be implicitly added to the launch command
when SPAWN functionality is used to create new ranks with prun.

Allows srun options to be implicitly added to the launch command
when SPAWN functionality is used to create new ranks with srun.

Allows additional srun options to be specified such as --label.

% setenv MPI_SRUNOPTIONS <option>

HP-MPI provides the capability to automatically assume that prun is the default launching mechanism. This mode of operation automatically
classifies arguments into 'prun' and 'mpirun' arguments and correctly places them on the command line. The
assumed prun mode also allows appfiles to be interpreted for command
line arguments and translated into prun mode. The implied prun method of launching is useful for applications which
embed or generate their mpirun invocations deeply within the application. See Appendix C for more information.

Provides an easy way to modify the arguments contained in
an appfile by supplying a list of space-separated arguments that mpirun should ignore.

% setenv MPI_USEPRUN_IGNORE_ARGS <option>

HP-MPI provides the capability to automatically assume that srun is the default launching mechanism. This mode of operation automatically
classifies arguments into 'srun' and 'mpirun' arguments and correctly places them on the command line. The
assumed srun mode also allows appfiles to be interpreted for command
line arguments and translated into srun mode. The implied srun method of launching is useful for applications which
embed or generate their mpirun invocations deeply within the application. This allows
existing ports of an application from an HP-MPI supported platform
to XC. See Appendix C for more information.

Provides an easy way to modify the arguments contained in
an appfile by supplying a list of space-separated arguments that mpirun should ignore.

% setenv MPI_USESRUN_IGNORE_ARGS <option>

In the example below, the command line contains a reference
to -stdio=bnone which is filtered out because it
is set in the ignore list.

% setenv MPI_USESRUN_VERBOSE 1
% setenv MPI_USESRUN_IGNORE_ARGS -stdio=bnone
% setenv MPI_USESRUN 1
% setenv MPI_SRUNOPTIONS --label
% bsub -I -n4 -ext "SLURM[nodes=4]" \
$MPI_ROOT/bin/mpirun -stdio=bnone -f appfile -- pingpong
Job <369848> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
/opt/hpmpi/bin/mpirun
unset MPI_USESRUN;/opt/hpmpi/bin/mpirun -srun ./pallas.x -npmin 4 pingpong
Allows prun specific options to be added automatically to the mpirun command line. For example:

% export MPI_PRUNOPTIONS="-m cyclic -x host0"
% mpirun -prot -prun -n2 ./a.out

is equivalent to:

% mpirun -prot -prun -m cyclic -x host0 -n2 ./a.out

TCP environment variables

The integer value of MPI_TCP_CORECVLIMIT indicates the number of simultaneous messages
larger than 16KB that may be transmitted to a single rank at once
via TCP/IP. Setting this variable to a larger value can allow HP-MPI
to utilize more parallelism during its low-level message transfers,
but can greatly reduce performance by causing switch congestion.
Setting MPI_TCP_CORECVLIMIT to zero will not limit
the number of simultaneous messages a rank may receive at once.
The default value is 0.

MPI_SOCKBUFSIZE specifies, in bytes, the amount of system buffer space to
allocate for sockets when using the TCP/IP protocol for communication.
Setting MPI_SOCKBUFSIZE results in calls to setsockopt (..., SOL_SOCKET, SO_SNDBUF,
...) and setsockopt (..., SOL_SOCKET, SO_RCVBUF,
...). If unspecified, the system default (which on many systems
is 87380 bytes) is used.

Elan environment variables

By default when Elan is in use, the HP-MPI library uses Elan’s
native collective operations for performing MPI_Bcast and MPI_Barrier
operations on MPI_COMM_WORLD sized
communicators. This behavior can be changed by setting MPI_USE_LIBELAN to “false” or “0”,
in which case these operations will be implemented using point-to-point
Elan messages. To turn off:

% export MPI_USE_LIBELAN=0

The use of Elan’s native collective operations may
be extended to include communicators which are smaller than MPI_COMM_WORLD by
setting the MPI_USE_LIBELAN_SUB environment variable
to a positive integer. By default, this functionality is disabled because
libelan memory resources are consumed and may eventually
cause runtime failures when too many sub-communicators are created.

% export MPI_USE_LIBELAN_SUB=10

By default, HP-MPI only provides exclusive window locks via
Elan lock when using the Elan interconnect. In order to use HP-MPI
shared window locks, the user must turn off Elan lock and use window
locks via shared memory. In this way, both exclusive and shared locks
are from shared memory. To turn off Elan locks, set MPI_ELANLOCK to
zero.

% export MPI_ELANLOCK=0

Rank Identification Environment Variables

HP-MPI sets several environment variables to let the user
access information about the MPI rank layout prior to calling MPI_Init.
These variables differ from the others in this section in that the
user doesn’t set these to provide instructions to HP-MPI;
HP-MPI sets them to give information to the user’s application.
- HPMPI=1
This variable is set so that
an application can conveniently tell if it is running under HP-MPI.
- MPI_NRANKS
This is set to the number
of ranks in the MPI job.
- MPI_RANKID
This is set to the rank number
of the current process.
- MPI_LOCALNRANKS
This is set to the number
of ranks on the local host.
- MPI_LOCALRANKID
This is set to the rank number
of the current process relative to the local host (0.. MPI_LOCALNRANKS-1).

Note that these settings are not available when running
under srun or prun. However, similar information can be gathered from the
variables set by those systems, such as SLURM_NPROCS and SLURM_PROCID.
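A launcher script or wrapper can read these variables before MPI_Init is called. The following sketch uses only the variable names listed above plus the two SLURM names mentioned in the note; the function rank_layout and the keys of the returned dictionary are hypothetical.

```python
import os


def rank_layout(env=os.environ):
    """Return rank layout information from the environment, or None.

    Prefers the HP-MPI variables and falls back to the SLURM variables
    named in the note above when running under srun.
    """
    def as_int(name):
        return int(env[name]) if name in env else None

    if "MPI_RANKID" in env and "MPI_NRANKS" in env:
        return {
            "source": "hpmpi",
            "rank": int(env["MPI_RANKID"]),
            "nranks": int(env["MPI_NRANKS"]),
            "local_rank": as_int("MPI_LOCALRANKID"),
            "local_nranks": as_int("MPI_LOCALNRANKS"),
        }
    if "SLURM_PROCID" in env and "SLURM_NPROCS" in env:
        return {
            "source": "slurm",
            "rank": int(env["SLURM_PROCID"]),
            "nranks": int(env["SLURM_NPROCS"]),
            "local_rank": None,
            "local_nranks": None,
        }
    return None  # not started by a recognized launcher


layout = rank_layout()
print(layout if layout else "no rank information in the environment")
```

Passing a plain dictionary instead of os.environ makes the function easy to exercise outside an MPI job.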