| United States-English |
|
|
|
![]() |
HP-MPI User's Guide > Chapter 1 IntroductionMPI concepts |
|
The primary goals of MPI are efficient communication and portability. Although several message-passing libraries exist on different systems, MPI is popular for the following reasons:
An MPI program consists of a set of processes and a logical communication medium connecting those processes. An MPI process cannot directly access memory in another MPI process. Inter-process communication requires calling MPI routines in both processes. MPI defines a library of routines through which MPI processes communicate. The MPI library routines provide a set of functions that support
Although the MPI library contains a large number of routines, you can design a large number of applications by using the six routines listed in Table 1-1 “Six commonly used MPI routines”. Table 1-1 Six commonly used MPI routines
You must call MPI_Finalize in your application to conform to the MPI Standard. HP-MPI issues a warning when a process exits without calling MPI_Finalize.
As your application grows in complexity, you can introduce other routines from the library. For example, MPI_Bcast is an often-used routine for sending or broadcasting data from one process to other processes in a single operation. Use broadcast transfers to get better performance than with point-to-point transfers. The latter use MPI_Send to send data from each sending process and MPI_Recv to receive it at each receiving process. The following sections briefly introduce the concepts underlying MPI library routines. For more detailed information refer to MPI: A Message-Passing Interface Standard. Point-to-point communication involves sending and receiving messages between two processes. This is the simplest form of data transfer in a message-passing model and is described in Chapter 3, “Point-to-Point Communication” in the MPI 1.0 standard. The performance of point-to-point communication is measured in terms of total transfer time. The total transfer time is defined as total_transfer_time = latency + (message_size/bandwidth) where
Low latencies and high bandwidths lead to better performance. A communicator is an object that represents a group of processes and their communication medium or context. These processes exchange messages to transfer data. Communicators encapsulate a group of processes such that communication is restricted to processes within that group. The default communicators provided by MPI are MPI_COMM_WORLD and MPI_COMM_SELF. MPI_COMM_WORLD contains all processes that are running when an application begins execution. Each process is the single member of its own MPI_COMM_SELF communicator. Communicators that allow processes within a group to exchange data are termed intracommunicators. Communicators that allow processes in two different groups to exchange data are called intercommunicators. Many MPI applications depend upon knowing the number of processes and the process rank within a given communicator. There are several communication management functions; two of the more widely used are MPI_Comm_size and MPI_Comm_rank. The process rank is a unique number assigned to each member process from the sequence 0 through (size-1), where size is the total number of processes in the communicator. To determine the number of processes in a communicator, use the following syntax: MPI_Comm_size (MPI_Comm comm, int *size); where
To determine the rank of each process in comm, use MPI_Comm_rank(MPI_Comm comm, int *rank); where
A communicator is an argument to all communication routines. The C code example, “communicator.c” displays the use MPI_Comm_dup, one of the communicator constructor functions, and MPI_Comm_free, the function that marks a communication object for deallocation. There are two methods for sending and receiving data: blocking and nonblocking. In blocking communications, the sending process does not return until the send buffer is available for reuse. In nonblocking communications, the sending process returns immediately, and may only have started the message transfer operation, not necessarily completed it. The application may not safely reuse the message buffer after a nonblocking routine returns until MPI_Wait indicates that the message transfer has completed. In nonblocking communications, the following sequence of events occurs: Blocking communication consists of four send modes and one receive mode. The four send modes are:
You can invoke any mode by using the appropriate routine name and passing the argument list. Arguments are the same for all modes. For example, to code a standard blocking send, use MPI_Send (void *buf, int count, MPI_Datatype dtype, int dest, int tag, MPI_Comm comm); where
To code a blocking receive, use MPI_Recv (void *buf, int count, MPI_datatype dtype, int source, int tag, MPI_Comm comm, MPI_Status *status); where
Examples “send_receive.f”, “ping_pong.c”, and “master_worker.f90” all illustrate the use of standard blocking sends and receives.
MPI provides nonblocking counterparts for each of the four blocking send routines and for the receive routine. Table 1-2 “MPI blocking and nonblocking calls” lists blocking and nonblocking routine calls. Table 1-2 MPI blocking and nonblocking calls
Nonblocking calls have the same arguments, with the same meaning as their blocking counterparts, plus an additional argument for a request. To code a standard nonblocking send, use MPI_Isend(void *buf, int count, MPI_datatype dtype, int dest, int tag, MPI_Comm comm, MPI_Request *req); where To complete nonblocking sends and receives, you can use MPI_Wait or MPI_Test. The completion of a send indicates that the sending process is free to access the send buffer. The completion of a receive indicates that the receive buffer contains the message, the receiving process is free to access it, and the status object, that returns information about the received message, is set. Applications may require coordinated operations among multiple processes. For example, all processes need to cooperate to sum sets of numbers distributed among them. MPI provides a set of collective operations to coordinate operations among processes. These operations are implemented such that all processes call the same operation with the same arguments. Thus, when sending and receiving messages, one collective operation can replace multiple sends and receives, resulting in lower overhead and higher performance. Collective operations consist of routines for communication, computation, and synchronization. These routines all specify a communicator argument that defines the group of participating processes and the context of the operation. Collective operations are valid only for intracommunicators. Intercommunicators are not allowed as arguments. Collective communication involves the exchange of data among
all processes in a group. The communication can be one-to-many, The single originating process in the one-to-many routines or the single receiving process in the many-to-one routines is called the root. Collective communications have three basic patterns:
The syntax of the MPI collective functions is designed to be consistent with point-to-point communications, but collective functions are more restrictive than point-to-point functions. Some of the important restrictions to keep in mind are:
For detailed discussions of collective communications refer to Chapter 4, “Collective Communication” in the MPI 1.0 standard. The following examples demonstrate the syntax to code two collective operations; a broadcast and a scatter: MPI_Bcast(void *buf, int count, MPI_Datatype dtype, int root, MPI_Comm comm); where For example “compute_pi.f” uses MPI_BCAST to broadcast one integer from process 0 to every process in MPI_COMM_WORLD. MPI_Scatter (void* sendbuf, int sendcount, MPI_Datatype sendtype, void* recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm); where
Computational operations do global reduction operations, such as sum, max, min, product, or user-defined functions across all members of a group. There are a number of global reduction functions:
Section 4.9, “Global Reduction Operations” in the MPI 1.0 standard describes each of these functions in detail. Reduction operations are binary and are only valid on numeric data. Reductions are always associative but may or may not be commutative. You can select a reduction operation from a predefined list (refer to section 4.9.2 in the MPI 1.0 standard) or define your own operation. The operations are invoked by placing the operation name, for example MPI_SUM or MPI_PROD, in op as described in the MPI_Reduce syntax below. MPI_Reduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype dtype, MPI_Op op, int root, MPI_Comm comm); where For example “compute_pi.f” uses MPI_REDUCE to sum the elements provided in the input buffer of each process in MPI_COMM_WORLD, using MPI_SUM, and returns the summed value in the output buffer of the root process (in this case, process 0). Collective routines return as soon as their participation in a communication is complete. However, the return of the calling process does not guarantee that the receiving processes have completed or even started the operation. To synchronize the execution of processes, call MPI_Barrier. MPI_Barrier blocks the calling process until all processes in the communicator have called it. This is a useful approach for separating two stages of a computation so messages from each stage do not overlap. where
For example, “cart.C” uses MPI_Barrier to synchronize data before printing. You can use predefined datatypes (for example, MPI_INT in C) to transfer data between two processes using point-to-point communication. This transfer is based on the assumption that the data transferred is stored in contiguous memory (for example, sending an array in a C or Fortran application). When you want to transfer data that is not homogeneous, such as a structure, or that is not contiguous in memory, such as an array section, you can use derived datatypes or packing and unpacking functions:
Using derived datatypes is more efficient than using MPI_Pack and MPI_Unpack. However, derived datatypes cannot handle the case where the data layout varies and is unknown by the receiver, for example, messages that embed their own layout description. Section 3.12, “Derived Datatypes” in the MPI 1.0 standard describes the construction and use of derived datatypes. The following is a summary of the types of constructor functions available in MPI: After you create a derived datatype, you must commit it by calling MPI_Type_commit. HP-MPI optimizes collection and communication of derived datatypes. Section 3.13, “Pack and unpack” in the MPI 1.0 standard describes the details of the pack and unpack functions for MPI. Used together, these routines allow you to transfer heterogeneous data in a single message, thus amortizing the fixed overhead of sending and receiving a message over the transmittal of many elements. Refer to Chapter 3, “User-Defined Datatypes and Packing” in MPI: The Complete Reference for a discussion of this topic and examples of construction of derived datatypes from the basic datatypes using the MPI constructor functions. By default, processes in an MPI application can only do one task at a time. Such processes are single-threaded processes. This means that each process has an address space together with a single program counter, a set of registers, and a stack. A process with multiple threads has one address space, but each process thread has its own counter, registers, and stack. Multilevel parallelism refers to MPI processes that have multiple threads. Processes become multi threaded through calls to multi threaded libraries, parallel directives and pragmas, or auto-compiler parallelism. Refer to “Thread-compliant library” for more information on linking with the thread-compliant library. Multilevel parallelism is beneficial for problems you can decompose into logical parts for parallel execution; for example, a looping construct that spawns multiple threads to do a computation and joins after the computation is complete. The example program, “multi_par.f” is an example of multilevel parallelism. This chapter only provides a brief introduction to basic MPI concepts. Advanced MPI topics include:
To learn more about the basic concepts discussed in this chapter and advanced MPI topics refer to MPI: The Complete Reference and MPI: A Message-Passing Interface Standard. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|||||||||||||||