Clusters often have both Ethernet and a higher-speed
interconnect such as InfiniBand. This section describes
how to use the ping_pong_ring.c example program to confirm that
you can run over the desired interconnect.
Running a test like this, especially on a new cluster,
helps confirm that the appropriate network drivers are installed
and that the network hardware is functioning properly. If a machine
has a defective network card or cable, the test can also help
identify which machine has the problem.
To compile the program, set the MPI_ROOT environment
variable to the location of HP-MPI. The default is
"C:\Program Files (x86)\Hewlett-Packard\HP-MPI" for 64-bit
systems, and "C:\Program Files\Hewlett-Packard\HP-MPI"
for 32-bit systems. The HP-MPI installer may already have set this variable.
Open a command window for the compiler you plan to use, so that
the compiler and its libraries are on the path, then compile
the program using the mpicc wrapper:
> "%MPI_ROOT%\bin\mpicc" -mpi64 /out:pp.exe ^
  "%MPI_ROOT%\help\ping_pong_ring.c"
Use the startup that is appropriate for your cluster. Your
situation should resemble one of the following:
If running on Windows CCS using automatic scheduling:
Submit the command to the scheduler, but pass the total
number of processes needed on the nodes as the -np value.
Used in this fashion, the -np value is NOT the rank count. Also include
the -nodex flag to run only one rank per node.
Assume the cluster has 4 CPUs per node. Requesting 12 processes then
spans three nodes, and -nodex places one rank on each. The commands would
be:
> "%MPI_ROOT%\bin\mpirun" -ccp -np 12 -IBAL -nodex -prot ^
  pp.exe
> "%MPI_ROOT%\bin\mpirun" -ccp -np 12 -IBAL -nodex -prot ^
  pp.exe 1000000
The first mpirun uses 0 bytes per message and checks latency. The
second mpirun uses 1000000 bytes per message and checks bandwidth.
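The two figures each run reports can be reproduced by hand. The sketch below mirrors the arithmetic that ping_pong_ring.c applies to its timing loop: the elapsed time covers NLOOPS round trips, so it is divided by the loop count and then by 2 to get a one-way time. The function names are illustrative and are not part of HP-MPI. Note that the program scales seconds by 1024 * 1024, not 1,000,000, when it prints usec/msg.

```c
/* Illustrative helpers; these names do not exist in HP-MPI. */

/* One-way time per message in seconds: each loop is a send plus a receive. */
double one_way_seconds(double elapsed, int nloops)
{
    return elapsed / nloops / 2;
}

/* "usec/msg" as ping_pong_ring.c prints it (scaled by 1024 * 1024). */
double usec_per_msg(double elapsed, int nloops)
{
    return one_way_seconds(elapsed, nloops) * 1024 * 1024;
}

/* "MB/sec", with 1 MB taken as 1024 * 1024 bytes. */
double mb_per_sec(double elapsed, int nloops, int nbytes)
{
    return nbytes / (1024. * 1024.) / one_way_seconds(elapsed, nloops);
}
```

Because both figures come from the same one-way time, multiplying the reported usec/msg by the reported MB/sec gives back the message size in bytes, which is a quick sanity check on any output line.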
#include <stdio.h>
#include <stdlib.h>
#ifndef _WIN32
#include <unistd.h>
#endif
#include <string.h>
#include <math.h>
#include <mpi.h>

#define NLOOPS 1000
#define ALIGN 4096

#define SEND(t) MPI_Send(buf, nbytes, MPI_CHAR, partner, (t), \
                         MPI_COMM_WORLD)
#define RECV(t) MPI_Recv(buf, nbytes, MPI_CHAR, partner, (t), \
                         MPI_COMM_WORLD, &status)

#ifdef CHECK
# define SETBUF() for (j=0; j<nbytes; j++) { \
                      buf[j] = (char) (j + i); \
                  }
# define CLRBUF() memset(buf, 0, nbytes)
# define CHKBUF() for (j = 0; j < nbytes; j++) { \
                      if (buf[j] != (char) (j + i)) { \
                          printf("error: buf[%d] = %d, " \
                                 "not %d\n", \
                                 j, buf[j], j + i); \
                          break; \
                      } \
                  }
#else
# define SETBUF()
# define CLRBUF()
# define CHKBUF()
#endif

int main(int argc, char *argv[])
{
    int i;
#ifdef CHECK
    int j;
#endif
    double start, stop;
    int nbytes = 0;
    int rank, size;
    int root;
    int partner;
    MPI_Status status;
    char *buf, *obuf;
    char myhost[MPI_MAX_PROCESSOR_NAME];
    int len;
    char str[1024];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(myhost, &len);

    if (size < 2) {
        if ( ! rank) printf("rping: must have two+ processes\n");
        MPI_Finalize();
        exit(0);
    }
    nbytes = (argc > 1) ? atoi(argv[1]) : 0;
    if (nbytes < 0) nbytes = 0;

/*
 * Page-align buffers and displace them in the cache to avoid collisions.
 */
    buf = (char *) malloc(nbytes + 524288 + (ALIGN - 1));
    obuf = buf;
    if (buf == 0) {
        MPI_Abort(MPI_COMM_WORLD, MPI_ERR_BUFFER);
        exit(1);
    }
    buf = (char *) ((((unsigned long) buf) + (ALIGN - 1)) & ~(ALIGN - 1));
    if (rank > 0) buf += 524288;
    memset(buf, 0, nbytes);

/*
 * Ping-pong.
 */
    for (root=0; root<size; root++) {
        if (rank == root) {
            partner = (root + 1) % size;
            sprintf(str, "[%d:%s] ping-pong %d bytes ...\n",
                    root, myhost, nbytes);
/*
 * warm-up loop
 */
            for (i = 0; i < 5; i++) {
                SEND(1);
                RECV(1);
            }
/*
 * timing loop
 */
            start = MPI_Wtime();
            for (i = 0; i < NLOOPS; i++) {
                SETBUF();
                SEND(1000 + i);
                CLRBUF();
                RECV(2000 + i);
                CHKBUF();
            }
            stop = MPI_Wtime();

            /* seconds are scaled by 1024 * 1024 here, not by 1e6 */
            sprintf(&str[strlen(str)],
                    "%d bytes: %.2f usec/msg\n", nbytes,
                    (stop - start) / NLOOPS / 2 * 1024 * 1024);
            if (nbytes > 0) {
                sprintf(&str[strlen(str)],
                        "%d bytes: %.2f MB/sec\n", nbytes,
                        nbytes / (1024. * 1024.) /
                        ((stop - start) / NLOOPS / 2));
            }
            fflush(stdout);
        } else if (rank == (root+1)%size) {
/*
 * warm-up loop
 */
            partner = root;
            for (i = 0; i < 5; i++) {
                RECV(1);
                SEND(1);
            }
            for (i = 0; i < NLOOPS; i++) {
                CLRBUF();
                RECV(1000 + i);
                CHKBUF();
                SETBUF();
                SEND(2000 + i);
            }
        }
        MPI_Bcast(str, 1024, MPI_CHAR, root, MPI_COMM_WORLD);
        if (rank == 0) {
            printf("%s", str);
        }
    }
    free(obuf);
    MPI_Finalize();
    exit(0);
}
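The outer loop in the listing above walks the ranks as a ring: in each round, rank root exchanges messages with rank (root + 1) % size, while the remaining ranks wait at MPI_Bcast for the result string. The following sketch needs no MPI and only prints that schedule. The names ring_partner and print_ring_schedule are illustrative and do not appear in the listing.

```c
#include <stdio.h>

/* Illustrative helper: the rank that answers `root` in one round. */
int ring_partner(int root, int size)
{
    return (root + 1) % size;
}

/* Print which pair of ranks is active in each round of the ring. */
void print_ring_schedule(int size)
{
    int root;

    for (root = 0; root < size; root++)
        printf("round %d: rank %d <-> rank %d\n",
               root, root, ring_partner(root, size));
}
```

With three ranks the rounds pair 0 with 1, 1 with 2, and 2 with 0, so each host takes one turn as the initiating side. This is why the program prints one result block per rank.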
ping_pong_ring.c output
Example output might look like:
Host 0 -- ip 172.16.159.3 -- ranks 0
Host 1 -- ip 172.16.150.23 -- ranks 1
Host 2 -- ip 172.16.150.24 -- ranks 2

host | 0    1    2
=====|================
   0 : SHM  IBAL IBAL
   1 : IBAL SHM  IBAL
   2 : IBAL IBAL SHM

[0:mpiccp3] ping-pong 1000000 bytes ...
1000000 bytes: 1089.29 usec/msg
1000000 bytes: 918.03 MB/sec
[1:mpiccp4] ping-pong 1000000 bytes ...
1000000 bytes: 1091.99 usec/msg
1000000 bytes: 915.76 MB/sec
[2:mpiccp5] ping-pong 1000000 bytes ...
1000000 bytes: 1084.63 usec/msg
1000000 bytes: 921.97 MB/sec
The table showing SHM/IBAL is printed because of the -prot option (print
protocol) specified in the mpirun command.
It could show any of the following settings:
IBAL: IBAL on InfiniBand
TCP: TCP/IP
MPID: daemon communication mode
SHM: shared memory (intra host only)
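One way to read the table: entries on the diagonal describe traffic within a host, and every off-diagonal entry should name the interconnect you requested on the mpirun command line. The sketch below encodes that check for a table that has already been split into strings. The function name, the MAX_HOSTS limit, and the decision to skip diagonal entries are assumptions made for illustration; HP-MPI does not provide such a function.

```c
#include <string.h>

#define MAX_HOSTS 16  /* arbitrary row width for this sketch */

/*
 * Count the off-diagonal entries that differ from `wanted`.
 * table[i][j] is the protocol host i uses to reach host j, as printed
 * by -prot. Diagonal (intra-host) entries are not checked.
 */
int count_unexpected(const char *table[][MAX_HOSTS], int nhosts,
                     const char *wanted)
{
    int i, j, bad = 0;

    for (i = 0; i < nhosts; i++)
        for (j = 0; j < nhosts; j++)
            if (i != j && strcmp(table[i][j], wanted) != 0)
                bad++;
    return bad;
}
```

For the example table, checking against "IBAL" yields zero unexpected entries. A nonzero count, for instance TCP between two hosts, would mean those hosts did not use the intended interconnect.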
If one or more hosts show considerably worse performance than
the others, this often indicates a bad card or cable.
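A simple way to apply this to the ping-pong output is to compare each host's usec/msg figure against the fastest host. The sketch below does that. The function name and the tolerance factor are assumptions, and a sensible threshold depends on your hardware.

```c
/*
 * Flag hosts whose time per message exceeds `factor` times the best
 * (smallest) time. flagged[i] is set to 1 for slow hosts, else 0.
 * Returns the number of hosts flagged.
 */
int flag_slow_hosts(const double *usec, int nhosts, double factor,
                    int *flagged)
{
    int i, count = 0;
    double best;

    if (nhosts <= 0)
        return 0;

    /* Find the fastest host to use as the baseline. */
    best = usec[0];
    for (i = 1; i < nhosts; i++)
        if (usec[i] < best)
            best = usec[i];

    for (i = 0; i < nhosts; i++) {
        flagged[i] = usec[i] > best * factor;
        count += flagged[i];
    }
    return count;
}
```

With the example figures (1089.29, 1091.99, and 1084.63 usec/msg) and a factor of 1.5, no host is flagged. Because each round involves two neighboring hosts, a single faulty host would likely slow two adjacent rounds, so look at which host the slow rounds have in common.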
If the run aborts with some kind of error message, it is possible
that HP-MPI incorrectly determined which interconnect was available.