HP-MPI User's Guide > Appendix A Example applications

ping_pong_ring.c (Windows)

Clusters often have both Ethernet and a higher-speed interconnect such as InfiniBand. This section describes how to use the ping_pong_ring.c example program to confirm that you can run over the desired interconnect.

Running a test like this, especially on a new cluster, is a good way to confirm that the appropriate network drivers are installed and that the network hardware is functioning properly. If any machine has a defective network card or cable, this test can also help identify which machine has the problem.

To compile the program, set the MPI_ROOT environment variable to the location of HP-MPI. The default is "C:\Program Files (x86)\Hewlett-Packard\HP-MPI" on 64-bit systems and "C:\Program Files\Hewlett-Packard\HP-MPI" on 32-bit systems. The HP-MPI installer may already have set this variable.

Open a command window for the compiler you plan to use, so that its compilers and libraries are in your path. Then compile the program with the mpicc wrapper:

> "%MPI_ROOT%\bin\mpicc" -mpi64 /out:pp.exe "%MPI_ROOT%\help\ping_pong_ring.c"

Use the startup that is appropriate for your cluster. Your situation should resemble one of the following:

If running on Windows CCS using automatic scheduling:

Submit the command to the scheduler, but give the total number of processes needed on the nodes as the -np value. When used this way, -np is NOT the rank count. Also include the -nodex flag so that only one rank runs on each node.

Assume this cluster has 4 CPUs per node. To run one rank on each of 3 nodes, the commands would be:

> "%MPI_ROOT%\bin\mpirun" -ccp -np 12 -IBAL -nodex -prot pp.exe

> "%MPI_ROOT%\bin\mpirun" -ccp -np 12 -IBAL -nodex -prot pp.exe 1000000

In the commands above, the first mpirun sends 0-byte messages and measures latency. The second mpirun sends 1000000-byte messages and measures bandwidth.

#include <stdio.h>
#include <stdlib.h>
#ifndef _WIN32
#include <unistd.h>
#endif
#include <string.h>
#include <math.h>
#include <mpi.h>

#define NLOOPS      1000
#define ALIGN       4096

#define SEND(t)     MPI_Send(buf, nbytes, MPI_CHAR, partner, (t), \
                             MPI_COMM_WORLD)
#define RECV(t)     MPI_Recv(buf, nbytes, MPI_CHAR, partner, (t), \
                             MPI_COMM_WORLD, &status)

/* Compile with -DCHECK to fill, clear, and verify message contents. */
#ifdef CHECK
# define SETBUF()   for (j = 0; j < nbytes; j++) { \
                        buf[j] = (char) (j + i); \
                    }
# define CLRBUF()   memset(buf, 0, nbytes)
# define CHKBUF()   for (j = 0; j < nbytes; j++) { \
                        if (buf[j] != (char) (j + i)) { \
                            printf("error: buf[%d] = %d, " \
                                   "not %d\n", \
                                   j, buf[j], j + i); \
                            break; \
                        } \
                    }
#else
# define SETBUF()
# define CLRBUF()
# define CHKBUF()
#endif

int
main(int argc, char *argv[])
{
    int          i;
#ifdef CHECK
    int          j;
#endif
    double       start, stop;
    int          nbytes = 0;
    int          rank, size;
    int          root;
    int          partner;
    MPI_Status   status;
    char         *buf, *obuf;
    char         myhost[MPI_MAX_PROCESSOR_NAME];
    int          len;
    char         str[1024];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(myhost, &len);

    if (size < 2) {
        if (!rank) printf("rping: must have two+ processes\n");
        MPI_Finalize();
        exit(0);
    }

    /* Message size in bytes comes from the first argument (default 0). */
    nbytes = (argc > 1) ? atoi(argv[1]) : 0;
    if (nbytes < 0) nbytes = 0;

/*
 * Page-align buffers and displace them in the cache to avoid collisions.
 */
    buf = (char *) malloc(nbytes + 524288 + (ALIGN - 1));
    obuf = buf;
    if (buf == 0) {
        MPI_Abort(MPI_COMM_WORLD, MPI_ERR_BUFFER);
        exit(1);
    }

    /* Cast through size_t, which is pointer-sized on both 32- and 64-bit
     * Windows; unsigned long is only 32 bits on 64-bit Windows. */
    buf = (char *) ((((size_t) buf) + (ALIGN - 1)) & ~((size_t) (ALIGN - 1)));
    if (rank > 0) buf += 524288;
    memset(buf, 0, nbytes);
/*
 * Ping-pong: each rank takes a turn as root and exchanges messages
 * with the next rank in the ring.
 */
    for (root = 0; root < size; root++) {
        if (rank == root) {
            partner = (root + 1) % size;
            sprintf(str, "[%d:%s] ping-pong %d bytes ...\n",
                    root, myhost, nbytes);
/*
 * warm-up loop
 */
            for (i = 0; i < 5; i++) {
                SEND(1);
                RECV(1);
            }
/*
 * timing loop
 */
            start = MPI_Wtime();
            for (i = 0; i < NLOOPS; i++) {
                SETBUF();
                SEND(1000 + i);
                CLRBUF();
                RECV(2000 + i);
                CHKBUF();
            }
            stop = MPI_Wtime();

            /* Each iteration is a round trip: divide by 2 for the one-way
             * time, then multiply by 1e6 to convert seconds to usec. */
            sprintf(&str[strlen(str)],
                    "%d bytes: %.2f usec/msg\n", nbytes,
                    (stop - start) / NLOOPS / 2 * 1e6);
            if (nbytes > 0) {
                sprintf(&str[strlen(str)],
                        "%d bytes: %.2f MB/sec\n", nbytes,
                        nbytes / (1024. * 1024.) /
                        ((stop - start) / NLOOPS / 2));
            }
            fflush(stdout);
        } else if (rank == (root + 1) % size) {
/*
 * warm-up loop
 */
            partner = root;
            for (i = 0; i < 5; i++) {
                RECV(1);
                SEND(1);
            }
            /* timing loop: echo each message back to the root */
            for (i = 0; i < NLOOPS; i++) {
                CLRBUF();
                RECV(1000 + i);
                CHKBUF();
                SETBUF();
                SEND(2000 + i);
            }
        }

        /* Send the root's result string to all ranks; rank 0 prints it. */
        MPI_Bcast(str, 1024, MPI_CHAR, root, MPI_COMM_WORLD);
        if (rank == 0) {
            printf("%s", str);
        }
    }

    free(obuf);
    MPI_Finalize();
    exit(0);
}

ping_pong_ring.c output

Example output might look like:

Host 0 -- ip 172.16.159.3 -- ranks 0
Host 1 -- ip 172.16.150.23 -- ranks 1
Host 2 -- ip 172.16.150.24 -- ranks 2

host | 0 1 2
=====|================
   0 : SHM IBAL IBAL
   1 : IBAL SHM IBAL
   2 : IBAL IBAL SHM

[0:mpiccp3] ping-pong 1000000 bytes ...
1000000 bytes: 1089.29 usec/msg
1000000 bytes: 918.03 MB/sec
[1:mpiccp4] ping-pong 1000000 bytes ...
1000000 bytes: 1091.99 usec/msg
1000000 bytes: 915.76 MB/sec
[2:mpiccp5] ping-pong 1000000 bytes ...
1000000 bytes: 1084.63 usec/msg
1000000 bytes: 921.97 MB/sec

The table showing SHM/IBAL is printed because of the -prot option (print protocol) specified in the mpirun command.

It could show any of the following settings:
   IBAL: IBAL on InfiniBand
   TCP: TCP/IP
   MPID: daemon communication mode
   SHM: shared memory (intra host only)

If one or more hosts show considerably worse performance than the others, this often indicates a bad card or cable.

If the run aborts with an error message, HP-MPI may have incorrectly determined which interconnect was available.

© 1979-2007 Hewlett-Packard Development Company, L.P.