HP-MPI User's Guide > Appendix A Example applications

ping_pong_ring.c (HP-UX and Linux)

A cluster often has both regular Ethernet and some form of higher-speed interconnect such as InfiniBand. This section describes how to use the ping_pong_ring.c example program to confirm that you can run over the desired interconnect.

Running a test like this, especially on a new cluster, helps ensure that the appropriate network drivers are installed and that the network hardware is functioning properly. If any machine has a defective network card or cable, this test can also help identify which machine has the problem.

To compile the program, set the MPI_ROOT environment variable (not required, but recommended) to a value such as /opt/hpmpi (Linux) or /opt/mpi (HP-UX), then run

% export MPI_CC=gcc (whatever compiler you want)

% $MPI_ROOT/bin/mpicc -o pp.x \
$MPI_ROOT/help/ping_pong_ring.c

Although mpicc will perform a search for what compiler to use if you don't specify MPI_CC, it is preferable to be explicit.

If you have a shared filesystem, it is easiest to put the resulting pp.x executable there; otherwise you must explicitly copy it to each machine in your cluster.

As discussed elsewhere, there are a variety of supported startup methods, and you need to know which is appropriate for your cluster. Your situation should resemble one of the following:

  • No srun, prun, or CCS job scheduler command is available

    For this case you can create an appfile such as the following:

    -h hostA -np 1 /path/to/pp.x
    -h hostB -np 1 /path/to/pp.x
    -h hostC -np 1 /path/to/pp.x
    ...
    -h hostZ -np 1 /path/to/pp.x

    You can specify which remote shell command to use (the Linux default is ssh) in the MPI_REMSH environment variable.

    For example, you might want

    % export MPI_REMSH="rsh -x" (optional)

    Then run

    % $MPI_ROOT/bin/mpirun -prot -f appfile

    % $MPI_ROOT/bin/mpirun -prot -f appfile -- 1000000

    Or, if LSF is being used, the hostnames in the appfile do not matter, and the commands to run are

    % bsub pam -mpi $MPI_ROOT/bin/mpirun -prot -f appfile

    % bsub pam -mpi $MPI_ROOT/bin/mpirun -prot -f appfile \
    -- 1000000

  • The srun command is available

    In this case, run commands like

    % $MPI_ROOT/bin/mpirun -prot -srun -N 8 -n 8 /path/to/pp.x

    % $MPI_ROOT/bin/mpirun -prot -srun -N 8 -n 8 \
    /path/to/pp.x 1000000

    replacing "8" with the number of hosts.

    Or if LSF is being used, then the command to run might be

    % bsub -I -n 16 $MPI_ROOT/bin/mpirun -prot -srun \
    /path/to/pp.x

    % bsub -I -n 16 $MPI_ROOT/bin/mpirun -prot -srun \
    /path/to/pp.x 1000000

  • The prun command is available

    This case is identical to the srun case, except that prun is used in place of srun.

In each case above, the first mpirun uses 0 bytes of data per message and is for checking latency. The second mpirun uses 1000000 bytes per message and is for checking bandwidth.
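Both figures come from the same timed loop: each iteration is one round trip, so the elapsed time is divided by the loop count and then by 2 to get the one-way time per message. The following is a minimal sketch of that arithmetic only. The function names are hypothetical and not part of HP-MPI, and the sketch converts seconds to microseconds with 1.0e6, whereas the listing in this section multiplies by 1024 * 1024 at that step.

```c
#include <assert.h>

/* Hypothetical helpers: they mirror the arithmetic of the timing loop only. */

/* One-way time per message in seconds; each loop iteration is a round trip. */
double one_way_seconds(double elapsed, int nloops)
{
    assert(nloops > 0);
    return elapsed / nloops / 2.0;
}

/* Latency in microseconds per message (assumes 1 second = 1.0e6 usec). */
double latency_usec(double elapsed, int nloops)
{
    return one_way_seconds(elapsed, nloops) * 1.0e6;
}

/* Bandwidth in MB/sec, where 1 MB is taken as 1024 * 1024 bytes. */
double bandwidth_mb_per_sec(double elapsed, int nloops, int nbytes)
{
    double t = one_way_seconds(elapsed, nloops);

    assert(t > 0.0);
    return nbytes / (1024.0 * 1024.0) / t;
}
```

For instance, 1000 round trips completing in 0.01 seconds correspond to roughly 5 usec per message.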

#include <stdio.h>
#include <stdlib.h>
#ifndef _WIN32
#include <unistd.h>
#endif
#include <string.h>
#include <math.h>
#include <mpi.h>

#define NLOOPS      1000
#define ALIGN       4096

#define SEND(t)     MPI_Send(buf, nbytes, MPI_CHAR, partner, (t), \
                             MPI_COMM_WORLD)
#define RECV(t)     MPI_Recv(buf, nbytes, MPI_CHAR, partner, (t), \
                             MPI_COMM_WORLD, &status)

#ifdef CHECK
# define SETBUF()   for (j = 0; j < nbytes; j++) { \
                        buf[j] = (char) (j + i); \
                    }
# define CLRBUF()   memset(buf, 0, nbytes)
# define CHKBUF()   for (j = 0; j < nbytes; j++) { \
                        if (buf[j] != (char) (j + i)) { \
                            printf("error: buf[%d] = %d, " \
                                   "not %d\n", \
                                   j, buf[j], j + i); \
                            break; \
                        } \
                    }
#else
# define SETBUF()
# define CLRBUF()
# define CHKBUF()
#endif

int
main(int argc, char *argv[])
{
    int          i;
#ifdef CHECK
    int          j;
#endif
    double       start, stop;
    int          nbytes = 0;
    int          rank, size;
    int          root;
    int          partner;
    MPI_Status   status;
    char         *buf, *obuf;
    char         myhost[MPI_MAX_PROCESSOR_NAME];
    int          len;
    char         str[1024];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(myhost, &len);

    if (size < 2) {
        if ( ! rank) printf("rping: must have two+ processes\n");
        MPI_Finalize();
        exit(0);
    }
    /* Message size in bytes comes from the first argument; default is 0. */
    nbytes = (argc > 1) ? atoi(argv[1]) : 0;
    if (nbytes < 0) nbytes = 0;

/*
 * Page-align buffers and displace them in the cache to avoid collisions.
 */
    buf = (char *) malloc(nbytes + 524288 + (ALIGN - 1));
    obuf = buf;         /* keep the original pointer for free() */
    if (buf == 0) {
        MPI_Abort(MPI_COMM_WORLD, MPI_ERR_BUFFER);
        exit(1);
    }

    buf = (char *) ((((unsigned long) buf) + (ALIGN - 1)) & ~(ALIGN - 1));
    if (rank > 0) buf += 524288;
    memset(buf, 0, nbytes);

/*
 * Ping-pong.
 */
    for (root = 0; root < size; root++) {
        if (rank == root) {
            partner = (root + 1) % size;
            sprintf(str, "[%d:%s] ping-pong %d bytes ...\n",
                    root, myhost, nbytes);
/*
 * warm-up loop
 */
            for (i = 0; i < 5; i++) {
                SEND(1);
                RECV(1);
            }
/*
 * timing loop
 */
            start = MPI_Wtime();
            for (i = 0; i < NLOOPS; i++) {
                SETBUF();
                SEND(1000 + i);
                CLRBUF();
                RECV(2000 + i);
                CHKBUF();
            }
            stop = MPI_Wtime();

            /*
             * Each iteration is a round trip, so divide by 2 for the
             * one-way time. Note that the seconds-to-usec scale factor
             * used here is 1024 * 1024, not 1000000.
             */
            sprintf(&str[strlen(str)],
                    "%d bytes: %.2f usec/msg\n", nbytes,
                    (stop - start) / NLOOPS / 2 * 1024 * 1024);
            if (nbytes > 0) {
                sprintf(&str[strlen(str)],
                        "%d bytes: %.2f MB/sec\n", nbytes,
                        nbytes / (1024. * 1024.) /
                        ((stop - start) / NLOOPS / 2));
            }
            fflush(stdout);
        } else if (rank == (root + 1) % size) {
/*
 * warm-up loop
 */
            partner = root;
            for (i = 0; i < 5; i++) {
                RECV(1);
                SEND(1);
            }
            for (i = 0; i < NLOOPS; i++) {
                CLRBUF();
                RECV(1000 + i);
                CHKBUF();
                SETBUF();
                SEND(2000 + i);
            }
        }

        /* The root shares its result string; rank 0 prints it. */
        MPI_Bcast(str, 1024, MPI_CHAR, root, MPI_COMM_WORLD);
        if (rank == 0) {
            printf("%s", str);
        }
    }

    free(obuf);
    MPI_Finalize();
    exit(0);
}
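
The page-alignment step in the listing rounds the address returned by malloc up to the next multiple of ALIGN. This is why the allocation requests ALIGN - 1 extra bytes: rounding up can skip at most that many. The extra 524288 bytes leave room to shift the buffer on ranks other than 0. The following is a minimal sketch of the rounding arithmetic in isolation. The function name align_up is hypothetical, and the sketch uses uintptr_t where the listing casts through unsigned long.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round addr up to the next multiple of align.
 * The mask trick only works when align is a power of two,
 * as is the case for ALIGN (4096) in the listing.
 */
uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + (align - 1)) & ~(align - 1);
}
```

An address that is already aligned is returned unchanged, and any other address moves forward by fewer than align bytes.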

ping_pong_ring.c output

Example output might look like:

> Host 0 -- ip 192.168.9.10 -- ranks 0
> Host 1 -- ip 192.168.9.11 -- ranks 1
> Host 2 -- ip 192.168.9.12 -- ranks 2
> Host 3 -- ip 192.168.9.13 -- ranks 3
>
>  host | 0     1     2     3
> ======|=====================
>     0 : SHM   VAPI  VAPI  VAPI
>     1 : VAPI  SHM   VAPI  VAPI
>     2 : VAPI  VAPI  SHM   VAPI
>     3 : VAPI  VAPI  VAPI  SHM
>
> [0:hostA] ping-pong 0 bytes ...
> 0 bytes: 4.57 usec/msg
> [1:hostB] ping-pong 0 bytes ...
> 0 bytes: 4.38 usec/msg
> [2:hostC] ping-pong 0 bytes ...
> 0 bytes: 4.42 usec/msg
> [3:hostD] ping-pong 0 bytes ...
> 0 bytes: 4.42 usec/msg

The table showing SHM/VAPI is printed because of the "-prot" option (print protocol) specified in the mpirun command. In general, it could show any of the following settings:

VAPI: InfiniBand

UDAPL: InfiniBand

IBV: InfiniBand

PSM: InfiniBand

MX: Myrinet MX

IBAL: InfiniBand (on Windows only)

IT: IT-API on InfiniBand

GM: Myrinet GM2

ELAN: Quadrics Elan4

TCP: TCP/IP

MPID: commd

SHM: Shared Memory (intra host only)

If the table shows TCP for one or more hosts, it is possible that those hosts do not have the appropriate network drivers installed.
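
On a large cluster the table can be checked mechanically rather than by eye. The following is a minimal sketch that tests whether one row of the table contains a given protocol abbreviation. It assumes rows shaped like the example output above (a host number, a colon, then whitespace-separated abbreviations). The function name row_uses_protocol is hypothetical and not part of HP-MPI.

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if any cell after the ':' in a -prot table row equals proto,
 * otherwise 0. Lines without a ':' (headers, separators) return 0.
 */
int row_uses_protocol(const char *row, const char *proto)
{
    char copy[256];
    const char *cells;
    char *tok;

    assert(row != NULL && proto != NULL);

    cells = strchr(row, ':');
    if (cells == NULL) return 0;                  /* not a table row */
    cells++;                                      /* skip the ':' itself */
    if (strlen(cells) >= sizeof(copy)) return 0;  /* too long for this sketch */

    strcpy(copy, cells);                          /* strtok modifies its input */
    for (tok = strtok(copy, " \t\r\n"); tok != NULL;
         tok = strtok(NULL, " \t\r\n")) {
        if (strcmp(tok, proto) == 0) return 1;
    }
    return 0;
}
```

Calling it with "TCP" on each row identifies the hosts that fell back to TCP/IP.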

If one or more hosts show considerably worse performance than the others, this often indicates a bad card or cable.
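
Such a comparison can be sketched in a few lines. The function below takes the usec/msg figures in the order they were printed and reports the slowest round if it exceeds the fastest by a chosen factor. The name find_slow_round and the factor threshold are assumptions made for illustration; a suitable threshold depends on the cluster.

```c
#include <assert.h>
#include <stddef.h>

/*
 * usec[i] is the latency reported for round i of the ring.
 * Returns the index of the slowest round if it is more than
 * factor times the fastest round, otherwise -1.
 */
int find_slow_round(const double *usec, int n, double factor)
{
    int i, worst = 0;
    double best;

    assert(usec != NULL && n > 0 && factor > 1.0);

    best = usec[0];
    for (i = 1; i < n; i++) {
        if (usec[i] < best) best = usec[i];
        if (usec[i] > usec[worst]) worst = i;
    }
    return (usec[worst] > factor * best) ? worst : -1;
}
```

Because round i exchanges messages between rank i and rank (i + 1) % size, a slow round implicates one of those two hosts. A host that is slow in both of the rounds it takes part in is the likelier culprit.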

If the run aborts with an error message, it is possible that HP-MPI incorrectly determined which interconnect was available. One common way to encounter this problem is to run a 32-bit application on a 64-bit machine such as an Opteron or Intel®64. Vendors of InfiniBand and other interconnects often provide only 64-bit libraries for their networks.

HP-MPI decides which interconnect to use before it knows the application's bitness. For proper network selection in that case, you must specify that the application is 32-bit when running on Opteron and Intel®64 machines:

% $MPI_ROOT/bin/mpirun -mpi32 ...
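
If it is unclear whether a given build is 32-bit, the pointer width can be checked from within C. The following is a minimal sketch with hypothetical function names. It reports the bitness of whatever build it is compiled into, so it is only meaningful when compiled with the same compiler and flags as the application, and it assumes a 32-bit or 64-bit pointer model.

```c
#include <assert.h>
#include <limits.h>

/* Width of a pointer in this build, in bits. */
int pointer_bits(void)
{
    return (int) (sizeof(void *) * CHAR_BIT);
}

/* Returns 1 when this build is 32-bit, the case in which -mpi32 applies. */
int needs_mpi32_flag(void)
{
    int bits = pointer_bits();

    assert(bits == 32 || bits == 64);   /* assumption of this sketch */
    return bits == 32;
}
```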

© 1979-2007 Hewlett-Packard Development Company, L.P.