Introduction to MPI

This tutorial will provide an introduction to the Message Passing Interface (MPI) library. The goal of this tutorial is to enable people to write, compile, and execute simple MPI programs on parallel computers such as the IBM SP and Origin 2000 supercomputers.

1. Introduction

2. MPI Programs

To get a feel for writing MPI programs, we'll focus on the simplest program imaginable, the standard Hello world program. The basic outline of an MPI program follows these general steps:

MPI has over 125 functions. However, a beginning programmer usually can make do with only six of these functions. These six functions are illustrated in our sample program.

2.1. Format of MPI calls

First, we'll look at the actual calling formats used by MPI.

2.2. An MPI sample program (Fortran)

As you look at the code below, note the various calls to MPI routines. Click on the name of each MPI routine to read a detailed description of that routine's purpose and syntax.

C version



          program hello
          include 'mpif.h'
          integer rank, size, ierror, tag, status(MPI_STATUS_SIZE)
          character(12) message
        
          call MPI_INIT(ierror)
          call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
          call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
          tag = 100
          if(rank .eq. 0) then
        
            message = 'Hello, world'
            do i=1, size-1
        
               call MPI_SEND(message, 12, MPI_CHARACTER, i, tag, 
        
        &                    MPI_COMM_WORLD, ierror)
            enddo
          else
        
               call MPI_RECV(message, 12, MPI_CHARACTER, 0, tag,
        
        &                    MPI_COMM_WORLD, status, ierror)
          endif
          print*, 'node', rank, ':', message
        
          call MPI_FINALIZE(ierror)
          end

To summarize the program: This is a SPMD code, so copies of this program are running on multiple nodes. Each process initializes itself with MPI (MPI_INIT), determines the number of processes (MPI_COMM_SIZE), and learns its rank (MPI_COMM_RANK). Then one process (with rank 0) sends messages in a loop (MPI_SEND), setting the destination argument to the loop index to ensure that each of the other processes is sent one message. The remaining processes receive one message (MPI_RECV). All processes then print the message, and exit from MPI (MPI_FINALIZE).

3. MPI messages

MPI messages consist of two basic parts: the actual data that you want to send/receive, and an envelope of information that helps to route the data. There are usually three calling parameters in MPI message-passing calls that describe the data, and another three parameters that specify the routing:

Message = data(3 parameters) + envelope(3 parameters)

   startbuf,count,datatype,  dest,tag,comm
       \      |    /           \   |   /
        \---DATA--/             ENVELOPE
                     
Let's look at the data and envelope in more detail. We'll describe each parameter, and discuss whether these must be coordinated between the sender and receiver.

3.1. Data

The buffer (location in your program where data are to be sent from or stored to) is specified by three calling parameters:

MPI Basic Datatypes for C

        ---------------------------------------
        MPI Datatype         C datatype          
        ---------------------------------------
        MPI_CHAR             signed char         
        MPI_SHORT            signed short int    
        MPI_INT              signed int          
        MPI_LONG             signed long int     
        MPI_UNSIGNED_CHAR    unsigned char       
        MPI_UNSIGNED_SHORT   unsigned short int  
        MPI_UNSIGNED         unsigned int        
        MPI_UNSIGNED_LONG    unsigned long int   
        MPI_FLOAT            float               
        MPI_DOUBLE           double              
        MPI_LONG_DOUBLE      long double         
        MPI_BYTE                                 
        MPI_PACKED                               
        ---------------------------------------

MPI Basic Datatypes for Fortran

        ---------------------------------------
        MPI Datatype           Fortran Datatype  
        ---------------------------------------
        MPI_INTEGER            INTEGER           
        MPI_REAL               REAL              
        MPI_DOUBLE_PRECISION   DOUBLE PRECISION  
        MPI_COMPLEX            COMPLEX           
        MPI_LOGICAL            LOGICAL           
        MPI_CHARACTER          CHARACTER(1)      
        MPI_BYTE                                 
        MPI_PACKED                               
        ---------------------------------------

3.2 Envelope

As mentioned earlier, a message consists of the actual data and the message envelope. The envelope provides information on how to match sends to receives. The three parameters used to specify the message envelope are:

4. Communicators

4.1 Why have communicators?

As promised earlier, we are now going to explain a little more about communicators. We will not go into great detail -- only enough that you will have some understanding of how they are used.

As described above, a message's eligibility to be picked up by a specific receive call depends on its source, tag, and communicator. Tag allows the program to distinguish between types of messages. Source simplifies programming. Instead of having a unique tag for each message, each process sending the same information can use the same tag. But why is a communicator needed?

An example

Suppose you are sending messages between your processes, but you are also calling a set of libraries you obtained elsewhere, which also runs on multiple nodes and communicates within itself using MPI. In this case, you want to make sure that messages you send go to your processes, and do not get confused with the messages being sent internally within the library routines.

In this example, we have three processes communicating with each other. Each process also calls a library routine, and the three library routines communicate with each other. We want to have two different message "spaces" here, one for our messages, and one for the library's messages. We do not want any intermingling of the messages.

The diagram below shows what we would like to happen. The white boxes represent our own routines ("caller"). The shaded boxes represent the library routines ("callee"). Sends and receives are indicated with their destination or source. In this case, everything works as intended.

However, there is no guarantee that things will occur in this order, given the fact that the relative scheduling of processes on different nodes can occur non-deterministically. Suppose, for example, that we change the third process by adding some computation at the beginning. The sequence of events might then occur as follows:

In this case, communications do not occur as intended. The first "receive" in process 0 now receives the "send" from the library routine in process 1, not the intended (and now delayed) "send" from process 2. As a result, all three processes hang.

This problem is solved by the library developer defining a new communicator, and specifying this in all send and receive calls made by the library. This would create a library ("callee") message space separate from the user's ("caller") message space. Tags can't be used to separate the message spaces because they are chosen by the programmer. How can the programmer avoid using the same tags as the library, when they don't know what tags the library uses? Communicators are returned by the system and incorporate a system-wide unique identifier.

4.2 Communicators and process groups

In addition to development of parallel libraries, communicators are useful in organizing communication within an application.

Up to this point, we've looked at communicators that include all processes in the application. But the programmer can also define a subset of processes, called a process group, and attach one or more communicators to the process group. Communication specifying that communicator is now restricted to those processes.

This also ties directly into use of collective communications.

To revisit the bill collection analogy: one person may have an account with the electric and phone companies (2 communicators) but not with the water company. The electric communicator may contain different people than the phone communicator. A person's ID number (rank) may vary with the utility (communicator). So, it is critical to note that the rank given as message source or destination is the rank in the specified communicator.

5. Compiling and Running MPI jobs on the IBM SP

6. Compiling and Running MPI jobs on the Origin

7. Summary

Although MPI provides for an extensive, and sometimes complex, set of calls, you can begin with just the six basic calls:

However, for programming convenience and optimization of your code, you should consider using other calls, such as those described in the more advanced references.

MPI Messages

MPI messages consist of two parts:

data (startbuf, count, datatype)
envelope (destination/source, tag, communicator)

The data defines the information to be sent or received. The envelope is used in routing messages to the receiver, and in matching send calls to receive calls.

Communicators

Communicators guarantee unique message spaces. In conjunction with process groups, they can be used to limit communication to a subset of processes.

8. Acknowledgements, References and Online Examples

This talk was based on the "Basics of MPI Programming" developed at the Cornell Theory center and on the tutorial by Ewing Lusk.

Good MPI references are:

Using MPI by Wiliam Gropp, Ewing Lusk, Anthony Skjellum

Message Passing Interface Forum (1995) MPI: A Message Passing Interface Standard. June 12, 1995.

RS/6000 SP: Practical MPI Programming by Yukiya Aoyama and Jun Nakano.

Online Example Programs: