There are several classes of problems that require faster processing:
The data parallel model is defined as:
The compiler converts the program into standard code and calls to a message passing library to distribute the data to all the processes.
That covers array optimization, loop optimization, tuning for arithmetic operations, etc.
For example, if there are many tasks of varying sizes, it may be more efficient to maintain a task pool and distribute tasks to processors as each one finishes.
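The task-pool idea can be sketched as follows. This is only an illustration, written in Python rather than Fortran for brevity, with threads standing in for processors. The function name run_task_pool and the task sizes are hypothetical and do not come from any particular library.

```python
import queue
import threading


def run_task_pool(task_sizes, num_workers):
    """Hand out tasks from a shared pool; a worker takes the next
    task as soon as it has finished its previous one."""
    pool = queue.Queue()
    for task_id, size in enumerate(task_sizes):
        pool.put((task_id, size))

    results = {}                                  # task_id -> computed value
    handled_by = {w: [] for w in range(num_workers)}
    lock = threading.Lock()                       # protects the two dicts

    def worker(worker_id):
        while True:
            try:
                task_id, size = pool.get_nowait()
            except queue.Empty:
                return                            # pool is empty: worker stops
            value = sum(range(size))              # stand-in for work of varying size
            with lock:
                results[task_id] = value
                handled_by[worker_id].append(task_id)

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, handled_by


if __name__ == "__main__":
    results, handled_by = run_task_pool([10, 1000, 5, 200], num_workers=3)
    print(results)
    print(handled_by)   # which worker took which task varies from run to run
```

Because each worker pulls a new task only when it is idle, a worker that draws small tasks simply ends up processing more of them. The assignment is decided at run time rather than fixed in advance.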
      DO 500 J = MYSTART,MYEND
         A(J) = A(J-1) * 2.0
  500 CONTINUE
If Task 2 has A(J) and Task 1 has A(J-1), the value of A(J) is dependent on:
Task 2 obtaining the value of A(J-1) from Task 1
Whether Task 2 reads A(J-1) before or after Task 1 updates it
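The effect of this loop-carried dependence can be seen in a small serial simulation. The sketch below is in Python for brevity and uses 0-based indexing. It assumes, for illustration only, that each task starts from the original array contents and never receives its neighbor's updated A(J-1). The names serial_loop and naive_chunked_loop are hypothetical.

```python
def serial_loop(a):
    """Reference result: A(J) = A(J-1) * 2.0 executed in order."""
    a = list(a)
    for j in range(1, len(a)):
        a[j] = a[j - 1] * 2.0
    return a


def naive_chunked_loop(a, num_tasks):
    """Split the iterations into chunks (MYSTART..MYEND per task).
    Each task reads its boundary value A(MYSTART-1) from the ORIGINAL
    array, i.e. the updated value is never communicated to it."""
    original = list(a)
    result = list(a)
    n = len(a)
    chunk = (n - 1 + num_tasks - 1) // num_tasks   # ceiling division
    for task in range(num_tasks):
        mystart = 1 + task * chunk
        myend = min(mystart + chunk, n)
        if mystart >= n:
            continue
        prev = original[mystart - 1]               # stale boundary value
        for j in range(mystart, myend):
            prev = prev * 2.0
            result[j] = prev
    return result


if __name__ == "__main__":
    a = [1.0] + [0.0] * 6
    print(serial_loop(a))            # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
    print(naive_chunked_loop(a, 2))  # [1.0, 2.0, 4.0, 8.0, 0.0, 0.0, 0.0]
```

The second task produces wrong values because its first iteration needs a result that the first task has not yet delivered.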
   task 1        task 2
   ------        ------
   X = 2         X = 4
     .             .
     .             .
   Y = X**2      Y = X**3
The value of Y is dependent on:
If and/or when the value of X is communicated between the tasks.
Which task last stores the value of X.
Communicate required data at synchronization points.
Synchronize read/write operations between tasks.
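One way to picture these two remedies is to hand the boundary value from one task to the next at an explicit synchronization point. The sketch below is in Python for brevity and applies this to the A(J) = A(J-1) * 2.0 loop shown earlier. Two threads stand in for Task 1 and Task 2, and a blocking queue plays the role of the message. The function name chunked_loop_with_handoff and the split parameter are hypothetical.

```python
import queue
import threading


def chunked_loop_with_handoff(a, split):
    """Task 1 updates a[1:split], task 2 updates a[split:].
    Task 2 blocks until task 1 has sent its last updated value."""
    result = list(a)
    boundary = queue.Queue(maxsize=1)     # carries A(split-1) from task 1 to task 2

    def task1():
        for j in range(1, split):
            result[j] = result[j - 1] * 2.0
        boundary.put(result[split - 1])   # communicate at the synchronization point

    def task2():
        prev = boundary.get()             # waits here until the value arrives
        for j in range(split, len(result)):
            prev = prev * 2.0
            result[j] = prev

    t1 = threading.Thread(target=task1)
    t2 = threading.Thread(target=task2)
    t2.start()                            # start order does not matter: task 2 waits
    t1.start()
    t1.join()
    t2.join()
    return result


if __name__ == "__main__":
    a = [1.0] + [0.0] * 6
    print(chunked_loop_with_handoff(a, split=4))
    # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
```

The result now matches the serial loop, but Task 2 cannot begin until Task 1 has finished. The dependence is handled correctly, but it also limits how much of the work can overlap.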
      program hello
      include 'mpif.h'
      integer rank, size, ierror, tag, status(MPI_STATUS_SIZE)
      integer i
      character*12 message

C     Start MPI, then find the number of processes and this rank
      call MPI_INIT(ierror)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
      tag = 100

      if (rank .eq. 0) then
C        Rank 0 sends the message to every other rank
         message = 'Hello, world'
         do i = 1, size-1
            call MPI_SEND(message, 12, MPI_CHARACTER, i, tag,
     &                    MPI_COMM_WORLD, ierror)
         enddo
      else
C        All other ranks receive the message from rank 0
         call MPI_RECV(message, 12, MPI_CHARACTER, 0, tag,
     &                 MPI_COMM_WORLD, status, ierror)
      endif

C     Every rank, including rank 0, prints the message
      print*, 'node', rank, ':', message
      call MPI_FINALIZE(ierror)
      end
   f77 program.f -lmpi     (for Fortran77)
   f90 program.f -lmpi     (for Fortran90)
   cc  program.c -lmpi     (for C)
   CC  program.C -lmpi     (for C++)
mpirun -np 4 a.out
These scripts are:
   mpxlf   Fortran   mpxlf options program.f
   mpcc    C         mpcc options program.c
   mpCC    C++       mpCC options program.C
poe exec -rmpool 1 -procs 4 -euidevice css0 -euilib us