The definition of a processor is no longer as clear as it once was. Vendors use the word core to refer to an independent processing element that sits on the same chip as one or more other independent processing elements. The cores are still independent processing elements, and it is up to the user of the system to write code that runs on multiple cores. Each dual-core AMD Opteron processor has two cores, and each IBM LS21 BladeCenter node holds two such processors, so within one node one can run what should properly be called a 4-core job. Unfortunately, to add to the confusion, this is still referred to as a 4-processor job.
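The core arithmetic above, written out. This assumes two dual-core processors per LS21 node, which is what the 4-core figure implies; for example, 16 nodes give 4 × 16 = 64 cores.

```latex
\text{cores per node} = 2\ \text{processors} \times 2\ \tfrac{\text{cores}}{\text{processor}} = 4,
\qquad
\text{cores in a job} = 4 \times N_{\text{nodes}}
```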
Each node:
Scratch directories are not backed up. All files in the scratch directories that have not been modified for 14 days will be deleted.
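To see whether a file is at risk under the 14-day rule, one can compare its modification time with the cutoff. The sketch below assumes the purge looks only at the modification time, as the text says, and treats a file exactly 14 days old as eligible; the real purge job's exact cutoff is not documented here. The function names are hypothetical.

```c
#include <stdbool.h>
#include <time.h>
#include <sys/stat.h>

/* Scratch policy from the text: files not modified for 14 days are deleted. */
#define SCRATCH_PURGE_DAYS 14

/* True if a file last modified at `mtime` is at or past the purge age at `now`. */
bool past_purge_age(time_t mtime, time_t now)
{
    const double limit = (double)SCRATCH_PURGE_DAYS * 24 * 60 * 60;
    return difftime(now, mtime) >= limit;
}

/* Check a real path: returns 1 if at risk, 0 if not, -1 if stat() fails. */
int scratch_file_at_risk(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return past_purge_age(st.st_mtime, time(NULL)) ? 1 : 0;
}
```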
The ACML library (AMD Core Math Library) is available in /usr/local/acml/pathscale64/lib.
The FFTW library is in /usr/local/fftw/3.1.2_pathscale/lib for the PathScale compiler. (For the Intel compiler, it is under /usr/local/fftw/3.1.2_intel/lib.)
The GOTO library, an optimised implementation of the basic linear algebra subroutines (BLAS), is available in /usr/local/goto. One can link to the library using:
pathf90 -o a.out my_program.o /usr/local/goto/libgoto_opteron64p-r1.00.so -lpthread
The GOTO library routines are threaded. One can control the number of threads used with the GOTO_NUM_THREADS environment variable, for example:
export GOTO_NUM_THREADS=1
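When linking against an optimised BLAS such as GOTO, it can help to check a small case against a plain reference. The sketch below is a hypothetical naive routine (`ref_dgemm`, not part of GOTO or any BLAS) that computes what a no-transpose DGEMM call computes, C = alpha*A*B + beta*C, with column-major storage as in Fortran. Comparing its output with the library's on small matrices is one way to confirm the link worked.

```c
#include <stddef.h>

/* Hypothetical naive reference: C = alpha*A*B + beta*C for column-major
   matrices (A is m x k, B is k x n, C is m x n), i.e. the no-transpose case
   of DGEMM. Deliberately simple and slow; meant for checking only. */
void ref_dgemm(int m, int n, int k, double alpha,
               const double *a, int lda,
               const double *b, int ldb,
               double beta, double *c, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double sum = 0.0;
            for (int p = 0; p < k; ++p)
                sum += a[i + (size_t)p * lda] * b[p + (size_t)j * ldb];
            /* As in BLAS, C is not read when beta == 0. */
            double old = (beta == 0.0) ? 0.0 : beta * c[i + (size_t)j * ldc];
            c[i + (size_t)j * ldc] = alpha * sum + old;
        }
    }
}
```

For example, with A = [[1,2],[3,4]] and B = [[5,6],[7,8]] stored column-major, alpha = 1 and beta = 0, C comes out as [[19,22],[43,50]].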
The LAPACK library is in /usr/local/lapack/LAPACK.
The ScaLAPACK (or Scalable LAPACK) library is also available in /usr/local/scalapack.
Nodes: 16 Cores: 64 Wallclock maximum: 1 hour
qsub -q devel
Nodes: 268 Cores: 1072 Wallclock maximum: 24 hours
This is the default queue for production jobs. Jobs that do not explicitly request another queue go into this queue.
Nodes: 15 Cores: 60 Wallclock maximum: 24 hours
These blades have no InfiniBand connections and are intended for non-parallel jobs. MPICH1 over Ethernet is available so that MPI jobs can still run. Code can be compiled to use MPICH1 in the standard way:
module load mpich1
mpicc -o mycode mycode.c
To submit a job to this bladejr queue, one needs to request it explicitly:
qsub -q bladejr
Jobs in violation of these limits will not be considered for scheduling.
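The queue limits can be sanity-checked before submitting. This sketch hard-codes the numbers from the queue table and assumes 4 cores per node (so ppn is at most 4). The label "default" is a placeholder, since the text does not give the production queue's name, and the function name is hypothetical. The scheduler, not this code, is what actually enforces the limits.

```c
#include <stdbool.h>
#include <string.h>

/* Limits as listed in the queue table (4 cores per node). */
struct queue_limits {
    const char *name;
    int max_nodes;
    int max_cores;
    int max_walltime_s;
};

/* "default" is a placeholder label for the unnamed production queue. */
static const struct queue_limits QUEUES[] = {
    { "devel",    16,   64,  1 * 3600 },
    { "default", 268, 1072, 24 * 3600 },
    { "bladejr",  15,   60, 24 * 3600 },
};

/* True if nodes x ppn for walltime_s seconds fits the named queue. */
bool request_fits(const char *queue, int nodes, int ppn, int walltime_s)
{
    for (size_t i = 0; i < sizeof QUEUES / sizeof QUEUES[0]; ++i) {
        const struct queue_limits *q = &QUEUES[i];
        if (strcmp(q->name, queue) != 0)
            continue;
        return nodes >= 1 && ppn >= 1 && ppn <= 4
            && nodes <= q->max_nodes
            && nodes * ppn <= q->max_cores
            && walltime_s <= q->max_walltime_s;
    }
    return false; /* unknown queue name */
}
```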
ssh -X -l username blade.msi.umn.edu
One only needs to use -l if one's login name on the BladeCenter differs from the login name on one's own workstation.
Three compiler suites are available on the BladeCenter: besides PathScale, one can also use the Intel and Portland Group (PGI) compilers.
To use the Intel compiler, you do the following:
module delete pathmpi
module add intelmpi
To use PGI compiler, you do the following:
module delete pathmpi
module add pgmpi
The default configuration is to load the pathmpi module. This makes the user's environment
ready to compile with the PathScale compilers.
To build with PathScale, the following compilers are ready to compile FORTRAN codes, C codes,
or C++ codes respectively:
pathf90
pathcc
pathCC
To build with Intel compilers,
module load intel
ifort
icc
icpc
Example:
pathf90 -O3 ./mm2_blas.f -L/home/bc1/haoyu/MM2/ -lgoto_opteron64p-r1.00 -lpthread
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/home/bc1/haoyu/MM2/
(The above command is needed to run the compiled file "a.out": a.out is linked against
the GOTO shared library in /home/bc1/haoyu/MM2/, so the dynamic loader must be able
to find that directory at run time.)
export OMP_NUM_THREADS=1
For the same example, other compiler options may improve the code's performance.
For example, try the following:
pathf90 -O3 -Ofast -fno-math-errno ./mm2_blas.f -L/home/bc1/haoyu/MM2/ -lgoto_opteron64p-r1.00 -lpthread
Parallel codes:
MPI codes:
To build with PathScale,
module unload intelmpi
module load pathmpi
mpicc
mpicxx
mpif90
To build with Intel,
module unload pathmpi
module load intelmpi
mpicc
mpicxx
mpif90
Note: One cannot have both pathmpi and intelmpi loaded at the same time.
To build with MPICH1,
module unload pathmpi
module unload intelmpi
module load mpich1
mpicc -o mycode mycode.c
OpenMP codes:
To build with PathScale,
module load pathscale
pathcc
pathCC
pathf90
Note: It is important to use the option -mp.
To build with Intel,
module load intel
icc
icpc
ifort
Note: It is important to use the option -openmp.
Examples:
Code with MPI:
module load pathmpi
mpif90 -o test mycode.f
mpicc -o test mycode.c
mpif90 -O3 -Ofast -fno-math-errno ./mpi_test_code.f
module unload pathmpi
module load intelmpi
mpif90 -O3 -ipo -no-prec-div ./mpi_test_code.f
Code with OpenMP directives:
module load pathscale
pathf90 -O3 -mp -ipa mycode.f
module load pathmpi
mpirun -np 2 blade285 blade285 ./a.out > output
This will run two processes on the interactive node.
Please note that only 4 cores are available on the interactive node.
To run with OpenMP:
export OMP_NUM_THREADS=<number of threads>
Example:
module load pathscale
export OMP_NUM_THREADS=2
./a.out
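As a small C illustration of the OpenMP build and run steps, the function below sums an array with a parallel reduction. Compiled with the OpenMP option (-mp for PathScale, -openmp for Intel), the loop is split across the threads requested with OMP_NUM_THREADS; compiled without it, the pragma is ignored and the loop runs serially with the same result. The function names are hypothetical.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Sum of an array; the loop iterations are divided among OpenMP threads
   and the partial sums are combined by the reduction clause. */
double omp_sum(const double *x, long n)
{
    double total = 0.0;
#pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; ++i)
        total += x[i];
    return total;
}

/* Threads an OpenMP parallel region would use (1 when built without OpenMP). */
int omp_threads_available(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

For example, after building with pathcc -mp and setting export OMP_NUM_THREADS=2, omp_threads_available() would be expected to report 2. Because floating-point addition is not associative, the summation order (and so the last bits of a non-integer result) can vary with the thread count.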
In your .bashrc file, add the following
ulimit -s unlimited
ulimit -n 4096
The first line removes the limit on the stack size, and the second raises the maximum number of open file descriptors to 4096.
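To confirm that these limits are actually in effect for a running program (for example inside a batch job), the program can query them with the POSIX getrlimit() call. A minimal sketch with hypothetical helper names; note that ulimit -s reports kilobytes, while getrlimit() reports the stack limit in bytes.

```c
#include <sys/resource.h>

/* Soft limit for `resource`: -1 if unlimited, -2 if getrlimit() fails. */
static long long soft_limit(int resource)
{
    struct rlimit rl;
    if (getrlimit(resource, &rl) != 0)
        return -2;
    if (rl.rlim_cur == RLIM_INFINITY)
        return -1;
    return (long long)rl.rlim_cur;
}

/* Stack size limit in bytes (what `ulimit -s unlimited` changes). */
long long stack_limit_bytes(void) { return soft_limit(RLIMIT_STACK); }

/* Maximum number of open file descriptors (what `ulimit -n 4096` changes). */
long long open_files_limit(void) { return soft_limit(RLIMIT_NOFILE); }
```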
Create a script file for PBS (see examples below) and submit it to PBS with:
qsub myscript
To check the status of your own jobs in the queue, use the command:
showq -u userid
Examples of PBS script files:
#!/bin/bash -l
#PBS -l walltime=1:00:00,mem=2gb,nodes=1:ppn=1
#PBS -m abe
cd /home/bc1/TEST/mytest_pbs
./a.out
#
# ==== end of the sample script file ====
This script is for a single-processor job that uses 2 GB of memory and can run for up to 1 hour. Because of the -m switch, PBS will send email to you if the job is aborted (a), when the job begins running (b), and when the job terminates (e). The lines that start with #PBS are directives read by PBS rather than shell commands; they specify the resources and options for this job.
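A PBS walltime value has the form [[hours:]minutes:]seconds, so 1:00:00 is one hour, while a two-field value such as 10:00 means ten minutes. This hypothetical helper converts such a string to seconds, which can be handy for checking a request against a queue's wallclock maximum.

```c
#include <stdlib.h>

/* Convert a walltime string of the form [[hh:]mm:]ss to seconds.
   Returns -1 for malformed input (empty fields, non-digits, >3 fields). */
long walltime_seconds(const char *s)
{
    long fields[3];
    int count = 0;
    const char *p = s;
    for (;;) {
        char *end;
        if (*p < '0' || *p > '9')
            return -1;
        long v = strtol(p, &end, 10);
        if (count == 3)
            return -1;
        fields[count++] = v;
        if (*end == '\0')
            break;
        if (*end != ':')
            return -1;
        p = end + 1;
    }
    /* Fold the fields from most to least significant, 60 per step. */
    long total = 0;
    for (int i = 0; i < count; ++i)
        total = total * 60 + fields[i];
    return total;
}
```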
#!/bin/bash -l
#PBS -l walltime=10:00:00,pmem=500mb,nodes=2:ppn=4
#PBS -m abe
cd /home/bc1/TEST/mytest_pbs
module load pathmpi
mpirun -np 8 -hostfile $PBS_NODEFILE ./a.out
#
# ==== end of the sample script file ====
This PBS script is for an 8-process MPI job that will run for up to 10 hours on 2 nodes, using 4 processors per node. This time the script uses "pmem=500mb" to request 500 MB of memory per task.
#!/bin/bash -l
#PBS -l walltime=10:00:00,mem=1gb,nodes=2:ppn=4
#PBS -m abe
cd /home/bc1/TEST/mpich1/examples
module load mpich1
mpirun -machinefile $PBS_NODEFILE -np 8 ./cpi
#
# ==== end of the sample script file ====
This PBS script uses MPICH1 for an 8-process job that will run for up to 10 hours on 2 nodes, using 4 processors per node.
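PBS writes one line per allocated processor slot to the file named by $PBS_NODEFILE, so nodes=2:ppn=4 yields 8 lines, matching -np 8; with pmem=500mb that is 8 × 500 MB = 4000 MB in total. The sketch below counts the slots in an open node file and does the memory arithmetic; both function names are hypothetical.

```c
#include <stdio.h>

/* Count non-empty lines in an open PBS node file (one line per slot). */
int count_slots(FILE *f)
{
    int slots = 0, in_line = 0, c;
    while ((c = fgetc(f)) != EOF) {
        if (c == '\n') {
            slots += in_line;
            in_line = 0;
        } else {
            in_line = 1;
        }
    }
    return slots + in_line; /* the last line may lack a trailing newline */
}

/* Total memory in MB for a job requesting pmem_mb per task. */
long total_mem_mb(long pmem_mb, int tasks)
{
    return pmem_mb * tasks;
}
```

Inside a job, the file can be opened with fopen(getenv("PBS_NODEFILE"), "r") (after checking that both calls succeed) and passed to count_slots().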
Parameters that change various MPI behaviors can be set at run time with the -paramfile flag, e.g.:
mpirun -np 800 -paramfile ./my_defaults.param -hostfile $PBS_NODEFILE ./a.out
Some of the useful parameters that may be used in ./my_defaults.param are:
VIADEV_ENABLE_ADAPTIVE_FAST_PATH = 1
VIADEV_ADAPTIVE_RDMA_LIMIT = 20
VIADEV_NUM_RDMA_BUFFER = 8
Some others may also be useful:
VIADEV_PREPOST_DEPTH = 8
VBUF_TOTAL_SIZE = 2048
VIADEV_MAX_TOTAL_REG_MEM_SIZE = 200
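The parameter file entries above all share a NAME = VALUE shape. The sketch below parses one such line, which can be used to list or check the settings in a file before submitting a job. It assumes integer values, as in the examples; how mpirun itself parses the file is not documented here, and parse_param_line is a hypothetical name.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Parse one "NAME = VALUE" line. On success copies NAME into `name`
   (buffer of size `cap`), stores VALUE in *value and returns 1;
   returns 0 for lines that do not have that shape. */
int parse_param_line(const char *line, char *name, size_t cap, long *value)
{
    const char *eq = strchr(line, '=');
    if (eq == NULL)
        return 0;
    /* Trim whitespace around the name. */
    const char *b = line, *e = eq;
    while (b < e && isspace((unsigned char)*b)) ++b;
    while (e > b && isspace((unsigned char)e[-1])) --e;
    size_t len = (size_t)(e - b);
    if (len == 0 || len >= cap)
        return 0;
    memcpy(name, b, len);
    name[len] = '\0';
    /* strtol skips leading spaces; require digits and nothing but spaces after. */
    char *end;
    long v = strtol(eq + 1, &end, 10);
    if (end == eq + 1)
        return 0;
    while (isspace((unsigned char)*end)) ++end;
    if (*end != '\0')
        return 0;
    *value = v;
    return 1;
}
```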
With the PathScale compilers:
Recommended options:
-i8 -O3 -OPT:Ofast -fno-math-errno
Other useful flags:
-O3 basic optimizations
-Ofast this turns on -O3 as well as -ipa -fno-math-errno -ffast-math
the -ipa portion will greatly increase compile times
-LNO:simd=2 Loop Nest Optimizer (LNO) option requesting aggressive vectorization of inner loops
With the Intel compilers:
Recommended options:
-ipo -O3 -no-prec-div
Note: Not all static libraries are available for MPI codes, so the "-static" flag will not work for MPI codes.