Each node:
Local scratch is available on 86 of the compute nodes. To use local scratch, request the scratch feature in your submission script, e.g.:
#PBS -l nodes=4:scratch:ppn=8
Scratch directories are not backed up. All files in the scratch directories that have not been modified for 14 days will be deleted.
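Because unmodified files are removed after 14 days, it can help to check which files are approaching that age. The following is a minimal sketch rather than a site-provided tool: the function name stale_files is hypothetical, and the scratch path must be supplied by the user since it is not given here.

```shell
#!/bin/bash
# stale_files DIR [DAYS]
# Print regular files under DIR whose last modification is at least DAYS
# days old (default 14, matching the purge policy described above).
# find's "-mtime +N" matches files older than N whole days, so DAYS-1 is
# passed in order to include files that are exactly DAYS days old.
stale_files() {
    local dir="$1"
    local days="${2:-14}"
    find "$dir" -type f -mtime +"$((days - 1))" -print
}
```

For example, calling stale_files with a scratch directory and 10 as the second argument would list files that are already 10 or more days old, which leaves a few days to copy them elsewhere.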
The FFTW library version 3 is in /soft/fftw/lib. When compiling, link with the library as follows:
module load intel
module load fftw
icc -o mycode mycode.c -L/soft/fftw/lib -lfftw3 -lm

To use the FFTW library version 2:
module load intel
module load fftw2
icc -o mycode mycode.c -L/soft/fftw/fftw-2.1.5/lib -lfftw -lm

To use FFTW with MPI:
module load intel
module load vmpi
module load fftw2
icc -o mycode mycode.c -L/soft/fftw/fftw-2.1.5/lib -lfftw_mpi -lm
The MKL library is in /cluster/apps/intel/cmkl/9.1.021/lib/em64t/.
The LAPACK library is available through the MKL library in /cluster/apps/intel/cmkl/9.1.021/lib/em64t/. For example, when compiling, link with the LAPACK library using:
-L/cluster/apps/intel/cmkl/9.1.021/lib/em64t -lmkl_lapack -lmkl -lpthread
The ScaLAPACK library is in /soft/scalapack/scalapack_vmpi/lib/.
ssh -X -l username calhoun.msi.umn.edu
The -l option is needed only if your login name on Calhoun differs from the login name on your own workstation.
The Intel and GNU compilers are available on Calhoun.
To use the Intel compilers, load the intel module:
module load intel
The compiler commands are then available:
ifort (Fortran)
icc (C)
icpc (C++)
Example:
module load intel
ifort -O3 ./mm2_blas.f -L/cluster/apps/intel/cmkl/9.1.021/lib/em64t -lmkl -lpthread
Parallel codes:
MPI            Compiler   Module
-------------------------------------------------------
Voltaire MPI   Intel      vmpi
               GNU        vmpi/gnu
IntelMPI       Intel      impi
               GNU        impi/gnu
OpenMPI        Intel      ompi
               GNU        ompi/gnu
The recommended MPI for use on Calhoun is Voltaire MPI.
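For build scripts that need to switch between MPI flavors, the table above can be expressed as a small lookup. This is only a sketch: the function name mpi_module and the keywords voltaire, intelmpi and openmpi are hypothetical labels chosen for illustration, while the module names themselves come from the table.

```shell
#!/bin/bash
# mpi_module MPI COMPILER
# Echo the module name listed in the table for a given MPI flavor
# (voltaire | intelmpi | openmpi) and compiler family (intel | gnu).
# Returns non-zero for any combination not in the table.
mpi_module() {
    local base
    case "$1" in
        voltaire) base=vmpi ;;
        intelmpi) base=impi ;;
        openmpi)  base=ompi ;;
        *) echo "unknown MPI: $1" >&2; return 1 ;;
    esac
    case "$2" in
        intel) echo "$base" ;;
        gnu)   echo "$base/gnu" ;;
        *) echo "unknown compiler: $2" >&2; return 1 ;;
    esac
}
```

With this in place, module load "$(mpi_module voltaire gnu)" would load vmpi/gnu.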
To build with Intel and Voltaire MPI:
module load intel
module load vmpi
mpicc
mpicxx
mpif90
To build with Intel:
module load intel
icc
icpc
ifort
Note: It is important to use the option -openmp when compiling code with OpenMP directives.
Examples:
Code with MPI:
module load intel
module load vmpi
mpif90 -o test mycode.f
mpicc -o test mycode.c
mpif90 -O3 -fast -xT ./mpi_test_code.f
For more information about the compiler options, see the man page:
man ifort
Code with OpenMP directives:
module load intel
ifort -O3 -openmp mycode.f
#PBS -l walltime=1:00:00,mem=14gb,nodes=4:ppn=8
#PBS -m abe
module load intel
module load vmpi
/usr/bin/time mpirun -np 32 -hostfile $PBS_NODEFILE ./a.out >& run.out

To run with OpenMP:
export OMP_NUM_THREADS=<number of threads>
Example:
module load intel
export OMP_NUM_THREADS=2
./a.out
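Requesting more threads than there are cores on a node generally slows an OpenMP run, so a job script may want to validate the thread count before exporting it. The sketch below assumes 8 cores per node, which is inferred from the ppn=8 setting used in the job examples on this page; the function name set_omp_threads is hypothetical.

```shell
#!/bin/bash
# set_omp_threads N [MAX]
# Export OMP_NUM_THREADS=N after checking that N is a positive integer
# no larger than MAX. MAX defaults to 8 (assumption: 8 cores per node,
# inferred from ppn=8). On invalid input the variable is left unchanged
# and the function returns non-zero.
set_omp_threads() {
    local n="$1"
    local max="${2:-8}"
    case "$n" in
        ''|*[!0-9]*)
            echo "thread count must be a positive integer" >&2
            return 1 ;;
    esac
    if [ "$n" -lt 1 ] || [ "$n" -gt "$max" ]; then
        echo "thread count must be between 1 and $max" >&2
        return 1
    fi
    export OMP_NUM_THREADS="$n"
}
```

A script could then call set_omp_threads 2 before launching ./a.out.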
In your .bashrc file, add the following lines:
ulimit -s unlimited
ulimit -n 4096
The first removes the limit on stack size; the second raises the limit on open file descriptors to 4096.
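If a requested value exceeds the hard limit set by the system, ulimit reports an error each time .bashrc is read. A guarded form of the same two commands, offered here as a sketch rather than a site recommendation, suppresses that output so non-interactive sessions are not disturbed:

```shell
# Guarded form of the two ulimit commands for ~/.bashrc.
# If the system refuses a request, the error message is discarded and
# the existing limit stays in effect.
ulimit -s unlimited 2>/dev/null || true
ulimit -n 4096 2>/dev/null || true
```

Running ulimit -s and ulimit -n afterwards shows the limits actually in effect.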
Create a script file for PBS (see examples below) and submit it to PBS with:
qsub myscript
To check the status of your own jobs in the queue, use the command:
showq -u userid
Examples of PBS script files:
#!/bin/bash -l
#PBS -l walltime=10:00:00,pmem=500mb,nodes=2:ppn=8
#PBS -m abe
cd /home/xe1/fred/TEST/mytest_pbs
module load intel
module load vmpi
/usr/bin/time mpirun -np 16 -hostfile $PBS_NODEFILE ./a.out >& run.out
#
# ==== end of the sample script file ====

This is a PBS script for a 16-processor MPI job that will run for up to 10 hours on 2 nodes, using 8 processors per node. The script uses "pmem=500mb" to request 500 MB of memory per task.
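The value passed to -np has to agree with nodes × ppn in the resource request. One way to avoid keeping the two in sync by hand is to derive the count from $PBS_NODEFILE, which under PBS/Torque contains one line per allocated processor slot. The helper names below are hypothetical, and this is a sketch rather than part of the sample script.

```shell
#!/bin/bash
# count_slots FILE
# Number of processor slots, i.e. non-empty lines, in a PBS node file.
count_slots() {
    grep -c . "$1"
}

# count_nodes FILE
# Number of distinct host names in a PBS node file.
count_nodes() {
    grep . "$1" | sort -u | wc -l | tr -d ' '
}
```

Inside a job script, mpirun -np "$(count_slots "$PBS_NODEFILE")" -hostfile "$PBS_NODEFILE" ./a.out would then start 16 processes for a nodes=2:ppn=8 request without hard-coding the number.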
Parameters that change various aspects of MPI behavior can be set at runtime with the -paramfile flag.
With the Intel compilers, the recommended options are -ipo -O3 -no-prec-div (or just use the option -fast).
Calhoun has 512 Intel Xeon 5355 "Clovertown" multi-chip modules (MCMs). Each MCM is composed of 2 dies: 2 separate pieces of silicon connected to each other and arranged on a single module that plugs into a socket on the board. Each die has 2 processor cores that share a 4 MB L2 cache (8 MB per MCM). Each MCM communicates with memory via a 1333 MHz front-side bus (FSB). The diagram below illustrates how the logical processor numbering corresponds to the physical locations of the processor cores:
[Figure: Diagram of Clovertown Processor]
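The figures above imply the machine's total core count. As a worked check, the arithmetic can be written out as follows; the value of 8 cores per node is an assumption inferred from the ppn=8 setting in the job examples, not something stated in the hardware description.

```shell
#!/bin/bash
mcms=512          # MCMs in Calhoun (from the text)
dies_per_mcm=2    # dies per MCM (from the text)
cores_per_die=2   # processor cores per die (from the text)
cores_per_node=8  # assumption: inferred from ppn=8 in the job examples

total_cores=$((mcms * dies_per_mcm * cores_per_die))
nodes=$((total_cores / cores_per_node))

echo "total cores: $total_cores"   # 512 * 2 * 2 = 2048
echo "nodes:       $nodes"         # 2048 / 8 = 256
```

Under that assumption, each node holds 2 MCMs, so a ppn=8 request occupies a whole node.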