- Traditional Compilers
- Tesla Compilers
- NVIDIA CUDA
- Portland Group Compiler
- Additional Information
1. Traditional Compilers
All compilers and libraries that are available on abe are also available
on the lincoln nodes; please refer to the abe documentation for details. Note
that availability on lincoln does not imply GPU capability; for GPU-aware
compilers, see below.
2. Tesla Compilers
2.1 NVIDIA CUDA
- CUDA environment
Add the appropriate softenv keys to your $HOME/.soft file, for example:
+cuda-3.2
+nvidia-sdk-cuda-3.2
(Here and below we use version 3.2 for concreteness; the available versions can be listed with softenv.)
The NVIDIA CUDA compiler driver is nvcc; it is added to your path by the CUDA
softenv key.
Notes:
- You can compile on the head nodes, but access to nodes with the
Tesla devices is available only via PBS batch jobs.
- There is no Fortran support in the standard NVIDIA CUDA distribution;
however, PGI and NVIDIA have collaborated on CUDA Fortran (see Section 2.2 below).
- CUDA SDK
Example code, documentation and several utilities can be found under /usr/local/NVIDIA_GPU_Computing_SDK-3.2.
When porting code to CUDA, the examples in the SDK can be quite useful,
both as illustration and as templates for certain algorithms (marching cubes, Monte Carlo). In addition, there are further examples and tutorials
available on the NVIDIA site at the link below.
To use the examples, copy the NVIDIA_GPU_Computing_SDK-3.2
directory into your home directory, naming it, say, "nvidia_examples":
cd $HOME
cp -r /usr/local/NVIDIA_GPU_Computing_SDK-3.2 nvidia_examples
Some examples require libraries that reside only on the compute nodes, so they must be compiled within an interactive batch job, which can be obtained as follows:
qsub -I -V -q lincoln -lwalltime=00:30:00,nodes=1:ppn=8
Then build the examples as follows (the setenv line is csh/tcsh syntax; in bash, use export CUDA_INSTALL_PATH=/usr/local/cuda-3.2 instead):
setenv CUDA_INSTALL_PATH /usr/local/cuda-3.2
cd $HOME/nvidia_examples/C
make
Note: to build the simpleMPI example, also add the softenv key
+openmpi-1.3.2-intel to your .soft file so that Open MPI is used.
The source code for the examples is in $HOME/nvidia_examples/C/src;
the built executables will be in $HOME/nvidia_examples/C/bin/linux/release.
To run the examples, either start an interactive batch job as described
above or submit a job script that specifies the lincoln queue:
#PBS -q lincoln
The deviceQuery utility (which runs only on the Tesla nodes)
reports the characteristics of the Tesla devices; sample output
follows:
There are 2 devices supporting CUDA
Device 0: "Tesla C1060"
Major revision number: 1
Minor revision number: 3
Total amount of global memory: 4294705152 bytes
Number of multiprocessors: 30
Number of cores: 240
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.44 GHz
Concurrent copy and execution: Yes
Device 1: "Tesla C1060"
Major revision number: 1
Minor revision number: 3
Total amount of global memory: 4294705152 bytes
Number of multiprocessors: 30
Number of cores: 240
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.44 GHz
Concurrent copy and execution: Yes
- CUDA Visual Profiler
The CUDA Toolkit contains a visual profiler that can be invoked from within
a batch session to view performance statistics; a list of counters can be found
in the (rather terse) README.
To use the profiler, you must enable X forwarding: first log in to
abe with "ssh -Y abe" (trusted X forwarding), then launch an interactive batch
job with the qsub option "-X"; see the ssh and qsub man pages, respectively.
A CUDA softenv key, say,
+cuda-3.2
will add "cudaprof" to the search path; the executable to be profiled should
be launched from the cudaprof window.
- CUDA HOME
2.2 Portland Group Compiler
The Portland Group compilers currently support the NVIDIA Tesla in two ways,
the first having an implicit programming model; the second, explicit. These are
separate efforts, with differing objectives.
- Accelerator model:
Starting with version 9.0, the PGI compilers support accelerator
directives that can be added to existing code without restructuring it.
The model and syntax are reminiscent of OpenMP.
The model is currently available for x64 hosts with CUDA-capable NVIDIA GPUs,
and, as a general approach to acceleration, is planned for ATI, Cell, and
Larrabee.
- CUDA Fortran:
Beginning with version 9.0.4, PGI provides a beta version of CUDA Fortran,
analogous to NVIDIA's CUDA C; it is available only for x64 hosts with
CUDA-capable NVIDIA GPUs.
Given the pace of PGI releases, we recommend using the softenv key for a
specific compiler version; the details below refer to version 10.9 but apply
to other versions, mutatis mutandis.
To invoke the PGI compilers, add the softenv key, say,
@pgi-10.9
to the top of your .soft file.
Examples of use may be found in
/usr/local/pgi/linux86-64/10.9/etc/samples
Since version 9.0.3, the accelerator model has offered a user-requested feature:
compiling with the flag
-ta=nvidia,keepgpu
keeps the generated GPU kernel code so that it can be inspected.
Because the makefile currently builds only the accelerator-model examples, compile the CUDA Fortran examples manually:
pgfortran -o cufinfo cufinfo.cuf
pgfortran -o sgemm sgemm.cuf
In all cases, the resulting executables should be run on a compute node of
Lincoln, accessed through the batch system.
2.3 Additional Information