- Compilers and languages
- Introduction
- CUDA C and CUDA Fortran
- Accelerator model
- OpenCL
- Host-only compilers
- Libraries
- Debuggers and Profilers
1. Compilers and languages
1.1. Introduction
NCSA supports the NVIDIA and PGI compilers for the Tesla Fermi GPUs. As these compilers each
provide multiple and differing capabilities, they are summarized below in terms of the
language or API provided.
1.2. CUDA C and CUDA Fortran
CUDA is the parallel computing architecture developed by NVIDIA for NVIDIA GPUs; CUDA C defines
extensions to the C language for launching execution on the GPU and for communication
between host and GPU.
CUDA Fortran is an analogous extension to the Fortran language; it was developed as a
collaboration between NVIDIA and the Portland Group.
CUDA-x86 is the PGI CUDA C/C++ compiler for x86; it provides a unified programming model
for both multi-core and many-core architectures. Executables may be run either on the GPU,
or on a non-GPU multi-core x86 architecture.
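To make the CUDA C execution model concrete, the sketch below is a hypothetical, serial C model of what a one-dimensional launch such as `kernel<<<grid, block>>>(...)` computes: every thread of every block runs the same kernel body and derives its own array index. This per-thread indexing is the same whether the kernel runs on a Tesla GPU or, via CUDA-x86, on a multi-core CPU. It is not how nvcc or the PGI compilers actually implement a launch: here the threads run one after another, and shared memory and `__syncthreads()` are not modeled. The names `launch_1d`, `vec_add_kernel`, `vec_add_args` and `grid_size` are illustrative only.

```c
/* Hypothetical serial model of a 1-D CUDA launch:
 *     kernel<<<grid_dim, block_dim>>>(args);
 * On a GPU the threads run concurrently; here they run one after another.
 * Shared memory and __syncthreads() are NOT modeled. */

/* Per-thread kernel body. It receives the values that CUDA C exposes as
 * blockIdx.x, blockDim.x and threadIdx.x. */
typedef void (*kernel_fn)(int block_idx, int block_dim, int thread_idx,
                          void *args);

/* Run the kernel body once for every thread of every block. */
void launch_1d(kernel_fn kernel, int grid_dim, int block_dim, void *args)
{
    for (int b = 0; b < grid_dim; ++b)        /* each block in the grid   */
        for (int t = 0; t < block_dim; ++t)   /* each thread in the block */
            kernel(b, block_dim, t, args);
}

/* Arguments for the vector-add example (hypothetical names). */
struct vec_add_args {
    int n;
    const float *a;
    const float *b;
    float *c;
};

/* Counterpart of the usual CUDA C vector-add kernel: c[i] = a[i] + b[i]. */
void vec_add_kernel(int block_idx, int block_dim, int thread_idx, void *p)
{
    struct vec_add_args *args = p;
    int i = block_idx * block_dim + thread_idx;  /* global thread index      */
    if (i < args->n)                             /* last block may overshoot */
        args->c[i] = args->a[i] + args->b[i];
}

/* Number of blocks needed to cover n elements: ceil(n / block_dim). */
int grid_size(int n, int block_dim)
{
    return (n + block_dim - 1) / block_dim;
}
```

The loop counters `b` and `t` stand in for CUDA C's built-in `blockIdx.x` and `threadIdx.x`. `grid_size()` computes the customary ceil(n / block_dim) block count, and the `i < n` guard discards the threads of the last block that fall past the end of the arrays.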
1.2.1 CUDA C
- CUDA C environment
The module for NVIDIA's CUDA C is loaded by default upon login. The NVIDIA compiler is nvcc.
One can compile on the head node, but execution on the Tesla GPUs is available only via PBS batch jobs.
- CUDA C SDK
Example code, documentation and several utilities can be found in the NVIDIA SDK.
When porting code to CUDA C, the examples in the SDK can be quite useful,
both as illustration and as templates for certain algorithms (e.g., marching cubes, Monte Carlo). In addition, there are further examples and tutorials
available on the NVIDIA site at the link below.
To use the examples, one should copy the installer to one's home directory and run it,
accepting the default answers to its prompts:
cp /uf/ncsa/consult/nvidia_sdk/gpucomputingsdk_*_linux.run $HOME
cd $HOME
sh ./gpucomputingsdk_*_linux.run
To build, cd into the "C" subdirectory of the installed SDK and run make.
In all cases, the resulting executables should be run on a compute node, accessed through the batch system.
The utility deviceQuery will list characteristics of the Tesla devices.
- NVIDIA CUDA C site
1.2.2 CUDA Fortran
- CUDA Fortran environment
To use CUDA Fortran, one need only load the module for the PGI compilers:
module load pgi/2011
- CUDA Fortran SDK
Examples and a makefile may be found in
/usr/local/pgi/linux86-64/2011/cuda/cudaFortranSDK
In all cases, the resulting executables should be run on a compute node, accessed through the batch system.
- PGI CUDA Fortran site
1.2.3 CUDA-x86
- CUDA-x86 environment
To use CUDA-x86, one need only load the module for the PGI compilers:
module load pgi/2011
- CUDA-x86 SDK
Examples and a makefile may be found in
/usr/local/pgi/linux86-64/2011/cuda/cudaX86SDK
In all cases, the resulting executables should be run on a compute node, out of consideration for other users of the login node, although, as noted in the introduction, any multi-core x86 architecture is supported.
- PGI CUDA-x86 site
1.2.4 Additional Information
1.3. Accelerator model
In addition to CUDA Fortran, the PGI compilers support an API referred to as
the "Accelerator Programming Model", which is similar in practice to OpenMP.
In this model, the user adds directives to existing C or Fortran code; the compiler
then automatically "accelerates" the marked regions of code by executing them on the GPU.
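As a minimal sketch, assuming the C syntax of the PGI Accelerator model as shipped with the 2011 compilers, an existing loop is marked for the GPU by wrapping it in a `#pragma acc region` block; the function name `saxpy_acc` is hypothetical. The `restrict` qualifiers tell the compiler that `x` and `y` do not overlap, which it generally needs in order to treat the loop iterations as independent.

```c
/* saxpy_acc (hypothetical name): y[i] = a * x[i] + y[i] for 0 <= i < n.
 * "#pragma acc region" marks the enclosed block for the PGI Accelerator
 * model (assumed 2011 syntax); with an accelerator target the compiler
 * generates GPU code plus the host<->GPU copies of x and y.
 * A compiler that does not recognize the directive ignores it and
 * runs the loop on the host, producing the same result. */
void saxpy_acc(int n, float a, const float *restrict x, float *restrict y)
{
#pragma acc region
    {
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }
}
```

With the pgi/2011 module loaded, a command along the lines of `pgcc -ta=nvidia -Minfo=accel saxpy.c` should compile the region for the GPU, with `-Minfo=accel` reporting which loops were accelerated; consult the PGI documentation for the exact options. Because unrecognized directives are ignored, the same source still builds and runs correctly with a host-only compiler.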
Examples of use may be found in
/usr/local/pgi/linux86-64/2011/etc/samples
The makefile therein will build examples of accelerating C and Fortran code.
A summary and references may be found here: PGI Accelerator Compilers
1.4. OpenCL
OpenCL is supported by the NVIDIA CUDA distribution; cf. examples in the NVIDIA_GPU_Computing_SDK mentioned above, and the discussion here:
OpenCL
1.5. Host-only compilers
In addition to the compilers mentioned above, the GNU and Intel compilers are available on forge, and loaded by default.
Several implementations of MPI built with these compilers are available; mvapich2 is loaded by default,
and the available versions of openmpi can be listed with "module avail".
In all of these cases, the compiler wrappers have the standard names, for example, "mpicc", "mpicxx", "mpif77", "mpif90".
2. Libraries
2.1 Traditional libraries
For host-based code, the Intel Math Kernel Library (MKL) contains the complete set of functions from the basic linear algebra subprograms (BLAS), the extended BLAS (sparse), the complete set of LAPACK routines, and a set of fast Fourier transforms; it is loaded by default with the Intel compilers.
2.2 Accelerated libraries
NVIDIA provides accelerated versions of certain of the above routines, namely:
The SDK contains examples of use for each of these libraries.
The NVIDIA Performance Primitives library (NPP) is a collection of basic algorithms accelerated for the GPU (arithmetic, filter, image, geometric...):
3. Debuggers and Profilers
NVIDIA provides cuda-gdb for debugging CUDA C; it is essentially a port of gdb with
appropriate extensions, and consequently will be familiar to users of gdb.
NVIDIA provides a "Visual Profiler", computeprof, for CUDA C and OpenCL. To invoke the profiler, one must enable X-forwarding: first log in to forge with "ssh -X forge", and then launch a batch job with the option "-X"; cf. the man pages for ssh and qsub, respectively.
PGI's pgprof enables profiling of CUDA Fortran and the PGI Accelerator directives; as with
NVIDIA's visual profiler, one should enable X-forwarding as described above.
An article on debugging CUDA-x86 applications may be found here:
The TAU profiling and tracing toolkit is available on forge, and has GPU support;
for information, see here: