Innovative Systems Lab Software

CUDA SDK wrapper library

The CUDA SDK wrapper library provides a means of efficient resource sharing and resource protection on multi-user GPU clusters, such as NCSA's 32-node, 128-GPU system. It implements the following functionality:

  • Virtualization of the physical GPU devices. The virtual devices visible to the user map to a consistent set of physical devices, which accomplishes "user fencing" on shared systems and prevents users from accidentally trampling one another.
  • Rotation of the virtual-to-physical mapping for each new process that requests a GPU resource. This lets large parallel tasks use common startup parameters while still targeting multiple devices. That is, each time a new process requests gpu0, the underlying physical device shifts, so the next process requesting gpu0 gets the next allocated physical device.
  • Ensuring NUMA affinity for GPUs on systems that have multiple memory controllers. NUMA affinity can be mapped between CPU cores and GPU devices. This has been shown to improve host-to-device memory bandwidth by as much as 6-8%.
  • Memory scrubbing to wipe the user's GPU memory after use, protecting its contents from subsequent users.
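
The rotation described above can be pictured with a small model. The following is only an illustrative sketch in Python, not the wrapper library's code: the class name VirtualDeviceMapper, the method new_process, and the device IDs are all hypothetical, and the real library works by intercepting CUDA calls rather than by handing out tables.

```python
class VirtualDeviceMapper:
    """Toy model of a rotating virtual-to-physical GPU mapping (hypothetical)."""

    def __init__(self, allocated_physical):
        # Physical device IDs assigned to this user; the user never sees others.
        self.allocated = list(allocated_physical)
        self._next_shift = 0

    def new_process(self):
        """Return the virtual -> physical table for the next process, then rotate."""
        n = len(self.allocated)
        shift = self._next_shift
        self._next_shift = (shift + 1) % n
        # Virtual device v maps to the allocated device v positions past the shift.
        return {v: self.allocated[(v + shift) % n] for v in range(n)}


# Assumed example: the user was allocated physical devices 2, 3, 5 and 7.
mapper = VirtualDeviceMapper([2, 3, 5, 7])
tables = [mapper.new_process() for _ in range(5)]

# Every process asks for virtual device 0 ("gpu0") but lands on a different GPU.
first_choice = [table[0] for table in tables]
print(first_choice)
```

In this model every process sees the same set of allocated devices, only in a rotated order, so identical startup parameters spread the processes across the allocation while keeping them inside it.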

When installed, the CUDA SDK wrapper library is forcibly preloaded and intercepts device allocation calls to the CUDA libraries in order to provide the functionality described above.

Download the CUDA wrapper library

CUDA memory test

The CUDA GPU Memory Test applies the memory test methodology used in the Memtest86 utility to GPU device memory. The main idea is to write a test pattern to memory, read it back and verify that it matches what was written, then write the pattern's complement, read it back, and verify again. All 10 tests from Memtest86, plus one additional test, are implemented. These tests are designed to catch both permanent hardware errors, caused by manufacturing defects or prolonged use of memory chips, and "soft errors" caused by cosmic radiation. The GPU memory test can be run continuously to monitor the system for new hardware faults and to collect statistics on the rate of soft errors.
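
The write, verify, complement, verify cycle can be sketched on an ordinary host buffer. This is a simplified illustration in Python under stated assumptions, not the tester's implementation: it operates on byte-sized cells in host memory rather than on GPU device memory, it does not reproduce any specific Memtest86 test, and the names pattern_test and StuckBitMemory are hypothetical.

```python
def pattern_test(memory, pattern):
    """Write a pattern and then its complement, verifying after each write.

    Returns a list of (offset, expected, actual) tuples for mismatching cells.
    """
    errors = []
    for value in (pattern & 0xFF, ~pattern & 0xFF):
        for i in range(len(memory)):
            memory[i] = value                 # write pass
        for i in range(len(memory)):
            if memory[i] != value:            # read-back and verify pass
                errors.append((i, value, memory[i]))
    return errors


class StuckBitMemory:
    """Simulated buffer in which some bits of one cell are stuck at 1."""

    def __init__(self, size, bad_index, stuck_mask):
        self._cells = bytearray(size)
        self.bad_index = bad_index
        self.stuck_mask = stuck_mask

    def __len__(self):
        return len(self._cells)

    def __getitem__(self, index):
        return self._cells[index]

    def __setitem__(self, index, value):
        if index == self.bad_index:
            value |= self.stuck_mask          # the faulty bit always reads as 1
        self._cells[index] = value


healthy_errors = pattern_test(bytearray(64), 0xAA)
faulty_errors = pattern_test(StuckBitMemory(64, bad_index=5, stuck_mask=0x01), 0xAA)
print(healthy_errors, faulty_errors)
```

In this simulation the stuck bit goes unnoticed while 0x55 is in memory, because that pattern already has the bit set; only the 0xAA pass exposes it. That is why each cell is checked with both a pattern and its complement.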

Download the CUDA memory tester

Cell/BE task library

This task library is built on libspe2 from the IBM Cell SDK. It gives Cell programmers a clean task interface for starting jobs on SPUs by hiding the tedious context and pthread creation, mailbox/signal/interrupt mailbox communication, and so on.

Download the Cell/BE task library