High Performance Computing with Accelerators

My research aims at understanding how HPC applications can be mapped to newly emerging accelerator-based architectures, such as those that employ FPGAs, GPGPUs, the Cell/B.E., and other types of processors. My short-term interests include helping computational scientists design and implement scientific codes for accelerator-based systems. My long-term goal is to develop a formal methodology, guidelines, and recipes that other researchers can use when implementing applications on similar accelerator-based architectures, and to devise a methodology for analyzing and characterizing existing applications with respect to their portability to novel computing architectures. This work is funded by NSF grants 0810563 and 0626354, NASA grant NNG06GH15G, and a NARA grant, and is part of IACAT's Center for Extreme-scale Computing and NCSA's Innovative Systems Laboratory efforts.


NARA grant: Innovative Systems and Software: Applications to NARA Research Problems

We investigate the suitability of Graphics Processing Unit (GPU) technology for accelerating the image characterization algorithms used to find similarities between documents with embedded images. We have ported the image characterization algorithm used in doc2learn to GPUs, using both CUDA C (targeting NVIDIA GPUs) and OpenCL (targeting NVIDIA and AMD architectures), and have conducted an extensive study of the impact of GPU acceleration for documents with varying numbers of images and image sizes.
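
As a rough illustration of the per-pixel, data-parallel work that makes image characterization a good GPU candidate, the sketch below computes a per-channel color histogram in plain C. The 256-bin layout, RGB pixel format, and the rgb_histogram function are illustrative assumptions, not the actual doc2learn kernel.

/* Minimal sketch of a per-channel color histogram, a typical building block
 * of image characterization; bin count and interleaved RGB layout are
 * assumptions, not the exact doc2learn kernel. */
#include <stddef.h>
#include <string.h>

#define BINS 256

void rgb_histogram(const unsigned char *pixels, size_t npixels,
                   unsigned int hist[3][BINS])
{
    memset(hist, 0, 3 * BINS * sizeof(unsigned int));
    for (size_t i = 0; i < npixels; i++) {
        hist[0][pixels[3*i + 0]]++;  /* red   */
        hist[1][pixels[3*i + 1]]++;  /* green */
        hist[2][pixels[3*i + 2]]++;  /* blue  */
    }
}

Because every pixel is processed independently, this loop maps naturally onto the data-parallel execution model of both CUDA and OpenCL.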

IACAT Project: Implementation of MILC on computational accelerators

The MIMD Lattice Computation (MILC) code, a Quantum Chromodynamics (QCD) application used to simulate four-dimensional SU(3) lattice gauge theory, is one of the largest consumers of compute cycles at many supercomputing centers. We have previously investigated how one of the MILC applications can be accelerated on the Cell Broadband Engine. We are currently investigating how this code can take advantage of the newly emerging GPU computing architecture.
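
Much of the arithmetic in MILC reduces to small dense complex linear algebra applied at every lattice site; as an illustration of the kind of kernel that gets offloaded, the sketch below multiplies a 3x3 complex (SU(3)) matrix by a color vector in plain C. The struct layout and function name are simplified assumptions rather than MILC's actual data structures.

/* Illustrative sketch of the SU(3) matrix-vector product that dominates
 * many MILC kernels; simplified types, not MILC's own definitions. */
#include <complex.h>

typedef struct { double complex e[3][3]; } su3_matrix;
typedef struct { double complex c[3];    } su3_vector;

void mult_su3_mat_vec(const su3_matrix *a, const su3_vector *x, su3_vector *y)
{
    for (int i = 0; i < 3; i++) {
        double complex s = 0.0;
        for (int j = 0; j < 3; j++)
            s += a->e[i][j] * x->c[j];   /* complex multiply-accumulate */
        y->c[i] = s;
    }
}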

NASA grant NNG06GH15G: Advanced Astrophysical Algorithms to Novel Supercomputing Hardware

We consider a class of cosmology applications that are based on a common algorithm: multidimensional distance calculations. Examples of such applications include n-point correlation functions, instance-based learning algorithms, and power spectrum estimation; all of these algorithms are used in a number of different scientific and engineering domains. Specifically, in this investigation we focus on the 2-point angular correlation function (TPACF), used to characterize the clustering of sources on the celestial sphere. TPACF serves as one of the main tools in studying the distribution of matter in the Universe. Due to the large size of the datasets produced by modern astronomical instruments and the O(N^2) computational complexity of the algorithm, significant computing resources are required to perform the calculations for modern datasets. We have implemented a reference TPACF algorithm and ported it to the SRC-6 and SGI RC100 reconfigurable computers. We have also conducted a preliminary investigation of its amenability to GPU implementation. In addition, we implemented several instance-based learning algorithms and ported an artificial neural network-based probability density function code to the SRC-7 reconfigurable computer.
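
A minimal serial sketch of the O(N^2) pair-counting core of TPACF is shown below. It assumes the source catalog has already been converted to unit Cartesian vectors on the sphere and uses simple linear binning in cos(theta); production codes typically use a different (e.g. logarithmic angular) binning scheme, so this is an illustration of the distance-and-histogram structure only.

/* Sketch of the TPACF autocorrelation (DD) pair counting: histogram the
 * angular separations between all point pairs. hist must be zeroed by the
 * caller; bins are linear in cos(theta), an assumption for illustration. */
#include <stddef.h>

void tpacf_autocorrelation(const double *x, const double *y, const double *z,
                           size_t n, unsigned long *hist, int nbins)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            double dot = x[i]*x[j] + y[i]*y[j] + z[i]*z[j]; /* cos(theta) */
            if (dot >  1.0) dot =  1.0;                     /* guard roundoff */
            if (dot < -1.0) dot = -1.0;
            int bin = (int)((1.0 - dot) * 0.5 * nbins);     /* map [-1,1] to [0,nbins) */
            if (bin >= nbins) bin = nbins - 1;
            hist[bin]++;
        }
    }
}

The independence of the pair computations is what makes this kernel attractive for FPGA pipelines and GPU thread blocks alike; only the shared histogram update requires care.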

NSF grant 0626354: Chemical computations on future high-end computers

Chemical simulations present a computational approach to studying the behavior of molecules and atoms at atomic and sub-atomic levels of detail. Such simulations, however, are greatly limited in size and timescale due to the complexity of the underlying mathematical models, which translates into computationally demanding and time-consuming algorithms. For example, in molecular dynamics (MD), the non-bonded force-field calculations are typically responsible for over 80% of the overall execution time of MD codes and are the main bottleneck in reaching microsecond timescales. In quantum chemistry, the calculation of two-electron repulsion integrals (ERIs) remains a bottleneck in many ab initio molecular orbital and density functional theory electronic structure codes. In direct self-consistent field (SCF) methods, many millions of ERIs are recomputed every SCF iteration and account for the vast majority of the execution time. We are investigating the use of GPUs, FPGAs, and the Cell/B.E. to accelerate the execution of kernels used in chemistry codes, such as the two-electron repulsion integral calculations used in direct SCF codes and the non-bonded force-field calculations used in NAMD. We have implemented the Rys quadrature scheme for two-electron Coulomb repulsion integrals, evaluating primitive integrals [pq|rs] for Gaussian-type orbital (GTO) basis sets, on the SRC-6/7 reconfigurable computers and an IBM Cell/B.E. blade system. We have also implemented NAMD's non-bonded force-field kernel on the SRC-6 reconfigurable computer and the IBM Cell/B.E. processor. (See the project website for more details.)
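
To illustrate what a non-bonded force-field evaluation involves, the sketch below accumulates the Lennard-Jones and Coulomb force contribution for a single atom pair within a distance cutoff, in plain C. NAMD's actual kernel additionally handles pair lists, exclusions, switching functions, and per-atom-type parameter tables, so this is only a simplified stand-in; the atom struct and pair_force function are assumptions for this sketch.

/* Simplified non-bonded pair interaction: Lennard-Jones plus Coulomb with a
 * hard cutoff. Physical constants and unit handling are omitted. */
#include <math.h>

typedef struct { double x, y, z, q; } atom;   /* position and partial charge */

/* Accumulate the force exerted by atom aj on atom ai into (fx, fy, fz). */
void pair_force(const atom *ai, const atom *aj,
                double eps, double sigma, double cutoff,
                double *fx, double *fy, double *fz)
{
    double dx = ai->x - aj->x, dy = ai->y - aj->y, dz = ai->z - aj->z;
    double r2 = dx*dx + dy*dy + dz*dz;
    if (r2 > cutoff * cutoff) return;          /* outside cutoff: no interaction */

    double inv_r2 = 1.0 / r2;
    double sr2 = sigma * sigma * inv_r2;
    double sr6 = sr2 * sr2 * sr2;
    /* -(dU/dr)/r for LJ and for Coulomb (Coulomb constant omitted) */
    double f_lj   = 24.0 * eps * (2.0 * sr6 * sr6 - sr6) * inv_r2;
    double f_coul = ai->q * aj->q * inv_r2 / sqrt(r2);

    double f = f_lj + f_coul;
    *fx += f * dx;
    *fy += f * dy;
    *fz += f * dz;
}

Each pair evaluation is short but is executed an enormous number of times per timestep, which is why mapping this loop onto FPGA pipelines, Cell/B.E. SPEs, or GPU threads pays off.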

NSF grant 0810563: Investigating Application Analysis and Design Methodologies for Computational Accelerators

The impact of computational accelerators on scientific applications, and the investment required to utilize these resources, is not fully understood in the scientific computing community. While accelerator-based computing architectures offer great potential performance, the execution models, software architectures, and development processes required to realize that potential currently differ dramatically from those of existing computational architectures. We are conducting an exploratory investigation to understand the impact of accelerator technologies on scientific and engineering codes and to quantify the effort and requirements necessary to implement these codes on newly emerging accelerator technologies. We are also investigating formal methods in application analysis and cross-platform software engineering for accelerator technologies. Our approach is based on implementing a commonly used algorithm on several accelerator architectures and, in doing so, developing formal guidelines and recipes that other researchers can adopt when porting their own applications to similar accelerator-based architectures. Specifically, we use the 2-point angular correlation function (TPACF) algorithm (previously developed under the NASA grant and extended to allow for error estimation) as a testbed for cross-platform implementation.
