Enhanced Computing Environment

While IBM will provide a solid high-performance computing (HPC) environment for Blue Waters, Illinois and other members of the Great Lakes Consortium will work with IBM to enhance that environment so that applications can take full advantage of Blue Waters' hardware capabilities and achieve high sustained performance. The enhanced environment will accommodate software requirements from a wide variety of applications, and it will be improved throughout Blue Waters' lifetime to meet the evolving needs of its researchers.

In other words, Blue Waters and the software that runs it are designed with applications in mind: real-world applications that scientists use every day to simulate our world and drive discovery.

The enhanced HPC environment will also increase the productivity of application developers, system managers, and researchers by providing an integrated toolset to use, analyze, monitor, and control the behavior of Blue Waters.

These efforts will ensure that Blue Waters' software environment is compatible with software on other high-end systems, such as the new NSF-supported computing systems at Oak Ridge National Laboratory, the Texas Advanced Computing Center, and the Pittsburgh Supercomputing Center. Many of the new tools and improvements to existing tools will also be incorporated into open-source software that will be available to researchers worldwide through the software's development communities.

The enhanced HPC environment will include improvements in the following areas.

Computational Libraries: This project will ensure that the computational libraries needed to support scientific applications perform well at the required scale. The libraries will be identified, ported, and tuned for Blue Waters in collaboration with the library developers and scientific application developers.

This project will also include work on ESSL and PESSL, libraries of mathematical subroutines that IBM provides to improve the performance of engineering and scientific applications on POWER processor-based systems.

Cactus: Cactus is an open-source, component-based framework for developing parallel HPC applications. It supports large-scale science and engineering applications and collaborative development teams, and it is being evaluated for use on Blue Waters. It can help abstract new programming methods and technologies to enable petascale applications.

Charm++ and AMPI: Given the sheer number of processors, disk drives, and other components in Blue Waters, virtualization will be key to making the system highly reliable. Virtualization allows computations and other tasks on Blue Waters to be moved transparently to other processors or drives should the initial components fail or become unavailable. This feature improves fault tolerance. It also improves performance by hiding some of the latency as work is moved among components and by optimally balancing the workload on Blue Waters. Finally, it allows application developers to overlap communication and computation in novel ways.

Charm++ and Adaptive Message Passing Interface (AMPI), both developed at Illinois, will be enhanced and used for virtualization on Blue Waters.
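
The idea of moving work among processors can be illustrated with a minimal Python sketch. It is not Charm++ or AMPI code and does not use their APIs. It assumes the application has been split into more work objects than there are processors, that each object has a known load, and that a simple greedy rule is enough for placement. The names assign and migrate_from are hypothetical.

```python
import heapq

def assign(loads, n_procs):
    """Greedy placement: put the heaviest objects first, each on the
    currently least-loaded processor. Returns {object id: processor id}."""
    heap = [(0.0, p) for p in range(n_procs)]  # (total load, processor id)
    heapq.heapify(heap)
    placement = {}
    for obj, load in sorted(enumerate(loads), key=lambda kv: -kv[1]):
        total, proc = heapq.heappop(heap)
        placement[obj] = proc
        heapq.heappush(heap, (total + load, proc))
    return placement

def migrate_from(failed, loads, placement, n_procs):
    """Move every object off a failed processor onto the least-loaded
    surviving processors. Objects on healthy processors stay in place."""
    survivors = [p for p in range(n_procs) if p != failed]
    totals = {p: 0.0 for p in survivors}
    for obj, proc in placement.items():
        if proc != failed:
            totals[proc] += loads[obj]
    new_placement = dict(placement)
    orphans = sorted((o for o, p in placement.items() if p == failed),
                     key=lambda o: -loads[o])
    for obj in orphans:
        target = min(survivors, key=lambda p: totals[p])
        new_placement[obj] = target
        totals[target] += loads[obj]
    return new_placement

loads = [5, 3, 3, 2, 2, 1]          # hypothetical per-object loads
placement = assign(loads, n_procs=3)
recovered = migrate_from(0, loads, placement, n_procs=3)
```

Because there are more objects than processors, the work of a failed processor can be spread over several survivors rather than doubling the load on a single one.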

Debugging and performance tuning: Debugging an application running on even hundreds of processors of a large parallel computing system is a major challenge today, and few current tools work on more than a few thousand processors, much less Blue Waters' hundreds of thousands. IBM will provide a new debugger designed to operate at this massive scale.

GPFS+HPSS: The GPFS and HPSS software stacks must each work individually and then work together to transfer data between disk and tape. To provide this capability, NCSA will add a RAIT (redundant array of independent tapes) capability to HPSS and an I/O import/export capability for data flowing into and out of Blue Waters. RAIT is needed to ensure data integrity, and the import/export capability is needed to schedule reliable data transfers into and out of the machine.
For more on GPFS and HPSS, see the Blue Waters hardware configuration.
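
As a rough illustration of how striping with redundancy protects data, the following Python sketch splits a buffer into data blocks plus one XOR parity block and rebuilds a lost block from the survivors. Single XOR parity is an assumption made for the example; the page does not describe the scheme that RAIT in HPSS will use. The function names are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def stripe(data, n_data, block_size):
    """Split data into n_data blocks (zero padded) and append a parity block.
    Assumes len(data) <= n_data * block_size."""
    padded = data.ljust(n_data * block_size, b"\0")
    blocks = [padded[i * block_size:(i + 1) * block_size] for i in range(n_data)]
    return blocks + [xor_blocks(blocks)]

def rebuild(blocks, lost):
    """Recover the block at index `lost` from all the other blocks."""
    survivors = [b for i, b in enumerate(blocks) if i != lost]
    return xor_blocks(survivors)

data = b"petascale data!!"                      # 16 bytes of sample data
stripes = stripe(data, n_data=4, block_size=4)  # 4 data blocks + 1 parity
restored = rebuild(stripes, lost=2)             # pretend block 2 was unreadable
```

With one parity block, any single lost block (data or parity) can be recomputed; tolerating more simultaneous losses would need a stronger code.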

Integrated Application Development Environment: An easy-to-use integrated development environment for Blue Waters will make scientific programmers more productive. The Eclipse-based environment will aid in creating and improving applications by bringing the development, debugging, optimization, and job submission and management processes into a single tool.

The environment will be secure, support hybrid codes, include online educational materials, and provide an easy means of including third-party tools and services.

Such tools are commonly used in other software development areas but are not frequently used in high-performance computing. Blue Waters and its integrated application development environment will enable scientific programmers to shift to this new environment.

Integrated Systems Console: This environment will present the system administrator with a single, unified view of the system. It will aggregate event reports and performance metrics from across the system and mine that information using global filters, fault detection, and fault prediction modules.

The system-management environment will coordinate system monitoring, checkpointing of computational runs, file system management (including RAID recovery), and interconnect reconfiguration. These features will allow system managers to identify the root cause of problems and to establish standard, automated responses to common problems.
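
The aggregation and filtering described above can be sketched in a few lines of Python. This is only an illustration: the event fields (component, severity), the severity scale, and the threshold rule are assumptions, and real fault detection or prediction modules would be far more involved.

```python
from collections import Counter

def flag_components(events, min_severity, threshold):
    """Apply a global severity filter, count the remaining events per
    component, and return the components at or above the threshold."""
    counts = Counter(e["component"] for e in events
                     if e["severity"] >= min_severity)
    return sorted(c for c, n in counts.items() if n >= threshold)

# Hypothetical event reports gathered from across the system.
events = [
    {"component": "disk-17", "severity": 3},
    {"component": "disk-17", "severity": 4},
    {"component": "node-2", "severity": 1},
    {"component": "node-2", "severity": 3},
    {"component": "disk-17", "severity": 1},
]
suspects = flag_components(events, min_severity=3, threshold=2)
```

A list like suspects is the kind of result that a standard, automated response could be attached to.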

Performance Tools: Low-overhead agents will be used on Blue Waters' processing elements to collect data and perform local pre-analyses, including data compression and reduction, to limit system-level impact. The local data will then be subjected to a variety of hierarchical data reductions and analyses within a tool communication infrastructure.
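
A minimal Python sketch of this two-stage pattern follows. Each agent reduces its raw samples to a small summary, and the summaries are then merged level by level in a tree. The summary fields and the fan-out parameter are assumptions chosen for illustration, not a description of the actual Blue Waters tool infrastructure.

```python
def summarize(samples):
    """Local pre-analysis: reduce raw samples to a fixed-size summary."""
    return {"n": len(samples), "sum": sum(samples),
            "min": min(samples), "max": max(samples)}

def merge(a, b):
    """Combine two summaries without revisiting the raw samples."""
    return {"n": a["n"] + b["n"], "sum": a["sum"] + b["sum"],
            "min": min(a["min"], b["min"]), "max": max(a["max"], b["max"])}

def tree_reduce(summaries, fanout=2):
    """Merge summaries level by level, `fanout` children per parent."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = list(summaries)
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            acc = group[0]
            for s in group[1:]:
                acc = merge(acc, s)
            parents.append(acc)
        level = parents
    return level[0]

# Hypothetical timing samples from four processing elements.
per_node = [[1, 2, 3], [4, 5], [6], [7, 8, 9, 10]]
total = tree_reduce([summarize(s) for s in per_node])
mean = total["sum"] / total["n"]
```

Only the fixed-size summaries travel up the tree, so the data volume at each level depends on the number of nodes rather than the number of raw samples.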

Because no single tool can perform every performance analysis task, a common interface for performance and debugging information is being created. Some of the needed tools will come from IBM, and others will come from the open-source research community and other third-party providers.

IBM's tools are currently part of its HPC Toolkit and provide an assortment of performance measurement technologies. As part of the DARPA HPCS program, IBM is also developing an automated performance optimization tool. It will interpret performance data to suggest, and in some cases implement, changes to scientific application code that will improve performance.

Open-source tools will also be available, including:

  • TAU, for controlling and tracing performance-monitoring instrumentation.
  • KOJAK, for instrumentation, event-trace generation, and postprocessing of event traces.
  • Jumpshot, for visualizing performance data.

Photran Eclipse: Photran, which is part of the Parallel Tools Platform (PTP), will be extended to support Fortran 2003 and Fortran 2008 as well as debugging of parallel programs.

System Simulator: Before full deployment and even after initial production, Blue Waters projects will use sophisticated system simulators to gain advance insight into optimizing, scaling, and re-engineering their applications. IBM is providing instruction-level processor simulators and full-system network simulators for the Blue Waters staff.

BigSim, a whole-system simulator for early application development and identification of performance problems, is being combined with IBM's POWER7 chip simulator to model the performance of applications on Blue Waters. The integrated simulator is available on NCSA's interim systems, along with associated documentation and user tutorials.

Visualization: Blue Waters will not offer dedicated visualization hardware, such as dedicated nodes on the system or a separate external system. Instead, data will be visualized either by transferring it off the machine or by performing remote visualization directly on the Blue Waters hardware. With this in mind, NCSA is porting and tuning common visualization libraries and tools to Blue Waters, including the VisIt visualization package and the gnuplot interactive data and function plotting utility.

Workflow System: Ensemble Broker and system-local workflow systems are being extended to work with Blue Waters. This software will provide advanced capabilities for researchers to compose multi-level graphs (for instance, a high-level graph that orchestrates multiple job submissions, combined with system-local orchestration within each job). The Ensemble Broker graphical user interface will be integrated into the Eclipse Integrated Application Development Environment to provide support for the job execution cycle.
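
A two-level graph of this kind can be sketched in Python with the standard library's graphlib module (Python 3.9 or later). The sketch is not Ensemble Broker code. It assumes a top-level graph of job submissions in which some jobs carry their own local task graph, and the job and task names are hypothetical.

```python
from graphlib import TopologicalSorter

def run_order(graph):
    """graph maps each step to the set of steps it depends on.
    Returns the steps in an order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())

def expand(jobs, local_graphs):
    """Walk the top-level job graph in dependency order and, inside each
    job, walk its local task graph. A job without a local graph is
    treated as a single task named after the job."""
    order = []
    for job in run_order(jobs):
        tasks = local_graphs.get(job, {job: set()})
        for task in run_order(tasks):
            order.append((job, task))
    return order

# High level: three job submissions that must run in sequence.
jobs = {"prepare": set(), "simulate": {"prepare"}, "analyze": {"simulate"}}
# System-local level: the simulate job has its own two-step graph.
local_graphs = {"simulate": {"mesh": set(), "solve": {"mesh"}}}

schedule = expand(jobs, local_graphs)
```

TopologicalSorter raises CycleError if a graph contains a cycle, so an invalid workflow is rejected before anything is scheduled.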