
The NCSA Performance Engineering Seminar Series

Organizer: Faisal Saied



When: Friday, Dec 15, 1:30 PM  Where: 4169 Beckman
Introduction to PETSc
Bill Gropp, Senior Research Scientist, MCS, Argonne National Laboratory

PETSc, the Portable, Extensible Toolkit for Scientific Computation, is a suite of data structures and routines for the uniprocessor and parallel solution of large-scale scientific application problems modeled by partial differential equations. PETSc 2.0 is fully usable from Fortran and C/C++, supports real and complex numbers, and runs on most machines, now including Windows. This talk provides a brief introduction to PETSc, including examples illustrating how to use PETSc to create portable parallel applications.



When: Friday, Dec 12, 11:00 AM  Where: 4169 Beckman
Parallel Software Engineering with KPTS and OpenMP
Sanjiv Shah, Kuck & Associates, Inc.

OpenMP is an industry-standard programming specification for shared-memory parallel computers. It offers scalability and portability across Unix and Windows NT systems. Parallel software engineering requires more than the ability to compile parallel applications: tools that verify correct use of parallel constructs and visualize performance issues are invaluable. The talk will introduce Kuck & Associates' KAP/Pro Toolset for OpenMP (KPTS), a collection of parallel programming tools for OpenMP. The Toolset includes Assure, which automatically locates bugs in OpenMP programs, and GuideView, which visualizes parallel program performance. KPTS also includes a high-performance, fully compliant implementation of OpenMP.



When: Thursday, January 8, 1998, 10:00 AM  Where: 4269 Beckman Institute
OpenMP on Silicon Graphics Platforms
Ramesh Menon, Silicon Graphics, Inc.

OpenMP is an application program interface (API) for shared-memory parallel programming. Pioneered by SGI, it is fast becoming a de facto industry standard, as evidenced by the large number of hardware and software vendors endorsing it. The functionality is designed to let programmers write coarse-grain, scalable, shared-memory parallel programs while preserving the ability to implement loop-level parallelism easily. This talk will present the why, what, and how of OpenMP as it relates to Silicon Graphics platforms. A comparison with existing functionality will be presented, along with some typical examples in Fortran. The C/C++ specification is expected to be released in early 1998.
http://www.sgi.com/Technology/OpenMP



TITLE: The Linear System Analyzer Project
SPEAKER: Prof. Randall Bramley, Department of Computer Science, Indiana University-Bloomington
TIME: Friday, Jan 30, 11:00 AM
PLACE: 5239 Beckman
ABSTRACT: The Linear System Analyzer (LSA) is a research project addressing the general problem of developing software component architecture frameworks for large-scale distributed scientific computing. One goal is to provide tools that allow scientists to quickly build their own problem-solving environments for problems in their application domain. Another is to develop methods for connecting specialized resources, such as parallel databases, with parallel computations. The LSA is a particular implementation of the general framework, providing both focus and a reality check. The LSA targets one of the most difficult and common problems in scientific computing: the numerical solution of large, sparse linear systems of equations. A large number of approaches have been developed for solving such systems, and it is clear to practitioners that no single solver will work well in all, or even a majority of, cases. The LSA includes a GUI that lets people quickly wire together modules performing linear system input, reordering, scaling, and direct and iterative solves. Modules can run on distributed machines and send data to other modules via Nexus. This talk discusses the design goals and constraints for the LSA and their implications for the underlying PSE infrastructure. This is still work at an early stage, and the remaining research challenges will also be discussed. The LSA is a joint project of R. Bramley, D. Gannon, T. Stuckey, J. Villacis, J. Balasubramanian, E. Akman, S. Diwan, and M. Govindaraju.



Performance Evaluation and Benchmarking with Large-Scope Applications
Rudolf Eigenmann, School of Electrical and Computer Engineering, Purdue University

The use of large, realistic applications for evaluating high-performance computers is not yet standard practice. Typically, small and manageable test programs are used for benchmarking, for measuring the results of research projects, and for justifying project directions. As a consequence, computer systems often fail to prove their value in practice when confronted with real applications. This issue is being addressed in a joint academic/industrial initiative to define and maintain a suite of realistic, industrial applications for performance evaluation and benchmarking. The effort includes the SPEC High-Performance Group (HPG), which has recently released the SPEChpc96 suite for benchmarking large-scale computer platforms. Several academic members participate in this effort with the objectives of characterizing these applications from a variety of viewpoints (architecture, compiler, and algorithm) and making the results available to the research community. This talk will describe the activities of the SPEC/HPG committee, the applications included in the current benchmark suite, and ongoing work. It will then present the issues addressed mainly by the academic members. These include performance models, evaluation methodologies, characterization tools, and the creation of an infrastructure that makes the results of this effort available to the community at large.

A brief bio
Rudolf Eigenmann was a member of the research staff at the Center for Supercomputing Research and Development, University of Illinois, from 1988 through 1995. In 1995 he joined the faculty of the School of Electrical and Computer Engineering, Purdue University. He currently also serves as chairman of the SPEC High-Performance Group. His interests include compilers, tools, programming methodologies, and performance evaluation of parallel computers.



Where: 3269 BI  When: Friday, Feb 27, 10:30 AM
Strassen's Algorithm: A Practical Method for Fast Matrix Multiplication
Steven Huss-Lederman, CS Department, University of Wisconsin-Madison

In 1969, Strassen published a paper with the innocuous title "Gaussian Elimination Is Not Optimal". That paper describes an algorithm for multiplying two matrices of order n in O(n^(lg 7)) ≈ O(n^2.807) operations. This is potentially a significant savings over the conventional algorithm, which takes O(n^3) operations. That paper began efforts to understand the minimum work required to perform matrix multiplication. In spite of the significance of this and subsequent papers, Strassen-type algorithms have seen only limited use in real applications. This results from several concerns that have been voiced over the years, including dealing with odd-sized matrices, non-square matrices, numerical stability, and temporary memory. However, the most damning claim is that the algorithm only pays off for enormous matrices and so is not of practical interest. As part of the PRISM project, we have undertaken a systematic study of Strassen's algorithm. We have shown, along with others, that each of the above concerns can be readily addressed. We have produced a portable, public-domain library that implements this algorithm and achieves high performance. This library is plug-and-play with the BLAS GEMM routine. This talk will assume no previous knowledge and should be accessible to all who are interested. The work was performed in collaboration with Elaine Jacobson, Jeremy Johnson, Anna Tsao, and Tom Turnbull.
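The O(n^(lg 7)) bound comes from replacing the eight half-size block products of the conventional recursive algorithm with seven. As a minimal Python sketch (not the PRISM library, which is a high-performance, GEMM-compatible implementation), the recursion below applies Strassen's seven products to square matrices and falls back to the conventional triple loop below an arbitrary cutoff or when the size is odd; it makes no attempt at the odd-size, non-square, or memory-saving techniques discussed in the talk. The function names are hypothetical.

```python
def naive(A, B):
    """Conventional O(n^3) product of square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def quarters(M):
    """Split an even-order matrix into its four half-size blocks."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

def strassen(A, B, cutoff=2):
    n = len(A)
    # Small or odd sizes: use the conventional algorithm (a simplification;
    # practical libraries handle odd dimensions more cleverly).
    if n <= cutoff or n % 2:
        return naive(A, B)
    A11, A12, A21, A22 = quarters(A)
    B11, B12, B21, B22 = quarters(B)
    # Seven recursive half-size products instead of eight.
    M1 = strassen(add(A11, A22), add(B11, B22), cutoff)
    M2 = strassen(add(A21, A22), B11, cutoff)
    M3 = strassen(A11, sub(B12, B22), cutoff)
    M4 = strassen(A22, sub(B21, B11), cutoff)
    M5 = strassen(add(A11, A12), B22, cutoff)
    M6 = strassen(sub(A21, A11), add(B11, B12), cutoff)
    M7 = strassen(sub(A12, A22), add(B21, B22), cutoff)
    C11 = add(sub(add(M1, M4), M5), M7)   # M1 + M4 - M5 + M7
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(sub(add(M1, M3), M2), M6)   # M1 - M2 + M3 + M6
    # Reassemble the four result blocks into one matrix.
    return ([r1 + r2 for r1, r2 in zip(C11, C12)] +
            [r1 + r2 for r1, r2 in zip(C21, C22)])
```

Each level trades one multiplication for 18 block additions and subtractions, which is why a practical implementation chooses the cutoff empirically for the target machine.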



PERFORMANCE ENGINEERING SEMINAR SERIES, NCSA
Monday, March 2, 2:00 PM, 4269 BI
SPRNG: A Scalable Library For Pseudorandom Number Generation
Dr. Michael Mascagni, Coordinator, Program in Scientific Computing, University of Southern Mississippi

Abstract: Providing high-quality pseudorandom numbers for parallel computers poses many deep and fascinating mathematical problems as well as unique software engineering challenges. One of the more practical issues in parallel pseudorandom number generation is finding methods that provide portability and reproducibility across architectures. We summarize some recent developments in the design and analysis of portable, reproducible pseudorandom number generators for parallel computers. These results are the basis for a DARPA-sponsored project, based at the University of Illinois at Urbana-Champaign, to develop a scalable library for pseudorandom number generation. It is hoped that this scalable library will be the seed for a more comprehensive problem-solving environment for Monte Carlo computations on scalable platforms. In addition, it is hoped that this work can be used in the ASCI project, where Monte Carlo calculations are a vital part of the total simulation effort.
Web site: SPRNG
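One classic way to give each process its own reproducible stream is leapfrog splitting of a single linear congruential generator (LCG): with P processes, process k takes elements k, k+P, k+2P, ... of the serial sequence, using the jump-ahead identity x_{n+P} = (a^P x_n + c(a^(P-1) + ... + a + 1)) mod m. The Python sketch below illustrates that technique under assumed parameters (the 48-bit drand48 constants); it is not a description of SPRNG's own generators, and all function names are hypothetical. Because it uses exact integer arithmetic, the streams are bit-for-bit identical on any platform, but the partition changes whenever the number of processes changes.

```python
# Assumed 48-bit LCG parameters (the drand48 constants), for illustration only.
M = 2**48
A = 0x5DEECE66D
C = 0xB

def jump(k):
    """Coefficients (a_k, c_k) such that x_{n+k} = (a_k * x_n + c_k) mod M."""
    a_k, c_k = 1, 0
    for _ in range(k):              # compose x -> A*x + C with itself k times
        a_k, c_k = (A * a_k) % M, (A * c_k + C) % M
    return a_k, c_k

def serial(seed, count):
    """x_1, x_2, ..., x_count of the single serial stream started at seed."""
    x, out = seed % M, []
    for _ in range(count):
        x = (A * x + C) % M
        out.append(x)
    return out

def leapfrog(seed, rank, nprocs, count):
    """Elements rank, rank + nprocs, rank + 2*nprocs, ... of serial(seed, ...)."""
    a_first, c_first = jump(rank + 1)   # from x_0 = seed to x_{rank+1}
    a_step, c_step = jump(nprocs)       # stride of nprocs along the sequence
    x = (a_first * (seed % M) + c_first) % M
    out = []
    for _ in range(count):
        out.append(x)
        x = (a_step * x + c_step) % M
    return out

def to_unit(x):
    """Map a 48-bit state to a float in [0, 1)."""
    return x / M
```

Interleaving the outputs of ranks 0, 1, ..., P-1 reproduces the serial stream exactly, which is the reproducibility property; a production generator would compute the jump coefficients in O(log k) steps by repeated squaring rather than the O(k) loop used here.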


PERFORMANCE ENGINEERING SEMINAR SERIES, NCSA
Title: NAG Software Solutions for High Performance Computing
Speaker: Arnold R. Krommer, NAG
When: Thurs, April 9, 2:00 PM
Where: 3269 BI
Abstract: The Numerical Algorithms Group (NAG) is a leading provider of technical and scientific software. NAG has built a strong worldwide reputation by developing and distributing high-quality numerical, visualization, symbolic, and simulation software for personal computers, workstations, and supercomputers for more than 25 years. NAG software is used at thousands of academic, research, and industrial sites every day to solve complex computational problems in physics, chemistry, biology, engineering, and financial modeling. This presentation provides an overview of the broad range of products offered by NAG, focusing mainly on software relevant to high performance computing. The products highlighted include the IRIS Explorer visualization system, the NAG Fortran SMP (Symmetric Multi-Processor) Library, and the NAG Parallel Library. A major driving force for the further development of the NAG Parallel Library is the Parallel Industrial NumErical Applications and Portable Libraries (PINEAPL) Project, which NAG is currently coordinating. The presentation outlines the aims of the PINEAPL Project, describes its progress, summarizes preliminary results, and provides details on free access to the software.



PERFORMANCE ENGINEERING SEMINAR SERIES, NCSA
When: Wed, May 6, 1:30 PM
Exciting New Trends in Computer Architecture
Prof. Josep Torrellas, Computer Science Department, University of Illinois at Urbana-Champaign
http://iacoma.cs.uiuc.edu

Dramatic increases in the number of transistors that can be integrated on a VLSI chip provide exciting opportunities for new computer architectures. Examples of fertile areas are processor-memory integration, support for speculative parallelization, and support for commercial workloads. In this talk, I will outline our work in some of these areas.

In the area of processor-memory integration, we are exploring cost-effective designs for distributed shared-memory machines built around off-the-shelf processor-in-memory (PIM) chips. These machines must exploit physical locality while, at the same time, using PIM chips largely designed for uniprocessors. In addition, we are considering the use of PIM chips as the intelligent memory of a PC, a workstation, or a collection of them. This intelligent memory interleaves simple computation units with DRAM arrays to form a flexible computation/storage fabric.

With speculative parallelization, we can run in parallel codes that the compiler cannot parallelize. The idea is to extend the machine's cache coherence protocol hardware to detect any dependence violation. When a violation is detected, the work is redone in the right order. We are exploring two approaches: speculation across the processors of a distributed shared-memory machine, and speculation within a multiprocessor chip.

JOSEP TORRELLAS is a faculty member in the Computer Science Department. His research interests are single- and multi-processor computer architectures. He received a PhD in Electrical Engineering from Stanford University. He is a recipient of a 1994 National Science Foundation Young Investigator Award.
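The speculate, detect, and redo cycle of speculative parallelization can be illustrated in software. The sketch below is a hypothetical sequential simulation, not the hardware scheme described in the talk (which detects violations through the cache coherence protocol): every iteration of a loop runs against the same pre-loop memory snapshot with privately buffered writes, an iteration that read a location written by an earlier iteration counts as a dependence violation, and on a violation the whole loop is redone in serial order. The function name and the load/store interface are assumptions for illustration.

```python
def run_speculatively(n, body, mem):
    """Simulate speculative parallel execution of `for i in range(n): body(i)`.

    body(i, load, store) must touch memory only through load(addr) and
    store(addr, value). Returns (new_memory, violation_detected).
    """
    snapshot = dict(mem)
    reads, writes = [], []
    # Speculative phase: each iteration sees the pre-loop memory (as if all
    # iterations started at once) and buffers its writes privately.
    for i in range(n):
        r, w = set(), {}

        def load(addr, r=r, w=w):
            if addr in w:               # value this iteration already wrote
                return w[addr]
            r.add(addr)                 # record an exposed read
            return snapshot[addr]

        def store(addr, value, w=w):
            w[addr] = value

        body(i, load, store)
        reads.append(r)
        writes.append(w)

    # Detection: iteration j violated a dependence if it read a location
    # that some earlier iteration i < j wrote.
    violated = any(not reads[j].isdisjoint(writes[i])
                   for j in range(n) for i in range(j))

    result = dict(mem)
    if violated:
        # Squash all speculative work and redo it in the right (serial) order.
        for i in range(n):
            body(i, result.__getitem__, result.__setitem__)
    else:
        # Commit buffered writes in iteration order (later writers win).
        for w in writes:
            result.update(w)
    return result, violated
```

A loop such as `a[i] = 2 * a[i]` commits without a violation, while a running sum `a[i] = a[i] + a[i-1]` is detected and re-executed serially; in both cases the final memory matches sequential execution.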



PERFORMANCE ENGINEERING SEMINAR SERIES, NCSA
HINT, NetPIPE, and more
John Gustafson, Don Heller
Scalable Computing Laboratory, Ames Laboratory, US Dept. of Energy, Iowa State University
gus@ameslab.gov, heller@ameslab.gov
http://www.scl.ameslab.gov
Wednesday, June 3, 2:00 PM, 5239 Beckman Institute

The Scalable Computing Laboratory has developed several benchmark programs that demonstrate the capability of computer systems and networks over a wide range of applications.

HINT is a computing task that behaves like many programs in that it will not stay within the confines of any one level of the memory hierarchy. The output of the HINT computation is guaranteed bounds on the answer, so the performance of a system is measured by the quality of the answer relative to the work performed, and by the rate at which quality improves with increased resources. HINT is easily extended to vector, shared-memory, or distributed-memory systems, and we have acquired an extensive database of measurements. By varying the problem size, HINT may be considered a superset of the usual fixed-size benchmarks. By varying the data types, HINT can discover a bias toward integer or floating-point computation. By comparing systems with the same general characteristics, the effect of changes in the memory hierarchy or processor pool can easily be detected. By comparing disparate systems using the same application, some sense of the "right machine for the job" can be gained.

NetPIPE is a communication task that uses point-to-point messages in a protocol-independent manner to assess the performance of communication-bound applications. Again, by varying the communicated block size, NetPIPE clearly shows the effects of latency, bandwidth, and anomalous behavior. Its original motivation was to help select a network infrastructure for various types of applications and for communication with a CAVE virtual reality environment. We have also used it as a guide to revising Ethernet device drivers, both improving performance and decreasing variability. This has been useful for our local PC cluster projects.

Some additional supporting tools for the analysis of HINT results, verification of the measurements, and hardware performance counters will also be discussed.
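Why varying the block size separates latency from bandwidth can be seen with the usual idealized model t(n) = t0 + n/B for an n-byte message: effective throughput n/t(n) rises with n toward B and reaches exactly B/2 at n = t0·B. The sketch below assumes that model and hypothetical link parameters; real NetPIPE runs time actual message exchanges and expose the anomalies that this smooth model cannot. It also fits t0 and B back from (size, time) samples by least squares. All names are hypothetical.

```python
def transfer_time(n_bytes, latency_s, bandwidth_Bps):
    """Idealized one-way time for an n-byte message: t(n) = t0 + n / B."""
    return latency_s + n_bytes / bandwidth_Bps

def sweep(latency_s, bandwidth_Bps, max_exp=20):
    """Effective throughput (bytes/s) for block sizes 1, 2, 4, ..., 2**max_exp."""
    return [(n, n / transfer_time(n, latency_s, bandwidth_Bps))
            for n in (2**k for k in range(max_exp + 1))]

def fit_latency_bandwidth(samples):
    """Least-squares fit of t = t0 + n/B to (n_bytes, seconds) samples."""
    m = len(samples)
    mean_n = sum(n for n, _ in samples) / m
    mean_t = sum(t for _, t in samples) / m
    cov = sum((n - mean_n) * (t - mean_t) for n, t in samples)
    var = sum((n - mean_n) ** 2 for n, _ in samples)
    slope = cov / var                   # seconds per byte, i.e. 1 / B
    return mean_t - slope * mean_n, 1.0 / slope

if __name__ == "__main__":
    t0, B = 50e-6, 100e6                # hypothetical link: 50 us, 100 MB/s
    print("half-peak block size:", t0 * B, "bytes")
    for n, rate in sweep(t0, B, max_exp=16):
        print(f"{n:>8} B  {rate / 1e6:8.2f} MB/s")
```

On measured data, points that sit well off the fitted line are exactly the "anomalous behavior" worth investigating.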



PERFORMANCE ENGINEERING SEMINAR SERIES, NCSA
Implementation and performance analysis of a large-scale groundwater simulation code
G. Mahinthakumar, Center for Computational Sciences, Oak Ridge National Laboratory
http://www.ccs.ornl.gov/staff/kumar/
Friday, July 31, 11:00 AM, 4169 Beckman Institute

Abstract
A parallel finite-element groundwater code for simulating single-phase flow and multicomponent transport has been developed for a variety of parallel architectures. We will describe the implementation and analyze the performance of this code with respect to scalability and cache effects. We will also present some applications of this code. The code exhibits characteristics that are typical of many simulation codes, such as explicit communication, global reduction operations, sparse matrix operations, and parallel I/O. The parallel implementation is based on domain decomposition with explicit message passing using MPI. We analyze performance on architectures such as the Intel Paragon, IBM SP, Origin 2000, and Convex Exemplar SPP-2000. Our performance results will focus on the multigrid and Krylov solvers used in the flow and transport modules, but some results on overall performance will also be presented. Our results show that the implementation is scalable on architectures that have a good ratio (> 2) of communication bandwidth (MB/sec) to peak performance (Mflops). Single-node performance is mainly affected by memory bandwidth and secondary cache size, because sparse matrix operations dominate the computations. On machines such as the Origin 2000, which have a good-sized (4 MB) secondary cache, we achieve a better percentage of peak for small to moderate-size problems than on machines that have no secondary cache. For the multicomponent transport code we implemented certain computational and memory-saving features that enable enhanced resolution of the model with rapid solution times. We are now able to solve nonlinear coupled partial differential equation systems (arising in multicomponent reactive transport) with more than 120 million degrees of freedom in about 5-10 seconds per time step on machines such as the 1024-processor Intel Paragon XPS/150. To our knowledge, solution of groundwater transport problems of this size has not been reported before in the literature.



PERFORMANCE ENGINEERING SEMINAR SERIES, NCSA
NAG & HPC
Dr Stef Salvini, Numerical Algorithms Group, Ltd, Oxford, England
Thursday, Nov 19, 11:00 AM, 4169 Beckman Institute

Abstract
NAG (The Numerical Algorithms Group) has been a major provider of numerical libraries for many years. In particular, we have been active in HPC through our products and through research and collaborations with other institutions. This talk will concentrate on the work we are carrying out in parallel numerical computing. Parallel numerical libraries can fulfil a number of important roles:
* Encapsulate expertise not always available to application developers
* Allow levels of performance otherwise difficult to achieve
* Allow non-specialists to have access to parallelism
This talk will deal mostly with the NAG SMP Library. The role of compiler directives, in particular that of OpenMP, in the development of efficient, portable code will be reviewed. New results obtained at NCSA in collaboration with AHPCC/HPCERC at UNM show that an alternative multi-threaded approach can deliver performance and scalability superior to those provided by vendors' mathematical libraries. Finally, recent work on hybrid parallelism, carried out at UNM, will be presented. This work originates partly from the SMP work, and it is intended to provide a possible strategy for employing existing message passing and SMP technologies together.



PERFORMANCE ENGINEERING SEMINAR SERIES, NCSA
PECM Performance Tools In-House Workshop
KAI's OpenMP Tools
Presenter: Bill Magro, KAI
Friday, Jan 15, 9:00 AM - 12:00 noon

This is a reminder that we are going ahead with the tutorial on KAI's OpenMP tools for Fortran on Friday, Jan 15, from 9:00 AM to 12:00 noon, to be taught by Bill Magro of Kuck and Associates. We have the Numerical Lab in 3514 Beckman reserved. The plan is for Bill to describe the tools and for attendees to log in and try them hands-on. I have told Bill that he should assume a basic familiarity with OpenMP (see the Fortran document at http://www.openmp.org) and focus on the tools. You may want to experiment with OpenMP on modi4 before this workshop and bring along a few simple codes.



PERFORMANCE ENGINEERING SEMINAR SERIES, NCSA
High-Performance Computing, Numerical Libraries and Trends
Prof Jack Dongarra, University of Tennessee and Oak Ridge National Laboratory
http://www.netlib.org/utk/people/JackDongarra/
Feb 25, 1999

This talk will provide an overview of high performance computing. We will look at some future trends in the context of numerical libraries. The talk will conclude with a look at the directions in which high performance computing may be heading.

Jack Dongarra
dongarra@cs.utk.edu
104 Ayres Hall
423-974-8295, fax: 423-974-8296
Knoxville TN, 37996











PERFORMANCE ENGINEERING SEMINAR SERIES, NCSA
Scal-Tool: Pinpointing and Quantifying Scalability Bottlenecks in Single System Image Shared-Memory Multiprocessors
Yan Solihin, Vinh Lam, and Josep Torrellas
Department of Computer Science, University of Illinois at Urbana-Champaign
solihin,lam,torrella@cs.uiuc.edu
Thursday, May 20, 11:00 AM, 4031 Beckman Institute

Abstract
Single system image (SSI) shared-memory multiprocessors provide an attractive combination of cost-effective architectural design and, thanks to the single-image, shared-memory abstraction, relative ease of programming. Unfortunately, tuning applications for scalable performance on these machines is well known to be time-consuming. While performance monitoring tools can help, they often present only low-level information, lack integration, and are usually costly to run. In this talk, we outline an empirical model that isolates and quantifies scalability bottlenecks in shared-memory parallel applications running on SSI shared-memory machines in a relatively inexpensive and integrated manner. The scalability bottlenecks currently quantified include insufficient caching space, load imbalance, and synchronization. The model takes as input measurements from the hardware event counters of the R10000 processors in the Origin 2000. A major advantage of the model is that it is quite inexpensive to run: it needs only the event counter values for the application running with a few different processor counts and data set sizes.


NCSA PERFORMANCE ENGINEERING SEMINAR SERIES
A Performance Comparison of Fortran 90 with MPI and OpenMP on the Origin 2000
Jay Hoeflinger, Senior Research Scientist, Center for Simulation of Advanced Rockets, University of Illinois at Urbana-Champaign
Tuesday, September 7, 1:30 PM, 4169 Beckman Institute

Fortran 90 has become increasingly popular in the scientific community as a language for writing application codes because, among many other things, it offers storage management facilities and more flexible data structures than Fortran 77 does. To make it even more useful, the two major parallel programming paradigms have industry-standard Fortran 90 bindings: SPMD parallelism may be expressed through MPI calls, and shared-memory parallelism through OpenMP. But the two paradigms cause certain Fortran 90 constructs to display different performance characteristics. This talk will compare MPI with OpenMP in a general way, then describe an experiment run on the Origin 2000 in which an application using MPI was re-implemented with OpenMP so that the performance effects of the two paradigms on each Fortran 90 construct could be compared. The performance differences will be described in detail, and conclusions will be drawn about the trade-offs involved in writing high-performance parallel programs using Fortran 90 on the Origin 2000.


NCSA PERFORMANCE ENGINEERING SEMINAR SERIES
Linear Solver Technologies at NCSA
Faisal Saied, Performance Engineering and Computational Methods Group, National Computational Science Alliance, University of Illinois at Urbana-Champaign
Friday, October 15, 11:00 AM, 5239 Beckman Institute

Abstract
The solution of linear systems of equations is an important part of many large-scale computational science applications. In this talk we review the linear solver requirements of several important applications at NCSA, as well as the set of technologies, both algorithmic and parallel software, available to attack these problems. The linear solver technologies at NCSA include software from Alliance ET team partners, academic and government lab researchers, commercial software vendors, and application scientists. They include some of the best high performance solutions available, ranging from dense and sparse direct solvers to preconditioned Krylov subspace solvers, multigrid, and domain decomposition. We highlight a few notable linear solver projects, current and past, at NCSA. Finally, we outline some proposals on how NCSA's support for linear solvers can be improved and taken to the next level.
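Among the solver families listed, preconditioned Krylov methods are the easiest to show compactly. The sketch below is a textbook conjugate gradient iteration with a Jacobi (diagonal) preconditioner for a symmetric positive definite matrix, written in pure Python with dense lists for readability. It is an illustration under those assumptions, not code from any of the NCSA packages mentioned, which would use sparse storage and parallel kernels; the function names are hypothetical.

```python
def matvec(A, x):
    """Dense matrix-vector product (a real solver would use sparse storage)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pcg_jacobi(A, b, tol=1e-10, max_iter=1000):
    """Jacobi-preconditioned conjugate gradient for SPD A; returns (x, iterations)."""
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]   # preconditioner M^-1 = diag(A)^-1
    x = [0.0] * n
    r = list(b)                                    # r = b - A x with x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]     # z = M^-1 r
    p = list(z)
    rz = dot(r, z)
    bnorm = dot(b, b) ** 0.5
    for k in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * bnorm:        # relative residual test
            return x, k
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)                    # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz                         # keeps directions A-conjugate
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

For example, on the 1D Poisson matrix (2 on the diagonal, -1 beside it) with a right-hand side of ones, the iteration converges in at most n steps in exact arithmetic. Swapping the diagonal preconditioner for a multigrid or domain-decomposition preconditioner changes only the lines that compute z.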


NCSA PERFORMANCE ENGINEERING SEMINAR SERIES
The SvPablo Performance Analysis and Visualization System
Dr. Luiz DeRose, Advanced Computing Technology Center (ACTC), IBM Research, Yorktown, NY
Thursday, February 3, 11:00 AM, 3269 Beckman Institute

Abstract
In this talk I will present SvPablo, a language- and architecture-independent performance analysis and visualization system. At present, SvPablo supports analysis of applications written in C, Fortran 77, Fortran 90, and HPF on a variety of sequential and parallel systems. In addition to capturing application data via software instrumentation, SvPablo exploits hardware performance counters to capture the interaction of software and hardware. During execution of the instrumented code, the SvPablo library captures data and computes performance metrics on the execution dynamics of each instrumented construct on each processor. Because only statistics, rather than detailed event traces, are maintained, SvPablo can handle measurements of programs that execute for hours or days on hundreds of processors.
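The trade-off in the last sentence is that an event trace grows with every execution of an instrumented construct, whereas a per-construct summary stays fixed in size no matter how long the program runs. The following Python sketch keeps only count, total, minimum, and maximum wall-clock time per named construct. It is hypothetical: it is not SvPablo's API, and it ignores the hardware counters SvPablo also reads.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class ConstructStats:
    """Running summary for one instrumented construct: O(1) memory."""
    __slots__ = ("count", "total", "min", "max")

    def __init__(self):
        self.count, self.total = 0, 0.0
        self.min, self.max = float("inf"), float("-inf")

    def record(self, duration):
        # Fold one measurement into the summary instead of appending a trace event.
        self.count += 1
        self.total += duration
        self.min = min(self.min, duration)
        self.max = max(self.max, duration)

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0

# One summary per construct name, created on first use.
stats = defaultdict(ConstructStats)

@contextmanager
def instrument(name):
    """Time the enclosed block and add the duration to stats[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stats[name].record(time.perf_counter() - start)
```

Wrapping a loop body in `with instrument("inner_loop"):` a million times still leaves a single four-field record for that construct, which is what makes long runs on many processors tractable.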



NCSA PERFORMANCE ENGINEERING SEMINAR SERIES
An Interactive Environment for the Rapid Parallelisation of Fortran77 Mesh-Based Codes
Dr. Constantinos Ierotheou, Parallel Processing Research Group, University of Greenwich, London
Monday, February 28, 3:00 PM, 2269 Beckman Institute

Abstract
In this presentation it will be shown that one viable route to exploiting high performance parallel systems is a single environment that can transform serial Fortran programs into parallel form. The parallel code generated is essentially the modified serial code plus message passing calls for distributed-memory machines, or directives for shared-memory machines. The effort of parallelising an existing serial code includes:
- comprehending the data flow through the code,
- defining and applying a partitioning strategy,
- modifying the code using execution control masks for parallel execution,
- adding communication calls to ensure correct parallel computation.
These tasks should largely be borne by the parallelisation tools and not by the code paralleliser. If this can be achieved, the time taken to parallelise the original serial code is a fraction of that required to perform the same task manually.

One major research effort over the last decade has resulted in the Computer Aided Parallelisation Tools (CAPTools). This interactive parallelisation environment uses technology developed at the University of Greenwich and elsewhere. The environment includes a tightly coupled implementation of the major stages described above, as well as actual parallel code generation. At the core of CAPTools is a sophisticated dependence analyser. The dependence analysis takes a global perspective of the original code; consequently, the analysis is fully interprocedural and does not require inlining of code. Results will be shown for benchmark codes and industrial codes, parallelised using CAPTools, on state-of-the-art parallel hardware. In many instances where a comparison can be made, the code parallelised using CAPTools is of comparable quality to that created through manual parallelisation, but the key advantage is that the effort required to achieve such performance is orders of magnitude less than for the manual process.



NCSA PERFORMANCE ENGINEERING SEMINAR SERIES
Retargetable Programming with The KeLP Infrastructure
Scott B. Baden, Department of Computer Science and Engineering, University of California, San Diego
http://www.cse.ucsd.edu/users/baden
Tuesday, April 25, 11:00 AM, 5602 BI

Abstract
KeLP (Kernel Lattice Parallelism) is a framework for implementing portable scientific applications on distributed-memory parallel computers. It is intended for applications with special needs, in particular applications that adapt to data-dependent or hardware-dependent conditions at run time. KeLP is currently used in full-scale applications, including subsurface modeling, turbulence studies, and first-principles simulation of real materials, and it supports multiblock or adaptive mesh refinement techniques through a set of geometric or structural abstractions. These structural abstractions are metadata types that represent various attributes of program execution, including control flow, data decomposition, and data dependencies. They are architecture-neutral and support the retargeting of applications to new generations of system architecture. I'll describe four aspects of KeLP that illustrate this capability: a unified notation for expressing hierarchical parallelism on multi-tier, SMP-based clustered architectures, which overcomes common defects or omissions in MPI and its implementations; automated optimization across heterogeneous collections of multiprocessors; management of large data sets; and program coupling.



NCSA PERFORMANCE ENGINEERING SEMINAR SERIES
Parallel Programming through Web-based Tools
Rudi Eigenmann, Purdue University, School of Electrical and Computer Engineering
Wednesday, May 10, 1:30 PM, 4169 Beckman Institute

Abstract
At Purdue University we are developing a parallel programming laboratory that is available to all users via standard Web browsers (http://punch.ecn.purdue.edu/Netcare/parHub.html). Users can get accounts, log in, and use laboratory tools much as they would on an ordinary computer system. However, the account resides "on the Web," and access to a particular machine is transparent to the user. The parallel programming lab is built in a project called NETCARE (NETwork computing in Computer Architecture Research and Education). In this talk I will describe the motivation for the design of the Web Parallel Programming Lab and the population of the lab with a growing number of tools, such as the Polaris/OpenMP compiler, the UrsaMinor interactive performance tuning environment, and a range of related tools that facilitate the design and performance optimization of parallel programs. I will also briefly describe the underlying infrastructure, called PUNCH (Purdue University Network Computing Hubs), a large and widely used system for online computing.



NCSA PERFORMANCE ENGINEERING SEMINAR SERIES
"Analyzing MPI Applications With Vampir"
Werner Krotz-Vogel, PALLAS GmbH
Thursday, Sept 14, 1:30 PM, 2269 Beckman Institute

Abstract
Vampir is a leader in MPI performance analysis tools. This seminar will show how it analyzes the performance of advanced message passing features such as MPI collective operations, optional MPI-IO, and Cray shmem. Using Vampir 2.5, it becomes easy to:
- understand the application behavior,
- evaluate load balancing,
- analyze the performance of subroutines or code blocks,
- learn about communication patterns, parameters, and performance,
- identify communication hotspots.
Vampir does this with a variety of graphical displays of the application's runtime behavior:
- a detailed timeline view of events and communication,
- statistical analysis of program execution,
- statistical analysis of communication operations,
- system snapshots and animation, and
- dynamic calling tree profiling.



NCSA PERFORMANCE ENGINEERING SEMINAR
Irregular Applications in Janus
Jens Gerlach and Uwe Der, {jens,uweder}@first.gmd.de
Nov 15, 2000, 1:30 PM, 5602 BI

Janus is a conceptual framework and a C++ template library for irregular and regular scientific applications. It provides (potentially distributed) data structures to represent the spatial structures and numerical data of these applications. Janus is implemented as a thin layer on top of the C++ Standard Template Library and uses MPI as its default parallel platform. It can also be configured to operate in a sequential mode. A port to OpenMP is in progress.

Janus rests on the observation that essentially two kinds of objects occur in scientific applications. The first kind are "spatial structures" such as (rectangular) grids, meshes, or graphs. The second kind are simulation data associated with the spatial structures; typical examples are grid functions, element matrices on a finite element mesh, or (sparse) matrices. Important for the implementation of Janus components is that the spatial structures are considered to come first and to be more stable than the data associated with them.

The conceptual framework of Janus is designed using the paradigm of generic programming, which has been successfully applied in the C++ standard library and in other libraries such as the Matrix Template Library (MTL) and the POOMA framework. To describe spatial structures, Janus has the concepts Domain and Relation. A key point here is the so-called "two-phase domains/relations," whose access requirements fit the usage patterns of the spatial structures of complex scientific applications. Simulation data are described by the concept Domain Function. The Janus components, i.e., the template domain, relation, and array classes, are models of these concepts.

In our talk, we point out why the Janus abstractions are particularly well suited for a unified description and efficient implementation of irregular and regular applications, such as finite element and finite difference methods, including structured and unstructured mesh refinement. In particular, we discuss the parallel implementation of a two-dimensional finite element method that includes automatic repartitioning of the adaptively refined mesh. Janus has been developed and is maintained by the Real World Computing Partnership, Japan, at its distributed laboratory at GMD-FIRST, the German National Research Center for Information Technology in Berlin. For more information about Janus, visit http://www.first.gmd.de/promise.
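The two-phase idea can be sketched in a few lines: elements are inserted freely in a first phase, the domain is then frozen into a sorted, duplicate-free sequence, and afterwards each element has a stable dense position that can index an associated data array (a "domain function"). The Python class below is hypothetical and only illustrates that access pattern; it is not Janus's C++ template interface, and it ignores distribution across MPI processes.

```python
from bisect import bisect_left

class TwoPhaseDomain:
    """Hypothetical two-phase domain: insert elements, freeze(), then query."""

    def __init__(self):
        self._pending = []      # phase 1: unordered insertions, duplicates allowed
        self._elems = None      # phase 2: sorted, duplicate-free tuple

    def insert(self, elem):
        if self._elems is not None:
            raise RuntimeError("domain is frozen; no further insertions")
        self._pending.append(elem)

    def freeze(self):
        """End the insertion phase; element positions are fixed from now on."""
        self._elems = tuple(sorted(set(self._pending)))
        self._pending = None

    def __len__(self):
        return len(self._elems)     # valid only after freeze()

    def position(self, elem):
        """Dense index of elem, usable to address an associated data array."""
        i = bisect_left(self._elems, elem)
        if i == len(self._elems) or self._elems[i] != elem:
            raise KeyError(elem)
        return i
```

After `freeze()`, a domain function can be a plain list of length `len(domain)` indexed by `domain.position(e)`, so numerical data never needs to be rehashed while the structure stays fixed.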



NCSA PERFORMANCE ENGINEERING SEMINAR
HPCView: Tool for Application-Oriented Performance Tuning
Rob Fowler, Center for High Performance Software Research, Rice University
March 28, 2001, 10:00 PM, 4269 BI

Abstract
Application performance tuning is a complex problem. Existing performance tools do not adequately support this process in one or more dimensions. In particular, tuning requires assembling information of different kinds from diverse sources and correlating that information with program text to pinpoint the causes of performance bottlenecks. We discuss some of the critical utility and usability issues for application-level performance analysis tools in the context of the performance tools we built to support our own work on data layout and optimizing compilers. The main focus will be on HPCView, a language- and architecture-independent tool that combines data from a wide range of instrumentation sources and correlates it with program source code. The tool can also derive synthetic performance metrics from measured data. The results are assembled with the corresponding source code into hierarchical views and saved as an HTML database that can be analyzed portably and collaboratively using a commodity browser. In addition to daily use within our group, HPCView and MHSim, a memory hierarchy simulator with similar properties, are being used successfully by several code development teams in DoD and DoE laboratories. HPCView is available on SGI Origins at NCSA. After the formal talk, we will hold a tutorial covering additional examples and some of the "nuts and bolts" of working with HPCView. Dr. Fowler will also be available to help interested groups get started with HPCView.

