Top
When: Friday, Dec 15, 1:30 PM
Where: 4169 Beckman
Introduction to PETSc
Bill Gropp
Senior Research Scientist
MCS, Argonne National Laboratory
PETSc, the Portable, Extensible Toolkit for Scientific Computation, is a suite
of data structures and routines for the uni- and parallel-processor solution
of large-scale scientific application problems modeled by partial differential
equations.
PETSc 2.0 is fully usable from Fortran and C/C++, supports real and complex
numbers, and runs on most machines, now including Windows.
This talk provides a brief introduction to PETSc, including examples
illustrating the use of PETSc to create portable parallel applications.
Top
When: Friday, Dec 12, 11:00 AM
Where: 4169 Beckman
Parallel Software Engineering with KPTS and OpenMP
Sanjiv Shah, Kuck & Associates, Inc.
OpenMP is an industry standard programming specification for shared
memory parallel computers. It offers scalability and portability
across Unix and Windows NT systems.
Parallel Software Engineering requires more than the ability to
compile parallel applications. Tools to verify correct usage of
parallel constructs and visualize performance issues are
invaluable.
The talk will introduce Kuck & Associates' KAP/Pro Toolset for
OpenMP (KPTS), a collection of parallel programming tools for OpenMP.
The Toolset includes Assure, which automatically locates bugs in
OpenMP programs, and GuideView, which visualizes parallel program
performance. KPTS also includes a high-performance fully
compliant implementation of OpenMP.
Top
January 8 (Thursday), 1998 at 10:00 a.m.
4269 Beckman Institute
OpenMP on Silicon Graphics Platforms
Ramesh Menon
Silicon Graphics, Inc
OpenMP is an application program interface (API) for shared memory parallel
programming. Pioneered by SGI, it is fast becoming a de facto industry
standard, as evidenced by the large number of hardware and software vendors
endorsing it.
The functionality is designed to enable programmers to write coarse grain,
scalable, shared memory parallel programs while also preserving the ability
to easily implement loop level parallelism. This talk will present the why,
what and how of OpenMP as it relates to Silicon Graphics platforms. A
comparison with existing functionality will be presented along with some
typical examples in Fortran. The C/C++ specification is expected to be released
in early 1998.
http://www.sgi.com/Technology/OpenMP
Top
TITLE: The Linear System Analyzer Project
SPEAKER: Prof Randall Bramley
Department of Computer Science
Indiana University-Bloomington
TIME: Friday, Jan 30, 11:00 AM
PLACE: 5239 Beckman
ABSTRACT:
The Linear System Analyzer (LSA) is a research project addressing the
general problem of developing software component architecture
frameworks for large-scale distributed scientific computing. One goal
is to provide tools allowing scientists to quickly build their own
problem-solving environments for problems in their application domain.
Another is to develop methods for connecting specialized resources such
as parallel databases with parallel computations.
The LSA is a particular implementation of the general framework,
providing both focus and a reality check. The LSA targets one of the
most difficult and common problems in scientific computing: the
numerical solution of large, sparse linear systems of equations. A
large number of approaches have been developed for solving those, and
it is clear to practitioners that no single solver will work well in
all or even a majority of cases. The LSA includes a GUI for people to
quickly wire together modules performing linear system input,
reordering, scaling, and direct and iterative solves. Modules can run
on distributed machines, and send data to other modules via Nexus.
This talk discusses the design goals and constraints for the LSA, and
their implications for the underlying PSE infrastructure. The work is
still at an early stage, and remaining research challenges will
also be discussed.
The LSA is a joint project with R. Bramley, D. Gannon, T. Stuckey,
J. Villacis, J. Balasubramanian, E. Akman, S. Diwan,
and M. Govindaraju.
Top
Performance Evaluation and Benchmarking with Large-Scope Applications
Rudolf Eigenmann
School of Electrical and Computer Engineering
Purdue University
The use of large, realistic applications for evaluating high-performance
computers is not yet standard practice. Typically, small and manageable test
programs are used for benchmarking, for measuring the results of research
projects, and for justifying project directions. As a consequence, computer
systems often fail to prove their value in practice under the impact of real
applications.
This issue is being addressed in a joint academic/industrial initiative to
define and maintain a suite of realistic, industrial applications for
performance evaluation and benchmarking. The effort includes the SPEC
High-Performance Group (HPG), which has recently released the SPEChpc96 suite
for benchmarking large-scale computer platforms. Several academic members
participate in this effort with the objectives of characterizing these
applications from a variety of viewpoints (architecture, compiler, algorithm
angles) and making the results available to the research community.
This talk will describe the activities of the SPEC/HPG committee, the
applications included in the current benchmark suite, and ongoing work. The
issues addressed mainly by the academic members will then be presented. They
include performance models, evaluation methodologies, characterization tools,
and the creation of an infrastructure that makes the results of this effort
available to the community at large.
--------
A brief bio
Rudolf Eigenmann was a member of the research staff at the Center for
Supercomputing Research and Development, University of Illinois from 1988
through 1995. In 1995 he joined the faculty at the School of Electrical and
Computer Engineering, Purdue University. He also currently serves as the
chairman of the SPEC High-Performance Group. His interests include compilers,
tools, programming methodologies, and performance evaluation of parallel
computers.
Top
Where: 3269 BI
When: Friday, Feb 27, 10:30 AM
Strassen's Algorithm: A Practical Method for Fast Matrix Multiplication
Steven Huss-Lederman
CS Department
University of Wisconsin-Madison
In 1969, Strassen published a paper with the innocuous title "Gaussian
Elimination Is Not Optimal". That paper describes an algorithm for
multiplying two order-n matrices in O(n^(lg 7)), roughly O(n^2.807), operations.
This is potentially a significant savings over the conventional
algorithm which takes O(n^3) operations. That paper began efforts to
understand the minimum work required for performing matrix
multiplication.
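To make the operation count concrete, here is a minimal sketch of the recursion in Python. It replaces the eight half-size products of the conventional block algorithm with seven, which is where the exponent lg 7 comes from. The function names and the `cutoff` parameter are illustrative assumptions, not part of the library described in this abstract. In particular, falling back to the conventional algorithm for odd orders is a simplification chosen for this sketch.

```python
def add(A, B):
    """Elementwise sum of two equally sized matrices (lists of rows)."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    """Elementwise difference of two equally sized matrices."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def naive_mul(A, B):
    """Conventional O(n^3) matrix product."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def split(M, h):
    """Return the four h-by-h quadrants of the 2h-by-2h matrix M."""
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B, cutoff=2):
    """Multiply two square matrices of the same order.

    Assumption for this sketch: small or odd-order blocks are handed to
    naive_mul instead of being padded or peeled.
    """
    n = len(A)
    if n <= cutoff or n % 2:
        return naive_mul(A, B)
    h = n // 2
    A11, A12, A21, A22 = split(A, h)
    B11, B12, B21, B22 = split(B, h)
    # Seven recursive products instead of eight.
    M1 = strassen(add(A11, A22), add(B11, B22), cutoff)
    M2 = strassen(add(A21, A22), B11, cutoff)
    M3 = strassen(A11, sub(B12, B22), cutoff)
    M4 = strassen(A22, sub(B21, B11), cutoff)
    M5 = strassen(add(A11, A12), B22, cutoff)
    M6 = strassen(sub(A21, A11), add(B11, B12), cutoff)
    M7 = strassen(sub(A12, A22), add(B21, B22), cutoff)
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    # Reassemble the quadrants into one matrix.
    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom
```

The extra matrix additions and the temporary quadrants are the source of the temporary-memory and crossover-size concerns; a practical code would choose `cutoff` much larger than 2.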
In spite of the significance of this and subsequent papers,
Strassen-type algorithms have only seen limited use in real
applications. This results from several concerns that have been
voiced over the years. These include dealing with odd-size matrices,
non-square matrices, numerical stability, and temporary memory.
However, the most damning claim is that the algorithm pays off only for
enormous matrices and so is not of practical interest.
As part of the PRISM project, we have undertaken a systematic study of
Strassen's algorithm. We have shown, along with others, that each of
the above concerns can be readily addressed. We have produced a portable,
public domain library which implements this algorithm and achieves
high performance. This library is plug-and-play with the BLAS GEMM
routine.
This talk will assume no previous knowledge and should be accessible
to all who are interested. The work was performed in collaboration
with Elaine Jacobson, Jeremy Johnson, Anna Tsao, and Tom Turnbull.
Top
PERFORMANCE ENGINEERING SEMINAR SERIES, NCSA
Monday, March 2, 4269 BI 2:00 p.m.
SPRNG: A Scalable Library For Pseudorandom Number Generation
Dr. Michael Mascagni, Coordinator
Program in Scientific Computing
University of Southern Mississippi
Abstract:
Providing high quality pseudorandom numbers for parallel computers
supplies many deep and fascinating mathematical problems as well as unique
software engineering challenges. One of the more practical issues in
parallel pseudorandom number generation is finding methods that provide
portability and reproducibility across architectures. We summarize some
recent developments in the design and analysis of pseudorandom number
generators for parallel computers that are portable and reproducible.
These results are the basis for a DARPA sponsored project for the
development of a scalable library for pseudorandom number generation that
is based at the University of Illinois, Urbana-Champaign. It is hoped
that this scalable library will be the seed for a more comprehensive
problem solving environment for Monte Carlo computations on scalable
platforms. In addition, it is hoped that this work can be used in the ASCI
project, where Monte Carlo calculations are a vital part of the total
simulation effort.
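As an illustration of what reproducibility across processor counts can mean, the following Python sketch uses leapfrogging, one textbook way of splitting a single linear congruential sequence among several streams. It is not the method or the interface of the library described above. The generator constants and the names `lcg` and `leapfrog_stream` are assumptions chosen only for this example.

```python
from itertools import islice

# Illustrative LCG parameters (assumed for this sketch, not taken from the talk).
A = 1103515245
C = 12345
M = 2 ** 31


def lcg(seed):
    """Base sequence x_{k+1} = (A * x_k + C) mod M, yielding x_1, x_2, ..."""
    x = seed
    while True:
        x = (A * x + C) % M
        yield x


def leapfrog_stream(seed, rank, nstreams):
    """Stream `rank` of `nstreams`: every nstreams-th element of the base sequence."""
    # Compose the one-step map with itself nstreams times to get the jump map
    # x -> (a_p * x + c_p) mod M.
    a_p, c_p = 1, 0
    for _ in range(nstreams):
        a_p, c_p = (a_p * A) % M, (c_p * A + C) % M
    # Advance to this stream's first element, x_{rank+1}.
    x = seed
    for _ in range(rank + 1):
        x = (A * x + C) % M
    while True:
        yield x
        x = (a_p * x + c_p) % M


# Three streams interleave back into the single base sequence.
base = list(islice(lcg(42), 12))
streams = [list(islice(leapfrog_stream(42, r, 3), 4)) for r in range(3)]
interleaved = [streams[k % 3][k // 3] for k in range(12)]
```

Because every stream is a deterministic function of the seed, the rank, and the stream count, the same numbers are produced on any architecture with exact integer arithmetic.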
Web site:
Top
PERFORMANCE ENGINEERING SEMINAR SERIES, NCSA
Title: NAG Software Solutions for High Performance Computing
Speaker: Arnold R. Krommer, NAG
When: Thurs, April 9, 2:00 PM
Where: 3269 BI
Abstract: The Numerical Algorithms Group (NAG) is a leading provider of
technical and scientific software. NAG has built a profound
world-wide reputation by developing and distributing
high-quality numerical, visualization, symbolic, and simulation
software for personal computers, workstations, and
supercomputers for more than 25 years. NAG software is used
at thousands of academic, research and industrial sites every day
to solve complex computational problems in physics, chemistry,
biology, engineering, and financial modeling.
This presentation provides an overview of the broad range of
products offered by NAG, focusing mainly on software relevant
to high performance computing. The products highlighted include
the IRIS Explorer visualization system, the NAG Fortran SMP
(Symmetric Multi-Processor) Library, and the NAG Parallel
Library.
A major driving force for the further development of the
NAG Parallel Library is the Parallel Industrial NumErical
Applications and Portable Libraries (PINEAPL) Project which
NAG is currently coordinating. The presentation outlines the
aims of the PINEAPL Project, describes its progress, summarizes
preliminary results, and provides details on free access to
the software.
Top
PERFORMANCE ENGINEERING SEMINAR SERIES, NCSA
When: Wed, May 6, 1:30 PM
Where:
Exciting New Trends in Computer Architecture
--------------------------------------------
Prof. Josep Torrellas
Computer Science Department
University of Illinois Urbana-Champaign
http://iacoma.cs.uiuc.edu
Dramatic increases in the number of transistors that can be integrated
on a VLSI chip provide exciting opportunities for new computer
architectures. Examples of fertile areas are processor-memory
integration, support for speculative parallelization, and support for
commercial workloads. In this talk, I will outline our work in some of
these areas.
In the area of processor-memory integration, we are exploring
cost-effective designs for distributed shared-memory machines built
around off-the-shelf processor-in-memory (PIM) chips. These machines
must exploit physical locality while, at the same time, using PIM chips
largely designed for uniprocessors. In addition, we are considering
the use of PIM chips as the intelligent memory of a PC, workstation,
or collection of them. This intelligent memory interleaves simple
computation units with DRAM arrays to form a flexible
computation/storage fabric.
With speculative parallelization, we are able to run in parallel codes
that the compiler cannot parallelize. The idea is to extend the cache
coherence protocol hardware of the machine to detect any dependence
violation. When a violation is detected, the work is redone in the
right order. We are exploring two approaches, namely speculation
across the processors of a distributed shared-memory machine, and
within a multiprocessor chip.
******
JOSEP TORRELLAS is a faculty member in the Computer Science Department. His research
interests are single- and multi-processor computer architectures. He received
a PhD in Electrical Engineering from Stanford University. He is a recipient of
a 1994 National Science Foundation Young Investigator Award.
Top
========================================================================
PERFORMANCE ENGINEERING SEMINAR SERIES, NCSA
HINT, NetPIPE, and more
John Gustafson, Don Heller
Scalable Computing Laboratory
Ames Laboratory, US Dept. of Energy
Iowa State University
gus@ameslab.gov, heller@ameslab.gov
http://www.scl.ameslab.gov
Wednesday, June 3, 2:00 PM
5239 Beckman Institute
The Scalable Computing Laboratory has developed several benchmark
programs that demonstrate the capability of computer systems and
networks over a wide range of applications. HINT is a computing task
that behaves like many programs, in that it will not stay within the
confines of any one level of the memory hierarchy. The output of the
HINT computation is guaranteed bounds on the answer, so the
performance of a system is measured by the quality of the response for
work performed, and the rate of quality improvement with increased
resources. HINT is easily extended to vector, shared-memory or
distributed-memory systems, and we have acquired an extensive database
of measurements. By varying the problem size, HINT may be considered a
superset of the usual fixed-size benchmarks. By varying the data
types, HINT can discover a bias toward integer or floating-point
computation. By comparing systems with the same general
characteristics, the effect of changes in the memory hierarchy or
processor pool can easily be detected. By comparing disparate systems
using the same application, some sense of the "right machine for the
job" can be gained.
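The idea of measuring quality as guaranteed bounds that tighten with more work can be sketched in a few lines of Python. This is only loosely in the spirit of HINT and is not the benchmark itself. The integrand, the floating-point arithmetic, and the names `f` and `refine_bounds` are assumptions made for illustration.

```python
import heapq


def f(x):
    """A decreasing function on [0, 1]; an illustrative choice of integrand."""
    return (1.0 - x) / (1.0 + x)


def interval_error(a, b):
    """Gap between the upper and lower rectangle areas on [a, b]."""
    return (b - a) * (f(a) - f(b))


def refine_bounds(steps):
    """Repeatedly split the interval with the largest error.

    Returns a list of (lower, upper, quality) tuples, one per step, where
    quality is defined here as 1 / (upper - lower).
    """
    heap = [(-interval_error(0.0, 1.0), 0.0, 1.0)]
    history = []
    for _ in range(steps):
        _, a, b = heapq.heappop(heap)
        mid = 0.5 * (a + b)
        for lo, hi in ((a, mid), (mid, b)):
            heapq.heappush(heap, (-interval_error(lo, hi), lo, hi))
        # f is decreasing, so the right endpoint gives a lower bound on each
        # piece and the left endpoint gives an upper bound.
        lower = sum((hi - lo) * f(hi) for _, lo, hi in heap)
        upper = sum((hi - lo) * f(lo) for _, lo, hi in heap)
        history.append((lower, upper, 1.0 / (upper - lower)))
    return history
```

Plotting quality against elapsed time or memory used, rather than against the step count, is what would turn such a computation into a measurement of the machine.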
NetPIPE is a communication task that uses point-to-point messages in
a protocol-independent manner to assess the performance of
communication-bound applications. Again, by varying the communicated
block size, NetPIPE clearly shows the effects of latency, bandwidth,
and anomalous behavior. Its original motivation was to help select a
network infrastructure for various types of applications and
communication with a CAVE virtual reality environment. We have also
used it as a guide to the revision of Ethernet device drivers, both
improving performance and decreasing variability. This has been useful
for our local PC cluster projects. Some additional supporting tools
for the analysis of HINT results, verification of the measurements,
and hardware performance counters, will also be discussed.
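Why varying the block size separates latency from bandwidth can be seen from the simplest linear cost model, in which sending n bytes takes a fixed startup time plus n divided by the asymptotic bandwidth. The Python sketch below evaluates that model. It is not NetPIPE, and the latency and bandwidth figures are hypothetical values chosen for illustration.

```python
def transfer_time(nbytes, latency_s, bandwidth_bytes_per_s):
    """Linear model: fixed startup cost plus size divided by bandwidth."""
    return latency_s + nbytes / bandwidth_bytes_per_s


def throughput(nbytes, latency_s, bandwidth_bytes_per_s):
    """Achieved bytes per second for a message of nbytes."""
    return nbytes / transfer_time(nbytes, latency_s, bandwidth_bytes_per_s)


def sweep(latency_s, bandwidth_bytes_per_s, max_exp=20):
    """Throughput for block sizes 1, 2, 4, ..., 2**max_exp bytes."""
    return [(2 ** k, throughput(2 ** k, latency_s, bandwidth_bytes_per_s))
            for k in range(max_exp + 1)]


# Hypothetical network: 100 microseconds latency, 12.5 MB/s peak bandwidth.
LATENCY = 100e-6
BANDWIDTH = 12.5e6

for size, rate in sweep(LATENCY, BANDWIDTH):
    print(f"{size:8d} bytes  {rate / 1e6:8.3f} MB/s")
```

In this model small messages are dominated by latency, large ones approach the peak bandwidth, and half of the peak is reached at a block size equal to latency times bandwidth. Measured curves that depart from this shape are the anomalous behavior a block-size sweep is meant to expose.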
Top
PERFORMANCE ENGINEERING SEMINAR SERIES, NCSA
Implementation and performance analysis of
a large-scale groundwater simulation code
G. Mahinthakumar
Center for Computational Sciences
Oak Ridge National Laboratory
http://www.ccs.ornl.gov/staff/kumar/
Friday, July 31, 11:00AM
4169 Beckman Institute
Abstract
--------
A parallel finite-element groundwater code for simulating
single-phase flow and multicomponent transport has been developed for
a variety of parallel architectures. We will describe the
implementation and analyze the performance of this code with
respect to scalability and cache effects. We will also present some
applications of this code. The code exhibits characteristics that
are typical of many simulation codes, such as explicit communication,
global reduction operations, sparse matrix operations, and parallel
I/O. The parallel implementation is based on domain decomposition
with explicit message passing using MPI. We analyze performance on
architectures such as the Intel Paragon, IBM SP, Origin 2000, and
Convex Exemplar SPP-2000. Our performance results will be focused on
the multigrid and Krylov solvers used in the flow and transport
modules but some results with respect to the overall performance will
also be presented.
Our performance results show that the implementation is scalable on
architectures which have a good ratio (> 2) of communication
bandwidth (MB/sec) to peak performance (Mflops). The single node
performance is mainly affected by memory bandwidth and secondary
cache size because sparse matrix operations dominate the
computations. On machines such as the Origin 2000, which have a
good-sized (4 MB) secondary cache, we achieve a better percentage of
the peak for small to moderate size problems than on machines which
have no secondary cache. For the multicomponent transport code we
implemented certain computational and memory saving features which
enable enhanced resolution of the model with rapid solution times. We
are now able to solve nonlinear coupled partial differential equation
systems (arising in multicomponent reactive transport) with more
than 120 million degrees of freedom in about 5-10 seconds per time
step on machines such as the 1024-processor Intel Paragon XPS/150. To
our knowledge, solutions to groundwater transport problems of this
size have not been reported before in the literature.
Top
PERFORMANCE ENGINEERING SEMINAR SERIES, NCSA
NAG & HPC
Dr Stef Salvini
Numerical Algorithms Group, Ltd
Oxford, England
Thursday, Nov 19, 11:00AM
4169 Beckman Institute
Abstract
--------
NAG (The Numerical Algorithms Group) has been a major provider
of numerical libraries for many years. In particular, we have been
active in HPC through our products and through research and
collaborations with other institutions.
This talk will concentrate on the work we are carrying out in
parallel numerical computing.
Parallel numerical libraries can fulfil a number of important roles:
* Encapsulate expertise not always available to application
developers
* Allow levels of performance otherwise difficult to achieve
* Allow non-specialists to have access to parallelism
This talk will deal mostly with the NAG SMP Library. The role of
compiler directives, in particular that of OpenMP, in the development
of efficient, portable code will be reviewed.
New results obtained at NCSA in collaboration with AHPCC/HPCERC at UNM show
that an alternative multi-threaded approach can deliver performance
and scalability superior to those provided by vendors' mathematical
libraries.
Finally, recent work on hybrid parallelism, carried out at UNM, will
be presented. This work originates partly as a consequence of the SMP work,
and it is intended to provide a possible strategy for employing existing
message passing and SMP technologies.
Top
PERFORMANCE ENGINEERING SEMINAR SERIES, NCSA
PECM Performance Tools In-House Workshop
KAI's OpenMP Tools
Presenter: Bill Magro, KAI
Friday, Jan 15, 9:00AM - 12:00 noon
This is a reminder that we are going ahead with the tutorial
on KAI's OpenMP tools for Fortran on Friday, Jan 15, from 9:00 AM
to 12:00 noon, to be taught by Bill Magro of Kuck and Associates.
We have the Numerical Lab in 3514 Beckman reserved. The plan is for
Bill to describe the tools, and for attendees to be able to log in and
try them, hands on.
I have told Bill that he should assume a basic familiarity with OpenMP
(see the Fortran document in http://www.openmp.org), and focus on the
tools.
You may want to play with OpenMP on modi4 before this workshop, and
bring along a few simple codes.
Top
PERFORMANCE ENGINEERING SEMINAR SERIES, NCSA
Scal-Tool: Pinpointing and Quantifying Scalability Bottlenecks in
Single System Image Shared-Memory Multiprocessors
Yan Solihin, Vinh Lam, and Josep Torrellas
Department of Computer Science
University of Illinois at Urbana-Champaign
solihin,lam,torrella@cs.uiuc.edu
Thursday, May 20, 11:00 AM
4031 Beckman Institute
Abstract
Single system image (SSI) shared-memory multiprocessors provide an attractive
combination of cost-effective architectural design and, thanks to the
single-image, shared-memory abstraction, relative ease of programming. Unfortunately,
it is well known that tuning applications for scalable performance on these machines
is often a time-consuming effort.
While performance monitoring tools can help, they often present only low-level
information, lack integration, and are usually costly to run. In this talk, we outline
an empirical model that isolates and quantifies scalability bottlenecks in shared-memory
parallel applications running on SSI shared-memory machines, in a relatively inexpensive
and integrated manner. The scalability bottlenecks currently quantified include
insufficient caching space, load imbalance, and synchronization. The model uses as inputs
measurements from hardware event counters in the R10000 processors of the Origin 2000.
A major advantage of the model is that it is quite inexpensive to run: it only needs the
event counter values for the application running with a few different processor counts
and data set sizes.
Top
NCSA PERFORMANCE ENGINEERING SEMINAR SERIES
A Performance Comparison of Fortran 90 with MPI and OpenMP on the Origin 2000
Jay Hoeflinger
Senior Research Scientist
Center for Simulation of Advanced Rockets
University of Illinois at Urbana-Champaign
Tuesday, September 7, 1:30 PM
4169 Beckman Institute
Fortran 90 has become increasingly popular in the scientific community
as a language for writing application codes, because (among many other
things) it offers storage management facilities and more flexible data
structures than does Fortran 77. To make it even more useful, the two
major parallel programming paradigms have industry-standard Fortran 90
bindings. SPMD parallelism may be expressed through the use of MPI
calls, and shared memory parallelism may be expressed through OpenMP.
But the two paradigms cause certain Fortran 90 constructs to display
different performance characteristics.
This talk will compare MPI with OpenMP in a general way, then describe
an experiment run on the Origin 2000 in which an application using MPI
was re-implemented with OpenMP in such a way that the performance
effects of the paradigms on each Fortran 90 construct were comparable.
The performance differences will be described in detail, and
conclusions will be drawn about the trade-offs involved in writing
high-performance parallel programs using Fortran 90 on the Origin
2000.
Top
NCSA PERFORMANCE ENGINEERING SEMINAR SERIES
Linear Solver Technologies at NCSA
Faisal Saied
Performance Engineering and Computational Methods Group
National Computational Science Alliance
University of Illinois at Urbana-Champaign
Friday, October 15, 11:00 AM
5239 Beckman Institute
Abstract
The solution of linear systems of equations is an important part of
many large scale computational science applications. In this talk we
review the linear solver requirements of several important
applications at NCSA, and also the set of technologies, both
algorithmic and parallel software, available to attack these problems.
The linear solver technologies at NCSA include software from Alliance
ET team partners, academic and government lab researchers, commercial
software vendors, and application scientists, and include some of the
best high performance solutions available, ranging from dense and
sparse direct solvers, to preconditioned Krylov subspace solvers,
multigrid and domain decomposition. We highlight a few notable linear
solver projects, current and past, at NCSA. Finally, we outline some
proposals on how NCSA's support for linear solvers can be improved,
and taken to the next level.
Top
NCSA PERFORMANCE ENGINEERING SEMINAR SERIES
The SvPablo Performance Analysis and Visualization System
Dr. Luiz DeRose
Advanced Computing Technology Center (ACTC)
IBM Research
Yorktown, NY
Thursday, February 3, 11:00 AM
3269 Beckman Institute
Abstract
In this talk I will present SvPablo, a language and architecture
independent performance analysis and visualization system. At present,
SvPablo supports analysis of applications written in C, Fortran 77,
Fortran 90, and HPF on a variety of sequential and parallel
systems. In addition to capturing application data via software
instrumentation, SvPablo exploits hardware performance counters, in
order to capture the interaction of software and hardware. During
execution of the instrumented code, the SvPablo library captures data
and computes performance metrics on the execution dynamics of each
instrumented construct on each processor. Because only statistics,
rather than detailed event traces, are maintained, SvPablo can handle
measurements of programs that execute for hours or days on hundreds of
processors.
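The difference between keeping statistics and keeping event traces can be illustrated with a short Python sketch. It is not SvPablo's data format or interface. The class names `RunningStats` and `Profile` and the choice of count, total, minimum, and maximum as metrics are assumptions made for this example.

```python
from collections import defaultdict


class RunningStats:
    """Fixed-size summary of all durations recorded so far."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.minimum = float("inf")
        self.maximum = float("-inf")

    def record(self, duration):
        """Fold one measurement into the summary; nothing else is stored."""
        self.count += 1
        self.total += duration
        self.minimum = min(self.minimum, duration)
        self.maximum = max(self.maximum, duration)

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0


class Profile:
    """One RunningStats per (construct, processor) pair."""

    def __init__(self):
        self._stats = defaultdict(RunningStats)

    def record(self, construct, processor, duration):
        self._stats[(construct, processor)].record(duration)

    def summary(self, construct, processor):
        return self._stats[(construct, processor)]

    def __len__(self):
        return len(self._stats)


profile = Profile()
for _ in range(1000):
    # Hypothetical construct name and duration in seconds.
    profile.record("loop_10", 0, 0.5)
```

Storage grows with the number of instrumented constructs and processors, not with the number of events, which is why a long run does not exhaust memory the way a full trace would.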
Top
NCSA PERFORMANCE ENGINEERING SEMINAR SERIES
An Interactive Environment for the Rapid
Parallelisation of Fortran77 Mesh-Based Codes
Dr. Constantinos Ierotheou
Parallel Processing Research Group
University of Greenwich
London
Monday, February 28, 3:00 PM
2269 Beckman Institute
Abstract
In this presentation it will be shown that one viable route to
exploiting high performance parallel systems is based on a single
environment that can transform serial Fortran programs to a parallel
form. The parallel code generated is essentially the modified serial
code, plus message passing calls for distributed memory machines or
directives for shared memory machines. The effort of parallelising an
existing serial code includes:
- comprehension of the data flow through the code,
- defining and applying a partitioning strategy,
- code modifications using execution control masks for parallel execution,
- the addition of communication calls to ensure correct parallel
computation.
These tasks should largely be borne by the
parallelisation tools and not the code paralleliser. If this can be
achieved then the time taken to complete the parallelisation of the
original serial code is a fraction of that required to perform the
same task manually by the code paralleliser.
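The partitioning and masking steps listed above can be sketched for a single loop in Python. This is a hand-written illustration, not output of the tools discussed in this talk. The block partition and the names `block_range` and `masked_update` are assumptions, and the communication calls that a real distributed-memory version needs are left out.

```python
def block_range(n, nprocs, rank):
    """Half-open index range [low, high) owned by `rank` in a block partition."""
    base, extra = divmod(n, nprocs)
    low = rank * base + min(rank, extra)
    high = low + base + (1 if rank < extra else 0)
    return low, high


def masked_update(b, nprocs, rank):
    """The serial loop 'a[i] = 2 * b[i]' as one process would execute it.

    The loop bounds are left unchanged; an execution control mask restricts
    the assignment to the indices this process owns.
    """
    n = len(b)
    low, high = block_range(n, nprocs, rank)
    a = [None] * n
    for i in range(n):
        if low <= i < high:  # execution control mask
            a[i] = 2 * b[i]
    return a


# Emulate three processes and check that together they reproduce the serial result.
b = list(range(10))
serial = [2 * x for x in b]
parts = [masked_update(b, 3, r) for r in range(3)]
combined = [next(p[i] for p in parts if p[i] is not None) for i in range(10)]
```

In practice the masks are usually folded into tightened loop bounds, and each process allocates only its own block plus any overlap region that has to be communicated.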
One major research effort over the last decade has resulted in the
Computer Aided Parallelisation Tools (CAPTools). This interactive
parallelisation environment utilises technology developed at the
University of Greenwich and elsewhere. The environment includes the
tightly coupled implementation of the major stages described above as
well as actual parallel code generation. At the core of CAPTools is a
sophisticated dependence analyser. The dependence analysis is
performed by taking a global perspective of the original code;
consequently, the analysis is fully interprocedural and does not
require inlining of code.
Results will be shown for benchmark codes and industrial codes on
state of the art parallel hardware, parallelised using CAPTools. In
many instances where a comparison can be made, the code parallelised
using CAPTools is of comparable quality to that created through a
manual parallelisation, but the key advantage is that the effort
required to achieve such performance is orders of magnitude less than
the manual process.
Top
NCSA PERFORMANCE ENGINEERING SEMINAR SERIES
Retargetable Programming with The KeLP Infrastructure
Scott B. Baden
Department of Computer Science and Engineering
University of California, San Diego
http://www.cse.ucsd.edu/users/baden
Tuesday, April 25, 11:00 AM
5602 BI
Abstract
KeLP (Kernel Lattice Parallelism) is a framework for implementing
portable scientific applications on distributed memory parallel
computers. It is intended for applications with special needs,
in particular, that adapt to data-dependent or hardware dependent
conditions at run time. KeLP is currently used in full-scale
applications including subsurface modeling, turbulence studies,
and first principles simulation of real materials and supports
multiblock or adaptive mesh refinement techniques through a set of
geometric or structural abstractions.
These structural abstractions are meta data types that
represent various attributes of program execution, including: control
flow, data decomposition, and data dependencies. These structural
abstractions are architecture-neutral, and support the retargeting
of applications to new generations of system architecture. I'll
describe four different aspects of KeLP to illustrate this
capability: a unified notation for expressing hierarchical
parallelism on multi-tier, SMP based clustered architectures,
which overcomes common defects or omissions in MPI and its
implementations; automated optimization across heterogeneous
collections of multiprocessors; management of large data sets;
and program coupling.
Top
NCSA PERFORMANCE ENGINEERING SEMINAR SERIES
Parallel Programming through Web-based Tools
Rudi Eigenmann
Purdue University
School of Electrical and Computer Engineering
Wednesday, May 10, 1:30 PM
4169 Beckman Institute
Abstract
At Purdue University we are developing a parallel programming
laboratory that is available to all users via standard Web browsers
(http://punch.ecn.purdue.edu/Netcare/parHub.html). Users can get
accounts, login, and use laboratory tools much the same way this would
be done on an ordinary computer system. However the account resides
"on the Web" and access to a particular machine is transparent for the
user. The parallel programming lab is built in a project called NETCARE
(NETwork computing in Computer Architecture Research and Education).
In this talk I will describe the motivation for the design of the Web
Parallel Programming Lab and the population of the lab with a growing
number of tools, such as the Polaris/OpenMP compiler, the UrsaMinor
interactive performance tuning environment, and a range of related
tools that facilitate the design and performance optimization of
parallel programs. I will also briefly describe the underlying
infrastructure, called PUNCH (Purdue University Network Computing
Hubs), which is a large and widely used system for online computing.
Top
NCSA PERFORMANCE ENGINEERING SEMINAR SERIES
"Analyzing MPI Applications With Vampir"
Werner Krotz-Vogel
PALLAS GmbH
Thursday, Sept 14, 1:30PM
2269 Beckman Institute
Abstract
Vampir is a leader in MPI performance analysis tools. This seminar will
show how it does performance analysis of advanced message passing
features such as MPI collective operations, optional MPI-IO, and Cray shmem.
Using Vampir 2.5 it becomes easy to:
. Understand the application behavior,
. Evaluate load balancing,
. Analyze the performance of subroutines or code blocks,
. Learn about communication patterns, parameters and performance,
. Identify communication hotspots.
Vampir does this with a variety of graphical displays of the application's
runtime behavior:
. Detailed timeline view of events and communication,
. Statistical analysis of program execution,
. Statistical analysis of communication operations,
. System snapshot and animation, and
. Dynamic calling tree profiling.
Top
NCSA PERFORMANCE ENGINEERING SEMINAR
Irregular Applications in Janus
Jens Gerlach and Uwe Der
{jens,uweder}@first.gmd.de
Nov 15, 2000, 1:30 PM
5602 BI
Janus is a conceptual framework and a C++ template library
for irregular and regular scientific applications.
It provides (potentially distributed) data structures to represent
spatial structures and numerical data of these applications.
Janus is implemented as a thin layer on top of the C++ Standard Template
Library and uses MPI as its default parallel platform.
It can also be configured to operate in a sequential mode.
A port to OpenMP is in progress.
Janus rests on the observation that essentially two kinds of objects
occur in scientific applications.
The first kind are referred to as "spatial structures", such as
(rectangular) grids, meshes, or graphs.
The other kind of objects are simulation data that are associated with
the spatial structures. Typical examples are grid functions, element
matrices on a finite element mesh, or (sparse) matrices.
Important for the implementation of Janus components is that the
spatial structures are considered prior to, and more stable than, the
data associated with them.
The conceptual framework of Janus is designed using the paradigm
of generic programming that has been successfully applied in the
C++ standard library and other libraries such as the Matrix Template
Library (MTL) or the POOMA framework.
In order to describe spatial structures, Janus has the concepts
Domain and Relation.
A key point here is the so-called "two phase domains/relations", whose
access requirements fit the usage patterns of the spatial structures
of complex scientific applications.
Simulation data are described by the concept Domain Function.
The Janus components, i.e., the template domain, relation and array
classes are models of these concepts.
In our talk, we point out why the Janus abstractions are particularly
well suited for a unified description and efficient implementation of
irregular and regular applications such as finite element and finite
difference methods, including structured and unstructured mesh refinements.
In particular we discuss the parallel implementation of a two-dimensional
finite element method that includes automatic repartitioning of
the adaptively refined mesh.
Janus has been developed and is maintained by the Real World Computing
Partnership, Japan at its distributed laboratory at GMD-FIRST, the German
National Research Center for Information Technology in Berlin.
For more information about Janus visit:
http://www.first.gmd.de/promise.
Top
NCSA PERFORMANCE ENGINEERING SEMINAR
HPCView: Tool for Application-Oriented Performance Tuning
Rob Fowler
Center for High Performance Software Research
Rice University
March 28, 2001, 10:00 PM
4269 BI
Abstract
Application performance tuning is a complex problem. Existing performance
tools do not adequately support this process in one or more dimensions.
In particular, tuning requires assembling information of different kinds
from diverse sources and correlating that information with program text
to pinpoint the causes of performance bottlenecks. We discuss some of
the critical utility and usability issues for application-level performance
analysis tools in the context of performance tools we built to support our
own work on data layout and optimizing compilers.
The main focus will be on HPCView, a language- and architecture-independent
tool that combines data from a wide range of instrumentation sources and
correlates it with program source code. The tool can also derive synthetic
performance metrics from measured data. The results are assembled with the
corresponding source code into hierarchical views and they are saved as an
HTML database that can be analyzed portably and collaboratively using a
commodity browser. In addition to daily use within our group, HPCView and
MHSim, a memory hierarchy simulator with similar properties, are being used
successfully by several code development teams in DoD and DoE laboratories.
HPCView is available on SGI Origins at NCSA. After the formal talk, we will
hold a tutorial that will cover additional examples and some of the "nuts and
bolts" aspects of working with HPCView. Dr. Fowler will also be available to
work with interested groups to start working with HPCView.