All meetings will be held in the NCSA building—1205 W. Clark St., Urbana, Illinois—unless otherwise noted.

Thursday, July 20

8:00 am
8:45 am
Opening Remarks
Franck Cappello, Argonne National Laboratory
9:00 am
PyCOMPSs: Programming Distributed Computing with Sequential Python
Rosa M Badia, BSC
Python has been adopted by a large community of application developers, including in HPC. However, Python does not natively support parallelization: its most popular interpreter prevents Python threads from running in parallel. While several Python modules support parallelism (multiprocessing, Parallel Python), PyCOMPSs is the first approach to support a task-based programming model in which data dependences between tasks are inferred at runtime. PyCOMPSs has a simple and non-intrusive syntax based on decorators to annotate tasks in the code, and a tiny API for synchronization. PyCOMPSs has been integrated with the Jupyter notebook, a web application very popular among Python programmers. PyCOMPSs relies on a powerful runtime that builds the task graph, makes all scheduling and data-transfer decisions, and supports the execution of the codes on distributed parallel platforms. It is also integrated with persistent storage technologies.
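The decorator-based style can be illustrated with a toy stand-in. The names below (`task`, `ToyRuntime`) are invented for illustration only; the real PyCOMPSs API provides `@task` and `compss_wait_on` in the `pycompss` package, and its runtime executes tasks asynchronously on distributed resources rather than eagerly in-process as here.

```python
# Toy sketch of PyCOMPSs-style task annotation with runtime
# dependence inference. Dependences are inferred here by tracking
# which task produced each object (by identity); the real PyCOMPSs
# runtime analyzes task parameters instead.
from collections import defaultdict

class ToyRuntime:
    """Records submitted tasks and infers data dependences."""
    def __init__(self):
        self.tasks = []               # list of (task_id, function name)
        self.deps = defaultdict(set)  # task_id -> predecessor task ids
        self.last_writer = {}         # id(obj) -> task that produced it

    def submit(self, fn, args):
        tid = len(self.tasks)
        self.tasks.append((tid, fn.__name__))
        for a in args:                # argument produced by an earlier task?
            w = self.last_writer.get(id(a))
            if w is not None:
                self.deps[tid].add(w)
        result = fn(*args)            # toy: run eagerly and sequentially
        self.last_writer[id(result)] = tid
        return result

runtime = ToyRuntime()

def task(fn):
    """Invented stand-in for the PyCOMPSs @task decorator."""
    def wrapper(*args):
        return runtime.submit(fn, args)
    return wrapper

@task
def increment(block):
    return [x + 1 for x in block]

@task
def total(block):
    return sum(block)

data = [1, 2, 3]
out = increment(data)   # task 0: no predecessors
s = total(out)          # task 1: consumes task 0's output -> depends on it
```

Note that the user code reads as ordinary sequential Python; the task graph (here, task 1 after task 0) falls out of the data flow.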
10:30 am
11:00 am
StarPU-MPI: Achieving Scalable Performance on Clusters with the Sequential Task Flow (STF) Programming Model
Samuel Thibault, INRIA
Task-based programming models are gaining momentum in the HPC field: various runtime systems and standards under development use this paradigm to express unstructured parallelism, balance computational load on heterogeneous systems, etc. The StarPU runtime system is one of the leaders in the field: it performs advanced task scheduling for accelerators, optimizes data transfers, leverages persistent storage, supports distributed execution, and provides simulated execution.

StarPU advocates the use of the now-common Sequential Task Flow (STF) programming model, in which dependencies between tasks are inferred automatically, thus making the expression of parallel algorithms very simple and safe, while enabling all features of StarPU.

This programming model also allows for very scalable performance on large clusters, by making a completely decentralized execution possible.

This tutorial will present the StarPU STF programming interface and features, and how completely decentralized execution can be achieved.
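To make the STF idea concrete, here is a minimal Python sketch of how dependences can be inferred from data handles and declared access modes as tasks are submitted in sequential program order. The class and names are invented for illustration; StarPU's actual C API uses calls such as `starpu_task_insert` with `STARPU_R`/`STARPU_W`/`STARPU_RW` access flags on data handles.

```python
# Sketch of Sequential Task Flow (STF) dependence inference: tasks are
# submitted in program order with (handle, mode) accesses, and the
# runtime derives a DAG. A reader depends on the last writer of each
# handle; a writer depends on the readers since the last write.
# (Toy model: it tracks read-after-write and write-after-read edges.)

class STFGraph:
    def __init__(self):
        self.n = 0
        self.edges = set()          # (predecessor, successor) pairs
        self.last_writer = {}       # handle -> task id of last writer
        self.readers = {}           # handle -> readers since last write

    def submit(self, name, accesses):
        tid = self.n
        self.n += 1
        for handle, mode in accesses:
            if "R" in mode and handle in self.last_writer:
                self.edges.add((self.last_writer[handle], tid))
            if "W" in mode:
                for r in self.readers.get(handle, []):
                    self.edges.add((r, tid))
                self.last_writer[handle] = tid
                self.readers[handle] = []
            if mode == "R":
                self.readers.setdefault(handle, []).append(tid)
        return tid

g = STFGraph()
t0 = g.submit("init_A",  [("A", "W")])
t1 = g.submit("init_B",  [("B", "W")])
t2 = g.submit("gemm",    [("A", "R"), ("B", "R"), ("C", "W")])
t3 = g.submit("scale_C", [("C", "RW")])
```

Since every process can replay the same sequential submission loop and derive the same graph, no central scheduler is required, which is the basis of the decentralized execution mentioned above.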
12:30 pm
2:00 pm
PaRSEC: A Distributed Task-Based Programming Paradigm for HPC
George Bosilca, University of Tennessee
Current hardware is capable of delivering a tremendous amount of computational power if enough parallelism is available. This computational power is, however, challenging to harness, as current programming models struggle with algorithmic load imbalance or uneven hardware settings. To fully utilize the available computational power, an appropriate software infrastructure is needed, one in which the developers and the runtime collaborate to expose, orchestrate, and exploit the maximal degree of parallelism in algorithms. This talk surveys state-of-the-art task-based programming paradigms and highlights the changes they bring to parallel programming. We then cover in more detail the PaRSEC programming environment and two of its domain-specific extensions, providing a solid foundation for more complex usage scenarios. Participants will learn how separating the algorithms, the data involved, and the available computational resources can bring unexpected levels of performance even to legacy applications. In particular, we will discuss how this model applies to linear algebra, dense and sparse, and what its impact can be in the short term for many large-scale scientific codes.

Content Level: Beginner 50%, Intermediate 35%, Advanced 15%

3:30 pm
4:00 pm
Task-Based Programming with OpenMP (and OmpSs)
Xavier Martorell, BSC
In this talk, we present the main OpenMP and OmpSs features related to task-based programming. OpenMP is the standard parallel programming model developed to support multicore architectures. It is based on compiler directives that are understood by C, C++, and Fortran compilers. Since OpenMP 3.0, it has supported tasking: an OpenMP task is a portion of code that can be executed independently of other parts of the code. Since OpenMP 4.0, it has supported task dependences, which allow the programmer to specify the data that tasks are going to use and let the runtime system determine a proper task execution order. Within OmpSs, we include additional task-dependence types that give the runtime system more flexibility to schedule tasks. The talk will include examples of code annotations with several benchmarks, such as matrix multiplication, n-body, and Cholesky.
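OpenMP itself targets C, C++, and Fortran (in C, a task writing `x` is annotated `#pragma omp task depend(out: x)` and a consumer `#pragma omp task depend(in: x)`). As a rough analogy, the sketch below (all names invented) shows how a runtime can derive a legal execution order, and which tasks may run concurrently, from declared in/out dependences. It models only flow (read-after-write) dependences, ignoring the anti- and output dependences real OpenMP also handles.

```python
# Toy analogy of OpenMP 4.0 task dependences: each task lists the
# variables it reads (in) and produces (out); the "runtime" groups
# tasks into waves such that each wave only depends on earlier waves.
from collections import defaultdict

def schedule(tasks):
    """tasks: list of (name, ins, outs) in program order.
    Returns task names grouped into waves that could run concurrently."""
    producer = {}                   # variable -> index of its last producer
    preds = defaultdict(set)        # task index -> predecessor indices
    for i, (_, ins, outs) in enumerate(tasks):
        for v in ins:               # read-after-write dependence
            if v in producer:
                preds[i].add(producer[v])
        for v in outs:
            producer[v] = i
    level = {}                      # longest dependence chain ending here
    for i in range(len(tasks)):
        level[i] = 1 + max((level[p] for p in preds[i]), default=0)
    waves = defaultdict(list)
    for i, (name, _, _) in enumerate(tasks):
        waves[level[i]].append(name)
    return [waves[l] for l in sorted(waves)]

waves = schedule([
    ("fill_a", [],         ["a"]),
    ("fill_b", [],         ["b"]),
    ("add",    ["a", "b"], ["c"]),
])
```

Here `fill_a` and `fill_b` land in the same wave (no dependence between them), while `add` must wait for both, exactly the ordering an OpenMP runtime would enforce from the `depend` clauses.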
5:30 pm
Dinner (invitation only)

Friday, July 21

8:00 am
Registration/Continental breakfast
8:30 am
Programming Models for Accelerators
Volodymyr Kindratenko, University of Illinois at Urbana-Champaign
The CUDA, OpenCL, and, more recently, OpenACC programming models have been successfully used to implement complex scientific codes on large-scale HPC systems with GPU accelerators, such as Blue Waters. This presentation will give an overview of programming models for GPU-based HPC systems, focusing on best coding practices, code portability, and performance.
10:00 am
10:30 am
Sanjay Kale, University of Illinois at Urbana-Champaign
Modern supercomputers present several challenges to effectively programming parallel applications: exposing concurrency, optimizing data movement, controlling load imbalance, addressing heterogeneity, handling variations in application behavior, tolerating system failures, etc. By leveraging Charm++, application developers have been able to successfully address these challenges and efficiently run their code on large supercomputers. The foundational concepts underlying Charm++ are overdecomposition, asynchrony, migratability, and adaptivity. A Charm++ program specifies collections of interacting objects, which are assigned to processors dynamically by the runtime system.

Charm++ provides an asynchronous, message-driven, task-based programming model with migratable objects and an adaptive runtime system that controls execution. It automates communication overlap, load balancing, fault tolerance, checkpointing for split execution, power management, and MPI interoperation, and it promotes modularity.

This talk will begin with an explanation of the foundational concepts, followed by an exposition of the syntax and specific capabilities, with several simple example programs as well as real-world application case studies. We will illustrate how users can write Charm++ programs in C++ that run efficiently across a range of supercomputers and scales.
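Two of the foundational concepts, overdecomposition and migratability, can be sketched in a few lines. This is an invented toy, not the Charm++ API (real Charm++ programs are C++ chare classes with an adaptive runtime): the program creates many more objects than processors, and the runtime is free to place and migrate them to balance load.

```python
# Toy sketch of overdecomposition + load balancing: 8 migratable
# objects ("chares") are mapped onto 2 processors by a greedy
# heaviest-first assignment onto the least-loaded processor.
# (Invented names; real Charm++ measures load at runtime and
# migrates chares adaptively.)

class Chare:
    def __init__(self, work):
        self.work = work            # abstract load of this object

def greedy_balance(chares, nprocs):
    """Assign chares, heaviest first, to the least-loaded processor."""
    procs = [[] for _ in range(nprocs)]
    loads = [0] * nprocs
    for c in sorted(chares, key=lambda c: c.work, reverse=True):
        p = loads.index(min(loads))   # least-loaded processor so far
        procs[p].append(c)
        loads[p] += c.work
    return procs, loads

# 8 chares overdecomposed onto 2 processors
chares = [Chare(w) for w in (5, 3, 8, 1, 2, 7, 4, 6)]
procs, loads = greedy_balance(chares, 2)
```

Because there are many small objects rather than one monolithic partition per processor, the runtime has the freedom to even out the load (here both processors end up with identical load), and the same freedom enables migration-based fault tolerance and power management.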
1:30 pm
PGAS Programming Models and Coarray
Hitoshi Murai, RIKEN
The partitioned global address space (PGAS) model has emerged as a programming model for large-scale clusters. Coarrays, first standardized in Fortran 2008, are among the best-defined and most popular PGAS features, and are supported by major vendors in commercial compilers. RIKEN AICS has been working on implementations of coarrays in Fortran and C as part of the PGAS language XcalableMP. In this tutorial, the basic ideas of the PGAS model will be presented with some examples. In particular, coarrays will be explained in detail.
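The core coarray idea can be sketched in Python (the class below is invented for illustration; in real Fortran 2008, `real :: a[*]` declares a coarray with one copy per image, and `a[2]` accesses image 2's copy via a cosubscript):

```python
# Sketch of coarray/PGAS semantics: the address space is partitioned,
# with one local copy of the variable per "image" (process), yet any
# image can read or write another image's copy by naming it explicitly.
class Coarray:
    def __init__(self, num_images, init=0):
        self.copies = [init] * num_images   # one partition per image

    def get(self, image):
        return self.copies[image]           # remote read:  a[image]

    def put(self, image, value):
        self.copies[image] = value          # remote write: a[image] = value

# Four images; each image writes its own copy, then image 0 gathers.
a = Coarray(num_images=4)
for img in range(4):
    a.put(img, img * 10)
gathered = [a.get(img) for img in range(4)]
```

The key PGAS property shown here is that remote accesses look like ordinary indexed accesses rather than explicit message passing; real coarray implementations additionally provide synchronization (e.g. `sync all`) to order such accesses across images.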
3:00 pm
3:30 pm
HPC Component Models
Christian Perez, INRIA
Software component models emphasize the separation of concerns between the coding of building blocks and their composition into an application. This simplifies code reuse and replacement (for example, testing a new implementation or algorithm) and eases application deployment, since the application's structure is known.

Starting from the traditional software-component point of view, this talk will cover the challenges of expressing HPC-oriented patterns in such models so as to obtain both high performance and good software-engineering properties. Several models and examples will be presented to illustrate the challenges raised by patterns such as parallel method invocations, collective operations, data sharing, and task dependences, as well as by higher-level features such as hierarchy and genericity.
5:00 pm