Workshop on Parallel Programming for Resilience and Energy Efficiency -- PP4REE
(to be held as part of Principles and Practice of Parallel Programming -- PPoPP 2016)
March 12-16, 2016
Nowadays, the number of components in High Performance Computing (HPC) systems increases at the pace
dictated by Moore's Law, but the mean time between failures (MTBF) for the complete system is significantly shrinking.
For example, when accounting for the instruction & data caches and register files, the mean time
between soft errors for the Sequoia supercomputer at Lawrence Livermore National Laboratory is estimated to be 1.5 days.
As HPC systems move into the Exascale era, the number of system components will increase by up to
three orders of magnitude, and MTBF will deteriorate further, making resilience a fundamental
challenge. This scenario renders current system-level approaches to resilience, such as coordinated checkpointing, infeasible,
and motivates the use of algorithmic, programming model, or runtime system approaches to improve
the resilience of parallel applications at scale.
While a resilience crisis is looming in the HPC domain, the end of Dennard scaling (i.e., the ability to shrink the feature size of integrated circuits while
maintaining a constant power density) has turned energy consumption into a primary design concern, on par with performance,
for which holistic solutions are currently necessary, from the hardware to the application software.
The Green500 ranking, based on the LINPACK benchmark, shows remarkable improvements in the MFLOPS/W (millions of floating-point
arithmetic operations per Joule)
of recent HPC facilities. However, with the cost of 1 MW being close to $1 million per year, any improvement in this
metric will surely have an enormous positive impact on the deployment of future Exascale systems. Despite a flurry of research
in recent years on techniques that improve the energy-efficiency of HPC systems via software intervention, energy remains invisible to existing
parallel programming models used in production settings.
The quest for higher energy-efficiency in future HPC systems is inherently connected to the quest for enhanced resilience for two reasons:
First, resilience techniques have a non-trivial energy cost. Second, ongoing efforts to further improve the energy-efficiency of hardware
at the device level (such as operating hardware below its nominal margins or replacing DDR technology with non-volatile memory technologies)
may compromise hardware reliability.
The purpose of this workshop is to explore the space of techniques for improving the resilience and energy-efficiency (REE) of parallel programs at the
algorithmic and language levels. We are particularly interested in papers that present cross-cutting techniques that trade off energy-efficiency against resilience.
We solicit original papers that include but are not limited to the following topics:
* Programming languages, interfaces, and general software techniques for REE.
* Scheduling and mapping for REE.
* Runtime systems for REE.
* Algorithmic techniques for REE.
* Programming models for computing paradigms that improve REE, such as near-threshold computing, approximate computing, or neuromorphic computing.
* Applications and successful case studies.
Papers should not exceed ten single-spaced, double-column pages (including figures, tables, and references) using a 10-point font on 8.5x11-inch pages.
We suggest using the IEEE two-column template for conference proceedings. Submissions will be judged on correctness, originality, technical
strength, significance, presentation quality, and appropriateness. Submitted papers should not have appeared in or be under consideration for another
venue. A full peer-review process will be followed, with each paper reviewed by at least three members of the program committee. Submissions will be made through EasyChair.
Submission of Papers: November 23, 2015.
Notification of Acceptance: January 5, 2016.
Workshop: March 12-16, 2016 (half day).
* Christos D. Antonopoulos, Electrical and Computer Engineering Department of the University of Thessaly, Greece.
* Dimitrios S. Nikolopoulos, EEECS, Queen's University of Belfast, Northern Ireland, United Kingdom.
* Oscar Plata, Department of Computer Architecture at the University of Malaga, Spain.
* Enrique S. Quintana-Orti, Department of Computer Engineering & Sciences, Universidad Jaume I of Castellon, Spain.
To be confirmed.
Extended versions of the best papers will appear, after an additional review process, in a special issue of the Elsevier journal Parallel Computing.