- Keynote Address: On the path to Exascale: Deploying an Emerging HPC Architecture
- Keynote Address: Finding Concurrency Bugs at Scale
- Activities in the ASC Working Group on Exascale Tools
- Oak Ridge Leadership Computing Facility Tool Needs Perspective
Abstract: Recent reports from various organizations have identified multiple challenges on the road to Exascale computing systems. These challenges include the unrelenting issues of performance, scalability, and productivity in the face of ever-increasing complexity; they also include the relatively new priorities of energy efficiency and resiliency. Not coincidentally, recently announced HPC architectures, such as RoadRunner, Tianhe, Tsubame, Titan, Dash, and Nebulae, illustrate that emerging technologies, such as graphics processors and non-volatile memory, can provide innovative solutions to address these challenges. Our Keeneland project is deploying a GPU-based system for the NSF user community. Early experiences on these systems have demonstrated performance and power benefits; however, these systems also present multiple challenges: low programmer productivity, poor portability, and fragile performance stability. Taken together, these issues are impeding the adoption of these innovative architectures by the broader community. We can lower these barriers by ensuring existing tools work on these systems, and by developing novel tools that make users more productive.
Bio: Jeffrey Vetter is a Distinguished R&D Staff Member and Group Leader at Oak Ridge National Laboratory (ORNL), and a Joint Professor of Computer Science at the Georgia Institute of Technology (GT). Jeff is also the Project Director for the NSF Track 2D Experimental Computing Facility, named Keeneland, for large-scale heterogeneous computing using graphics processors, and the Director of the NVIDIA CUDA Center of Excellence. He earned his Ph.D. in Computer Science from the Georgia Institute of Technology, and his current research explores emerging architectures for HPC.
Full bio at http://ft.ornl.gov/~vetter
Abstract: In this talk we will describe practical, low overhead methods for finding concurrency bugs in HPC applications at scale.
We have developed a precise data race detection technique for distributed memory parallel programs. Our technique, which we call Active Testing, builds on our previous work on race detection for shared memory Java and C programs, and it handles programs written using shared memory approaches as well as bulk communication. Active Testing works in two phases: in the first phase, it performs an imprecise dynamic analysis of an execution of the program and finds potential data races that could happen if the program is executed with a different thread schedule. In the second phase, Active Testing re-executes the program while actively controlling the thread schedule so that the data races reported in the first phase can be confirmed. A key highlight of our technique is that it can scalably handle distributed programs with bulk communication and single- and split-phase barriers. Another key feature of our technique is that it is precise: a data race confirmed by Active Testing is an actual data race present in the program; however, being a testing approach, our technique can miss actual data races. We have implemented the framework for the UPC programming language and demonstrated scalability up to thousands of cores for programs with both fine-grained and bulk (MPI-style) communication. The tool confirmed previously known bugs and uncovered several unknown ones. Our extensions capture constructs proposed in several modern programming languages for High Performance Computing, most notably non-blocking barriers and collectives.
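The two-phase structure described above can be illustrated with a minimal sketch. This is not the authors' tool or API: the trace representation, function names, and the degenerate "replay" check are all hypothetical, stand-in simplifications of what a real implementation does at runtime.

```python
def phase1_potential_races(program):
    """Imprecise phase: flag pairs of accesses to the same variable from
    different threads, at least one a write, with no common lock held."""
    events = [(tid, i, op, var, locks)
              for tid, evs in program.items()
              for i, (op, var, locks) in enumerate(evs)]
    races = []
    for a in range(len(events)):
        for b in range(a + 1, len(events)):
            t1, i1, op1, v1, l1 = events[a]
            t2, i2, op2, v2, l2 = events[b]
            if (t1 != t2 and v1 == v2
                    and 'w' in (op1, op2)          # at least one write
                    and not set(l1) & set(l2)):    # no common lock
                races.append(((t1, i1), (t2, i2)))
    return races

def phase2_confirm(program, candidate):
    """Controlled-replay phase, radically simplified: a real tool would
    re-execute and pause one thread just before its flagged access; here
    we only re-check that the two accesses still conflict."""
    (t1, i1), (t2, i2) = candidate
    op1, var1, _ = program[t1][i1]
    op2, var2, _ = program[t2][i2]
    return var1 == var2 and 'w' in (op1, op2)

# Each thread maps to a list of (operation, variable, locks held) events.
program = {
    0: [('r', 'x', ['L']), ('w', 'y', [])],
    1: [('w', 'x', []),    ('w', 'y', [])],
}
potential = phase1_potential_races(program)
confirmed = [c for c in potential if phase2_confirm(program, c)]
```

On this toy trace, phase 1 flags both the `x` and `y` conflicts and phase 2 confirms them; in the real technique, phase 1 may report false positives that phase 2 fails to reproduce under any controlled schedule.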
In addition, we will describe ideas to bring the best and most universal debugging tool to scale: printf statements. While printf is indispensable for serial debugging, parallel printf suffers from content explosion and a lack of correlation between threads. We propose to replace printf with a concurrent printf that prints output only when some interesting concurrent state is reached. The interesting states could be discovered automatically using dynamic program analysis or specified by developers using parallel assertions (our programming construct).
In general, we would like to provide programming constructs that enable developers to print useful information and control thread schedules automatically or programmatically. We will also present techniques that simplify debugging: for example, approaches that demonstrate, using a few threads and a few context switches, a complicated concurrency bug that potentially involves thousands of threads.
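The concurrent printf idea above can be sketched in a few lines. Everything here is illustrative, not the proposed construct itself: `cprintf`, the predicate, and the `in_critical_section` snapshot are hypothetical stand-ins for the concurrent state a real tool would observe at runtime.

```python
def cprintf(predicate, message, snapshot):
    """Print only when the predicate over the concurrent state holds;
    return whether anything was printed."""
    if predicate(snapshot):
        print(message % (sorted(snapshot),))
        return True
    return False

# Hypothetical "interesting" state: two or more threads inside the same
# critical section at once (a mutual-exclusion violation).
in_critical_section = {'t0', 't3'}
fired = cprintf(lambda s: len(s) >= 2,
                "violation: threads %s are in the critical section together",
                in_critical_section)
quiet = cprintf(lambda s: len(s) >= 2,
                "violation: threads %s are in the critical section together",
                {'t0'})
```

The point of the design is that output volume is governed by how often the interesting state arises, not by how many threads run, which is what makes the construct viable at scale.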
Bio: Koushik Sen is an assistant professor in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. His research interests lie in Software Engineering, Programming Languages, and Formal Methods. He is interested in developing software tools and methodologies that improve programmer productivity and software quality. He is best known for his work on directed automated random testing and concolic testing. He received an NSF CAREER Award in 2008, a Haifa Verification Conference (HVC) Award in 2009, an IFIP TC2 Manfred Paul Award for Excellence in Software: Theory and Practice in 2010, and a Sloan Foundation Fellowship in 2011. He has won three ACM SIGSOFT Distinguished Paper Awards. He received the C.L. and Jane W-S. Liu Award in 2004, the C. W. Gear Outstanding Graduate Award in 2005, and the David J. Kuck Outstanding Ph.D. Thesis Award in 2007 from the UIUC Department of Computer Science. He holds a B.Tech. from the Indian Institute of Technology, Kanpur, and an M.S. and Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign.
Presenter: Martin Schulz
Abstract: The ASC working group on exascale tools first got together during an ASC planning meeting in June 2010 and, at that time, created a first analysis of gaps and challenges for development environment tools. Through a series of workshops, first within ASC and later in collaboration with representatives of ASCR activities, the scope of the working group as well as its analysis has been refined and extended, leading to a document describing gaps, challenges, interdependencies, and co-design opportunities across the entire system stack. This talk will summarize these findings as well as present suggested PathForward activities in this area.
Bio: Martin is a Computer Scientist at the Center for Applied Scientific Computing (CASC) at Lawrence Livermore National Laboratory (LLNL). He earned his Doctorate in Computer Science in 2001 from the Technische Universität München (Munich, Germany). He also holds a Master of Science in Computer Science from the University of Illinois at Urbana-Champaign. After completing his graduate studies and a postdoctoral appointment in Munich, he worked for two years as a Research Associate at Cornell University before joining LLNL in 2004.
Martin's research interests include parallel and distributed architectures and applications; performance monitoring, modeling, and analysis; memory system optimization; parallel programming paradigms; tool support for parallel programming; power efficiency for parallel systems; optimizing parallel and distributed I/O; and fault tolerance at the application and system level. In his position at LLNL he focuses especially on scalability for parallel applications, code correctness tools, and parallel performance analysis, as well as scalable tool infrastructures to support these efforts.
Martin is a member of LLNL's ASC CSSE ADEPT (Application Development Environment and Performance Team) and he works closely with colleagues in CASC's Computer Science Group (CSG) and in the Development Environment Group (DEG). He is also the PI for the ASC/CCE project on Open|SpeedShop and the LLNL PI for the OASCR PetascaleTools project on "Building a Community Tool Infrastructure around Open|SpeedShop".
Presenter: John Mellor-Crummey
Abstract: The Exascale Software Center was conceived as a collaboration between the DOE national laboratories and academia to build a complete software stack for exascale systems. While long-term funding for that effort ultimately did not materialize, it is worth looking back at the center's study and planning with respect to the area of performance and correctness tools for exascale systems. This talk aims to convey what members of the ESC tools team deemed to be key challenges and approaches important for software tools at the exascale.
Bio: John Mellor-Crummey received the B.S.E. degree magna cum laude in electrical engineering and computer science from Princeton University in 1984, and the M.S. (1986) and Ph.D. (1989) degrees in computer science from the University of Rochester. In 1989, he joined Rice University where he holds the rank of Professor in both the Department of Computer Science and the Department of Electrical and Computer Engineering. His research focuses on compilers, tools, and run-time libraries for multicore processors and scalable parallel systems.
Presenter: David Skinner
Abstract: A brief survey of NERSC's users and applications and how performance data is collected and used. With just over 4000 users, NERSC's workload spans a great diversity of science, algorithms, user expertise, and interest in performance tools. Science teams computing at NERSC are required to produce performance data for their applications as part of their annual allocation request. Given the large number of projects and codes, a wealth of performance data has been collected. A movement toward large-scale pervasive monitoring of HPC workloads has been a useful derivative of that requirement.
Getting many eyes on application and workload performance serves all the stakeholders in the HPC community. Widespread performance monitoring is a diagnostic tool and a first step toward using other performance tools to dig deeper into opportunities to improve user experience and HPC center efficiency.
Bio: David Skinner earned his Ph.D. in theoretical chemistry from UC Berkeley, where his research focused on quantum and semi-classical approaches to reaction dynamics and kinetics. David has been the lead technical advisor to the initial INCITE projects and has provided support to chemistry research at NERSC. He currently leads the SciDAC Outreach Center, which provides information and services that support SciDAC's outreach, training, and research objectives. David's publications while at NERSC have focused on the performance analysis of computing architectures and HPC applications. David is an author of the Integrated Performance Monitoring (IPM) framework.
Presenter: Richard Graham
Abstract: The programming environment (PE) on any HPC computer system is a key component of an effective system. This talk will describe the approach used to provide a usable and sustainable PE for the Oak Ridge Leadership Computing Facility (OLCF), how this approach is being applied in the context of the OLCF's Titan project, and long-term requirements.
Bio: Richard Graham is a Distinguished Member of the Research Staff in the Mathematics and Computer Science division at Oak Ridge National Laboratory. He is also the group leader for the Application Performance Tools group. He is working in the area of Scalable Tools and MPI, and is currently chairing the MPI Forum.
Before moving to ORNL he was the acting group leader of the Advanced Computing Laboratory at Los Alamos National Laboratory. He joined LANL's Advanced Computing Laboratory (ACL) as a technical staff member in 1999, started the LA-MPI project as team leader for the Resilient Technologies Team, and is one of the founders of the Open MPI project. Prior to joining the ACL, he spent seven years working for Cray Research and SGI.
Rich obtained his PhD in Theoretical Chemistry from Texas A&M University in 1990 and did post-doctoral work at the James Franck Institute of the University of Chicago. His BS in chemistry was from Seattle Pacific University.
Presenter: Scott Parker
Abstract: This talk will discuss the current state of performance tools on the Argonne Leadership Computing Facility's Blue Gene/P system, future plans for tools on the ALCF's upcoming 10-petaflop Blue Gene/Q system, and issues related to tools on future exascale systems.
Bio: Scott Parker earned his Ph.D. in computational fluid dynamics from the University of Illinois at Urbana-Champaign, where his research focused on the flow stability of bluff body wakes. Scott joined the Argonne Leadership Computing Facility in 2008 where he works on computational fluid dynamics codes, performance analysis, and performance tools.
Presenter: David Montoya
Abstract: This talk will provide a candid view of the user communities for exascale tools within the NNSA, the challenges they face, and their current points of involvement. The user communities include system developers/support analysts, system integrators/developers, and system administrators/support personnel.
Bio: David manages the Production Infrastructure team within the Los Alamos National Laboratory HPC Division. This includes applied tool design, development, and support for performance analysis tools such as Open|SpeedShop, tool frameworks (CBTF), and Open MPI, as well as HPC production support of network infrastructures and parallel file systems. His interests include integrated HPC environment performance analysis and health monitoring, core infrastructure capability development, and cross-organization collaboration development. He holds an MBA in Management Information Systems from the University of Rochester, NY.