ASCR Monthly Computing News Report - February 2009
In this issue...
LLNL, ORNL Researchers More Than Double the Performance of CAM
Under the auspices of a Scientific Application Partnership affiliated with the Scientific Discovery through Advanced Computing (SciDAC) project “A Scalable and Extensible Earth System Model for Climate Change Science,” researchers Pat Worley of Oak Ridge National Laboratory (ORNL) and Art Mirin of Lawrence Livermore National Laboratory (LLNL) have more than doubled the performance of the Community Atmosphere Model (CAM) on the Cray XT4/5. The gains come from a combination of adding parallelism, enabling different sections of CAM to execute at their own process count, implementing improved communication protocols that are particularly relevant at scale, and removing other scalability bottlenecks. CAM is the atmospheric component of the Community Climate System Model (CCSM), which will be used to run simulations for the upcoming Intergovernmental Panel on Climate Change (IPCC) fifth assessment.
PNNL Applies Multithreaded Research to Study of Power Grid Failures
Understanding electric grid failures is a major focus of Pacific Northwest National Laboratory’s (PNNL’s) multithreaded research on the Cray XMT machine. Daniel Chavarria, Senior Computer Scientist, and Henry Huang, Senior Research Engineer, are using the Cray XMT machine for advanced power grid contingency analysis. Contingency analysis is a key function for assessing the impacts of various combinations of power system component failures. This application takes advantage of the Cray XMT’s hybrid architecture of Threadstorm and Opteron nodes. Threadstorm nodes are used to perform contingency selection, while Opteron nodes perform the floating point computation of the actual contingency analysis. The number of combinatorial contingencies generally exceeds the capability of existing computational power; therefore, it is critical to select credible contingency cases within the constraint of available computer resources. A contingency selection method has been developed by applying graph theory to power grid topology. This method identifies low-impact components, whose failure is of little importance to power grid stability. Removing them from the analysis reduces the combinatorial number of contingency cases. This method has been implemented on the Cray XMT machine, taking advantage of the graph processing capability of the Cray XMT’s Threadstorm nodes and its programming features. Early results have shown superior performance. Further work will focus on the communication between Threadstorm nodes and Opteron nodes and on a visualization method for the post-processing of contingency analysis results.
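To give a sense of why pruning low-impact components matters, the following is a minimal counting sketch, not PNNL's selection method. The function name and the component counts (1,000 components, 400 of them judged low-impact, up to three simultaneous failures) are hypothetical numbers chosen only for illustration.

```python
from math import comb


def contingency_cases(n_components, max_failures):
    """Count the ways 1..max_failures components can fail simultaneously."""
    return sum(comb(n_components, k) for k in range(1, max_failures + 1))


# Hypothetical grid: 1,000 components, of which 400 are judged low-impact.
full = contingency_cases(1000, 3)
pruned = contingency_cases(1000 - 400, 3)

print(f"all components:     {full:,} cases")
print(f"after pruning:      {pruned:,} cases")
print(f"fraction remaining: {pruned / full:.1%}")
```

Because the count grows roughly with the cube of the number of components when up to three failures are considered, removing 40 percent of the components in this made-up example leaves only about a fifth of the cases to analyze.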
Changing Paradigm of Data-Intensive Computing Makes Journal Cover
A PNNL-authored article by Richard Kouzes, Gordon Anderson, Stephen Elbert, Ian Gorton, and Deb Gracio on the “Changing Paradigm of Data-Intensive Computing” was selected as the cover feature for the January IEEE Computer Society journal. The article describes PNNL’s leading-edge research to develop new classes of software, algorithms and hardware that provide timely and meaningful analytical results from an exponentially growing volume of complex scientific, energy, environmental, and national security related data. Technical leadership in data-intensive computing is a cornerstone of PNNL’s strategy for growing major new national security programs in information analytics and decision support.
LANL ASCR Researchers Presenting their Results
Konstantin Lipnikov and Daniil Svyatskiy, researchers from Los Alamos National Laboratory (LANL), presented advances in mimetic discretizations at the second international conference on Finite Element Methods in Engineering and Science (FEMTEC 2009). The meeting was held at the Granlibakken conference center at Lake Tahoe on January 5–9, 2009. It was organized jointly by the University of Texas at El Paso (UTEP) and the University of Nevada at Reno (UNR), and its goal is to advance the frontiers in performance and reliability of finite element methods, as well as in their advanced applications in computational engineering and science. Lipnikov discussed the general approach for constructing a family of discretizations on unstructured polygonal and polyhedral meshes for different types of partial differential equations, such as diffusion and Stokes equations. Svyatskiy presented a new constrained finite element method for anisotropic diffusion equations, which guarantees that the numerical solution satisfies the maximum principle. The maximum principle is one of the fundamental properties of elliptic equations and is very difficult to satisfy on a discrete level.
This work was done as part of ASCR Applied Mathematics Research Project “Mimetic Finite Difference Methods for Partial Differential Equations.”
PNNL Multithreaded Research on String Matching Accepted for IPDPS
A paper showcasing PNNL’s state-of-the-art multithreaded research has been accepted for publication at the 2009 International Parallel and Distributed Processing Symposium in Rome, Italy. The paper by PNNL researchers Oreste Villa and Daniel Chavarria-Miranda and Cray’s Kristyn Maschhoff focuses on the use of the Cray XMT multithreaded system for high-throughput, high-sustained performance string matching for cybersecurity applications. String searching is at the core of many security and network applications such as search engines, intrusion detection systems, virus scanners and spam filters. PNNL researchers are developing a software-based implementation of the Aho-Corasick string searching algorithm on the Cray XMT multithreaded shared memory machine. Their solution relies on the particular features of the XMT architecture and on several algorithmic strategies: it is fast, scalable and its performance is virtually content independent.
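For readers unfamiliar with the algorithm, the following is a minimal single-threaded sketch of Aho-Corasick matching. It is not the PNNL/Cray XMT implementation and does not use any XMT features; the function names and the sample patterns are assumptions made for illustration. It shows why scanning cost depends mainly on input length rather than on content: each input character drives one goto transition, plus failure-link hops whose total is bounded by the text length.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure, and output tables for a set of patterns."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link for each state
    out = [[]]    # patterns recognized on entering each state

    # Phase 1: insert every pattern into a trie.
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)

    # Phase 2: breadth-first pass that sets failure links level by level.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(text, patterns):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches


if __name__ == "__main__":
    print(search("ushers", ["he", "she", "his", "hers"]))
```

On the text "ushers" with the patterns he, she, his, and hers, the sketch reports she at offset 1 and both he and hers at offset 2, all found in a single pass over the input.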
NERSC’s Kathy Yelick Appointed to the California Council on Science, Technology
Kathy Yelick, Director of the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory (LBNL), has been appointed as a council member to the California Council on Science and Technology (CCST), a nonpartisan, non-profit organization that offers expert advice on science and technology related issues to state government. The CCST Council is an assembly of corporate CEOs, academicians, scientists and scholars, bringing these experts together with those who make policy, to utilize science and technology for the economic and social well-being of California. Currently, 78 members and fellows of CCST are also members of the National Academies, five are Nobel laureates, eight are National Medal of Science recipients, and two are recipients of the National Medal of Technology. Yelick is a professor of computer science at the University of California, Berkeley, in addition to heading NERSC.
LLNL’s Mark Seager Honored by Federal Computer Week
Mark Seager, LLNL assistant department head for Advanced Technologies, has been selected by Federal Computer Week magazine as one of this year’s “Federal 100” top executives from government, industry and academia who had the greatest impact on government information systems in 2008. Seager was selected because of the “difference you made in the way agencies, companies and government officials develop, acquire, manage and use information technology.” The nomination was submitted by industry collaborators for Seager’s leadership of the Hyperion Project, a collaboration with 10 industry leaders to advance next-generation Linux high-performance computing clusters.
Hyperion was announced at SC08 and brings together Dell, Intel, Supermicro, QLogic, Mellanox, DDN, Sun, LSI and Red Hat to create a large-scale testbed for high-performance computing technologies critical to the DOE mission and to industry’s ability to make petaflop/s computing and storage more accessible for commerce, industry, and research and development. The first half of Hyperion is now online and being used by the collaboration. When completed in March 2009, the Hyperion cluster will have at least 1,152 nodes with 9,216 cores, a peak of about 100 teraflop/s, more than 9 TB of memory, an InfiniBand 4x DDR interconnect, and access to more than 4 GB/s of RAID disk bandwidth. This system is the largest testbed of its kind in the world and will provide the Hyperion collaborators with an unmatched opportunity to develop and test hardware and software technologies at unprecedented scale.
LBNL’s John Bell Joins Study on Combustion Research Cyberinfrastructure
John Bell, head of the Center for Computational Sciences and Engineering at LBNL, has been invited to serve on a National Academy study being run through the Board on Mathematical Sciences and Their Applications. Bell will serve as a member of the Committee on Building Cyberinfrastructure for Combustion Research. The committee’s first meeting is scheduled for March 9–10, 2009, in Washington, D.C.
PNNL’s Todd Halter Appointed to SciDB Advisory Board
Todd Halter, a computer scientist at PNNL, has been appointed to the SciDB advisory board. SciDB is an open source data management system designed to support the needs of scientists across a variety of disciplines. As a member of the advisory board, Halter will be helping to establish the requirements for a new database technology specifically targeting scientific applications. His appointment began in November 2008. SciDB grew out of the first Extremely Large Databases (XLDB) Workshop and the subsequent Science-Database Workshop. SciDB is designed to meet the growing demands of data-intensive scientific analytics in the public and private sectors.
Intrepid in Full Production at the ALCF
Intrepid, a 40-rack IBM Blue Gene/P system at the Argonne Leadership Computing Facility (ALCF), went into full production on February 2, five months ahead of schedule. Intrepid has 40,960 quad-core compute nodes (163,840 cores) and 80 terabytes of memory. Its peak performance is 557 teraflops. By bringing Intrepid into production, the ALCF nearly doubled the days of production computing available to the DOE Office of Science, INCITE awardees, and Argonne projects.
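The figures quoted for Intrepid can be cross-checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the function name is hypothetical, the inputs are the numbers reported above, and the per-node memory figure assumes binary units (1 TB = 1,024 GB), which the report does not specify.

```python
def intrepid_summary(racks=40, nodes=40_960, cores_per_node=4,
                     memory_tb=80, peak_tflops=557):
    """Derive per-rack, per-node, and per-core figures from the totals."""
    cores = nodes * cores_per_node
    return {
        "cores": cores,
        "nodes_per_rack": nodes // racks,
        # Assumes 1 TB = 1,024 GB.
        "memory_gb_per_node": memory_tb * 1024 / nodes,
        "peak_gflops_per_core": peak_tflops * 1000 / cores,
    }


for name, value in intrepid_summary().items():
    print(f"{name}: {value}")
```

Under these assumptions the totals are self-consistent: 40,960 quad-core nodes give the stated 163,840 cores, and the quoted peak works out to roughly 3.4 gigaflops per core.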
20 Petaflop Sequoia System Procurement Announced by NNSA
The National Nuclear Security Administration (NNSA) recently announced a contract with IBM for a multi-petaflop computer to be delivered to LLNL with the capabilities needed to resolve the time-urgent and complex scientific problems involved in ensuring the safety and reliability of the national nuclear deterrent. IBM will deliver two systems: Sequoia, a 20-petaflop (quadrillion floating-point operations per second) system based on future BlueGene technology, to be delivered starting in 2011 and deployed in 2012; and an initial delivery system called Dawn, a 500-teraflop (trillion floating-point operations per second) BlueGene/P system, scheduled for delivery in the first quarter of 2009. Dawn will lay the applications foundation for multi-petaflops computing on Sequoia. With a speed of 20 petaflops, Sequoia is expected to be the most powerful supercomputer in the world and will be more than 10 times faster than today’s most powerful system.
Supercomputing at ORNL Seeks Energy Savings
Recent energy-saving innovations at ORNL are setting a new standard for resource-responsible high performance computing research. The laboratory’s Computational Sciences Building (CSB) was among the first Leadership in Energy and Environmental Design (LEED)-certified computing facilities in the country. Furthermore, a newly introduced cooling system for the Cray XT Jaguar, dubbed ECOphlex, complements the CSB’s efficiency. Using a common refrigerant and a series of heat exchangers, ECOphlex efficiently removes the heat generated by Jaguar to keep the computer room cool. Another important innovation is one that ORNL has been working on with Cray for several years. Instead of using the more common 208-volt power supply that Jaguar used in the past, the system now runs directly on 480-volt power. This seemingly minor change is saving the laboratory $1 million in the cost of copper used in the power cords for the cabinets.
Finally, ORNL gets a little help from history. The power grid for the city of Oak Ridge was designed when the research conducted during the Manhattan Project used one-seventh of all the electricity in the country. The grid was constructed with every protection possible out of the fear that any interruption in supply would drastically set back development. The result: an extremely resilient local power grid. While all of these steps are important, taken together they are greater than the sum of their parts. Whereas most centers use 0.8 watts of power for cooling per every watt of power used for computing, ORNL’s Leadership Computing Facility (OLCF) enjoys a far more efficient ratio of 0.3 to 1, one of the lowest of all data centers measured.
High School Student Uses NewYorkBlue to Become Intel Science Finalist
Christine Lee Shrock, a high school student at Ward Melville High School in Setauket, New York, was selected as a finalist in the 2009 Intel Science Talent Search. Her project was a molecular dynamics study of the binding of two proteins, MDM2 and the tumor suppressor protein p53. The simulations were carried out on NewYorkBlue, an IBM Blue Gene/L and Blue Gene/P complex located at Brookhaven National Laboratory (BNL) and operated by the New York Center for Computational Sciences (NYCCS). NYCCS is a joint effort of BNL and Stony Brook University and is funded by the State of New York. The simulations were supervised by Professor Carlos Simmerling in the Chemistry Department at Stony Brook and used the molecular dynamics program AMBER. AMBER is also being used at BNL to study proteins important to cellulose decomposition. NewYorkBlue is also being used by BNL researchers for finite temperature lattice quantum chromodynamics, climate modeling, nanoscience, and fluid dynamics. Web Link: www.bnl.gov/newyorkblue
NERSC’s HPSS Upgrades Streamline Password Process for Users
Researchers storing scientific data on NERSC’s High Performance Storage System (HPSS) can now instantly access or manage their HPSS account from an internet browser anywhere in the world, 24 hours a day, seven days a week. This kind of on-demand access to massive archival datasets is unprecedented, and is only now possible because of a series of custom software upgrades to HPSS at NERSC, which make the system compatible with NERSC’s Information Management (NIM) system.
Jason Hick, head of NERSC’s Mass Storage Group, noted that the new software upgrade replaces an ad hoc authentication system where users were required to apply for multiple passwords to use NERSC systems: a NIM password to track their activity on supercomputers like Franklin and Bassi, and a separate password to retrieve archival data stored on HPSS. With the upgrade, HPSS is now integrated with NIM to enable users to manage their HPSS accounts and passwords along with every other system at NERSC.
In addition to making this system compatible with NIM, members of the Mass Storage Group also added a real-time monitoring capability that allows the center’s administrative staff to more easily detect and solve internal HPSS problems. Besides the new features, the HPSS v6.2 system that is now in production has already proven to require less regular maintenance.
LLNL Researcher Deploys Orbit Classification Software for Fusion Scientists
Chandrika Kamath of LLNL deployed the first version of “orbit classification in Poincare plots” software for use by researchers at the Princeton Plasma Physics Laboratory (PPPL) to automatically evaluate particle orbits in tokamaks. An orbit is a set of points, with each point generated by the intersection of a particle with a poloidal plane as it goes around the tokamak. Depending on the initial starting point of the particle, different shapes are traced out in the poloidal plane. In the problem considered in this study, there are four different types of orbits: quasi-periodic, separatrix, island chain, and stochastic. The goal is to assign one of these four labels to an orbit, given the points which comprise the orbit.
Techniques from scientific data mining were used to extract features describing each orbit. These features were then used in decision tree classifiers to assign a label to the orbit. For the data sets provided, the classification error is currently under 4 percent. This software will allow the automatic summarization of plots composed of multiple orbits. This will enable comparison of simulations with experiments as well as the design of experiments; it will also help improve our understanding of the fusion simulations. The software is currently being evaluated by Josh Breslau (PPPL) and has piqued the interest of Linda Sugiyama at MIT.
ORNL Begins Relationship with Universities through Online HPC Classes
Oak Ridge National Laboratory (ORNL) is offering a high performance scientific computing class to four historically black colleges and universities. “This opportunity allows leveraging previously untapped student talent with advanced research capabilities,” said Robert Franklin, president of Morehouse College in Atlanta, Georgia. “It also prepares the students for expanded career opportunities in the future. We have confidence that the course will give rise to many new ideas and other initiatives that we can’t even dream of at this time.” Also attending the classes are students and faculty from Knoxville College in Knoxville, Tennessee; Claflin University in Orangeburg, South Carolina; and Jackson State University in Jackson, Mississippi. The survey course, featuring about 25 students and faculty from the four colleges, will meet twice a week via satellite at ORNL. There they will learn the basics of using the systems, parallel programming, model design, and visualization.
NERSC’s Kathy Yelick Shares Views on Multicore, Manycore Challenges
As part of the Computer Science Distinguished Lecture Series sponsored by the University of British Columbia, NERSC Director Kathy Yelick gave a talk on “Programming Models for Manycore” at the Vancouver, Canada, campus on Feb. 12. Yelick is also the co-author with Rajesh Nishtala, a UC Berkeley grad student, of “Optimizing Collective Communication on Multicores,” a paper to be presented at HotPar 2009, the first USENIX Workshop on Hot Topics in Parallelism. HotPar 2009 will be held March 30–31 in Berkeley, Calif. The workshop will bring together researchers and practitioners doing innovative work in the area of parallel computing. HotPar recognizes the broad impact of multicore computing and seeks relevant contributions from all fields, including application design, languages and compilers, systems, and architecture.
ALCF “Getting Started” Workshop Benefits INCITE Researchers
Researchers representing eight INCITE projects benefited from a February 10–11 INCITE Getting Started Workshop held by the Argonne Leadership Computing Facility (ALCF) at Argonne National Laboratory. Scheduled each year to occur shortly after the DOE INCITE awards are announced, the workshop is specifically designed to give new projects a “jump start” by providing them with assisted hands-on training, plus a complete overview of resources available to them through the ALCF.
LBNL’s Esmond Ng Gives Two Invited Talks
Esmond Ng, leader of the Scientific Computing Group at LBNL, gave two invited talks in February. On Wednesday, Feb. 18, Ng discussed “A Hybrid Approach for Solving Sparse Linear Least Squares Problems” as part of the Linear Algebra and Optimization Seminar series at Stanford University. The following week, he gave a talk on “The Role of Applied Math and Computer Science in Large-Scale Scientific Simulations” at a Feb. 23–25 nuclear physics workshop (the Third LACM-EFES-JUSTIPEN Workshop), organized by the Joint Institute for Heavy Ion Research at ORNL.
ALCF to Host March Workshop for Non-INCITE Project Users
The Argonne Leadership Computing Facility is hosting an Introduction to BG/P Workshop on March 10–11 at Argonne National Laboratory. The focus of this workshop is to provide non-INCITE projects with an overview of ALCF services and resources, technical details on the ALCF Blue Gene/P architecture, as well as hands-on assistance in porting and tuning users’ applications onto the Blue Gene/P. The workshop is an excellent opportunity to interact with the ALCF staff and learn more about the BG/P machine, as well as data analytics and visualization capabilities at the ALCF.
ORNL Hosts Lustre Workshop
The Oak Ridge Leadership Computing Facility (OLCF), Sun Microsystems, and Cray, Inc., recently held a two-day workshop on Lustre scalability at ORNL. Participants from all the major Lustre sites, including Sandia National Laboratories, Lawrence Livermore National Laboratory, NASA, and Pacific Northwest National Laboratory, met to identify key scalability issues and develop a realistic roadmap for Lustre by 2012. Some of the issues addressed were the changes that need to be made to the architecture of the petaflop file system and where Sun Microsystems, who developed Lustre, will need to allocate resources to meet some of those scalability challenges. The workshop, held February 10–11, set the stage for a follow-up Lustre Scalability Summit to be held in April, which will focus on scalability challenges through 2014. The summit will be held in conjunction with the 2009 Lustre User Group meeting.
SciDAC Outreach Center Hosts HDF5 Workshop at NERSC
The SciDAC Outreach Center hosted a two-day workshop in January to familiarize high performance computer users with HDF5, a unique technology suite that makes possible the management of extremely large and complex data collections. The workshop, which included a day of hands-on HDF5 coding and tuning, was aimed at both improving individual applications and developing a prioritized list of features and use cases which the HDF Group should focus on.
ORNL Announces Users Meeting
The Oak Ridge Leadership Computing Facility (OLCF) Users Meeting will be held April 16, 2009, at ORNL. During the meeting, principal investigators and members of their research teams will meet with OLCF staff and vendors to discuss challenges and solutions in areas such as porting and scaling of applications on the XT system. Each project from the Department of Energy’s INCITE program with an OLCF allocation will be invited to give a 20-minute presentation on its recent and/or upcoming computational work. OLCF staff will also give presentations highlighting the complete range of resources, capabilities and services available to the center’s users. This year’s meeting will be preceded by a “Climbing to Petaflop on Cray XT” workshop April 14–16 (with parallel sessions on April 16). The registration site with complete agenda can be found at this link