Unlocking On-Package Memory’s Effects on High-Performance Computing’s Scientific Kernels

Intuitive visual analytical model better explains complex architectural scenarios and offers general design principles.


Probability density of achievable performance (GFlop/s) for 1024 samples with different tiling and problem sizes. With eDRAM (embedded dynamic random-access memory), the curve as a whole shifts to the upper right, meaning that more samples reach near-peak (for example, 90 percent of peak) performance. In other words, eDRAM increases the chance that less-optimized applications reach "vendor-claimed" performance. However, the right boundary moves only slightly, indicating that eDRAM cannot significantly raise raw peak performance.
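The caption's two observations (more samples near peak, but an almost unchanged right boundary) correspond to two simple statistics: the fraction of samples at or above 90 percent of peak, and the best sample. The sketch below computes both in Python. The peak value and both sample lists are invented for illustration and are not the study's measurements.

```python
# Hypothetical illustration: PEAK and both sample lists are invented,
# not measurements from the study.

def fraction_near_peak(samples, peak, threshold=0.9):
    """Fraction of samples achieving at least threshold * peak GFlop/s."""
    hits = sum(1 for s in samples if s >= threshold * peak)
    return hits / len(samples)

def summarize(label, samples, peak):
    """Print and return (near-peak fraction, best sample)."""
    frac = fraction_near_peak(samples, peak)
    best = max(samples)  # the "right boundary" of the distribution
    print(f"{label}: {frac:.0%} of samples >= 90% of peak, best = {best} GFlop/s")
    return frac, best

PEAK = 100.0  # hypothetical peak performance, GFlop/s

without_edram = [40.0, 55.0, 62.0, 70.0, 78.0, 85.0, 88.0, 91.0, 93.0, 95.0]
with_edram = [52.0, 66.0, 74.0, 81.0, 88.0, 91.0, 92.0, 94.0, 95.0, 96.0]

summarize("without eDRAM", without_edram, PEAK)
# prints: without eDRAM: 30% of samples >= 90% of peak, best = 95.0 GFlop/s
summarize("with eDRAM", with_edram, PEAK)
# prints: with eDRAM: 50% of samples >= 90% of peak, best = 96.0 GFlop/s
```

With these made-up numbers, the near-peak fraction rises from 30 to 50 percent while the best sample moves only from 95.0 to 96.0 GFlop/s, the same pattern the caption describes for the study's 1024 measured samples.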

The Science

High-bandwidth memory can improve a computer's performance, and on-package memory (OPM) is a popular option in many commercial systems. Before this effort, however, little was known about OPM's implications for speed and power use. The team experimentally characterized and analyzed modern OPMs and provided guidelines for tuning them to speed up high-performance computing (HPC) applications.

The Impact

This study of OPMs lays groundwork for advancing computing systems. For example, it motivates exploration of software-architecture co-design, provides data for validating models and simulations, and has resulted in general optimization guidelines. The work shows how to tune applications and architectures for the best performance on platforms with specific OPMs.


The researchers conducted a thorough experimental evaluation to determine how modern OPMs affect the performance and power efficiency of important HPC scientific kernels, the computational routines at the core of many scientific applications. They examined the different OPM tuning modes and how those modes influence application tuning for the best system performance. The team, from Pacific Northwest National Laboratory, the University of Copenhagen, and Virginia Tech, evaluated diverse HPC kernels on two Intel™ OPMs, eDRAM on the multicore Broadwell processor and MCDRAM on the manycore Knights Landing processor, using a large set of representative input matrices (for example, 968 matrices for the sparse kernels). This study allowed the team to derive an intuitive visual analytical model that better explains complex architectural scenarios and to provide general guidelines for future architecture optimization and efficiency tuning.
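A common way to reason about why extra memory bandwidth helps some kernels yet barely moves raw peak is the roofline model, in which attainable performance is the minimum of peak compute throughput and arithmetic intensity (flops per byte) times memory bandwidth. The sketch below is this generic textbook model, swapped in for illustration; it is not necessarily the visual analytical model the authors derived, and the peak and bandwidth values are hypothetical placeholders rather than Broadwell or Knights Landing specifications.

```python
# Classic roofline bound: a generic stand-in, not the paper's model.
# All machine parameters below are hypothetical.

def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Return min(peak compute, arithmetic intensity * memory bandwidth).

    intensity:     flops performed per byte moved from memory
    peak_gflops:   peak compute throughput, GFlop/s
    bandwidth_gbs: sustained memory bandwidth, GB/s
    """
    return min(peak_gflops, intensity * bandwidth_gbs)

PEAK = 400.0    # hypothetical peak compute, GFlop/s
DDR_BW = 50.0   # hypothetical off-package DRAM bandwidth, GB/s
OPM_BW = 100.0  # hypothetical on-package memory bandwidth, GB/s

# Low, medium, and high arithmetic intensity (flops/byte).
for ai in (0.25, 2.0, 16.0):
    base = attainable_gflops(ai, PEAK, DDR_BW)
    opm = attainable_gflops(ai, PEAK, OPM_BW)
    print(f"AI={ai:>5}: DDR {base:6.1f} GFlop/s, OPM {opm:6.1f} GFlop/s")
```

With these placeholder numbers, the low-intensity case doubles from 12.5 to 25.0 GFlop/s, while the high-intensity case stays capped at 400.0 GFlop/s with or without OPM. Memory-bound kernels such as sparse matrix-vector multiplication typically sit at low intensity. The bound ignores latency, capacity limits, and the OPM tuning modes the study examined, so it explains the trend only qualitatively.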


Ang Li
Pacific Northwest National Laboratory

Shuaiwen Leon Song
Pacific Northwest National Laboratory


This work was supported by the Department of Energy, Office of Science, Office of Advanced Scientific Computing Research as part of the Center for Advanced Technology Evaluation (CENATE) project, with additional support from the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie "TICOH" project and the National Science Foundation Exploiting Parallelism and Scalability (XPS) program.


A. Li, W. Liu, M.R.B. Kristensen, B. Vinter, H. Wang, K. Hou, A. Marquez, and S.L. Song, "Exploring and analyzing the real impact of modern on-package memory on HPC scientific kernels." In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Denver, CO, 12-17 November, article 26, 14 pages (2017). [DOI: 10.1145/3126908.3126931]

Related Links

Pacific Northwest National Laboratory's Center for Advanced Technology Evaluation

Highlight Categories

Program: ASCR

Performer/Facility: University, DOE Laboratory

Additional: Technology Impact, Collaborations, Non-DOE Interagency Collaboration, International Collaboration

Last modified: 3/20/2018 10:00:48 AM