June 2013

Supercomputing on a Budget

The optimization of commercial hardware and specialized software enables cost-effective supercomputing.

Image courtesy of Jefferson Lab

Jefferson Lab's K20 supercomputer was listed on the TOP500 list of fastest computers in June 2013. This single-rack system houses two supercomputers: one built with NVIDIA K20 GPUs and one built with Intel Xeon Phis.

The Science

Computer scientists and nuclear physicists have designed powerful, low-cost, teraflop-class computer systems based on commercial processors and high-speed connections that are up to ten times more cost effective than conventional systems. Further performance gains are realized with the optimization of the computational software, which has also improved the performance of more powerful systems with similar architectures.

The Impact

Jefferson Lab’s supercomputers enable scientists to calculate how the building blocks of visible matter, quarks and gluons, combine to make protons, neutrons and other particles. In partnership with industry, Jefferson Lab is continuing to optimize the software to fully exploit supercomputer performance for nuclear physics applications as well as other topics.


Nuclear physicists are building compute clusters that offer supercomputer-style performance without the supercomputer-style price tag. With software designed at Jefferson Lab and partner institutions, computer scientists and physicists are using NVIDIA K20 GPUs (graphics processing units), Intel Xeon Phi accelerator cards and high-speed network connections of up to 56 gigabits per second to create computing systems that are up to ten times more cost effective than conventional systems. The newest clusters, deployed in late 2012 with four cards per server and more than 1 teraflop of double-precision performance per card, expand the range of applications that can be tackled. Jefferson Lab is now running a total of 776 compute accelerators in 220 servers, with a performance of over 150 teraflops on key science kernels.

In benchmark calculations, the new GPUs show a marked improvement in performance over un-accelerated systems. The benchmarks use Chroma, a framework for theory calculations developed at Jefferson Lab, along with QUDA, a library of highly optimized routines developed primarily by NVIDIA and a larger developer community that includes Jefferson Lab. Since the first NVIDIA GPUs were deployed at Jefferson Lab in 2009, code developers at the laboratory and NVIDIA have worked in partnership to ensure that Chroma and QUDA harness their full capabilities for science. Following optimization, Chroma and QUDA ran three times faster on last-generation GPUs in performance tests; when the code was tested on the newer NVIDIA K20 GPUs, it ran a total of six times faster. This success in creating software suitable for heterogeneous computing has the potential to produce benefits in other areas of scientific computing, from biology to oceanography.

Initial code for the Xeon Phi systems, developed in close collaboration with Intel Parallel Labs, also shows excellent performance of up to 300 gigaflops on a single device in single precision, indicating great potential for this new architecture.
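
As a rough cross-check of the figures quoted above, the short Python sketch below works out the average sustained performance per accelerator, the average number of accelerators per server, and the share of the six-fold speedup that would be attributable to the newer hardware. It is only an illustration: the function names are hypothetical, the "over 150 teraflops" figure is treated as a lower bound, and the hardware share assumes that the three-fold and six-fold speedups are measured against the same baseline and combine multiplicatively, which the text does not state explicitly.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the summary.
# All function names are illustrative; "over" values are lower bounds.

ACCELERATORS = 776          # compute accelerators in service
SERVERS = 220               # servers hosting them
AGGREGATE_TFLOPS = 150.0    # "over 150 teraflops on key science kernels"
SOFTWARE_SPEEDUP = 3.0      # optimized code on last-generation GPUs
TOTAL_SPEEDUP = 6.0         # optimized code on K20 GPUs


def sustained_gflops_per_accelerator(total_tflops, accelerators):
    """Average sustained gigaflops per accelerator (1 teraflop = 1000 gigaflops)."""
    return total_tflops * 1000.0 / accelerators


def mean_accelerators_per_server(accelerators, servers):
    """Average number of accelerators installed in each server."""
    return accelerators / servers


def implied_hardware_speedup(total_speedup, software_speedup):
    """Hardware share of the speedup, ASSUMING a common baseline and
    multiplicative composition of software and hardware gains."""
    return total_speedup / software_speedup


if __name__ == "__main__":
    per_card = sustained_gflops_per_accelerator(AGGREGATE_TFLOPS, ACCELERATORS)
    per_server = mean_accelerators_per_server(ACCELERATORS, SERVERS)
    hardware = implied_hardware_speedup(TOTAL_SPEEDUP, SOFTWARE_SPEEDUP)
    print(f"Sustained per accelerator: at least {per_card:.1f} gigaflops")
    print(f"Accelerators per server (mean): {per_server:.2f}")
    print(f"Implied hardware speedup: {hardware:.1f}x")
```

Under these assumptions, the installed base sustains a little under 200 gigaflops per accelerator on average, and the move to K20 GPUs would account for roughly a factor of two on top of the software optimization.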


Chip Watson
Thomas Jefferson National Accelerator Facility


Jefferson Lab's supercomputer program is funded through the National Computational Infrastructure for Lattice Gauge Theory project, while the development of the software infrastructure is funded by the Scientific Discovery through Advanced Computing (SciDAC) program; both are in the Department of Energy's (DOE's) Office of Science. Additional funding for hardware has been provided by the American Recovery and Reinvestment Act (ARRA). Supercomputer time is allocated based on the recommendations of the United States Lattice Gauge Theory Computational Program (USQCD), a consortium of top lattice quantum chromodynamics (LQCD) theorists in the United States that spans both high-energy physics and nuclear physics. The USQCD consortium developed the QUDA code library that is used with the Chroma code.


TOP500 semi-annual list of the top computer sites, June 2013

B. Joo, D. D. Kalamkar, K. Vaidyanathan, M. Smelyanskiy, K. Pamnany, V. W. Lee, P. Dubey and W. Watson III, "Lattice QCD on Intel(R) Xeon Phi(tm) Coprocessors." Lecture Notes in Computer Science 7905, 40 (2013). [DOI: 10.1007/978-3-642-38750-0]

R. Babich, M. A. Clark, B. Joo, G. Shi, R. C. Brower and S. Gottlieb, "Scaling lattice QCD beyond 100 GPUs." SC'11 Proceedings of the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, Article No. 70 (2011). [DOI: 10.1145/2063384.2063478]

R. Babich, M. A. Clark and B. Joo, "Parallelizing the QUDA Library for Multi-GPU Calculations in Lattice Quantum Chromodynamics." SC'10, Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis 1 (2010). [DOI: 10.1109/SC.2010.40]

B. Joo and M. A. Clark, "Lattice QCD on GPU clusters, using the QUDA library and the Chroma software system." International Journal of High Performance Computing Applications 26(4), 386 (2012). [DOI: 10.1177/1094342011429695]

Related Links

http://www.top500.org/lists/2013/06/

https://www.jlab.org/news/ontarget/news/ontarget/target-december-2012#accelerates

https://www.jlab.org/news/releases/jlab-cluster-tops-100-teraflops

https://www.jlab.org/news/ontarget/target-june-2010

Highlight Categories

Program: ASCR, HEP, NP

Performer/Facility: DOE Laboratory, Industry, SC User Facilities, ASCR User Facilities, OLCF, NP User Facilities, CEBAF

Additional: Technology Impact

Last modified: 1/27/2014 9:44:52 AM