369 Tflop/s molecular dynamics simulations on the petaflop hybrid supercomputer 'Roadrunner'
TC Germann and K Kadau and S Swaminarayan, CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 21, 2143-2159 (2009).
We describe the implementation of a short-range parallel molecular dynamics (MD) code, SPaSM, on the heterogeneous general-purpose Roadrunner supercomputer. Each Roadrunner 'TriBlade' compute node consists of two AMD Opteron dual-core microprocessors and four IBM PowerXCell 8i enhanced Cell microprocessors (each consisting of one PPU and eight SPU cores), so that there are four MPI ranks per node, each with one Opteron core and one Cell. We briefly describe the Roadrunner architecture and some of the initial hybrid programming approaches that have been taken, focusing on the SPaSM application as a case study. An initial 'evolutionary' port, in which the existing legacy code runs with minor modifications on the Opterons and the Cells are used only to compute interatomic forces, achieves roughly a 2x speedup over the unaccelerated code. In contrast, our 'revolutionary' implementation adopts a Cell-centric view, with data structures optimized for, and living on, the Cells. The Opterons are mainly used to direct inter-rank communication and to perform I/O-heavy periodic analysis, visualization, and checkpointing tasks. The performance measured for our initial implementation of a standard Lennard-Jones pair potential benchmark reached a peak of 369 Tflop/s double-precision floating-point performance on the full Roadrunner system (27.7% of peak), nearly 10x faster than the unaccelerated (Opteron-only) version. Copyright (C) 2009 John Wiley & Sons, Ltd.
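For readers unfamiliar with the benchmark named in the abstract, the following is a minimal Python sketch of a truncated Lennard-Jones pair interaction. It is not taken from SPaSM. The function names, the reduced units (epsilon = sigma = 1), and the 2.5 sigma cutoff are assumptions chosen for illustration, and the all-pairs loop stands in for the spatial decomposition a production short-range MD code would use.

```python
import math


def lj_pair(r, epsilon=1.0, sigma=1.0, r_cut=2.5):
    """Energy and radial force for one pair at separation r.

    Assumed form: U(r) = 4*epsilon*((sigma/r)**12 - (sigma/r)**6),
    truncated (not shifted) at r_cut*sigma. A positive force is repulsive.
    """
    if r >= r_cut * sigma:
        return 0.0, 0.0  # beyond the cutoff the pair does not interact
    sr6 = (sigma / r) ** 6
    sr12 = sr6 * sr6
    energy = 4.0 * epsilon * (sr12 - sr6)
    force = 24.0 * epsilon * (2.0 * sr12 - sr6) / r  # F = -dU/dr
    return energy, force


def total_energy(positions, **params):
    """Sum pair energies over all unique pairs (illustrative O(N^2) loop)."""
    energy = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            r = math.dist(positions[i], positions[j])
            energy += lj_pair(r, **params)[0]
    return energy
```

In this sketch the pair energy reaches its minimum of -epsilon at r = 2^(1/6) sigma, where the force vanishes. The inner force evaluation is the kind of arithmetic-heavy kernel that the abstract describes offloading to the Cell processors.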