Exploiting hierarchy parallelism for molecular dynamics on a petascale heterogeneous system

Q Wu and CQ Yang and T Tang and LQ Xiao, JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 73, 1592-1604 (2013).

DOI: 10.1016/j.jpdc.2013.07.015

Heterogeneous systems whose nodes contain more than one type of computation unit, e.g., central processing units (CPUs) and graphics processing units (GPUs), are becoming popular because of their low cost and high performance. In this paper, we have developed a Three-Level Parallelization Scheme (TLPS) for molecular dynamics (MD) simulation on heterogeneous systems. The scheme exploits multi-level parallelism by combining (1) inter-node parallelism using spatial decomposition via message passing, (2) intra-node parallelism using spatial decomposition via dynamically scheduled multi-threading, and (3) intra-chip parallelism using multi-threading and short vector extensions in CPUs and multiple CUDA threads in GPUs. Using this hierarchy of parallelism, together with optimizations such as intra-node communication hiding and memory optimizations on both CPUs and GPUs, we have implemented and evaluated an MD simulation on the petascale heterogeneous supercomputer TH-1A. The results show that MD simulations can be efficiently parallelized with TLPS and can benefit from these optimizations. (C) 2013 Elsevier Inc. All rights reserved.
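
The three levels named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`split_into_slabs`, `count_neighbors`, `process_slab`, `neighbor_count`), the 1-D slab decomposition, the chunk size, and the use of a Python thread pool are all assumptions chosen for brevity. Message passing between nodes is replaced by a plain loop over slabs, every slab reads the full position list in place of exchanged ghost atoms, and a scalar inner loop stands in for the SIMD or CUDA kernel.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_slabs(positions, box_length, n_slabs):
    """Level 1 stand-in: assign each atom to one slab along x (one slab per 'node')."""
    width = box_length / n_slabs
    slabs = [[] for _ in range(n_slabs)]
    for i, p in enumerate(positions):
        # Clamp so an atom sitting exactly on the upper boundary stays in range.
        slabs[min(int(p[0] / width), n_slabs - 1)].append(i)
    return slabs


def count_neighbors(chunk, positions, cutoff):
    """Level 3 stand-in: scalar pair loop; real code would use vector units or CUDA threads."""
    cutoff_sq = cutoff * cutoff
    total = 0
    for i in chunk:
        xi, yi, zi = positions[i]
        for j, (xj, yj, zj) in enumerate(positions):
            if j != i and (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2 < cutoff_sq:
                total += 1
    return total


def process_slab(owned, positions, cutoff, n_threads=4, chunk_size=8):
    """Level 2 stand-in: split owned atoms into chunks that idle workers pick up in turn."""
    chunks = [owned[k:k + chunk_size] for k in range(0, len(owned), chunk_size)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return sum(pool.map(lambda c: count_neighbors(c, positions, cutoff), chunks))


def neighbor_count(positions, box_length, cutoff, n_slabs=4):
    """Sum the per-slab results; a distributed code would do this with a reduction."""
    slabs = split_into_slabs(positions, box_length, n_slabs)
    return sum(process_slab(owned, positions, cutoff) for owned in slabs)
```

Because each atom is owned by exactly one slab, the summed result matches a brute-force count over all ordered pairs, which is a convenient correctness check for any decomposition of this kind. The sketch uses no periodic boundaries and counts neighbors rather than evaluating forces.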

