Loop-level profiling and analysis of DoD applications using TAU

S Moore and D Cronk and S Shende and A Malony, PROCEEDINGS OF THE HPCMP USERS GROUP CONFERENCE 2006, 378-383 (2006).

Performance of computationally intensive applications often depends critically on the floating-point and memory performance of nested loop structures. This paper describes extensions to the Tuning and Analysis Utilities (TAU) parallel performance system that implement automated instrumentation of parallel C/C++ and Fortran programs to collect loop-level profile data. Link-time and run-time options for configuring the instrumented version of the code to perform various types of measurements, such as time-based and hardware-counter-based profiling, are described. Finally, examples are given of collecting and analyzing loop-level profile data for several DoD applications.

