Re: [lammps-users] how to translate benchmark performance results to flops

From: Stefan Paquay <stefanpaquay@...24...>
Date: Mon, 23 Apr 2018 13:39:29 +0000

I would go one step further and argue that FLOPS is only a sensible metric for hardware, not software. The most "performant" software in terms of FLOPS would run an infinite loop with some floating point operations inside and nothing else, but that would hardly be worth running. 

On Mon, Apr 23, 2018, 8:06 AM Axel Kohlmeyer <akohlmey@...24...> wrote:
On Mon, Apr 23, 2018 at 7:02 AM, karniav <karniav@...7569...> wrote:
> Lammps results do not follow the standard way of reporting performance (in
> flops/sec).

i strongly disagree that FLOPS is a meaningful descriptor for the
performance of an MD code.
what matters is how quickly a defined task is done, which is what
LAMMPS reports. it would be easy to achieve a higher FLOPS rating
while at the same time having worse actual performance. this is
particularly true for MD codes. example: when running highly threaded
and vectorized kernels, e.g. on GPUs or xeon phi accelerators, it is
more efficient to not take advantage of newton's third law and
effectively double the number of floating point operations per time
step (and thus artificially inflate the FLOP count) to reduce the
overhead of atomic operations or waiting on locks, whereas with serial
or minimally threaded execution, one would rather reduce the number of
operations for more efficient processing.
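
to make the doubling concrete, here is a minimal sketch in Python. it is not LAMMPS code: it uses a brute-force pair loop without periodic boundaries, and the function name, atom count, box size and cutoff are all made up for illustration. it only counts how many pair-force evaluations each strategy performs.

```python
import random


def count_pair_evaluations(positions, cutoff, use_newton):
    """Count pair-force evaluations in a brute-force loop (no periodic boundaries)."""
    cutoff_sq = cutoff * cutoff
    n = len(positions)
    evaluations = 0
    for i in range(n):
        # half list (Newton's third law): visit only j > i, apply the force to both atoms
        # full list: visit every j != i, apply the force to atom i only
        start = i + 1 if use_newton else 0
        for j in range(start, n):
            if i == j:
                continue
            dist_sq = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
            if dist_sq < cutoff_sq:
                evaluations += 1
    return evaluations


random.seed(0)
# hypothetical system: 200 atoms placed at random in a 5 x 5 x 5 box
atoms = [tuple(random.uniform(0.0, 5.0) for _ in range(3)) for _ in range(200)]
half = count_pair_evaluations(atoms, 2.5, use_newton=True)
full = count_pair_evaluations(atoms, 2.5, use_newton=False)
print(f"half list: {half} evaluations, full list: {full} evaluations")
```

the full-list variant does exactly twice the pair work and would therefore report about twice the FLOP count for the same physical result, even though it can be the faster choice on hardware where concurrent updates of the same atom are expensive.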

> Is there a way to translate the results for the Lennard-Jones benchmark for
> example, in flops/sec?

no. this is a non-trivial operation. the number of floating point
operations varies due to the variations of the number of neighbors.
you have a different number of floating point operations for pairs of
atoms that are within the cutoff and those outside the cutoff. on top
of that, you have floating point operations associated with other
operations, e.g. the neighbor list builds, that are difficult to
estimate or would incur unacceptable overhead if collected/computed.
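
a rough sketch of why the count is not a fixed number per timestep, assuming a simple cost model: every pair in the neighbor list costs a distance check, and only pairs inside the cutoff also cost a force evaluation. the function name is hypothetical. the 8 operations per distance check follow from 3 subtractions, 3 multiplications and 2 additions; the 20 operations per force evaluation is a made-up placeholder, not a measured LAMMPS value.

```python
def estimate_pair_flops(listed_pairs, pairs_within_cutoff,
                        flops_per_distance_check=8, flops_per_force=20):
    """Estimate pair-loop FLOPs for one timestep under a simple two-cost model."""
    if pairs_within_cutoff > listed_pairs:
        raise ValueError("pairs within the cutoff cannot exceed the listed pairs")
    # every listed pair pays for the squared-distance computation
    check_cost = listed_pairs * flops_per_distance_check
    # only pairs inside the cutoff pay for the force computation
    force_cost = pairs_within_cutoff * flops_per_force
    return check_cost + force_cost


# same neighbor list size, different fraction of pairs inside the cutoff
sparse_step = estimate_pair_flops(1000, 600)
dense_step = estimate_pair_flops(1000, 800)
print(sparse_step, dense_step)
```

even this simplified model gives a count that changes as atoms move in and out of the cutoff, and it leaves out neighbor list builds, integration and communication entirely.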

> example existing output:
> Performance: 17997.357 tau/day, 41.661 timesteps/s
> 99.4% CPU use with 8 MPI tasks x 8 OpenMP threads
> Can you provide with more info on how to interpret these results and how to
> translate them to flops/sec?

as stated above, determining the number of FLOPS is difficult to do
unless one would accept a lot of unwanted overhead.
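
the quoted output can still be interpreted on its own terms. the following is a minimal sketch, assuming the "Performance:" line has exactly the format quoted above; the function name and regular expression are illustrative and not part of LAMMPS.

```python
import re

SECONDS_PER_DAY = 86400.0


def parse_performance(line):
    """Extract (tau/day, timesteps/s) from a line in the quoted format."""
    match = re.search(
        r"Performance:\s*([\d.]+)\s*tau/day,\s*([\d.]+)\s*timesteps/s", line
    )
    if match is None:
        raise ValueError("line does not match the expected format")
    return float(match.group(1)), float(match.group(2))


tau_per_day, steps_per_second = parse_performance(
    "Performance: 17997.357 tau/day, 41.661 timesteps/s"
)
# wall-clock time spent on one timestep
seconds_per_step = 1.0 / steps_per_second
# simulated time per step implied by the two reported rates
timestep_in_tau = tau_per_day / (steps_per_second * SECONDS_PER_DAY)
print(f"{seconds_per_step * 1000:.1f} ms per step, timestep of {timestep_in_tau:.4f} tau")
```

for the quoted run this works out to roughly 24 ms of wall time per step and a timestep of about 0.005 tau. both numbers describe how fast the defined task is completed; neither says anything about floating point operations.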

please also note, that FLOPS/s is redundant, as FLOPS is an
abbreviation for "floating point operations per second"; so it should
be either FLOPS or FLOP/s.

if you want to have a handle on the number of floating point (and
SSE/AVX) operations (and lots of other relevant performance metrics)
occurring during an MD run (or any executable for that matter), your
best bet is to read the performance counters embedded into your CPU,
for example using the "perf" tool.
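
as a sketch of what to do with such counter readings: counts collected with something like "perf stat -e <events> <command>" can be combined into a FLOP rate by weighting each instruction class by its vector width. the event names below are an assumption (Intel-style names for double precision; which events exist depends on the CPU, so check "perf list"), the counter values are invented, and the function name is hypothetical.

```python
# double-precision lanes per instruction for each counter class
# ASSUMPTION: Intel-style event names; other CPUs expose different events
DOUBLE_LANES = {
    "fp_arith_inst_retired.scalar_double": 1,
    "fp_arith_inst_retired.128b_packed_double": 2,
    "fp_arith_inst_retired.256b_packed_double": 4,
    "fp_arith_inst_retired.512b_packed_double": 8,
}


def flop_rate(counts, elapsed_seconds):
    """Convert per-class instruction counts into floating point operations per second."""
    total_flop = sum(DOUBLE_LANES[event] * count for event, count in counts.items())
    return total_flop / elapsed_seconds


# invented counter values for a run that took 4 seconds
hypothetical_counts = {
    "fp_arith_inst_retired.scalar_double": 2_000_000_000,
    "fp_arith_inst_retired.256b_packed_double": 1_000_000_000,
}
rate = flop_rate(hypothetical_counts, 4.0)
print(f"{rate / 1e9:.2f} GFLOP/s")
```

a number obtained this way describes how busy the floating point units were, not how quickly the simulation finished, so it is best read alongside the timesteps/s that LAMMPS reports.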


Dr. Axel Kohlmeyer  akohlmey@...24...
College of Science & Technology, Temple University, Philadelphia PA, USA
International Centre for Theoretical Physics, Trieste, Italy.
