Re: [lammps-users] regarding scaling up LAMMPS on CPU/GPU
Axel Kohlmeyer <akohlmey@...24...>
Wed, 5 Jul 2017 07:54:24 -0400
On Tue, Jul 4, 2017 at 12:37 PM, Quang Ha <quang.t.ha.20@...24...> wrote:
> Hi all,
> Happy 4th July!
> Anyhow, I just have some questions regarding the scaling up potential of
> LAMMPS. I found the benchmark documentation here:
> According to the results, it seems like LAMMPS scales better on CPU compared
> to GPU. Is this always the case, like do we always expect LAMMPS to perform
> better on, say, KNL when compared to, say, Titan X?
scaling != performance.
also, you didn't mention whether you were looking at "strong scaling"
(i.e. same system size regardless of number of nodes) or "weak
scaling" (i.e. same system size per node).
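
to make the distinction concrete, here is a minimal sketch of how the two efficiencies are commonly computed from measured wall times. the function names and the timings are made up for illustration and are not taken from the LAMMPS benchmark page.

```python
def strong_scaling_efficiency(t_ref, n_ref, t_n, n):
    """Fixed total system size: the ideal time drops by n_ref / n."""
    return (t_ref * n_ref) / (t_n * n)


def weak_scaling_efficiency(t_ref, t_n):
    """Fixed system size per node: the ideal time stays constant."""
    return t_ref / t_n


# hypothetical wall times in seconds, for illustration only
strong = strong_scaling_efficiency(t_ref=100.0, n_ref=1, t_n=16.0, n=8)
weak = weak_scaling_efficiency(t_ref=100.0, t_n=125.0)
print(f"strong: {strong:.2f}, weak: {weak:.2f}")
```

an efficiency of 1.0 means ideal scaling in either case; neither number says anything about which machine finishes first.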
parallel scaling depends primarily on two factors: the amount of extra work
required when running in parallel, and the overhead caused by
communication.
when you increase your per node performance (e.g. by adding one or
more GPUs), then your communication overhead will show more
drastically. similarly, with GPUs the optimal utilization requires a
much larger number of particles per node, thus for "strong scaling"
tests, you will see a drop in performance (and thus scaling) once you
drop below that number.
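
a toy model can illustrate why faster nodes scale worse yet still finish sooner. it assumes the compute time divides perfectly across nodes while a fixed communication cost per step does not shrink; the model, its parameter names and all numbers are assumptions for illustration only, and it ignores the minimum particle count a GPU needs to be well utilized.

```python
def step_time(n_nodes, work, node_speed, comm_cost):
    """Toy model: compute time shrinks with node count, communication cost does not."""
    return work / (n_nodes * node_speed) + comm_cost


def efficiency(n_nodes, work, node_speed, comm_cost):
    """Strong-scaling efficiency relative to one node without communication."""
    t1 = work / node_speed
    return t1 / (n_nodes * step_time(n_nodes, work, node_speed, comm_cost))


work, comm = 1000.0, 1.0  # arbitrary units, hypothetical values

# "CPU" nodes with speed 1.0 versus "GPU" nodes assumed to be 5x faster
cpu_time = step_time(16, work, node_speed=1.0, comm_cost=comm)
gpu_time = step_time(16, work, node_speed=5.0, comm_cost=comm)
cpu_eff = efficiency(16, work, node_speed=1.0, comm_cost=comm)
gpu_eff = efficiency(16, work, node_speed=5.0, comm_cost=comm)

print(f"cpu: time {cpu_time:.1f}, efficiency {cpu_eff:.3f}")
print(f"gpu: time {gpu_time:.1f}, efficiency {gpu_eff:.3f}")
```

in this model the faster nodes show the lower efficiency on 16 nodes and still have the shorter time per step, which is the sense in which scaling != performance.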
there are plenty of cases, where you can get the best absolute
performance (i.e. the performance when the application scales out)
with CPUs, yet that requires 3-10x as many nodes, as with nodes
containing accelerators. or looking at it the other way around, the
biggest impact of GPUs is usually with a small to moderate number of
nodes.
> Reasons for asking
> simply because I need to specify the nodes for the supercomputer I am
> applying to use, so it would be better to use the one which I can get the most
> efficient results out of.
the numbers on the LAMMPS webpage are at best a guideline for what to
expect. you cannot derive actual performance information from it,
because many factors, including the details of your input file,
determine performance. the benchmark numbers are usually for the
"pure" MD simulation, without any analysis computes, dumps for output
and typically for systems with good load balance. most real-world
simulations will diverge from that. the only way to find out for
certain is to run benchmarks of your own with a representative input
of a representative system at the suitable size and on the machine
where you plan to run.
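
when comparing such benchmark runs, a small script can collect the timings. the sketch below assumes the log contains the usual summary line of the form "Loop time of ... on ... procs for ... steps with ... atoms"; if your LAMMPS version prints it differently, the pattern needs adjusting. the function name and the sample text are hypothetical.

```python
import re

# Assumed format of the per-run summary line in the LAMMPS log.
LOOP_RE = re.compile(
    r"Loop time of\s+([0-9.eE+-]+)\s+on\s+(\d+)\s+procs\s+"
    r"for\s+(\d+)\s+steps\s+with\s+(\d+)\s+atoms"
)


def parse_loop_lines(log_text):
    """Return one dict per run found in the log text."""
    runs = []
    for m in LOOP_RE.finditer(log_text):
        seconds = float(m.group(1))
        procs, steps, atoms = (int(m.group(i)) for i in (2, 3, 4))
        runs.append({
            "seconds": seconds,
            "procs": procs,
            "steps": steps,
            "atoms": atoms,
            # throughput that can be compared across machines and node counts
            "atom_steps_per_s": atoms * steps / seconds,
        })
    return runs


# made-up sample line, not a real measurement
sample = "Loop time of 2.0 on 4 procs for 100 steps with 32000 atoms\n"
runs = parse_loop_lines(sample)
print(runs[0]["atom_steps_per_s"])
```

running the same representative input on each candidate machine and comparing the atom-steps per second (or simply the loop time) gives a direct basis for choosing the node type.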
> Or has my interpretation of the benchmark been wrong? Would love to hear
> some opinions, please.
> Many thanks,
Dr. Axel Kohlmeyer akohlmey@...24... http://goo.gl/1wk0
College of Science & Technology, Temple University, Philadelphia PA, USA
International Centre for Theoretical Physics, Trieste, Italy.