Re: [lammps-users] Imbalanced cpu nodes


From: "T. Majdi" <majdit@...1849...>
Date: Mon, 26 Jun 2017 17:41:04 -0400

Dear Professor Kohlmeyer,

Thank you for your thorough response. I've contacted our system administrators and asked about installing the patch you suggested.

In the meantime, for the test runs, how would I distinguish a memory leak from a feature of LAMMPS that slowly grows its memory use until it runs out?

Thank you so much,
Tara

On Mon, Jun 26, 2017 at 3:51 PM, Axel Kohlmeyer <akohlmey@...24...> wrote:
On Mon, Jun 26, 2017 at 1:46 PM, T. Majdi <majdit@...1849...> wrote:
>
> Dear LAMMPS developers and users,
>
>
> My solid-state non-equilibrium thermal conductivity simulations have been very consistent in memory usage: they use 2.3 G and are very well balanced across different nodes. Recently, my jobs have been failing with "std::bad_alloc". After tracking the memory usage, I found that two nodes use more memory than the others and that the amount increases sharply over time. Would anyone know why this may be? I had something similar happen before and found out that it was because of invoking compute centro/atom too frequently. I am not sure what has caused a similar problem to occur again.

the error message suggests that you are running out of "address
space", i.e. a call using the "new" operator failed.
there are many possible reasons for that. the two most likely are:
1) you are using a feature of LAMMPS that slowly grows its memory use
until you run out
2) you are using a feature of LAMMPS that has a memory leak.

the first thing you can try to resolve this is to check out the very
latest LAMMPS patch, version 23June2017, and check if the issue
persists.
if yes, then you need to narrow down which of the two issues it is.
for that, you first should reduce your system size to be *much*
smaller, so one can quickly run it on a single processor within a few
minutes. it need not crash, but you can monitor its memory usage. it
also does not have to be physically meaningful, it just needs to run
all the various commands in a similar fashion.
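
A minimal sketch of the monitoring step described above, assuming a Linux machine where the resident memory of a process can be read from /proc/<pid>/status. The function names and the sampling interval are illustrative choices, not part of LAMMPS:

```python
import time


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from the text of /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     2411724 kB"
            return int(line.split()[1])
    return None


def read_rss_kb(pid):
    """Read the current resident set size of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_vmrss_kb(fh.read())


def growth_rate(samples):
    """Least-squares slope in kB/s for a list of (seconds, kB) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("all samples have the same timestamp")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t


def monitor(pid, interval=5.0, count=60):
    """Sample the memory of process `pid` every `interval` seconds."""
    samples = []
    start = time.monotonic()
    for _ in range(count):
        try:
            rss = read_rss_kb(pid)
        except (FileNotFoundError, ProcessLookupError):
            break  # the process has exited
        if rss is None:
            break
        samples.append((time.monotonic() - start, rss))
        time.sleep(interval)
    return samples
```

Calling growth_rate(monitor(pid)) on the small test run gives a rough number for how fast memory grows. A steady positive slope only confirms that memory is growing; by itself it does not say whether the growth is intended behavior or a leak.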

if you have such an input, you can either try to run it yourself using
the memcheck module of the valgrind software or you can post the
complete input deck here (or on github as an issue) and see whether one
of the LAMMPS developers has time to look into it and possibly confirm
whether it is a bug or a feature. for that, however, it is crucial
that your input is really small and runs really fast. none of the
developers has time to wait a long time for a simple debug run
to finish (running under valgrind makes LAMMPS over an order of
magnitude slower).
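
As a rough illustration of the valgrind route, the sketch below builds a memcheck command line and extracts the "LEAK SUMMARY" figures from the resulting log. The executable name ./lmp_serial, the input file in.small and the log file name are hypothetical placeholders; the valgrind options used (--tool=memcheck, --leak-check=full, --log-file) and the LAMMPS -in switch are standard:

```python
import re


def memcheck_command(lammps_exe, input_file, log_file="memcheck.log"):
    """Build an argument list for running a serial LAMMPS binary under memcheck."""
    return [
        "valgrind",
        "--tool=memcheck",
        "--leak-check=full",
        f"--log-file={log_file}",
        lammps_exe,
        "-in",
        input_file,
    ]


# Matches lines such as "==4242==    definitely lost: 1,024 bytes in 2 blocks"
_LOST = re.compile(
    r"(definitely|indirectly|possibly) lost:\s+([\d,]+) bytes in ([\d,]+) blocks"
)


def leak_summary(log_text):
    """Return {kind: (bytes, blocks)} parsed from a memcheck log.

    The result is empty when the log contains no LEAK SUMMARY section.
    """
    summary = {}
    for kind, nbytes, nblocks in _LOST.findall(log_text):
        summary[kind] = (
            int(nbytes.replace(",", "")),
            int(nblocks.replace(",", "")),
        )
    return summary
```

The list from memcheck_command("./lmp_serial", "in.small") can be passed to subprocess.run. A nonzero "definitely lost" count that scales with the number of timesteps points towards a leak, whereas memory that grows during the run but is released at exit looks more like intended growth.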

axel.

>
>
> The images below are screen shots of the virtual memory on different nodes after 1.3 hrs, 1.4 hrs, and 1.8 hrs. I've also attached the output file.
>
>
> I appreciate any input I may receive.
>
> Thank you!
>
> --
> Tahereh Majdi, B.Eng., M.A.Sc.
> PhD candidate, Engineering Physics
> McMaster University
>
> e:majdit@...1849...
> t: (905)-541-3814
>



--
Dr. Axel Kohlmeyer  akohlmey@...43...4...  http://goo.gl/1wk0
College of Science & Technology, Temple University, Philadelphia PA, USA
International Centre for Theoretical Physics, Trieste. Italy.



--
Tahereh Majdi, B.Eng., M.A.Sc.
PhD candidate, Engineering Physics
McMaster University

e:majdit@...1849...
t: (905)-541-3814