Re: [lammps-users] Imbalanced cpu nodes


From: "T. Majdi" <majdit@...1849...>
Date: Mon, 26 Jun 2017 19:28:31 -0400

Dear Professor Kohlmeyer,

I sincerely appreciate your response. I have learned a lot from it. I'll follow your recommendations and post my findings as soon as I have them.

Thank you again!
Tara

On Mon, Jun 26, 2017 at 6:46 PM, Axel Kohlmeyer <akohlmey@...24...> wrote:
On Mon, Jun 26, 2017 at 5:41 PM, T. Majdi <majdit@...1849...> wrote:
> Dear Professor Kohlmeyer,
>
> Thank you for your thorough response. I've contacted our system
> administrators and asked about including the patch you suggested.

you don't want just that patch; you want the latest (development)
version of LAMMPS. over the last couple of years, the LAMMPS
developers have used a variety of tools to systematically audit the
LAMMPS source code for programming issues, including memory leaks.
your version from february 2016 therefore has several known memory
leaks that have been fixed since. before looking into this, we thus
need to rule out that what you are seeing is caused by one of those.

> In the meantime, for the test runs, how would I distinguish a memory leak
> from a feature of LAMMPS that slowly grows its memory use until it runs
> out?

you need to do what i suggested, i.e. devise a version of your
calculation that needs much less memory, starts from a data file (not
a restart file), runs very fast and over a much smaller number of time
steps, yet still performs all operations that your current input does.
for that you would not need to run on a large machine, but could
simply compile a serial version of LAMMPS yourself directly on your
desktop machine.
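
A minimal sketch of watching the memory use of such a small serial test run from the outside, assuming a Linux machine, where the kernel reports a process's resident memory on the VmRSS line of /proc/<pid>/status. The function names and the growth threshold are hypothetical illustrations, not part of LAMMPS.

```python
import os
import re
import time


def parse_vmrss_kb(status_text):
    """Return the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VmRSS line found")
    return int(match.group(1))


def sample_rss_kb(pid, n_samples=10, interval_s=1.0):
    """Read VmRSS of process `pid` n_samples times, interval_s seconds apart.

    Assumption: Linux, so /proc/<pid>/status exists for a running process.
    """
    samples = []
    for _ in range(n_samples):
        with open(f"/proc/{pid}/status") as fh:
            samples.append(parse_vmrss_kb(fh.read()))
        time.sleep(interval_s)
    return samples


def keeps_growing(samples_kb, min_increase_kb=1024):
    """Hypothetical heuristic: memory never drops and rises by at least min_increase_kb."""
    never_drops = all(b >= a for a, b in zip(samples_kb, samples_kb[1:]))
    return never_drops and samples_kb[-1] - samples_kb[0] >= min_increase_kb


if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    # Demonstration on this Python process itself; for a real test pass the
    # PID of the running serial LAMMPS process instead.
    print(sample_rss_kb(os.getpid(), n_samples=3, interval_s=0.1))
```

A steadily rising curve alone does not tell a leak apart from legitimate growth. It only shows that the small input reproduces the behavior, which is what makes it worth running under valgrind.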

with only 2650 atoms, neither the force computation nor the memory it
requires is large, so it looks to me like your main memory consumption
might be in the averaging fixes.
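
As a rough check of that reasoning, the storage for per-atom data can be estimated directly. The sketch below assumes every per-atom value is an 8-byte double, uses a hypothetical and deliberately generous 50 values per atom (not a measured LAMMPS number), and treats the 2.3 G reported above as 2.3e9 bytes.

```python
BYTES_PER_DOUBLE = 8


def per_atom_bytes(n_atoms, doubles_per_atom):
    """Memory needed to store `doubles_per_atom` double-precision values per atom."""
    return n_atoms * doubles_per_atom * BYTES_PER_DOUBLE


n_atoms = 2650
# Hypothetical: positions, velocities, forces and many extra per-atom
# properties, rounded up generously to 50 doubles per atom.
estimate = per_atom_bytes(n_atoms, 50)
print(f"{estimate} bytes = {estimate / 2**20:.2f} MiB")  # prints "1060000 bytes = 1.01 MiB"

ratio = 2.3e9 / estimate  # assumption: 2.3 G means 2.3e9 bytes
print(f"about {ratio:.0f} times less than 2.3 GB")  # prints "about 2170 times less than 2.3 GB"
```

Even with generous assumptions the per-atom storage comes out around a megabyte, so memory climbing into gigabytes more plausibly comes from data that accumulates with the number of time steps, which is consistent with suspecting the averaging fixes.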

with a fast/small input, an experienced programmer can then use tools
like valgrind or compiler instrumentation to detect memory leaks.
for that it is usually necessary that a calculation finishes, and then
the total tally of memory allocations and deallocations is inspected.
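
The idea of inspecting the allocation tally after a run has finished can be illustrated with a toy bookkeeping class. This is only a sketch of the principle, not how valgrind works internally, and the class and function names are hypothetical. Memory that grows during the run but is released before exit leaves nothing outstanding at the end (high peak, zero remaining), while a leak leaves blocks that were never freed, which valgrind's memcheck lists in its leak summary at exit.

```python
class AllocationTally:
    """Toy bookkeeping of allocations and deallocations (illustration only)."""

    def __init__(self):
        self.live = {}     # block id -> size in bytes, for blocks not yet freed
        self.next_id = 0
        self.current = 0   # bytes currently allocated
        self.peak = 0      # largest value `current` ever reached

    def allocate(self, nbytes):
        block = self.next_id
        self.next_id += 1
        self.live[block] = nbytes
        self.current += nbytes
        self.peak = max(self.peak, self.current)
        return block

    def free(self, block):
        self.current -= self.live.pop(block)

    def outstanding(self):
        """Bytes still allocated; meaningful only once the run has finished."""
        return sum(self.live.values())


def run(n_steps, leak):
    """Simulate a run whose storage grows every step; optionally never free it."""
    tally = AllocationTally()
    history = []  # grows every step, like accumulated averaging data
    for _ in range(n_steps):
        history.append(tally.allocate(1024))
    if not leak:
        for block in history:  # proper cleanup at the end of the run
            tally.free(block)
    return tally.peak, tally.outstanding()


print(run(100, leak=False))  # prints (102400, 0): grew during the run, all freed at exit
print(run(100, leak=True))   # prints (102400, 102400): same growth, never released
```

Both runs look identical while they are running, which is why the tally only tells the two cases apart once the calculation has completed.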

but nobody likes to look for bugs that were already found and fixed,
so we first need confirmation that the unexpectedly growing memory
use still exists in the latest LAMMPS version.

axel.

>
> Thank you so much,
> Tara
>
> On Mon, Jun 26, 2017 at 3:51 PM, Axel Kohlmeyer <akohlmey@...24...> wrote:
>>
>> On Mon, Jun 26, 2017 at 1:46 PM, T. Majdi <majdit@...1849...> wrote:
>> >
>> > Dear LAMMPS developers and users,
>> >
>> >
>> > My solid-state non-equilibrium thermal conductivity simulations have
>> > been very consistent in memory usage: they use 2.3 GB and are very well
>> > balanced across different nodes. Recently, I have had my jobs fail due to
>> > “std::bad_alloc”. After tracking the memory usage, I found that two nodes
>> > use more memory and that the amount increases sharply over time. Would
>> > anyone know why this may be? I had something similar happen before and found
>> > out that it was because of invoking compute centro/atom too frequently. I am
>> > not sure what has caused a similar problem to occur again.
>>
>> the error message suggests that you are running out of "address
>> space", i.e. a call using the "new" operator failed.
>> there are many possible reasons for that. the two most likely are:
>> 1) you are using a feature of LAMMPS that slowly grows its memory use
>> until you run out
>> 2) you are using a feature of LAMMPS that has a memory leak.
>>
>> the first thing you can try to resolve this is to check out the very
>> latest LAMMPS patch, version 23June2017, and check whether the issue
>> persists.
>> if yes, then you need to narrow down which of the two issues it is.
>> for that, you should first reduce your system size to be *much*
>> smaller, so that it can be run quickly on a single processor within a
>> few minutes. it need not crash; it is enough to monitor its memory
>> usage. it also does not have to be physically meaningful, it just needs
>> to run all the various commands in a similar fashion.
>>
>> if you have such an input, you can either try to run it yourself using
>> the memcheck tool of the valgrind software, or you can post the
>> complete input deck here (or on github as an issue) and wait until one
>> of the LAMMPS developers has time to look into it and possibly confirm
>> whether it is a bug or a feature. for that, however, it is crucial
>> that your input is really small and runs really fast. none of the
>> developers has the time to wait long for a simple debug run to
>> finish (running under valgrind makes LAMMPS more than an order of
>> magnitude slower).
>>
>> axel.
>>
>> >
>> >
>> > The images below are screen shots of the virtual memory on different
>> > nodes after 1.3 hrs, 1.4 hrs, and 1.8 hrs. I've also attached the output
>> > file.
>> >
>> >
>> > I appreciate any input I may receive.
>> >
>> > Thank you!
>> >
>> > --
>> > Tahereh Majdi, B.Eng., M.A.Sc.
>> > PhD candidate, Engineering Physics
>> > McMaster University
>> >
>> > e:majdit@...1849...
>> > t: (905)-541-3814
>> >
>> >
>>
>>
>>
>> --
>> Dr. Axel Kohlmeyer  akohlmey@...24...  http://goo.gl/1wk0
>> College of Science & Technology, Temple University, Philadelphia PA, USA
>> International Centre for Theoretical Physics, Trieste. Italy.
>
>
>
>


