[lammps-users] Inconsistent error with using read_restart to run simulation post-equilibrium


From: Quang Ha <quang.t.ha.20@...24...>
Date: Tue, 17 Apr 2018 11:16:58 -0400

Hi all,

I am trying to restart a simulation with read_restart. Something goes wrong during the restarted run, even though at the beginning everything appears to restart just fine. The errors I get differ from run to run (!), so I am unsure how to start debugging. Here are some of the results. The previous simulation ended at step 3485. The time step at which the error occurs during the restart run is not consistent, even though I use the exact same script and read from the same restart file (restart.equil.mpiio).

In one run, it failed fairly late in the post-equilibrium simulation:
Step v_time
[...]
    3843    11027.007 
lmp_mpi: malloc.c:3551: _int_malloc: Assertion `(bck->bk->size & NON_MAIN_ARENA) == 0' failed.
[hyperion:17979] *** Process received signal ***
[hyperion:17979] Signal: Aborted (6)
[hyperion:17979] Signal code:  (-6)

Another time it crashed earlier:
Step v_time 
[...]
    3487    10005.509 
[hyperion:18092] *** Process received signal ***
[hyperion:18092] Signal: Segmentation fault (11)
[hyperion:18092] Signal code:  (128)
[hyperion:18092] Failing at address: (nil)

In yet another run, it produced a long and alarming block of error output: https://pastebin.com/iyxA5EHH

How should I go about debugging this behaviour? Is this a case where I need MPI debugging tools such as TotalView/DDT/VTune?

Thanks,
Quang