I am trying to restart a simulation with read_restart. Something seems to go wrong during the restarted run, even though at the beginning everything appears to restart just fine. The errors I get are different with each run (!), so I am confused about how to start debugging. Here are some of the results: the previous simulation ended at step 3485. The time step at which the error occurs during the restart run is not consistent, even though I use the exact same script and read from the same restart file (restart.equil.mpiio).
One time, it failed fairly late into the post-equilibrium simulation:
lmp_mpi: malloc.c:3551: _int_malloc: Assertion `(bck->bk->size & NON_MAIN_ARENA) == 0' failed.
[hyperion:17979] *** Process received signal ***
[hyperion:17979] Signal: Aborted (6)
[hyperion:17979] Signal code: (-6)
Another time, it crashed earlier:
[hyperion:18092] *** Process received signal ***
[hyperion:18092] Signal: Segmentation fault (11)
[hyperion:18092] Signal code: (128)
[hyperion:18092] Failing at address: (nil)
How should I go about debugging this behaviour? Is this a case where I need MPI debugging tools such as TotalView/DDT/VTune?