Re: [lammps-users] mprun error


From: bahman daneshian <bahmanpbamp@...24...>
Date: Tue, 22 May 2018 14:52:21 +0200

Dear Axel, 

Thank you very much for your email. I have checked the LAMMPS performance on our HPC cluster with the machine supervisors. It seems that LAMMPS consumes all the memory on a node until that node runs out of memory and then freezes, no matter how much I reduce the number of nodes I use.

Besides that, memory consumption across the allocated nodes is uneven: it is much higher on two nodes than on the others.

I already use an MPI command in my script.bash file to distribute the run across all allocated nodes. Should I insert a similar command in my LAMMPS input file as well?
My input file is below. Please let me know whether I have missed something.




input file:
#Phase 1 ------------------------------------------Simulation main setup----------------------------------
dimension        3
units    real
atom_style     charge
boundary       p p p


variable radius equal 500 # makes 1000 Ang or 100nm particle
variable n equal 500 # unit cells along x and y: 500*3.786 = 1893 Ang box edge

variable a1 equal 3.786
variable a2 equal 3.786
variable a3 equal 9.514

variable boxx equal "v_n*v_a1"
variable boxy equal "v_n*v_a2"


variable diameter equal "2*v_radius"
variable x0 equal "0.5*v_boxx"
variable y0 equal "0.5*v_boxy"
variable subs_thick equal "3*v_a3"               #v_a3 equal 9.514
variable subslow_thick equal "1*v_a3"
variable z_gap equal "0.5*v_radius"
variable distance equal "v_subs_thick+v_subslow_thick+v_z_gap+v_radius"
variable boxz equal "v_distance+v_diameter+v_radius"

#simulation box
region box block 0 ${boxx} 0 ${boxy} 0 ${boxz} units box
create_box      2 box    #2 is number of atom types


lattice custom 1 a1 3.786  0.00000  0.00000   a2  0.0000   3.786  0.00000  a3 0 0 9.514  &
basis 0.0000 0.2500 0.3750 &
basis 0.0000 0.7500 0.6250 &
basis 0.5000 0.7500 0.8750 &
basis 0.5000 0.2500 0.1250 &
basis 0.0000 0.0000 0.1700 &
basis 0.0000 0.7500 0.4200 &
basis 0.5000 0.2500 0.3300 &
basis 0.5000 0.7500 0.0800 &
basis 0.500 0.5000 0.6700 &
basis 0.500 0.2500 0.9200 &
basis 0.000 0.7500 0.8300 &
basis 0.000 0.2500 0.5800
mass 1 47.86000
mass 2 15.99940


#particle
region         particle sphere  ${x0} ${y0} ${distance} ${radius} units box
create_atoms 2 region particle &
basis 1 1 &
basis 2 1 &
basis 3 1 &
basis 4 1 &
basis 5 2 &
basis 6 2 &
basis 7 2 &
basis 8 2 &
basis 9 2 &
basis 10 2 &
basis 11 2 &
basis 12 2
group particle region particle
set type 1 charge 2.196 
set type 2  charge -1.098



#substrate 
region          substrate block 0 ${boxx} 0 ${boxy} ${subslow_thick} ${subs_thick} units box
create_atoms 2 region substrate &
basis 1 1 &
basis 2 1 &
basis 3 1 &
basis 4 1 &
basis 5 2 &
basis 6 2 &
basis 7 2 &
basis 8 2 &
basis 9 2 &
basis 10 2 &
basis 11 2 &
basis 12 2
group substrate region substrate
set type 1 charge 2.196 
set type 2  charge -1.098
#group        model union particle substrate


#lower_substrate 
region          lower_substrate block 0  ${boxx} 0  ${boxy}  0 ${subslow_thick} units box
create_atoms 2 region lower_substrate &
basis 1 1 &
basis 2 1 &
basis 3 1 &
basis 4 1 &
basis 5 2 &
basis 6 2 &
basis 7 2 &
basis 8 2 &
basis 9 2 &
basis 10 2 &
basis 11 2 &
basis 12 2
group lower_substrate region lower_substrate
set type 1 charge 2.196 
set type 2  charge -1.098
group        model union particle substrate

#--Phase 2----------------------------------------Buckingham Potential-----------------------------------------------

pair_style buck/coul/long 15
pair_coeff   1 1   717647.40 0.154 121.067
pair_coeff   1 2   391049.10 0.194 290.331
pair_coeff   2 2   271716.30 0.234 696.888

neighbor 2.0 bin # skin distance for real units  is by default 2.0
neigh_modify every 1 delay 0 check yes

kspace_style pppm 0.0001


#pair_style lj/cut/coul/cut 6.0 15.0 # cut off is usually 2.5 unitless in lj
#pair_coeff   1 1   0.609 1.9565
#pair_coeff   1 2   0.292  2.4419
#pair_coeff   2 2   0.140  2.9273

neigh_modify delay 5

#----Phase 4-------------------------------------Initial Equilibration at 300K ----------------------------------------
reset_timestep 0
timestep 1.0 # or 2
velocity all create 300 12345 mom yes rot no


fix 1 lower_substrate setforce 0.0 0.0 0.0
fix 2 particle nvt temp 300.0 300.0 100.0
fix 3 substrate nvt temp 300.0 300.0 100.0

thermo 100
dump 1 all xyz 100 dump1000.txt 

run 20000
unfix 2


#----Phase 5---------------------------------------Particle Impact at the room temperature -------------------
fix 4 particle nve 
velocity particle set 0 0 -0.003 units box 

thermo 100
run 50000
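
A rough size estimate for the input above may help when discussing memory with the system managers. The following is a minimal Python sketch, not part of the original post: it assumes every region is filled at the bulk density of the 12-atom custom lattice, and the function name `estimate_system` is hypothetical.

```python
import math


def estimate_system(n=500, radius=500.0, a1=3.786, a2=3.786, a3=9.514,
                    basis_atoms=12):
    """Estimate atom counts, assuming bulk lattice density in each region."""
    # Number density of the custom lattice: basis atoms per unit cell volume.
    density = basis_atoms / (a1 * a2 * a3)  # atoms per cubic Angstrom

    # Box dimensions, following the variable definitions in the input file.
    boxx = n * a1
    boxy = n * a2
    subs_thick = 3 * a3
    subslow_thick = 1 * a3
    z_gap = 0.5 * radius
    distance = subs_thick + subslow_thick + z_gap + radius
    boxz = distance + 2 * radius + radius

    # Volumes of the filled regions.
    sphere_volume = 4.0 / 3.0 * math.pi * radius ** 3
    # lower_substrate spans 0..a3 and substrate spans a3..3*a3 in z.
    slab_volume = boxx * boxy * subs_thick
    box_volume = boxx * boxy * boxz

    particle_atoms = density * sphere_volume
    slab_atoms = density * slab_volume
    return {
        "particle_atoms": particle_atoms,
        "slab_atoms": slab_atoms,
        "total_atoms": particle_atoms + slab_atoms,
        "filled_fraction": (sphere_volume + slab_volume) / box_volume,
    }


if __name__ == "__main__":
    for key, value in estimate_system().items():
        print(f"{key}: {value:.4g}")
```

Under these assumptions the input describes on the order of 55 million atoms (roughly 46 million in the particle and 9 million in the two substrate slabs), and the atoms occupy less than 10% of the box volume. If the box is divided evenly among the MPI ranks, most atoms would sit in a minority of the subdomains, which may be related to the uneven memory use described above.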


 Yours Sincerely,

Bahman Daneshian

On 21 May 2018 at 15:48, Axel Kohlmeyer <akohlmey@...24...> wrote:


On Mon, May 21, 2018 at 6:39 AM, bahman daneshian <bahmanpbamp@...24...> wrote:
Dear LAMMPS and HPC experts, 

I am trying to run LAMMPS on our HPC cluster, where we installed the 16 Mar 2018 version. LAMMPS works correctly for small models. However, when the model becomes large (on the order of 100 nm), the computation cannot start and we get this error:

mpirun noticed that process rank 56 with PID 0 on node node03 exited on signal 9 (Killed).

Please let me know whether you have any idea how to solve this issue.

please note that this is not a well formulated question: if i were to answer it accurately, i would have to tell you "yes" and nothing else.

going beyond that, this doesn't look like a LAMMPS specific problem at all, but rather something related to the machine you are running on. the fact that this is triggered by a large system size hints at you running out of available RAM on the nodes you are running on. this is something that you have to work out with your local system managers. so my suggestion is that you work with them to determine what is causing this (i.e. whether your jobs trigger the OOM killer feature).

axel.

​[...]​



Yours Sincerely,
Bahman Daneshian





--
Dr. Axel Kohlmeyer  akohlmey@...24...  http://goo.gl/1wk0
College of Science & Technology, Temple University, Philadelphia PA, USA
International Centre for Theoretical Physics, Trieste. Italy.