Re: [lammps-users] error: segmentation fault_reax/c_KOKKOS

From: Axel Kohlmeyer
Date: Tue, 5 Sep 2017 16:15:30 -0400

On Mon, Sep 4, 2017 at 11:49 AM, Mohammad Izadi wrote:

Dear lammps users,

I installed LAMMPS (lammps-31Mar17), built as kokkos_mpi_only, on a server, and I run the reax/c KOKKOS package from the command line:

Please first update to the latest LAMMPS version and test whether your issue persists.
If it does, please provide a full input deck, so that others can reproduce the issue and debug it.



  nohup  mpirun     -np    2    ./lmp_kokkos_mpi_only   -k   on   -sf    kk    <  in.input &

My system has 2073 atoms and is in the gas phase. With a smaller system (e.g. 200 atoms), it runs without any error. My input file is below:


echo            both

units                       real

newton            on

atom_style            charge

dimension       3

boundary        p p p

#read_restart    restart22

restart 500      restart11 restart22


pair_style              reax/c NULL

pair_coeff              * * ffield.reax.input C H O N S Si Na Ar

neighbor                2 bin

neigh_modify      every 5 delay 0 check no

velocity          all create 2100 235485 mom yes rot yes

fix                       1 all nvt temp 2100.0 2100.0 100.0

fix                   2 all qeq/reax 1 0.0 10.0 1e-6 reax/c

fix                  4 all reax/c/species 10 10 250 species.txt

fix                  6 all efield 0.0001 0.0 0.0

fix_modify     6 energy yes

fix                  7 all reax/c/bonds 250 bonds.reaxc

compute reax all pair reax/c

variable eb             equal c_reax[1]

variable ea             equal c_reax[2]

variable elp            equal c_reax[3]

variable emol        equal c_reax[4]

variable ev             equal c_reax[5]

variable epen         equal c_reax[6]

variable ecoa         equal c_reax[7]

variable ehb           equal c_reax[8]

variable et              equal c_reax[9]

variable eco           equal c_reax[10]

variable ew            equal c_reax[11]

variable ep             equal c_reax[12]

variable efi             equal c_reax[13]

variable eqeq         equal c_reax[14]

thermo_style    custom  step  temp  atoms  etotal  ke  pe  v_eb  v_ea  v_elp  v_emol  v_ev  v_epen v_ecoa  v_ehb  v_et  v_eco  v_ew  v_ep  v_efi  v_eqeq  density  vol  press

thermo          250

timestep 0.1

dump                     1 all xyz  250

run                          4000000


Also, a single-core run does not stop, but multi-core runs with the large system (2073 atoms) stop immediately with the error below:


WARNING: Fixes cannot send data in Kokkos communication, switching to classic communication (../comm_kokkos.cpp:382)

[cschpc:169783] *** Process received signal ***

[cschpc:169783] Signal: Segmentation fault (11)

[cschpc:169783] Signal code: Address not mapped (1)

[cschpc:169783] Failing at address: (nil)

[cschpc:169783] [ 0] /lib64/ [0x3f6940f710]

[cschpc:169783] [ 1] ./lmp_kokkos_mpi_only(_ZN6Kokkos12parallel_forINS_11RangePolicyIJNS_6SerialEN9LAMMPS_NS27PairReaxFindBondSpeciesZeroEEEENS3_15PairReaxCKokkosIS2_EEEEvRKT_RKT0_RKSsPNS_4Impl9enable_ifIXntsrNSG_11is_integralIS8_EE5valueEvE4typeE+0x268) [0x17a1bc8]

[cschpc:169783] [ 2] ./lmp_kokkos_mpi_only(_ZN9LAMMPS_NS15PairReaxCKokkosIN6Kokkos6SerialEE15FindBondSpeciesEv+0xb0) [0x17aa0d0]

[cschpc:169783] [ 3] ./lmp_kokkos_mpi_only(_ZN9LAMMPS_NS15PairReaxCKokkosIN6Kokkos6SerialEE7computeEii+0x34a4) [0x17e26e4]

[cschpc:169783] [ 4] ./lmp_kokkos_mpi_only(_ZN9LAMMPS_NS12VerletKokkos5setupEv+0x6aa) [0x1a6b43a]

[cschpc:169783] [ 5] ./lmp_kokkos_mpi_only(_ZN9LAMMPS_NS3Run7commandEiPPc+0x65e) [0x1a2271e]

[cschpc:169783] [ 6] ./lmp_kokkos_mpi_only(_ZN9LAMMPS_NS5Input15command_creatorINS_3RunEEEvPNS_6LAMMPSEiPPc+0x26) [0xcfcc66]

[cschpc:169783] [ 7] ./lmp_kokkos_mpi_only(_ZN9LAMMPS_NS5Input15execute_commandEv+0x7e7) [0xcfb0f7]

[cschpc:169783] [ 8] ./lmp_kokkos_mpi_only(_ZN9LAMMPS_NS5Input4fileEv+0x317) [0xcfbc57]

[cschpc:169783] [ 9] ./lmp_kokkos_mpi_only(main+0x46) [0xd136c6]

[cschpc:169783] [10] /lib64/ [0x3f6881ed5d]

[cschpc:169783] [11] ./lmp_kokkos_mpi_only() [0x6adfd1]

[cschpc:169783] *** End of error message ***


mpirun noticed that process rank 0 with PID 169783 on node exited on signal 11 (Segmentation fault).


Is it caused by a shortage of RAM on the computer?

I cannot make sense of it. If you have any suggestions about this problem, I would be glad to hear them.


Thanks in advance for your help.


Best regards,



Mohammad Ebrahim Izadi,

Department of Chemistry,

Tehran University,

Islamic Republic of Iran,

Phone : +98 – 21 – 61113358

Fax :  +98 – 21 – 66409348

Dr. Axel Kohlmeyer
College of Science & Technology, Temple University, Philadelphia PA, USA
International Centre for Theoretical Physics, Trieste, Italy.