
Re: [lammps-users] issue running lennard/mdf when using gpu acceleration for other pair_style commands


From: Axel Kohlmeyer <akohlmey@...24...>
Date: Mon, 20 Nov 2017 11:02:33 -0500

please always respond to the mailing list and not only to individual people.

On Mon, Nov 20, 2017 at 10:49 AM, riccardo innocenti <riccardo-1990@...4463...> wrote:

Dear Axel, 


Usually when there was a LAMMPS error it was written at the end of both the log file and the screen output. In this case there is nothing additional (this is with various installations of LAMMPS on that system - both my own and the cluster installation).

it doesn't matter what happens usually. you've asked for advice, and i have given you the correct advice. if you don't want to follow it, that is up to you. but of course, that will also mean that i won't make any more effort to help you. as i've said before, unless you've disabled them, there *must* be some additional output files containing stdout and stderr captured by the batch system, and they will contain the information that is needed. if you don't have that information, i cannot help you.

axel.
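
On a SLURM cluster, unless the submission script redirects them elsewhere, those captured streams normally land in a slurm-<jobid>.out file in the directory the job was submitted from. A minimal sketch of how to pull the relevant line out of it, written with a shell glob rather than a specific job id:

ls -lt slurm-*.out | head -5            # newest job output files first
grep -iEn "error|abort" slurm-*.out     # the LAMMPS error line printed just before MPI_Abort()
tail -n 50 slurm-*.out                  # or simply inspect the end of the newest file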



 


The only reported error is the one I attached in the first email. 


I just retested the in.lj benchmark in the /bench/KEPLER folder and it seems to have worked properly.



LAMMPS (11 Aug 2017)
Lattice spacing in x,y,z = 1.6796 1.6796 1.6796
Created orthogonal box = (0 0 0) to (429.977 859.953 429.977)
  4 by 12 by 8 MPI processor grid
Created 134217728 atoms

--------------------------------------------------------------------------
- Using acceleration for lj/cut:
-  with 12 proc(s) per device.
--------------------------------------------------------------------------
Device 0: Tesla P100-PCIE-16GB, 56 CUs, 15/16 GB, 1.3 GHZ (Mixed Precision)
--------------------------------------------------------------------------

Initializing Device and compiling on process 0...Done.
Initializing Device 0 on core 0...Done.
Initializing Device 0 on core 1...Done.
Initializing Device 0 on core 2...Done.
Initializing Device 0 on core 3...Done.
Initializing Device 0 on core 4...Done.
Initializing Device 0 on core 5...Done.
Initializing Device 0 on core 6...Done.
Initializing Device 0 on core 7...Done.
Initializing Device 0 on core 8...Done.
Initializing Device 0 on core 9...Done.
Initializing Device 0 on core 10...Done.
Initializing Device 0 on core 11...Done.

Setting up Verlet run ...
  Unit style    : lj
  Current step  : 0
  Time step     : 0.005
Per MPI rank memory allocation (min/avg/max) = 60.24 | 61.2 | 62.57 Mbytes
Step Temp E_pair E_mol TotEng Press
       0         1.44   -6.7733676            0   -4.6133676   -5.0196694
    1000   0.70386858   -5.6762642            0   -4.6204613   0.70407172
Loop time of 48.8047 on 384 procs for 1000 steps with 134217728 atoms

Performance: 8851.603 tau/day, 20.490 timesteps/s
99.9% CPU use with 384 MPI tasks x no OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 15.916     | 17.491     | 18.798     |  11.9 | 35.84
Neigh   | 5.1737e-05 | 7.6974e-05 | 9.656e-05  |   0.0 |  0.00
Comm    | 12.448     | 13.725     | 15.599     |  13.4 | 28.12
Output  | 0.0030501  | 0.011711   | 0.064718   |   8.7 |  0.02
Modify  | 14.618     | 15.124     | 15.605     |   4.1 | 30.99
Other   |            | 2.453      |            |       |  5.03

Nlocal:    349525 ave 350207 max 348928 min
Histogram: 4 16 35 61 96 100 53 16 2 1
Nghost:    88551.5 ave 89028 max 88173 min
Histogram: 7 22 53 83 81 72 37 20 4 5
Neighs:    0 ave 0 max 0 min
Histogram: 384 0 0 0 0 0 0 0 0 0

Total # of neighbors = 0
Ave neighs/atom = 0
Neighbor list builds = 50
Dangerous builds not checked


---------------------------------------------------------------------
      Device Time Info (average):
---------------------------------------------------------------------
Neighbor (CPU):  0.3026 s.
Device Overhead: 0.9870 s.
Average split:   1.0000.
Threads / atom:  4.
Max Mem / Proc:  494.17 MB.
CPU Driver_Time: 0.7367 s.
CPU Idle_Time:   5.4407 s.
---------------------------------------------------------------------


Please see the log.cite file for references relevant to this simulation

Total wall time: 0:00:51


This was the submission script:



#!/bin/bash -l
#
#SBATCH --job-name=phos_4
#SBATCH --time=1:00:00
#SBATCH --nodes=1
#SBATCH --tasks-per-node=12
#SBATCH --cpus-per-task=1
#SBATCH --constraint=gpu
#SBATCH --partition=normal

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export CRAY_CUDA_MPS=1

srun -c $SLURM_CPUS_PER_TASK lmp_mpi -sf gpu -pk gpu 1  -v x 128 -v y 128 -v z 128 -v t 100 < in.lj


mv log.lammps log.10Sep14.lj.gpu.double.128K.16.1
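
If the error text keeps getting lost, explicit SLURM output directives pin down where it goes. A minimal sketch using standard sbatch options (the file names are arbitrary; %j expands to the job id):

#SBATCH --output=lammps-%j.out    # captures the screen output, including any LAMMPS ERROR line
#SBATCH --error=lammps-%j.err     # captures stderr (srun/MPI abort messages)

Without these, sbatch defaults to a combined slurm-<jobid>.out in the submission directory, which is the file referred to further down in this thread.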

Kind regards,

Riccardo


Below you can find the input file. 


units metal


variable ts       equal    0.001
variable nequil   equal    10000
variable nsteps   equal  8000000       # number of steps passed to the run command below


variable temp_s   equal      300       # thermostat start temperature
variable temp_f   equal      300       # thermostat final temperature
variable trel     equal        0.1     # thermostat relaxation time
variable tscale   equal        1
variable npttype  string     iso       # pressure coupling (iso/aniso/tri)
variable pres     equal        1.01325 # target pressure
variable prel     equal        1.0     # barostat relaxation time

boundary p p p
package gpu 1 neigh no

atom_style full

read_data ${inpfile}   # read in coordinates

include ${fffile}  #  force field

kspace_style pppm 1.0e-6
kspace_modify fftbench no

fix npt free npt temp ${temp_s} ${temp_f} ${trel} ${npttype} ${pres} ${pres} ${prel} tchain 5 pchain 5 mtk yes


run ${nsteps}
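
The force-field file itself is not shown here, but the neighbor-list summary quoted further down in this thread points to a hybrid/overlay pairing of coul/long, lj/cut, lj/cut/coul/long, buck and lennard/mdf. Instead of letting -sf gpu suffix everything, the /gpu suffix can be written out only on the sub-styles that have GPU versions; a sketch with placeholder cutoffs (the 10.0/12.0 values are illustrative, not the real force-field settings):

pair_style hybrid/overlay coul/long/gpu 12.0 lj/cut/gpu 12.0 &
                          lj/cut/coul/long/gpu 12.0 buck/gpu 12.0 &
                          lennard/mdf 10.0 12.0
kspace_style pppm/gpu 1.0e-6      # optional: GPU-accelerated PPPM without the global suffix

With explicit suffixes the -sf gpu switch can be dropped from the srun line, while the package gpu 1 neigh no command (or -pk gpu 1) is still needed to configure the device.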



From: Axel Kohlmeyer <akohlmey@...24...>
Sent: 20 November 2017 15:31:13

To: riccardo innocenti
Cc: lammps-users@...396...sourceforge.net
Subject: Re: [lammps-users] issue running lennard/mdf when using gpu acceleration for other pair_style commands
 
unless you turned them off, batch systems usually capture the standard output and standard error of the submitted scripts. there *must* be an error message in those somewhere.

but there are a few things in your log file that don't make much sense to me.
have you been able to run any of the benchmark examples correctly on the GPUs?
what exactly is the command line and the input for your simulation?

On Mon, Nov 20, 2017 at 2:58 AM, riccardo innocenti <riccardo-1990@...4463...> wrote:

Dear Axel,


But in this case it does not seem to print an error message. 


These are my last lines in log.lammps:


Neighbor list info ...
  update every 1 steps, delay 10 steps, check yes
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 12
  ghost atom cutoff = 12
  binsize = 6, bins = 25 25 25
  7 neighbor lists, perpetual/occasional/extra = 7 0 0
  (1) pair coul/long/gpu, perpetual, skip from (6)
      attributes: full, newton off
      pair build: skip
      stencil: none
      bin: none
  (2) pair lj/cut/gpu, perpetual, skip from (6)
      attributes: full, newton off
      pair build: skip
      stencil: none
      bin: none
  (3) pair lj/cut/coul/long/gpu, perpetual, skip from (6)
      attributes: full, newton off
      pair build: skip
      stencil: none
      bin: none
  (4) pair buck/gpu, perpetual, skip from (6)
      attributes: full, newton off
      pair build: skip
      stencil: none
      bin: none
  (5) pair lennard/mdf, perpetual, skip from (7)
      attributes: half, newton off
      pair build: skip
      stencil: none
      bin: none
  (6) neighbor class addition, perpetual
      attributes: full, newton off
      pair build: full/bin
      stencil: full/bin/3d
      bin: standard
  (7) neighbor class addition, perpetual, half/full from (6)
      attributes: half, newton off
      pair build: halffull/newtoff
      stencil: none
      bin: none
WARNING: Inconsistent image flags (../domain.cpp:785)
Memory usage per processor = 84.5436 Mbytes
Step Time PotEng Temp Press Volume Pxx Pyy Pzz Cella Cellb Cellc CellAlpha CellBeta CellGamma CPU
       0            0    -45205.24          300   -297.94047    3167414.5   -130.57545   -340.03169   -423.21427    146.85936    146.85936    146.85936           90           90           90            0


and then the program just calls MPI_Abort(). There does not seem to be any indication of what the error is.


Kind regards,

Riccardo


From: Axel Kohlmeyer <akohlmey@...24...>
Sent: 19 November 2017 19:07:21

To: riccardo innocenti
Cc: lammps-users@...655....net
Subject: Re: [lammps-users] issue running lennard/mdf when using gpu acceleration for other pair_style commands
 


On Sun, Nov 19, 2017 at 10:34 AM, riccardo innocenti <riccardo-1990@...4463...> wrote:

Dear Axel,


Thank you for the reply.


I was not interested in accelerating those styles (mdf) on the GPU, but rather the other ones present in my force field file (e.g. pppm, coul/long, ...).


what part of the output could help identify what the problem is?

wherever the error messages are captured. when LAMMPS calls MPI_Abort(), this only happens after it has printed an error message stating why it stopped.

axel.​
 


Kind regards,

Riccardo


From: Axel Kohlmeyer <akohlmey@...24...>
Sent: 19 November 2017 16:19:26
To: riccardo innocenti
Cc: lammps-users@...655....net
Subject: Re: [lammps-users] issue running lennard/mdf when using gpu acceleration for other pair_style commands
 


On Sun, Nov 19, 2017 at 9:12 AM, riccardo innocenti <riccardo-1990@...4463...> wrote:

Dear All,


I am trying to run some simulations on GPU-accelerated nodes (NVIDIA Tesla K20X with 6 GB GDDR5 memory) using the mdf class of potentials. The LAMMPS version I am using is 10Mar17.


When I use the mdf pair_style (it does not matter whether it is the buck, lennard or lj type), the simulation fails after outputting the energy at step 0 without any error message (in log.lammps). The last lines of my output file look like:


the output below is from your queuing system, except for the first line, so it is not useful at all. consult with your local admin staff to learn how to find the output to the screen.

also, trying to run mdf pair styles on the GPU is a pointless exercise, since those styles are not GPU accelerated, as is clearly evident from the LAMMPS manual.

axel.
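
A quick way to confirm what a given build supports is to list the styles compiled into the binary; newer LAMMPS binaries print them with the -h command-line switch (the exact output depends on which packages were installed):

lmp_mpi -h | grep -i mdf      # the mdf pair styles appear without any /gpu counterpart
lmp_mpi -h | grep -i gpu      # pair (and kspace) styles that do have a GPU variant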

 


Rank 20 [Sun Nov 19 15:05:23 2017] [c0-1c1s1n0] application called MPI_Abort(MPI_COMM_WORLD, 1) - process 20
srun: error: nid01988: task 20: Aborted
srun: Terminating job step 4576432.0
slurmstepd: error: *** STEP 4576432.0 ON nid01987 CANCELLED AT 2017-11-19T15:05:23 ***
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
Initializing Device 0 on core 11...srun: error: nid01992: tasks 60-68,70-71: Killed
srun: error: nid01990: tasks 36-47: Killed
srun: error: nid01994: tasks 84-95: Killed
srun: error: nid01993: tasks 72-83: Killed
srun: error: nid01988: tasks 12-19,21-23: Killed
srun: error: nid01989: tasks 24-35: Killed
srun: error: nid01992: task 69: Killed
srun: error: nid01987: tasks 0-11: Killed
srun: error: nid01991: tasks 48-59: Killed
"slurm-4576432.out" 170L, 6808C                  


When I run the simulation without GPU acceleration, it runs without any issues.


I am not sure what the error could be. Does anyone have any suggestions?


Kind regards,

Riccardo



















--
Dr. Axel Kohlmeyer  akohlmey@...92......  http://goo.gl/1wk0
College of Science & Technology, Temple University, Philadelphia PA, USA
International Centre for Theoretical Physics, Trieste. Italy.