From: Axel Kohlmeyer <akohlmey@...24...>
Date: Mon, 20 Nov 2017 11:02:33 -0500
Dear Axel,
Usually, when there was a LAMMPS error, it was written at the end of both the log file and the screen output. In this case there is nothing additional (this is with various installations of LAMMPS on that system, both mine and the cluster's).
The only reported error is the one I attached in the first email.
I just retested the in.lj benchmark in the /bench/KEPLER folder and it seems to have worked properly.
LAMMPS (11 Aug 2017)
Lattice spacing in x,y,z = 1.6796 1.6796 1.6796
Created orthogonal box = (0 0 0) to (429.977 859.953 429.977)
  4 by 12 by 8 MPI processor grid
Created 134217728 atoms

--------------------------------------------------------------------------
- Using acceleration for lj/cut:
-  with 12 proc(s) per device.
--------------------------------------------------------------------------
Device 0: Tesla P100-PCIE-16GB, 56 CUs, 15/16 GB, 1.3 GHZ (Mixed Precision)
--------------------------------------------------------------------------

Initializing Device and compiling on process 0...Done.
Initializing Device 0 on core 0...Done.
Initializing Device 0 on core 1...Done.
Initializing Device 0 on core 2...Done.
Initializing Device 0 on core 3...Done.
Initializing Device 0 on core 4...Done.
Initializing Device 0 on core 5...Done.
Initializing Device 0 on core 6...Done.
Initializing Device 0 on core 7...Done.
Initializing Device 0 on core 8...Done.
Initializing Device 0 on core 9...Done.
Initializing Device 0 on core 10...Done.
Initializing Device 0 on core 11...Done.

Setting up Verlet run ...
  Unit style    : lj
  Current step  : 0
  Time step     : 0.005
Per MPI rank memory allocation (min/avg/max) = 60.24 | 61.2 | 62.57 Mbytes
Step Temp E_pair E_mol TotEng Press
       0         1.44   -6.7733676            0   -4.6133676   -5.0196694
    1000   0.70386858   -5.6762642            0   -4.6204613   0.70407172
Loop time of 48.8047 on 384 procs for 1000 steps with 134217728 atoms

Performance: 8851.603 tau/day, 20.490 timesteps/s
99.9% CPU use with 384 MPI tasks x no OpenMP threads

MPI task timing breakdown:
Section | min time   | avg time   | max time   |%varavg| %total
---------------------------------------------------------------
Pair    | 15.916     | 17.491     | 18.798     |  11.9 | 35.84
Neigh   | 5.1737e-05 | 7.6974e-05 | 9.656e-05  |   0.0 |  0.00
Comm    | 12.448     | 13.725     | 15.599     |  13.4 | 28.12
Output  | 0.0030501  | 0.011711   | 0.064718   |   8.7 |  0.02
Modify  | 14.618     | 15.124     | 15.605     |   4.1 | 30.99
Other   |            |  2.453     |            |       |  5.03

Nlocal:    349525 ave 350207 max 348928 min
Histogram: 4 16 35 61 96 100 53 16 2 1
Nghost:    88551.5 ave 89028 max 88173 min
Histogram: 7 22 53 83 81 72 37 20 4 5
Neighs:    0 ave 0 max 0 min
Histogram: 384 0 0 0 0 0 0 0 0 0

Total # of neighbors = 0
Ave neighs/atom = 0
Neighbor list builds = 50
Dangerous builds not checked

---------------------------------------------------------------------
      Device Time Info (average):
---------------------------------------------------------------------
Neighbor (CPU):  0.3026 s.
Device Overhead: 0.9870 s.
Average split:   1.0000.
Threads / atom:  4.
Max Mem / Proc:  494.17 MB.
CPU Driver_Time: 0.7367 s.
CPU Idle_Time:   5.4407 s.
---------------------------------------------------------------------
Please see the log.cite file for references relevant to this simulation
Total wall time: 0:00:51
This was the submission script:

#!/bin/bash -l
#
#SBATCH --job-name=phos_4
#SBATCH --time=1:00:00
#SBATCH --nodes=1
#SBATCH --tasks-per-node=12
#SBATCH --cpus-per-task=1
#SBATCH --constraint=gpu
#SBATCH --partition=normal

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export CRAY_CUDA_MPS=1

srun -c $SLURM_CPUS_PER_TASK lmp_mpi -sf gpu -pk gpu 1 -v x 128 -v y 128 -v z 128 -v t 100 < in.lj

mv log.lammps log.10Sep14.lj.gpu.double.128K.16.1

Kind regards,
Riccardo
Below you can find the input file.
units metal

variable ts equal 0.001
variable nequil equal 10000
variable nsteps equal 8000000
variable temp_s equal 300
variable temp_f equal 300
variable trel equal 0.1
variable tscale equal 1
variable npttype string iso
variable pres equal 1.01325
variable prel equal 1.0        # barostat relaxation time

boundary p p p
package gpu 1 neigh no

atom_style full

read_data ${inpfile}           # read in coordinates
include ${fffile}              # force field

kspace_style pppm 1.0e-6
kspace_modify fftbench no

fix npt free npt temp ${temp_s} ${temp_f} ${trel} ${npttype} ${pres} ${pres} ${prel} tchain 5 pchain 5 mtk yes

run ${nsteps}
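(The pair styles themselves are defined in the included ${fffile}. As a rough, hypothetical sketch only, not the poster's actual force field, the kind of hybrid/overlay setup implied by the neighbor-list dump quoted further down in this thread would look something like the following, with placeholder type numbers and coefficients:)

# hypothetical sketch only -- the real styles and coefficients live in ${fffile}
# with "-sf gpu" on the command line, the sub-styles that have /gpu variants
# (e.g. buck, coul/long, lj/cut, lj/cut/coul/long) are swapped for their GPU
# versions, while lennard/mdf has no /gpu variant and stays on the CPU
pair_style hybrid/overlay lj/cut/coul/long 12.0 buck 12.0 lennard/mdf 10.0 12.0
pair_coeff 1 1 lj/cut/coul/long 0.0103 3.188   # placeholder epsilon sigma
pair_coeff 1 2 buck 18003.0 0.205 133.5        # placeholder A rho C
pair_coeff 2 2 lennard/mdf 0.05 3.5            # placeholder A B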
From: Axel Kohlmeyer <akohlmey@...24...>
Sent: 20 November 2017 15:31:13
To: riccardo innocenti
Cc: lammps-users@...396...sourceforge.net
Subject: Re: [lammps-users] issue running lennard/mdf when using gpu acceleration for other pair_style commands

unless you turned them off, batch systems usually capture the standard and error output from the submitted scripts. there *must* be an error message in those somewhere.
but there are a few things in your log file that don't make much sense to me.
have you been able to run any of the benchmark examples correctly on the GPUs?
what exactly is the command line and the input for your simulation?
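(A hedged aside, not part of the original exchange: on a SLURM cluster the captured output normally lands in slurm-<jobid>.out unless the batch script names the files explicitly. A minimal sketch, with hypothetical file name patterns, would be:)

# explicitly name the files that receive stdout and stderr of the job
#SBATCH --output=lammps-%j.out    # hypothetical name pattern, %j = job id
#SBATCH --error=lammps-%j.err     # hypothetical name pattern

The "ERROR: ..." line that LAMMPS prints before calling MPI_Abort() would then show up in one of those two files.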
On Mon, Nov 20, 2017 at 2:58 AM, riccardo innocenti <riccardo-1990@...4463...> wrote:
Dear Axel,
But in this case it does not seem to print an error message.
These are my last lines in log.lammps:
Neighbor list info ...
update every 1 steps, delay 10 steps, check yes
max neighbors/atom: 2000, page size: 100000
master list distance cutoff = 12
ghost atom cutoff = 12
binsize = 6, bins = 25 25 25
7 neighbor lists, perpetual/occasional/extra = 7 0 0
(1) pair coul/long/gpu, perpetual, skip from (6)
attributes: full, newton off
pair build: skip
stencil: none
bin: none
(2) pair lj/cut/gpu, perpetual, skip from (6)
attributes: full, newton off
pair build: skip
stencil: none
bin: none
(3) pair lj/cut/coul/long/gpu, perpetual, skip from (6)
attributes: full, newton off
pair build: skip
stencil: none
bin: none
(4) pair buck/gpu, perpetual, skip from (6)
attributes: full, newton off
pair build: skip
stencil: none
bin: none
(5) pair lennard/mdf, perpetual, skip from (7)
attributes: half, newton off
pair build: skip
stencil: none
bin: none
(6) neighbor class addition, perpetual
attributes: full, newton off
pair build: full/bin
stencil: full/bin/3d
bin: standard
(7) neighbor class addition, perpetual, half/full from (6)
attributes: half, newton off
pair build: halffull/newtoff
stencil: none
bin: none
WARNING: Inconsistent image flags (../domain.cpp:785)
Memory usage per processor = 84.5436 Mbytes
Step Time PotEng Temp Press Volume Pxx Pyy Pzz Cella Cellb Cellc CellAlpha CellBeta CellGamma CPU
0 0 -45205.24 300 -297.94047 3167414.5 -130.57545 -340.03169 -423.21427 146.85936 146.85936 146.85936 90 90 90 0
and then the program just calls MPI_Abort(). There does not seem to be any indication of where the error is.
Kind regards,
Riccardo
From: Axel Kohlmeyer <akohlmey@...24...>
Sent: 19 November 2017 19:07:21
To: riccardo innocenti
Cc: lammps-users@...655....net
Subject: Re: [lammps-users] issue running lennard/mdf when using gpu acceleration for other pair_style commands
On Sun, Nov 19, 2017 at 10:34 AM, riccardo innocenti <riccardo-1990@...4463...> wrote:
Dear Axel,
Thank you for the reply.
I was not trying to accelerate those styles (mdf) on the GPU, only the other ones present in my force field file (e.g. pppm, coul/long, ...).
What part of the output could help identify what the problem is?
wherever the error messages are captured. when LAMMPS calls MPI_Abort(), this will only be after it printed an error message stating why it stopped.
axel.
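(A hedged sketch, not from the original thread: LAMMPS command-line switches can be used to make sure that error message ends up in a file of its own, shown here with a hypothetical input file name:)

# -screen sends the screen output to a file, -log keeps the usual log file,
# -echo screen prints each input command as it is read, which helps pin down
# the last command executed before MPI_Abort() is called
srun lmp_mpi -sf gpu -pk gpu 1 -log log.lammps -screen screen.lammps -echo screen -in in.phosphate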
Kind regards,
Riccardo
From: Axel Kohlmeyer <akohlmey@...24...>
Sent: 19 November 2017 16:19:26
To: riccardo innocenti
Cc: lammps-users@...655....net
Subject: Re: [lammps-users] issue running lennard/mdf when using gpu acceleration for other pair_style commands
On Sun, Nov 19, 2017 at 9:12 AM, riccardo innocenti <riccardo-1990@...4463...> wrote:
Dear All,
I am trying to run some simulations on gpu accelerated nodes (NVIDIA Tesla K20X with 6 GB GDDR5 memory) using the mdf class of potentials. The lammps version I am using is 10Mar17.
When I use the mdf pair_style (it does not matter whether it is the buck, lennard, or lj type), the simulation fails after outputting the energy at step 0, without any error message (in log.lammps). The last lines of my output file look like:
the output below is from your queuing system, except for the first line. so it is not useful at all. consult with your local admin staff to learn how to find the output to the screen.
also, trying to run mdf pair styles on the GPU is a pointless exercise, since those styles are not GPU accelerated, as is clearly evident from the LAMMPS manual.
axel.
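(An aside on the point above, as a hypothetical illustration rather than the poster's actual input: the same distinction can be spelled out by requesting the /gpu variants by name inside a hybrid/overlay pair style instead of relying on "-sf gpu"; lennard/mdf has no /gpu variant and therefore runs on the CPU either way:)

# hypothetical example -- explicit /gpu sub-styles instead of the "-sf gpu" switch
package gpu 1 neigh no
pair_style hybrid/overlay lj/cut/coul/long/gpu 12.0 buck/gpu 12.0 lennard/mdf 10.0 12.0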
Rank 20 [Sun Nov 19 15:05:23 2017] [c0-1c1s1n0] application called MPI_Abort(MPI_COMM_WORLD, 1) - process 20
srun: error: nid01988: task 20: Aborted
srun: Terminating job step 4576432.0
slurmstepd: error: *** STEP 4576432.0 ON nid01987 CANCELLED AT 2017-11-19T15:05:23 ***
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
Initializing Device 0 on core 11...srun: error: nid01992: tasks 60-68,70-71: Killed
srun: error: nid01990: tasks 36-47: Killed
srun: error: nid01994: tasks 84-95: Killed
srun: error: nid01993: tasks 72-83: Killed
srun: error: nid01988: tasks 12-19,21-23: Killed
srun: error: nid01989: tasks 24-35: Killed
srun: error: nid01992: task 69: Killed
srun: error: nid01987: tasks 0-11: Killed
srun: error: nid01991: tasks 48-59: Killed
"slurm-4576432.out" 170L, 6808C
When I run the simulation without GPU acceleration, it runs without any issues.
I am not sure what the error could be. Does anyone have any suggestions?
Kind regards,
Riccardo
--
Dr. Axel Kohlmeyer akohlmey@...24... http://goo.gl/1wk0
College of Science & Technology, Temple University, Philadelphia PA, USA
International Centre for Theoretical Physics, Trieste. Italy.
--
Dr. Axel Kohlmeyer akohlmey@...24... http://goo.gl/1wk0
College of Science & Technology, Temple University, Philadelphia PA, USA
International Centre for Theoretical Physics, Trieste. Italy.
--
Dr. Axel Kohlmeyer akohlmey@...92...... http://goo.gl/1wk0
College of Science & Technology, Temple University, Philadelphia PA, USA
International Centre for Theoretical Physics, Trieste. Italy.