Re: [lammps-users] GPU compiled but binary sleeps...

From: Axel Kohlmeyer <akohlmey@...24...>
Date: Wed, 2 Aug 2017 19:40:10 -0400

On Wed, Aug 2, 2017 at 9:24 AM, Meij, Henk <hmeij@...1881...> wrote:

I should mention that I compiled for the K20 using CUDA v5.0 ... maybe that is too old?

Hard to say. What is the output of nvc_get_devices?
Can you run any other GPU code on this machine?
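One quick way to answer the second question without LAMMPS in the loop is a minimal Python sketch like the one below. It is an illustration, not part of LAMMPS or lib/gpu: it loads the NVIDIA driver library through ctypes and calls the CUDA driver API functions cuInit, cuDriverGetVersion and cuDeviceGetCount. It assumes the driver library is installed as libcuda.so.1 on Linux, and the helper name probe_cuda_driver is hypothetical. A non-zero status or a device count of 0 would point at the driver or device setup rather than at the LAMMPS build. The reported driver version is the CUDA version the installed driver supports, which is also relevant to the CUDA 5.0 question.

```python
import ctypes
import ctypes.util


def probe_cuda_driver():
    """Query the CUDA driver API directly (hypothetical helper, Linux assumed).

    Returns None if no driver library can be loaded, otherwise a dict with the
    driver's supported CUDA version, the last CUresult status and the device count.
    """
    # Assumption: the NVIDIA driver ships the library as libcuda.so.1.
    name = ctypes.util.find_library("cuda") or "libcuda.so.1"
    try:
        lib = ctypes.CDLL(name)
    except OSError:
        return None  # no NVIDIA driver library visible to this process

    status = lib.cuInit(0)  # 0 == CUDA_SUCCESS
    version = ctypes.c_int(0)
    lib.cuDriverGetVersion(ctypes.byref(version))  # encoded as 1000*major + 10*minor
    count = ctypes.c_int(0)
    if status == 0:
        status = lib.cuDeviceGetCount(ctypes.byref(count))
    return {
        "driver_cuda_version": f"{version.value // 1000}.{(version.value % 1000) // 10}",
        "status": status,
        "device_count": count.value,
    }


if __name__ == "__main__":
    print(probe_cuda_driver())
```

If this reports a healthy status and the expected number of devices, the driver side is likely fine and the hang is more likely in the LAMMPS GPU library setup.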



I tried this recipe and got through all of it without errors, ending up with lmp_gnu and lmp_gpu.

The CPU run runs fine, but the GPU run disappears into a nanosleep loop again. Weird.


Any ideas?


From: Meij, Henk
Sent: Monday, July 31, 2017 11:09:27 AM
Subject: GPU compiled but binary sleeps...

Hi all, I compiled the 31Mar17 release with g++, some packages, and libjpeg; the sequence is below.

lmp_serial and lmp_mpi (OpenMPI) both compile and run the colloid example successfully.

Then I compile lmp_auto (for some reason, editing lib/gpu/Makefile[auto|linux] has no effect) with CUDA_HOME etc. all set. There are no compilation errors, the build finishes, and lmp_auto is created (I rename it to lmp_gpu_double):

 cd /tmp/lammps/lammps-31Mar17/src
 make yes-gpu; make yes-colloid; make yes-class2; make yes-kspace; make yes-misc; make yes-molecule

 make clean
 ./ -v -j 2 -p colloid class2 kspace misc molecule gpu -gpu mode=double arch=35 -o gpu_double -a lib-gpu file clean mpi

But when I run the GPU colloid example via the scheduler, LAMMPS starts on the allocated GPU and hangs. Here is the scheduler invocation (GPUIDX does nothing right now; it is a toggle flag for CPU-only or GPU-only runs, but I'm running the vanilla in.colloid):

executing /share/apps/CENTOS6/openmpi/1.8.4/bin/mpirun -x LD_LIBRARY_PATH -machinefile /home/hmeij/.lsbatch/mpi_machines.855573 -np 1 /share/apps/CENTOS6/lammps/31Mar17/lmp_gpu_double -suffix gpu -var GPUIDX 1 -in in.colloid -l out.colloid
LAMMPS (31 Mar 2017)   <---- output from job

strace shows it looping in nanosleep calls. (And what's with the 284g virtual footprint? The GPU process it launches uses only 61 MB on the GPU.)
20054 hmeij     20   0 23752 2340 1148 S  0.0  0.0   0:00.02 res
20058 hmeij     20   0  103m 1248 1044 S  0.0  0.0   0:00.00 1501267348.8555
20061 hmeij     20   0  103m 1316 1100 S  0.0  0.0   0:00.00 1501267348.8555
20178 hmeij     20   0  104m 1320 1104 S  0.0  0.0   0:00.00 openmpi-mpirun-
20276 hmeij     20   0  139m 3544 2428 S  0.0  0.0   0:00.08 mpirun
20278 hmeij     20   0  284g  43m  38m S  0.0  0.0   0:13.35 lmp_gpu_double
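To compare the 284g virtual size with the 43m resident size shown for lmp_gpu_double above, here is a small Linux-only Python sketch (the helper name vm_sizes is hypothetical) that reads VmSize (total virtual address space) and VmRSS (resident memory) from /proc/<pid>/status. The virtual size only counts reserved address space. CUDA's unified virtual addressing on 64-bit Linux is commonly reported to reserve very large virtual ranges, so a huge virtual figure next to a small resident one is probably not unusual for a CUDA process on its own. That reading is an assumption to check, not a diagnosis of the hang.

```python
import os


def vm_sizes(pid):
    """Return (VmSize_kB, VmRSS_kB) for `pid` from /proc/<pid>/status (Linux only)."""
    sizes = {}
    with open(f"/proc/{pid}/status") as fh:
        for line in fh:
            key, _, rest = line.partition(":")
            if key in ("VmSize", "VmRSS"):
                sizes[key] = int(rest.split()[0])  # the kernel reports these fields in kB
    return sizes["VmSize"], sizes["VmRSS"]


if __name__ == "__main__":
    # Usage: substitute the PID of the hung process (20278 above) for os.getpid().
    virt_kb, rss_kb = vm_sizes(os.getpid())
    print(f"virtual: {virt_kb / 2**20:.2f} GiB, resident: {rss_kb / 2**10:.1f} MiB")
```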

[root@...7028... ~]# strace -p 20278
Process 20278 attached - interrupt to quit
restart_syscall(<... resuming interrupted call ...>) = 0
ioctl(16, 0xc020462a, 0x7fffffffa7b0)   = 0
nanosleep({10, 0}, NULL)                = 0
ioctl(16, 0xc020462a, 0x7fffffffa7b0)   = 0
nanosleep({10, 0}, ^C <unfinished ...>
Process 20278 detached
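The strace lines can be decoded a little further. The sketch below is Linux-only, and its helper names fd_target and decode_ioctl are hypothetical. It resolves a file descriptor through /proc/<pid>/fd and splits an ioctl request number into the fields of the kernel's generic ioctl encoding (asm-generic/ioctl.h, as used on x86: 2 direction bits, 14 size bits, 8 type bits, 8 number bits). For 0xc020462a this yields a read/write request with a 32-byte argument, type 'F' and number 42. The guess that a type-'F' request on fd 16 goes to the NVIDIA driver is an assumption, which `ls -l /proc/20278/fd/16` would confirm or rule out.

```python
import os
import tempfile

# Field widths of the generic Linux ioctl encoding (asm-generic/ioctl.h, used on x86).
_NR_BITS, _TYPE_BITS, _SIZE_BITS = 8, 8, 14
_NR_SHIFT = 0
_TYPE_SHIFT = _NR_SHIFT + _NR_BITS      # 8
_SIZE_SHIFT = _TYPE_SHIFT + _TYPE_BITS  # 16
_DIR_SHIFT = _SIZE_SHIFT + _SIZE_BITS   # 30
_IOC_WRITE, _IOC_READ = 1, 2


def decode_ioctl(request):
    """Split an ioctl request number into its direction, size, type and number fields."""
    direction = (request >> _DIR_SHIFT) & 0x3
    names = [n for bit, n in ((_IOC_READ, "read"), (_IOC_WRITE, "write")) if direction & bit]
    return {
        "dir": "|".join(names) or "none",
        "size": (request >> _SIZE_SHIFT) & ((1 << _SIZE_BITS) - 1),
        "type": chr((request >> _TYPE_SHIFT) & ((1 << _TYPE_BITS) - 1)),
        "nr": (request >> _NR_SHIFT) & ((1 << _NR_BITS) - 1),
    }


def fd_target(pid, fd):
    """Return the path that file descriptor `fd` of process `pid` refers to (via /proc)."""
    return os.readlink(f"/proc/{pid}/fd/{fd}")


if __name__ == "__main__":
    print(decode_ioctl(0xc020462a))  # → {'dir': 'read|write', 'size': 32, 'type': 'F', 'nr': 42}
    # For the hung run one would call fd_target(20278, 16); here we inspect our own temp file.
    with tempfile.NamedTemporaryFile() as tmp:
        print(fd_target(os.getpid(), tmp.fileno()))
```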

Any pointers as to what may be the cause? Thanks,



Dr. Axel Kohlmeyer  akohlmey@...12...24...
College of Science & Technology, Temple University, Philadelphia PA, USA
International Centre for Theoretical Physics, Trieste, Italy.