[lammps-users] GPU compiled but binary sleeps...

From: "Meij, Henk" <hmeij@...1881...>
Date: Mon, 31 Jul 2017 15:09:27 +0000

Hi all, I compiled LAMMPS 31Mar17 with g++, some packages, and libjpeg; the build sequence is below.


lmp_serial and lmp_mpi (openmpi) compile and execute the colloid example successfully.


Then I compiled lmp_auto with CUDA_HOME etc. all set (for some reason, editing lib/gpu/Makefile[auto|linux] has no effect). There were no compilation errors; the build finished and lmp_auto was created, which I renamed to lmp_gpu_double.


cd /tmp/lammps/lammps-31Mar17/src
make yes-gpu; make yes-colloid; make yes-class2; make yes-kspace; make yes-misc; make yes-molecule

make clean
./Make.py -v -j 2 -p colloid class2 kspace misc molecule gpu -gpu mode=double arch=35 -o gpu_double -a lib-gpu file clean mpi


But when I run the GPU colloid example via the scheduler, LAMMPS starts on the allocated GPU and hangs. Here is the scheduler invocation (GPUIDX does nothing right now; it is a toggle flag for CPU-only or GPU-only runs, but I'm running the vanilla in.colloid):


executing /share/apps/CENTOS6/openmpi/1.8.4/bin/mpirun -x LD_LIBRARY_PATH -machinefile /home/hmeij/.lsbatch/mpi_machines.855573 -np 1 /share/apps/CENTOS6/lammps/31Mar17/lmp_gpu_double -suffix gpu -var GPUIDX 1 -in in.colloid -l out.colloid
LAMMPS (31 Mar 2017)   <---- output from job

strace reveals it looping in nanosleep calls. (And what's with the 284g VIRT footprint? The process launched on the GPU takes only 61 MB there.)
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
20054 hmeij     20   0 23752 2340 1148 S  0.0  0.0   0:00.02 res
20058 hmeij     20   0  103m 1248 1044 S  0.0  0.0   0:00.00 1501267348.8555
20061 hmeij     20   0  103m 1316 1100 S  0.0  0.0   0:00.00 1501267348.8555
20178 hmeij     20   0  104m 1320 1104 S  0.0  0.0   0:00.00 openmpi-mpirun-
20276 hmeij     20   0  139m 3544 2428 S  0.0  0.0   0:00.08 mpirun
20278 hmeij     20   0  284g  43m  38m S  0.0  0.0   0:13.35 lmp_gpu_double

[root@...7028... ~]# strace -p 20278
Process 20278 attached - interrupt to quit
restart_syscall(<... resuming interrupted call ...>) = 0
ioctl(16, 0xc020462a, 0x7fffffffa7b0)   = 0
nanosleep({10, 0}, NULL)                = 0
ioctl(16, 0xc020462a, 0x7fffffffa7b0)   = 0
nanosleep({10, 0}, ^C <unfinished ...>
Process 20278 detached
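
In case it helps anyone looking at a longer trace, here is a minimal sketch of how the strace output could be summarized by syscall name, to confirm the process really is just alternating between ioctl and nanosleep. It assumes the trace text has been saved to a file; the function name summarize_strace and the file name trace.txt are hypothetical, not part of LAMMPS or strace.

```shell
# Count how often each syscall appears in strace text read from stdin,
# most frequent first. Lines that do not begin with "name(" are skipped
# (for example the "Process ... attached" banner).
summarize_strace() {
  awk -F'[(]' '/^[a-z_0-9]+\(/ { count[$1]++ }
       END { for (name in count) printf "%d %s\n", count[name], name }' |
    sort -k1,1nr -k2,2
}
```

Usage would be something like `summarize_strace < trace.txt`, where trace.txt holds the text captured from `strace -p <pid>`.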

Any pointers as to what may be the cause? Thanks,

-Henk