Re: [lammps-users] Improved performance for setup on large numbers of MPI ranks


From: Christopher Knight <cjknight2009@...24...>
Date: Thu, 26 Oct 2017 16:12:38 -0500

Hi Stan,

Yup, I saw the note. I think I have replicate working with triclinic systems; I'm waiting on verification from the large tests I have queued up. Input and source are attached if you want to try them before I commit.

chris


Attachment: in.lj-triclinic
Description: Binary data


Attachment: replicate.cpp
Description: Binary data



On Oct 26, 2017, at 4:05 PM, Moore, Stan <stamoor@...3...> wrote:

Chris, I just updated the PR on GitHub with an example input deck. It looks like an issue with triclinic boxes (orthogonal boxes such as the rhodo benchmark work great).

Stan
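
(For context: the triclinic case changes the geometry of each replica's shift. The sketch below is a hypothetical illustration, not code from the attached replicate.cpp; it uses LAMMPS's standard triclinic box parameters, where the tilt factors xy, xz, yz couple the dimensions, so logic written for purely axis-aligned orthogonal shifts places triclinic replicas at the wrong coordinates.)

// Hypothetical sketch of replica displacement for orthogonal vs. triclinic
// boxes (illustration only; names follow LAMMPS conventions but this is
// not taken from replicate.cpp).
struct Box {
  double xprd, yprd, zprd;   // box edge lengths
  double xy, xz, yz;         // tilt factors (all zero for an orthogonal box)
};

// Displacement of the (i,j,k)-th replica. For an orthogonal box this is
// simply (i*xprd, j*yprd, k*zprd); in the triclinic case the tilt factors
// contribute off-diagonal terms.
void replica_shift(const Box &b, int i, int j, int k, double shift[3]) {
  shift[0] = i*b.xprd + j*b.xy + k*b.xz;
  shift[1] = j*b.yprd + k*b.yz;
  shift[2] = k*b.zprd;
}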

-----Original Message-----
From: Christopher Knight [mailto:cjknight2009@...24...] 
Sent: Thursday, October 26, 2017 1:39 PM
To: Moore, Stan <stamoor@...3...>
Cc: akohlmey@...24...; lammps-users@lists.sourceforge.net
Subject: [EXTERNAL] Re: [lammps-users] Improved performance for setup on large numbers of MPI ranks

Hi Stan,

Yup, sure thing. Just did a quick test with the remotes/origin/comm-nprocs-opt branch and the rhodo input replicated 30x30x30 OK for me on 4,096 ranks. 

When I was writing this, atoms not being assigned correctly was usually because the overlap criterion wasn't robust, so my first guess is that's the issue in your case (assuming it's not simply because I omitted support or testing for something, e.g. triclinic).
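
(To illustrate the kind of overlap criterion meant here, a minimal hypothetical sketch, not the actual replicate.cpp code: a processor keeps atoms from a replica only if the replica's translated bounding box, padded by a small epsilon, overlaps the processor's subdomain. If the padding or the shift is wrong, atoms on a boundary can end up assigned to no processor.)

// Hypothetical overlap test between a processor subdomain and a replica's
// translated bounding box, both axis-aligned and given as lo[3]/hi[3].
// EPS pads the comparison so atoms sitting exactly on a boundary are not
// dropped; a criterion without such padding is the kind of non-robustness
// described above.
static const double EPS = 1.0e-6;

bool boxes_overlap(const double *alo, const double *ahi,
                   const double *blo, const double *bhi) {
  for (int d = 0; d < 3; d++)
    if (ahi[d] + EPS < blo[d] || bhi[d] + EPS < alo[d]) return false;
  return true;
}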

Can you share details of what you ran and/or your inputs (if they're not readily accessible)? I didn't see a description on GitHub of the inputs that crashed.

chris


=========

LAMMPS (23 Oct 2017)
 using 1 OpenMP thread(s) per MPI task
using multi-threaded neighbor list subroutines
Reading data file ...
 orthogonal box = (-27.5 -38.5 -36.3646) to (27.5 38.5 36.3615)
 16 by 16 by 16 MPI processor grid
 reading atoms ...
 32000 atoms
…
Replicating atoms ...
 orthogonal box = (-27.5 -38.5 -36.3646) to (1622.5 2271.5 2145.42)
 16 by 16 by 16 MPI processor grid
Replicate::bounding box image: lo= -1 -1 -1  hi= 1 1 1
Replicate:: buf_all memory allocating       8.02 MB
Replicate: average # of replicas added to proc= 111.458252 out of 27000 (0.412808 %)
 864000000 atoms
 748521000 bonds
 1092609000 angles
 1534383000 dihedrals
 27918000 impropers
…



On Oct 26, 2017, at 12:54 PM, Moore, Stan <stamoor@...3...> wrote:

Chris,

Axel added your code as a PR on LAMMPS GitHub: https://github.com/lammps/lammps/pull/713. I tried out the replicate memory command but got a crash; can you take a look?

Thanks,

Stan