
Re: [lammps-users] Accelerate the simulation-kspace as the bottleneck


From: Azade Yazdan Yar <azade.yazdanyar@...24...>
Date: Sat, 24 Jun 2017 18:36:24 +0200

Hi, 

The cutoff for Coulomb is 12 A (I have Buckingham and LJ as the short-range interactions, with cutoffs of 12 and 10 A, respectively).
At the beginning, I used ewald with an accuracy of 10e-6. Then I switched to pppm (with the same accuracy) after I saw the -DFFT_SINGLE option.
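For reference, a minimal sketch of the pair/kspace setup described above could look like the lines below; the hybrid/overlay combination, the slab factor, and the exact accuracy value are my assumptions, not copied from my actual input:

  # short-range styles with the cutoffs mentioned above (12 and 10 A);
  # pair_coeff lines are omitted
  pair_style      hybrid/overlay buck/coul/long 12.0 lj/cut 10.0
  # long-range solver; replace 1.0e-6 with the accuracy actually used
  kspace_style    pppm 1.0e-6
  # slab correction for the vacuum layer (requires "boundary p p f");
  # 3.0 is the commonly used volume scaling factor
  kspace_modify   slab 3.0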
I am using the Intel compiler and Intel MPI to compile LAMMPS. If I understand correctly, are you suggesting that I recompile it with OpenMPI?

I have 8 atom types in the solid slab, of which 2 are Ti, 4 are O, and 2 are H. The two Ti types, for example, have the same pair interactions with the other atom types in my system. Is there a way to tell LAMMPS to do such calculations once instead of twice? The charges of the two Ti types are different, which is why I define two atom types. These calculations, though, are not what is slowing down the simulation after all.
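To illustrate what I mean, the two Ti types end up with duplicated pair_coeff entries, roughly like the hypothetical lines below (types 1 and 2 stand for the two Ti types, type 3 for one of the O types; the coefficients are made up):

  # hypothetical example: both Ti types use identical Buckingham
  # parameters for their interaction with the same O type
  pair_coeff  1 3 buck/coul/long 11782.8 0.234 30.2   # Ti(1)-O
  pair_coeff  2 3 buck/coul/long 11782.8 0.234 30.2   # Ti(2)-O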

I also defined a group containing two of the atom types in the solid slab. I excluded this group in 'neigh_modify' and applied the thermostat etc. only to the remaining atom types, as I thought this would also reduce the number of calculations.
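In input-script terms, what I set up looks roughly like the following (the group names, type numbers, and thermostat settings are placeholders, not my actual input):

  # group the two slab atom types that are kept rigid
  group           slab_core type 1 2
  # skip neighbor-list entries between atom pairs within that group
  neigh_modify    exclude group slab_core slab_core
  # thermostat only the remaining atoms
  group           mobile subtract all slab_core
  fix             1 mobile nvt temp 300.0 300.0 100.0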

Thanks a lot.

Sincerely,
Azade

On Sat, Jun 24, 2017 at 4:27 PM, Axel Kohlmeyer <akohlmey@...24...> wrote:


On Sat, Jun 24, 2017 at 6:15 AM, Azade Yazdan Yar <azade.yazdanyar@...24...> wrote:
Hi,

I have a system which consists of a solid slab, water and a vacuum layer. I am using pppm and the 'slab' option.
I used to use DL_POLY for my system, but due to its slow performance and poor scalability for my specific system, I decided to see how much better LAMMPS could do. I reduced the total number of atoms in my system to one quarter (24,000 before, 6,800 now), as I realized that my system had been unnecessarily large.
After running some tests with LAMMPS and 'ewald' as the kspace style, this was the breakdown of the LAMMPS timings:

​what is your pair style (coulomb) cutoff? 

have you tried pppm instead of ewald?

have you tried using a mix of MPI and OpenMP?
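for example, something along these lines, assuming LAMMPS was built with the USER-OMP package (the thread count is just an example):

  # use the USER-OMP styles with, e.g., 4 OpenMP threads per MPI task
  package         omp 4
  suffix          omp
  # or equivalently from the command line: lmp -sf omp -pk omp 4 -in in.script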

in general, there is a lower limit as to how many atoms per processor you can have until there is no more speedup. often this is in the range of a few 100s of atoms. your system is very small, so you cannot expect spectacular scaling here.

axel. 

 

Loop time of 1550.53 on 24 procs for 5000 steps with 6575 atoms

Performance: 0.195 ns/day, 123.058 hours/ns, 3.225 timesteps/s
99.6% CPU use with 24 MPI tasks x no OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 0.0082662  | 34.619     | 86.531     | 610.4 |  2.23
Bond    | 0.0041995  | 0.014318   | 0.040231   |  10.9 |  0.00
Kspace  | 1433       | 1485.3     | 1520.9     |  94.2 | 95.80
Neigh   | 0.947      | 0.96571    | 1.0072     |   2.1 |  0.06
Comm    | 0.01577    | 1.6679     | 2.531      |  67.3 |  0.11
Output  | 25.249     | 25.249     | 25.261     |   0.0 |  1.63
Modify  | 1.8545     | 2.3485     | 3.1424     |  28.0 |  0.15
Other   |            | 0.3262     |            |       |  0.02

Nlocal:    273.958 ave 759 max 0 min
Histogram: 12 0 0 0 0 6 2 0 0 4
Nghost:    7101.79 ave 15694 max 0 min
Histogram: 4 4 4 0 0 0 8 0 0 4
Neighs:    141151 ave 404803 max 0 min
Histogram: 12 0 0 2 2 0 2 2 0 4

As you can see, kspace is the bottleneck, so I read that I could instead use pppm and -DFFT_SINGLE to accelerate the simulation.
I recompiled LAMMPS, and here is the breakdown again:

Loop time of 307.837 on 24 procs for 5000 steps with 6575 atoms

Performance: 0.982 ns/day, 24.432 hours/ns, 16.242 timesteps/s
97.2% CPU use with 24 MPI tasks x no OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 0.012733   | 37.727     | 112.62     | 691.2 | 12.26
Bond    | 0.0044436  | 0.013352   | 0.03664    |   9.4 |  0.00
Kspace  | 188.99     | 264.28     | 303.19     | 263.7 | 85.85
Neigh   | 1.2895     | 1.3094     | 1.3538     |   1.9 |  0.43
Comm    | 0.014927   | 1.7641     | 2.6289     |  67.4 |  0.57
Output  | 0.10792    | 0.10805    | 0.10915    |   0.1 |  0.04
Modify  | 1.8796     | 2.3371     | 3.041      |  25.7 |  0.76
Other   |            | 0.2954     |            |       |  0.10

Nlocal:    273.958 ave 729 max 0 min
Histogram: 12 0 0 0 0 1 6 1 0 4
Nghost:    7108.12 ave 15831 max 0 min
Histogram: 4 4 4 0 0 0 8 0 0 4
Neighs:    77342.3 ave 238561 max 0 min
Histogram: 12 0 0 4 0 1 1 2 1 3

So the speed has increased in general. I tried different numbers of nodes; using 4 nodes gives me a parallel efficiency of 50%, which I can afford, and with 4 nodes I will be able to do 2 ns/day. Before, with DL_POLY and the larger system, I used to get 0.5 ns/day, and that was in serial.

So, as I see it, even though my system is smaller now, the performance has not improved as much as I had hoped. Can anyone give me some ideas on whether there are extra things I can try? This speed is still computationally too expensive for my goal. I will be happy to give you more information on the system details, but at this point I am not sure which details are of interest.

Sincerely,
Azade





--
Dr. Axel Kohlmeyer  akohlmey@...24...  http://goo.gl/1wk0
College of Science & Technology, Temple University, Philadelphia PA, USA
International Centre for Theoretical Physics, Trieste. Italy.