
From: Axel Kohlmeyer <akohlmey@...24...>
Subject: Re: [lammps-users] Fix Addforce Slow Down the Computational Efficiency
Date: Tue, 17 Apr 2018 16:52:46 -0400

On Tue, Apr 17, 2018 at 4:47 PM, Wei Peng <pengwrpi2@...24...> wrote:
> Dear Axel,
>
> Thank you for your follow-up! I tried the method you suggested, but the
> performance got worse. I am attaching the input file here; can you tell
> me which part was wrong? Thank you!

there is no usable input here anywhere. i cannot say anything.
perhaps you are following a red herring and have some other
performance issue that is somewhere else.

axel.

>
> group np1 id 1:429:1
>
> # group atoms into nanoparticle 1
>
> compute coord1 np1 property/atom xu yu zu
> compute c1 np1 com
> variable cond1 atom "(c_coord1[1] - c_c1[1]) * 1 > 0.0"
> group half1 dynamic np1 var cond1 every 1
> run 1
> group half1 static
>
> # find the half of the 1st nanoparticle that the force will be applied to
>
> variable famp equal "0.1"
>
> variable dirx1 atom "c_coord1[1]-c_c1[1]"
> variable diry1 atom "c_coord1[2]-c_c1[2]"
> variable dirz1 atom "c_coord1[3]-c_c1[3]"
> variable diramp1 atom "sqrt(v_dirx1^2 + v_diry1^2 + v_dirz1^2)"
> variable fx1 atom "gmask(half1)*v_famp*v_dirx1/v_diramp1"
> variable fy1 atom "gmask(half1)*v_famp*v_diry1/v_diramp1"
> variable fz1 atom "gmask(half1)*v_famp*v_dirz1/v_diramp1"
> fix addfnp1 half1 addforce v_fx1 v_fy1 v_fz1
>
> # calculate and apply the force onto the half of the 1st nanoparticle.
> # half1 is the group ID of the half of the nanoparticle that the force
> # is applied to
>
> variable nliquid equal "count(liquid)"
>
> compute ftotal all reduce sum v_fx1 v_fy1 v_fz1 v_fx2 v_fy2 v_fz2 &
>                               v_fx3 v_fy3 v_fz3 v_fx4 v_fy4 v_fz4 &
>                               v_fx5 v_fy5 v_fz5 v_fx6 v_fy6 v_fz6 &
>                               v_fx7 v_fy7 v_fz7 v_fx8 v_fy8 v_fz8
> variable fxliquid equal "-1.0 * (c_ftotal[1] + c_ftotal[4] + c_ftotal[7] + c_ftotal[10] + c_ftotal[13] + c_ftotal[16] + c_ftotal[19] + c_ftotal[22]) / v_nliquid"
> variable fyliquid equal "-1.0 * (c_ftotal[2] + c_ftotal[5] + c_ftotal[8] + c_ftotal[11] + c_ftotal[14] + c_ftotal[17] + c_ftotal[20] + c_ftotal[23]) / v_nliquid"
> variable fzliquid equal "-1.0 * (c_ftotal[3] + c_ftotal[6] + c_ftotal[9] + c_ftotal[12] + c_ftotal[15] + c_ftotal[18] + c_ftotal[21] + c_ftotal[24]) / v_nliquid"
> fix addliquid liquid addforce v_fxliquid v_fyliquid v_fzliquid
>
> # here is how I sum up all the applied forces and apply the opposite
> # to the liquid atoms
>
> Loop time of 356.273 on 1024 procs for 10000 steps with 53432 atoms
>
> Performance: 24251.090 tau/day, 28.068 timesteps/s
> 100.0% CPU use with 1024 MPI tasks x 1 OpenMP threads
>
> MPI task timing breakdown:
> Section | min time | avg time | max time |%varavg| %total
> ---------------------------------------------------------------
> Pair    | 0.12908  | 0.34635  | 0.37527  |   2.9 |  0.10
> Bond    | 0.013135 | 0.072627 | 3.3069   |  93.9 |  0.02
> Neigh   | 7.3525   | 7.7132   | 7.8345   |   3.1 |  2.16
> Comm    | 3.8194   | 4.5665   | 9.9215   |  53.9 |  1.28
> Output  | 0.0743   | 0.07437  | 0.079351 |   0.1 |  0.02
> Modify  | 337.3    | 342.59   | 343.31   |   6.4 | 96.16
> Other   |          | 0.9084   |          |       |  0.25
>
> Nlocal:    52.1797 ave 267 max 42 min
> Histogram: 988 15 10 3 4 2 0 0 1 1
> Nghost:    236.644 ave 752 max 208 min
> Histogram: 955 36 13 7 5 4 2 0 1 1
> Neighs:    263.897 ave 347 max 47 min
> Histogram: 5 4 1 6 4 22 305 500 171 6
>
> Total # of neighbors = 270231
> Ave neighs/atom = 5.05747
> Ave special neighs/atom = 0.63243
> Neighbor list builds = 2220
> Dangerous builds = 0
> Total wall time: 0:05:58
>
> # This is the performance report. With the new method, the average speed
> # is 28 timesteps/second. Previously, it was 45 timesteps/second.
>
> Wei Peng
> Graduate Student at Rensselaer Polytechnic Institute
> Department of Materials Science and Engineering
> 110 8th Street, Troy, NY 12180
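The Modify row above (96.16% of the loop time) is where the cost of fixes, including fix addforce and the atom-style variables it evaluates, is accumulated; as Axel explains further down in the quoted thread, communication hidden inside variable updates and computes does not show up under Comm. A minimal sketch of how to make such costs easier to attribute, assuming the timer command is available (it should be in the Aug. 2017 version used here):

    # full timing detail, with MPI synchronization before each timed
    # section so wait time is charged to the section that causes it,
    # not to the next communication call
    timer full sync
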
> On Tue, Apr 17, 2018 at 9:27 AM, Axel Kohlmeyer <akohlmey@...24...> wrote:
>>
>> On Mon, Apr 16, 2018 at 8:28 PM, Wei Peng <pengwrpi2@...24...> wrote:
>> > Dear Axel,
>> >
>> > Thank you so much for your prompt response!
>> >
>> > Can you tell me how to correctly merge all the reductions?
>> >
>> > I tried this:
>> >
>> > compute ftotal all reduce sum v_fx1 v_fy1 v_fz1 v_fx2 v_fy2 v_fz2 v_fx3
>> > v_fy3 v_fz3 v_fx4 v_fy4 v_fz4 v_fx5 v_fy5 v_fz5 v_fx6 v_fy6 v_fz6 v_fx7
>> > v_fy7 v_fz7 v_fx8 v_fy8 v_fz8
>> >
>> > And I found that c_ftotal[1] is wrong: it is not the same as c_fxsum1
>> > (which is from "compute fxsum1 half1 reduce sum v_fx1"). Ideally, I
>> > would like v_fx1 to be defined only for group half1, but I don't know
>> > how to limit the scope of a per-atom variable to a fraction of the
>> > atoms in the simulation box.
>> >
>> > I am thinking about setting the fx1 value of all other atoms to zero
>> > explicitly (but I still don't know how) and doing the merged reduction
>> > as posted above. But this is clearly not an elegant solution. What
>> > advice do you have, for either defining v_fx1 exclusively on atoms of
>> > the half1 group or doing the merged reduction differently?
>>
>> you need to use the gmask(group ID) function in the individual
>> atom-style variables that you want to sum over to select the atoms by
>> group. gmask() is 1 for atoms in a group and 0 for atoms outside, but
>> as a per-atom function, it runs perfectly in parallel.
>>
>> axel.
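A minimal sketch of the gmask() pattern Axel describes, for the first nanoparticle only (the variable names follow the thread; the compute ID fsum1 is illustrative). Because gmask(half1) evaluates to 0 for atoms outside half1, a reduction over group all gives the same sums as one restricted to half1, which resolves the discrepancy between c_ftotal[1] and c_fxsum1 noted above:

    # per-atom force components; gmask(half1) zeroes them outside half1
    variable fx1 atom "gmask(half1)*v_famp*v_dirx1/v_diramp1"
    variable fy1 atom "gmask(half1)*v_famp*v_diry1/v_diramp1"
    variable fz1 atom "gmask(half1)*v_famp*v_dirz1/v_diramp1"

    # one reduction over all three components replaces three separate
    # reductions over one value each
    compute fsum1 all reduce sum v_fx1 v_fy1 v_fz1
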
>>
>> > Thanks again,
>> > Wei
>> >
>> > Wei Peng
>> > Graduate Student at Rensselaer Polytechnic Institute
>> > Department of Materials Science and Engineering
>> > 110 8th Street, Troy, NY 12180
>> >
>> > On Mon, Apr 16, 2018 at 5:59 PM, Axel Kohlmeyer <akohlmey@...24...> wrote:
>> >>
>> >> On Mon, Apr 16, 2018 at 4:58 PM, Wei Peng <pengwrpi2@...24...> wrote:
>> >> > Dear LAMMPS administrators or users,
>> >> >
>> >> > I am simulating a system of 8 nanoparticles containing Lennard-Jones
>> >> > atoms connected by FENE bonds. Every step, I apply a force to each
>> >> > atom contained in one half of each nanoparticle. The force applied
>> >> > to every atom points to the center of mass of the whole
>> >> > nanoparticle. To conserve momentum, I add an opposite force to the
>> >> > rest of the system. To take the additional energy out of the system,
>> >> > I use a Nose-Hoover thermostat.
>> >> >
>> >> > I used the fix addforce command to apply the extra force. It turns
>> >> > out it works very well and gives the expected result. However, the
>> >> > computation is significantly slowed down by this additional force
>> >> > (~20 times slower). I guessed the low efficiency was mainly caused
>> >> > by communication cost, but that turns out not to be the case
>> >> > (according to the output file produced by LAMMPS).
>> >>
>> >> that output only includes the cost of communication at the regular,
>> >> known communication points; it does not include time spent in
>> >> communication that is part of variable updates or computes.
>> >>
>> >> the obvious cost is for compute reduce. you should be able to cut this
>> >> cost by a significant margin by doing one reduction over three values
>> >> instead of three reductions over one value.
>> >> ...and since you have seven more nanoparticles, you can reduce that
>> >> cost even further by combining all reductions into one.
>> >> so right now you seem to be doing 24 reductions, which can be handled
>> >> by just one compute reduce.
>> >>
>> >> another "hidden" reduction operation is in the count(groupID)
>> >> function. if the number doesn't change over the course of a run, you
>> >> can cache it with something like this:
>> >>
>> >> variable nliquid equal $(count(liquid))
>> >>
>> >> so by avoiding redundant reductions you should be able to
>> >> significantly reduce the computational effort.
>> >>
>> >> axel.
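A sketch of the fully merged reduction, shown for two of the eight nanoparticles to keep it short (with all eight, c_ftotal has 24 columns, as in Wei's follow-up above). c_ftotal[1..3] hold the x/y/z sums for particle 1 and c_ftotal[4..6] those for particle 2; the $(...) immediate substitution evaluates count(liquid) once, when the input script is read, so no per-step reduction is left in the liquid-force variables:

    # one compute reduce replaces all per-particle reductions
    compute ftotal all reduce sum v_fx1 v_fy1 v_fz1 v_fx2 v_fy2 v_fz2

    # cache the liquid-atom count as a plain constant at read time
    variable nliquid equal $(count(liquid))

    variable fxliquid equal "-1.0*(c_ftotal[1]+c_ftotal[4])/v_nliquid"
    variable fyliquid equal "-1.0*(c_ftotal[2]+c_ftotal[5])/v_nliquid"
    variable fzliquid equal "-1.0*(c_ftotal[3]+c_ftotal[6])/v_nliquid"
    fix addliquid liquid addforce v_fxliquid v_fyliquid v_fzliquid
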
>> >>
>> >> > I have attached the input file and the output file here. The
>> >> > version of LAMMPS is Aug. 2017.
>> >> >
>> >> > Can you give me any advice on accelerating the computation? Thank
>> >> > you for your help!
>> >> >
>> >> > Below is the input file:
>> >> >
>> >> > compute coord1 np1 property/atom xu yu zu
>> >> > compute c1 np1 com
>> >> >
>> >> > # np1 is the group id of nanoparticle 1.
>> >> >
>> >> > variable famp equal "0.1"
>> >> >
>> >> > variable dirx1 atom "c_coord1[1]-c_c1[1]"
>> >> > variable diry1 atom "c_coord1[2]-c_c1[2]"
>> >> > variable dirz1 atom "c_coord1[3]-c_c1[3]"
>> >> > variable diramp1 atom "sqrt(v_dirx1^2 + v_diry1^2 + v_dirz1^2)"
>> >> > variable fx1 atom "v_famp*v_dirx1/v_diramp1"
>> >> > variable fy1 atom "v_famp*v_diry1/v_diramp1"
>> >> > variable fz1 atom "v_famp*v_dirz1/v_diramp1"
>> >> > compute fxsum1 half1 reduce sum v_fx1
>> >> > compute fysum1 half1 reduce sum v_fy1
>> >> > compute fzsum1 half1 reduce sum v_fz1
>> >> > fix addfnp1 half1 addforce v_fx1 v_fy1 v_fz1
>> >> >
>> >> > # "half1" is the group id of the half of nanoparticle 1 that the
>> >> > # force was applied to.
>> >> > # I did the same thing for the other 7 nanoparticles, which is not
>> >> > # posted here.
>> >> >
>> >> > variable fxliquid equal "-1.0 * (c_fxsum1+c_fxsum2+c_fxsum3+c_fxsum4+c_fxsum5+c_fxsum6+c_fxsum7+c_fxsum8)/count(liquid)"
>> >> > variable fyliquid equal "-1.0 * (c_fysum1+c_fysum2+c_fysum3+c_fysum4+c_fysum5+c_fysum6+c_fysum7+c_fysum8)/count(liquid)"
>> >> > variable fzliquid equal "-1.0 * (c_fzsum1+c_fzsum2+c_fzsum3+c_fzsum4+c_fzsum5+c_fzsum6+c_fzsum7+c_fzsum8)/count(liquid)"
>> >> > fix addliquid liquid addforce v_fxliquid v_fyliquid v_fzliquid
>> >> >
>> >> > # I summed up the forces applied to the 8 nanoparticles, took the
>> >> > # opposite, and added it to every atom in group "liquid".
>> >> >
>> >> > Here is the performance report in the output file:
>> >> >
>> >> > Loop time of 2228.15 on 1024 procs for 100000 steps with 53432 atoms
>> >> >
>> >> > Performance: 38776.610 tau/day, 44.880 timesteps/s
>> >> > 100.0% CPU use with 1024 MPI tasks x 1 OpenMP threads
>> >> >
>> >> > MPI task timing breakdown:
>> >> > Section | min time | avg time | max time |%varavg| %total
>> >> > ---------------------------------------------------------------
>> >> > Pair    | 3.1147   | 3.4368   | 3.6571   |   4.7 |  0.15
>> >> > Bond    | 0.14084  | 0.74264  | 5.9983   | 113.4 |  0.03
>> >> > Neigh   | 152.64   | 153.88   | 154.96   |   4.0 |  6.91
>> >> > Comm    | 39.216   | 47.364   | 63.729   |  80.0 |  2.13
>> >> > Output  | 8.1437   | 8.8403   | 9.5221   |  13.4 |  0.40
>> >> > Modify  | 1982.8   | 1999.6   | 2007.8   |  12.5 | 89.74
>> >> > Other   |          | 14.33    |          |       |  0.64
>> >> >
>> >> > Nlocal:    52.1797 ave 478 max 40 min
>> >> > Histogram: 1010 6 2 2 0 2 0 1 0 1
>> >> > Nghost:    236.358 ave 1235 max 204 min
>> >> > Histogram: 993 16 4 2 3 3 1 1 0 1
>> >> > Neighs:    261.92 ave 343 max 63 min
>> >> > Histogram: 2 3 4 4 5 66 355 429 145 11
>> >> >
>> >> > Total # of neighbors = 268206
>> >> > Ave neighs/atom = 5.01958
>> >> > Ave special neighs/atom = 0.63243
>> >> > Neighbor list builds = 22088
>> >> > Dangerous builds = 0
>> >> > Total wall time: 0:37:10
>> >> >
>> >> > Sincerely,
>> >> > Wei
>> >>
>> >> --
>> >> Dr. Axel Kohlmeyer akohlmey@...24... http://goo.gl/1wk0
>> >> College of Science & Technology, Temple University, Philadelphia PA, USA
>> >> International Centre for Theoretical Physics, Trieste. Italy.
>>
>> --
>> Dr. Axel Kohlmeyer akohlmey@...24... http://goo.gl/1wk0
>> College of Science & Technology, Temple University, Philadelphia PA, USA
>> International Centre for Theoretical Physics, Trieste. Italy.

--
Dr. Axel Kohlmeyer akohlmey@...24... http://goo.gl/1wk0
College of Science & Technology, Temple University, Philadelphia PA, USA
International Centre for Theoretical Physics, Trieste. Italy.

Follow-Ups:
  Re: [lammps-users] Fix Addforce Slow Down the Computational Efficiency
    From: Wei Peng <pengwrpi2@...24...>

References:
  [lammps-users] Fix Addforce Slow Down the Computational Efficiency
    From: Wei Peng <pengwrpi2@...24...>
  Re: [lammps-users] Fix Addforce Slow Down the Computational Efficiency
    From: Axel Kohlmeyer <akohlmey@...24...>
  Re: [lammps-users] Fix Addforce Slow Down the Computational Efficiency
    From: Wei Peng <pengwrpi2@...24...>
  Re: [lammps-users] Fix Addforce Slow Down the Computational Efficiency
    From: Axel Kohlmeyer <akohlmey@...24...>
  Re: [lammps-users] Fix Addforce Slow Down the Computational Efficiency
    From: Wei Peng <pengwrpi2@...24...>
