
Re: [lammps-users] Fix Addforce Slow Down the Computational Efficiency


From: Axel Kohlmeyer <akohlmey@...24...>
Date: Tue, 17 Apr 2018 16:52:46 -0400

On Tue, Apr 17, 2018 at 4:47 PM, Wei Peng <pengwrpi2@...24...> wrote:
> Dear Axel,
>
> Thank you for your follow-up! I tried the method you suggested, but the
> performance got worse. I am attaching the input file here; can you tell me
> which part is wrong? Thank you!

there is no usable input here anywhere, so i cannot say anything.
perhaps you are following a red herring and have some other
performance issue that is somewhere else.

axel.

>
> group            np1 id 1:429:1
>
> #group atoms into nanoparticle 1
>
> compute          coord1 np1 property/atom xu yu zu
> compute          c1 np1 com
> variable         cond1 atom "(c_coord1[1] - c_c1[1]) * 1 > 0.0"
> group            half1 dynamic np1 var cond1 every 1
> run              1
> group            half1 static
>
> #find the half of the 1st nanoparticle that the force will be applied to
>
> variable         famp equal "0.1"
>
> variable         dirx1 atom "c_coord1[1]-c_c1[1]"
> variable         diry1 atom "c_coord1[2]-c_c1[2]"
> variable         dirz1 atom "c_coord1[3]-c_c1[3]"
> variable         diramp1 atom "sqrt(v_dirx1^2 + v_diry1^2 + v_dirz1^2)"
> variable         fx1 atom "gmask(half1)*v_famp*v_dirx1/v_diramp1"
> variable         fy1 atom "gmask(half1)*v_famp*v_diry1/v_diramp1"
> variable         fz1 atom "gmask(half1)*v_famp*v_dirz1/v_diramp1"
> fix              addfnp1 half1 addforce v_fx1 v_fy1 v_fz1
>
> #calculate and apply the force to the half of the 1st nanoparticle.
> #half1 is the group ID of the half of the nanoparticle that the force is
> #applied to
>
> variable         nliquid equal "count(liquid)"
>
> compute          ftotal all reduce sum v_fx1 v_fy1 v_fz1 v_fx2 v_fy2 v_fz2 &
>                  v_fx3 v_fy3 v_fz3 v_fx4 v_fy4 v_fz4 v_fx5 v_fy5 v_fz5 &
>                  v_fx6 v_fy6 v_fz6 v_fx7 v_fy7 v_fz7 v_fx8 v_fy8 v_fz8
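>
> # c_ftotal[1],[4],...,[22] hold the summed x components for particles 1-8,
> # c_ftotal[2],[5],...,[23] the y components, and c_ftotal[3],[6],...,[24]
> # the z components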
> variable         fxliquid equal "-1.0 * (c_ftotal[1] + c_ftotal[4] + c_ftotal[7] + c_ftotal[10] + c_ftotal[13] + c_ftotal[16] + c_ftotal[19] + c_ftotal[22]) / v_nliquid"
> variable         fyliquid equal "-1.0 * (c_ftotal[2] + c_ftotal[5] + c_ftotal[8] + c_ftotal[11] + c_ftotal[14] + c_ftotal[17] + c_ftotal[20] + c_ftotal[23]) / v_nliquid"
> variable         fzliquid equal "-1.0 * (c_ftotal[3] + c_ftotal[6] + c_ftotal[9] + c_ftotal[12] + c_ftotal[15] + c_ftotal[18] + c_ftotal[21] + c_ftotal[24]) / v_nliquid"
> fix              addliquid liquid addforce v_fxliquid v_fyliquid v_fzliquid
>
> # here is how I sum up all the applied forces and apply the opposite to
> # the liquid atoms
>
> Loop time of 356.273 on 1024 procs for 10000 steps with 53432 atoms
>
> Performance: 24251.090 tau/day, 28.068 timesteps/s
> 100.0% CPU use with 1024 MPI tasks x 1 OpenMP threads
>
> MPI task timing breakdown:
> Section |  min time  |  avg time  |  max time  |%varavg| %total
> ---------------------------------------------------------------
> Pair    | 0.12908    | 0.34635    | 0.37527    |   2.9 |  0.10
> Bond    | 0.013135   | 0.072627   | 3.3069     |  93.9 |  0.02
> Neigh   | 7.3525     | 7.7132     | 7.8345     |   3.1 |  2.16
> Comm    | 3.8194     | 4.5665     | 9.9215     |  53.9 |  1.28
> Output  | 0.0743     | 0.07437    | 0.079351   |   0.1 |  0.02
> Modify  | 337.3      | 342.59     | 343.31     |   6.4 | 96.16
> Other   |            | 0.9084     |            |       |  0.25
>
> Nlocal:    52.1797 ave 267 max 42 min
> Histogram: 988 15 10 3 4 2 0 0 1 1
> Nghost:    236.644 ave 752 max 208 min
> Histogram: 955 36 13 7 5 4 2 0 1 1
> Neighs:    263.897 ave 347 max 47 min
> Histogram: 5 4 1 6 4 22 305 500 171 6
>
> Total # of neighbors = 270231
> Ave neighs/atom = 5.05747
> Ave special neighs/atom = 0.63243
> Neighbor list builds = 2220
> Dangerous builds = 0
> Total wall time: 0:05:58
>
> # This is the performance report. With the new method, the average speed is
> # 28 timesteps/second. Previously, it was 45 timesteps/second.
>
> Wei Peng
> Graduate Student at Rensselaer Polytechnic Institute
> Department of Materials Science and Engineering
> 110 8th Street, Troy, NY 12180
>
> On Tue, Apr 17, 2018 at 9:27 AM, Axel Kohlmeyer <akohlmey@...24...> wrote:
>>
>> On Mon, Apr 16, 2018 at 8:28 PM, Wei Peng <pengwrpi2@...24...> wrote:
>> > Dear Axel,
>> >
>> > Thank you so much for your prompt response!
>> >
>> > Can you tell me how to correctly merge all the reductions?
>> >
>> > I tried this:
>> >
>> > compute ftotal all reduce sum v_fx1 v_fy1 v_fz1 v_fx2 v_fy2 v_fz2 v_fx3
>> > v_fy3 v_fz3 v_fx4 v_fy4 v_fz4 v_fx5 v_fy5 v_fz5 v_fx6 v_fy6 v_fz6 v_fx7
>> > v_fy7 v_fz7 v_fx8 v_fy8 v_fz8
>> >
>> > And I found that c_ftotal[1] is wrong: it is not the same as c_fxsum1
>> > (which is from "compute fxsum1 half1 reduce sum v_fx1"). Ideally, I would
>> > like v_fx1 to be defined only for group half1, but I don't know how to
>> > limit the scope of a per-atom variable to a fraction of the atoms in the
>> > simulation box.
>> >
>> > I am thinking about explicitly setting the fx1 value to zero for all
>> > other atoms (but I still don't know how) and then merging the reductions
>> > as posted above. But that is clearly not an elegant solution. What advice
>> > do you have for either defining v_fx1 exclusively on atoms of the half1
>> > group or doing the merged reduction differently?
>>
>> you need to use the gmask(group ID) function in the individual
>> atom-style variables that you want to sum over in order to select the
>> atoms by group. gmask() is 1 for atoms in the group and 0 for atoms
>> outside, and as a per-atom function it runs perfectly in parallel.
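>>
>> for example, a minimal sketch using the variable names from your input:
>>
>> variable fx1 atom "gmask(half1)*v_famp*v_dirx1/v_diramp1"
>>
>> # gmask(half1) evaluates to 1.0 for atoms in group half1 and 0.0 for all
>> # other atoms, so a reduction of v_fx1 over all atoms collects
>> # contributions only from half1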
>>
>> axel.
>>
>>
>> >
>> > Thanks again,
>> > Wei
>> >
>> >
>> > Wei Peng
>> > Graduate Student at Rensselaer Polytechnic Institute
>> > Department of Materials Science and Engineering
>> > 110 8th Street, Troy, NY 12180
>> >
>> > On Mon, Apr 16, 2018 at 5:59 PM, Axel Kohlmeyer <akohlmey@...24...>
>> > wrote:
>> >>
>> >> On Mon, Apr 16, 2018 at 4:58 PM, Wei Peng <pengwrpi2@...24...> wrote:
>> >> > Dear LAMMPS administrators or users,
>> >> >
>> >> > I am simulating a system with 8 nanoparticles made of Lennard-Jones
>> >> > atoms connected by FENE bonds. Every step, I apply a force to each
>> >> > atom contained in one half of each nanoparticle. The force applied to
>> >> > every atom points to the center of mass of the whole nanoparticle. To
>> >> > conserve momentum, I add an opposite force to the rest of the system.
>> >> > To take the additional energy out of the system, I use a Nose-Hoover
>> >> > thermostat.
>> >> >
>> >> > I used the fix addforce command to apply the extra force. It works
>> >> > very well and gives the expected result. However, the computation is
>> >> > significantly slowed down by this additional force (~20 times slower).
>> >> > I guessed the low efficiency was mainly caused by communication cost,
>> >> > but that turns out not to be the case (according to the output file
>> >> > produced by LAMMPS).
>> >>
>> >> that output only includes the cost of communication at the regular,
>> >> known communication points. it does not include time spent in
>> >> communication that is part of variable updates or computes.
>> >>
>> >> the obvious cost is from compute reduce. you should be able to cut
>> >> this cost by a significant margin by doing one reduction over three
>> >> values instead of three reductions over one value each.
>> >> ...and since you have seven more nanoparticles, you can reduce that
>> >> cost even further by combining all the reductions into one.
>> >> so right now you seem to be doing 24 reductions, which can be handled
>> >> by just one compute reduce.
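>> >>
>> >> for example, instead of three separate reductions like
>> >>
>> >> compute fxsum1 half1 reduce sum v_fx1
>> >> compute fysum1 half1 reduce sum v_fy1
>> >> compute fzsum1 half1 reduce sum v_fz1
>> >>
>> >> a single compute reduce will do (a sketch; c_fsum1[1], c_fsum1[2], and
>> >> c_fsum1[3] then hold the three sums):
>> >>
>> >> compute fsum1 half1 reduce sum v_fx1 v_fy1 v_fz1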
>> >>
>> >> another "hidden" reduction operation is in the count(groupID)
>> >> function. if the number doesn't change over the course of a run, you
>> >> can cache it with something like this:
>> >>
>> >> variable nliquid equal $(count(liquid))
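>> >>
>> >> (the $(...) construct is expanded to its numeric value right when the
>> >> input line is read, so the count() reduction happens only once instead
>> >> of every time the variable is referenced.)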
>> >>
>> >> so by avoiding redundant reductions you should be able to
>> >> significantly reduce the computational effort.
>> >>
>> >> axel.
>> >>
>> >> >
>> >> > I have attached the input file and the output file here. The version
>> >> > of
>> >> > LAMMPS is Aug. 2017.
>> >> >
>> >> > Can you give me any advice on accelerating the computation. Thank you
>> >> > for
>> >> > your help!
>> >> >
>> >> > Below is the input file:
>> >> >
>> >> >
>> >> > compute          coord1 np1 property/atom xu yu zu
>> >> > compute          c1 np1 com
>> >> >
>> >> > # np1 is the group id of nanoparticle 1.
>> >> >
>> >> > variable         famp equal "0.1"
>> >> >
>> >> > variable         dirx1 atom "c_coord1[1]-c_c1[1]"
>> >> > variable         diry1 atom "c_coord1[2]-c_c1[2]"
>> >> > variable         dirz1 atom "c_coord1[3]-c_c1[3]"
>> >> > variable         diramp1 atom "sqrt(v_dirx1^2 + v_diry1^2 + v_dirz1^2)"
>> >> > variable         fx1 atom "v_famp*v_dirx1/v_diramp1"
>> >> > variable         fy1 atom "v_famp*v_diry1/v_diramp1"
>> >> > variable         fz1 atom "v_famp*v_dirz1/v_diramp1"
>> >> > compute          fxsum1 half1 reduce sum v_fx1
>> >> > compute          fysum1 half1 reduce sum v_fy1
>> >> > compute          fzsum1 half1 reduce sum v_fz1
>> >> > fix              addfnp1 half1 addforce v_fx1 v_fy1 v_fz1
>> >> >
>> >> > # "half1" is the group id of the half of nanoparticle 1 that was
>> >> > applied
>> >> > a
>> >> > force to
>> >> > # I did the same thing for other 7 nanoparticles, which was not
>> >> > posted
>> >> > here.
>> >> >
>> >> > variable         fxliquid equal "-1.0 * (c_fxsum1+c_fxsum2+c_fxsum3+c_fxsum4+c_fxsum5+c_fxsum6+c_fxsum7+c_fxsum8)/count(liquid)"
>> >> > variable         fyliquid equal "-1.0 * (c_fysum1+c_fysum2+c_fysum3+c_fysum4+c_fysum5+c_fysum6+c_fysum7+c_fysum8)/count(liquid)"
>> >> > variable         fzliquid equal "-1.0 * (c_fzsum1+c_fzsum2+c_fzsum3+c_fzsum4+c_fzsum5+c_fzsum6+c_fzsum7+c_fzsum8)/count(liquid)"
>> >> >
>> >> > fix              addliquid liquid addforce v_fxliquid v_fyliquid
>> >> > v_fzliquid
>> >> >
>> >> > # I summed up the forces applied to the 8 nanoparticles, took the
>> >> > # opposite, and then added it to every atom in group "liquid".
>> >> >
>> >> > Here is the performance report in the output file:
>> >> >
>> >> > Loop time of 2228.15 on 1024 procs for 100000 steps with 53432 atoms
>> >> >
>> >> > Performance: 38776.610 tau/day, 44.880 timesteps/s
>> >> > 100.0% CPU use with 1024 MPI tasks x 1 OpenMP threads
>> >> >
>> >> > MPI task timing breakdown:
>> >> > Section |  min time  |  avg time  |  max time  |%varavg| %total
>> >> > ---------------------------------------------------------------
>> >> > Pair    | 3.1147     | 3.4368     | 3.6571     |   4.7 |  0.15
>> >> > Bond    | 0.14084    | 0.74264    | 5.9983     | 113.4 |  0.03
>> >> > Neigh   | 152.64     | 153.88     | 154.96     |   4.0 |  6.91
>> >> > Comm    | 39.216     | 47.364     | 63.729     |  80.0 |  2.13
>> >> > Output  | 8.1437     | 8.8403     | 9.5221     |  13.4 |  0.40
>> >> > Modify  | 1982.8     | 1999.6     | 2007.8     |  12.5 | 89.74
>> >> > Other   |            | 14.33      |            |       |  0.64
>> >> >
>> >> > Nlocal:    52.1797 ave 478 max 40 min
>> >> > Histogram: 1010 6 2 2 0 2 0 1 0 1
>> >> > Nghost:    236.358 ave 1235 max 204 min
>> >> > Histogram: 993 16 4 2 3 3 1 1 0 1
>> >> > Neighs:    261.92 ave 343 max 63 min
>> >> > Histogram: 2 3 4 4 5 66 355 429 145 11
>> >> >
>> >> > Total # of neighbors = 268206
>> >> > Ave neighs/atom = 5.01958
>> >> > Ave special neighs/atom = 0.63243
>> >> > Neighbor list builds = 22088
>> >> > Dangerous builds = 0
>> >> > Total wall time: 0:37:10
>> >> >
>> >> >
>> >> > Sincerely,
>> >> > Wei
>> >> >
>> >> >
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Dr. Axel Kohlmeyer  akohlmey@...24...  http://goo.gl/1wk0
>> >> College of Science & Technology, Temple University, Philadelphia PA,
>> >> USA
>> >> International Centre for Theoretical Physics, Trieste. Italy.
>> >
>> >
>>
>>
>>
>> --
>> Dr. Axel Kohlmeyer  akohlmey@...24...  http://goo.gl/1wk0
>> College of Science & Technology, Temple University, Philadelphia PA, USA
>> International Centre for Theoretical Physics, Trieste. Italy.
>
>



-- 
Dr. Axel Kohlmeyer  akohlmey@...24...  http://goo.gl/1wk0
College of Science & Technology, Temple University, Philadelphia PA, USA
International Centre for Theoretical Physics, Trieste. Italy.