From: Axel Kohlmeyer <akohlmey@...24...>
Date: Mon, 4 Dec 2017 09:13:58 -0500
Good morning Axel,
your example took around 2 seconds on my desktop.
Running this example on my system with 1632 steps took around 13 seconds;
running it with:
for i in range(1632): lmps.command("run 0")
gives around 23 seconds. The problem with running "run 0 pre no" is that by doing
so I get the same forces and energies for every displacement, since the system
is not updated in between. I think the solution is to run "run 1 pre no" for
every displacement, which takes around 14 seconds.
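For reference, the double-sided finite-difference scheme being discussed can be sketched in plain Python/NumPy. The `forces_of` callback is a placeholder for whatever evaluates the forces at a given geometry (e.g. one LAMMPS force evaluation via `scatter_atoms`, a "run 1 pre no", and `gather_atoms` through the library interface); the function name and step size below are illustrative, not part of any LAMMPS API:

```python
import numpy as np

def fd_hessian(coords, forces_of, h=1e-4):
    """Double-sided (central) finite-difference Hessian.

    coords    : flat array of the 3N coordinates
    forces_of : callable returning the flat force vector F = -dE/dx
                for a given coordinate vector (one force evaluation)
    h         : finite-difference displacement

    H[i, j] = -(F_i(x + h*e_j) - F_i(x - h*e_j)) / (2*h),
    which requires 2 * 3N force evaluations in total.
    """
    n = coords.size
    hess = np.empty((n, n))
    for j in range(n):
        xp = coords.copy()
        xp[j] += h
        xm = coords.copy()
        xm[j] -= h
        hess[:, j] = -(forces_of(xp) - forces_of(xm)) / (2.0 * h)
    # symmetrize to average out finite-difference noise
    return 0.5 * (hess + hess.T)
```

For a harmonic potential E = 0.5 x^T K x (forces F = -K x), the result should reproduce K itself, which makes a convenient sanity check for the displacement loop.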
Using "run 1 pre no" during the Hessian evaluation gives the same result as doing
it with "run 0", so I think I should do it with "run 1 pre no" in the future.
Or do you have any other/further suggestions?
P.S.: Greetings from Rochus Schmid from Bochum
On 12/03/2017 08:44 PM, Axel Kohlmeyer wrote:
before getting lost in various technical details, you should establish a bottom line, i.e. how fast your calculation can get at best.
for that, take your system setup, remove any time integration, tell it to build the neighbor list only once, and then run those 1632 time steps. the total time of that is the cost of computing forces 1632 times. is this significantly faster than the 23 seconds you are currently seeing?
here is an example based on the melt input (with a more reasonable cutoff).
units lj
atom_style atomic

lattice fcc 0.8442
region box block 0 5 0 5 0 3
create_box 1 box
create_atoms 1 box
mass 1 1.0

velocity all create 3.0 87287

pair_style lj/cut 4.0
pair_coeff 1 1 1.0 1.0

neighbor 0.3 bin
neigh_modify once yes

run 1800
this takes on my desktop about 1 second.
On Sun, Dec 3, 2017 at 1:53 PM, Johannes P. Dürholt <johannes.duerholt@...7271....> wrote:
Hi Axel and Giacomo,
thanks to your suggestions I had again a look on the timings.
My question arose from the fact that running 1632 steps of NVE on my system (6 × number of atoms, as needed for a double-sided finite-difference Hessian) took around 13 seconds, while calling "run 0 post no" 1632 times took around 23 seconds.
Thanks to Axel's last post, I now know that I should use "run 0 pre no post no", which is much faster. But then a new problem arises: I get the same energy and forces for every distortion ...
that may have been bad advice. it may be the right choice when actually doing propagation with "run 1". i rarely use the library interface, and people use it in very different ways, so it is easy to get confused.
so please make sure that you have no time integration, and then try "run 1 pre no post no" instead. also, turning the timer and display off can help to improve performance.
if i replace in the example above, the "run 1800" with this explicit loop:
timer off

label loop
variable i loop 300

label inner
variable x loop 6
run 1 pre no post no
next x
jump SELF inner

next i
jump SELF loop
the time used increases to about 1.6 seconds.
so this would be the bottom line time consumed by running 6*natoms individual force computations.
In the manual I found the following sentence:
"If your input script changes the system between 2 runs, then the initial setup must be performed to insure the change is recognized by all parts of the code that are affected."
And does this mean that I have to run with "pre yes"?

possibly yes, but let's not forget about Amdahl's law and first determine which part of your calculation is the most time consuming. the first rule in high-performance computing is that there is no point in optimizing any part of a code that is not consuming time.
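The Amdahl's law argument can be made concrete with one line of arithmetic: if a fraction p of the total runtime sits in the part you accelerate by a factor s, the overall speedup is 1/((1-p) + p/s). A minimal sketch (the 60% figure below is a made-up illustration, not a measured number):

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the runtime is accelerated by factor s."""
    return 1.0 / ((1.0 - p) + p / s)

# hypothetical: the force loop is 60% of wall time and gets 2x faster
print(round(amdahl_speedup(0.6, 2.0), 3))  # 1 / (0.4 + 0.3) ≈ 1.429
```

Note the limit: even an infinite speedup of a 60% fraction caps the total gain at 1/0.4 = 2.5x, which is why profiling first matters.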
now you need to determine where your largest time consumption is. is it the force computation? is it the data transfer between python and LAMMPS? is it the post-processing computation in python? if you want to replace python code with C++ code, you need to make certain that the part of the code you plan to convert is actually the part that is consuming the time.
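One standard way to answer "where does the time go" on the Python side is the standard-library profiler. This generic harness is a sketch; you would wrap it around whatever function drives your displacement loop (the function names here are placeholders):

```python
import cProfile
import io
import pstats

def profile_call(fn, *args, **kwargs):
    """Run fn under cProfile; return its result and a report of the
    top entries sorted by cumulative time."""
    pr = cProfile.Profile()
    pr.enable()
    result = fn(*args, **kwargs)
    pr.disable()
    buf = io.StringIO()
    pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(10)
    return result, buf.getvalue()

def example_workload():
    # stand-in for the real displacement/force loop
    return sum(i * i for i in range(1000))

value, report = profile_call(example_workload)
```

The report separates time spent inside each callee, so it distinguishes e.g. the `lmps.command` calls from the Python-side post-processing at a glance.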
also, as giacomo mentioned, there is a lot of performance to be gained by writing the python code to be NumPy/SciPy "friendly", minimizing enforced data copies and conversions, and avoiding traversal of inefficient data structures. we're talking orders of magnitude here.
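To illustrate the NumPy-friendly point: the two functions below compute the same per-atom reduction (a kinetic-energy-style sum), but only the second lets NumPy run the loop in compiled code, which is where the orders-of-magnitude difference comes from for large atom counts. The input arrays here are generic test data, not anything extracted from LAMMPS:

```python
import numpy as np

def ke_loop(masses, velocities):
    """Per-atom Python loop: one interpreter iteration per atom (slow for large N)."""
    total = 0.0
    for m, v in zip(masses, velocities):
        total += 0.5 * m * (v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return total

def ke_vectorized(masses, velocities):
    """Same reduction as a single NumPy expression: no Python-level loop,
    no per-atom temporaries beyond the einsum output."""
    return 0.5 * np.sum(masses * np.einsum("ij,ij->i", velocities, velocities))
```

The same principle applies to getting data out of LAMMPS: working on array views where the interface provides them, instead of converting per-atom values element by element, avoids both the copies and the Python loop.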
since you haven't posted any representative examples, it is difficult to assess what the actual problem is. there are many details that can matter and make a difference.
Or am I doing something wrong/bad/stupid?
-- Johannes P. Dürholt Computational Materials Chemistry Group Chair of Inorganic Chemistry II, NC 02/32 Ruhr-Universität Bochum Universitätsstr. 150 D-44780 Bochum Germany Tel.: +49-234-32-24372 E-mail: johannes.duerholt@...455...