
# Re: [lammps-users] Implementation of a numerical Hessian via a new fix

 From: Axel Kohlmeyer Date: Mon, 4 Dec 2017 09:13:58 -0500

On Mon, Dec 4, 2017 at 4:28 AM, Johannes P. Dürholt wrote:

Good morning Axel,

your example took around 2 seconds on my desktop.

Applying the same setup to my system with 1632 steps took around 13 seconds,
while running it with:

for i in range(1632): lmps.command("run 0")

takes around 23 seconds. The problem with "run 0 pre no" is that I get the
same forces and energies for every displacement, since the system is not
updated in between. I think the solution is to use "run 1 pre no" for
every displacement, which takes around 14 seconds.

Using "run 1 pre no" during the Hessian evaluation gives the same result as
"run 0", so I think I should use "run 1 pre no" in the future and
be fine.

Or do you have any other/further suggestions?

all the suggestions and advice that i have were already included in my previous e-mail. what you do with them, what conclusions you draw, what potential improvement you can get from moving more of the computation into the C++ code, and in general how you proceed is all your choice. the steps i have outlined are straightforward, and the conclusions drawn from them should be, too.

Best

Johannes

P.S.: Greetings from Rochus Schmid from Bochum

please send him my regards. it's been a while...

axel.

On 12/03/2017 08:44 PM, Axel Kohlmeyer wrote:
before getting lost in various technical details, you should establish a bottom line, i.e. how fast can your calculation at best get.

for that take your system setup, remove any time integration, tell it to only do the neighbor list build once, and then run those 1632 time steps.
the total time of that is the cost of computing forces 1632 times. is this significantly faster than the 23 seconds you currently are seeing?

here is an example based on the melt input (with a more reasonable cutoff).

units lj
atom_style atomic

lattice fcc 0.8442
region box block 0 5 0 5 0 3
create_box 1 box
create_atoms 1 box
mass 1 1.0

velocity all create 3.0 87287

pair_style lj/cut 4.0
pair_coeff 1 1 1.0 1.0

neighbor 0.3 bin
neigh_modify once yes

run 1800

this takes on my desktop about 1 second.

On Sun, Dec 3, 2017 at 1:53 PM, Johannes P. Dürholt wrote:

Hi Axel and Giacomo,

thanks to your suggestions I had another look at the timings.

My question arose from the fact that running 1632 NVE steps of my system (6 * number of atoms, as necessary for a double-sided finite-difference Hessian) took around 13 seconds, whereas calling "run 0 post no" 1632 times took around 23 seconds.
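For reference, the 6 * N count follows from displacing each of the 3N coordinates once in the positive and once in the negative direction. The following is a minimal sketch of such a double-sided finite-difference Hessian in plain Python. The `forces` callable and the harmonic `toy_forces` stand-in are hypothetical placeholders; in the setup discussed here that callable would wrap the LAMMPS force evaluation, which is not shown.

```python
def numerical_hessian(forces, x0, h=1.0e-4):
    """Central-difference Hessian built from forces.

    H[i][j] = -(F_j(x + h*e_i) - F_j(x - h*e_i)) / (2*h)

    `forces` is any callable that maps a flat coordinate list to a flat
    force list (hypothetical interface, not a LAMMPS API).
    """
    n = len(x0)
    hessian = []
    n_calls = 0
    for i in range(n):
        x_plus = list(x0)
        x_plus[i] += h
        x_minus = list(x0)
        x_minus[i] -= h
        f_plus = forces(x_plus)
        f_minus = forces(x_minus)
        n_calls += 2  # two force evaluations per coordinate
        hessian.append([-(fp - fm) / (2.0 * h)
                        for fp, fm in zip(f_plus, f_minus)])
    return hessian, n_calls


def toy_forces(x, k=2.0):
    # Stand-in force field: independent harmonic springs, F = -k * x.
    return [-k * xi for xi in x]


natoms = 4
x0 = [0.1 * i for i in range(3 * natoms)]
H, n_calls = numerical_hessian(toy_forces, x0)
print("force evaluations:", n_calls, "= 6 *", natoms)
```

For the toy force field the result is simply k times the identity matrix, which makes the sketch easy to check before swapping in a real force evaluation.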

Thanks to Axel's last post, I now know that I should use "run 0 pre no post no", which is much faster. But then a new problem arises: I get the same energy and forces for every distortion ...

this may have been bad advice. it may be the right choice when actually doing propagation with "run 1". i rarely use the library interface and people use it in very different ways, so it is easy to get confused.

so please make sure that you have no time integration, and then try "run 1 pre no post no" instead. turning the timer and display off can also help to improve performance.

if i replace in the example above, the "run 1800" with this explicit loop:

timer off
label loop
variable i loop 300
label inner
variable x loop 6
run 1 pre no post no
next x
jump SELF inner
next i
jump SELF loop

the time used increases to about 1.6 seconds.

so this would be the bottom line time consumed by running 6*natoms individual force computations.

In the manual I found the following sentence:

"If your input script changes the system between 2 runs, then the initial setup must be performed to insure the change is recognized by all parts of the code that are affected."

Does this mean that I have to run with "pre yes"?

possibly yes, but let's not forget about amdahl's law and first determine the part of your calculation that is the most time consuming.
the first rule in high-performance computing is that there is no point in optimizing any part of a code that is not consuming time.
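To put a number on this, here is a small sketch of Amdahl's law applied to the timings quoted in this thread. It assumes, as a rough approximation, that the 13 seconds for 1632 plain steps is the irreducible force-computation cost and that the remaining 10 of the 23 seconds is overhead that could in principle be removed; the function name is made up for illustration.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the runtime is sped up by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


total = 23.0        # seconds for 1632 "run 0 post no" calls (from the thread)
forces_only = 13.0  # seconds for 1632 plain steps (assumed lower bound)

# Fraction of the runtime that is overhead rather than force computation.
overhead_fraction = (total - forces_only) / total

# Best case: the overhead is eliminated entirely (infinite speedup factor).
best_case = amdahl_speedup(overhead_fraction, float("inf"))
print(f"overhead fraction: {overhead_fraction:.2f}")
print(f"best-case overall speedup: {best_case:.2f}x")
```

Under these assumptions the overall gain is capped at 23/13, i.e. less than a factor of two, no matter how much of the overhead is moved into C++.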

now you need to determine where your largest time consumption is. is it the force computation? is it the data transfer between python and LAMMPS? is it the post-processing computation in python? if you want to replace python code by C++ code, you need to make certain that the part of the code that you plan to convert is actually the part that is consuming the time.
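One simple way to answer these questions is to accumulate wall-clock time per phase. The sketch below uses only the Python standard library; the three phase functions are hypothetical stand-ins for the force computation, the data transfer between python and LAMMPS, and the python post-processing, and would have to be replaced by the real calls.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(float)


@contextmanager
def timed(label):
    """Add the wall-clock time spent inside the block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] += time.perf_counter() - start


# Hypothetical stand-ins for the three phases named above.
def compute_forces(x):
    return [-2.0 * xi for xi in x]


def transfer(f):
    return list(f)  # mimics copying data out of the simulation engine


def post_process(f):
    return sum(fi * fi for fi in f)


x = [0.01 * i for i in range(816)]
for _ in range(200):
    with timed("forces"):
        f = compute_forces(x)
    with timed("transfer"):
        f_copy = transfer(f)
    with timed("post"):
        norm2 = post_process(f_copy)

# Report phases from most to least expensive.
for label, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
    print(f"{label:10s} {seconds:.4f} s")
```

Only the phase at the top of this list is worth rewriting first.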

also, as giacomo mentioned, there is a lot of performance to be gained by writing the python code to be NumPy/SciPy "friendly" and minimize enforced data copies and conversions as well as traversing inefficient data structures. we're talking orders of magnitude here.

since you haven't posted any representative examples, it is difficult to assess what the actual problem is. there are many details that can matter and make a difference.

axel.

Or am I doing something wrong/bad/stupid?

Best

Johannes

```--
Johannes P. Dürholt
Computational Materials Chemistry Group
Chair of Inorganic Chemistry II, NC 02/32
Ruhr-Universität Bochum
Universitätsstr. 150
D-44780 Bochum
Germany

Tel.: +49-234-32-24372
E-mail: johannes.duerholt@...455...```

--
Dr. Axel Kohlmeyer  akohlmey@...24...  http://goo.gl/1wk0
College of Science & Technology, Temple University, Philadelphia PA, USA
International Centre for Theoretical Physics, Trieste. Italy.
