|From:||"Meij, Henk" <hmeij@...1881...>|
|Date:||Thu, 3 Aug 2017 13:11:08 +0000|
Strongly recommend you try some test drives. We made CPU benchmarks of our "typical jobs" then went to two different vendors and tried out their GPUs. Was a totally positive experience with both setups.
Thank you for the detailed reply. I will flesh out the science first in that case before worrying about the performance of the simulation.
Thank you Dr. Kohlmeyer for the advice.
On Wed, Aug 2, 2017 at 2:53 PM, ke shen <ikevinshen2@...24...> wrote:
Dear LAMMPS Developer,
Hi, I'm Ke and I'm currently using LAMMPS to run simulations on the effects of polymer additives (like PLA and PCDTBT) on the morphology of a perovskite active layer.
please avoid using acronyms unless they are obvious to your audience.
I have a couple questions regarding the acceleration of the performance of LAMMPS. I'm currently running my graphics cards on a system running Ubuntu 17.04. As someone fairly new to doing computational chemistry, I would really appreciate your answers, however brief.
1) The GPUs I have are not specialized for computing; they are GeForce GPUs and thus lack FP64 cores (1/32 as many FP64 cores as FP32), meaning that double-precision performance is pretty poor on them. If I ran the simulation in single precision, would it produce unacceptable amounts of error? The main goal of the simulation is to see what's happening with the PLA around the perovskite crystals. Alternatively, if double precision is something that is generally required, what's the most cost-effective GPU accelerator for this purpose?
nobody can tell up front and without making tests whether your simulation will provide accurate results. single-precision vs. double-precision math is just one part of it. please note, that LAMMPS currently has two options for GPU acceleration: the GPU package and the KOKKOS package. the GPU package can be compiled for single precision, mixed precision (most operations are done in single, but critical ones, like summing the forces, are done in double precision) and double precision. the KOKKOS package, as far as i remember, currently only supports double precision. which of the two, if any, is applicable to your system depends on the force field you are using. beyond that, in case you are using the GPU package, you will have to make tests and see what kind of differences you get on different observables. for example, the stress tensor (and thus the accuracy of variable cell simulations) typically has a much larger error than the forces.
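To illustrate why the accumulation step is one of the "critical" operations in a mixed-precision scheme, here is a minimal sketch in plain Python. It is not LAMMPS code and says nothing about how the GPU package is implemented; the helper names (to_f32, sum_single, sum_mixed) and the data (one million equal contributions of 0.1) are made up for illustration. It only compares a running sum rounded to single precision at every step against a running sum kept in double precision.

```python
import struct

def to_f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_single(values):
    """Single precision throughout: every partial sum is rounded to float32."""
    total = 0.0
    for v in values:
        total = to_f32(total + to_f32(v))
    return total

def sum_mixed(values):
    """Mixed precision: inputs are float32, the running sum stays in double."""
    total = 0.0
    for v in values:
        total += to_f32(v)
    return total

# Hypothetical data: many small, equal contributions (assumption, not real forces).
contributions = [0.1] * 1_000_000
reference = 100000.0

err_single = abs(sum_single(contributions) - reference)
err_mixed = abs(sum_mixed(contributions) - reference)

print(f"single-precision accumulation error: {err_single:.6g}")
print(f"mixed-precision accumulation error:  {err_mixed:.6g}")
```

In this toy case the single-precision sum drifts far more than the mixed one, because rounding errors pile up in the accumulator. How much that matters for a real simulation still has to be tested on the observables of interest.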
but i also have a feeling, that you are putting the cart before the horse. before even considering GPUs, you should validate your force field choices with small test simulations and learn how to reproduce published data on the CPU that way.
in my personal experience, for current hardware, the most cost effective GPU with full double precision support is called a CPU.
Also, from other threads I've seen LAMMPS doesn't play well with current-generation (Pascal) Nvidia graphics cards... is there something I need to play with in the GPU package to make it work correctly?
most reported GPU problems can be tracked down to people making mistakes when compiling LAMMPS or setting up their machines. to compile for and run/use GPUs correctly *and* efficiently(!) requires significantly more technical skills than running LAMMPS on the CPU.
2) Even when running lmp_serial/lmp_mpi with the appropriate environment variable set such that it is using 12 threads (I have an old 12-core Xeon), the processor only has ~10% utilization. On the other hand, LAMMPS shows ~99% CPU usage. Which of these should I trust, and am I not optimizing my system enough?
impossible to say with such limited information. you are most likely not utilizing styles that are thread enabled or have not correctly compiled LAMMPS for that. please also note, that for an MD code like LAMMPS, the MPI parallelization is - by construction - usually more efficient than multi-threading until you saturate the memory or communication infrastructure with message passing data. e.g. on a dual socket 6-core xeon box, LAMMPS is often the most efficient with 4 MPI tasks per node plus 3 threads each or 6 MPI tasks plus 2 threads each. again, what is the best choice depends a lot on your system and your hardware, so there is no simple "do this not that" type of advice. benchmarking is the best way to find out. and i have to repeat: before worrying about performance, worry about the science. a fast running simulation that produces garbage for results is useless.
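As a starting point for such a benchmark, the sketch below lists every way to split 12 cores into MPI tasks times OpenMP threads and prints a candidate command line for each. It assumes an MPI-enabled executable called lmp_mpi built with the OpenMP-enabled styles (the USER-OMP package at the time of this thread), an mpirun launcher, and an input file named in.benchmark; all three names are placeholders for whatever your installation uses. The -sf omp and -pk omp N switches select the thread-enabled style variants and the thread count.

```python
def layouts(cores):
    """Return every (mpi_tasks, threads_per_task) pair whose product is `cores`."""
    return [(n, cores // n) for n in range(1, cores + 1) if cores % n == 0]

def command(mpi_tasks, threads, exe="lmp_mpi", script="in.benchmark"):
    """Build one candidate command line (exe and script names are placeholders)."""
    return (
        f"OMP_NUM_THREADS={threads} mpirun -np {mpi_tasks} {exe} "
        f"-sf omp -pk omp {threads} -in {script}"
    )

# 12 cores, as on the dual-socket 6-core machine discussed above.
for mpi_tasks, threads in layouts(12):
    print(command(mpi_tasks, threads))
```

Running the same short input with each layout and comparing the loop times reported in the LAMMPS log is one way to find the best split for a given system and machine.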
3) How would I run LAMMPS at the granular level, e.g. work with groups of atoms (instead of individual ones) in order to simulate grain boundaries? I've also heard that Voronoi tessellation is necessary in order to achieve PLA woven in between perovskite crystals but I'm not sure if that would be the best way to approach it.
i cannot make any sense out of this question. it overall looks to me, that you need a *lot* of help from your adviser/supervisor and work on the science first. you seem very eager to move to advanced issues regarding your simulations, but it looks like you are skipping over far too many basic skills and exercises that you should be doing to learn the tool (i.e. simulation) properly before applying it to what appears to be a quite challenging and complex simulation task. could it be, that you are underestimating the difficulty of performing good simulations? running simulations is the easiest part. planning them and planning them so, that you can extract useful and dependable results in analysis of your simulations is what makes good MD studies.
Thank you so much for taking the time to reply,
lammps-users mailing list