LAMMPS WWW Site - LAMMPS Documentation - LAMMPS Mailing List Archives
[lammps-users] Notes on porting pair styles to KOKKOS
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[lammps-users] Notes on porting pair styles to KOKKOS


From: Stefan Paquay <stefanpaquay@...24...>
Date: Fri, 17 Nov 2017 14:14:03 -0500

Hi all (and the kokkos folks in particular),

Today I ran into the fact that pair_yukawa was GPU-enabled but not KOKKOS-enabled, so I ported it, following the approach I followed for pair_morse.
This time, however, I was smart enough to write down notes about the porting process, which will probably be of use to anyone that is interested in porting pair styles to kokkos.

I have attached the notes here. I also ran into some questions, which I will put here:
  - Why does Pair*Kokkos check narg before calling Pair*::settings? This seems like a violation of Don't Repeat Yourself?
  - A lot of it is just boiler-plating and renaming. Can we template more?
    I know this will make compile times even worse but it will save a lot of copy paste errors. My C++ metaprogramming is not good enough to do this I think.
  - There is no factor_lj in compute_fpair/compute_evdwl. Are these applied later?
  - Which headers are really required? I just included everything pair_morse_kokkos.cpp also included, but it might be worth filtering out some.

I am thinking it would be useful to have this floating around the website as a tutorial, because I can imagine other people might want to take advantage of GPU acceleration for their pair styles. Although KOKKOS looks intimidating, it really is  mostly a copy-paste job...

I will file a pull request for pair_yukawa_kokkos.{cpp,h} later.

Hope this helps,
Stefan

P.S.: Are there any plans to have KOKKOS target OpenCL in the future?
This document briefly describes how to port simple pair styles to KOKKOS.
While KOKKOS looks very intimidating, I have come to realize that porting
simple truncated pair styles is not that hard because it is pretty much a
copy paste job.

I am writing this working with pair_style yukawa as an example.
!!   THIS EXAMPLE WILL ONLY WORK WITH TRUNCATED      !!
!!   PAIR STYLES THAT DO NOT NEED LONG RANGE SOLVERS !!

 1. Copy the header of the pair style you want to port to a kokkos-named file:
    cp pair_yukawa.h pair_yukawa_kokkos.h

    Change the include guard name, and include pair_kokkos.h,
    neigh_list_kokkos.h and the pair style you are kokkos-ifying.

 2. Your new class should
   a. inherit the original pair publicly
   b. be templated for a class DeviceType:
     template<class DeviceType>
     class PairYukawaKokkos : public PairYukawa { ... };

    Don't forget to rename the constructor and destructor!


 3. Make sure you "declare" three pair styles at the top:
    PairStyle(yukawa/kk, PairYukawaKokkos<LMPDeviceType>)
    PairStyle(yukawa/kk/device, PairYukawaKokkos<LMPDeviceType>)
    PairStyle(yukawa/kk/host, PairYukawaKokkos<LMPHostType>)

 4. Make sure the original pair you are inheriting from has its destructor
    declared virtual! THIS IS THE ONLY CHANGE YOU SHOULD BE MAKING TO THE
    ORIGINAL PAIR STYLE!

 5. Your Kokkos-style should override the following member functions:
     - void compute(int, int);
     - void settings(int, char**);
     - double init_one(int,int);

 6. It helps if your style also has the following helper member functions:
     - void init_style();

 7. Add a struct publicly for the potential parameters with some KOKKOS macros:
    struct params_yukawa {
      KOKKOS_INLINE_FUNCTION
      params_yukawa(){ cutsq=0, a = 0, offset = 0; }
      KOKKOS_INLINE_FUNCTION
      params_yukawa(int i){ params_yukawa(); }
      F_FLOAT cutsq, a, offset;
    };
    You should only add the pointer types here that store the coefficients for
    pair interactions, _not_ "global" values like kappa or cut_global that
    appear in the pair_style command instead.

------   Now we get to the KOKKOS magic   -------

 8. Add two public enums and a typedef:
     - enum {EnabledNeighFlags=FULL|HALFTHREAD|HALF|N2};
     - enum {COUL_FLAG=0};
     - typedef DeviceType device_type;

 9. Add the following protected member functions. compute_ecoul return 0 since
    this pair style only has a short-ranged "Van der Waals" (VDWL) part.
     - void cleanup_copy();
     - template<bool STACKPARAMS, class Specialisation>
       KOKKOS_INLINE_FUNCTION
       F_FLOAT compute_fpair(const F_FLOAT& rsq, const int& i, const int&j,
                             const int& itype, const int& jtype) const;

     - template<bool STACKPARAMS, class Specialisation>
       KOKKOS_INLINE_FUNCTION
       F_FLOAT compute_evdwl(const F_FLOAT& rsq, const int& i, const int&j,
                             const int& itype, const int& jtype) const;
     - template<bool STACKPARAMS, class Specialisation>
       KOKKOS_INLINE_FUNCTION
       F_FLOAT compute_ecoul(const F_FLOAT& rsq, const int& i, const int&j,
                             const int& itype, const int& jtype) const
       { return 0; }

10. Copy-paste this whole snippet, making sure you replace params_yukawa with
    the name of your param struct and PairYukawaKokkos with the name of your
    KOKKOS pair style:

    "
    Kokkos::DualView<params_yukawa**,Kokkos::LayoutRight,DeviceType> k_params;
    typename Kokkos::DualView<params_yukawa**,Kokkos::LayoutRight,DeviceType>::t_dev_const_um params;
    params_yukawa m_params[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
    F_FLOAT m_cutsq[MAX_TYPES_STACKPARAMS+1][MAX_TYPES_STACKPARAMS+1];
    typename ArrayTypes<DeviceType>::t_x_array_randomread x;
    typename ArrayTypes<DeviceType>::t_x_array c_x;
    typename ArrayTypes<DeviceType>::t_f_array f;
    typename ArrayTypes<DeviceType>::t_int_1d_randomread type;

    DAT::tdual_efloat_1d k_eatom;
    DAT::tdual_virial_array k_vatom;
    typename ArrayTypes<DeviceType>::t_efloat_1d d_eatom;
    typename ArrayTypes<DeviceType>::t_virial_array d_vatom;
    typename ArrayTypes<DeviceType>::t_tagint_1d tag;

    int newton_pair;
    double special_lj[4];

    typename ArrayTypes<DeviceType>::tdual_ffloat_2d k_cutsq;
    typename ArrayTypes<DeviceType>::t_ffloat_2d d_cutsq;


    int neighflag;
    int nlocal,nall,eflag,vflag;

    void allocate();
    friend class PairComputeFunctor<PairYukawaKokkos,FULL,true>;
    friend class PairComputeFunctor<PairYukawaKokkos,HALF,true>;
    friend class PairComputeFunctor<PairYukawaKokkos,HALFTHREAD,true>;
    friend class PairComputeFunctor<PairYukawaKokkos,N2,true>;
    friend class PairComputeFunctor<PairYukawaKokkos,FULL,false>;
    friend class PairComputeFunctor<PairYukawaKokkos,HALF,false>;
    friend class PairComputeFunctor<PairYukawaKokkos,HALFTHREAD,false>;
    friend class PairComputeFunctor<PairYukawaKokkos,N2,false>;
    friend EV_FLOAT pair_compute_neighlist<PairYukawaKokkos,FULL,void>(PairYukawaKokkos*,NeighListKokkos<DeviceType>*);
    friend EV_FLOAT pair_compute_neighlist<PairYukawaKokkos,HALF,void>(PairYukawaKokkos*,NeighListKokkos<DeviceType>*);
    friend EV_FLOAT pair_compute_neighlist<PairYukawaKokkos,HALFTHREAD,void>(PairYukawaKokkos*,NeighListKokkos<DeviceType>*);
    friend EV_FLOAT pair_compute_neighlist<PairYukawaKokkos,N2,void>(PairYukawaKokkos*,NeighListKokkos<DeviceType>*);
    friend EV_FLOAT pair_compute<PairYukawaKokkos,void>(PairYukawaKokkos*,NeighListKokkos<DeviceType>*);
    friend void pair_virial_fdotr_compute<PairYukawaKokkos>(PairYukawaKokkos*);
    "

------  That was it for the header. Don't worry,  -------
------  it doesn't get more difficult than this.  -------

11. Add a contributing author section to get credit for your hard work.

12. Include your new kokkos header as well as some other required headers:
    #include "pair_yukawa_kokkos.h"
    #include "kokkos.h"
    #include "atom_kokkos.h"

13. Use the namespace MathConst and define the following macros:
    using namespace MathConst;
    #define KOKKOS_CUDA_MAX_THREADS 256
    #define KOKKOS_CUDA_MIN_BLOCKS 8

14. Implement the constructor and destructor:

    template<class DeviceType>
    PairYukawaKokkos<DeviceType>::PairYukawaKokkos(LAMMPS *lmp) : PairYukawa(lmp)
    {
      respa_enable = 0;

      atomKK = (AtomKokkos *) atom;
      execution_space = ExecutionSpaceFromDevice<DeviceType>::space;
      datamask_read = X_MASK | F_MASK | TYPE_MASK | ENERGY_MASK | VIRIAL_MASK;
      datamask_modify = F_MASK | ENERGY_MASK | VIRIAL_MASK;
      cutsq = NULL;
    }

    /* ---------------------------------------------------------------------- */

    template<class DeviceType>
    PairYukawaKokkos<DeviceType>::~PairYukawaKokkos()
    {
      if (allocated) {
        memory->destroy_kokkos(k_eatom,eatom);
        memory->destroy_kokkos(k_vatom,vatom);
        k_cutsq = DAT::tdual_ffloat_2d();
        memory->sfree(cutsq);
        eatom = NULL;
        vatom = NULL;
        cutsq = NULL;
      }
    }

15. Implement the following "boiler-plate" for allocation and cleanup of copies:

    /* ---------------------------------------------------------------------- */
    template<class DeviceType>
    void PairYukawaKokkos<DeviceType>::cleanup_copy() {
      // WHY needed: this prevents parent copy from deallocating any arrays
      allocated = 0;
      cutsq = NULL;
      eatom = NULL;
      vatom = NULL;
    }

    /* ----------------------------------------------------------------------
       allocate all arrays
    ------------------------------------------------------------------------- */

    template<class DeviceType>
    void PairYukawaKokkos<DeviceType>::allocate()
    {
      PairYukawa::allocate();

      int n = atom->ntypes;
      memory->destroy(cutsq);
      memory->create_kokkos(k_cutsq,cutsq,n+1,n+1,"pair:cutsq");
      d_cutsq = k_cutsq.template view<DeviceType>();
      k_params = Kokkos::DualView<params_yukawa**,
                                  Kokkos::LayoutRight,DeviceType>(
    	                              "PairYukawa::params",n+1,n+1);
      params = k_params.template view<DeviceType>();
    }

16. Implement member functions that set up the style and coefficients. Pay
    special attention to how init_one accesses the parameters, and make sure
    settings checks for the right number of args (it should match the normal
    style, so check the normal PairStyle to make sure the numbers match.

/* ----------------------------------------------------------------------
   global settings
------------------------------------------------------------------------- */

template<class DeviceType>
void PairYukawaKokkos<DeviceType>::settings(int narg, char **arg)
{
  if (narg > 2) error->all(FLERR,"Illegal pair_style command");

  PairYukawa::settings(1,arg);
}

------  As you can see, most of the work we did is simply replacing   ------
------  one pair name for the other and changing the names of the     ------
------  pair parameters. It is a lot of typing but not "hard". The    ------
------  last part is porting the compute member function.             ------

17. Port the compute member function. In KOKKOS, this function delegates most
    of the calculations to the compute_fpair, compute_evdwl and compute_coul
    functors (the last one returning 0 because there is no long-range force).
    For compute itself you really only need to rename the pair name.

    template<class DeviceType>
    void PairYukawaKokkos<DeviceType>::compute(int eflag_in, int vflag_in)
    {
      eflag = eflag_in;
      vflag = vflag_in;


      if (neighflag == FULL) no_virial_fdotr_compute = 1;

      if (eflag || vflag) ev_setup(eflag,vflag,0);
      else evflag = vflag_fdotr = 0;

      // reallocate per-atom arrays if necessary

      if (eflag_atom) {
        memory->destroy_kokkos(k_eatom,eatom);
        memory->create_kokkos(k_eatom,eatom,maxeatom,"pair:eatom");
        d_eatom = k_eatom.view<DeviceType>();
      }
      if (vflag_atom) {
        memory->destroy_kokkos(k_vatom,vatom);
        memory->create_kokkos(k_vatom,vatom,maxvatom,6,"pair:vatom");
        d_vatom = k_vatom.view<DeviceType>();
      }

      atomKK->sync(execution_space,datamask_read);
      k_cutsq.template sync<DeviceType>();
      k_params.template sync<DeviceType>();
      if (eflag || vflag) atomKK->modified(execution_space,datamask_modify);
      else atomKK->modified(execution_space,F_MASK);

      x = atomKK->k_x.view<DeviceType>();
      c_x = atomKK->k_x.view<DeviceType>();
      f = atomKK->k_f.view<DeviceType>();
      type = atomKK->k_type.view<DeviceType>();
      tag = atomKK->k_tag.view<DeviceType>();
      nlocal = atom->nlocal;
      nall = atom->nlocal + atom->nghost;
      newton_pair = force->newton_pair;
      special_lj[0] = force->special_lj[0];
      special_lj[1] = force->special_lj[1];
      special_lj[2] = force->special_lj[2];
      special_lj[3] = force->special_lj[3];

      // loop over neighbors of my atoms

      EV_FLOAT ev = pair_compute<PairYukawaKokkos<DeviceType>,void >(
    	  this,(NeighListKokkos<DeviceType>*)list);

      if (eflag_global) eng_vdwl += ev.evdwl;
      if (vflag_global) {
        virial[0] += ev.v[0];
        virial[1] += ev.v[1];
        virial[2] += ev.v[2];
        virial[3] += ev.v[3];
        virial[4] += ev.v[4];
        virial[5] += ev.v[5];
      }

      if (vflag_fdotr) pair_virial_fdotr_compute(this);

      if (eflag_atom) {
        k_eatom.template modify<DeviceType>();
        k_eatom.template sync<LMPHostType>();
      }

      if (vflag_atom) {
        k_vatom.template modify<DeviceType>();
        k_vatom.template sync<LMPHostType>();
      }
    }

18. Port compute_fpair, compute_evdwl and compute_coul, when/if necessary.
    Again, rename the pair and parameter names, but _do not_ delegate the
    computation to the original pair style! Just "copy-paste" the formula
    as much as possible.
    !! NOTE THAT compute_fpair DOES _NOT_ RETURN THE FORCE BUT FORCE/R !!

19. The final thing is to define the template specialization in your source
    file by adding the following at the bottom (note the LAMMPS_NS namespace!)

    namespace LAMMPS_NS {
    template class PairYukawaKokkos<LMPDeviceType>;
    #ifdef KOKKOS_HAVE_CUDA
    template class PairYukawaKokkos<LMPHostType>;
    #endif
    }


20. That, in principle, is it. Try to compile LAMMPS and see if it works.
    If you get compilation errors, something is wrong but it is a real pain
    to sift through all the compiler warning messages. Silly things like a
    typo in a variable inm compute_evdwl can lead to 400 pages of compiler
    output, but the actual kokkos errors are fairly explanatory. Just compile
    like this to filter out all the cuda warnings:
    make kokkos_cuda_mpi 2>&1 | grep error
    can be useful, and keep in mind that the chances of making mistakes in
    copying the boiler-plate should be fairly limited as
    the only real changes are in the parameter and class names.


Some general pointers:
  - If you get segfaults after a run, it is probably because some of the member
    functions you implement in the KOKKOS pair style are not virutal.
    Notorious ones are the destructor and void allocate().
    They _need_ to be virtual in the _base_ class! If allocate is not virtual
    in the base class, you tend to get segfaults during init().


Notes to self/ask Stan:
  - Why does Pair*Kokkos check narg before calling settings?
  - A lot of it is just boiler-plating and renaming. Can we template more?
    I know compile times will suck but...
  - There is no factor_lj in compute_fpair/compute_evdwl. Are these applied later?
  - Which headers are really required?