Manipulating populations with postprocessors on GPU

  • #10933
    jmctighe
    Participant

    Dear OpenLB Community,

    I have been working on a shear periodic (Lees-Edwards) boundary condition post processor in order to run homogeneous shear turbulence (HST) simulations, largely following the work of Peng, Wu, Wang, and Li (2023). I have been able to successfully run simulations on a single CPU core, but I am running into issues when running with MPI and on a single GPU.

    Regarding parallelization, the shear periodic boundary condition requires retrieving cell data from across the domain, outside the block lattice of the cell being operated on. It makes sense that there would be problems here without addressing the communication of cell data across blocks.

    When using the post processor on a GPU, I experience a different problem. Several steps in the postprocessing algorithm require obtaining and manipulating the populations at a given cell. For example, one step linearly interpolates the populations at a point in space between two cells with the following function:

    template <typename CELL, typename V = typename CELL::value_t, typename DESCRIPTOR = typename CELL::descriptor_t>
      void interpolateDF(CELL& cell, Vector<V,DESCRIPTOR::d> distance, V f_tilde[DESCRIPTOR::q]) any_platform{
        Vector<V,DESCRIPTOR::d> point1;
        for( int iD = 0; iD < DESCRIPTOR::d; iD++ ) {
          point1[iD] = util::floor(distance[iD]);
        }
        V q = distance[0] - point1[0];
        Vector<V,DESCRIPTOR::d> point2 = point1;
        point2[0] += 1;
        auto f_x1 = cell.neighbor(point1);
        auto f_x2 = cell.neighbor(point2);
    
        for(int iPop = 0; iPop < DESCRIPTOR::q; iPop++ ) {
          f_tilde[iPop] = f_x1[iPop]*(V{1}-q) + f_x2[iPop]*q;
        }
      }

    The code compiles successfully for a single GPU, but running it results in a segmentation fault because OpenLB decides to use olb::cpu::Cell<T,DESCRIPTOR> as the cell object. I suspect this is because I am using operator[] on the cell, which is not part of olb::gpu::cuda::Cell<T,DESCRIPTOR>. I have noticed that olb::gpu::cuda::DataOnlyCell<T,DESCRIPTOR> does provide operator[], but I am unsure how to make use of it. There is also a later step in the algorithm that sets the value of the populations at a neighboring cell, which looks like cell.neighbor(-1 * normal)[iPop] = f_tilde[iPop] - f_eq_xm[iPop] + f_eq_xt[iPop]; and which I suspect has a similar problem to the function above.

    At the moment, I am more interested in getting the post processor to work on a single GPU than in parallelization. Any help with the GPU implementation would be greatly appreciated. I can include the entire post processor script if necessary.

    Thank You,
    jmctighe

    #10934
    Adrian
    Keymaster

    How exactly do you compile? If cpu::Cell is used, this is not running on the GPU. Square-bracket access to populations is definitely implemented for the GPU cells. The way the transparent execution works is that the operator's apply method is instantiated for the platform-specific cells; they all share the same outer interface, just implemented differently.

    The code you posted should work on GPU, assuming it is called correctly in apply (can you also show that code, or ideally your entire post processor?).

    Right now I suspect that it is just a compilation issue.

    #10937
    jmctighe
    Participant

    Hi Adrian,

    Here are the contents of the config.mk file, currently set up for debugging (the same problem persists with -O3 optimization):

    CXX             := nvcc
    CC              := nvcc
    
    CXXFLAGS        := -O0
    CXXFLAGS        += -g -G -std=c++20 --forward-unknown-to-host-compiler
    
    PARALLEL_MODE   := NONE
    
    PLATFORMS       := CPU_SISD GPU_CUDA
    
    # for e.g. RTX 30* (Ampere), see table in rules.mk for other options
    CUDA_ARCH       := 90
    
    FLOATING_POINT_TYPE := float
    
    USE_EMBEDDED_DEPENDENCIES := ON

    Below is the code for the entire post processor. The post processor itself is added in a function called "setShearPeriodic", which is very similar to the setTurbulentWallModel function that adds the turbulent wall model post processor. I am using the same portion of code as setTurbulentWallModel to compute the surface normal vectors before adding the post processor.

    #ifndef SHEAR_PERIODIC_POST_PROCESSOR_H
    #define SHEAR_PERIODIC_POST_PROCESSOR_H
    
    #include "core/util.h"
    #include <math.h>
    
    namespace olb {
    
      template <typename CELL, typename V = typename CELL::value_t, typename DESCRIPTOR = typename CELL::descriptor_t>
      void interpolateDF(CELL& cell, Vector<V,DESCRIPTOR::d> distance, V f_tilde[DESCRIPTOR::q]) any_platform{
        Vector<V,DESCRIPTOR::d> point1;
        for( int iD = 0; iD < DESCRIPTOR::d; iD++ ) {
          point1[iD] = util::floor(distance[iD]);
        }
        V q = distance[0] - point1[0];
        Vector<V,DESCRIPTOR::d> point2 = point1;
        point2[0] += 1;
        auto f_x1 = cell.neighbor(point1);
        auto f_x2 = cell.neighbor(point2);
    
        for(int iPop = 0; iPop < DESCRIPTOR::q; iPop++ ) {
          f_tilde[iPop] = f_x1[iPop]*(V{1}-q) + f_x2[iPop]*q;
        }
      }
      
    //======================================================================
    // ========== Shear Periodic (Lees-Edwards) BC PostProcessor =========//
    //======================================================================
    class shearPeriodicPostProcessor {
    public:
      static constexpr OperatorScope scope = OperatorScope::PerCellWithParameters;
      using parameters = meta::list<descriptors::LATTICE_TIME>;
    
      int getPriority() const {
        return 1;
      }
    
      template <typename CELL, typename PARAMETERS, typename V = typename CELL::value_t>
      void apply(CELL& cell, PARAMETERS& parameters) any_platform {
        // Step 1: Find location x_m (y2)
        using DESCRIPTOR = typename CELL::descriptor_t;
        int iT = parameters.template get<descriptors::LATTICE_TIME>();
        auto extents = cell.template getField<DOMAIN_EXTENT>();
        auto latticeShear = cell.template getField<LATTICE_SHEAR>();
        auto y1 = cell.template getField<descriptors::Y1>();
        Vector<V,DESCRIPTOR::d> latticeR = cell.template getField<descriptors::LOCATION>();
    
        V y1Norm = util::norm<DESCRIPTOR::d>(y1);
        Vector<V,DESCRIPTOR::d> normal;
        Vector<V,DESCRIPTOR::d> tangent;
        for ( int iD = 0; iD < DESCRIPTOR::d; iD++ ) {
          normal[iD] = y1[iD] / y1Norm;
        }
        // This only works when shear is du/dy
        tangent[0] = normal[1];
        tangent[1] = 0.;
        tangent[2] = 0.;
        // Note that this only works for my exact case
        V x_m = fmod(3 + latticeR[0] + tangent[0]*latticeShear*extents[1]*iT,extents[0] + 6) - 3;
        if (x_m < -3){
          x_m += extents[0] + 6;
        }
        x_m -= latticeR[0];
        Vector<V,DESCRIPTOR::d> dx_m;
        dx_m[0] = x_m;
        dx_m[1] = 0.;
        dx_m[2] = 0.;
        Vector<V,DESCRIPTOR::d> distance = extents[1]*normal + dx_m;
        V f_tilde[DESCRIPTOR::q] = {0.};
        // Step 2: interpolate value f_tilde
        interpolateDF(cell, distance, f_tilde);
        
        // Step 3: Galilean transform of f_tilde -> f_xm
        V drho = V(0.); // density from the temporary DF    
        Vector<V,DESCRIPTOR::d> u_xm; // u from the temporary DF
    
        for (int iPop = 0; iPop < DESCRIPTOR::q; ++iPop){
          drho += f_tilde[iPop];
          u_xm += f_tilde[iPop]*descriptors::c<DESCRIPTOR>(iPop);
        }
        u_xm /= 1; 
        Vector<V,DESCRIPTOR::d> u_xt = u_xm - latticeShear*(extents[1]+1)*tangent;
    
        auto uDotU_xm = util::dotProduct(u_xm,u_xm);
        auto uDotU_xt = util::dotProduct(u_xt,u_xt);
        auto invCs2 = descriptors::invCs2<V,DESCRIPTOR>();
        // Equilibrium DFs
        V f_eq_xm = V(0.);
        V f_eq_xt = V(0.);
    
        V cDotU_xm= V(0.);
        V cDotU_xt= V(0.);
        V t = V(0.);
                
        for (int iPop = 0; iPop < DESCRIPTOR::q; iPop++){
          Vector<V,DESCRIPTOR::d> c = descriptors::c<DESCRIPTOR>(iPop);
          t = descriptors::t<V,DESCRIPTOR>(iPop);
          cDotU_xm = util::dotProduct(c,u_xm);
          cDotU_xt = util::dotProduct(c,u_xt);
          f_eq_xm = drho * t + 
                    1 * t * (cDotU_xm + util::pow(cDotU_xm,2)*(0.5 * invCs2) - uDotU_xm * 0.5)*invCs2;
          f_eq_xt = drho * t + 
                    1 * t * (cDotU_xt + util::pow(cDotU_xt,2)*(0.5 * invCs2) - uDotU_xt * 0.5)*invCs2;    
          // Step 4: Copy f to positions x_t
          cell.neighbor(-1 * normal)[iPop] = f_tilde[iPop] - f_eq_xm + f_eq_xt;
        }
      }
    };
    
    }
    #endif

    Please let me know if you need any more portions of the code and thank you for the help!
    -jmctighe

    #10965
    jmctighe
    Participant

    After running some more debugging tests, the same error persists when running on a single CPU core. I was able to successfully run the shear periodic post processor on an M1 MacBook, but I encounter this issue on a Linux machine. Additionally, the same error is produced with both the official 1.8.1 release and the most recent release on GitLab. I am also able to compile and run the examples without issue.

    The debugging output suggests that the problem may be related to my use of 'neighbor'. For example, when compiling for a single CPU core, displaying the contents of cell with gdb inside the interpolateDF function yields:

    cell = (olb::cpu::Cell<double, olb::descriptors::D3Q19<>, (olb::Platform)0> &) @0x7fffffffb280: {
      _lattice = 0x7fffffffb4d0, _iCell = 171}

    While the display output for f_x1 is:
    f_x1 = {_lattice = 0x7fffffffb4d0, _iCell = 2147483819}

    It looks as though the other attributes of cell are not being carried along, although the type of f_x1 is still olb::cpu::Cell<double, olb::descriptors::D3Q19<>, (olb::Platform)0>. I tried modifying the definitions of f_x1 and f_x2 to the form
    auto f_x1 = cell.neighbor(point1).template getField<descriptors::POPULATION>();,
    but this was not successful either.

    Hoping this additional information is of use. Thanks again for your help.
    -jmctighe

    #10966
    Adrian
    Keymaster

    Thanks for these additional details, and good to know that the same problem occurs on CPU. Looking at your debug output, I suspect the issue is an out-of-bounds / undefined access to a neighbor cell: the cell index of ~2e9 is almost certainly invalid, especially compared to the index of the original cell. Looking at your interpolation function, I cannot rule out that very far away / out-of-bounds neighbors are accessed. For performance reasons, index validity is not checked by default, which leads to the segfault we observe.

    What is the maximum range of neighbors accessed from a cell? Did you consider that the overlap needs to be large enough to cover these accesses (if the post processor is executed at the outermost non-overlap cells)? The issue might also be faulty initialization of your extent fields.

    Side note: It seems to me that e.g. DOMAIN_EXTENT could be a parameter instead of a per-cell field. Using parameters where possible will help reduce your memory load.

    #10967
    jmctighe
    Participant

    Hi Adrian,

    You are definitely correct: the interpolation point is causing the problem. However, rather than being a point outside the domain, the value of x_m is being computed as nan. It seems that all of the cell and parameter values used by the post processor are zeros. It might be important to note that the error occurs during the lattice preparation/boundary setting procedure, before the initial condition is set. Interested to know your thoughts on this.

    I’ve included the prepareLattice and setShearPeriodic functions below. I updated the post processor to use DOMAIN_EXTENT and LATTICE_SHEAR as parameters, but this had no effect on the error.

    prepareLattice:

    // Set up the geometry of the simulation
    void prepareLattice(SuperLattice<T,DESCRIPTOR>& sLattice,
                        const UnitConverter<T,DESCRIPTOR>& converter,
                        SuperGeometry<T,3>& superGeometry,
                        ShearPeriodicParameters<T>& shearPeriodicParameters)
    {
      OstreamManager clout(std::cout,"prepareLattice");
      clout << "Prepare Lattice ..." << std::endl;
    
      /// Material=1 -->bulk dynamics
      sLattice.defineDynamics<BulkDynamics>(superGeometry, 1);
      sLattice.addPostProcessor<stage::PostStream>(superGeometry.getMaterialIndicator({1,2}),meta::id<TrackCovarVelocity>{});
      /// Material = 2 --> boundary node
      sLattice.defineDynamics<BulkDynamics>(superGeometry, 2);
      /// === Set Initial Conditions == ///
      sLattice.setParameter<LATTICE_SHEAR>(shearPeriodicParameters.latticeShear);
      sLattice.setParameter<DOMAIN_EXTENT>(shearPeriodicParameters.extents);
    
      setshearPeriodic(sLattice, superGeometry, 2, shearPeriodicParameters);
      setInitialConditions(sLattice, converter, superGeometry);
    
      sLattice.setParameter<descriptors::OMEGA>( converter.getLatticeRelaxationFrequency() );
      //sLattice.setParameter<collision::LES::SMAGORINSKY>(T(0.12));
    
      // Make the lattice ready for simulation
      sLattice.initialize();
    
      clout << "Prepare Lattice ... OK" << std::endl;
    }

    setShearPeriodic.h:

    #ifndef SET_SHEAR_PERIODIC_H
    #define SET_SHEAR_PERIODIC_H
    
    #include "shearPeriodicPostProcessor.h"
    
    namespace olb {
    
    //======================================================================
    // ======== Shear Periodic parameters ======//
    //======================================================================
    
    template <typename T>
    struct ShearPeriodicParameters{
      ///  distance from cell to velocity sampling point in lattice units
      Vector<T,3> extents;
      T latticeWallDistance;
      T latticeShear;
    };
    
    template<typename T, typename DESCRIPTOR>
    void setshearPeriodic(SuperLattice<T, DESCRIPTOR>& sLattice,
                               FunctorPtr<SuperIndicatorF<T,DESCRIPTOR::d>>&& boundaryIndicator,
                               FunctorPtr<SuperIndicatorF<T,DESCRIPTOR::d>>&& bulkIndicator,
                               ShearPeriodicParameters<T>& shearPeriodicParameters,
                               IndicatorF<T,DESCRIPTOR::d>* indicatorAnalyticalBoundary = nullptr)
    {
      int _overlap = 3;
      OstreamManager clout(std::cout, "shearPeriodicSetter");
    
      AnalyticalConst<DESCRIPTOR::d, T, T> zeroVector(T(0.), T(0.));
    
      sLattice.template defineField<descriptors::Y1>(std::move(boundaryIndicator), zeroVector);
    
      auto& load = sLattice.getLoadBalancer();
      for (int iC=0; iC < load.size(); ++iC) {
        setshearPeriodic<T,DESCRIPTOR>(sLattice.getBlock(iC),
                              (bulkIndicator->getBlockIndicatorF(iC)).getBlockGeometry(),
                              boundaryIndicator->getBlockIndicatorF(iC),
                              bulkIndicator->getBlockIndicatorF(iC),
                              shearPeriodicParameters,
                              indicatorAnalyticalBoundary);
      }
      addPoints2CommBC<T,DESCRIPTOR>(sLattice, std::forward<decltype(boundaryIndicator)>(boundaryIndicator), _overlap);
      clout.setMultiOutput(false);
      sLattice.template setParameter<descriptors::SAMPLING_DISTANCE>( shearPeriodicParameters.extents[1] );
    }
    
    template<typename T, typename DESCRIPTOR>
    void setshearPeriodic(SuperLattice<T, DESCRIPTOR>& sLattice,
                               SuperGeometry<T,DESCRIPTOR::d>& superGeometry,
                               int materialOfSolidObstacle,
                               ShearPeriodicParameters<T>& shearPeriodicParameters,
                               IndicatorF<T,DESCRIPTOR::d>* indicatorAnalyticalBoundary = nullptr,
                               std::vector<int> bulkMaterials = std::vector<int>(1,1))
    {
      // Getting the indicators by material numbers and calling the superLattice method via the indicators:
      setshearPeriodic<T,DESCRIPTOR>(sLattice,
                          FunctorPtr<SuperIndicatorF<T,DESCRIPTOR::d>>(superGeometry.getMaterialIndicator(materialOfSolidObstacle)),
                          FunctorPtr<SuperIndicatorF<T,DESCRIPTOR::d>>(superGeometry.getMaterialIndicator(std::move(bulkMaterials))),
                          shearPeriodicParameters,
                          indicatorAnalyticalBoundary);
    }
    
    template<typename T, typename DESCRIPTOR>
    void setshearPeriodic(BlockLattice<T,DESCRIPTOR>& block,
                               BlockGeometry<T,DESCRIPTOR::d>& blockGeometry,
                               BlockIndicatorF<T,DESCRIPTOR::d>& boundaryIndicator,
                               BlockIndicatorF<T,DESCRIPTOR::d>& bulkIndicator,
                               ShearPeriodicParameters<T>& shearPeriodicParameters,
                               IndicatorF<T,DESCRIPTOR::d>* indicatorAnalyticalBoundary = nullptr)
    {
      OstreamManager clout(std::cout, "shearPeriodicSetter");
      clout.setMultiOutput(true);
    
      // Defining boundary distance y1 and distance to exchange location y2 in wall normal direction for every boundary cell
      const T deltaR = blockGeometry.getDeltaR();
      // for each solid cell:
      int kMax = 2;
      for(int k=1; k < kMax; k++){
        block.forSpatialLocations([&](LatticeR<DESCRIPTOR::d> solidLatticeR) {
          // Check if cell is solid cell
          if (boundaryIndicator(solidLatticeR)) {
            for (int iPop=1; iPop < DESCRIPTOR::q; ++iPop) {
              Vector<T,DESCRIPTOR::d> boundaryLatticeR(solidLatticeR + k*descriptors::c<DESCRIPTOR>(iPop));
              if (blockGeometry.getNeighborhoodRadius(boundaryLatticeR) >= 1) {
                if (blockGeometry.isInside(boundaryLatticeR)) {
                  Vector<T,DESCRIPTOR::d> boundaryPhysR{ };
                  blockGeometry.getPhysR(boundaryPhysR,boundaryLatticeR);
                  // check if neighbor is fluid cell
                  if (bulkIndicator(boundaryLatticeR)) {
                    if (indicatorAnalyticalBoundary) {
                      // Calculate surface normal
                      Vector<T,DESCRIPTOR::d> normal = indicatorAnalyticalBoundary->surfaceNormal(boundaryPhysR, deltaR);
                      T y1 = 0.;
                      // Calculate boundary distance y1 in surface normal direction
                      indicatorAnalyticalBoundary->distance(y1, boundaryPhysR, normal, blockGeometry.getIcGlob());
                      if(y1 == T(0)) {
                        for ( int iD = 0; iD < DESCRIPTOR::d; iD++ ) {
                          normal[iD] *= T(-1);
                        }
                        indicatorAnalyticalBoundary->distance(y1, boundaryPhysR, normal, blockGeometry.getIcGlob());
                      }
                      y1 /= deltaR; // y1 in lattice units
                      if ( util::abs(y1) > T(k) ) {
                        y1 = util::sign(y1) * T(k);
                      }
                      block.get(boundaryLatticeR).template setField<descriptors::Y1>(-y1*normal);
                    } else {
                      Vector<T,DESCRIPTOR::d> normal;
                      for(int jPop = 0; jPop < DESCRIPTOR::q; jPop++){
                        Vector<T,DESCRIPTOR::d> boundaryLatticeR2(boundaryLatticeR + k*descriptors::c<DESCRIPTOR>(jPop));
                        if(boundaryIndicator(boundaryLatticeR2)){
                          for ( int iD = 0; iD < DESCRIPTOR::d; iD++ ) {
                            normal[iD] += k*descriptors::c<DESCRIPTOR>(jPop,iD);
                          }
                        }
                      }
                      T normalNorm = util::norm<DESCRIPTOR::d>(normal);
                      if(normalNorm != T(0)) {
                        for ( int iD = 0; iD < DESCRIPTOR::d; iD++ ) {
                          normal[iD] /= normalNorm;
                        }
                      }
                      for ( int iD = 0; iD < DESCRIPTOR::d; iD++ ) {
                        if(util::abs(normal[iD]) < T(0.4)) {
                          normal[iD] = T(0);
                        }
                      }
                      normalNorm = util::norm<DESCRIPTOR::d>(normal);
                      for ( int iD = 0; iD < DESCRIPTOR::d; iD++ ) {
                        normal[iD] /= normalNorm;
                      }
                      if(normal[0] != T(0) && normal[0] > T(0)) {
                        normal[0] /= normal[0];
                      }
                      else if(normal[0] != T(0) && normal[0] < T(0)) {
                        normal[0] /= normal[0];
                        normal[0] *= T(-1.);
                      }
                      if(normal[1] != T(0) && normal[1] > T(0)) {
                        normal[1] /= normal[1];
                      }
                      else if(normal[1] != T(0) && normal[1] < T(0)) {
                        normal[1] /= normal[1];
                        normal[1] *= T(-1.);
                      }
                      if ( DESCRIPTOR::d == 3) {
                        if(normal[2] != T(0) && normal[2] > T(0)) {
                          normal[2] /= normal[2];
                        }
                        else if(normal[2] != T(0) && normal[2] < T(0)) {
                          normal[2] /= normal[2];
                          normal[2] *= T(-1.);
                        }
                      }
                      if(normalNorm != T(0)) {
                        auto field = block.get(boundaryLatticeR).template getField<descriptors::Y1>();
                        if( util::norm<DESCRIPTOR::d>(field) == T(0)) {
                          T y1 = (k-1) + shearPeriodicParameters.latticeWallDistance;
                          block.get(boundaryLatticeR).template setField<descriptors::Y1>(-y1*normal);
                        }
                      }
                    }
                    //T pi[util::TensorVal<DESCRIPTOR>::n] {T(0)};
                    //block.get(boundaryLatticeR).template setField<descriptors::TENSOR>(pi);
                    // Setting shear periodic post processor
                    block.get(boundaryLatticeR).template setField<descriptors::LOCATION>(boundaryLatticeR);
                    block.addPostProcessor(typeid(stage::PreCollide),
                            boundaryLatticeR,
                            meta::id<shearPeriodicPostProcessor>{});
                  }
                }
              }
            }
          }
        });
      }
      
    }
    
    }
    
    #endif

    #10970
    Adrian
    Keymaster

    Did you confirm that the data is set up how you expect e.g. by viewing the fields in Paraview?

    For the GPU case: Did you switch the processing context s.t. the data is synced to GPU before you execute the post processor?

    #10972
    jmctighe
    Participant

    The data is set up as it should be in Paraview. I believe the issue with the post processor was that the surface normal vector was initially being computed as nan. I noticed that the turbulentWallModelPostProcessor contains an if-statement that only executes the necessary computations if the magnitude of the surface normal vector is non-zero. The shear periodic BC works when I include the same if-statement, and the code now runs on the CPU. A minor success!

    On the GPU, the code begins to run but encounters the following error at the beginning of the time stepping loop: Thread 1 "hst3d" received signal CUDA_EXCEPTION_4, Warp Illegal Instruction. I assume the error is encountered while executing the post processor.

    To answer one of your earlier questions, the overlap is kept at 3, as in the setTurbulentWallModel function. The furthest possible neighbor cell would be in the opposite corner of the domain: any given boundary cell needs to 'see' all of the cells in the flow on the other side of the domain. If I change the overlap to encompass the entire domain, the code runs into the same error. I wonder whether the error on the GPU is related to the range over which neighboring cell data needs to be retrieved.

    How would I go about switching the processing context before execution of the post processor? At this point, I can send you the entire code if you think it would be helpful.

    #10973
    Adrian
    Keymaster

    Overlap communication is not designed to encompass the entire domain. You will need custom code to realize such a complex scheme. E.g. for our FSI integrals we implemented custom PerBlock scoped post processors with CPU / GPU specific implementations.

    Can you explain what you want to implement in more detail? I do not quite get why you need to access cells at the opposite side of the domain (unless it is for periodicity which is a different question and doesn’t require the overlap to span the entire domain).

    #10979
    jmctighe
    Participant

    The boundary condition I am trying to implement is a periodic condition which enforces a constant shear rate across the domain. For a given boundary cell at position x_t, the point in the flow at the other side of the domain the cell needs to reference is time dependent, given by x_m = [mod(x_t + S*L_y*t,L_x),L_y,z_t], where S is the shear rate (S=du/dy), L_x and L_y are the respective x and y dimensions of the domain, and z_t is the z position of the given cell. The populations are then linearly interpolated at point x_m since it does not necessarily fall on a lattice node. Figure 2 in Peng, Wu, Wang, and Li (2023) gives a sketch of this procedure. The final steps involve taking a Galilean transformation of the populations and copying them to the boundary cell before executing the collision step.

    #10980
    jmctighe
    Participant

    After thinking on this a little, would it be possible to store the cell data adjacent to the boundaries in such a way that it is globally available? The cell data could be stored before executing the boundary condition post processor. Something like this would also more easily support domain decomposition, since each block could access all of the necessary cell data.

    An alternative approach, if possible, might be to execute the boundary condition post processor on the host and then perform collision and streaming on the device. Obviously this would not be as efficient and does not support domain decomposition, but it could still give a computational boost over the single-CPU serial process.

    Interested to know if either of these approaches might be possible within the current framework.

    #10981
    Adrian
    Keymaster

    I see, this is interesting!

    An efficient general implementation will require custom MPI communication, I think; executing parts on the CPU won't change the fundamental problem of the data not being available where it needs to be.

    As an intermediate step: how large a simulation domain / how many cells do you currently plan to use this for? If you can be sure that this will fit on a single node or even a single GPU (which can hold a lot of cells nowadays), the implementation will be much simpler.
