Skip to content

jmctighe

Due to recent bot attacks we have changed the sign-up process. If you want to participate in our forum please send a message via our contact form.

Forum Replies Created

Viewing 11 posts - 1 through 11 (of 11 total)
  • Author
    Posts
  • in reply to: Manipulating populations with postprocessors on GPU #10980
    jmctighe
    Participant

    After thinking on this a little bit, would it be possible to store the cell data adjacent to the boundaries in such a way that they are globally available? The cell data could be stored before executing the boundary condition post processor. Something like this would also more easily support domain decomposition if each block could access all of the necessary cell data.

    An alternative approach, if possible, might be to execute the boundary condition post processor on the host and then perform the collision and streaming on the device. Obviously, this would not be as efficient and does not support domain decomposition, but could give a computational boost over the single-CPU serial process.

    Interested to know if either of these approaches might be possible within the current framework.

    in reply to: Manipulating populations with postprocessors on GPU #10979
    jmctighe
    Participant

    The boundary condition I am trying to implement is a periodic condition which enforces a constant shear rate across the domain. For a given boundary cell at position x_t, the point in the flow at the other side of the domain the cell needs to reference is time dependent, given by x_m = [mod(x_t + S*L_y*t,L_x),L_y,z_t], where S is the shear rate (S=du/dy), L_x and L_y are the respective x and y dimensions of the domain, and z_t is the z position of the given cell. The populations are then linearly interpolated at point x_m since it does not necessarily fall on a lattice node. Figure 2 in Peng, Wu, Wang, and Li (2023) gives a sketch of this procedure. The final steps involve taking a Galilean transformation of the populations and copying them to the boundary cell before executing the collision step.

    in reply to: Manipulating populations with postprocessors on GPU #10972
    jmctighe
    Participant

    The data is set up as it should be in Paraview. I believe the issue with the post processor was that the surface normal vector was being initially computed as nan. I noticed that there is an if-statement in the turbulentWallModelPostProcessor to only execute the necessary computations if the magnitude of the surface normal vector is non-zero. The shear periodic BC works when I include the same if-statement, and now the code runs on the CPU. A minor success!

    On the GPU, the code will begin to run but encounters the following error at the beginning of the time stepping loop: Thread 1 "hst3d" received signal CUDA_EXCEPTION_4, Warp Illegal Instruction. I assume the error is encountered while executing the post processor.

    To answer one of your earlier questions, the overlap is kept at 3, as done in the setTurbulentWallModel function. The furthest possible neighbor cell would be in the opposite corner of the domain. Any given boundary cell needs to ‘see’ all of the cells in the flow on the other side of the domain. If I change the overlap to encompass the entire domain, the code runs into the same error. I am wondering if the error on the GPU is related to the range over which neighboring cell data needs to be retrieved.

    How would I go about switching the processing context before execution of the post processor? At this point, I can send you the entire code if you think it would be helpful.

    in reply to: Manipulating populations with postprocessors on GPU #10967
    jmctighe
    Participant

    Hi Adrian,

    You are definitely correct, the interpolation point is causing the problem. Although, instead of it being a point outside of the domain, the value of x_m is being calculated as nan. It seems that all of the cell or parameter values used by the post processor are zeros. It might be important to note that the error is occurring during the lattice preparation/boundary setting procedure, before setting the initial condition. Interested to know your thoughts on this.

    I’ve included the prepareLattice and setShearPeriodic functions below. I updated the post processor to use DOMAIN_EXTENT and LATTICE_SHEAR as parameters, but this had no effect on the error.

    prepareLattice:

    // Set up the geometry of the simulation
    void prepareLattice(SuperLattice<T,DESCRIPTOR>& sLattice,
                        const UnitConverter<T,DESCRIPTOR>& converter,
                        SuperGeometry<T,3>& superGeometry,
                        ShearPeriodicParameters<T>& shearPeriodicParameters)
    {
      OstreamManager clout(std::cout,"prepareLattice");
      clout << "Prepare Lattice ..." << std::endl;
    
      /// Material=1 -->bulk dynamics
      sLattice.defineDynamics<BulkDynamics>(superGeometry, 1);
      sLattice.addPostProcessor<stage::PostStream>(superGeometry.getMaterialIndicator({1,2}),meta::id<TrackCovarVelocity>{});
      /// Material = 2 --> boundary node
      sLattice.defineDynamics<BulkDynamics>(superGeometry, 2);
      /// === Set Initial Conditions == ///
      sLattice.setParameter<LATTICE_SHEAR>(shearPeriodicParameters.latticeShear);
      sLattice.setParameter<DOMAIN_EXTENT>(shearPeriodicParameters.extents);
    
      setshearPeriodic(sLattice, superGeometry, 2, shearPeriodicParameters);
      setInitialConditions(sLattice, converter, superGeometry);
    
      sLattice.setParameter<descriptors::OMEGA>( converter.getLatticeRelaxationFrequency() );
      //sLattice.setParameter<collision::LES::SMAGORINSKY>(T(0.12));
    
      // Make the lattice ready for simulation
      sLattice.initialize();
    
      clout << "Prepare Lattice ... OK" << std::endl;
    }

    setShearPeriodic.h:

    #ifndef SET_SHEAR_PERIODIC_H
    #define SET_SHEAR_PERIODIC_H
    
    #include "shearPeriodicPostProcessor.h"
    
    namespace olb {
    
    //======================================================================
    // ======== Shear Periodic parameters ======//
    //======================================================================
    
    template <typename T>
    struct ShearPeriodicParameters{
      ///  distance from cell to velocity sampling point in lattice units
      Vector<T,3> extents;
      T latticeWallDistance;
      T latticeShear;
    };
    
    template<typename T, typename DESCRIPTOR>
    void setshearPeriodic(SuperLattice<T, DESCRIPTOR>& sLattice,
                               FunctorPtr<SuperIndicatorF<T,DESCRIPTOR::d>>&& boundaryIndicator,
                               FunctorPtr<SuperIndicatorF<T,DESCRIPTOR::d>>&& bulkIndicator,
                               ShearPeriodicParameters<T>& shearPeriodicParameters,
                               IndicatorF<T,DESCRIPTOR::d>* indicatorAnalyticalBoundary = nullptr)
    {
      int _overlap = 3;
      OstreamManager clout(std::cout, "shearPeriodicSetter");
    
      AnalyticalConst<DESCRIPTOR::d, T, T> zeroVector(T(0.), T(0.));
    
      sLattice.template defineField<descriptors::Y1>(std::move(boundaryIndicator), zeroVector);
    
      auto& load = sLattice.getLoadBalancer();
      for (int iC=0; iC < load.size(); ++iC) {
        setshearPeriodic<T,DESCRIPTOR>(sLattice.getBlock(iC),
                              (bulkIndicator->getBlockIndicatorF(iC)).getBlockGeometry(),
                              boundaryIndicator->getBlockIndicatorF(iC),
                              bulkIndicator->getBlockIndicatorF(iC),
                              shearPeriodicParameters,
                              indicatorAnalyticalBoundary);
      }
      addPoints2CommBC<T,DESCRIPTOR>(sLattice, std::forward<decltype(boundaryIndicator)>(boundaryIndicator), _overlap);
      clout.setMultiOutput(false);
      sLattice.template setParameter<descriptors::SAMPLING_DISTANCE>( shearPeriodicParameters.extents[1] );
    }
    
    template<typename T, typename DESCRIPTOR>
    void setshearPeriodic(SuperLattice<T, DESCRIPTOR>& sLattice,
                               SuperGeometry<T,DESCRIPTOR::d>& superGeometry,
                               int materialOfSolidObstacle,
                               ShearPeriodicParameters<T>& shearPeriodicParameters,
                               IndicatorF<T,DESCRIPTOR::d>* indicatorAnalyticalBoundary = nullptr,
                               std::vector<int> bulkMaterials = std::vector<int>(1,1))
    {
      // Getting the indicators by material numbers and calling the superLattice method via the indicators:
      setshearPeriodic<T,DESCRIPTOR>(sLattice,
                          FunctorPtr<SuperIndicatorF<T,DESCRIPTOR::d>>(superGeometry.getMaterialIndicator(materialOfSolidObstacle)),
                          FunctorPtr<SuperIndicatorF<T,DESCRIPTOR::d>>(superGeometry.getMaterialIndicator(std::move(bulkMaterials))),
                          shearPeriodicParameters,
                          indicatorAnalyticalBoundary);
    }
    
    template<typename T, typename DESCRIPTOR>
    void setshearPeriodic(BlockLattice<T,DESCRIPTOR>& block,
                               BlockGeometry<T,DESCRIPTOR::d>& blockGeometry,
                               BlockIndicatorF<T,DESCRIPTOR::d>& boundaryIndicator,
                               BlockIndicatorF<T,DESCRIPTOR::d>& bulkIndicator,
                               ShearPeriodicParameters<T>& shearPeriodicParameters,
                               IndicatorF<T,DESCRIPTOR::d>* indicatorAnalyticalBoundary = nullptr)
    {
      OstreamManager clout(std::cout, "shearPeriodicSetter");
      clout.setMultiOutput(true);
    
      // Defining boundary distance y1 and distance to exchange location y2 in wall normal direction for every boundary cell
      const T deltaR = blockGeometry.getDeltaR();
      // for each solid cell:
      int kMax = 2;
      for(int k=1; k < kMax; k++){
        block.forSpatialLocations([&](LatticeR<DESCRIPTOR::d> solidLatticeR) {
          // Check if cell is solid cell
          if (boundaryIndicator(solidLatticeR)) {
            for (int iPop=1; iPop < DESCRIPTOR::q; ++iPop) {
              Vector<T,DESCRIPTOR::d> boundaryLatticeR(solidLatticeR + k*descriptors::c<DESCRIPTOR>(iPop));
              if (blockGeometry.getNeighborhoodRadius(boundaryLatticeR) >= 1) {
                if (blockGeometry.isInside(boundaryLatticeR)) {
                  Vector<T,DESCRIPTOR::d> boundaryPhysR{ };
                  blockGeometry.getPhysR(boundaryPhysR,boundaryLatticeR);
                  // check if neighbor is fluid cell
                  if (bulkIndicator(boundaryLatticeR)) {
                    if (indicatorAnalyticalBoundary) {
                      // Calculate surface normal
                      Vector<T,DESCRIPTOR::d> normal = indicatorAnalyticalBoundary->surfaceNormal(boundaryPhysR, deltaR);
                      T y1 = 0.;
                      // Calculate boundary distance y1 in surface normal direction
                      indicatorAnalyticalBoundary->distance(y1, boundaryPhysR, normal, blockGeometry.getIcGlob());
                      if(y1 == T(0)) {
                        for ( int iD = 0; iD < DESCRIPTOR::d; iD++ ) {
                          normal[iD] *= T(-1);
                        }
                        indicatorAnalyticalBoundary->distance(y1, boundaryPhysR, normal, blockGeometry.getIcGlob());
                      }
                      y1 /= deltaR; // y1 in lattice units
                      if ( util::abs(y1) > T(k) ) {
                        y1 = util::sign(y1) * T(k);
                      }
                      block.get(boundaryLatticeR).template setField<descriptors::Y1>(-y1*normal);
                    } else {
                      Vector<T,DESCRIPTOR::d> normal;
                      for(int jPop = 0; jPop < DESCRIPTOR::q; jPop++){
                        Vector<T,DESCRIPTOR::d> boundaryLatticeR2(boundaryLatticeR + k*descriptors::c<DESCRIPTOR>(jPop));
                        if(boundaryIndicator(boundaryLatticeR2)){
                          for ( int iD = 0; iD < DESCRIPTOR::d; iD++ ) {
                            normal[iD] += k*descriptors::c<DESCRIPTOR>(jPop,iD);
                          }
                        }
                      }
                      T normalNorm = util::norm<DESCRIPTOR::d>(normal);
                      if(normalNorm != T(0)) {
                        for ( int iD = 0; iD < DESCRIPTOR::d; iD++ ) {
                          normal[iD] /= normalNorm;
                        }
                      }
                      for ( int iD = 0; iD < DESCRIPTOR::d; iD++ ) {
                        if(util::abs(normal[iD]) < T(0.4)) {
                          normal[iD] = T(0);
                        }
                      }
                      normalNorm = util::norm<DESCRIPTOR::d>(normal);
                      for ( int iD = 0; iD < DESCRIPTOR::d; iD++ ) {
                        normal[iD] /= normalNorm;
                      }
                      if(normal[0] != T(0) && normal[0] > T(0)) {
                        normal[0] /= normal[0];
                      }
                      else if(normal[0] != T(0) && normal[0] < T(0)) {
                        normal[0] /= normal[0];
                        normal[0] *= T(-1.);
                      }
                      if(normal[1] != T(0) && normal[1] > T(0)) {
                        normal[1] /= normal[1];
                      }
                      else if(normal[1] != T(0) && normal[1] < T(0)) {
                        normal[1] /= normal[1];
                        normal[1] *= T(-1.);
                      }
                      if ( DESCRIPTOR::d == 3) {
                        if(normal[2] != T(0) && normal[2] > T(0)) {
                          normal[2] /= normal[2];
                        }
                        else if(normal[2] != T(0) && normal[2] < T(0)) {
                          normal[2] /= normal[2];
                          normal[2] *= T(-1.);
                        }
                      }
                      if(normalNorm != T(0)) {
                        auto field = block.get(boundaryLatticeR).template getField<descriptors::Y1>();
                        if( util::norm<DESCRIPTOR::d>(field) == T(0)) {
                          T y1 = (k-1) + shearPeriodicParameters.latticeWallDistance;
                          block.get(boundaryLatticeR).template setField<descriptors::Y1>(-y1*normal);
                        }
                      }
                    }
                    //T pi[util::TensorVal<DESCRIPTOR>::n] {T(0)};
                    //block.get(boundaryLatticeR).template setField<descriptors::TENSOR>(pi);
                    // Setting shear periodic post processor
                    block.get(boundaryLatticeR).template setField<descriptors::LOCATION>(boundaryLatticeR);
                    block.addPostProcessor(typeid(stage::PreCollide),
                            boundaryLatticeR,
                            meta::id<shearPeriodicPostProcessor>{});
                  }
                }
              }
            }
          }
        });
      }
      
    }
    
    }
    
    #endif
    in reply to: Manipulating populations with postprocessors on GPU #10965
    jmctighe
    Participant

    After running some more debugging tests, the same error persists when running with a single CPU core. I was able to successfully use run the shear periodic post processor on a M1 MacBook, but I am encountering this issue on a linux machine. Additionally, the same error is produced when using the official 1.8.1 release and the most recent release on gitlab. I am also able to compile and run the examples without issue.

    The debugging errors suggest that the problem may be related to my use of ‘neighbor’. For example, when compiling for a single CPU core, displaying the contents of cell using gdb within the interpolateDF function yields:

    cell = (olb::cpu::Cell<double, olb::descriptors::D3Q19<>, (olb::Platform)0> &) @0x7fffffffb280: {
      _lattice = 0x7fffffffb4d0, _iCell = 171}

    While the display output for f_x1 is:
    f_x1 = {_lattice = 0x7fffffffb4d0, _iCell = 2147483819}

    It looks as though the other attributes of cell are not getting carried along. Although, the type for f_x1 is still olb::cpu::Cell<double, olb::descriptors::D3Q19<>, (olb::Platform)0>. I tried modifying the definitions of f_x1 and f_x2 to be of the form
    auto f_x1 = cell.neighbor(point1).template getField<descriptors::POPULATION>();,
    but this was not successful either.

    Hoping this additional information is of use. Thanks again for your help.
    -jmctighe

    in reply to: Manipulating populations with postprocessors on GPU #10937
    jmctighe
    Participant

    Hi Adrian,

    Here are the contents of the config.mk file currently setup for debugging (the same problem persists with -O3 optimization):

    CXX             := nvcc
    CC              := nvcc
    
    CXXFLAGS        := -O0
    CXXFLAGS        += -g -G -std=c++20 --forward-unknown-to-host-compiler
    
    PARALLEL_MODE   := NONE
    
    PLATFORMS       := CPU_SISD GPU_CUDA
    
    # for e.g. RTX 30* (Ampere), see table in <code>rules.mk</code> for other options
    CUDA_ARCH       := 90
    
    FLOATING_POINT_TYPE := float
    
    USE_EMBEDDED_DEPENDENCIES := ON

    Below is the code for the entire post processor. The post processor itself is added in a function called “setShearPeriodic” which is very similar to the setTurbulentWallModel function for the implementation of the turbulent wall model post processor. I am using the same portion of code in setTurbulentWallModel to compute the surface normal vectors before adding the post processor.

    #ifndef SHEAR_PERIODIC_POST_PROCESSOR_H
    #define SHEAR_PERIODIC_POST_PROCESSOR_H
    
    #include "core/util.h"
    #include <math.h>
    
    namespace olb {
    
      template <typename CELL, typename V = typename CELL::value_t, typename DESCRIPTOR = typename CELL::descriptor_t>
      void interpolateDF(CELL& cell, Vector<V,DESCRIPTOR::d> distance, V f_tilde[DESCRIPTOR::q]) any_platform{
        Vector<V,DESCRIPTOR::d> point1;
        for( int iD = 0; iD < DESCRIPTOR::d; iD++ ) {
          point1[iD] = util::floor(distance[iD]);
        }
        V q = distance[0] - point1[0];
        Vector<V,DESCRIPTOR::d> point2 = point1;
        point2[0] += 1;
        auto f_x1 = cell.neighbor(point1);
        auto f_x2 = cell.neighbor(point2);
    
        for(int iPop = 0; iPop < DESCRIPTOR::q; iPop++ ) {
          f_tilde[iPop] = f_x1[iPop]*(V{1}-q) + f_x2[iPop]*q;
        }
        
        
      };
      
    //======================================================================
    // ========== Shear Periodic (Lees-Edwards) BC PostProcessor =========//
    //======================================================================
    class shearPeriodicPostProcessor {
    public:
      static constexpr OperatorScope scope = OperatorScope::PerCellWithParameters;
      using parameters = meta::list<descriptors::LATTICE_TIME>;
    
      int getPriority() const {
        return 1;
      }
    
      template <typename CELL, typename PARAMETERS, typename V = typename CELL::value_t>
      void apply(CELL& cell, PARAMETERS& parameters) any_platform {
        // Step 1: Find location x_m (y2)
        using DESCRIPTOR = typename CELL::descriptor_t;
        int iT = parameters.template get<descriptors::LATTICE_TIME>();
        auto extents = cell.template getField<DOMAIN_EXTENT>();
        auto latticeShear = cell.template getField<LATTICE_SHEAR>();
        auto y1 = cell.template getField<descriptors::Y1>();
        Vector<V,DESCRIPTOR::d> latticeR = cell.template getField<descriptors::LOCATION>();
    
        V y1Norm = util::norm<DESCRIPTOR::d>(y1);
        Vector<V,DESCRIPTOR::d> normal;
        Vector<V,DESCRIPTOR::d> tangent;
        for ( int iD = 0; iD < DESCRIPTOR::d; iD++ ) {
          normal[iD] = y1[iD] / y1Norm;
        }
        // This only works when shear is du/dy
        tangent[0] = normal[1];
        tangent[1] = 0.;
        tangent[2] = 0.;
        // Note that this only works for my exact case
        V x_m = fmod(3 + latticeR[0] + tangent[0]*latticeShear*extents[1]*iT,extents[0] + 6) - 3;
        if (x_m < -3){
          x_m += extents[0] + 6;
        }
        x_m -= latticeR[0];
        Vector<V,DESCRIPTOR::d> dx_m;
        dx_m[0] = x_m;
        dx_m[1] = 0.;
        dx_m[2] = 0.;
        Vector<V,DESCRIPTOR::d> distance = extents[1]*normal + dx_m;
        V f_tilde[DESCRIPTOR::q] = {0.};
        // Step 2: interpolate value f_tilde
        interpolateDF(cell, distance, f_tilde);
        
        // Step 3: Galiliean Transform of f_tilde -> f_xm
        V drho = V(0.); // density from the temporary DF    
        Vector<V,DESCRIPTOR::d> u_xm; // u from the temporary DF
    
        for (int iPop = 0; iPop < DESCRIPTOR::q; ++iPop){
          drho += f_tilde[iPop];
          u_xm += f_tilde[iPop]*descriptors::c<DESCRIPTOR>(iPop);
        }
        u_xm /= 1; 
        Vector<V,DESCRIPTOR::d> u_xt = u_xm - latticeShear*(extents[1]+1)*tangent;
    
        auto uDotU_xm = util::dotProduct(u_xm,u_xm);
        auto uDotU_xt = util::dotProduct(u_xt,u_xt);
        auto invCs2 = descriptors::invCs2<V,DESCRIPTOR>();
        // Equilibrium DFs
        V f_eq_xm = V(0.);
        V f_eq_xt = V(0.);
    
        V cDotU_xm= V(0.);
        V cDotU_xt= V(0.);
        V t = V(0.);
                
        for (int iPop = 0; iPop < DESCRIPTOR::q; iPop++){
          Vector<V,DESCRIPTOR::d> c = descriptors::c<DESCRIPTOR>(iPop);
          t = descriptors::t<V,DESCRIPTOR>(iPop);
          cDotU_xm = util::dotProduct(c,u_xm);
          cDotU_xt = util::dotProduct(c,u_xt);
          f_eq_xm = drho * t + 
                    1 * t * (cDotU_xm + util::pow(cDotU_xm,2)*(0.5 * invCs2) - uDotU_xm * 0.5)*invCs2;
          f_eq_xt = drho * t + 
                    1 * t * (cDotU_xt + util::pow(cDotU_xt,2)*(0.5 * invCs2) - uDotU_xt * 0.5)*invCs2;    
          // Step 4: Copy f to positions x_t
          cell.neighbor(-1 * normal)[iPop] = f_tilde[iPop] - f_eq_xm + f_eq_xt;
        }
      }
    };
    
    }
    #endif

    Please let me know if you need any more portions of the code and thank you for the help!
    -jmctighe

    in reply to: Implementing the Cumulant Collision Operator #10688
    jmctighe
    Participant

    FBukreev,

    I’ve written the following postprocessor for computing the full covariance matrix of velocity, making use of the already existing fields LATTICE_TIME and AVERAGE_VELOCITY:

    struct TrackCovarVelocity {
      static constexpr OperatorScope scope = OperatorScope::PerCellWithParameters;
    
      int getPriority() const {
        return std::numeric_limits<int>::max();
      }
     
      using parameters = meta::list<descriptors::LATTICE_TIME>;
    
      template <typename CELL,typename PARAMETERS>
      void apply(CELL& cell, PARAMETERS& parameters) any_platform {
        using V = typename CELL::value_t;
        using DESCRIPTOR = typename CELL::descriptor_t;
    
        int iT = parameters.template get<descriptors::LATTICE_TIME>();
        auto uCovar = cell.template getField<COVARIANCE_VELOCITY>();
        if (iT > 1) { 
          auto uAvg = cell.template getField<AVERAGE_VELOCITY>();
    
          V u[DESCRIPTOR::d] {};
          cell.computeU(u);
          
          int k;
          for (int i=0; i < DESCRIPTOR::d; i++){
            for (int j = 0; j < DESCRIPTOR::d; j++){
              k = j + i * DESCRIPTOR::d;
             uCovar[k] = (uCovar[k]*(iT - 1) + (u[i] - uAvg[i]) * (u[j] - uAvg[j])*(iT/(iT-1))) / iT;
            }
          }     
        }
        else {
          for(int i=0; i<(DESCRIPTOR::d * DESCRIPTOR::d); i++){
            uCovar[i] = 0.;
          }
        }
        cell.template setField<COVARIANCE_VELOCITY>(uCovar);
    
      }
      
      
    };

    The postprocessor is working as expected. Thank you again for all your help!

    Best,

    jmctighe

    in reply to: Implementing the Cumulant Collision Operator #10675
    jmctighe
    Participant

    OK, got it. Just so I know I am looking in the right place, the averaging is being performed using either the TrackAverageVelocity or StoreAndTrackAverageVelocity structs located in src/dynamics/collisionModifiers.h, correct? And then I should be able to write a postprocessor with a similar structure that operates on a new field, i.e. RMS_VELOCITY for rms data? but this post-processor does not have to be defined within the COLLISION namespace, or should I use the collision namespace to make use of parameters such as LATTICE_TIME? I have limited experience with C++, so this is all a bit of a learning process for me.

    Thanks again for all your help.

    in reply to: Implementing the Cumulant Collision Operator #10673
    jmctighe
    Participant

    The release is 1.8.1. I did notice that these functors are not used in any of the provided examples. Should I be using something different to collect the field statistics? I know the averages can be obtained with the AVERAGE_VELOCITY field, but I am particularly interested in the variance and covariance terms.

    in reply to: Implementing the Cumulant Collision Operator #10670
    jmctighe
    Participant

    Perhaps there is something wrong on my end then… When running the simulation with the same descriptor, collision operator, and wall-model parameters as in the provided channel3d example, the performance is about 390 MLUPs before starting the SuperLatticeTimeAveraged3D calculations (performed using .addEnsemble()). Once the calculations begin, the performance dramatically drops to under 2.5 MLUPs, the volatile GPU utilization goes to zero on average for both GPUs, and the CPU utilization goes to 100% for the two CPUs in use.

    Could this be at all related to the compilation configuration? I can provide the contents of my config.mk file and the simulation setup if you would like.

    • This reply was modified 9 months ago by jmctighe. Reason: typo
    in reply to: Implementing the Cumulant Collision Operator #10668
    jmctighe
    Participant

    Hi FBukreev,

    Thank you for the clarification. I was also wondering about the SuperLatticeTimeAveraged3D and SuperLatticeTimeAveragedCrossCorrelationF3D functors.

    For the turbulent channel flow, I would like to perform first and second order statistics on the velocity field data over ~200,000 consecutive time steps with a grid that consists of ~25M voxels. I am using the “gpu_openmpi.mk” configuration with two NVIDIA H100 GPUs.

    When I use the time averaged functors, the simulation becomes severely bottlenecked, since the functors are running on the CPU. Are there any work arounds for this? Is it possible to carry out these operations on the GPU using the current framework? Greatly appreciate your help!

    -jmctighe

Viewing 11 posts - 1 through 11 (of 11 total)