
Modifying field values for neighboring cells inside a postprocessor


Viewing 15 posts - 16 through 30 (of 40 total)
  • #9121
    Danial.Khazaeipoul
    Participant

    I did think about storing the cell IDs instead of storing pointers to the cell interfaces. However, I would then need to construct a Cell interface to be able to modify the global data, which requires a blockLattice to be passed as an argument:

    Cell<T,DESCRIPTOR> cell(blockLattice, iCell);

    From the user guide, I can see that the OperatorScope can be set to PerBlock instead of PerCell or PerCellWithParameters. However, I am not sure whether this works with the freeSurfacePostProcessor, or whether OpenLB provides a similar method for block lattice operators, such as:

    template <typename T, typename DESCRIPTOR>
    template <typename BLOCK>
    void TestPostProcessor3D<T, DESCRIPTOR>::apply(BLOCK& block)
    {}
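
    A minimal sketch of how such a block-scoped operator might be declared, assuming the general OpenLB operator concept (a static scope constant, getPriority(), and an apply() taking the block); whether this actually hooks into the freeSurfacePostProcessor chain is exactly the open question here:

    template <typename T, typename DESCRIPTOR>
    struct TestPostProcessor3D {
      // block-scoped instead of cell-scoped application
      static constexpr OperatorScope scope = OperatorScope::PerBlock;

      int getPriority() const {
        return 0;
      }

      // called once per block with the concrete, platform-specific block type
      template <typename BLOCK>
      void apply(BLOCK& block)
      {}
    };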

    #9122
    Danial.Khazaeipoul
    Participant

    I could potentially simplify this by implementing the algorithm directly within the top-level example. However, I believe doing so would introduce unnecessary complexity and complicate future integration efforts. Instead, I’m working on a more general integration approach, so that once the code is ready to merge, users can utilize it in the same way they currently use the free-surface dynamics feature.

    #9125
    Danial.Khazaeipoul
    Participant

    In regard to this part of your question:

    “I assume you want to preserve the queue across operator applications?”

    No, I don’t need to preserve the queue. It is only used as a temporary container to store the cells, or their IDs, while searching for the connected gas or interface cells whose global data, e.g. TEST_ID, is to be modified. Each operator call builds its own queue and discards it once the global data has been modified.
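
    To make the role of that throwaway queue concrete, here is a minimal CPU-side sketch; the neighborsOf and updateGlobalData callables are hypothetical stand-ins for the actual lattice access:

    #include <cstdint>
    #include <queue>
    #include <unordered_set>

    using CellID = std::uint32_t;

    // Walks one connected region starting from a seed cell. The queue and the
    // visited set live only for the duration of this call and are then discarded.
    template <typename NEIGHBORS, typename UPDATE>
    void visitConnected(CellID seed, NEIGHBORS&& neighborsOf, UPDATE&& updateGlobalData)
    {
      std::queue<CellID> pending;            // temporary container, not preserved
      std::unordered_set<CellID> visited;
      pending.push(seed);
      visited.insert(seed);
      while (!pending.empty()) {
        const CellID current = pending.front();
        pending.pop();
        updateGlobalData(current);           // e.g. write the shared TEST_ID value
        for (CellID nbr : neighborsOf(current)) {   // connected gas/interface cells
          if (visited.insert(nbr).second) {
            pending.push(nbr);
          }
        }
      }
    }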

    #9128
    Adrian
    Keymaster

    Ok, good. This means that you can build on your previous post and store, e.g., copies of the cell interfaces instead of pointers, or just plain cell IDs that you then apply using the CELL::setCellId method.

    #9131
    Danial.Khazaeipoul
    Participant

    Thank you Adrian,

    I attempted to follow your suggestion and stored only the cell IDs in the queue. However, it seems that the CUDA platform does not provide CELL::setCellId. I checked the core/platform directory and only the CPU platform implements the setCellId method. The code compiles and runs on the CPU platform, but it fails to compile on GPUs:

    error: class “olb::gpu::cuda::Cell<T, NavierStokesDescriptor>” has no member “setCellId”

    #9133
    Adrian
    Keymaster

    You are right. Luckily this is easy to fix.
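
    Presumably the fix amounts to adding the missing setter to the gpu::cuda cell interface; a sketch, assuming the same _iCell member that the getCellId shown further down in this thread returns:

    void setCellId(CellID iCell) __device__ {
      this->_iCell = iCell;
    }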

    #9144
    Danial.Khazaeipoul
    Participant

    Thank you for your assistance. FYI, the CUDA cell interface is also missing a getCellId() method, which I added per your suggestion:

    CellID getCellId() const __device__ {
      return this->_iCell;
    }

    #9235
    Danial.Khazaeipoul
    Participant

    Dear Adrian,

    Is it possible to perform atomic operations within the operators?
    I am attempting to use atomicMin within the apply() operator as below, but I get a compile-time error that no instance of the overloaded function “atomicMin” matches the argument list:

    reducedID = atomicMin(cell.template getFieldPointer<FreeSurface::TEST_ID>(), nbrID);

    Note that nbrID and the TEST_ID field are of type CellID. Moreover, the function itself requires a memory address and a value, as below:

    int atomicMin(int* address, int val);
    unsigned int atomicMin(unsigned int* address, unsigned int val);
    unsigned long long int atomicMin(unsigned long long int* address, unsigned long long int val);
    long long int atomicMin(long long int* address, long long int val);
    #9250
    Adrian
    Keymaster

    Assuming that CUDA’s atomicMin is found at all (and there is not some weird namespace issue due to CUDA placing this stuff in the root namespace), the issue is that CELL::getFieldPointer returns an OpenLB- (and platform-) specific wrapper to resolve the underlying Structure of Arrays (preventing us from returning a raw pointer to some field value; see e.g. ColumnVector::ptr on the CPU side).

    I would like to again discourage this usage of CUDA specifics in OpenLB per-cell operators. It very likely won’t do what you expect, definitely won’t work across blocks (without additional custom communication work on your side), will only work on GPUs, and may break at any time in future OpenLB releases (as the entire point of this abstract operator style is to be platform agnostic).

    The intended way of implementing such custom logic is in OperatorScope::PerBlock operators. E.g. this is how the statistics aggregation is implemented for the different block platforms.

    If you still want to go ahead with this: yes, if you provide the correct arguments to atomicMin, it will be executed when calling this operator on GPU in OpenLB.

    #9268
    Danial.Khazaeipoul
    Participant

    Thank you Adrian for your detailed response.

    The algorithm I’m implementing is a variant of the Connected Component Labeling (CCL) algorithm that eliminates the need for a queue or stack, as used in traditional flood-fill algorithms, which are not feasible on GPUs. Additionally, fixed-size arrays are not optimal in this context. In general, the CCL algorithm consists of three distinct kernels implemented using the apply operator within the freeSurfacePostProcessor3D. In the first two kernels, only local cell data is modified. The third kernel, however, modifies non-local cell data, which can result in overlapping updates and thus a potential race condition, so modifications in this kernel need to be atomic. Even without atomic operations in the third kernel, the algorithm already assigns a unique ID to each connected region: when tested with 15 connected regions, it successfully assigned a unique ID to each one, on both CPUs and GPUs.
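
    As a rough illustration only (not the actual implementation), the per-cell update of such a third kernel could look like the following; FreeSurface::TEST_ID, getFieldPointer and the getComponentPointer access are the ones discussed in this thread, while the neighbor loop and the operator boilerplate are assumptions, and the atomicMin call of course makes it CUDA-only:

    struct PropagateMinimumLabel {
      static constexpr OperatorScope scope = OperatorScope::PerCell;

      int getPriority() const { return 0; }

      template <typename CELL>
      void apply(CELL& cell) {
        using DESCRIPTOR = typename CELL::descriptor_t;
        const auto ownID = cell.template getField<FreeSurface::TEST_ID>();
        // visit every lattice neighbor and atomically lower its label to
        // min(neighbor label, own label)
        for (int iPop = 1; iPop < DESCRIPTOR::q; ++iPop) {
          auto nbr = cell.neighbor(descriptors::c<DESCRIPTOR>(iPop));
          atomicMin(nbr.template getFieldPointer<FreeSurface::TEST_ID>().getComponentPointer(0),
                    ownID);
        }
      }
    };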

    The only issue at the moment is that when the domain is partitioned and a connected region is split across partitions, each section of the region receives a different unique ID in each partition. If the connected region remains entirely within a single partition, parallel runs work correctly; the problem arises when the region moves across partition boundaries and gets split, so that different unique IDs are assigned to its parts.

    Regarding the OperatorScope::PerBlock method you suggested, running grep -r "OperatorScope::PerBlock" didn’t return many relevant results for me to review. I’ll continue digging deeper to find more information on this topic. In the meantime, I would appreciate it if you could point me to an example that demonstrates the use of this method within OpenLB.

    #9274
    Adrian
    Keymaster

    Ok, you obviously invested a lot of thought into how to implement this already (thanks for sharing and sorry for my repeated doubts, I am not used to people in this forum working at this level in the code)!

    In this case I suggest you go ahead with the atomic-based approach you outlined and we go from there.

    The PerBlock scope simply allows you to access the entire concrete block instance (i.e. without most of the virtual interface layers, so you will be able to access the raw memory from there). Currently, the only public user of this approach is the StatisticsPostProcessor; you can check out the GPU implementation in src/core/platform/gpu/cuda/statistics.hh.

    #9280
    Danial.Khazaeipoul
    Participant

    I understand that answering a diverse range of questions here in the forum can be quite demanding, and I appreciate that you’re taking the time to help us get started with OpenLB. I’m also considering attending the upcoming spring school.

    In this case I suggest you go ahead with the atomic-based approach you outlined and we go from there.

    I am going to try adapting the code to use the atomicMin operation in the third kernel. For now, I’m performing a few preliminary code tests before moving to the next step. However, the following test code gives a compile-time error, which I believe is due to the fact that getComponentPointer is defined as protected:

    auto bubbleIDPtr = cell.template getFieldPointer<FreeSurface::TEST_ID>().getComponentPointer(0);

    error: function "olb::ColumnVector<COLUMN, D>::ptr::getComponentPointer(unsigned int) [with COLUMN=olb::cpu::sisd::Column<olb::CellID>, D=1U]" ../../../src/core/columnVector.h(238): here is inaccessible

    error: function "olb::gpu::cuda::FieldPtr<T, DESCRIPTOR, FIELD>::getComponentPointer(unsigned int) [with T=double, DESCRIPTOR=olb::descriptors::D3Q27<olb::descriptors::FORCE, olb::FreeSurface::MASS, olb::FreeSurface::EPSILON, olb::FreeSurface::CELL_TYPE, olb::FreeSurface::CELL_FLAGS, olb::FreeSurface::TEMP_MASS_EXCHANGE, olb::FreeSurface::PREVIOUS_VELOCITY, olb::FreeSurface::TEST_ID>, FIELD=olb::FreeSurface::TEST_ID]"
    ../../../src/core/platform/gpu/cuda/context.hh(113): here is inaccessible
    #9287
    Danial.Khazaeipoul
    Participant

    FYI, I have modified getComponentPointer to be a public member rather than protected in (1) core/columnVector.h, (2) core/vector.h, (3) core/fieldArrayD.h, and (4) core/platform/gpu/cuda/context.hh. With these modifications, the atomicMin function can now be used as below:

    auto nextID = atomicMin(nextCell.template getFieldPointer<FreeSurface::TEST_ID>().getComponentPointer(0), nbrID);

    With the mentioned modifications and the use of the atomicMin operation, the code now compiles and runs correctly, just as it did with the non-atomic version. I would guess that you prefer not to modify core/ in this way, since exposing the underlying data through public methods might introduce memory-safety issues. However, I am not sure what a better alternative would be for accessing raw pointers within an apply operator without modifying core/ as above.

    #9320
    Danial.Khazaeipoul
    Participant

    Dear Adrian,

    I believe I’m encountering an issue with the cell IDs returned by the getCellId method when the domain is divided into separate partitions. Does this method return a cell ID that is local to a specific block?

    After running a few tests, it appears that the method indeed returns a cell ID that is local within each block. Is there an existing method that can provide a global cell ID, unique to each cell across the entire domain, regardless of partitioning?

    #9321
    mathias
    Keymaster

    The global cell ID is just localR = {cuboidNumber, localCellId}, so it is a 4-dimensional vector. However, one usually works with coordinates in SI units (physR).

    Best
    Mathias
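
    For illustration only: if a single scalar key per cell is needed across the whole domain, one option is to pack the cuboid number and the block-local cell ID into one 64-bit value. This is not an existing OpenLB method, just one way to combine the information from the answer above:

    #include <cstdint>

    // Hypothetical helper: combine cuboid number and block-local cell ID into a
    // domain-wide unique 64-bit key.
    std::uint64_t makeGlobalCellKey(int iCuboid, std::uint32_t localCellId)
    {
      return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(iCuboid)) << 32)
           | localCellId;
    }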
