Modifying field values for neighboring cells inside a postprocessor
OpenLB – Open Source Lattice Boltzmann Code › Forums › on OpenLB › General Topics › Modifying field values for neighboring cells inside a postprocessor
- This topic has 39 replies, 3 voices, and was last updated 1 week, 4 days ago by Adrian.
-
AuthorPosts
-
August 19, 2024 at 7:32 pm #9121Danial.KhazaeipoulParticipant
I did though about storing the cell IDs, instead of storing pointers to the cell interfaces. However, I would need to create a Cell interface to be able to modify a global data which requires a blockLattice to be passed as an argument:
Cell<T,DESCRIPTOR> cell(blockLattice, iCell);
From the user-guide, I can see that the OperatorScope can be set to PerBlock instead of PerCell or PerCellWithParameters. However, I am not sure if this works with the freeSurfacePostProcessor or if OpenLB provides a similar method for block lattice operator such as:
template <typename T, typename DESCRIPTOR>
template <typename BLOCK>
void TestPostProcessor3D<T, DESCRIPTOR>::apply(BLOCK& block)
{}August 19, 2024 at 7:39 pm #9122Danial.KhazaeipoulParticipantI could potentially simplify this by implementing the algorithm directly within the top-level example. However, I believe doing so would introduce unnecessary complexity and complicate future integration efforts. Instead, I’m working on a more general integration approach, allowing users to easily utilize it in the same way they currently use the free-surface dynamics feature when the code is ready to merge.
August 20, 2024 at 8:14 pm #9125Danial.KhazaeipoulParticipantIn regard to this part of your question:
“I assume you want to preserve the queue across operator applications?”
No, I don’t need to preserve the queue, it is only used as a temporary container to store the cells or their ids to search for the connected gas or interface cells for which the global data, e.g., TEST_ID, is to be modified. Each operator call is going to build its own queue and discard it when the global data is modified.
August 21, 2024 at 9:21 am #9128AdrianKeymasterOk, good. This means that you can build on your previous post and store e.g. copies of the cell interfaces instead of pointers or just plain cell IDs and apply them using the
CELL::setCellID
method.August 22, 2024 at 12:25 am #9131Danial.KhazaeipoulParticipantThank you Adrian,
I attempted to follow your suggestion and only stored the “CellIDs” in the queue. However, it seems that the CUDA platform does not provide the “CELL::setCellId”. I did check the core/platform directory and only the CPU platform implements the “CELL::setCellId” method to be used. The code compiles and runs on CPU platform but it fails to compile on GPUs:
error: class “olb::gpu::cuda::Cell<T, NavierStokesDescriptor>” has no member “setCellId”
August 22, 2024 at 7:47 am #9133AdrianKeymasterYou are right. Luckily this is easy to fix.
August 22, 2024 at 9:12 pm #9144Danial.KhazaeipoulParticipantThank you for your assistance. FYI, the cell interface for cuda does also miss a getCellId() method which I added per your suggestion:
CellID getCellId() const __device__ {
return this->_iCell;
}September 12, 2024 at 7:21 pm #9235Danial.KhazaeipoulParticipantDear Adrian,
Is it possible to perform atomic operations within the operators?
I am attempting to do atomicMin within the apply() operator as below but I get a compile time error that no instances of the overloaded function “atomicMin” matches the argument provided:reducedID = atomicMin(cell.template getFieldPointer<FreeSurface::TEST_ID>(), nbrID);
Note that, nbrID and the TEST_ID fields are of the type CellID. Moreover, the function itself requires a memory address and a value as below:
int atomicMin(int* address, int val); unsigned int atomicMin(unsigned int* address, unsigned int val); unsigned long long int atomicMin(unsigned long long int* address, unsigned long long int val); long long int atomicMin(long long int* address, long long int val);
- This reply was modified 2 months, 3 weeks ago by Danial.Khazaeipoul.
September 17, 2024 at 10:50 am #9250AdrianKeymasterAssuming that CUDA’s
atomicMin
is found at all (and there is not some weird namespace issue due to CUDA placing this stuff in the root namespace) the issue is thatCELL::getFieldPointer
returns a OpenLB (and platform) specific wrapper to resolve the underlying Structure of Arrays (preventing us from returning a raw pointer to some field value, see e.g.ColumnVector::ptr
on the CPU side).I would like to discourage again this usage of CUDA specifics in OpenLB Per-Cell operators. It very likely won’t do what you expect, definitely won’t work across blocks (without additional custom communication work on your side), only work on GPUs and may break at any time in future OpenLB releases (as the entire point of this abstract operator style is to be platform agnostic).
The intended way of implementing such custom logic is in
OperatorScope::PerBlock
operators. E.g. this is how the statistics aggregation is implemented for the different block platforms.If you still want to go ahead with this: Yes, if you provide the correct arguments to
atomicMin
this will be executed when calling this operator on GPU in OpenLB.September 19, 2024 at 7:48 pm #9268Danial.KhazaeipoulParticipantThank you Adrian for your detailed response.
The algorithm I’m implementing is a variant of the
Connected Component Labeling (CCL)
algorithm that eliminates the need for a queue or stack, as used in traditional flood-fill algorithms, which are not feasible on GPUs. Additionally, fixed-size arrays are not optimal in this context. In general, a CCL algorithm consists of three distinct kernels implemented using the apply operator within thefreeSurfacePostProcessor3D
. In the first two kernels, only local cell data is modified. However, the third kernel modifies a non-local cell data, which can result in overlapping updates, potentially causing a race condition. Therefore, modifications in this kernel need to be atomic. Despite not yet using atomic operations in the third kernel, the algorithm still assigns a unique ID to each connected region. For example, when tested with 15 connected regions, it successfully assigned a unique ID to each one, tested on both CPUs and GPUs.The only issue at the moment is that if a connected region is split across partitions when the domain is partitioned, each section of the region receives a different unique ID in each partition. However, if the connected region remains entirely within a single partition, the parallel runs work correctly. The problem arises when the region moves and crosses partition boundaries, leading to the scenario where the region gets split and different unique IDs are assigned.
Regarding the
OperatorScope::PerBlock
method you suggested, runninggrep -r "OperatorScope::PerBlock"
didn’t return many relevant results for me to review. I’ll continue digging deeper to find more information on this topic. In the meantime, I would appreciate it if you could point me to an example that demonstrates the use of this method within OpenLB.September 23, 2024 at 9:30 am #9274AdrianKeymasterOk, you obviously invested a lot of thought into how to implement this already (thanks for sharing and sorry for my repeated doubts, I am not used to people in this forum working at this level in the code)!
In this case I suggest you go ahead with the atomic-based approach you outlined and we go from there.
The PerBlock-scope simply allows you to access the entire concrete block instance (i.e. without most of the virtual interface layers so you will be able to access the raw memory from there). Currently, the only public user of this approach is the
StatisticsPostProcessor
, you can check out the GPU implementation insrc/core/platform/gpu/cuda/statistics.hh
.September 23, 2024 at 10:45 pm #9280Danial.KhazaeipoulParticipantI understand answering a diverse range of questions here in the forum could be quite demanding and appreciate the fact that you’re taking your time to assist us getting started with OpenLB. I’m also considering attending the upcoming spring school.
In this case I suggest you go ahead with the atomic-based approach you outlined and we go from there.
I am going to try adopting the code to use the atomicMin operation in the third kernel. For now, I’m performing a few preliminary code test before moving to the next step. However, the following test code gives a compile-time error which I believe it’s due to the fact that
getComponentPointer
is defined asprotected
:auto bubbleIDPtr = cell.template getFieldPointer<FreeSurface::TEST_ID>().getComponentPointer(0);
error: function "olb::ColumnVector<COLUMN, D>::ptr::getComponentPointer(unsigned int) [with COLUMN=olb::cpu::sisd::Column<olb::CellID>, D=1U]" ../../../src/core/columnVector.h(238): here is inaccessible
error: function "olb::gpu::cuda::FieldPtr<T, DESCRIPTOR, FIELD>::getComponentPointer(unsigned int) [with T=double, DESCRIPTOR=olb::descriptors::D3Q27<olb::descriptors::FORCE, olb::FreeSurface::MASS, olb::FreeSurface::EPSILON, olb::FreeSurface::CELL_TYPE, olb::FreeSurface::CELL_FLAGS, olb::FreeSurface::TEMP_MASS_EXCHANGE, olb::FreeSurface::PREVIOUS_VELOCITY, olb::FreeSurface::TEST_ID>, FIELD=olb::FreeSurface::TEST_ID]" ../../../src/core/platform/gpu/cuda/context.hh(113): here is inaccessible
- This reply was modified 2 months, 1 week ago by Danial.Khazaeipoul.
- This reply was modified 2 months, 1 week ago by Danial.Khazaeipoul.
- This reply was modified 2 months, 1 week ago by Danial.Khazaeipoul.
September 24, 2024 at 6:02 am #9287Danial.KhazaeipoulParticipantFYI, I have modified
getComponentPointer
to be apublic
member rather thanprotected
in the (1)core/columnVector.h
, (2)core/vector.h
, (3)core/fieldArrayD.h
, and (4)core/platform/gpu/cuda/context.hh
. With these modifications, theatomicMin
function can now be used as below:auto nextID = atomicMin(nextCell.template getFieldPointer<FreeSurface::TEST_ID>().getComponentPointer(0), nbrID);
With the mentioned modifications and the use of
atomicMin
operation, the code now compiles and runs correctly, just as it did with the non-atomic version. I can guess that you prefer not to modify thecore/
in this way as it might introduce memory-safety issues with underlying data now exposed through public methods. However, I am not sure what would be a better alternative to be able to access raw pointers with in anapply
operator without modifying thecore/
as above.- This reply was modified 2 months, 1 week ago by Danial.Khazaeipoul.
October 5, 2024 at 7:02 am #9320Danial.KhazaeipoulParticipantDear Adrian,
I believe I’m encountering an issue with the cell IDs returned by the
getCellId
method when the domain is divided into separate partitions. Does this method return a cell ID that is local to a specific block?After running a few tests, it appears that the method indeed returns a cell ID that is local within each block. Is there an existing method that can provide a global cell ID, unique to each cell across the entire domain, regardless of partitioning?
October 5, 2024 at 11:08 am #9321mathiasKeymasterThe global CellId is just the localR = {cuboidNumber, localCellId}, so it is 4 dimentional vecotor. However, usually one works with ccordinates in SI units (physR).
Best
Mathias -
AuthorPosts
- You must be logged in to reply to this topic.