Number of Cuboids for Geometry related to Parallel Compilation?
January 24, 2022 at 7:24 pm #6272
I would like to know the significance of the number of cuboids defined for the geometry. Is it related to the parallelization of the code? If so, what would be an ideal number of cuboids when running the code on a cluster?
One of the code examples contains the line
int noOfCuboids = std::max( 16, 4 * singleton::mpi().getSize() );
I would appreciate your suggestion on this matter.
Abhijeet C.
January 25, 2022 at 2:19 pm #6277
Adrian (Keymaster)
Yes, the cuboid decomposition of the geometry, and therefore also the number of cuboids, is essential for the parallelization. Within a single timestep, each cuboid needs to communicate only with its direct neighbors. This communication is realized via overlap areas that are synchronized between the blocks. This is a very common pattern for parallelizing lattice-based codes.
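The overlap synchronization described above can be sketched in one dimension as follows. This is only an illustration of the general pattern, not OpenLB's implementation: the Block type, its memory layout and the synchronize function are hypothetical names made up for this sketch, and the two blocks are assumed to sit directly next to each other.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1D block (not an OpenLB type): the owned cells are
// surrounded by `overlap` ghost cells on each side.
struct Block {
  std::size_t overlap;
  std::vector<double> cells; // layout: [left ghosts | owned | right ghosts]

  Block(std::size_t owned, std::size_t overlap_, double value)
    : overlap(overlap_), cells(owned + 2 * overlap_, 0.0) {
    assert(owned >= overlap_);
    // Fill only the owned cells; ghost cells stay at 0 until synchronized.
    for (std::size_t i = 0; i < owned; ++i) cells[overlap + i] = value;
  }

  std::size_t owned() const { return cells.size() - 2 * overlap; }
};

// Copy the outermost owned cells of each block into the ghost layer of
// its neighbor. `left` is assumed to sit directly to the left of `right`.
void synchronize(Block& left, Block& right) {
  assert(left.overlap == right.overlap);
  const std::size_t o = left.overlap;
  for (std::size_t i = 0; i < o; ++i) {
    // right ghost cells of `left` <- first owned cells of `right`
    left.cells[o + left.owned() + i] = right.cells[o + i];
    // left ghost cells of `right` <- last owned cells of `left`
    right.cells[i] = left.cells[left.owned() + i];
  }
}
```

For two blocks with four owned cells and an overlap of one, holding the values 1.0 and 2.0, a call to synchronize writes 2.0 into the right ghost cell of the first block and 1.0 into the left ghost cell of the second. In an MPI setting the two copies would be a send/receive pair between the processes that own the blocks.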
While decomposing the geometry into several cuboids is not strictly necessary for OpenMP-only (i.e. shared-memory) parallelization, it is often still advantageous because it reduces the number of superfluous cells.
As such, the ideal number of cuboids is rather problem-dependent. Choosing one cuboid per process is a good base choice, but in general you’ll have to benchmark if you want to optimize performance with respect to this parameter.
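To make the trade-off between superfluous cells and communication concrete, here is a toy 2D sketch. Everything in it is an assumption made for illustration: the L-shaped domain, the uniform k x k split, and the rule that cuboids without any fluid cell are dropped. It does not use OpenLB and says nothing about actual run time, which still has to be benchmarked.

```cpp
#include <cassert>
#include <vector>

struct Cost {
  int keptCuboids; // cuboids containing at least one fluid cell
  int emptyCells;  // allocated non-fluid cells inside kept cuboids
  int sharedEdge;  // total interface length between kept neighbors
};

// Hypothetical L-shaped domain on an n x n grid: a strip of width n/4
// along the left edge plus a strip of width n/4 along the bottom edge.
bool isFluid(int x, int y, int n) { return x < n / 4 || y < n / 4; }

// Split the n x n grid into k x k equal cuboids, drop cuboids without
// fluid cells, and measure the remaining overhead and interface length.
Cost evaluate(int n, int k) {
  assert(k > 0 && n % k == 0);
  const int s = n / k; // edge length of one cuboid
  std::vector<char> kept(k * k, 0);
  Cost c{0, 0, 0};
  for (int cy = 0; cy < k; ++cy) {
    for (int cx = 0; cx < k; ++cx) {
      int fluid = 0;
      for (int y = cy * s; y < (cy + 1) * s; ++y)
        for (int x = cx * s; x < (cx + 1) * s; ++x)
          if (isFluid(x, y, n)) ++fluid;
      if (fluid > 0) {
        kept[cy * k + cx] = 1;
        ++c.keptCuboids;
        c.emptyCells += s * s - fluid;
      }
    }
  }
  // Each pair of kept neighbors shares one edge of length s.
  for (int cy = 0; cy < k; ++cy) {
    for (int cx = 0; cx < k; ++cx) {
      if (!kept[cy * k + cx]) continue;
      if (cx + 1 < k && kept[cy * k + cx + 1]) c.sharedEdge += s;
      if (cy + 1 < k && kept[(cy + 1) * k + cx]) c.sharedEdge += s;
    }
  }
  return c;
}
```

For n = 16, evaluate(16, 1) yields 1 cuboid with 144 empty cells and no interface, evaluate(16, 2) yields 3 cuboids with 80 empty cells and an interface length of 16, and evaluate(16, 4) yields 7 cuboids with 0 empty cells and an interface length of 24. Fewer empty cells are bought with more interface to synchronize, which is why the choice needs benchmarking.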
For more information you can check out e.g. our overview paper, starting with figures 2 and 3.
January 25, 2022 at 5:30 pm #6279
Thank you for your prompt response and explanation. You mentioned that one cuboid per process is a good base choice. So, will the number of cuboids be equal to the number of processes (i.e. tasks) defined in the SLURM script? When you say that it is problem-dependent, does that mean it is geometry-dependent? If the geometry is complex, do we need to define more cuboids? Has the group conducted any tests to determine a suitable range for the number of cuboids with regard to the geometry?
Abhijeet C.
January 27, 2022 at 10:55 am #6286
Adrian (Keymaster)
At a minimum you need one cuboid per process; otherwise processes without a cuboid can perform no work, as no part of the simulation space belongs to them. If you initialize the number of cuboids with some multiple of singleton::mpi().getSize(), this is automatically the case. This is also how it is implemented in our examples, so you do not need to change anything here to run them with SLURM on an HPC cluster.
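As a small sketch of why a multiple of the process count is sufficient, the following reproduces the formula from the example line quoted in the question, with the MPI size passed in as a plain integer so that it runs without MPI. The function names and the round-robin assignment are assumptions made for illustration; OpenLB's own load balancer may distribute cuboids differently.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Same formula as std::max( 16, 4 * singleton::mpi().getSize() ),
// with the MPI size given as an argument (hypothetical helper).
int cuboidCount(int mpiSize, int minimum = 16, int perProcess = 4) {
  assert(mpiSize > 0);
  return std::max(minimum, perProcess * mpiSize);
}

// Hypothetical round-robin assignment: cuboid i goes to rank i % mpiSize.
// Returns the number of cuboids each rank ends up with.
std::vector<int> cuboidsPerRank(int nCuboids, int mpiSize) {
  assert(mpiSize > 0 && nCuboids >= 0);
  std::vector<int> counts(mpiSize, 0);
  for (int i = 0; i < nCuboids; ++i) ++counts[i % mpiSize];
  return counts;
}
```

With this formula, 1 to 4 processes give 16 cuboids, 8 processes give 32 and 64 processes give 256, so every rank receives at least one cuboid. By contrast, cuboidsPerRank(2, 4) returns {1, 1, 0, 0}: with fewer cuboids than processes, two ranks would sit idle.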
Yes, “problem dependent” was meant to include “geometry dependent”. You do not necessarily need to increase the cuboid count for complex geometries (what “complex” even means is not well defined, of course :-)).
In general I would advise using one cuboid per process. You can then check whether there are lots of unused areas and refine the number of cuboids further. However, all of this needs to happen while benchmarking, as a lower overhead of empty cells will not necessarily translate into better performance. For example, more cuboids will require more communication.
January 27, 2022 at 5:34 pm #6289
This explanation helps a lot. Thank you. I really appreciate your help.