We have just released a new video on our OpenLB YouTube Channel:
Heterogeneous Load Balancing in OpenLB: Cooperatively Utilizing CPUs and GPUs for a Turbulent Mixing Simulation
Following up on the turbulent micromixer simulation showcased here, the present video illustrates OpenLB’s heterogeneous computation capabilities.
The performance of the simulation case improves by up to 87% with heterogeneous CPU-GPU execution compared to GPU-only execution. This is achieved by assigning the two computationally expensive turbulent inlet regions to CPUs while the comparatively cheaper bulk regions are processed on GPUs. The underlying inhomogeneous spatial domain decomposition was obtained using a novel genetic algorithm for cost-aware optimization.
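To give a feel for what cost-aware assignment by a genetic algorithm can look like, here is a minimal toy sketch. It is not OpenLB's algorithm or API: the block list, the per-device costs, the function names and all GA parameters are invented for illustration, and the real decomposition also has to choose block shapes and account for communication. The sketch only decides, for a fixed set of blocks, whether each one runs on the CPU or the GPU side so that the slower side finishes as early as possible.

```python
import random

# Hypothetical per-block costs in arbitrary time units per step: (cost_on_cpu, cost_on_gpu).
# Assumption for illustration only: inlet blocks are cheaper on CPU, bulk blocks on GPU.
BLOCK_COSTS = [
    (4.0, 9.0),   # inlet region A
    (4.0, 9.0),   # inlet region B
    (12.0, 2.0),  # bulk region
    (12.0, 2.0),  # bulk region
    (12.0, 2.0),  # bulk region
    (12.0, 2.0),  # bulk region
]
CPU, GPU = 0, 1


def makespan(assignment, costs=BLOCK_COSTS):
    """Time per step: CPU and GPU side run concurrently, so the slower side dominates."""
    load = [0.0, 0.0]
    for device, block in zip(assignment, costs):
        load[device] += block[device]
    return max(load)


def evolve(costs=BLOCK_COSTS, pop_size=20, generations=50, mutation_rate=0.1, seed=0):
    """Toy genetic algorithm: tournament selection, one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    n = len(costs)
    # Include the GPU-only baseline so the result can never be worse than it.
    population = [[GPU] * n]
    population += [[rng.choice((CPU, GPU)) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=lambda a: makespan(a, costs))
        next_gen = [population[0][:]]  # elitism: carry the best assignment over unchanged
        while len(next_gen) < pop_size:
            parent_a = min(rng.sample(population, 3), key=lambda a: makespan(a, costs))
            parent_b = min(rng.sample(population, 3), key=lambda a: makespan(a, costs))
            cut = rng.randrange(1, n)
            child = parent_a[:cut] + parent_b[cut:]
            child = [1 - gene if rng.random() < mutation_rate else gene for gene in child]
            next_gen.append(child)
        population = next_gen
    return min(population, key=lambda a: makespan(a, costs))


if __name__ == "__main__":
    best = evolve()
    print("assignment (0=CPU, 1=GPU):", best)
    print("time per step:", makespan(best), "vs GPU-only:", makespan([GPU] * len(BLOCK_COSTS)))
```

Seeding the population with the GPU-only assignment and keeping the best individual in every generation means the sketch can only match or improve on that baseline for the given toy costs.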
A single accelerated CPU-GPU node of the HoreKa supercomputer (2x Intel Xeon Platinum 8368, 4x NVIDIA A100) was used for the showcased simulation consisting of 355 million lattice cells.
OpenLB enabled the cooperative usage of MPI, OpenMP, AVX-512 vectorization and CUDA, reaching a throughput of ~19.25 billion cell updates per second for the NSE-only case and ~4.79 billion cell updates per second for the fully coupled case.
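As a rough back-of-the-envelope reading of these figures, the throughput can be converted into lattice time steps per second. The snippet assumes that one time step updates every one of the 355 million cells exactly once and that both throughput values refer to this same lattice; the function name is made up for this illustration.

```python
CELLS = 355e6  # lattice cells in the showcased simulation


def steps_per_second(cell_updates_per_second, cells=CELLS):
    """Lattice time steps per second, assuming each step updates every cell once."""
    return cell_updates_per_second / cells


if __name__ == "__main__":
    print(f"NSE-only:      {steps_per_second(19.25e9):.1f} steps/s")
    print(f"fully coupled: {steps_per_second(4.79e9):.1f} steps/s")
```

Under these assumptions the node advances the lattice by roughly 54 steps per second in the NSE-only case and roughly 13.5 steps per second in the fully coupled case.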
Simulation setup: Fedor Bukreev
Heterogeneous Load Balancing, Performance engineering, Visualization: Adrian Kummerländer
For further information, please visit the associated showcase: Heterogeneous Load Balancing