Adrian, Author at OpenLB - Open source lattice Boltzmann code

New Video on our upcoming Fluid Structure Interaction module

Published by Adrian on July 14, 2024

We have just released a new video on our OpenLB YouTube Channel:

OpenLB Development Preview: Large Eddy Lattice Boltzmann Simulation of a Wind Park

This is a first experimental showcase of OpenLB’s upcoming general-purpose fluid structure interaction (FSI) capabilities. The video shows the vorticity norm of a two-way coupled four-turbine wind park setup at a Reynolds number of 1.2 million from various viewpoints. The simulation, consisting of 1.5 billion cells, ran on a single accelerated compute node with 4x NVIDIA H100 GPGPUs.

Computed on HoreKa Teal at KIT, the world’s sixth most energy efficient supercomputer.

Simulation & Visualization by Adrian Kummerländer

Visualization was generated in ParaView.
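To put the 1.5 billion cells on four GPUs into perspective, the following back-of-the-envelope sketch splits the cell count evenly and estimates the memory for one copy of the populations. The helper names are hypothetical, and the lattice (19 populations per cell) and single precision (4 bytes per value) are assumptions for illustration only; the post does not state which descriptor or floating point type was used.

```cpp
#include <cstddef>

// Hypothetical helper: cells handled by each GPU under an even split.
constexpr double cellsPerDevice(double totalCells, std::size_t devices) {
  return totalCells / static_cast<double>(devices);
}

// Hypothetical helper: bytes needed for one copy of the populations.
// `populationsPerCell` and `bytesPerValue` are assumptions (e.g. 19 and 4
// for a D3Q19 lattice in single precision), not values stated in the post.
constexpr double populationBytes(double cells,
                                 std::size_t populationsPerCell,
                                 std::size_t bytesPerValue) {
  return cells * static_cast<double>(populationsPerCell * bytesPerValue);
}
```

Under these assumptions, cellsPerDevice(1.5e9, 4) gives 375 million cells per GPU and populationBytes(3.75e8, 19, 4) gives 28.5 GB of population data per GPU. The real footprint depends on the descriptor, precision and additional fields actually used.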

OpenLB Release 1.7 available for download


Published by Adrian on February 29, 2024

The developer team is very happy to announce the release of the next version of OpenLB. The updated open-source Lattice Boltzmann (LB) code is now available for download.


Major changes include the adaptation of many existing models into the GPU-supporting operator style, a validated turbulent velocity inlet condition and a special focus on new multi phase and particle models. This is augmented by a collection of bugfixes and general usability improvements.

For the first time, the new release is also available in a new public Git repository together with all previous releases. We encourage everyone to submit contributions as merge requests and report issues there.

Core development continues within the existing private repository which is available to consortium members.

Release notes

New features and improvements

Many existing models converted to the operator-style (“GPU support”)
New multi phase models, interaction potentials and examples
New Unit Converter for multi phase simulations
New validated turbulent inlet condition Vortex Method
New particle decomposition scheme that improves parallel performance of fully resolved particulate flow simulations using HLBM
New boundary condition zero gradient
Cleanup and (performance) improvements of the optimization code
Optional support for loading porosity data using OpenVDB voxel volumes

New examples

multiComponent/airBubbleCoalescence3d
multiComponent/waterAirflatInterface2d
advectionDiffusionReaction/longitudinalMixing3d
advectionDiffusionReaction/convectedPlate3d
porousMedia/city3d
porousMedia/resolvedRock3d

Examples with full GPU support

turbulence/tgv3d
turbulence/nozzle3d
turbulence/venturi3d
turbulence/aorta3d
laminar/poiseuille(2,3)d
laminar/poiseuille(2,3)dEoc
laminar/cylinder(2,3)d
laminar/bstep(2,3)d
laminar/cavity(2,3)d
laminar/cavity3dBenchmark
laminar/testFlow3dSolver
laminar/powerLaw2d
laminar/cavity2dSolver
multiComponent/fourRollMill2d
multiComponent/rayleighTaylor3d
multiComponent/youngLaplace3d
multiComponent/binaryShearFlow2d
multiComponent/microFluidics2d
multiComponent/contactAngle(2,3)d
multiComponent/phaseSeparation(2,3)d
multiComponent/rayleighTaylor2d
multiComponent/airBubbleCoalescence3d
multiComponent/waterAirflatInterface2d
multiComponent/youngLaplace2d
advectionDiffusionReaction/advectionDiffusion(1,2,3)d
advectionDiffusionReaction/convectedPlate3d
thermal/squareCavity2d
thermal/porousPlate(2,3)d
thermal/squareCavity3d
thermal/rayleighBenard(2,3)d
porousMedia/city3d
porousMedia/resolvedRock3d
freeSurface/fallingDrop(2,3)d
freeSurface/breakingDam(2,3)d
freeSurface/rayleighInstability3d
freeSurface/deepFallingDrop2d

Citation

If you want to cite OpenLB 1.7 you can use:

A. Kummerländer, T. Bingert, F. Bukreev, L. Czelusniak, D. Dapelo, N. Hafen, M. Heinzelmann, S. Ito, J. Jeßberger, H. Kusumaatmaja, J.E. Marquardt, M. Rennick, T. Pertzel, F. Prinz, M. Sadric, M. Schecher, S. Simonis, P. Sitter, D. Teutscher, M. Zhong, and M.J. Krause.

OpenLB Release 1.7: Open Source Lattice Boltzmann Code.

Version 1.7. Feb. 2024.

DOI: 10.5281/zenodo.10684609

General metadata is also available as a CITATION.cff file following the standard Citation File Format (CFF).

Supported Systems

OpenLB is able to utilize vectorization (AVX2/AVX-512) on x86 CPUs [1] and NVIDIA GPUs for block-local processing. CPU targets may additionally utilize OpenMP for shared memory parallelization while any communication between individual processes is performed using MPI.

It has been successfully employed for simulations on computers ranging from low-end smartphones through multi-GPU workstations up to supercomputers, and it even runs in your browser.

The present release has been explicitly tested in the following environments:

Red Hat Enterprise Linux 8.x (HoreKa, BwUniCluster2)
NixOS 22.11, 23.11 and unstable (Nix Flake provided)
Ubuntu 20.04 and newer
Windows 10, 11 via WSL
Mac OS Ventura 13.6.3

[1]: Other CPU targets are also supported, e.g. common Smartphone ARM CPUs and Apple M1/M2.

OpenLB Release 1.6 available for download


Published by Adrian on April 5, 2023

The developer team is very happy to announce the release of the next version of OpenLB. The updated open-source Lattice Boltzmann (LB) code is now available for download.


Major new features include performance-optimized and GPU-enabled multi-lattice coupling alongside a new subgrid-scale particle system. This is augmented by a rich collection of bugfixes and general usability improvements.

Release notes

Major new features

New performance-optimized and GPU-enabled multi-lattice coupling
New subgrid-scale particle system

General improvements

New GPU-enabled Bouzidi implementation
Alternative handling of Bouzidi distances using new Yu post processor
GPU support for 3D free surface simulations
General usability improvements to dynamics, non-local, coupling operator parameterization
Support for asynchronous background post-processing / VTK output in GPU-based simulations
Support for heterogeneous simulations
Mixed compilation mode enabling different compilers for SIMD / GPU platforms
Reproducible compilation environments declared using Nix Flakes

New examples

adsorption/adsorption3d
adsorption/microMixer3d
reaction/advectionDiffusionReaction2d(Solver)
reaction/reaction2d
optimization/domainIdentification3d
optimization/domainIdentificationPoiseuille2d
optimization/showcaseADf
optimization/showcaseRosenbrock
optimization/testFlowOpti3d
freeSurface/breakingDam3d

Examples with full GPU support

turbulence/nozzle3d
turbulence/aorta3d
turbulence/venturi3d
turbulence/tgv3d
laminar/powerLaw2d
laminar/poiseuille(2,3)d
laminar/bstep(2,3)d
laminar/cylinder(2,3)d
laminar/cavity(2,3)d
laminar/cavity3dBenchmark
laminar/poiseuille(2,3)dEoc
freeSurface/fallingDrop(2,3)d
freeSurface/deepFallingDrop2d
freeSurface/rayleighInstability3d
freeSurface/breakingDam(2,3)d
advectionDiffusionReaction/advectionDiffusion(1,2,3)d
multiComponent/phaseSeparation(2,3)d
multiComponent/rayleighTaylor(2,3)d
thermal/squareCavity(2,3)d
thermal/rayleighBenard(2,3)d

Coupling in Action

Analogously to lattice-local post processors, inter-lattice coupling operators may now be declared as plain classes consisting of application scope, parameters and a generic apply method. For illustration we can consider the coupling between two lattices, targeting Navier Stokes and Advection Diffusion respectively, using the Boussinesq approximation:

struct NavierStokesAdvectionDiffusionCoupling {
  // Declare that we want cell-wise coupling with some global parameters
  static constexpr OperatorScope scope = OperatorScope::PerCellWithParameters;

  // Declare the two parameters custom to this coupling operator
  struct FORCE_PREFACTOR : public descriptors::FIELD_BASE<0,1> { };
  struct T0 : public descriptors::FIELD_BASE<1> { };

  // Declare which parameters are required
  using parameters = meta::list<FORCE_PREFACTOR,T0>;

  template <typename CELLS, typename PARAMETERS>
  void apply(CELLS& cells, PARAMETERS& parameters) any_platform
  {
    // Get the cell of the NavierStokes lattice
    auto& cellNSE = cells.template get<names::NavierStokes>();
    // Get the cell of the Temperature lattice
    auto& cellADE = cells.template get<names::Temperature>();

    // Computation of the Boussinesq force
    auto forcePrefactor = parameters.template get<FORCE_PREFACTOR>();
    auto temperatureDifference = cellADE.computeRho() - parameters.template get<T0>();
    auto boussinesqForce = forcePrefactor * temperatureDifference;
    cellNSE.template setField<descriptors::FORCE>(boussinesqForce);

    // Velocity coupling
    auto u = cellADE.template getField<descriptors::VELOCITY>();
    cellNSE.computeU(u.data());
    cellADE.template setField<descriptors::VELOCITY>(u);
  }
};

Coupling operators are instantiated using the SuperLatticeCoupling class template provided with a list of names and assigned lattices.

SuperLattice<T,DESCRIPTOR_NSE> sLatticeNSE(sGeometry);
SuperLattice<T,DESCRIPTOR_ADE> sLatticeADE(sGeometry);
// [...]
SuperLatticeCoupling coupling(
  NavierStokesAdvectionDiffusionCoupling{},
  names::NavierStokes{}, sLatticeNSE,  // `sLatticeNSE` will be referred to by `names::NavierStokes`
  names::Temperature{},  sLatticeADE); // `sLatticeADE` will be referred to by `names::Temperature`
coupling.setParameter<NavierStokesAdvectionDiffusionCoupling::T0>(...);
coupling.setParameter<NavierStokesAdvectionDiffusionCoupling::FORCE_PREFACTOR>(...);
// [...]
coupling.execute();

All coupling operators that are implemented in this new more compact style will transparently work on all of OpenLB’s target platforms, including GPUs.
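The pattern described above (a plain class with a generic apply method that looks up cells by name tag) can be tried out without OpenLB. The following is a minimal standalone sketch: ToyFlowCell, ToyThermalCell, ToyCells, ToyParameters, ToyBoussinesqCoupling and coupleOnce are hypothetical stand-ins invented for illustration, not OpenLB types, and the any_platform annotation is omitted. Only the force part of the coupling is mimicked.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

// Hypothetical name tags, mirroring the role of `names::*` in the text.
namespace names {
struct NavierStokes { };
struct Temperature { };
}

// Toy cells holding only the data this sketch needs (not OpenLB cells).
struct ToyFlowCell    { std::array<double,2> force{}; };
struct ToyThermalCell { double temperature = 0.0; };

// Toy container resolving a name tag to the matching cell at compile time.
struct ToyCells {
  ToyFlowCell&    flow;
  ToyThermalCell& thermal;

  template <typename NAME>
  auto& get() {
    if constexpr (std::is_same_v<NAME, names::NavierStokes>) {
      return flow;
    } else {
      static_assert(std::is_same_v<NAME, names::Temperature>, "unknown name");
      return thermal;
    }
  }
};

// Toy parameter set: vector-valued force prefactor and reference temperature.
struct ToyParameters {
  std::array<double,2> forcePrefactor;
  double T0;
};

// Plain class with a generic apply method, analogous in shape to the
// coupling operator shown above.
struct ToyBoussinesqCoupling {
  template <typename CELLS, typename PARAMETERS>
  void apply(CELLS& cells, const PARAMETERS& parameters) const {
    auto& cellNSE = cells.template get<names::NavierStokes>();
    auto& cellADE = cells.template get<names::Temperature>();
    // Force = prefactor * (T - T0), applied per component
    const double temperatureDifference = cellADE.temperature - parameters.T0;
    for (std::size_t i = 0; i < cellNSE.force.size(); ++i) {
      cellNSE.force[i] = parameters.forcePrefactor[i] * temperatureDifference;
    }
  }
};

// Applies the toy coupling to a single pair of cells and returns the force.
std::array<double,2> coupleOnce(double temperature, double T0,
                                std::array<double,2> prefactor) {
  ToyFlowCell flow;
  ToyThermalCell thermal{temperature};
  ToyCells cells{flow, thermal};
  ToyParameters parameters{prefactor, T0};
  ToyBoussinesqCoupling{}.apply(cells, parameters);
  return flow.force;
}
```

For example, coupleOnce(1.5, 1.0, {0.0, 0.25}) yields a force of {0.0, 0.125}. Because the operator only relies on the get interface of its arguments, the same class body can be reused with any cell container that provides it, which is the property that lets such operators be instantiated for different platforms.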

Mixed compilation mode

Unlike the initial GPU-supporting release OpenLB 1.5, where the entire code had to be compiled using nvcc and MPI support required manual definition of the relevant include and linker flags, this new release offers a more fine-grained mixed compilation mode.

Specifically, it is possible to specify different compilers for the GPU_CUDA platform and the CPU-targeting platforms within the same build. This way, the GPU-side of things is automatically compiled into a separate shared library that is linked to the core application. Such separation is essential for fully supporting the vectorized CPU_SIMD platform alongside GPU_CUDA in a single heterogeneous executable.

Analogously to other compilation modes, example configs are provided in config/.

CXX             := mpic++
CC              := gcc

# Compiler flags for the core application and `CPU_*` platform support
CXXFLAGS        := -O3 -Wall -march=native -mtune=native
CXXFLAGS        += -std=c++17

# Parallel mode, one of `NONE`, `MPI` or `HYBRID`
PARALLEL_MODE   := MPI

# Platforms, optionally add `CPU_SIMD` for vectorized CPU execution
PLATFORMS       := CPU_SISD GPU_CUDA

# Compiler to use for the `GPU_CUDA` platform
CUDA_CXX        := nvcc
CUDA_CXXFLAGS   := -O3 -std=c++17
# Adjust to enable resolution of libcuda, libcudart, libcudadevrt
CUDA_LDFLAGS    := -L/run/opengl-driver/lib
# for e.g. RTX 30* (Ampere), see table in `rules.mk` for other options
CUDA_ARCH       := 86

# Default floating point type
FLOATING_POINT_TYPE := float

# Set to `OFF` if tinyxml and zlib are provided by the environment
USE_EMBEDDED_DEPENDENCIES := ON

The mixed mode is automatically enabled as soon as a separate CUDA compiler is specified using the CUDA_CXX environment variable. Following this, the compilation of the core library and individual applications is identical from the user’s perspective.

One additional advantage is that the compilation-time-intensive GPU kernels do not need to be recompiled for every code change. Instead, make no-cuda-recompile compiles the core application without GPU re-instantiation as long as no new operators are introduced (e.g. if only the geometry setup, parameters or post processing are changed after an initial full compilation).

For convenience, various tested compilation environments are reproducibly declared using Nix Flakes. For example, instantiating a multi-GPU compilation environment is as easy as removing the default config.mk and calling nix develop .#env-gcc-openmpi-cuda in the OpenLB root.

A guide for setting up (Multi-)GPU support for OpenLB on Windows WSL is also available (PDF).

Citation

If you want to cite OpenLB 1.6 you can use:

A. Kummerländer, S. Avis, H. Kusumaatmaja, F. Bukreev, M. Crocoll, D. Dapelo, N. Hafen, S. Ito, J. Jeßberger, J.E. Marquardt, J. Mödl, T. Pertzel, F. Prinz, F. Raichle, M. Schecher, S. Simonis, D. Teutscher, and M.J. Krause.

OpenLB Release 1.6: Open Source Lattice Boltzmann Code.

Version 1.6. Apr. 2023.

DOI: 10.5281/zenodo.7773497

General metadata is also available as a CITATION.cff file following the standard Citation File Format (CFF).


Supported Systems

OpenLB has been successfully employed for simulations on computers ranging from low-end smartphones up to supercomputers.

The present release has been explicitly tested in the following environments:

NixOS 22.11 and unstable (Nix Flake provided)
Ubuntu 20.04, 22.04
Red Hat Enterprise Linux 8.x (HoreKa, BwUniCluster2)
Windows 10, 11 (WSL)
MacOS 13

as well as compilers:

GCC 9 and later
Clang 13 and later
Intel C++ 2021.4 and later
NVIDIA CUDA 11.4 and later
NVIDIA HPC SDK 21.3 and later
MPI libraries OpenMPI 3.1, 4.1 (CUDA-awareness required for Multi-GPU); Intel MPI 2021.3.0 and later

[1]: Other CPU targets are also supported, e.g. common Smartphone ARM CPUs and Apple M1/M2.

Recent Performance Benchmarks of OpenLB 1.5 on the HoreKa Supercomputer at KIT

Published by Adrian on November 24, 2022

Following up on the performance-focused release of OpenLB 1.5, we updated our Performance showcases to include scalability plots on up to 128 CPU-only or multi-GPU nodes of the HoreKa supercomputer at the Karlsruhe Institute of Technology (KIT). These results were presented at the 25th Results and Review Workshop of the HLRS this October and are accepted for publication in the annual proceedings on High Performance Computing in Science and Engineering.

The following plots document the per-node performance in Billions of Cell Updates per Second (GLUPs) for various problem sizes of the established lid driven cavity benchmark case. Highlights include weak scaling efficiencies of up to 1.01 for hybrid AVX-512 vectorized CPU execution and up to 0.9 for CUDA GPU execution, alongside a total peak performance of 1.33 Trillion Cell Updates per Second when using 512 NVIDIA A100 GPUs. Further details including individual strong scaling values are available in the performance section.
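As a rough aid for reading these figures, the sketch below converts a total throughput into a per-device value and computes a weak scaling efficiency as the ratio of per-node throughput on many nodes to per-node throughput on a single node. The function names are hypothetical, and the single-node baseline is an assumption about how the efficiency is defined; the exact baseline used for the plots is not stated here.

```cpp
#include <cstddef>

// Hypothetical helper: average throughput per device (or per node),
// in cell updates per second.
constexpr double perDeviceThroughput(double totalUpdatesPerSecond,
                                     std::size_t devices) {
  return totalUpdatesPerSecond / static_cast<double>(devices);
}

// Hypothetical helper: weak scaling efficiency, assuming a fixed problem
// size per node and a single-node run as the baseline.
constexpr double weakScalingEfficiency(double perNodeThroughputAtN,
                                       double perNodeThroughputAtOne) {
  return perNodeThroughputAtN / perNodeThroughputAtOne;
}
```

Applied to the peak value quoted above, perDeviceThroughput(1.33e12, 512) gives about 2.6 GLUPs per A100 GPU, and perDeviceThroughput(1.33e12, 128) gives about 10.4 GLUPs per node. An efficiency above 1, as reported for the CPU case, means the per-node throughput on many nodes exceeded the baseline.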

Scalability of OpenLB 1.5 on HoreKa using hybrid execution (MPI + OpenMP + AVX-512 Vectorization)

Scalability of OpenLB 1.5 on HoreKa using multi GPU execution (MPI + CUDA)

Plots, vectorization and CUDA GPU implementation contributed by Adrian Kummerländer.

A. Kummerländer, F. Bukreev, S. Berg, M. Dorn and M.J. Krause. Advances in Computational Process Engineering using Lattice Boltzmann Methods on High Performance Computers for Solving Fluid Flow Problems. In: High Performance Computing in Science and Engineering ’22 (accepted).
