Reply To: GPU and calculation time

#9932
sfraniatte
Participant

Yes, I can. Here is the .mk file that works:

# Example build config for OpenLB using CUDA on single GPU systems
#
# Tested using CUDA 11.4
#
# Usage:
# - Copy this file to OpenLB root as config.mk
# - Adjust CUDA_ARCH to match your specific GPU
# - Run make clean; make
# - Switch to example directory, e.g. examples/laminar/cavity3dBenchmark
# - Run make
# - Start the simulation using ./cavity3d

CXX := nvcc
CC := nvcc

CXXFLAGS := -O3
CXXFLAGS += -std=c++17 --forward-unknown-to-host-compiler

PARALLEL_MODE := NONE

PLATFORMS := CPU_SISD GPU_CUDA

# for e.g. RTX 30* (Ampere), see table in rules.mk for other options
CUDA_ARCH := 86

FLOATING_POINT_TYPE := float

USE_EMBEDDED_DEPENDENCIES := ON

###########################################################################################
Here is the first one, which does not work very well:
# OpenLB build configuration
#
# This file sets up the necessary build flags for compiling OpenLB with
# the GNU C++ compiler and sequential execution. For more complex setups
# edit this file or consult the example configs provided in config/.
#
# Basic usage:
# - Edit variables to fit desired configuration
# - Run make clean; make to clean up any previous artifacts and compile the dependencies
# - Switch to example directory, e.g. examples/laminar/poiseuille2d
# - Run make
# - Start the simulation using ./poiseuille2d

# Compiler to use for C++ files, change to mpic++ when using OpenMPI and GCC
#~ #parallel CPU or hybrid
#~ CXX := mpic++
#GPU
CXX := nvcc

# Compiler to use for C files (used for embedded dependencies)
#parallel CPU or hybrid
#~ CC := gcc
#GPU
CC := nvcc

# Suggested optimized build flags for GCC, consult config/ for further examples
#parallel CPU or hybrid
#~ CXXFLAGS := -O3 -Wall -march=native -mtune=native
#GPU
CXXFLAGS := -O3
CXXFLAGS += --forward-unknown-to-host-compiler
# Uncomment to add debug symbols and enable runtime asserts
#~ #CXXFLAGS += -g -DOLB_DEBUG

# OpenLB requires support for C++17
# works in:
# * gcc 9 or later (https://gcc.gnu.org/projects/cxx-status.html#cxx17)
# * icc 19.0 or later (https://software.intel.com/en-us/articles/c17-features-supported-by-intel-c-compiler)
# * clang 7 or later (https://clang.llvm.org/cxx_status.html#cxx17)
CXXFLAGS += -std=c++17

# optional linker flags
LDFLAGS :=

# Parallelization mode, must be one of: OFF, MPI, OMP, HYBRID
# Note that for MPI and HYBRID the compiler also needs to be adapted.
# See e.g. config/cpu_gcc_openmpi.mk
#parallel CPU
#~ PARALLEL_MODE := MPI
#GPU
PARALLEL_MODE := NONE
#~ #hybrid
#~ PARALLEL_MODE := HYBRID

# optional MPI and OpenMP flags
#parallel CPU
#~ MPIFLAGS :=
#~ OMPFLAGS := -fopenmp

# Options: CPU_SISD, CPU_SIMD, GPU_CUDA
# Both CPU_SIMD and GPU_CUDA require system-specific adjustment of compiler flags.
# See e.g. config/cpu_simd_intel_mpi.mk or config/gpu_only.mk for examples.
# CPU_SISD must always be present.
#parallel CPU
#~ PLATFORMS := CPU_SISD
#GPU
PLATFORMS := CPU_SISD GPU_CUDA
#hybrid
#~ PLATFORMS := CPU_SISD CPU_SIMD GPU_CUDA
#~ # Compiler to use for CUDA-enabled files
#~ CUDA_CXX := nvcc
#~ CUDA_CXXFLAGS := -O3 -std=c++17
#~ # Adjust to enable resolution of libcuda, libcudart, libcudadevrt
#~ CUDA_LDFLAGS := -L/run/libcuda/lib
#~ CUDA_LDFLAGS += -fopenmp
#~ #GPU or hybrid
CUDA_ARCH := 86

# Fundamental arithmetic data type
# Common options are float or double
#parallel CPU
#~ FLOATING_POINT_TYPE := double
#GPU or hybrid
FLOATING_POINT_TYPE := float

# Any entries are passed to the compiler as -DFEATURE_* declarations
# Used to enable some alternative code paths and dependencies
FEATURES :=

# Set to OFF if libz and tinyxml are provided by the system (optional)
USE_EMBEDDED_DEPENDENCIES := ON

###################################################################################
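Since the second file is mostly comments and commented-out alternatives, it is hard to see by eye which settings are actually active. A minimal sketch of a way to compare the two files is below. The function names `effective_vars` and `diff_configs` are hypothetical, and the parser only understands plain `VAR := value`, `VAR = value` and `VAR += value` lines, so it is not a real make parser. It ignores the order of flags and treats an empty assignment the same as an unset variable.

```python
import re

# Matches simple make assignments such as "CXX := nvcc" or "CXXFLAGS += -O3".
ASSIGN = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*(:=|\+=|=)\s*(.*)$")


def effective_vars(text):
    """Return {name: [tokens]} for the active assignments, skipping comments."""
    result = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].rstrip()  # drop comments, incl. "#~ ..." lines
        match = ASSIGN.match(line)
        if not match:
            continue
        name, op, value = match.groups()
        tokens = value.split()
        if op == "+=":
            result.setdefault(name, []).extend(tokens)  # append to earlier value
        else:
            result[name] = tokens  # ":=" or "=" overwrites
    return result


def diff_configs(text_a, text_b):
    """Return {name: (tokens_a, tokens_b)} for variables whose token sets differ."""
    vars_a, vars_b = effective_vars(text_a), effective_vars(text_b)
    diffs = {}
    for name in sorted(set(vars_a) | set(vars_b)):
        tokens_a = sorted(vars_a.get(name, []))
        tokens_b = sorted(vars_b.get(name, []))
        if tokens_a != tokens_b:
            diffs[name] = (tokens_a, tokens_b)
    return diffs


# Usage with the CXXFLAGS lines excerpted from the two files above.
first = "CXXFLAGS := -O3\nCXXFLAGS += -std=c++17 --forward-unknown-to-host-compiler\n"
second = (
    "#~ CXXFLAGS := -O3 -Wall -march=native -mtune=native\n"
    "CXXFLAGS := -O3\n"
    "CXXFLAGS += --forward-unknown-to-host-compiler\n"
    "CXXFLAGS += -std=c++17\n"
)
print(diff_configs(first, second))
```

To compare the complete files, the two strings can be replaced by the contents of each .mk file read from disk.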
Also, I am trying to run my own case on the GPU, but it is really too slow. The main differences from the aorta example (which now works well for me) are the inlet surface, which is much larger, and the presence of external edges and corners (the inlet has 5 faces). I am cleaning up my code so that it is set up like the nozzle example (with stlReader). That could be the cause, but I am not sure.

Thank you!