
GPU and calculation time

  • #9923 (see below)
  • #9923
    sfraniatte
    Participant

    Dear community,

    I’m trying to run some examples (cavity3dBenchmark and nozzle3d) on a GPU (NVIDIA RTX A6000), but I’m observing an unexpected evolution of MLUPS. When the number of voxels is small (matching the default values in the examples), the computation on the GPU is indeed faster than on 32 CPU cores. However, when I set the resolution to N=15 for the nozzle example, the CPU computation becomes faster than the GPU computation.

    Moreover, the MLUPS decrease as the resolution increases, and the same happens with GPU activity. The average GPU utilization percentage decreases as the number of voxels increases (I checked this using nvtop). Is this normal?

    I should mention that I am correctly using gpu_only.mk for compilation and that I run make clean; make every time I compile. Additionally, for the nozzle3d example, I reach an average of 185 MLUPS (with peaks at 1271), which seems quite low compared to the numbers achieved with NVIDIA A100 GPUs, especially since I’m running in single precision.
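
    (For reference: MLUPS, million lattice-site updates per second, measures throughput as MLUPS = (number of lattice cells x number of time steps) / (wall-clock time in s x 10^6). On a GPU this should normally grow with the cell count until the device is saturated; a falling value typically points to time spent outside the compute kernels, e.g. in output or host-device transfers.)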

    Best regards,
    Sylvain

    #9924
    Adrian
    Keymaster

    This is unusual. What is the exact command you use to run the nozzle case? Did you increase the frequency of VTK output compared to the default?
    Are you using single or double precision for the value type?

    The low average leads me to believe that there is some issue with the VTK output (e.g. it is not performed asynchronously, or it is performed too often to be hidden behind the computation).
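
    As a rough sketch (illustrative names and interval, not the exact nozzle3d code): OpenLB example apps typically guard the writer behind a step interval so that the output cost stays small relative to the compute loop:

    #include "olb3D.h"

    using T = float;
    using DESCRIPTOR = olb::descriptors::D3Q19<>;

    // Sketch: write VTK output only every vtkIter time steps so that the
    // synchronous, host-side output does not stall the GPU on every step.
    void writeResults(olb::SuperVTMwriter3D<T>& vtmWriter,
                      olb::UnitConverter<T,DESCRIPTOR> const& converter,
                      std::size_t iT)
    {
      // e.g. one output every 0.1 s of physical time (illustrative value)
      const std::size_t vtkIter = converter.getLatticeTime(0.1);
      if (iT % vtkIter == 0) {
        vtmWriter.write(iT);
      }
    }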

    The nozzle case uses a computationally complex turbulent inlet condition, such that heterogeneous CPU-GPU execution can provide an advantage over GPU-only execution.

    I assume that you observe this only for the nozzle case and the cavity benchmark behaves as one would expect?

    #9929
    sfraniatte
    Participant

    I think I have solved my problem. The mistake was that I left the line “FEATURES :=” uncommented. Now, to be exhaustive, here are the exact commands that I used:
    - ./nozzle3d --resolution 5 --max-phys-t 10 : the calculation duration (measured CPU time) was 29.278 s, with an average of 376.907 MLUPS
    - ./nozzle3d --resolution 10 --max-phys-t 10 : the calculation duration (measured CPU time) was 501.328 s, with an average of 686.931 MLUPS
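
    (As a quick sanity check, average MLUPS multiplied by wall-clock time gives the total number of lattice-site updates, rounded:
    376.907 x 10^6 LUPS x 29.278 s ≈ 1.1 x 10^10 site updates at resolution 5,
    686.931 x 10^6 LUPS x 501.328 s ≈ 3.4 x 10^11 site updates at resolution 10.
    The resolution-10 run thus does roughly 30 times more work while nearly doubling the throughput, which is the expected behaviour as the GPU becomes better saturated.)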

    I ran these calculations today and the evolution of MLUPS looks right this time. Thanks a lot for your time, and I hope this will be useful for someone.

    Best regards,
    Sylvain

    #9931
    Adrian
    Keymaster

    Happy to hear that it performs better now.

    However, this is very unlikely to be the actual reason. The FEATURES array is not involved here in any way.

    Can you post your two configs?

    #9932
    sfraniatte
    Participant

    Yes, I can. Here is the working .mk file:

    # Example build config for OpenLB using CUDA on single GPU systems
    #
    # Tested using CUDA 11.4
    #
    # Usage:
    # - Copy this file to OpenLB root as config.mk
    # - Adjust CUDA_ARCH to match your specific GPU
    # - Run make clean; make
    # - Switch to example directory, e.g. examples/laminar/cavity3dBenchmark
    # - Run make
    # - Start the simulation using ./cavity3d

    CXX := nvcc
    CC := nvcc

    CXXFLAGS := -O3
    CXXFLAGS += -std=c++17 --forward-unknown-to-host-compiler

    PARALLEL_MODE := NONE

    PLATFORMS := CPU_SISD GPU_CUDA

    # for e.g. RTX 30* (Ampere), see table in rules.mk for other options
    CUDA_ARCH := 86

    FLOATING_POINT_TYPE := float

    USE_EMBEDDED_DEPENDENCIES := ON

    ###########################################################################################
    Here is the first one, which does not work very well:
    # OpenLB build configuration
    #
    # This file sets up the necessary build flags for compiling OpenLB with
    # the GNU C++ compiler and sequential execution. For more complex setups
    # edit this file or consult the example configs provided in config/.
    #
    # Basic usage:
    # - Edit variables to fit desired configuration
    # - Run make clean; make to clean up any previous artifacts and compile the dependencies
    # - Switch to example directory, e.g. examples/laminar/poiseuille2d
    # - Run make
    # - Start the simulation using ./poiseuille2d

    # Compiler to use for C++ files, change to mpic++ when using OpenMPI and GCC
    #~ #parallel CPU or hybrid
    #~ CXX := mpic++
    #GPU
    CXX := nvcc

    # Compiler to use for C files (used for embedded dependencies)
    #parallel CPU or hybrid
    #~ CC := gcc
    #GPU
    CC := nvcc

    # Suggested optimized build flags for GCC, consult config/ for further examples
    #parallel CPU or hybrid
    #~ CXXFLAGS := -O3 -Wall -march=native -mtune=native
    #GPU
    CXXFLAGS := -O3
    CXXFLAGS += --forward-unknown-to-host-compiler
    # Uncomment to add debug symbols and enable runtime asserts
    #~ #CXXFLAGS += -g -DOLB_DEBUG

    # OpenLB requires support for C++17
    # works in:
    # * gcc 9 or later (https://gcc.gnu.org/projects/cxx-status.html#cxx17)
    # * icc 19.0 or later (https://software.intel.com/en-us/articles/c17-features-supported-by-intel-c-compiler)
    # * clang 7 or later (https://clang.llvm.org/cxx_status.html#cxx17)
    CXXFLAGS += -std=c++17

    # optional linker flags
    LDFLAGS :=

    # Parallelization mode, must be one of: OFF, MPI, OMP, HYBRID
    # Note that for MPI and HYBRID the compiler also needs to be adapted.
    # See e.g. config/cpu_gcc_openmpi.mk
    #parallel CPU
    #~ PARALLEL_MODE := MPI
    #GPU
    PARALLEL_MODE := NONE
    #~ #hybrid
    #~ PARALLEL_MODE := HYBRID

    # optional MPI and OpenMP flags
    #parallel CPU
    #~ MPIFLAGS :=
    #~ OMPFLAGS := -fopenmp

    # Options: CPU_SISD, CPU_SIMD, GPU_CUDA
    # Both CPU_SIMD and GPU_CUDA require system-specific adjustment of compiler flags.
    # See e.g. config/cpu_simd_intel_mpi.mk or config/gpu_only.mk for examples.
    # CPU_SISD must always be present.
    #parallel CPU
    #~ PLATFORMS := CPU_SISD
    #GPU
    PLATFORMS := CPU_SISD GPU_CUDA
    #hybrid
    #~ PLATFORMS := CPU_SISD CPU_SIMD GPU_CUDA
    #~ # Compiler to use for CUDA-enabled files
    #~ CUDA_CXX := nvcc
    #~ CUDA_CXXFLAGS := -O3 -std=c++17
    #~ # Adjust to enable resolution of libcuda, libcudart, libcudadevrt
    #~ CUDA_LDFLAGS := -L/run/libcuda/lib
    #~ CUDA_LDFLAGS += -fopenmp
    #~ #GPU or hybrid
    CUDA_ARCH := 86

    # Fundamental arithmetic data type
    # Common options are float or double
    #parallel CPU
    #~ FLOATING_POINT_TYPE := double
    #GPU or hybrid
    FLOATING_POINT_TYPE := float

    # Any entries are passed to the compiler as -DFEATURE_* declarations
    # Used to enable some alternative code paths and dependencies
    FEATURES :=

    # Set to OFF if libz and tinyxml are provided by the system (optional)
    USE_EMBEDDED_DEPENDENCIES := ON

    ###################################################################################
    Also, I am trying to run my own case on the GPU, but it is really too slow. The main differences with the aorta example (which now works well for me) are the surface area of the inlet (which is much bigger) and the fact that there are external edges and corners (the inlet has 5 faces). I am working on cleaning up my code to have something like in the nozzle example (with stlReader). It could be that, but I am not sure.

    Thank you!
