GPU_OpenMPI usage questions

Viewing 4 posts - 1 through 4 (of 4 total)
  • #7355
    C.Rong
    Participant

    Hello everyone,

    I'm a beginner with OpenLB and I'm trying to run some examples with the OpenMPI, GPU_only, and GPU_OpenMPI configurations. OpenMPI and GPU_only work, but GPU_OpenMPI fails. After running "mpirun -np 2 ./cavity2d", the following error occurred:

    [prepareGeometry] Prepare Geometry …
    [SuperGeometry2D] cleaned 0 outer boundary voxel(s)
    [SuperGeometry2D] cleaned 0 outer boundary voxel(s)
    [SuperGeometry2D] cleaned 0 inner boundary voxel(s)
    [SuperGeometry2D] the model is correct!
    [SuperGeometryStatistics2D] materialNumber=1; count=16129; minPhysR=(0.0078125,0.0078125); maxPhysR=(0.992188,0.992188)
    [SuperGeometryStatistics2D] materialNumber=2; count=385; minPhysR=(0,0); maxPhysR=(1,1)
    [SuperGeometryStatistics2D] materialNumber=3; count=127; minPhysR=(0.0078125,1); maxPhysR=(0.992188,1)
    [prepareGeometry] Prepare Geometry … OK
    [prepareLattice] Prepare Lattice …
    [prepareLattice] Prepare Lattice … OK
    terminate called after throwing an instance of 'std::runtime_error'
    what(): an illegal memory access was encountered
    [DESKTOP-L1PUHL9:18221] *** Process received signal ***
    [DESKTOP-L1PUHL9:18221] Signal: Aborted (6)
    [DESKTOP-L1PUHL9:18221] Signal code: (-6)
    [DESKTOP-L1PUHL9:18221] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f39a75f0520]
    [DESKTOP-L1PUHL9:18221] [ 1] /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7f39a7644a7c]
    [DESKTOP-L1PUHL9:18221] [ 2] /lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7f39a75f0476]
    [DESKTOP-L1PUHL9:18221] [ 3] /lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7f39a75d67f3]
    [DESKTOP-L1PUHL9:18221] [ 4] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa2bbe)[0x7f39a7878bbe]
    [DESKTOP-L1PUHL9:18221] [ 5] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae24c)[0x7f39a788424c]
    [DESKTOP-L1PUHL9:18221] [ 6] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae2b7)[0x7f39a78842b7]
    [DESKTOP-L1PUHL9:18221] [ 7] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae518)[0x7f39a7884518]
    [DESKTOP-L1PUHL9:18221] [ 8] ./cavity2d(+0xc9262)[0x55a0c9032262]
    [DESKTOP-L1PUHL9:18221] [ 9] ./cavity2d(+0x825fb)[0x55a0c8feb5fb]
    [DESKTOP-L1PUHL9:18221] [10] ./cavity2d(+0x231fb)[0x55a0c8f8c1fb]
    [DESKTOP-L1PUHL9:18221] [11] /lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f39a75d7d90]
    [DESKTOP-L1PUHL9:18221] [12] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f39a75d7e40]
    [DESKTOP-L1PUHL9:18221] [13] ./cavity2d(+0x23b75)[0x55a0c8f8cb75]
    [DESKTOP-L1PUHL9:18221] *** End of error message ***
    terminate called after throwing an instance of 'std::runtime_error'
    what(): an illegal memory access was encountered
    [DESKTOP-L1PUHL9:18222] *** Process received signal ***
    [DESKTOP-L1PUHL9:18222] Signal: Aborted (6)
    [DESKTOP-L1PUHL9:18222] Signal code: (-6)
    [DESKTOP-L1PUHL9:18222] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7fd7d71f0520]
    [DESKTOP-L1PUHL9:18222] [ 1] /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7fd7d7244a7c]
    [DESKTOP-L1PUHL9:18222] [ 2] /lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7fd7d71f0476]
    [DESKTOP-L1PUHL9:18222] [ 3] /lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7fd7d71d67f3]
    [DESKTOP-L1PUHL9:18222] [ 4] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa2bbe)[0x7fd7d7478bbe]
    [DESKTOP-L1PUHL9:18222] [ 5] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae24c)[0x7fd7d748424c]
    [DESKTOP-L1PUHL9:18222] [ 6] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae2b7)[0x7fd7d74842b7]
    [DESKTOP-L1PUHL9:18222] [ 7] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae518)[0x7fd7d7484518]
    [DESKTOP-L1PUHL9:18222] [ 8] ./cavity2d(+0xc9262)[0x55c1d6d31262]
    [DESKTOP-L1PUHL9:18222] [ 9] ./cavity2d(+0x825fb)[0x55c1d6cea5fb]
    [DESKTOP-L1PUHL9:18222] [10] ./cavity2d(+0x231fb)[0x55c1d6c8b1fb]
    [DESKTOP-L1PUHL9:18222] [11] /lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7fd7d71d7d90]
    [DESKTOP-L1PUHL9:18222] [12] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7fd7d71d7e40]
    [DESKTOP-L1PUHL9:18222] [13] ./cavity2d(+0x23b75)[0x55c1d6c8bb75]
    [DESKTOP-L1PUHL9:18222] *** End of error message ***
    --------------------------------------------------------------------------
    Primary job terminated normally, but 1 process returned
    a non-zero exit code. Per user-direction, the job has been aborted.
    --------------------------------------------------------------------------
    --------------------------------------------------------------------------
    mpirun noticed that process rank 0 with PID 0 on node DESKTOP-L1PUHL9 exited on signal 6 (Aborted).

    If I build with gpu_openmpi.mk and run "./cavity2d" directly (without mpirun), the result is normal. I have tried the following two configurations, and both produce the same issue. What should I look into?

    Configuration 1:
    1. WSL2 - Ubuntu 22.04
    2. CPU - Intel i7-11800H
    3. GPU - 3070 Laptop
    4. GCC - 11.3.0
    5. CUDA - release 12.1, V12.1.66
    6. OpenMPI - 4.1.5
    7. CUDA_ARCH=86 in gpu_openmpi.mk

    Configuration 2:
    1. WSL2 - Ubuntu 20.04
    2. CPU - AMD 3700X
    3. GPU - 1660 Super
    4. GCC - 9.4.0
    5. CUDA - release 12.1, V12.1.66
    6. OpenMPI - 4.1.5
    7. CUDA_ARCH=75 in gpu_openmpi.mk

    gpu_openmpi.mk:
    CXX := nvcc
    CC := nvcc

    CXXFLAGS := -O3
    CXXFLAGS += -std=c++17
    CXXFLAGS += -I/usr/local/openmpi/include # Without this flag, make fails with "mpiManager.h:29:10: fatal error: mpi.h: No such file or directory"

    PARALLEL_MODE := MPI

    MPIFLAGS := -lmpi_cxx -lmpi

    PLATFORMS := CPU_SISD GPU_CUDA

    CUDA_ARCH := 86

    USE_EMBEDDED_DEPENDENCIES := ON

    #7356
    Adrian
    Keymaster

    It seems that your OpenMPI is not compiled as CUDA-aware (which is the default in Ubuntu and most other distros). You can check out a recent report on how to set this up in olb-tr7.pdf (including a specific mention of WSL GPU setups). I hope this helps! If not, I'll take a closer look.

    If all goes to plan we will publish OpenLB 1.6 within this week which will also improve the error messages in this specific situation.

    #7357
    C.Rong
    Participant

    Thank you!

    I just installed it according to the olb-tr7.pdf instructions and ran "ompi_info --parsable -l 9 --all | grep mpi_built_with_cuda_support:value" in both configurations; the result was "mca:mpi:base:param:mpi_built_with_cuda_support:value:true".

    #7358
    Adrian
    Keymaster

    Good, and you still get the error? If so, the issue could be a mix-up between a package-manager-provided version of OpenMPI (which doesn't support CUDA) and the manually installed one. You can rule this possibility out by providing the full paths to the custom MPI installation for CXX, the include directories, and the linker flags.
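
    As a concrete sketch of this check (the prefix /usr/local/openmpi is an assumption taken from the include path in the gpu_openmpi.mk above; adjust it to the actual custom install location):

    ```shell
    # Check which MPI binaries are first on PATH -- if these resolve into
    # /usr/bin, the distro package is shadowing the custom build:
    which mpirun mpicc
    mpirun --version

    # Query the custom install explicitly for CUDA awareness:
    /usr/local/openmpi/bin/ompi_info --parsable -l 9 --all \
      | grep mpi_built_with_cuda_support:value

    # Launch with the full path so the matching CUDA-aware runtime is used:
    /usr/local/openmpi/bin/mpirun -np 2 ./cavity2d
    ```

    In gpu_openmpi.mk, the same idea means hard-coding the prefix for both compile and link steps, e.g. CXXFLAGS += -I/usr/local/openmpi/include together with MPIFLAGS := -L/usr/local/openmpi/lib -lmpi_cxx -lmpi, so that neither the compiler nor the linker can fall back to a distro-installed mpi.h or libmpi.
    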
