GPU_OpenMPI usage questions
This topic has 3 replies, 2 voices, and was last updated 1 year, 6 months ago by Adrian.
April 3, 2023 at 6:00 pm #7355
C.Rong (Participant)
Hello everyone,
I’m a beginner with OpenLB and am trying to run some examples with the OpenMPI, GPU_only, and GPU_OpenMPI build configurations. OpenMPI and GPU_only work, but GPU_OpenMPI fails. After running “mpirun -np 2 ./cavity2d”, the following error occurred:
[prepareGeometry] Prepare Geometry …
[SuperGeometry2D] cleaned 0 outer boundary voxel(s)
[SuperGeometry2D] cleaned 0 outer boundary voxel(s)
[SuperGeometry2D] cleaned 0 inner boundary voxel(s)
[SuperGeometry2D] the model is correct!
[SuperGeometryStatistics2D] materialNumber=1; count=16129; minPhysR=(0.0078125,0.0078125); maxPhysR=(0.992188,0.992188)
[SuperGeometryStatistics2D] materialNumber=2; count=385; minPhysR=(0,0); maxPhysR=(1,1)
[SuperGeometryStatistics2D] materialNumber=3; count=127; minPhysR=(0.0078125,1); maxPhysR=(0.992188,1)
[prepareGeometry] Prepare Geometry … OK
[prepareLattice] Prepare Lattice …
[prepareLattice] Prepare Lattice … OK
terminate called after throwing an instance of ‘std::runtime_error’
what(): an illegal memory access was encountered
[DESKTOP-L1PUHL9:18221] *** Process received signal ***
[DESKTOP-L1PUHL9:18221] Signal: Aborted (6)
[DESKTOP-L1PUHL9:18221] Signal code: (-6)
[DESKTOP-L1PUHL9:18221] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f39a75f0520]
[DESKTOP-L1PUHL9:18221] [ 1] /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7f39a7644a7c]
[DESKTOP-L1PUHL9:18221] [ 2] /lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7f39a75f0476]
[DESKTOP-L1PUHL9:18221] [ 3] /lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7f39a75d67f3]
[DESKTOP-L1PUHL9:18221] [ 4] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa2bbe)[0x7f39a7878bbe]
[DESKTOP-L1PUHL9:18221] [ 5] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae24c)[0x7f39a788424c]
[DESKTOP-L1PUHL9:18221] [ 6] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae2b7)[0x7f39a78842b7]
[DESKTOP-L1PUHL9:18221] [ 7] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae518)[0x7f39a7884518]
[DESKTOP-L1PUHL9:18221] [ 8] ./cavity2d(+0xc9262)[0x55a0c9032262]
[DESKTOP-L1PUHL9:18221] [ 9] ./cavity2d(+0x825fb)[0x55a0c8feb5fb]
[DESKTOP-L1PUHL9:18221] [10] ./cavity2d(+0x231fb)[0x55a0c8f8c1fb]
[DESKTOP-L1PUHL9:18221] [11] /lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f39a75d7d90]
[DESKTOP-L1PUHL9:18221] [12] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f39a75d7e40]
[DESKTOP-L1PUHL9:18221] [13] ./cavity2d(+0x23b75)[0x55a0c8f8cb75]
[DESKTOP-L1PUHL9:18221] *** End of error message ***
terminate called after throwing an instance of ‘std::runtime_error’
what(): an illegal memory access was encountered
[DESKTOP-L1PUHL9:18222] *** Process received signal ***
[DESKTOP-L1PUHL9:18222] Signal: Aborted (6)
[DESKTOP-L1PUHL9:18222] Signal code: (-6)
[DESKTOP-L1PUHL9:18222] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7fd7d71f0520]
[DESKTOP-L1PUHL9:18222] [ 1] /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7fd7d7244a7c]
[DESKTOP-L1PUHL9:18222] [ 2] /lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7fd7d71f0476]
[DESKTOP-L1PUHL9:18222] [ 3] /lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7fd7d71d67f3]
[DESKTOP-L1PUHL9:18222] [ 4] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa2bbe)[0x7fd7d7478bbe]
[DESKTOP-L1PUHL9:18222] [ 5] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae24c)[0x7fd7d748424c]
[DESKTOP-L1PUHL9:18222] [ 6] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae2b7)[0x7fd7d74842b7]
[DESKTOP-L1PUHL9:18222] [ 7] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae518)[0x7fd7d7484518]
[DESKTOP-L1PUHL9:18222] [ 8] ./cavity2d(+0xc9262)[0x55c1d6d31262]
[DESKTOP-L1PUHL9:18222] [ 9] ./cavity2d(+0x825fb)[0x55c1d6cea5fb]
[DESKTOP-L1PUHL9:18222] [10] ./cavity2d(+0x231fb)[0x55c1d6c8b1fb]
[DESKTOP-L1PUHL9:18222] [11] /lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7fd7d71d7d90]
[DESKTOP-L1PUHL9:18222] [12] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7fd7d71d7e40]
[DESKTOP-L1PUHL9:18222] [13] ./cavity2d(+0x23b75)[0x55c1d6c8bb75]
[DESKTOP-L1PUHL9:18222] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node DESKTOP-L1PUHL9 exited on signal 6 (Aborted).

If I use gpu_openmpi.mk and run “./cavity” directly, the result is normal. I have tried the following two configurations, and both show the same issue. What are possible causes I should look into?
Configuration 1:
1. WSL2 – Ubuntu 22.04
2. CPU – Intel i7-11800H
3. GPU – RTX 3070 Laptop
4. GCC – 11.3.0
5. CUDA – release 12.1, V12.1.66
6. OpenMPI – 4.1.5
7. CUDA_ARCH=86 in gpu_openmpi.mk

Configuration 2:
1. WSL2 – Ubuntu 20.04
2. CPU – AMD 3700X
3. GPU – GTX 1660 Super
4. GCC – 9.4.0
5. CUDA – release 12.1, V12.1.66
6. OpenMPI – 4.1.5
7. CUDA_ARCH=75 in gpu_openmpi.mk

gpu_openmpi.mk:
CXX := nvcc
CC := nvcc

CXXFLAGS := -O3
CXXFLAGS += -std=c++17
CXXFLAGS += -I/usr/local/openmpi/include # without this, make fails with “mpiManager.h:29:10: fatal error: mpi.h: No such file or directory”

PARALLEL_MODE := MPI
MPIFLAGS := -lmpi_cxx -lmpi

PLATFORMS := CPU_SISD GPU_CUDA
CUDA_ARCH := 86

USE_EMBEDDED_DEPENDENCIES := ON
April 3, 2023 at 6:06 pm #7356
Adrian (Keymaster)
It seems that OpenMPI is not compiled as CUDA-aware (the default in Ubuntu and most other distros). You can check out the new report olb-tr7.pdf on how to set this up (including specific mention of WSL GPU setups). I hope this helps! If not, I’ll take a closer look.
If all goes to plan we will publish OpenLB 1.6 within this week which will also improve the error messages in this specific situation.
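For reference, building a CUDA-aware OpenMPI from source usually follows this pattern. This is a sketch only: the CUDA path, install prefix, and download URL are assumptions, and olb-tr7.pdf remains the authoritative guide.

```shell
# Sketch: build OpenMPI 4.1.5 with CUDA support under an assumed prefix.
# Adjust --with-cuda and --prefix to the actual CUDA and install locations.
wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.5.tar.gz
tar xf openmpi-4.1.5.tar.gz
cd openmpi-4.1.5
./configure --with-cuda=/usr/local/cuda --prefix=/usr/local/openmpi
make -j"$(nproc)" && sudo make install
```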
April 3, 2023 at 6:17 pm #7357
C.Rong (Participant)
Thank you!
I just installed it according to the olb-tr7.pdf instructions and ran “ompi_info --parsable -l 9 --all | grep mpi_built_with_cuda_support:value” in both configurations; the result was “mca:mpi:base:param:mpi_built_with_cuda_support:value:true”.
April 3, 2023 at 6:19 pm #7358
Adrian (Keymaster)
Good, and you still get the error? If so, the issue could be a mixup between a package-manager-provided OpenMPI (which doesn’t support CUDA) and the manually installed one. You can exclude this possibility by providing the full paths to the custom MPI for CXX, the includes, and the linker flags.
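Following that suggestion, a gpu_openmpi.mk fragment that pins the custom MPI explicitly might look like the sketch below. The prefix /usr/local/openmpi and the LDFLAGS lines are assumptions; adjust them to the actual install location.

```make
# Pin the CUDA-aware OpenMPI explicitly so a distro-provided MPI
# cannot be picked up by accident (/usr/local/openmpi is an assumption).
CXX := nvcc
CC  := nvcc

CXXFLAGS := -O3 -std=c++17
CXXFLAGS += -I/usr/local/openmpi/include

LDFLAGS += -L/usr/local/openmpi/lib -Wl,-rpath,/usr/local/openmpi/lib

PARALLEL_MODE := MPI
MPIFLAGS := -lmpi_cxx -lmpi

PLATFORMS := CPU_SISD GPU_CUDA
CUDA_ARCH := 86

USE_EMBEDDED_DEPENDENCIES := ON
```

Launching with the matching mpirun, e.g. /usr/local/openmpi/bin/mpirun -np 2 ./cavity2d, rules out a mismatched system launcher as well.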