Problem with multi GPU
- This topic has 9 replies, 3 voices, and was last updated 7 months, 3 weeks ago by thanhphatvt.
June 13, 2024 at 3:34 pm #8809 thanhphatvt (Participant)
Dear OpenLB team,
I used the gpu_openmpi config to run a case on multiple GPUs, but I think it did not work on my workstation, which has 2 K80 GPU cards.
Here are the CUDA and MPI versions I use:
root@acmt:/home/kc-lin/olb-1.6r0/examples/laminar/cylinder3d# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Wed_Jun__2_19:15:15_PDT_2021
Cuda compilation tools, release 11.4, V11.4.48
Build cuda_11.4.r11.4/compiler.30033411_0
root@acmt:/home/kc-lin/olb-1.6r0/examples/laminar/cylinder3d# mpirun --version
mpirun (Open MPI) 4.1.6
Report bugs to http://www.open-mpi.org/community/help/
root@acmt:/home/kc-lin/olb-1.6r0/examples/laminar/cylinder3d# ompi_info --parsable -l 9 --all | grep mpi_built_with_cuda_support:value
mca:mpi:base:param:mpi_built_with_cuda_support:value:true
After running make, I got this link command:
nvcc cylinder3d.o -o cylinder3d -lolbcore -L../../../external/lib -lpthread -lz -ltinyxml -lcuda -lcudadevrt -lcudart -L../../../build/lib
When I then tried to run on 2 GPUs with the command "mpirun -np 2 ./cylinder3d", I found that only 1 GPU was working, although it seemed to run 2 jobs at the same time:
|    0   N/A  N/A      7808      C   ./cylinder3d                      376MiB |
|    0   N/A  N/A      7809      C   ./cylinder3d                      376MiB |
Could you help me deal with this problem?
Thank you so much!
June 13, 2024 at 3:41 pm #8810 Adrian (Keymaster)
The problem is that each process by default uses the first visible GPU. You can give each process its own GPU by restricting the visible devices via:
mpirun -np 2 bash -c 'export CUDA_VISIBLE_DEVICES=${OMPI_COMM_WORLD_LOCAL_RANK}; ./cylinder3d'
This is also documented in config/gpu_openmpi.mk (excerpt):
# - Start the simulation using `mpirun -np 2 ./cavity3d` (All processes share default GPU, not optimal)
#
# Usage on a multi GPU system: (recommended when using MPI, use non-MPI version on single GPU systems)
# - Run "mpirun -np 4 bash -c 'export CUDA_VISIBLE_DEVICES=${OMPI_COMM_WORLD_LOCAL_RANK}; ./cavity3d'"
#   (for a 4 GPU system, further process mapping advisable, consult cluster documentation)
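The one-liner above can also be kept in a small wrapper script, which avoids quoting mistakes on the command line. The following is a minimal sketch, not part of OpenLB: the file name gpu_bind.sh and the function names are hypothetical. It relies only on OMPI_COMM_WORLD_LOCAL_RANK, which Open MPI's mpirun sets for each process it launches, and it assumes that each node has at least as many GPUs as local ranks.

```shell
#!/bin/sh
# gpu_bind.sh (hypothetical name): give each MPI process on a node its own GPU.

# Print the GPU index for the current process.
# Outside of mpirun the variable is unset, so fall back to GPU 0.
gpu_for_local_rank() {
  echo "${OMPI_COMM_WORLD_LOCAL_RANK:-0}"
}

# Restrict the visible CUDA devices to that one GPU, then replace
# this shell with the given command.
run_with_gpu() {
  CUDA_VISIBLE_DEVICES=$(gpu_for_local_rank)
  export CUDA_VISIBLE_DEVICES
  exec "$@"
}

# Only act when a command was passed on the command line.
if [ "$#" -gt 0 ]; then
  run_with_gpu "$@"
fi
```

Under these assumptions it would be started as "mpirun -np 2 ./gpu_bind.sh ./cylinder3d" after making the script executable. While the job runs, nvidia-smi should then list one process per GPU instead of two processes on GPU 0.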
June 13, 2024 at 3:48 pm #8811 thanhphatvt (Participant)
Thank you so much. It was my mistake: I missed one character in the command, so it did not work.
It's working now.
Thank you
June 13, 2024 at 3:51 pm #8812 Adrian (Keymaster)
No worries! Glad to hear that it works now.
June 14, 2024 at 1:36 pm #8823 thanhphatvt (Participant)
Dear Adrian,
When I run the simulation using MPI with 2 GPUs, I get a problem with the results: I cannot open them in ParaView. Here are the errors from ParaView:
ERROR: In vtkXMLParser.cxx, line 368
vtkXMLDataParser (000001E3D2A8DA40): Error parsing XML in stream at line 13, column 0, byte index 894: junk after document element
ERROR: In vtkXMLReader.cxx, line 576
vtkXMLImageDataReader (000001E3D2B3BFC0): Error parsing input file. ReadXMLInformation aborting.
ERROR: In vtkExecutive.cxx, line 730
vtkCompositeDataPipeline (000001E3CCEC9130): Algorithm vtkXMLImageDataReader (000001E3D2B3BFC0) returned failure for request: vtkInformation (000001E3CD00B1E0)
Debug: Off
Modified Time: 409141
Reference Count: 1
Registered Events: (none)
Request: REQUEST_INFORMATION
FORWARD_DIRECTION: 0
ALGORITHM_AFTER_FORWARD: 1
ERROR: In vtkXMLParser.cxx, line 368
vtkXMLDataParser (000001E3D2A8E610): Error parsing XML in stream at line 13, column 0, byte index 894: junk after document element
ERROR: In vtkXMLReader.cxx, line 576
vtkXMLImageDataReader (000001E3D2B3CE80): Error parsing input file. ReadXMLInformation aborting.
ERROR: In vtkExecutive.cxx, line 730
vtkCompositeDataPipeline (000001E3CCEC10F0): Algorithm vtkXMLImageDataReader (000001E3D2B3CE80) returned failure for request: vtkInformation (000001E3CD00D550)
ERROR: In vtkXMLParser.cxx, line 368
vtkXMLDataParser (000001E3D2A8DA40): Error parsing XML in stream at line 13, column 0, byte index 894: junk after document element
ERROR: In vtkXMLReader.cxx, line 576
vtkXMLImageDataReader (000001E3D2B3BFC0): Error parsing input file. ReadXMLInformation aborting.
ERROR: In vtkExecutive.cxx, line 730
vtkCompositeDataPipeline (000001E3CCEC9130): Algorithm vtkXMLImageDataReader (000001E3D2B3BFC0) returned failure for request: vtkInformation (000001E3CD00B1E0)
Debug: Off
Modified Time: 409141
Reference Count: 1
Registered Events: (none)
Request: REQUEST_INFORMATION
FORWARD_DIRECTION: 0
ALGORITHM_AFTER_FORWARD: 1
ERROR: In vtkXMLParser.cxx, line 368
vtkXMLDataParser (000001E3D2A8E610): Error parsing XML in stream at line 13, column 0, byte index 894: junk after document element
ERROR: In vtkXMLReader.cxx, line 576
vtkXMLImageDataReader (000001E3D2B3CE80): Error parsing input file. ReadXMLInformation aborting.
ERROR: In vtkExecutive.cxx, line 730
vtkCompositeDataPipeline (000001E3CCEC10F0): Algorithm vtkXMLImageDataReader (000001E3D2B3CE80) returned failure for request: vtkInformation (000001E3CD00D550)
Debug: Off
Modified Time: 409430
Reference Count: 1
Registered Events: (none)
Request: REQUEST_INFORMATION
FORWARD_DIRECTION: 0
ALGORITHM_AFTER_FORWARD: 1
What should I do in this situation? Or did I do something wrong in the simulation?
Thank you Adrian!
June 14, 2024 at 1:42 pm #8824 Adrian (Keymaster)
Is this using the laminar/cylinder3d case? One explanation could be that the ParaView files in tmp are in a broken state due to playing around with various parallelization modes. Does this also happen if you completely remove it and restart? Did you change anything in the example case?
June 14, 2024 at 1:57 pm #8825 thanhphatvt (Participant)
Yes, it's the laminar/cylinder3d case. This also happens when I remove it and restart. I haven't changed anything in the code.
I see this problem when I run with MPI using this command: "mpirun -np 4 bash -c 'export CUDA_VISIBLE_DEVICES=${OMPI_COMM_WORLD_LOCAL_RANK}; ./cylinder3d'"
If I use ./cylinder3d, I can open the file normally.
Here is my workstation when I run with MPI:
| NVIDIA-SMI 470.239.06   Driver Version: 470.239.06   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:05:00.0 Off |                    0 |
| N/A   44C    P0    57W / 149W |    369MiB / 11441MiB |     30%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:06:00.0 Off |                    0 |
| N/A   36C    P0    70W / 149W |    359MiB / 11441MiB |     27%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 00000000:09:00.0 Off |                    0 |
| N/A   43C    P0    59W / 149W |    369MiB / 11441MiB |     28%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 00000000:0A:00.0 Off |                    0 |
| N/A   36C    P0    71W / 149W |    359MiB / 11441MiB |     29%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     16351      C   ./cylinder3d                      366MiB |
|    1   N/A  N/A     16352      C   ./cylinder3d                      356MiB |
|    2   N/A  N/A     16353      C   ./cylinder3d                      366MiB |
|    3   N/A  N/A     16355      C   ./cylinder3d                      356MiB |
Thanks
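The parser message "junk after document element" means that the XML parser found more content after the root element had already been closed. One possible cause, which this thread does not confirm, is that more than one process wrote to the same output file. The sketch below is only a rough heuristic for spotting such files. The function names are hypothetical. It assumes that the output lives under tmp, as mentioned above, and that the files of interest are VTK XML files (.vti, .vtm, .pvd) with a single <VTKFile> root element.

```shell
#!/bin/sh
# Heuristic check for VTK XML files that contain more than one document.

# Succeeds if exactly one line opens a <VTKFile ...> root element and
# exactly one line closes it. grep -c counts matching lines, not matches.
vtk_has_single_root() {
  opens=$(grep -c '<VTKFile' "$1")
  closes=$(grep -c '</VTKFile>' "$1")
  [ "$opens" -eq 1 ] && [ "$closes" -eq 1 ]
}

# Print every VTK XML file below the given directory (default: tmp)
# that fails the check above.
report_suspect_vtk_files() {
  find "${1:-tmp}" -type f \( -name '*.vti' -o -name '*.vtm' -o -name '*.pvd' \) |
  while IFS= read -r f; do
    vtk_has_single_root "$f" || echo "suspect: $f"
  done
}
```

Running "report_suspect_vtk_files tmp" from the example directory after a run lists the files that fail the check. An empty result does not prove the files are valid, since the check looks only at the root element.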
June 20, 2024 at 10:56 am #8838 thanhphatvt (Participant)
Hello,
I'm still working on it, but now I get this message when I run make:
nvcc cavity3d.o -o cavity3d -lolbcore -L../../../external/lib -lmpi_cxx -lmpi -lpthread -lz -ltinyxml -lcuda -lcudadevrt -lcudart -L../../../build/lib
/usr/bin/ld: cannot find -lmpi_cxx: No such file or directory
collect2: error: ld returned 1 exit status
make: *** [../../../default.single.mk:38: cavity3d] Error 1
I understand the message is about a directory, but last time I could run it and now I get this error. Could you help me with this?
Thanks
June 25, 2024 at 8:15 pm #8857 mathias (Keymaster)
See https://www.openlb.net/forum/reply/8856/ and thanks @aseidler
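The linker error means that ld could not find a library named libmpi_cxx on its search path. The linked reply is not reproduced here, so the following is only a diagnostic sketch and not the fix from that reply. It also shows that the two link commands quoted in this thread differ: the cylinder3d command from the first post requests no MPI libraries, while the cavity3d command above requests -lmpi_cxx and -lmpi. The helper name is hypothetical.

```shell
#!/bin/sh
# Succeeds if the given linker command line requests library <name>
# through a separate "-l<name>" word. Assumes words are separated by spaces.
link_line_has_lib() {
  line=$1
  lib=$2
  case " $line " in
    *" -l${lib} "*) return 0 ;;
  esac
  return 1
}

# The two link commands quoted in this thread.
first='nvcc cylinder3d.o -o cylinder3d -lolbcore -L../../../external/lib -lpthread -lz -ltinyxml -lcuda -lcudadevrt -lcudart -L../../../build/lib'
second='nvcc cavity3d.o -o cavity3d -lolbcore -L../../../external/lib -lmpi_cxx -lmpi -lpthread -lz -ltinyxml -lcuda -lcudadevrt -lcudart -L../../../build/lib'

link_line_has_lib "$first" mpi || echo "first build: no -lmpi on the link line"
link_line_has_lib "$second" mpi_cxx && echo "second build: requests -lmpi_cxx"
```

With Open MPI, the wrapper compiler can print the link flags it would use via "mpicxx --showme:link". Passing that output as the first argument to link_line_has_lib, with mpi_cxx as the second, shows whether the local installation expects that library.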
June 26, 2024 at 6:27 am #8859 thanhphatvt (Participant)
Thank you so much!