Multi-GPU usage issue
November 8, 2022 at 1:38 am #6957 - achodankar (Participant)
Hello Developers,
I tried running the code on multiple GPUs, using the config file intended for multi-GPU builds. However, after SSHing into the node, it shows the code running on only one GPU device. What might be possible ways to rectify this issue? Thank you.
Yours sincerely,
Abhijeet C.
November 8, 2022 at 10:44 am #6958 - Adrian (Keymaster)
Just to confirm: you used the config/gpu_openmpi.mk example config?
How exactly did you launch the application and how did you assign each process a single GPU? (Following the comments from the example config:)
Usage on a multi GPU system: (recommended when using MPI, use non-MPI version on single GPU systems)
- Run mpirun -np 4 bash -c 'export CUDA_VISIBLE_DEVICES=${OMPI_COMM_WORLD_LOCAL_RANK}; ./cavity3d'
(for a 4 GPU system, further process mapping advisable, consult cluster documentation)
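As an aside, the same per-rank binding can also be expressed as a small wrapper script, which keeps the mpirun line short. This is only a sketch; the file name bind_gpu.sh is illustrative and not part of OpenLB:

#!/bin/bash
# bind_gpu.sh (illustrative helper, not part of OpenLB):
# Open MPI's mpirun exports OMPI_COMM_WORLD_LOCAL_RANK for every process,
# so each rank can expose exactly one GPU to CUDA and then exec the solver.
export CUDA_VISIBLE_DEVICES=${OMPI_COMM_WORLD_LOCAL_RANK}
exec "$@"

After marking it executable, it would be launched as mpirun -np 4 ./bind_gpu.sh ./cavity3d on a 4 GPU node.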
November 8, 2022 at 3:25 pm #6960 - achodankar (Participant)
Hello Adrian,
I did use the config/gpu_openmpi.mk example config. I used this SLURM script:
#!/bin/bash
#SBATCH --job-name=run1
#SBATCH --output=run1.out
#SBATCH --mail-type=ALL
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --gpus-per-node=a100:8
#SBATCH --mem=50gb
#SBATCH --time=5-00:00:00
#SBATCH --get-user-env

CUDA_VISIBLE_DEVICES_SETTING=("0" "0" "0,1" "0,1,2" "0,1,2,3" "0,1,2,3,4" "0,1,2,3,4,5" "0,1,2,3,4,5,6" "0,1,2,3,4,5,6,7" "0,1,2,3,4,5,6,7,8" "0")
srun --mpi=pmix_v3 bash -c export CUDA_VISIBLE_DEVICES=${OMPI_COMM_WORLD_LOCAL_RANK}; ./poiseuille3d
————————————————————————————————-
The mpirun command doesn't work on the cluster for me. I also tried setting the number of cuboids to 8, and I tried the CUDA_VISIBLE_DEVICES_SETTING array. I also used this line instead of the previous one: srun --mpi=pmix_v3 export env CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES_SETTING[$gpus-per-node]}; ./poiseuille3d. However, none of these attempts worked for me.
I was advised to use cudaSetDevice with the rank. How should I implement it? Is that the right approach to fix this issue?
I would really appreciate your help to resolve this issue.
Thank you.
Yours sincerely,
Abhijeet C.
November 9, 2022 at 5:30 pm #7014 - achodankar (Participant)
Hello Adrian,
"How exactly did you launch the application and how did you assign each process a single GPU?"
1) I ran the makefile.
2) I ran the SLURM script.
Did I miss anything?
November 9, 2022 at 5:35 pm #7015 - Adrian (Keymaster)
It looks to me as if only one MPI task is launched per node in your SLURM script. You can try to include a
#SBATCH --tasks-per-node=8
setting. If your application case calculates the number of cuboids w.r.t. the number of MPI processes (this is the case for OpenLB's example cases), you do not need to manually change this to 8.
Edit: Enabling handling of more than one GPU by a single process would be a nice addition and would require setting the active device via CUDA as you indicated. However, this is not included in OpenLB 1.5: there we assume that each process that holds a GPU-based block lattice has access to exactly one default GPU (as configured via the visible device environment variable).
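A quick way to check whether each task actually ends up with its own visible device is to let every rank print its assignment before the real run. This is only a sketch: SLURM_LOCALID and SLURM_PROCID are set by srun itself, while OMPI_COMM_WORLD_LOCAL_RANK is only guaranteed when launching via Open MPI's mpirun, so which variable is populated on your cluster is an assumption to verify.

# Each task reports which device index it would expose to CUDA.
srun --mpi=pmix_v3 bash -c 'export CUDA_VISIBLE_DEVICES=${SLURM_LOCALID}; echo "rank ${SLURM_PROCID} on $(hostname): CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"'

If every task reports the same device index (or an empty value), the per-rank assignment is not taking effect.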
November 9, 2022 at 8:35 pm #7018 - achodankar (Participant)
Hello Adrian,
I reverted to the original code for the number of cuboids; however, the number of cuboids is set to one after running the code.
#ifdef PARALLEL_MODE_MPI
const int noOfCuboids = 2*singleton::mpi().getSize();
#else // ifdef PARALLEL_MODE_MPI
const int noOfCuboids = 1;
#endif // ifdef PARALLEL_MODE_MPI

Also, the line #SBATCH --tasks-per-node=8 was added to the SLURM script. It didn't make any difference.
I observe that the issue stems from the MPI side: something is not syncing between the MPI processes or ranks and the SLURM script. I would appreciate your feedback.
Thank you.
Yours sincerely,
Abhijeet C.
November 9, 2022 at 10:55 pm #7019 - Adrian (Keymaster)
I am starting to suspect that the case was not actually compiled with MPI support (at the start of the output you should see the number of MPI processes printed by OpenLB; is this correct in your SLURM log?)
Replying in more detail to your previous question:
"How exactly did you launch the application and how did you assign each process a single GPU?"
The steps are the following:
1. Copy the example config config/gpu_openmpi.mk into config.mk, e.g. using cp config/gpu_openmpi.mk config.mk
2. Edit config.mk to use the correct CUDA_ARCH for your target GPU
3. Ensure that a CUDA-aware MPI module and CUDA 11.4 or later (for nvcc) is loaded in your build environment
4. Edit config.mk to use the mpic++-provided CXXFLAGS and LDFLAGS per the config hint:
# CXXFLAGS and LDFLAGS may need to be adjusted depending on the specific MPI installation.
# Compare to mpicxx --showme:compile and mpicxx --showme:link when in doubt.
5. Compile the example using make
6. Update the SLURM script to launch one process per GPU and assign each process a GPU via the CUDA_VISIBLE_DEVICES environment variable. This is what
mpirun bash -c 'export CUDA_VISIBLE_DEVICES=${OMPI_COMM_WORLD_LOCAL_RANK}; ./program'
does.
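Putting the SLURM side of this together, a job script along the following lines would launch one task per GPU. This is only a sketch adapted from the script posted earlier in this thread: partition, GPU count, memory, wall time and binary name are placeholders, and using SLURM_LOCALID with srun (instead of the Open MPI variable used with mpirun) is an assumption to verify on your cluster.

#!/bin/bash
#SBATCH --job-name=poiseuille3d-gpu
#SBATCH --output=poiseuille3d-gpu.out
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --gpus-per-node=a100:8
#SBATCH --ntasks-per-node=8
#SBATCH --mem=50gb
#SBATCH --time=1-00:00:00

# One MPI task per GPU: each task makes exactly one device visible to CUDA
# before starting the solver. Note the quotes around the bash -c command,
# so the export happens inside each task rather than in the submitting shell.
srun --mpi=pmix_v3 bash -c 'export CUDA_VISIBLE_DEVICES=${SLURM_LOCALID}; ./poiseuille3d'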
For further investigation of where the problem is in your process, it would help if you could share your exact config.mk, SLURM script, and job output, in addition to more information on your system setup. Other approaches are possible depending on the exact environment.