
Reply To: Multi-GPU usage issue

#7019
Adrian
Keymaster

I am starting to suspect that the case was not actually compiled with MPI support (at the start of the output, OpenLB prints the number of MPI processes; is this number correct in your SLURM log?)

Replying in more detail to your previous question:

How exactly did you launch the application and how did you assign each process a single GPU?

The steps are the following:

1. Copy the example config config/gpu_openmpi.mk to config.mk, e.g. via cp config/gpu_openmpi.mk config.mk

2. Edit the config.mk to use the correct CUDA_ARCH for your target GPU
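
For example, if the target GPUs were A100s, the relevant line in config.mk would look roughly like this (the value is the CUDA compute capability of your GPU, so adjust it accordingly):

CUDA_ARCH := 80  # e.g. 70 for V100, 80 for A100, 86 for RTX 30xx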

3. Ensure that a CUDA-aware MPI module and CUDA 11.4 or later (for nvcc) are loaded in your build environment
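
How this looks depends on your cluster; with environment modules it is typically something along these lines (the module names below are placeholders, check module avail on your system):

module load openmpi/4.1-cuda   # a CUDA-aware MPI build (placeholder name)
module load cuda/11.8          # provides nvcc >= 11.4 (placeholder name)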

4. Edit the config.mk to use the CXXFLAGS and LDFLAGS provided by mpic++, per the hint in the config:

# CXXFLAGS and LDFLAGS may need to be adjusted depending on the specific MPI installation.
# Compare to mpicxx --showme:compile and mpicxx --showme:link when in doubt.

5. Compile the example using make
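
If you change config.mk after a previous build, a clean rebuild is the safest option, e.g.:

make clean
make -j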

6. Update the SLURM script to launch one process per GPU and assign each process a GPU via the CUDA_VISIBLE_DEVICES environment variable. This is what


mpirun bash -c 'export CUDA_VISIBLE_DEVICES=${OMPI_COMM_WORLD_LOCAL_RANK}; ./program'

does.
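
Putting it together, a minimal SLURM batch script could look roughly like the sketch below; the job name, module names and GPU count are placeholders and have to be adapted to your cluster:

#!/bin/bash
#SBATCH --job-name=olb-gpu        # placeholder job name
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4       # one MPI process per GPU
#SBATCH --gres=gpu:4              # request four GPUs on the node (adjust)

# placeholder module names, use the ones available on your system
module load openmpi/4.1-cuda cuda/11.8

# one rank per GPU: each rank only sees the device matching its local rank
mpirun bash -c 'export CUDA_VISIBLE_DEVICES=${OMPI_COMM_WORLD_LOCAL_RANK}; ./program'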

To narrow down where the problem lies, it would help if you could share your exact config.mk, SLURM script and job output, together with more information about your system setup.

Other approaches are possible depending on the exact environment.
