Reply To: GPU Examples
Good to hear that at least the CUDA_ARCH issue is solved.
As for the MPI usage: you are correct that the output shows the run is not actually parallelized. Did you recompile (“make clean; make”) after switching to the MPI-enabled config?
How did you configure the SLURM (?) script on your cluster? You used two nodes with four A100s each, but you only get two outputs instead of eight, which leads me to believe that something is also wrong there (we need one MPI process per GPU, i.e. eight processes in total).
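To make "one MPI process per GPU" concrete for two nodes with four A100s each, here is a small Python sketch of the rank layout we would expect. It assumes ranks are placed on nodes in consecutive blocks, and the function names are hypothetical illustrations, not part of the apps. On the SLURM side this usually means something like `--nodes=2` and `--ntasks-per-node=4` plus a request for four GPUs per node (`--gpus-per-node=4` or `--gres=gpu:4`, depending on your cluster's setup). Each process then needs to pick its GPU from its node-local rank.

```python
import os
from typing import List, Mapping, Tuple


def expected_layout(nodes: int, gpus_per_node: int) -> List[Tuple[int, int, int]]:
    """Return (global_rank, node_index, local_gpu) for one MPI rank per GPU.

    Assumes ranks are placed on nodes in consecutive blocks (ranks 0-3 on the
    first node, 4-7 on the second, ...), which is a common default.
    """
    layout = []
    for rank in range(nodes * gpus_per_node):
        node = rank // gpus_per_node        # which node this rank runs on
        local_gpu = rank % gpus_per_node    # which GPU on that node it should use
        layout.append((rank, node, local_gpu))
    return layout


def local_gpu_from_env(env: Mapping[str, str], gpus_per_node: int) -> int:
    """Pick a GPU index from the launcher's node-local rank variable.

    SLURM_LOCALID is set by srun; OMPI_COMM_WORLD_LOCAL_RANK by Open MPI's mpirun.
    Falls back to GPU 0 if neither is present (e.g. a plain, non-MPI run).
    """
    for key in ("SLURM_LOCALID", "OMPI_COMM_WORLD_LOCAL_RANK"):
        if key in env:
            return int(env[key]) % gpus_per_node
    return 0


if __name__ == "__main__":
    # Two nodes with four A100s each: eight ranks, so eight outputs are expected.
    for rank, node, gpu in expected_layout(nodes=2, gpus_per_node=4):
        print(f"rank {rank}: node {node}, GPU {gpu}")
    print("this process would use GPU", local_gpu_from_env(os.environ, gpus_per_node=4))
```

If a run prints only two outputs instead of eight, the launcher most likely started one process per node rather than one per GPU.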
Sadly, setting up a multi-GPU simulation correctly is not as straightforward as running a plain CPU-only application. However, once we have found a working config for your setup, it should work for all apps.