Reply To: GPU Examples
You need to set the CUDA_ARCH value to match your target GPU. For the A100, set CUDA_ARCH to 80. A lower value can also work, but it leads to additional just-in-time translation of the GPU code at startup. See e.g. the reference table in rules.mk and the comments in the example configs.
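If you are unsure which value applies to your card, here is a minimal sketch for reading it off the machine. It assumes that nvidia-smi is on the PATH and that the driver is recent enough to support the compute_cap query field; the function names are my own and not part of the library. The compute capability "8.0" reported for an A100 corresponds to CUDA_ARCH 80.

```python
import subprocess


def cuda_arch_from_compute_cap(compute_cap: str) -> int:
    """Convert a compute capability string such as '8.0' to an arch number such as 80."""
    major, minor = compute_cap.strip().split(".")
    return int(major) * 10 + int(minor)


def detect_cuda_archs() -> list[int]:
    """Return one arch number per visible GPU.

    Assumption: nvidia-smi is installed and supports the compute_cap query field.
    """
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [cuda_arch_from_compute_cap(line) for line in out.splitlines() if line.strip()]


if __name__ == "__main__":
    try:
        for index, arch in enumerate(detect_cuda_archs()):
            print(f"GPU {index}: CUDA_ARCH = {arch}")
    except (OSError, subprocess.CalledProcessError) as err:
        # nvidia-smi is missing or failed, e.g. on a login node without GPUs
        print(f"Could not query nvidia-smi: {err}")
```

On a cluster, run this on a compute node, since login nodes often have no GPU or a different one.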
For multi-GPU usage you will need to link against an MPI library with CUDA support. See e.g. config/gpu_openmpi.mk for a starting point, but consult your cluster's documentation for further guidance on how to set up and run MPI + CUDA on the specific system. For example, in my tests on HoreKa the NVIDIA HPC SDK gave the best results for multi-node execution.
However, testing the single-GPU setup first is a good idea, and it should work as soon as you adjust the CUDA_ARCH value and recompile.
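Once you move on to the multi-GPU case, it can help to check whether the MPI installation you plan to link against was built with CUDA support at all. The following is a rough sketch that assumes Open MPI, whose ompi_info tool reports this in its parsable output; other MPI implementations need a different check, and the function names here are only illustrative.

```python
import subprocess


def built_with_cuda(ompi_info_output: str) -> bool:
    """Look for the mpi_built_with_cuda_support entry in `ompi_info --parsable --all` output."""
    for line in ompi_info_output.splitlines():
        if "mpi_built_with_cuda_support:value:" in line:
            # The value is the last colon-separated field, "true" or "false"
            return line.rsplit(":", 1)[1].strip() == "true"
    # Entry not present: treat as no CUDA support
    return False


def open_mpi_has_cuda() -> bool:
    """Assumption: the ompi_info found on PATH belongs to the MPI you will link against."""
    out = subprocess.run(
        ["ompi_info", "--parsable", "--all"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return built_with_cuda(out)


if __name__ == "__main__":
    try:
        print("Open MPI built with CUDA support:", open_mpi_has_cuda())
    except (OSError, subprocess.CalledProcessError) as err:
        # ompi_info is missing, e.g. no Open MPI module loaded
        print(f"Could not run ompi_info: {err}")
```

This only reflects how Open MPI was built. Whether CUDA-aware transfers actually work across nodes still depends on the cluster setup, so the system documentation remains the reference.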