Reply To: GPU Examples

#6596
Mosemb
Participant

Hey Adrian, thanks. I was able to change the config.mk file as shown below:
CXX := nvcc
CC := nvcc
CXXFLAGS := -O3
CXXFLAGS += -std=c++17
PARALLEL_MODE := MPI
MPIFLAGS := -lmpi_cxx -lmpi
PLATFORMS := CPU_SISD GPU_CUDA
# for e.g. RTX 30* (Ampere), see table in rules.mk for other options
CUDA_ARCH := 80
USE_EMBEDDED_DEPENDENCIES := ON

The idea is to run the application with CUDA in parallel. But when I run the application with
mpirun -np 2 bash -c 'export CUDA_VISIBLE_DEVICES=${OMPI_COMM_WORLD_LOCAL_RANK}; ./bstep2d'
the output does not seem to be parallelized. I am using 2 nodes at this point, and here is the output:

[Timer]
[Timer] ----------------Summary:Timer----------------
[Timer] measured time (rt) : 649.823s
[Timer] measured time (cpu): 634.863s
[Timer] average MLUPs : 181.855
[Timer] average MLUPps: 181.855
[Timer] ---------------------------------------------
[SuperPlaneIntegralFluxVelocity2D] regionSize[m]=0.00468; flowRate[m^2/s]=0.00485398; meanVelocity[m/s]=1.03718
[SuperPlaneIntegralFluxPressure2D] regionSize[m]=0.00468; force[N]=0.0182643; meanPressure[Pa]=3.90263
[Timer] step=576920; percent=99.9995; passedTime=653.267; remTime=0.00339701; MLUPs=186.57
[LatticeStatistics] step=576920; t=1.99999; uMax=0.0301884; avEnergy=9.06721e-05; avRho=1.00147
[Timer]
[Timer] ----------------Summary:Timer----------------
[Timer] measured time (rt) : 653.461s
[Timer] measured time (cpu): 639.569s
[Timer] average MLUPs : 180.842
[Timer] average MLUPps: 180.842
[Timer] ---------------------------------------------
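
For reference, the per-rank GPU selection done inline in the mpirun command above could also live in a small wrapper script. This is only a sketch: the script name gpu_bind.sh, the function select_gpu, and the GPUS_PER_NODE variable are hypothetical names made up for illustration. OMPI_COMM_WORLD_LOCAL_RANK is the node-local rank that Open MPI exports to each process.

```shell
#!/usr/bin/env bash
# gpu_bind.sh (hypothetical helper name): give each MPI rank on a node its own GPU.

# Map a node-local rank to a device index, wrapping around when there are
# more ranks than GPUs on the node.
select_gpu() {
  local local_rank="$1"
  local gpus_per_node="$2"
  echo $(( local_rank % gpus_per_node ))
}

# Only bind and launch when a command was passed to the script.
# GPUS_PER_NODE is an assumed variable for this sketch and defaults to 1.
if [ "$#" -gt 0 ]; then
  CUDA_VISIBLE_DEVICES="$(select_gpu "${OMPI_COMM_WORLD_LOCAL_RANK:-0}" "${GPUS_PER_NODE:-1}")"
  export CUDA_VISIBLE_DEVICES
  exec "$@"
fi
```

It would be used as: mpirun -np 2 -x GPUS_PER_NODE=1 ./gpu_bind.sh ./bstep2d. With one rank per node the local rank is 0 on both nodes, so each process sees device 0 of its own node, same as with the inline export.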

I expected to get a single value for MLUPs and MLUPps, but instead I get two, one for each node. How can I fix this?
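
To make the duplication easy to check on other runs, the summary blocks in a saved log can be counted. This is a minimal sketch, assuming the output was redirected to a file. The file name run.log and the function name count_timer_summaries are hypothetical.

```shell
# Count how many Timer summary blocks a saved log contains.
# Prints the number of lines that contain the summary header.
count_timer_summaries() {
  grep -c 'Summary:Timer' "$1"
}
```

For example, after mpirun ... > run.log, calling count_timer_summaries run.log prints 2 for the output above. A count equal to the number of ranks means every rank reported its own timing rather than one combined figure.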