
Re: Parallel (MPI/OpenMP) simulation

#2326

Hi Robin,

Thanks for your answers. I changed the "Makefile.inc" like this:

Code:
CXX := g++
#CXX := icpc -D__aligned__=ignored
#CXX := mpiCC
CXX := mpic++

###########################################

PARALLEL_MODE := OFF
PARALLEL_MODE := MPI
#PARALLEL_MODE := OMP
#PARALLEL_MODE := HYBRID

I tested these changes from the "cavity2d/parallel" directory. As indicated in the user manual (page 29), I executed these commands:

Code:
make clean


Code:
make cleanbuild


Code:
make

Then, I ran the simulation with 3 and with 4 cores in order to compare the time taken in each case. I used

Code:
mpirun -np x ./cavity2d

The "x" corresponds to the number of cores, in this case 3 and 4. It seems that I run into the same problem indicated by Padde86 in the topic

Quote:
Problem with MPI execution an LBM algorithm

The 4-core simulation takes more time than the 3-core one. The "timer" results, obtained by varying only the number of cores in the parallel simulation, are:

4 cores

Code:
[LatticeStatistics] step=18816; t=14.7; uMax=0.1; avEnergy=0.000346835; avRho=1
[Timer] step=18816; percent=98; passedTime=151.523; remTime=3.09231; MLUPs=2.4942
[Timer]
[Timer] ----------------Summary:Timer----------------
[Timer] measured time (rt) : 151.619s
[Timer] measured time (cpu): 141.128s
[Timer] average MLUPs : 2.107
[Timer] average MLUPps: 2.107
[Timer] ---------------------------------------------
[LatticeStatistics] step=19072; t=14.9; uMax=0.1; avEnergy=0.000349094; avRho=1
[Timer] step=19072; percent=99.3333; passedTime=151.653; remTime=1.01781; MLUPs=2.68606
[Timer]
[Timer] ----------------Summary:Timer----------------
[Timer] measured time (rt) : 151.738s
[Timer] measured time (cpu): 142.437s
[Timer] average MLUPs : 2.106
[Timer] average MLUPps: 2.106
[Timer] ---------------------------------------------
[LatticeStatistics] step=18944; t=14.8; uMax=0.1; avEnergy=0.000347976; avRho=1
[Timer] step=18944; percent=98.6667; passedTime=152.063; remTime=2.05491; MLUPs=3.94453
[Timer]
[Timer] ----------------Summary:Timer----------------
[Timer] measured time (rt) : 152.102s
[Timer] measured time (cpu): 142.303s
[Timer] average MLUPs : 2.101
[Timer] average MLUPps: 2.101
[Timer] ---------------------------------------------
[LatticeStatistics] step=19072; t=14.9; uMax=0.1; avEnergy=0.000349094; avRho=1
[Timer] step=19072; percent=99.3333; passedTime=152.471; remTime=1.0233; MLUPs=5.22071
[Timer]
[Timer] ----------------Summary:Timer----------------
[Timer] measured time (rt) : 152.866s
[Timer] measured time (cpu): 140.036s
[Timer] average MLUPs : 2.090
[Timer] average MLUPps: 2.090
[Timer] ---------------------------------------------

3 cores

Code:
[LatticeStatistics] step=18560; t=14.5; uMax=0.1; avEnergy=0.000344785; avRho=1
[Timer] step=18560; percent=96.6667; passedTime=109.805; remTime=3.78638; MLUPs=3.22735
[Timer]
[Timer] ----------------Summary:Timer----------------
[Timer] measured time (rt) : 110.165s
[Timer] measured time (cpu): 108.535s
[Timer] average MLUPs : 2.900
[Timer] average MLUPps: 2.900
[Timer] ---------------------------------------------
[LatticeStatistics] step=18688; t=14.6; uMax=0.1; avEnergy=0.000345858; avRho=1
[Timer] step=18688; percent=97.3333; passedTime=110.375; remTime=3.02397; MLUPs=3.73693
[LatticeStatistics] step=19072; t=14.9; uMax=0.1; avEnergy=0.000349094; avRho=1
[Timer] step=19072; percent=99.3333; passedTime=110.421; remTime=0.741081; MLUPs=3.24208
[Timer]
[Timer] ----------------Summary:Timer----------------
[Timer] measured time (rt) : 110.860s
[Timer] measured time (cpu): 109.294s
[Timer] average MLUPs : 2.882
[Timer] average MLUPps: 2.882
[Timer] ---------------------------------------------
[LatticeStatistics] step=18816; t=14.7; uMax=0.1; avEnergy=0.000346835; avRho=1
[Timer] step=18816; percent=98; passedTime=110.844; remTime=2.26212; MLUPs=4.55138
[LatticeStatistics] step=18944; t=14.8; uMax=0.1; avEnergy=0.000347976; avRho=1
[Timer] step=18944; percent=98.6667; passedTime=111.24; remTime=1.50324; MLUPs=5.37891
[LatticeStatistics] step=19072; t=14.9; uMax=0.1; avEnergy=0.000349094; avRho=1
[Timer] step=19072; percent=99.3333; passedTime=111.627; remTime=0.749174; MLUPs=5.504
[Timer]
[Timer] ----------------Summary:Timer----------------
[Timer] measured time (rt) : 112.14s
[Timer] measured time (cpu): 110.124s
[Timer] average MLUPs : 2.852
[Timer] average MLUPps: 2.852
[Timer] ---------------------------------------------
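Just to quantify the difference between the two logs (my own rough calculation from the last "measured time (rt)" value of each summary; the 129 x 129 grid size is only my guess for the cavity2d example, inferred from the reported MLUPs):

```python
# Rough comparison of the two runs, taken from the [Timer] summaries.
t3 = 112.14   # seconds, last summary of the 3-process run
t4 = 152.866  # seconds, last summary of the 4-process run

# Relative speed of the 4-core run vs. the 3-core run (>1 would mean faster).
print(round(t3 / t4, 2))  # 0.73, i.e. the 4-core run is ~27% slower

# Sanity check of the reported MLUPs (million lattice-site updates per
# second), assuming a hypothetical 129 x 129 cavity grid:
cells, steps = 129 * 129, 19072
print(round(cells * steps / t3 / 1e6, 2))  # 2.83, close to the reported 2.852
```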

If I understand correctly, the point of using a parallel simulation is to divide the load among the different cores and get a result in less time (I know this is a very simplified principle).

1. So, why is the measured time with 3 cores less than with 4 cores?

2. Am I actually running a sequential simulation?

3. How can I be sure that the simulation is running in parallel, and not running the same case x times sequentially?

The machine that I am using has these characteristics:

Code:
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    1
Socket(s):             2
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            15
Model:                 4
Stepping:              10
CPU MHz:               3200.172
BogoMIPS:              6400.71
L1d cache:             16K
L2 cache:              2048K
NUMA node0 CPU(s):     0-3
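If I am reading the lscpu output right, the 4 CPUs would be hyper-threads rather than 4 physical cores (a quick check of the topology numbers, in case it matters):

```python
# CPU topology taken from the lscpu output above.
sockets = 2           # Socket(s)
cores_per_socket = 1  # Core(s) per socket
threads_per_core = 2  # Thread(s) per core

print(sockets * cores_per_socket * threads_per_core)  # 4 logical CPUs, matches "CPU(s): 4"
print(sockets * cores_per_socket)                     # 2 physical cores
```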

Best regards,

Alejandro