Speedup issues with OMP • OpenLB - Open source lattice Boltzmann code

This topic has 1 reply, 2 voices, and was last updated 12 years ago by mathias.

Viewing 2 posts - 1 through 2 (of 2 total)

Author

Posts
July 17, 2012 at 6:31 am #1712

jepson
Member

Dear Mathias,

The parallel implementation with MPI works very well and gives the desired speedup.

However, when I run OpenLB in the OMP parallel mode, I cannot get the expected speedup (a roughly linear relationship between the number of threads and the speed in most cases).

To run the LBM code with OMP, I simply modified the Makefile.inc in the home directory

PARALLEL_MODE := OMP

and set the number of threads through the environment variable to 1, 2, 3, 4, 6 and 8 respectively. By the way, there are definitely 8 cores on my computer.

I monitored the CPU load at the same time. There is a good speedup from 1 thread to 2 threads and again to 3 threads, but no further improvement when I increase the number of threads beyond 3.

I did the same on the cluster (48 cores on each node). With the number of threads set to 48, there seems to be no improvement compared with the speed at 4 threads.

Could you explain what is happening here, and what else I should do to improve the OMP parallel performance?

Looking forward to your reply!

Best regards,

Jepson

July 19, 2012 at 11:37 am #2057

mathias
Keymaster

Dear Jepson,

A few years ago we investigated this problem in detail (cf. Heuveline, V., Krause, M.J. & Latt, J. (2009): "Towards a Hybrid Parallelization of Lattice Boltzmann Methods", Computers and Mathematics with Applications, 58, 1071-1080). Basically, we found that you need to pin the OMP threads to specific cores to reach almost the performance obtained using MPI. Unfortunately, you need some system calls to do that. Since we want OpenLB to be as generic as possible, we decided against these optimizations.

In the mentioned paper you will also find some benchmark results to which you can compare yours. However, for a sufficiently large grid size I would expect a better speed-up using 48 cores.

Mathias
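
For readers who want to try the pinning described above, here is a minimal sketch of what such system calls could look like. It is not part of OpenLB and not the code from the cited paper. It assumes Linux with glibc (`sched_getaffinity` / `sched_setaffinity`), and the helper names `allowedCpus`, `pinCallingThread` and `pinOmpThreads` are hypothetical, chosen only for this illustration. Built without OpenMP support, the pragma is ignored and only the calling thread is pinned.

```cpp
// Sketch only: Linux/glibc specific, helper names are hypothetical.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for the CPU_* macros and sched_*affinity
#endif
#include <sched.h>

#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Returns the CPU ids the current process is allowed to run on.
std::vector<int> allowedCpus() {
  std::vector<int> cpus;
  cpu_set_t mask;
  CPU_ZERO(&mask);
  if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
    return cpus;  // query failed, report no CPUs
  }
  for (int c = 0; c < CPU_SETSIZE; ++c) {
    if (CPU_ISSET(c, &mask)) cpus.push_back(c);
  }
  return cpus;
}

// Restricts the calling thread to a single CPU; returns true on success.
// On Linux, pid 0 means "the calling thread".
bool pinCallingThread(int cpu) {
  cpu_set_t mask;
  CPU_ZERO(&mask);
  CPU_SET(cpu, &mask);
  return sched_setaffinity(0, sizeof(mask), &mask) == 0;
}

// Pins every thread of an OpenMP team to one CPU from `cpus`
// (round-robin by thread id) and returns how many threads were pinned.
int pinOmpThreads(const std::vector<int>& cpus) {
  if (cpus.empty()) return 0;
  int pinned = 0;
#pragma omp parallel reduction(+ : pinned)
  {
    int tid = 0;
#ifdef _OPENMP
    tid = omp_get_thread_num();
#endif
    const int cpu = cpus[static_cast<std::size_t>(tid) % cpus.size()];
    if (pinCallingThread(cpu)) pinned += 1;
  }
  return pinned;
}
```

A call such as `pinOmpThreads(allowedCpus())` would be placed once at program start, before the time-stepping loop, and the program compiled with the compiler's OpenMP flag (for GCC, `-fopenmp`). The pinning only carries over to later parallel regions if the OpenMP runtime reuses the same worker threads, which is common but implementation dependent. Newer OpenMP runtimes also provide the `OMP_PROC_BIND` and `OMP_PLACES` environment variables, which may give a similar effect without any code changes.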