Speedup issues with OMP

Viewing 2 posts - 1 through 2 (of 2 total)
  • Author
  • #1712

    Dear Mathias,

    The MPI parallel implementation works very well and shows the desired speedup.

    However, when I run OpenLB in the OMP parallel mode, I cannot get the expected speedup (in most cases a linear relationship between the number of threads and the speed).

    To run the LBM code with OMP, I simply set the following in the home directory

    PARALLEL_MODE := OMP

    and set the environment variable for the number of threads to 1, 2, 3, 4, 6 and 8 in turn. My computer definitely has 8 cores.

    I monitored the CPU load at the same time. There is a good speedup from 1 thread to 2 threads, and again to 3 threads, but no further improvement when the number of threads is increased beyond 3.

    I did the same on the cluster (48 cores per node) with the number of threads set to 48. There seems to be no improvement compared with the speed obtained with 4 threads.

    Could you explain what is happening here, and what else I should do to improve the OMP parallel performance?

    Looking forward to your reply!

    Best regards,

    Jepson


    Dear Jepson,

    A few years ago we investigated this problem in detail (cf. Heuveline, V., Krause, M.J., Latt, J. (2009): "Towards a Hybrid Parallelization of Lattice Boltzmann Methods", Computers and Mathematics with Applications, 58, 1071-1080). Basically, we found that you need to pin the OMP threads to specific cores to reach almost the performance obtained with MPI. Unfortunately, that requires some system calls. Since we want OpenLB to be as generic as possible, we decided against these optimizations.

    In the paper mentioned above you will also find some benchmark results with which you can compare yours. However, for a sufficiently large grid size I would expect a better speedup using 48 cores.

    Mathias
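For readers who want to try the pinning mentioned above, here is a minimal sketch of what such system calls could look like on Linux with glibc. It is not taken from OpenLB or from the cited paper. The helper names `allowedCpus`, `pinCallingThread` and `pinOmpThreads` are hypothetical, and the round-robin mapping of threads to cores is only an assumption about a reasonable placement.

```cpp
// Sketch only: pin each OpenMP thread to one core (Linux/glibc specific).
// All function names here are hypothetical, not part of OpenLB.
#include <sched.h>   // sched_getaffinity, sched_setaffinity, cpu_set_t
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>     // omp_get_thread_num
#endif

// Return the ids of the CPUs the calling thread is currently allowed to use.
std::vector<int> allowedCpus() {
  cpu_set_t mask;
  CPU_ZERO(&mask);
  std::vector<int> cpus;
  if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
    return cpus;  // empty on failure
  }
  for (int c = 0; c < CPU_SETSIZE; ++c) {
    if (CPU_ISSET(c, &mask)) cpus.push_back(c);
  }
  return cpus;
}

// Restrict the calling thread to a single CPU; true on success.
bool pinCallingThread(int cpu) {
  cpu_set_t mask;
  CPU_ZERO(&mask);
  CPU_SET(cpu, &mask);
  // A pid of 0 means "the calling thread" for sched_setaffinity on Linux.
  return sched_setaffinity(0, sizeof(mask), &mask) == 0;
}

// Pin the threads of one parallel region round-robin over the allowed CPUs.
// Returns how many threads were pinned successfully. Without OpenMP enabled
// at compile time, the pragma is ignored and only the calling thread is pinned.
int pinOmpThreads() {
  const std::vector<int> cpus = allowedCpus();
  if (cpus.empty()) return 0;
  int pinned = 0;
  #pragma omp parallel reduction(+:pinned)
  {
    std::size_t tid = 0;
#ifdef _OPENMP
    tid = static_cast<std::size_t>(omp_get_thread_num());
#endif
    if (pinCallingThread(cpus[tid % cpus.size()])) ++pinned;
  }
  return pinned;
}
```

One would call `pinOmpThreads()` once at program start, before the first compute loop, and build with OpenMP enabled (for example `-fopenmp` with GCC). OpenMP runtimes usually reuse the same worker threads across parallel regions, so the pinning normally persists, but that is runtime behaviour rather than a guarantee. With runtimes that implement OpenMP 4.0 or later, setting the standard environment variables `OMP_PROC_BIND` and `OMP_PLACES` binds threads without any code changes.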
