Re: Speedup in multi-calculation-nodes

Markus Mohrhard

A quick run with an adapted cylinder2d example has finally finished. I only changed the value of N to 8 in examples/cylinder2d/cylinder2d.cpp and left everything else in the stock OpenLB 1.1 state.

On the HPC cluster I ran the job on 1 node with 8 cores, 1 node with 16 cores, and 2 nodes with 16 cores each (32 cores in total). I obtained the following performance results (total MLUPs, then MLUPs per core):

1n/8c : 131.4 MLUPs => 16.4 MLUPps
1n/16c: 240.7 MLUPs => 15.0 MLUPps
2n/32c: 450.4 MLUPs => 14.1 MLUPps

I also still have a result from a test run with N = 20 on 16 nodes, each using 16 cores (256 cores in total):

16n/256c: 2018.4 MLUPs => 7.9 MLUPps

This shows that while we don’t have perfect scaling (especially for such small problems; the last one had fewer than 14k grid points per core), we still scale quite well to several nodes and a few hundred cores.

To help you with your scaling problem, I would need some more information.

  • What type of cluster are you using?
  • Which code are you running?
  • Which compiler options are you using in