Re: Speedup in multi-calculation-nodes
A quick run with an adapted cylinder2d example has finally finished. I only changed the value of N to 8 in examples/cylinder2d/cylinder2d.cpp and left everything else in the default OpenLB 1.1 state.
On the HPC cluster I ran the job with 1 node/8 cores, 1 node/16 cores and 2 nodes with 16 cores each (32 cores in total). I obtained the following performance results:
1n/8c : 131.4 MLUPs => 16.4 MLUPps
1n/16c: 240.7 MLUPs => 15.0 MLUPps
2n/32c: 450.4 MLUPs => 14.1 MLUPps
Another result that I still have, from a test run with N = 20 on 16 nodes, each using 16 cores (256 cores in total):
16n/256c: 2018.4 MLUPs => 7.9 MLUPps
This shows that while we don't have perfect scaling (especially for such small problems; the last one had fewer than 14k grid points per core), we still scale quite well to several nodes and a few hundred cores.
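If you want to compare your own numbers in the same way, here is a small sketch of the arithmetic behind the figures above. It is not part of OpenLB; the function names are made up for illustration. It divides the total MLUPs by the core count to get the per-core value, and then expresses each N = 8 run as a fraction of the 8-core run's per-core throughput. The N = 20 run is kept separate because it uses a different problem size, so it should not be compared against the N = 8 baseline.

```python
# Measured totals copied from the post: (label, cores, total MLUPs).
RUNS_N8 = [
    ("1n/8c", 8, 131.4),
    ("1n/16c", 16, 240.7),
    ("2n/32c", 32, 450.4),
]
# Different problem size (N = 20), so it is only reported per core.
RUN_N20 = ("16n/256c", 256, 2018.4)


def per_core(total_mlups, cores):
    """Per-core throughput: total MLUPs divided by the number of cores."""
    return total_mlups / cores


def efficiency(run, baseline):
    """Per-core throughput of `run` relative to that of `baseline`."""
    _, cores, total = run
    _, base_cores, base_total = baseline
    return per_core(total, cores) / per_core(base_total, base_cores)


if __name__ == "__main__":
    baseline = RUNS_N8[0]
    for run in RUNS_N8:
        label, cores, total = run
        print(f"{label:9s} {per_core(total, cores):5.1f} MLUPps, "
              f"efficiency vs. 8 cores: {efficiency(run, baseline):.2f}")
    label, cores, total = RUN_N20
    print(f"{label:9s} {per_core(total, cores):5.1f} MLUPps (N = 20)")
```

Plugging your own measurements into the lists should make it easier to see where the per-core throughput starts to drop.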
To help you with your scaling problem I would need some more info.
- What type of cluster are you using?
- Which code are you running?
- Which compiler options are you using in Makefile.inc?