Markus Mohrhard
Forum Replies Created
Markus Mohrhard (Participant)
Hello Juliaan,
Quote from juliaan on January 24, 2018, 13:14:
Hi Albert,
Thanks again.
Except for a small difference on the line “int mult = 2 / (d…” I didn’t find anything wrong. I actually think the problem is not with the slip-free boundary condition, but potentially with the pressure outlet.
Because I want to run high Reynolds number cases, I have opted for a pressure-driven, periodic domain approach and have implemented a fringe region to set the inflow.
However, I have two questions:
1. When I try to run my simulations with MPI, they fail on the following line of code (which I use to initialize the fringe region). For a simulation on 2 processors, the code fails when iCloc = 1.
Code:
for (int iCloc = 0; iCloc < noOfCuboids; iCloc++) {
    BlockGeometryStructure2D<T>& tmp = superGeometry.getBlockGeometry(iCloc);
    dom_origin = tmp.getOrigin();
    …
}

Something goes wrong when I ask for the origin. Any suggestion what could be wrong here?
You are not allowed to simply access all the data when you use MPI. In MPI mode your data is distributed across multiple processes, so you need to use a pattern like:
Code:
for (int iC = 0; iC < this->_loadBalancer.size(); ++iC) {
    BlockGeometryStructure2D<T>& tmp = superGeometry.getBlockGeometry(iC);
    …
}

If each MPI process executes this loop, every BlockLattice is processed by exactly the process that owns it.
Markus Mohrhard (Participant)
Hey,
the absolute performance already looks far too low: a performance of 0.00 MLUPs points to some other problem.
Additionally, cavity2d is known to scale quite well even for small grid sizes, and for larger grid sizes even strong scaling should be fairly good. One possible problem on your side is the connection between the nodes; we see serious scaling problems when we switch from our InfiniBand network to our normal Ethernet network.
In general I would start by inspecting why the parallel version of cavity2d or cavity3d does not scale on your hardware, since these examples are known to scale quite well.
Regards,
Markus

Markus Mohrhard (Participant)
And finally, a quick run with an adapted cylinder2d example has finished. I only changed the value of N to 8 in examples/cylinder2d/cylinder2d.cpp and left the rest in the normal OpenLB 1.1 state.
On the HPC cluster I ran the job with 1 node/8 cores, 1 node/16 cores and 2 nodes/16 cores each. The following performance results were obtained:
1n/8c : 131.4 MLUPs => 16.4 MLUPps
1n/16c: 240.7 MLUPs => 15.0 MLUPps
2n/32c: 450.4 MLUPs => 14.1 MLUPps

Another result that I still have from a test run with N = 20 and 16 nodes, each using 16 cores (256 cores total):

16n/256c: 2018.4 MLUPs => 7.9 MLUPps
This shows that while we don’t have perfect scaling (especially for such small problems; in the last run there were fewer than 14k grid points per core), we still scale quite well to several nodes and a few hundred cores.
To help you with your scaling problem I would need some more info.
- What type of cluster are you using?
- Which code are you running?
- Which compiler options are you using in Makefile.inc?
Regards,
Markus

Markus Mohrhard (Participant)
Hey steed188,
are you using OpenMP or MPI for your simulations?
In general our current OpenMP code is not very efficient, and we recommend using MPI for the current releases (we are working on an improved hybrid OpenMP + MPI mode).
In general we scale quite well (at least in terms of weak scaling), but as soon as you move from one node to two nodes you incur communication overhead, since the exchange can no longer be implemented through shared-memory copy operations.
However, I think the performance should be reasonably stable for most cases. I will try to post some numbers from our own HPC system soon.
Markus Mohrhard (Participant)
Quote from Kai on July 29, 2017, 11:01:
With mpich-3.1.3, OpenLB 1.1r0 has the same problem when running in parallel: “Error parsing XML in stream at line 25, column 7, byte index 4045797: not well-formed (invalid token)” …
The example is multiComponent2d.
Hi Kai,
can you paste your Makefile.inc and make sure that you used an MPI frontend compiler, e.g. mpic++ or mpiCC, and enabled the MPI mode?
In Makefile.inc the following two lines are important:
CXX := mpic++
PARALLEL_MODE := MPI

After making sure that these are the only CXX and PARALLEL_MODE lines not prefixed with a #, go to examples/multiComponent2d and call:
make clean cleanbuild && make && mpirun -np 4 ./rayleighTaylor2d
This makes sure that the code is built with MPI support and run in parallel through MPI. If you still see the problem, please paste your Makefile.inc so that we can continue searching for the problem.
Regards,
Markus