
Markus Mohrhard

Forum Replies Created

Viewing 5 posts - 16 through 20 (of 20 total)
    in reply to: Slip-free and pressure corner boundary condition #2776
    Markus Mohrhard
    Participant

    Hello Juliaan,

    Quote from juliaan on January 24, 2018, 13:14:
    Hi Albert,

    Thanks again.
    Except for a small difference in the line “int mult = 2 / (d…” I didn’t find anything wrong. I actually think the problem is not with the slip-free BC, but potentially with the pressure outlet.

    Because I want to run high Reynolds number cases I have opted for a pressure driven + periodic domain approach and I have implemented a fringe region to set the inflow.

    However, I have two questions:

    1. When I try to run my simulations with MPI, they fail on the following line of code (which I use to initialize the fringe region). For a simulation on 2 processors, the code fails when iCloc = 1.

    Code:
    for (int iCloc = 0; iCloc < noOfCuboids; iCloc++) {
        BlockGeometryStructure2D<T>& tmp = superGeometry.getBlockGeometry(iCloc);
        dom_origin = tmp.getOrigin();
    }

    Something goes wrong when I ask for the origin. Any suggestion what could be wrong here?

    You are not allowed to simply access all the data when you use MPI. In MPI mode the data is distributed across multiple processes, so you need to use a pattern like:

    Code:
    for (int iC = 0; iC < this->_loadBalancer.size(); ++iC) {
        BlockGeometryStructure2D<T>& tmp = superGeometry.getBlockGeometry(iC);

    }

    If each MPI process executes this code, every BlockLattice is processed by the MPI process that actually owns it.
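
    For reference, here is a rough sketch of how the same pattern could look from user code. It assumes the super geometry exposes its load balancer via getLoadBalancer() and that the load balancer maps local to global cuboid indices via glob(), as in the OpenLB 1.1 super structures; adapt the names to your version.

    Code:
    // Sketch: iterate only over the cuboids assigned to this MPI process.
    auto& loadBalancer = superGeometry.getLoadBalancer();
    for (int iC = 0; iC < loadBalancer.size(); ++iC) {
        // iC is a local index, so the block returned here lives in this process's memory.
        BlockGeometryStructure2D<T>& block = superGeometry.getBlockGeometry(iC);
        auto origin = block.getOrigin();       // safe: no access to remote data
        int globIC  = loadBalancer.glob(iC);   // global cuboid number, if you need it
        // ... initialize the fringe region for this block only ...
    }

    Every process then handles exactly the cuboids it owns, and asking for the origin no longer touches data that lives on another process.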

    in reply to: Speedup in multi-calculation-nodes #2763
    Markus Mohrhard
    Participant

    Hey,

    the absolute performance already seems far too low. A value of 0.00 MLUPs points to some other problem.

    Additionally, cavity2d is known to scale quite well even for small grid sizes, and for larger grid sizes even strong scaling should be fairly good. One possible problem is the interconnect between your nodes: we see serious scaling problems when we switch from our InfiniBand network to our normal Ethernet network.

    In general I would start by investigating why the parallel version of cavity2d or cavity3d does not scale on your hardware, since these examples are known to scale quite well; see the sketch below for a quick way to check.
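
    A quick scaling check could look roughly like this (a sketch only: it assumes the example lives in examples/cavity2d, that the binary is called cavity2d, that MPI mode is enabled in Makefile.inc, and that your launcher is mpirun; the host-file flag differs between MPI implementations):

    Code:
    cd examples/cavity2d
    make clean cleanbuild && make
    # single node, increasing number of MPI processes
    mpirun -np 8 ./cavity2d
    mpirun -np 16 ./cavity2d
    # two nodes, e.g. via a host file (flag name depends on your MPI implementation)
    mpirun -np 32 --hostfile hosts ./cavity2d

    Comparing the single-node and multi-node MLUPs numbers should show whether the interconnect is the limiting factor.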

    Regards,
    Markus

    in reply to: Speedup in multi-calculation-nodes #2761
    Markus Mohrhard
    Participant

    Finally, a quick run with an adapted cylinder2d example has finished. I only changed the value of N to 8 in examples/cylinder2d/cylinder2d.cpp and left the rest in the normal OpenLB 1.1 state.

    On the HPC cluster I ran the job with 1 node/8 cores, 1 node/16 cores and 2 nodes/16 cores each. The following performance results were obtained (MLUPps = MLUPs per process):

    1n/8c : 131.4 MLUPs => 16.4 MLUPps
    1n/16c: 240.7 MLUPs => 15.0 MLUPps
    2n/32c: 450.4 MLUPs => 14.1 MLUPps

    Another result that I still have is from a test run with N = 20 on 16 nodes, each using 16 cores (256 cores in total):

    16n/256c: 2018.4 MLUPs => 7.9 MLUPps

    This shows that while we don’t have perfect scaling (especially for such small problems; the last run had fewer than 14k grid points per core), we still scale quite well to several nodes and a few hundred cores.

    To help you with your scaling problem I would need some more info.

    • What type of cluster are you using?
    • Which code are you running?
    • Which compiler options are you using in Makefile.inc?

    Regards,
    Markus

    in reply to: Speedup in multi-calculation-nodes #2760
    Markus Mohrhard
    Participant

    Hey steed188,

    are you using OpenMP or MPI for your simulations?

    In general, our current OpenMP code is not very efficient, and we recommend using MPI with the current releases (we are working on an improved hybrid OpenMP + MPI mode).

    In general we scale quite well (at least for weak scaling), but as soon as you move from one node to two nodes there is a communication overhead, since the data exchange can no longer be implemented through shared-memory copy operations.

    However, I think the performance should remain fairly stable for most cases. I will try to post some numbers from our own HPC system soon.

    in reply to: MPI run — pvd file fault #2681
    Markus Mohrhard
    Participant
    Quote from Kai on July 29, 2017, 11:01:
    With mpich-3.1.3, openLB 1.1r0 has the same problem when running in parallel.

    “Error parsing XML in stream at line 25, column 7, byte index 4045797: not well-formed(invalid token)” …

    The example is multiComponent2d.

    Hi Kai,

    can you paste your Makefile.inc and make sure that you used an MPI frontend compiler, e.g. mpic++ or mpiCC, and that the MPI mode is enabled?

    In Makefile.inc the following two lines are important:

    CXX := mpic++
    PARALLEL_MODE := MPI
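
    For illustration, the relevant part of a Makefile.inc could then look roughly like this (a sketch only; the exact set of commented-out alternatives differs between releases):

    Code:
    #CXX            := g++
    CXX             := mpic++

    #PARALLEL_MODE  := OFF
    PARALLEL_MODE   := MPI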

    After making sure that these are the only CXX and PARALLEL_MODE lines that are not prefixed with a #, go to examples/multiComponent2d and call:

    make clean cleanbuild && make && mpirun -np 4 ./rayleighTaylor2d

    This should make sure that the code is built with MPI support and that it is run in parallel through MPI. If you still see the problem, please paste your Makefile.inc so that we can continue searching for the cause.

    Regards,
    Markus
