
Adrian


Forum Replies Created

Viewing 15 posts - 16 through 30 (of 539 total)
  • in reply to: Grid refinement #10920
    Adrian
    Keymaster

    Yes, HLBM-based FSI interacts well with refinement. For example, I have a working setup where the grid coarsens away from a wall-modeled rotor.

    However, there will be issues when the FSI element moves across refinement regions: this is not implemented yet, and refinement in general is still a very manual process in OpenLB. Can you describe your setup in more detail? I assume you converted the static sphere into some moving FSI element?

    in reply to: Simulation Volume with large Voxels count #10919
    Adrian
    Keymaster

    Ok, weird. Does the simulation proceed to the first VTK output, and did you modify anything else? Did you check the initial conditions? Or does the OOM error happen during initialization? (This would make sense if you have very low RAM per core for some reason, e.g. due to SLURM defaults.)
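    In case the SLURM memory default is the culprit, a minimal sketch of how to request memory explicitly per core (partition name and values are illustrative, not from this thread):

    ```shell
    #!/bin/sh
    # Hypothetical sbatch header: request memory per core explicitly so the
    # scheduler does not fall back to a small per-CPU default.
    #SBATCH --ntasks=32
    #SBATCH --mem-per-cpu=4G
    ```

    `scontrol show config | grep DefMemPerCPU` shows the cluster's default, which is often far below the physical RAM per core.
    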

    in reply to: Multi-GPU with grid refinement in OpenLB 1.8-1 #10912
    Adrian
    Keymaster

    > I’m running multi-GPU tests in OpenLB 1.8-1 and wanted to confirm whether grid refinement is officially supported on multiple GPUs, and to ask for help with a crash I’m seeing.

    Yes, you can use multi-GPU for the grid refinement (there was a bug in the setter in the release that has since been fixed); I just re-confirmed that the sphere case works on multi-GPU with the current head of the public repository.

    > Both fail on two H200 GPUs at the coupler construction:

    Just a side note: using two H200s is overkill for these cases unless you scale to many hundreds of millions of cells (of course, for testing, using two is useful).

    > If yes, are there known limitations or configuration requirements for refinement::lagrava::makeCoarseToFineCoupler on multi-GPU?

    > Any guidance on fixes or patches would be much appreciated.

    I am using the Rohde scheme for large applications on multiple GPUs internally, without problems in the coupling itself. The main challenge right now is that the block decomposition and the mesh in general need to be set up manually to fit the refinement, i.e. you cannot easily scale up or adapt the mesh without manual intervention. The API for this will be improved in future releases.

    > [GPU_CUDA:0] Found 2 CUDA devices but only one can be used per MPI process.

    It seems like the mpirun command is not quite right: CUDA_VISIBLE_DEVICES is not adjusted, so each process sees both GPUs and selects the first one. The example configs include an example SLURM script with an example command for multi-GPU use.
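    A minimal sketch of a per-rank wrapper that maps each local MPI rank to its own GPU (the script name is made up; `OMPI_COMM_WORLD_LOCAL_RANK` is set by Open MPI, Slurm's srun sets `SLURM_LOCALID` instead):

    ```shell
    #!/bin/sh
    # bind-gpu.sh: give each MPI rank exclusive visibility of one GPU
    # before launching the actual application.
    export CUDA_VISIBLE_DEVICES="${OMPI_COMM_WORLD_LOCAL_RANK:-0}"
    exec "$@"
    ```

    Usage would then be along the lines of `mpirun -np 2 ./bind-gpu.sh ./yourCase`, so rank 0 sees only device 0 and rank 1 only device 1.
    
    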

    in reply to: Simulation Volume with large Voxels count #10904
    Adrian
    Keymaster

    Can you describe your setup and execution mode in more detail?

    100e6 cells is “nothing”; this is usually easy to compute on a local machine.

    What is especially unexpected is that you get an OOM error only after some simulation time; this means that additional data is being allocated during the run, which is not standard behavior.

    in reply to: Grid refinement #10891
    Adrian
    Keymaster

    Can you provide a sketch of your refinement domain?

    In general, BCs crossing the refinement frontier can be an issue. The BCs between the adjacent resolution levels need to be consistent, and the coupling needs to either take them into account or be disabled on the actual BC cells (which is the common approach).

    We have also been pushing updates to the public Git repository more frequently lately. Among other things, there is now an implementation of the cell-centered approach by Rohde that you might want to check out.

    in reply to: Multi-GPU MPI library is not CUDA-aware #10769
    Adrian
    Keymaster

    Ok, weird. I just re-tested the release on my dual GPU system and the example works as it should.

    One other possibility is that the nvcc selected in your environment is a different one than in your /usr/local/cuda. You could also try the “mixed” mode (see the example configs in config/) to use your mpic++ directly together with nvcc.
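    As a rough sketch of what such a mixed-mode build config could look like (variable names are from my recollection of the shipped example configs; check the files in config/ for the authoritative version, and adjust CUDA_ARCH to your GPU):

    ```make
    # Sketch of a mixed-mode config.mk: host code via the system MPI wrapper,
    # device code via nvcc (values illustrative, verify against config/).
    CXX             := mpic++
    CXXFLAGS        := -O3 -Wall -std=c++20
    PARALLEL_MODE   := MPI
    PLATFORMS       := CPU_SISD GPU_CUDA
    CUDA_CXX        := nvcc
    CUDA_CXXFLAGS   := -O3 -std=c++20
    CUDA_ARCH       := 90   # e.g. Hopper-class GPUs
    ```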

    Did you test any other CUDA-aware MPI apps in the same environment?

    in reply to: Multi-GPU MPI library is not CUDA-aware #10767
    Adrian
    Keymaster

    The command outputs (thanks!) all look fine, so this may just be the automated check failing even though CUDA-awareness is available. The logic we use to check this

    
    #ifdef PARALLEL_MODE_MPI
    #if defined(MPIX_CUDA_AWARE_SUPPORT) && MPIX_CUDA_AWARE_SUPPORT
      if (!MPIX_Query_cuda_support()) {
        clout << "The used MPI Library is not CUDA-aware. Multi-GPU execution will fail." << std::endl;
      }
    #endif
    #if defined(MPIX_CUDA_AWARE_SUPPORT) && !MPIX_CUDA_AWARE_SUPPORT
      clout << "The used MPI Library is not CUDA-aware. Multi-GPU execution will fail." << std::endl;
    #endif
    #if !defined(MPIX_CUDA_AWARE_SUPPORT)
      clout << "Unable to check for CUDA-aware MPI support. Multi-GPU execution may fail." << std::endl;
    #endif
    #endif // PARALLEL_MODE_MPI
    

    can definitely have gaps. Does the program proceed as usual in multi-GPU mode after this warning? (If CUDA-awareness were indeed broken despite the command output, I would expect an immediate segfault on the first communication.)
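    For Open MPI specifically, you can also query the build-time flag directly, independently of the compile-time check above:

    ```shell
    # Reports "mca:mpi:base:param:mpi_built_with_cuda_support:value:true"
    # if this Open MPI build was compiled with CUDA support.
    ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
    ```

    Note that this only confirms the library was built with CUDA support; whether it is actually used at runtime can still depend on the transport in use.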

    in reply to: scheduleBackgroundOutputVTK for 2D case #10764
    Adrian
    Keymaster

    Yes, currently this only works for 3D cases (which is what we do most of the time). For historical reasons the VTK writer is implemented separately for 3D and 2D, which is why this has to be adapted separately (preferably by de-dimensionalizing the VTK writer, which is also why it has not been done yet).

    in reply to: Compile openLB 1.8 in Cluster #10760
    Adrian
    Keymaster

    What is the problem?

    in reply to: FSI on a rigid 3D flap #10756
    Adrian
    Keymaster

    Rigid valve definitely works on GPU on my system – can you post the error?

    Deformable particles don’t work in the public version on GPUs but will be ported in the next release.

    In terms of FSI, there is also the GPU-enabled centrifugal pump case.

    in reply to: FSI on a rigid 3D flap #10754
    Adrian
    Keymaster

    Currently the idea is that OpenLB provides the framework for efficiently embedding structures into porosities and obtaining their boundary integrals, while the user plugs in their own structure model. For a rotating flap with resisting torque, I would model the flap as a point mass with orientation and inertia. You can start with the rigid valve 2D case and add such a torque there to test (roughly one line of code).
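    A minimal standalone sketch of such a structure model, not using the OpenLB API: the flap is a single rotational degree of freedom, `fluidTorque` stands in for the boundary integral OpenLB would provide, and the resisting torque is modeled here as a linear spring-damper (all names and coefficients are illustrative):

    ```cpp
    #include <cstdio>

    // Rigid flap with one rotational degree of freedom.
    struct Flap {
      double theta   = 0.0; // orientation [rad]
      double omega   = 0.0; // angular velocity [rad/s]
      double inertia = 1.0; // moment of inertia [kg m^2]
    };

    // One explicit Euler step: net torque = fluid torque - spring - damping.
    void step(Flap& f, double fluidTorque, double k, double c, double dt) {
      const double netTorque = fluidTorque - k * f.theta - c * f.omega;
      f.omega += dt * netTorque / f.inertia;
      f.theta += dt * f.omega;
    }

    int main() {
      Flap flap;
      // Constant fluid torque T = 1 N m against a spring k = 10 N m/rad
      // with damping c = 2 N m s/rad, integrated for 10 s at dt = 1 ms.
      for (int i = 0; i < 10000; ++i) {
        step(flap, 1.0, 10.0, 2.0, 1e-3);
      }
      // The flap relaxes toward the static equilibrium theta = T/k = 0.1 rad.
      std::printf("theta = %.3f rad\n", flap.theta);
      return 0;
    }
    ```

    In an actual case you would replace the constant `fluidTorque` with the torque integrated from the flap surface each time step and feed the updated orientation back into the porosity field.
    
    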

    in reply to: Parallelization usig MPI #10716
    Adrian
    Keymaster

    In principle you can just stick to using g++ as the backend compiler for mpic++. Of course, on Intel CPUs you get quite nice speedups using Intel’s C++ compilers. The new error indicates that the ICC version is too old and/or pulls in an old standard library; specifically, it doesn’t support C++20.

    Which cluster are you using? We also have some example configs in the config/ directory.

    in reply to: Parallelization usig MPI #10713
    Adrian
    Keymaster

    The usual source of this behavior is that MPI was not enabled in the config. As that is done in your case, the next best explanation is that you did not recompile. Did you execute a full rebuild after the config change?
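    Since changed build flags don't invalidate already-compiled objects, the safe sequence after any config change is a full rebuild:

    ```shell
    # Force a complete rebuild so the new MPI flags reach every
    # translation unit, not just the files touched since the change.
    make clean
    make
    ```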

    in reply to: Using MRT with GPU #10684
    Adrian
    Keymaster

    Sorry for the late reply – the issue was that the MRT collision implementation was missing the any_platform macro that is needed to trigger device code generation (and unfortunately nvcc silently produces garbage instead of failing). I pushed an update to the release repository. In my tests aorta3d now works with MRT on GPUs.

    in reply to: Domain Partition Scheme #10461
    Adrian
    Keymaster

    Yes, there are also different schemes already implemented (e.g. by-value and by-weight). It is also possible to load decompositions from XML files (see e.g. turbulence/nozzle3d) and of course to implement your own scheme.

    The implemented schemes will already give you a non-uniform partition if you e.g. adjust the weights. You can find more details in the user guide and our preprint.
