
CUDA MPI usage in two GeForce RTX 2080 Ti GPUs


    #6938
    jflorezgi
    Participant

    Hi,
    I have a program that runs correctly on a single GPU using gpu_only.mk as config.mk. Now I want to run the same program on two GeForce RTX 2080 Ti graphics cards, so I replaced config.mk with gpu_openmpi.mk and started the simulation with
    mpirun -np 2 bash -c 'export CUDA_VISIBLE_DEVICES=${OMPI_COMM_WORLD_LOCAL_RANK}; ./GPU-ESVcase01'
    but it fails with the error shown below. If I run the same command with only one GPU,
    mpirun -np 1 bash -c 'export CUDA_VISIBLE_DEVICES=${OMPI_COMM_WORLD_LOCAL_RANK}; ./GPU-ESVcase01'
    the program runs without problems. I appreciate any help you can give me.
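    For reference, the same per-rank GPU selection can also be wrapped in a small launcher script instead of the inline bash -c. This is only a minimal sketch of what I am doing; the script name select_gpu.sh is just an example:

    #!/bin/bash
    # select_gpu.sh (example name): bind each local MPI rank to one GPU, then run the given program
    export CUDA_VISIBLE_DEVICES=${OMPI_COMM_WORLD_LOCAL_RANK}
    exec "$@"

    Used as: chmod +x select_gpu.sh && mpirun -np 2 ./select_gpu.sh ./GPU-ESVcase01

    The error output is: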

    [prepareLattice] Prepare Lattice … OK
    [medusa16:398642] Read -1, expected 4561900, errno = 14
    [medusa16:398642] *** Process received signal ***
    [medusa16:398642] Signal: Segmentation fault (11)
    [medusa16:398642] Signal code: Invalid permissions (2)
    [medusa16:398642] Failing at address: 0x7f89fa000000
    [medusa16:398643] Read -1, expected 4561900, errno = 14
    [medusa16:398643] *** Process received signal ***
    [medusa16:398643] Signal: Segmentation fault (11)
    [medusa16:398643] Signal code: Invalid permissions (2)
    [medusa16:398643] Failing at address: 0x7fb2ac000000
    [medusa16:398642] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x43090)[0x7f8a8a744090]
    [medusa16:398642] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x18b733)[0x7f8a8a88c733]
    [medusa16:398642] [ 2] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(+0x31c4)[0x7f8a889d61c4]
    [medusa16:398642] [ 3] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_send_request_schedule_once+0x1c6)[0x7f8a889fc926]
    [medusa16:398642] [ 4] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_ack+0x1a9)[0x7f8a889f5429]
    [medusa16:398642] [ 5] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x95)[0x7f8a889d7ed5]
    [medusa16:398643] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x43090)[0x7fb340836090]
    [medusa16:398643] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x18b733)[0x7fb34097e733]
    [medusa16:398643] [ 2] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(+0x31c4)[0x7fb33eac81c4]
    [medusa16:398643] [ 3] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_send_request_schedule_once+0x1c6)[0x7fb33eaee926]
    [medusa16:398643] [ 4] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_ack+0x1a9)[0x7fb33eae7429]
    [medusa16:398643] [ 5] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x95)[0x7fb33eac9ed5]
    [medusa16:398643] [ 6] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(+0x53a3)[0x7fb33eaca3a3]
    [medusa16:398643] [ 7] /lib/x86_64-linux-gnu/libopen-pal.so.40(opal_progress+0x34)[0x7fb3406b3854]
    [medusa16:398643] [ 8] /lib/x86_64-linux-gnu/libmpi.so.40(ompi_request_default_test+0x31)[0x7fb3428671b1]
    [medusa16:398643] [ 9] /lib/x86_64-linux-gnu/libmpi.so.40(PMPI_Test+0x52)[0x7fb3428a58d2]
    [medusa16:398643] [10] ./GPU-ESVcase01(+0x187ba9)[0x56487ededba9]
    [medusa16:398643] [11] ./GPU-ESVcase01(+0xdad5a)[0x56487ed40d5a]
    [medusa16:398643] [12] ./GPU-ESVcase01(+0xe2c42)[0x56487ed48c42]
    [medusa16:398643] [13] ./GPU-ESVcase01(+0x69eba)[0x56487eccfeba]
    [medusa16:398643] [14] ./GPU-ESVcase01(+0x471e8)[0x56487ecad1e8]
    [medusa16:398643] [15] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7fb340817083]
    [medusa16:398643] [16] ./GPU-ESVcase01(+0x4736e)[0x56487ecad36e]
    [medusa16:398643] *** End of error message ***
    [medusa16:398642] [ 6] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(+0x53a3)[0x7f8a889d83a3]
    [medusa16:398642] [ 7] /lib/x86_64-linux-gnu/libopen-pal.so.40(opal_progress+0x34)[0x7f8a8a5c1854]
    [medusa16:398642] [ 8] /lib/x86_64-linux-gnu/libmpi.so.40(ompi_request_default_test+0x31)[0x7f8a8c7751b1]
    [medusa16:398642] [ 9] /lib/x86_64-linux-gnu/libmpi.so.40(PMPI_Test+0x52)[0x7f8a8c7b38d2]
    [medusa16:398642] [10] ./GPU-ESVcase01(+0x187ba9)[0x55a902bd4ba9]
    [medusa16:398642] [11] ./GPU-ESVcase01(+0xdad5a)[0x55a902b27d5a]
    [medusa16:398642] [12] ./GPU-ESVcase01(+0xe2c42)[0x55a902b2fc42]
    [medusa16:398642] [13] ./GPU-ESVcase01(+0x69eba)[0x55a902ab6eba]
    [medusa16:398642] [14] ./GPU-ESVcase01(+0x471e8)[0x55a902a941e8]
    [medusa16:398642] [15] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f8a8a725083]
    [medusa16:398642] [16] ./GPU-ESVcase01(+0x4736e)[0x55a902a9436e]
    [medusa16:398642] *** End of error message ***
    bash: line 1: 398642 Segmentation fault (core dumped) ./GPU-ESVcase01
    --------------------------------------------------------------------------
    Primary job terminated normally, but 1 process returned
    a non-zero exit code. Per user-direction, the job has been aborted.
    --------------------------------------------------------------------------
    bash: line 1: 398643 Segmentation fault (core dumped) ./GPU-ESVcase01
    --------------------------------------------------------------------------
    mpirun detected that one or more processes exited with non-zero status, thus causing
    the job to be terminated. The first process to do so was:

    Process name: [[29179,1],0]
    Exit code: 139
    --------------------------------------------------------------------------

    #6939
    Adrian
    Keymaster

    Your OpenMPI build likely wasn't compiled with CUDA support. CUDA-aware MPI is required for multi-GPU simulations in release 1.5. You can check whether it is available using e.g. ompi_info --parsable --all | grep mpi_built_with_cuda_support:value, which should return:

    mca:mpi:base:param:mpi_built_with_cuda_support:value:true

    If you run this on a cluster, there is likely a suitable module already available; otherwise you'll have to check how a CUDA-aware build can be installed on your particular distribution (I'll still be happy to help further). If no package or build option (such as the declarative Nix shell environment included in the release) is available on your system, you'll have to compile OpenMPI or some other CUDA-aware MPI library manually. One additional option is Nvidia's HPC SDK, which includes a CUDA-aware build of OpenMPI (this is the environment I commonly use on our cluster).
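    If you do end up compiling OpenMPI yourself, the relevant switch is the --with-cuda configure option. A minimal sketch from an unpacked OpenMPI source tree; the install prefix and CUDA path are placeholders you'd adapt to your system:

    # configure a CUDA-aware OpenMPI build (prefix and CUDA path are examples)
    ./configure --prefix=$HOME/opt/openmpi-cuda --with-cuda=/usr/local/cuda
    make -j$(nproc)
    make install
    # put $HOME/opt/openmpi-cuda/bin first in PATH, rebuild the case, and re-run the ompi_info check above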

    Sorry for the unhelpful error message; this will be improved in 1.6. The latest release was only the first step in OpenLB GPU support.
