Issues to run code examples with Nvidia A100 GPU

    #6798
    jflorezgi
    Participant

    Hi everyone,

    I have run several of the example codes and my own applications on different Nvidia cards without problems, using the config.mk settings suggested in the config files. I now have access to two Nvidia A100 cards (Ampere architecture) and want to start running my applications on them.
    According to rules.mk there are three versions for this architecture (80, 86 and 87), and I am not sure which one matches my card. I tried all three, following the steps indicated in gpu_only.mk (make clean, make, etc.). Version 80 is the only one that starts to compile, but in all three cases the build stops with an error. The errors are shown below:

    Version 86, 87:
    make -C ../../../external
    make[1]: Entering directory '/home/jflorez/proyecto/olb-1.5r0/external'
    make -C zlib
    make[2]: Entering directory '/home/jflorez/proyecto/olb-1.5r0/external/zlib'
    make[2]: Nothing to be done for 'all'.
    make[2]: Leaving directory '/home/jflorez/proyecto/olb-1.5r0/external/zlib'
    cp zlib/build/libz.a lib/
    make -C tinyxml
    make[2]: Entering directory '/home/jflorez/proyecto/olb-1.5r0/external/tinyxml'
    make[2]: Nothing to be done for 'all'.
    make[2]: Leaving directory '/home/jflorez/proyecto/olb-1.5r0/external/tinyxml'
    cp tinyxml/build/libtinyxml.a lib/
    make[1]: Leaving directory '/home/jflorez/proyecto/olb-1.5r0/external'
    nvcc -O3 -std=c++17 --generate-code=arch=compute_86,code=[compute_86,sm_86] --extended-lambda --expt-relaxed-constexpr -x cu -Xcudafe "--diag_suppress=implicit_return_from_non_void_function --display_error_number --diag_suppress=20014 --diag_suppress=20011" -DPLATFORM_CPU_SISD -DPLATFORM_GPU_CUDA -I../../../src -I../../../external/zlib -I../../../external/tinyxml -c -o cavity3d.o cavity3d.cpp
    nvcc fatal : Unsupported gpu architecture 'compute_86'
    make: *** [../../../default.mk:31: cavity3d.o] Error 1.

    Version 80:
    make -C ../../../external
    make[1]: Entering directory '/home/jflorez/proyecto/olb-1.5r0/external'
    make -C zlib
    make[2]: Entering directory '/home/jflorez/proyecto/olb-1.5r0/external/zlib'
    make[2]: Nothing to be done for 'all'.
    make[2]: Leaving directory '/home/jflorez/proyecto/olb-1.5r0/external/zlib'
    cp zlib/build/libz.a lib/
    make -C tinyxml
    make[2]: Entering directory '/home/jflorez/proyecto/olb-1.5r0/external/tinyxml'
    make[2]: Nothing to be done for 'all'.
    make[2]: Leaving directory '/home/jflorez/proyecto/olb-1.5r0/external/tinyxml'
    cp tinyxml/build/libtinyxml.a lib/
    make[1]: Leaving directory '/home/jflorez/proyecto/olb-1.5r0/external'
    nvcc -O3 -std=c++17 --generate-code=arch=compute_80,code=[compute_80,sm_80] --extended-lambda --expt-relaxed-constexpr -x cu -Xcudafe "--diag_suppress=implicit_return_from_non_void_function --display_error_number --diag_suppress=20014 --diag_suppress=20011" -DPLATFORM_CPU_SISD -DPLATFORM_GPU_CUDA -I../../../src -I../../../external/zlib -I../../../external/tinyxml -c -o cavity3d.o cavity3d.cpp
    Command-line error #614: invalid error number in diagnostic control option: 20014

    1 catastrophic error detected in this compilation.
    Compilation terminated.
    make: *** [../../../default.mk:31: cavity3d.o] Error 1.

    The eventual goal is to run the programs on both cards using the gpu_openmpi.mk file. I would appreciate any help you can give me.

    #6799
    jflorezgi
    Participant

    After checking further, I found that I need to update some packages, so for now I cannot be sure that the compilation problems are related to the graphics card. Sorry about that. I will write again once I have updated, if the problem persists.

    #6915
    Adrian
    Keymaster

    In case this is still an issue: for our A100s we use CUDA_ARCH := 80. The value 86 is used for, e.g., RTX 30* series GPUs and is only available starting with CUDA 11.1, which indicates that the errors in your log were caused by an older CUDA version. For CUDA MPI usage, I have had the best experience with the Nvidia HPC SDK, which bundles CUDA and a matching MPI library.
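
    One way to check this before editing config.mk is to ask the installed nvcc which targets it knows. The following is a minimal sketch, not part of OpenLB: it assumes an nvcc recent enough to provide the --list-gpu-arch option (older releases may lack it), and the helper name arch_in_list is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if compute_<arch> occurs in the list read from stdin.
arch_in_list() {
  grep -qw "compute_$1"
}

# Only probe when nvcc is actually on the PATH.
if command -v nvcc >/dev/null 2>&1; then
  # Assumption: this nvcc supports --list-gpu-arch; if not, the list stays empty.
  supported=$(nvcc --list-gpu-arch 2>/dev/null || true)
  for arch in 80 86; do
    if printf '%s\n' "$supported" | arch_in_list "$arch"; then
      echo "compute_${arch}: supported by $(command -v nvcc)"
    else
      echo "compute_${arch}: not reported by this nvcc"
    fi
  done
fi
```

    If compute_80 is reported, CUDA_ARCH := 80 as quoted above is the setting to try for an A100. A missing compute_86 would be consistent with a CUDA release older than 11.1.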

    #8169
    thanhphatvt
    Participant

    Hi Adrian,
    I get the same error, but I am using a K80 card. Could you help me solve it?
    Command-line error #614: invalid error number in diagnostic control option: 20014
    1 catastrophic error detected in this compilation.
    Thank you so much!

    #8171
    Adrian
    Keymaster

    The issue here is likely that the CUDA release is too old (the diagnostic control option should be independent of any architecture setting). To confirm this, you can remove the option in line 78 of rules.mk.
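
    The suspicion can also be tested in isolation, without touching rules.mk. The sketch below is an illustration under assumptions, not OpenLB code: strip_numeric_diag_suppress is a hypothetical helper, the flag string is copied from the build log earlier in this thread, and the probe only runs if nvcc is on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop numeric --diag_suppress=<N> entries from a flag
# string read on stdin, leaving named entries untouched.
strip_numeric_diag_suppress() {
  sed -E 's/ ?--diag_suppress=[0-9]+//g'
}

# Flag string as passed to -Xcudafe in the build log above.
flags='--diag_suppress=implicit_return_from_non_void_function --display_error_number --diag_suppress=20014 --diag_suppress=20011'
reduced=$(printf '%s\n' "$flags" | strip_numeric_diag_suppress)
echo "reduced flags: $reduced"

# Compile a trivial file once with the full and once with the reduced flags.
if command -v nvcc >/dev/null 2>&1; then
  tmp=$(mktemp -d)
  printf 'int main() { return 0; }\n' > "$tmp/probe.cu"
  for candidate in "$flags" "$reduced"; do
    if nvcc -Xcudafe "$candidate" -c -o "$tmp/probe.o" "$tmp/probe.cu" >/dev/null 2>&1; then
      echo "accepted: $candidate"
    else
      echo "rejected: $candidate"
    fi
  done
  rm -rf "$tmp"
fi
```

    If the full string is rejected and the reduced one is accepted, the numeric diagnostic IDs are unknown to this CUDA release, which would support the explanation that the toolkit is too old.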

    The CUDA_ARCH value for a K80 should be 37 (compute capability 3.7), as it belongs to the Kepler generation.

    #8172
    thanhphatvt
    Participant

    Hi Adrian,
    Thanks for your reply. I am using a CUDA_ARCH value of 37, and my CUDA version is 11:
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2020 NVIDIA Corporation
    Built on Wed_Jul_22_19:09:09_PDT_2020
    Cuda compilation tools, release 11.0, V11.0.221
    Build cuda_11.0_bu.TC445_37.28845127_0

    Thank you so much!

    #8173
    thanhphatvt
    Participant

    Hi Adrian,
    After I removed line 78, the problem looks like this. Thank you.
    nvcc -c -o build/trees.o ./trees.c
    nvcc -c -o build/zutil.o ./zutil.c
    nvcc -c -o build/compress.o ./compress.c
    nvcc -c -o build/uncompr.o ./uncompr.c
    nvcc -c -o build/gzclose.o ./gzclose.c
    nvcc -c -o build/gzlib.o ./gzlib.c
    nvcc -c -o build/gzread.o ./gzread.c
    nvcc -c -o build/gzwrite.o ./gzwrite.c
    ar rc build//libz.a ./build/adler32.o ./build/crc32.o ./build/deflate.o ./build/infback.o ./build/inffast.o ./build/inflate.o ./build/inftrees.o ./build/trees.o ./build/zutil.o ./build/compress.o ./build/uncompr.o ./build/gzclose.o ./build/gzlib.o ./build/gzread.o ./build/gzwrite.o
    make[2]: Leaving directory `/home/gigabyte_cuda_03/Downloads/olb-1.6r0/external/zlib'
    cp zlib/build/libz.a lib/
    make -C tinyxml
    make[2]: Entering directory `/home/gigabyte_cuda_03/Downloads/olb-1.6r0/external/tinyxml'
    nvcc -c tinystr.cpp -o build/tinystr.o
    In file included from tinystr.cpp:32:0:
    tinystr.h: In copy constructor ‘TiXmlString::TiXmlString(const TiXmlString&)’:
    tinystr.h:82:50: error: ‘nullptr’ was not declared in this scope
    TiXmlString ( const TiXmlString & copy) : rep_(nullptr)
    ^
    tinystr.h: In constructor ‘TiXmlString::TiXmlString(const char*)’:
    tinystr.h:89:58: error: ‘nullptr’ was not declared in this scope
    TIXML_EXPLICIT TiXmlString ( const char * copy) : rep_(nullptr)
    ^
    tinystr.h: In constructor ‘TiXmlString::TiXmlString(const char*, TiXmlString::size_type)’:
    tinystr.h:96:72: error: ‘nullptr’ was not declared in this scope
    TIXML_EXPLICIT TiXmlString ( const char * str, size_type len) : rep_(nullptr)
    ^
    make[2]: *** [build/tinystr.o] Error 1
    make[2]: Leaving directory `/home/gigabyte_cuda_03/Downloads/olb-1.6r0/external/tinyxml'
    make[1]: *** [tinyxml] Error 2
    make[1]: Leaving directory `/home/gigabyte_cuda_03/Downloads/olb-1.6r0/external'
    make: *** [dependencies] Error 2

    #8174
    Adrian
    Keymaster

    The error you are encountering means that your nvcc defaults to a standard older than C++11 (where nullptr was added). Does manually running nvcc -std=c++11 -c tinystr.cpp -o build/tinystr.o work? Are you sure that the environment where you build OpenLB uses CUDA 11’s nvcc? (This is quite easy to mix up depending on where and how you installed CUDA; there may also be multiple versions installed in parallel.)
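
    To see which nvcc a build would pick up, it can help to list every nvcc on the search path in lookup order. This is a minimal sketch; list_nvcc_candidates is a hypothetical helper name, and nothing in it is specific to OpenLB.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every executable named nvcc found on a
# colon-separated search path ($1, defaulting to $PATH), in lookup order.
list_nvcc_candidates() {
  local search_path="${1:-$PATH}"
  local dir
  local IFS=:
  for dir in $search_path; do
    [ -x "$dir/nvcc" ] && printf '%s\n' "$dir/nvcc"
  done
  return 0
}

# Show the release line reported by each candidate.
list_nvcc_candidates | while IFS= read -r candidate; do
  echo "== $candidate"
  "$candidate" --version | grep -i 'release' || true
done
```

    The first entry is the one a plain nvcc call resolves to through the PATH. If it reports a different release than expected, the build is probably using another CUDA installation than intended.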

    In any case, as per the release notes OpenLB 1.6 requires at least CUDA 11.4.
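
    The version requirement can be checked in a script before starting a build. This is a minimal sketch, assuming that nvcc --version prints a line of the form posted earlier in this thread ("release 11.0, V11.0.221"); the helper names cuda_release and version_at_least are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract "major.minor" from `nvcc --version` text on stdin.
cuda_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: succeeds if version $1 >= version $2 (both "major.minor").
version_at_least() {
  local have_major="${1%%.*}" have_minor="${1#*.}"
  local need_major="${2%%.*}" need_minor="${2#*.}"
  [ "$have_major" -gt "$need_major" ] ||
    { [ "$have_major" -eq "$need_major" ] && [ "$have_minor" -ge "$need_minor" ]; }
}

required=11.4   # minimum CUDA version stated for OpenLB 1.6 in this thread
if command -v nvcc >/dev/null 2>&1; then
  found=$(nvcc --version | cuda_release)
  if [ -n "$found" ] && version_at_least "$found" "$required"; then
    echo "CUDA $found satisfies the >= $required requirement"
  else
    echo "CUDA ${found:-unknown} is older than $required or could not be parsed"
  fi
fi
```

    Applied to the release 11.0 output posted above, this check would report the toolkit as too old for OpenLB 1.6.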
