Issues to run code examples with Nvidia A100 GPU • OpenLB

This topic has 7 replies, 3 voices, and was last updated 6 months ago by Adrian.


Author

Posts
September 13, 2022 at 4:21 pm #6798

jflorezgi
Participant

Hi everyone,

I have run several of the example codes and my own applications on different Nvidia cards without problems, using the config.mk suggested in the config files. I now have access to two Nvidia A100 cards (Ampere architecture) and want to start running my applications on them.
According to the rules.mk file there are three versions for this architecture (80, 86 and 87), and I am not sure which one is right for my card. I have tried all three, following the steps indicated in gpu_only.mk (make clean, make, etc.). Version 80 is the only one that seems to start compiling, but in all three cases an error appears and the build does not finish. The errors for the three cases are shown below:

Versions 86 and 87:
make -C ../../../external
make[1]: Entering directory ‘/home/jflorez/proyecto/olb-1.5r0/external’
make -C zlib
make[2]: Entering directory ‘/home/jflorez/proyecto/olb-1.5r0/external/zlib’
make[2]: Nothing to be done for ‘all’.
make[2]: Leaving directory ‘/home/jflorez/proyecto/olb-1.5r0/external/zlib’
cp zlib/build/libz.a lib/
make -C tinyxml
make[2]: Entering directory ‘/home/jflorez/proyecto/olb-1.5r0/external/tinyxml’
make[2]: Nothing to be done for ‘all’.
make[2]: Leaving directory ‘/home/jflorez/proyecto/olb-1.5r0/external/tinyxml’
cp tinyxml/build/libtinyxml.a lib/
make[1]: Leaving directory ‘/home/jflorez/proyecto/olb-1.5r0/external’
nvcc -O3 -std=c++17 --generate-code=arch=compute_86,code=[compute_86,sm_86] --extended-lambda --expt-relaxed-constexpr -x cu -Xcudafe "--diag_suppress=implicit_return_from_non_void_function --display_error_number --diag_suppress=20014 --diag_suppress=20011" -DPLATFORM_CPU_SISD -DPLATFORM_GPU_CUDA -I../../../src -I../../../external/zlib -I../../../external/tinyxml -c -o cavity3d.o cavity3d.cpp
nvcc fatal : Unsupported gpu architecture 'compute_86'
make: *** [../../../default.mk:31: cavity3d.o] Error 1.

Version 80:
make -C ../../../external
make[1]: Entering directory ‘/home/jflorez/proyecto/olb-1.5r0/external’
make -C zlib
make[2]: Entering directory ‘/home/jflorez/proyecto/olb-1.5r0/external/zlib’
make[2]: Nothing to be done for ‘all’.
make[2]: Leaving directory ‘/home/jflorez/proyecto/olb-1.5r0/external/zlib’
cp zlib/build/libz.a lib/
make -C tinyxml
make[2]: Entering directory ‘/home/jflorez/proyecto/olb-1.5r0/external/tinyxml’
make[2]: Nothing to be done for ‘all’.
make[2]: Leaving directory ‘/home/jflorez/proyecto/olb-1.5r0/external/tinyxml’
cp tinyxml/build/libtinyxml.a lib/
make[1]: Leaving directory ‘/home/jflorez/proyecto/olb-1.5r0/external’
nvcc -O3 -std=c++17 --generate-code=arch=compute_80,code=[compute_80,sm_80] --extended-lambda --expt-relaxed-constexpr -x cu -Xcudafe "--diag_suppress=implicit_return_from_non_void_function --display_error_number --diag_suppress=20014 --diag_suppress=20011" -DPLATFORM_CPU_SISD -DPLATFORM_GPU_CUDA -I../../../src -I../../../external/zlib -I../../../external/tinyxml -c -o cavity3d.o cavity3d.cpp
Command-line error #614: invalid error number in diagnostic control option: 20014

1 catastrophic error detected in this compilation.
Compilation terminated.
make: *** [../../../default.mk:31: cavity3d.o] Error 1.

The final goal is to run the programs on both cards using the gpu_openmpi.mk file. I appreciate any help you can give me.

September 13, 2022 at 5:26 pm #6799

jflorezgi
Participant

After checking a little more, I found that I need to update some packages, so for now I cannot be sure that the compilation problems are related to the graphics card. Sorry about that. I will write again once the update is done if the problem persists.

October 27, 2022 at 11:30 am #6915

Adrian
Keymaster

In case this is still an issue: for our A100s we use CUDA_ARCH := 80. The value 86 is used for e.g. RTX 30* series GPUs and is only available starting with CUDA 11.1, which indicates that the errors in your log were caused by an older CUDA version. For CUDA with MPI, I have had the best experience with the Nvidia HPC SDK, which bundles CUDA and a matching MPI library.
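
The version reasoning above can be sketched as a small shell check. This is only an illustration: the function names are hypothetical, it assumes a `sort` that supports `-V` (GNU coreutils), and the table covers just the two values discussed here. The 11.1 minimum for 86 comes from the post above; the 11.0 minimum for 80 is an assumption that should be checked against the CUDA release notes.

```shell
#!/bin/sh
# Sketch: is a CUDA toolkit release new enough for a given CUDA_ARCH value?

# version_ge A B: succeeds when release A >= release B.
# Assumption: `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# min_cuda_for_arch ARCH: prints the assumed first CUDA release accepting ARCH.
# Only the values discussed in this thread are listed.
min_cuda_for_arch() {
  case "$1" in
    80) echo 11.0 ;;   # A100 (assumed minimum, verify in the release notes)
    86) echo 11.1 ;;   # e.g. RTX 30* series
    *)  return 1 ;;    # unknown to this sketch
  esac
}

# arch_supported CUDA_RELEASE ARCH
# Returns 0 if supported, 1 if the toolkit is too old, 2 if ARCH is not in the table.
arch_supported() {
  min=$(min_cuda_for_arch "$2") || return 2
  version_ge "$1" "$min"
}
```

For example, `arch_supported 11.0 86 || echo "toolkit too old for CUDA_ARCH 86"` reports the situation that produces the "Unsupported gpu architecture" message in the log.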

January 24, 2024 at 11:07 am #8169

thanhphatvt
Participant

Hi Adrian,
I get the same error, but I am using a K80 card. Could you help me solve it?
Command-line error #614: invalid error number in diagnostic control option: 20014
1 catastrophic error detected in this compilation.
Thank you so much!

January 24, 2024 at 12:01 pm #8171

Adrian
Keymaster

The issue here is likely that the CUDA release is too old (the diagnostic control option should be independent of any architecture setting). To confirm this, you can remove the option in line 78 of rules.mk.

The CUDA_ARCH value for a K80 should be 37 (compute capability 3.7), as it belongs to the Kepler generation.

January 24, 2024 at 12:20 pm #8172

thanhphatvt
Participant

Hi Adrian,
Thanks for your reply. I am using a CUDA_ARCH value of 37, and my CUDA version is 11:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0

Thank you so much!

January 24, 2024 at 12:32 pm #8173

thanhphatvt
Participant

Hi Adrian,
After I removed line 78, the problem looks like this. Thank you.
nvcc -c -o build/trees.o ./trees.c
nvcc -c -o build/zutil.o ./zutil.c
nvcc -c -o build/compress.o ./compress.c
nvcc -c -o build/uncompr.o ./uncompr.c
nvcc -c -o build/gzclose.o ./gzclose.c
nvcc -c -o build/gzlib.o ./gzlib.c
nvcc -c -o build/gzread.o ./gzread.c
nvcc -c -o build/gzwrite.o ./gzwrite.c
ar rc build//libz.a ./build/adler32.o ./build/crc32.o ./build/deflate.o ./build/infback.o ./build/inffast.o ./build/inflate.o ./build/inftrees.o ./build/trees.o ./build/zutil.o ./build/compress.o ./build/uncompr.o ./build/gzclose.o ./build/gzlib.o ./build/gzread.o ./build/gzwrite.o
make[2]: Leaving directory `/home/gigabyte_cuda_03/Downloads/olb-1.6r0/external/zlib'
cp zlib/build/libz.a lib/
make -C tinyxml
make[2]: Entering directory `/home/gigabyte_cuda_03/Downloads/olb-1.6r0/external/tinyxml'
nvcc -c tinystr.cpp -o build/tinystr.o
In file included from tinystr.cpp:32:0:
tinystr.h: In copy constructor ‘TiXmlString::TiXmlString(const TiXmlString&)’:
tinystr.h:82:50: error: ‘nullptr’ was not declared in this scope
TiXmlString ( const TiXmlString & copy) : rep_(nullptr)
^
tinystr.h: In constructor ‘TiXmlString::TiXmlString(const char*)’:
tinystr.h:89:58: error: ‘nullptr’ was not declared in this scope
TIXML_EXPLICIT TiXmlString ( const char * copy) : rep_(nullptr)
^
tinystr.h: In constructor ‘TiXmlString::TiXmlString(const char*, TiXmlString::size_type)’:
tinystr.h:96:72: error: ‘nullptr’ was not declared in this scope
TIXML_EXPLICIT TiXmlString ( const char * str, size_type len) : rep_(nullptr)
^
make[2]: *** [build/tinystr.o] Error 1
make[2]: Leaving directory `/home/gigabyte_cuda_03/Downloads/olb-1.6r0/external/tinyxml'
make[1]: *** [tinyxml] Error 2
make[1]: Leaving directory `/home/gigabyte_cuda_03/Downloads/olb-1.6r0/external'
make: *** [dependencies] Error 2

January 25, 2024 at 10:41 am #8174

Adrian
Keymaster

The error occurs because your nvcc defaults to a C++ standard older than C++11 (which introduced nullptr). Does running nvcc -std=c++11 -c tinystr.cpp -o build/tinystr.o manually work? Are you sure that the environment where you build OpenLB is actually using CUDA 11's nvcc? (It is quite easy to mix this up depending on where and how you installed CUDA; there may also be multiple versions installed in parallel.)

In any case, as per the release notes OpenLB 1.6 requires at least CUDA 11.4.
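
One way to check both points (which nvcc is picked up, and whether it meets the 11.4 minimum) is sketched below. The helper names are hypothetical, and the parsing assumes the "release X.Y" wording seen in the `nvcc --version` output quoted earlier in this thread.

```shell
#!/bin/sh
# Sketch: report which nvcc is first on PATH and compare its release to a minimum.

# cuda_release: reads `nvcc --version` output on stdin, prints "MAJOR.MINOR".
cuda_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# release_at_least HAVE NEED: succeeds when HAVE >= NEED (both MAJOR.MINOR).
release_at_least() {
  have_major=${1%%.*}; have_minor=${1#*.}
  need_major=${2%%.*}; need_minor=${2#*.}
  [ "$have_major" -gt "$need_major" ] ||
    { [ "$have_major" -eq "$need_major" ] && [ "$have_minor" -ge "$need_minor" ]; }
}

# check_nvcc (hypothetical helper): prints the nvcc in use and its release,
# then succeeds only if the release is at least 11.4 (the OpenLB 1.6 minimum
# stated above).
check_nvcc() {
  nvcc_path=$(command -v nvcc) || { echo "nvcc not found on PATH" >&2; return 1; }
  release=$(nvcc --version | cuda_release)
  echo "nvcc: $nvcc_path (release $release)"
  release_at_least "$release" 11.4
}
```

Running `check_nvcc || echo "this nvcc does not meet the OpenLB 1.6 minimum"` in the same shell that runs make shows whether the build picks up the intended CUDA installation.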