Issues to run code examples with Nvidia A100 GPU
OpenLB – Open Source Lattice Boltzmann Code › Forums › on OpenLB › General Topics
- This topic has 7 replies, 3 voices, and was last updated 8 months, 2 weeks ago by Adrian.
September 13, 2022 at 4:21 pm #6798 jflorezgi (Participant)
Hi everyone,
I have run several tests with the example codes and my own applications on different Nvidia cards without problems, using the config.mk settings you suggest in the config folder. Now I have access to two Nvidia A100 cards (Ampere architecture) and I want to start running my applications on them.
According to rules.mk, there are three CUDA_ARCH values for this architecture (80, 86 and 87), and I'm not sure which one applies to my card. I've tried all three (following the steps in gpu_only.mk: make clean, make, etc.). Version 80 is the only one that seems to start compiling, but in all three cases an error appears and compilation does not finish. The errors for each case are shown below.

Versions 86 and 87:
make -C ../../../external
make[1]: Entering directory ‘/home/jflorez/proyecto/olb-1.5r0/external’
make -C zlib
make[2]: Entering directory ‘/home/jflorez/proyecto/olb-1.5r0/external/zlib’
make[2]: Nothing to be done for ‘all’.
make[2]: Leaving directory ‘/home/jflorez/proyecto/olb-1.5r0/external/zlib’
cp zlib/build/libz.a lib/
make -C tinyxml
make[2]: Entering directory ‘/home/jflorez/proyecto/olb-1.5r0/external/tinyxml’
make[2]: Nothing to be done for ‘all’.
make[2]: Leaving directory ‘/home/jflorez/proyecto/olb-1.5r0/external/tinyxml’
cp tinyxml/build/libtinyxml.a lib/
make[1]: Leaving directory ‘/home/jflorez/proyecto/olb-1.5r0/external’
nvcc -O3 -std=c++17 --generate-code=arch=compute_86,code=[compute_86,sm_86] --extended-lambda --expt-relaxed-constexpr -x cu -Xcudafe "--diag_suppress=implicit_return_from_non_void_function --display_error_number --diag_suppress=20014 --diag_suppress=20011" -DPLATFORM_CPU_SISD -DPLATFORM_GPU_CUDA -I../../../src -I../../../external/zlib -I../../../external/tinyxml -c -o cavity3d.o cavity3d.cpp
nvcc fatal : Unsupported gpu architecture ‘compute_86’
make: *** [../../../default.mk:31: cavity3d.o] Error 1

Version 80:
make -C ../../../external
make[1]: Entering directory ‘/home/jflorez/proyecto/olb-1.5r0/external’
make -C zlib
make[2]: Entering directory ‘/home/jflorez/proyecto/olb-1.5r0/external/zlib’
make[2]: Nothing to be done for ‘all’.
make[2]: Leaving directory ‘/home/jflorez/proyecto/olb-1.5r0/external/zlib’
cp zlib/build/libz.a lib/
make -C tinyxml
make[2]: Entering directory ‘/home/jflorez/proyecto/olb-1.5r0/external/tinyxml’
make[2]: Nothing to be done for ‘all’.
make[2]: Leaving directory ‘/home/jflorez/proyecto/olb-1.5r0/external/tinyxml’
cp tinyxml/build/libtinyxml.a lib/
make[1]: Leaving directory ‘/home/jflorez/proyecto/olb-1.5r0/external’
nvcc -O3 -std=c++17 --generate-code=arch=compute_80,code=[compute_80,sm_80] --extended-lambda --expt-relaxed-constexpr -x cu -Xcudafe "--diag_suppress=implicit_return_from_non_void_function --display_error_number --diag_suppress=20014 --diag_suppress=20011" -DPLATFORM_CPU_SISD -DPLATFORM_GPU_CUDA -I../../../src -I../../../external/zlib -I../../../external/tinyxml -c -o cavity3d.o cavity3d.cpp
Command-line error #614: invalid error number in diagnostic control option: 20014
1 catastrophic error detected in this compilation.
Compilation terminated.
make: *** [../../../default.mk:31: cavity3d.o] Error 1

Ultimately, I would like to run the programs on both cards using the gpu_openmpi.mk configuration. I appreciate any help you can give me.
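As the nvcc commands in the logs show, the CUDA_ARCH value is passed on as compute_XX/sm_XX, i.e. it is the card's compute capability with the dot removed. A minimal bash sketch for reading that value off the installed GPUs, assuming a driver recent enough that nvidia-smi supports the compute_cap query field (older drivers do not); the helper name cc_to_cuda_arch is hypothetical:

```shell
#!/usr/bin/env bash
# Minimal sketch: suggest a CUDA_ARCH value for each installed GPU.
# Assumption: the driver's nvidia-smi supports the compute_cap query field.

# Turn a compute capability such as "8.0" into the digits-only form used by
# CUDA_ARCH ("80"), which nvcc then receives as compute_80 / sm_80.
cc_to_cuda_arch() {
  local cc="$1"
  if [[ ! "$cc" =~ ^[0-9]+\.[0-9]$ ]]; then
    echo "unexpected compute capability: '$cc'" >&2
    return 1
  fi
  echo "${cc/./}"   # remove the dot
}

if command -v nvidia-smi >/dev/null 2>&1; then
  # CSV rows look like "<name>, <major>.<minor>"
  nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader |
    while IFS=',' read -r name cc; do
      cc="${cc// /}"   # drop the space that follows the comma
      echo "$name: CUDA_ARCH := $(cc_to_cuda_arch "$cc")"
    done
else
  echo "nvidia-smi not found; look up the compute capability in NVIDIA's tables instead" >&2
fi
```

An A100 has compute capability 8.0, so this corresponds to CUDA_ARCH := 80. The chosen value still has to be supported by the installed nvcc release.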
September 13, 2022 at 5:26 pm #6799 jflorezgi (Participant)
I've looked into this a bit more and I need to update some packages, so for now I can't be sure that the compilation problems are related to the graphics card. Sorry; I'll post again once I've done that, if the problem persists.
October 27, 2022 at 11:30 am #6915 Adrian (Keymaster)
In case this is still an issue: for our A100s we use the following setting:
CUDA_ARCH := 80
The value 86 is used for e.g. RTX 30 series GPUs and is only available starting with CUDA 11.1, which indicates that the error in your log was caused by an older CUDA version. For using CUDA with MPI, I have had the best experience with the Nvidia HPC SDK, which bundles CUDA and a matching MPI library.

January 24, 2024 at 11:07 am #8169 thanhphatvt (Participant)
Hi Adrian,
I get the same error, but on a K80 card. Could you help me solve it?
Command-line error #614: invalid error number in diagnostic control option: 20014
1 catastrophic error detected in this compilation.
Thank you so much!

January 24, 2024 at 12:01 pm #8171 Adrian (Keymaster)
The issue here is likely that your CUDA release is too old (the diagnostic control option should be independent of any architecture setting). To confirm, you can remove the option in line 78 of rules.mk. The CUDA_ARCH value for a K80 should be 30, as it belongs to the Kepler generation.

January 24, 2024 at 12:20 pm #8172 thanhphatvt (Participant)
Hi Adrian,
Thanks for your reply. I am using a CUDA_ARCH value of 37, and my CUDA version is 11:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0

Thank you so much!
January 24, 2024 at 12:32 pm #8173 thanhphatvt (Participant)
Hi Adrian,
After removing line 78, I get the following error. Thank you!
nvcc -c -o build/trees.o ./trees.c
nvcc -c -o build/zutil.o ./zutil.c
nvcc -c -o build/compress.o ./compress.c
nvcc -c -o build/uncompr.o ./uncompr.c
nvcc -c -o build/gzclose.o ./gzclose.c
nvcc -c -o build/gzlib.o ./gzlib.c
nvcc -c -o build/gzread.o ./gzread.c
nvcc -c -o build/gzwrite.o ./gzwrite.c
ar rc build//libz.a ./build/adler32.o ./build/crc32.o ./build/deflate.o ./build/infback.o ./build/inffast.o ./build/inflate.o ./build/inftrees.o ./build/trees.o ./build/zutil.o ./build/compress.o ./build/uncompr.o ./build/gzclose.o ./build/gzlib.o ./build/gzread.o ./build/gzwrite.o
make[2]: Leaving directory `/home/gigabyte_cuda_03/Downloads/olb-1.6r0/external/zlib'
cp zlib/build/libz.a lib/
make -C tinyxml
make[2]: Entering directory `/home/gigabyte_cuda_03/Downloads/olb-1.6r0/external/tinyxml'
nvcc -c tinystr.cpp -o build/tinystr.o
In file included from tinystr.cpp:32:0:
tinystr.h: In copy constructor ‘TiXmlString::TiXmlString(const TiXmlString&)’:
tinystr.h:82:50: error: ‘nullptr’ was not declared in this scope
TiXmlString ( const TiXmlString & copy) : rep_(nullptr)
^
tinystr.h: In constructor ‘TiXmlString::TiXmlString(const char*)’:
tinystr.h:89:58: error: ‘nullptr’ was not declared in this scope
TIXML_EXPLICIT TiXmlString ( const char * copy) : rep_(nullptr)
^
tinystr.h: In constructor ‘TiXmlString::TiXmlString(const char*, TiXmlString::size_type)’:
tinystr.h:96:72: error: ‘nullptr’ was not declared in this scope
TIXML_EXPLICIT TiXmlString ( const char * str, size_type len) : rep_(nullptr)
^
make[2]: *** [build/tinystr.o] Error 1
make[2]: Leaving directory `/home/gigabyte_cuda_03/Downloads/olb-1.6r0/external/tinyxml'
make[1]: *** [tinyxml] Error 2
make[1]: Leaving directory `/home/gigabyte_cuda_03/Downloads/olb-1.6r0/external'
make: *** [dependencies] Error 2

January 25, 2024 at 10:41 am #8174 Adrian (Keymaster)
The error you are encountering means that your nvcc uses a language standard older than C++11 (where nullptr was added) by default. Does running the following command manually work?
nvcc -std=c++11 -c tinystr.cpp -o build/tinystr.o
Are you sure that the environment in which you build OpenLB uses CUDA 11's nvcc? (It is quite easy to mix this up depending on where and how you installed CUDA; there may also be multiple versions installed in parallel.) In any case, as per the release notes, OpenLB 1.6 requires at least CUDA 11.4.
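Since several CUDA installations can coexist, it may help to list every nvcc reachable through PATH together with the release each one reports. A minimal bash sketch, assuming GNU sort (for -V) and the "release X.Y" line format seen in the nvcc output earlier in this thread; the helper names version_ge and nvcc_release are hypothetical:

```shell
#!/usr/bin/env bash
# Minimal sketch: list every nvcc on PATH and flag releases older than the
# CUDA 11.4 minimum stated for OpenLB 1.6.
# Assumptions: bash, GNU sort (-V), and nvcc printing a line such as
# "Cuda compilation tools, release 11.0, V11.0.221".

MIN_CUDA="11.4"

# Succeeds if dotted version $1 >= $2 (the smaller one sorts first).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Reads `nvcc --version` output on stdin and prints the release, e.g. "11.0".
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# `type -a -P` prints every nvcc executable in PATH lookup order; the first
# entry is the one a plain `nvcc` call resolves to.
while IFS= read -r bin; do
  rel="$("$bin" --version | nvcc_release)"
  if [ -z "$rel" ]; then
    echo "$bin: could not determine release"
  elif version_ge "$rel" "$MIN_CUDA"; then
    echo "$bin: CUDA $rel (meets $MIN_CUDA)"
  else
    echo "$bin: CUDA $rel (older than $MIN_CUDA)"
  fi
done < <(type -a -P nvcc 2>/dev/null)
```

If the build invokes plain nvcc (as the logs above do), the first line printed is the compiler that will actually be used, so a newer installation further down the list is only picked up after adjusting PATH.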