sfraniatte
Forum Replies Created
sfraniatteParticipant
I forgot to mention that it’s only the outlet pressure condition that doesn’t work (the flow remains zero) with the MRT method.
sfraniatteParticipant
Hello,
I achieved a significant performance gain by changing the number of cuboids. I started from the aorta3d example, which used too many cuboids (8 per CPU core in parallel), and I greatly reduced this number (1 per core with 32 cores).
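For reference, this is roughly where that number is set in an aorta3d-style case. The snippet below is only a minimal sketch assuming the usual CuboidGeometry3D constructor; the exact expression, indicator and converter names in your copy of the example may differ:
// Minimal sketch (assumed aorta3d-style setup): decompose the domain into one
// cuboid per MPI process instead of several per core; adapt to your own case.
#ifdef PARALLEL_MODE_MPI
const int noOfCuboids = singleton::mpi().getSize();   // e.g. 32 with 32 ranks
#else
const int noOfCuboids = 1;
#endif
// stlReader stands for whatever indicator your case builds from the STL file
CuboidGeometry3D<T> cuboidGeometry( stlReader, converter.getConversionFactorLength(), noOfCuboids );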
Hope this helps. Good luck!
Sylvain
sfraniatteParticipant
Hello,
I solved my problem by looking into the file "stlReader.hh", where I found these lines:
template<typename T>
bool STLtriangle<T>::isPointInside(const PhysR<T,3>& pt) const
{
  // tests with T=double and T=float show that the epsilon must be increased
  const T epsilon = std::numeric_limits<BaseType<T>>::epsilon()*T(10);

  const T beta  = pt * uBeta  + kBeta;
  const T gamma = pt * uGamma + kGamma;

  // check if approximately equal
  if ( util::nearZero(norm(pt - (point[0].coords + beta*(point[1].coords-point[0].coords) + gamma*(point[2].coords-point[0].coords))), epsilon) ) {
    const T alpha = T(1) - beta - gamma;
    return (beta  >= T(0) || util::nearZero(beta,  epsilon))
        && (gamma >= T(0) || util::nearZero(gamma, epsilon))
        && (alpha >= T(0) || util::nearZero(alpha, epsilon));
  }
  return false;
}
A first solution is to change FLOATING_POINT_TYPE in the file "config.mk" from double to float, but the calculation is then slower. So I am trying another solution, which is to change the epsilon used by the STLreader in two files: "stlReader.hh" and "octree.hh".
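For reference, the kind of change I am testing looks like this. It is only a sketch: the factor 100 is an arbitrary trial value, and the corresponding tolerance in "octree.hh" has to be adapted in the same way:
// stlReader.hh, in STLtriangle<T>::isPointInside (and analogously in octree.hh):
// enlarge the tolerance of the point-in-triangle test; T(100) is only a trial value
const T epsilon = std::numeric_limits<BaseType<T>>::epsilon()*T(100);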
sfraniatteParticipant
I do not know exactly where the problem is, whether it occurs during the stlReader operation or the rename operation. Some of the indicator's voxels are wrong, and the only workaround seems to be not using the GPU. I am sure the problem comes from the GPU usage or from compiling the code with nvcc (with an NVIDIA card on the Ampere architecture), because I tested with gcc and mpirun on CPU without any problem.
Thank you for your time!
sfraniatteParticipant
Yes, I can. Here is the working .mk file:
# Example build config for OpenLB using CUDA on single GPU systems
#
# Tested using CUDA 11.4
#
# Usage:
#  - Copy this file to OpenLB root as config.mk
#  - Adjust CUDA_ARCH to match your specific GPU
#  - Run make clean; make
#  - Switch to example directory, e.g. examples/laminar/cavity3dBenchmark
#  - Run make
#  - Start the simulation using ./cavity3d

CXX := nvcc
CC  := nvcc

CXXFLAGS := -O3
CXXFLAGS += -std=c++17 --forward-unknown-to-host-compiler

PARALLEL_MODE := NONE

PLATFORMS := CPU_SISD GPU_CUDA

# for e.g. RTX 30* (Ampere), see table in rules.mk for other options
CUDA_ARCH := 86

FLOATING_POINT_TYPE := float

USE_EMBEDDED_DEPENDENCIES := ON
###########################################################################################
Here is the first one, which does not work very well:
# OpenLB build configuration
#
# This file sets up the necessary build flags for compiling OpenLB with
# the GNU C++ compiler and sequential execution. For more complex setups
# edit this file or consult the example configs provided in config/.
#
# Basic usage:
#  - Edit variables to fit desired configuration
#  - Run make clean; make to clean up any previous artifacts and compile the dependencies
#  - Switch to example directory, e.g. examples/laminar/poiseuille2d
#  - Run make
#  - Start the simulation using ./poiseuille2d

# Compiler to use for C++ files, change to mpic++ when using OpenMPI and GCC
#~ # parallel CPU or hybrid
#~ CXX := mpic++
# GPU
CXX := nvcc

# Compiler to use for C files (used for embedded dependencies)
# parallel CPU or hybrid
#~ CC := gcc
# GPU
CC := nvcc

# Suggested optimized build flags for GCC, consult config/ for further examples
# parallel CPU or hybrid
#~ CXXFLAGS := -O3 -Wall -march=native -mtune=native
# GPU
CXXFLAGS := -O3
CXXFLAGS += --forward-unknown-to-host-compiler

# Uncomment to add debug symbols and enable runtime asserts
#~ #CXXFLAGS += -g -DOLB_DEBUG

# OpenLB requires support for C++17
# works in:
#  * gcc 9 or later (https://gcc.gnu.org/projects/cxx-status.html#cxx17)
#  * icc 19.0 or later (https://software.intel.com/en-us/articles/c17-features-supported-by-intel-c-compiler)
#  * clang 7 or later (https://clang.llvm.org/cxx_status.html#cxx17)
CXXFLAGS += -std=c++17

# optional linker flags
LDFLAGS :=

# Parallelization mode, must be one of: OFF, MPI, OMP, HYBRID
# Note that for MPI and HYBRID the compiler also needs to be adapted.
# See e.g. config/cpu_gcc_openmpi.mk
# parallel CPU
#~ PARALLEL_MODE := MPI
# GPU
PARALLEL_MODE := NONE
#~ # hybrid
#~ PARALLEL_MODE := HYBRID

# optional MPI and OpenMP flags
# parallel CPU
#~ MPIFLAGS :=
#~ OMPFLAGS := -fopenmp

# Options: CPU_SISD, CPU_SIMD, GPU_CUDA
# Both CPU_SIMD and GPU_CUDA require system-specific adjustment of compiler flags.
# See e.g. config/cpu_simd_intel_mpi.mk or config/gpu_only.mk for examples.
# CPU_SISD must always be present.
# parallel CPU
#~ PLATFORMS := CPU_SISD
# GPU
PLATFORMS := CPU_SISD GPU_CUDA
# hybrid
#~ PLATFORMS := CPU_SISD CPU_SIMD GPU_CUDA

#~ # Compiler to use for CUDA-enabled files
#~ CUDA_CXX := nvcc
#~ CUDA_CXXFLAGS := -O3 -std=c++17
#~ # Adjust to enable resolution of libcuda, libcudart, libcudadevrt
#~ CUDA_LDFLAGS := -L/run/libcuda/lib
#~ CUDA_LDFLAGS += -fopenmp

# GPU or hybrid
CUDA_ARCH := 86

# Fundamental arithmetic data type
# Common options are float or double
# parallel CPU
#~ FLOATING_POINT_TYPE := double
# GPU or hybrid
FLOATING_POINT_TYPE := float

# Any entries are passed to the compiler as -DFEATURE_* declarations
# Used to enable some alternative code paths and dependencies
FEATURES :=

# Set to OFF if libz and tinyxml are provided by the system (optional)
USE_EMBEDDED_DEPENDENCIES := ON
###################################################################################
Also, I am trying to run my case on the GPU, but it is really too slow. The main differences with the aorta example (which now works well for me) are the surface of the inlet (which is much bigger) and the fact that there are external edges and corners (the inlet has 5 faces). I am working on cleaning my code to have something like the nozzle example (with stlReader). It may be that, but I am not sure. Thank you!
sfraniatteParticipant
Hello,
Thank you for your reply (which I’ve only just seen now because I forgot to check the notification option). In the meantime, I started working on implementing such a condition. That said, I feel like the approach you’re talking about complements the one I’ve coded. I’ll start testing my development tomorrow. In any case, I do plan to come to Marseille. Thanks again!
Best regards,
Sylvain
sfraniatteParticipant
I think I have solved my problem. The mistake was that I left the line "FEATURES :=" uncommented. Now, for completeness, here are the exact commands that I used:
_ ./nozzle3d --resolution 5 --max-phys-t 10 : the calculation duration (measured CPU time) was 29.278 s, with an average of 376.907 MLUPS
_ ./nozzle3d --resolution 10 --max-phys-t 10 : the calculation duration (measured CPU time) was 501.328 s, with an average of 686.931 MLUPS
I ran these calculations today and the performance scaling looks good this time. Thanks a lot for your time, and I hope this will be useful for someone.
Best regards,
Sylvain
December 6, 2024 at 10:10 am in reply to: Set Pressure Boundary Conditions for Inlet and Outlet #9589
sfraniatteParticipant
Hello,
I attempted to apply these boundary conditions, but it turned out not to be a good idea in my case, as the simulation became less stable. I believe this might explain why there is so little information available on how to implement them. You have likely done everything correctly; it’s just that this approach doesn’t seem to work as intended in this particular situation.
For instance, when I applied these boundary conditions in the aorta example, the flow reversed for certain pressure inlet values.
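For context, this is roughly how I imposed the pressure values in that test. It is only a sketch assuming the interpolated pressure boundary helpers and a UnitConverter named converter; the material numbers (3 for the inlet, 4 for the outlet) and the variables pInPhys/pOutPhys are from my own setup and may differ from yours:
// Sketch (assumed OpenLB free-function helpers): pressure condition on the
// inlet (material 3) and the outlet (material 4)
setInterpolatedPressureBoundary<T,DESCRIPTOR>( sLattice, omega, superGeometry, 3 );
setInterpolatedPressureBoundary<T,DESCRIPTOR>( sLattice, omega, superGeometry, 4 );

// convert the desired physical pressures to lattice densities and impose them
AnalyticalConst3D<T,T> rhoIn(  converter.getLatticeDensityFromPhysPressure( pInPhys  ) );
AnalyticalConst3D<T,T> rhoOut( converter.getLatticeDensityFromPhysPressure( pOutPhys ) );
sLattice.defineRho( superGeometry, 3, rhoIn  );
sLattice.defineRho( superGeometry, 4, rhoOut );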
sfraniatteParticipant
I am sorry, I have understood my error… The number of cores used was too high, which slowed down the calculation. Stupid mistake!
Thanks a lot for your time !
Sylvain
sfraniatteParticipant
Ok, I have two computers:
_ an Ubuntu 22.04 VM on Windows 11 with 6 CPU cores (Intel(R) Core(TM) i5-10400T CPU @ 2.00GHz)
_ an Ubuntu 24.04 machine with 32 CPU cores (AMD Ryzen Threadripper PRO 5975WX 32-Cores) and a GPU card
My goal today is to launch a calculation on the CPU cores of the Ubuntu 24.04 computer in order to run a bigger simulation. However, when I test the same case (my own case), the calculation is slower than when it is run with the VM on Windows.
I am wondering if the problem comes from the CPUs not being of the same brand.
Sylvain
sfraniatteParticipant
I mean that the calculation is not faster than in sequential mode. The calculation is faster on my other computer with 6 threads…
Yes, indeed, I would like to use only the CPUs for now, and one GPU in the future once my code is ready for that.
Thank you for the details.
Sylvain
sfraniatteParticipant
Ok, the first lines after this call are:
[MpiManager] Sucessfully initialized, numThreads=32
[ThreadPool] Sucessfully initialized, numThreads=1
Yes, it returned true. However, I then uninstalled CUDA in order to get OpenMPI working properly, because it was not working. And it still does not work…
My main goal is to use OpenMPI on the CPUs, but I have the feeling that this is not possible due to the presence of the GPU card.
So, to sum up, I uninstalled CUDA and reinstalled OpenMPI as explained in the user manual (sudo apt-get install openmpi-bin openmpi-doc libopenmpi-dev), but it still does not work properly.
Sylvain
sfraniatteParticipantWhere can I find the terminal log ? Is it when I compile Openlb at he installation ?
Yes, I did follow this section…
Thank you for your answer !
SylvainsfraniatteParticipantDo you use a debugger like gdb ? I think it can help you
sfraniatteParticipantWhich error/bug did you get ? Take care to locate the inlet and outlet at the correct places…