Multiple GPU on HPC Calculation Signal: Segmentation fault (11)
June 3, 2024 at 4:56 pm #8759
aseidler (Participant)
Hello,

I have managed to start a simulation on 2 GPUs. Unfortunately, it crashes during the calculation. It runs fine on 1 GPU, but with multiple GPUs it always crashes at the same point. Maybe something is wrong with the cuboid decomposition or MPI? Perhaps someone has had the same problem or has more experience.

Error message:

[main] starting simulation...
[i8013:267141] *** Process received signal ***
[i8013:267141] Signal: Segmentation fault (11)
[i8013:267141] Signal code: Invalid permissions (2)
[i8013:267141] Failing at address: 0x36b1fa800
[i8013:267141] [ 0] /lib64/libpthread.so.0(+0x12cf0)[0x14ebcc1a4cf0]
[i8013:267141] [ 1] /lib64/libc.so.6(+0xd01e5)[0x14ebc95cd1e5]
[i8013:267141] [ 2] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libucp.so.0(ucp_dt_pack+0x71)[0x14ebc278f221]
[i8013:267141] [ 3] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libucp.so.0(+0x61707)[0x14ebc27b5707]
[i8013:267141] [ 4] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libuct.so.0(uct_mm_ep_am_bcopy+0x136)[0x14ebc2734786]
[i8013:267141] [ 5] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libucp.so.0(+0x60e4f)[0x14ebc27b4e4f]
[i8013:267141] [ 6] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libucp.so.0(ucp_tag_send_nbx+0x78d)[0x14ebc27be55d]
[i8013:267141] [ 7] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libucp.so.0(ucp_tag_send_nb+0x38)[0x14ebc27bf248]
[i8013:267141] [ 8] /software/rome/r23.10/OpenMPI/4.1.4-GCC-11.3.0/lib/openmpi/mca_pml_ucx.so(mca_pml_ucx_start+0x61)[0x14ebc801e021]
[i8013:267141] [ 9] ./dfxSISM[0x464b99]
[i8013:267141] [10] ./dfxSISM[0x46efba]
[i8013:267141] [11] ./dfxSISM[0x49112f]
[i8013:267141] [12] ./dfxSISM[0x40fdc7]
[i8013:267141] [13] /lib64/libc.so.6(__libc_start_main+0xe5)[0x14ebc9537d85]
[i8013:267141] [14] ./dfxSISM[0x41173e]
[i8013:267141] *** End of error message ***
[i8013:267142] *** Process received signal ***
[i8013:267142] Signal: Segmentation fault (11)
[i8013:267142] Signal code: Invalid permissions (2)
[i8013:267142] Failing at address: 0x341ffec00
[i8013:267142] [ 0] /lib64/libpthread.so.0(+0x12cf0)[0x14f51a58acf0]
[i8013:267142] [ 1] /lib64/libc.so.6(+0xd01e5)[0x14f5179b31e5]
[i8013:267142] [ 2] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libucp.so.0(ucp_dt_pack+0x71)[0x14f514b83221]
[i8013:267142] [ 3] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libucp.so.0(+0x61707)[0x14f514ba9707]
[i8013:267142] [ 4] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libuct.so.0(uct_mm_ep_am_bcopy+0x136)[0x14f514b28786]
[i8013:267142] [ 5] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libucp.so.0(+0x60e4f)[0x14f514ba8e4f]
[i8013:267142] [ 6] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libucp.so.0(ucp_tag_send_nbx+0x78d)[0x14f514bb255d]
[i8013:267142] [ 7] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libucp.so.0(ucp_tag_send_nb+0x38)[0x14f514bb3248]
[i8013:267142] [ 8] /software/rome/r23.10/OpenMPI/4.1.4-GCC-11.3.0/lib/openmpi/mca_pml_ucx.so(mca_pml_ucx_start+0x61)[0x14f514bf7021]
[i8013:267142] [ 9] ./dfxSISM[0x464b99]
[i8013:267142] [10] ./dfxSISM[0x46efba]
[i8013:267142] [11] ./dfxSISM[0x49112f]
[i8013:267142] [12] ./dfxSISM[0x40fdc7]
[i8013:267142] [13] /lib64/libc.so.6(__libc_start_main+0xe5)[0x14f51791dd85]
[i8013:267142] [14] ./dfxSISM[0x41173e]
[i8013:267142] *** End of error message ***
bash: line 1: 267141 Segmentation fault (core dumped) ./dfxSISM
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[40450,1],0]
Exit code: 139
--------------------------------------------------------------------------
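One way to narrow this down: the crash happens in ucp_dt_pack while MPI packs a send buffer, which is typical when a device pointer reaches a transport that is not CUDA-aware. A minimal runtime check is sketched below; it uses Open MPI's optional MPIX_Query_cuda_support() extension from <mpi-ext.h> (other MPI implementations may not provide it, and the file name is arbitrary):

// check_cuda_mpi.cpp -- minimal sketch: query CUDA-awareness of Open MPI
// at compile time and at run time via the MPIX extension.
#include <mpi.h>
#include <cstdio>
#if defined(OPEN_MPI)
#include <mpi-ext.h>  // Open MPI extension header, declares MPIX_Query_cuda_support()
#endif

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
#if defined(MPIX_CUDA_AWARE_SUPPORT)
  std::printf("CUDA-aware at compile time: %d, at run time: %d\n",
              (int)MPIX_CUDA_AWARE_SUPPORT, (int)MPIX_Query_cuda_support());
#else
  std::printf("No MPIX_CUDA_AWARE_SUPPORT macro: CUDA-awareness unknown\n");
#endif
  MPI_Finalize();
  return 0;
}

Compile with mpicxx check_cuda_mpi.cpp -o check_cuda_mpi and run a single rank; if the run-time value is 0, MPI calls on device buffers will fault exactly as in the trace above.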
My Setup:

config.mk:

CXX := nvcc
CC := nvcc

CXXFLAGS := -O3
CXXFLAGS += -std=c++17
# --forward-unknown-to-host-compiler
CXXFLAGS += -Xcompiler -I/software/rome/r23.10/OpenMPI/4.1.4-GCC-11.3.0/include

# Single GPU
#PARALLEL_MODE := NONE
# Parallel GPU
PARALLEL_MODE := MPI

MPIFLAGS := -L/software/rome/r23.10/OpenMPI/4.1.4-GCC-11.3.0/lib -L/software/rome/r23.10/hwloc/2.7.1-GCCcore-11.3.0/lib -L/software/rome/r23.10/libevent/2.1.12-GCCcore-11.3.0/lib -lmpi

PLATFORMS := CPU_SISD GPU_CUDA

# for e.g. RTX 30* (Ampere), see table in rules.mk for other options
CUDA_ARCH := 80

FLOATING_POINT_TYPE := float

USE_EMBEDDED_DEPENDENCIES := ON
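With this setup, each MPI rank also has to end up on its own GPU. A sketch of the launch command (binary name as above; whether extra binding flags are needed depends on the scheduler):

#!/usr/bin/env bash
# One MPI rank per GPU: each local rank only sees "its" device, so the
# GPU_CUDA platform does not put both ranks on GPU 0.
mpirun -np 2 bash -c \
  'export CUDA_VISIBLE_DEVICES=${OMPI_COMM_WORLD_LOCAL_RANK}; ./dfxSISM'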
Simulation:
#include "olb3D.h"
#include "olb3D.hh"
#include <string>

using namespace olb;
using namespace olb::descriptors;
//using namespace olb::graphics;
//using namespace olb::util;

//#define Smagorinsky

using T = FLOATING_POINT_TYPE;
const T Cs = 0.12;
using DESCRIPTOR = D3Q19<>;
using BulkDynamics = SmagorinskyBGKdynamics<T,DESCRIPTOR>;

const int N = 40;                   // resolution; definition missing from the posted excerpt, value assumed
const T Re = 2*6421.;               //1429.;
const T charPhysNu = 1.0034e-6;     //0.6828e-6; // water at 38 degrees Celsius
const T phsyRefL = 0.01;
const T physU = Re*charPhysNu/0.01; //Re*charPhysNu/0.0074;
const T physRho = 998.;             //993.;
const T latticeWallDistance = 0.001;
const T adaptedPhysSimulatedLength = phsyRefL; //-(2.*latticeWallDistance/T(N+2*latticeWallDistance));
const T maxPhysT = 2.;              //pow(phsyRefL,2.)*pow((Re*charPhysNu),-1.0);
T meanVelo_inlet = 0.0;
bool IsInletCharPhysU = false;
// Stores data from the STL file in the geometry in the form of material numbers
void prepareGeometry( UnitConverter<T,DESCRIPTOR> const& converter, IndicatorF3D<T>& indicator,
                      STLreader<T>& stlReader, SuperGeometry<T,3>& superGeometry )
{
  OstreamManager clout( std::cout,"prepareGeometry" );
  clout << "Prepare Geometry ..." << std::endl;

  superGeometry.rename( 0,2,indicator );
  superGeometry.rename( 2,1,stlReader );
  superGeometry.clean();

  // Set material number for the outflow
  IndicatorCircle3D<T> outflow( 0.025,0.025,0.0725, 0.,0.,1., 0.01 ); //[0, 5.547, -25] mm
  IndicatorCylinder3D<T> layerOutflow( outflow, 2.*converter.getConversionFactorLength() );
  superGeometry.rename( 2,4,1,layerOutflow );

  // Set material number for the inflow
  IndicatorCircle3D<T> inflow( -0.025,-0.025,0.0725, 0.,0.,1., 0.01 ); //[0, -5.547, 55] mm
  IndicatorCylinder3D<T> layerInflow( inflow, 2.*converter.getConversionFactorLength() );
  superGeometry.rename( 2,3,1,layerInflow );

  superGeometry.clean(1);
  superGeometry.innerClean();
  superGeometry.outerClean();
  superGeometry.checkForErrors();
  superGeometry.print();

  clout << "Prepare Geometry ... OK" << std::endl;
}
void prepareLattice( SuperLattice<T, DESCRIPTOR>& lattice,
                     UnitConverter<T,DESCRIPTOR> const& converter,
                     STLreader<T>& stlReader, SuperGeometry<T,3>& superGeometry )
{
  OstreamManager clout( std::cout,"prepareLattice" );
  clout << "Prepare Lattice ..." << std::endl;

  const T omega = converter.getLatticeRelaxationFrequency();

  // material=1 --> bulk dynamics
  lattice.defineDynamics<BulkDynamics>(superGeometry, 1);
  lattice.setParameter<collision::LES::Smagorinsky>(Cs); // values up to 0.18 are common, up to 0.4 possible

  // material=2 --> no dynamics + Bouzidi zero velocity
  setBouzidiBoundary<T,DESCRIPTOR>(lattice, superGeometry, 2, stlReader);

  // material=3 --> no dynamics + Bouzidi velocity (inflow)
  setBouzidiBoundary<T,DESCRIPTOR,BouzidiVelocityPostProcessor>(lattice, superGeometry, 3, stlReader);

  // material=4 --> bulk dynamics + pressure (outflow)
  lattice.defineDynamics<BulkDynamics>(superGeometry.getMaterialIndicator({4}));
  setInterpolatedPressureBoundary<T,DESCRIPTOR>(lattice, omega, superGeometry.getMaterialIndicator({4}));

  // The inlet velocity is set every iTupdate steps in setBoundaryValues();
  // defineU would require the poisseuilleU functor, which is commented out:
  //PoiseulleInletCirclePoiseuille3D<T> poisseuilleU( superGeometry,3,converter.getCharLatticeVelocity(),converter.getConversionFactorLength() );
  //lattice.defineU(superGeometry,3,poisseuilleU);

  AnalyticalConst3D<T,T> rhoF( 1 );
  lattice.setParameter<descriptors::OMEGA>(omega);
  lattice.initialize();

  clout << "Prepare Lattice ... OK" << std::endl;
}

// Generates a slowly increasing sinusoidal inflow
void setBoundaryValues( SuperLattice<T, DESCRIPTOR>& sLattice,
                        UnitConverter<T,DESCRIPTOR> const& converter, int iT,
                        SuperGeometry<T,3>& superGeometry )
{
  int iTmaxStart = converter.getLatticeTime( maxPhysT*0.5 );
  int iTperiod = converter.getLatticeTime( 0.5 );
  int iTupdate = 50;
  T maxUphys = physU*-1; //sLattice.getStatistics().getMaxU()*converter.getConversionFactorVelocity();

  if ( iT%iTupdate == 0 && iT <= iTmaxStart ) {
    // Smooth start curve
    //SinusStartScale<T,int> nSinusStartScale( iTperiod,converter.getCharLatticeVelocity() );
    PolynomialStartScale<T,int> startScale( iTmaxStart, T(1) );
    int iTvec[1] = {iT};
    T frac[1] = {};
    startScale( frac,iTvec );

    // Creates and sets the Poiseuille inflow profile via a functor
    T meanVelocity = frac[0]*converter.getCharLatticeVelocity();
    CirclePoiseuille3D<T> velocity( true,superGeometry,3,meanVelocity,T(),T(1) ); //converter.getConversionFactorLength()
    setBouzidiVelocity(sLattice, superGeometry, 3, velocity);
    sLattice.setProcessingContext<Array<descriptors::BOUZIDI_VELOCITY>>(ProcessingContext::Simulation);
  }
}

// Computes flux at inflow and outflow
void getResults( SuperLattice<T, DESCRIPTOR>& sLattice,
                 UnitConverter<T,DESCRIPTOR> const& converter, int iT,
                 SuperGeometry<T,3>& superGeometry, util::Timer<T>& timer, STLreader<T>& stlReader )
{
  OstreamManager clout( std::cout,"getResults" );
  const int vtkIter  = converter.getLatticeTime( .5 );
  const int statIter = converter.getLatticeTime( .5 );

  if ( iT==0 ) {
    SuperVTMwriter3D<T> vtmWriter( "HX_70_101010" );
    // Writes the geometry, cuboid no. and rank no. as vti files for visualization
    SuperLatticeGeometry3D<T, DESCRIPTOR> geometry( sLattice, superGeometry );
    SuperLatticeCuboid3D<T, DESCRIPTOR> cuboid( sLattice );
    SuperLatticeRank3D<T, DESCRIPTOR> rank( sLattice );
    vtmWriter.write( geometry );
    vtmWriter.write( cuboid );
    vtmWriter.write( rank );
    vtmWriter.createMasterFile();
  }

  // Writes the vtk files
  if ( iT%vtkIter==0 ) {
    sLattice.setProcessingContext(ProcessingContext::Evaluation);
    sLattice.scheduleBackgroundOutputVTK([&,iT](auto task) {
      SuperVTMwriter3D<T> vtmWriter( "HX_70_101010" );
      SuperLatticePhysVelocity3D velocity( sLattice, converter );
      SuperLatticePhysPressure3D pressure( sLattice, converter );
      vtmWriter.addFunctor( velocity );
      vtmWriter.addFunctor( pressure );
      task( vtmWriter, iT );
    });
  }

  // Writes output on the console
  if ( iT%statIter==0 ) {
    // Timer console output
    timer.update( iT );
    timer.printStep();

    // Lattice statistics console output
    sLattice.getStatistics().print( iT, converter.getPhysTime( iT ) );

    // Flux at the inflow and outflow regions
    std::vector<int> materials = { 1, 3, 4 };

    IndicatorCircle3D<T> outflow( 0.025,0.025,0.0725, 0.,0.,1., 0.01 );
    SuperPlaneIntegralFluxVelocity3D<T> vFluxOutflow( sLattice, converter, superGeometry, outflow, materials, BlockDataReductionMode::Discrete );
    vFluxOutflow.print( "outflow","m/s" );

    IndicatorCircle3D<T> inflow( -0.025,-0.025,0.0725, 0.,0.,-1., 0.01 );
    SuperPlaneIntegralFluxVelocity3D<T> vFluxInflow( sLattice, converter, superGeometry, inflow, materials, BlockDataReductionMode::Discrete );
    vFluxInflow.print( "inflow0","m/s" );

    // Mean inlet velocity = flux / area
    int input_velo[1] = {};
    T output_velo[vFluxInflow.getTargetDim()];
    vFluxInflow( output_velo, input_velo );
    meanVelo_inlet = output_velo[0] / output_velo[1];
    clout << "Meanvelocity_Inlet [m/s]: " << meanVelo_inlet << std::endl;

    // Pressure drop between inlet and outlet
    SuperPlaneIntegralFluxPressure3D<T> inlet_pressure( sLattice, converter, superGeometry, inflow, materials, BlockDataReductionMode::Discrete );
    SuperPlaneIntegralFluxPressure3D<T> outlet_pressure( sLattice, converter, superGeometry, outflow, materials, BlockDataReductionMode::Discrete );
    inlet_pressure.print( "inlet_pressure","Pa" );
    outlet_pressure.print( "outlet_pressure","Pa" );

    int input_pressureInlet[1] = {};
    T output_pressure_inlet[inlet_pressure.getTargetDim()];
    inlet_pressure( output_pressure_inlet, input_pressureInlet );
    T meanPressure_inlet = util::abs( output_pressure_inlet[0] / output_pressure_inlet[1] );

    int input_pressureOutlet[1] = {};
    T output_pressure_outlet[outlet_pressure.getTargetDim()];
    outlet_pressure( output_pressure_outlet, input_pressureOutlet );
    T meanPressure_outlet = util::abs( output_pressure_outlet[0] / output_pressure_outlet[1] );

    T pressureDrop = meanPressure_inlet - meanPressure_outlet;
    clout << "pressure-drop [Pa]: " << pressureDrop << std::endl;

    SuperLatticeYplus3D<T, DESCRIPTOR> yPlus( sLattice, converter, superGeometry, stlReader, 3 );
    SuperMax3D<T> yPlusMaxF( yPlus, superGeometry, 1 );
    int input[4] = {};
    T yPlusMax[1];
    yPlusMaxF( yPlusMax, input );
    clout << "yPlusMax=" << yPlusMax[0] << std::endl;
  }

  // uMax must not exceed 0.3 (Mach limit), otherwise the results cannot be used without further checks
  if ( sLattice.getStatistics().getMaxU() > 0.3 ) {
    clout << "PROBLEM uMax=" << sLattice.getStatistics().getMaxU() << std::endl;
    std::exit(0);
  }
}
int main( int argc, char* argv[] )
{
  // === 1st Step: Initialization ===
  olbInit( &argc, &argv );
  singleton::directories().setOutputDir( "./HX_70_101010/" );
  OstreamManager clout( std::cout,"main" );

  UnitConverterFromResolutionAndRelaxationTime<T, DESCRIPTOR> const converter(
    int {N},                        // resolution: number of voxels per charPhysL
    (T) 0.5001,                     // lattice relaxation time
    (T) adaptedPhysSimulatedLength, // charPhysLength: reference length of the simulation geometry
    (T) physU,                      // charPhysVelocity
    (T) charPhysNu,                 // kinematic viscosity
    (T) physRho                     // density in kg/m^3 (water)
  );
  // Prints the converter log as console output
  converter.print();
  // Writes the converter log to a file
  converter.write("Test");

  // === 2nd Step: Prepare Geometry ===
  // Instantiation of the STLreader class:
  // file name, voxel size in meter, stl unit in meter, outer voxel no., inner voxel no.
  STLreader<T> stlReader( "HX.stl", converter.getConversionFactorLength(), 0.001, 0, true );
  IndicatorLayer3D<T> extendedDomain( stlReader, converter.getConversionFactorLength() );

  // Instantiation of a cuboidGeometry with weights
  const int noOfCuboids = util::min( 16*N, 8*singleton::mpi().getSize() );
  CuboidGeometry3D<T> cuboidGeometry( extendedDomain, converter.getConversionFactorLength(), noOfCuboids, "volume" );

  // Instantiation of a loadBalancer
  HeuristicLoadBalancer<T> loadBalancer( cuboidGeometry );

  // Instantiation of a superGeometry
  SuperGeometry<T,3> superGeometry( cuboidGeometry, loadBalancer );
  prepareGeometry( converter, extendedDomain, stlReader, superGeometry );

  // === 3rd Step: Prepare Lattice ===
  SuperLattice<T, DESCRIPTOR> sLattice( superGeometry );
  util::Timer<T> timer1( converter.getLatticeTime( maxPhysT ), superGeometry.getStatistics().getNvoxel() );
  timer1.start();
  prepareLattice( sLattice, converter, stlReader, superGeometry );
  timer1.stop();
  timer1.printSummary();

  // === 4th Step: Main Loop with Timer ===
  clout << "starting simulation..." << std::endl;
  util::Timer<T> timer( converter.getLatticeTime( maxPhysT ), superGeometry.getStatistics().getNvoxel() );
  timer.start();

  for ( std::size_t iT = 0; iT <= converter.getLatticeTime( maxPhysT ); iT++ ) {
    // === 5th Step: Definition of Initial and Boundary Conditions ===
    setBoundaryValues( sLattice, converter, iT, superGeometry );
    // === 6th Step: Collide and Stream Execution ===
    sLattice.collideAndStream();
    // === 7th Step: Computation and Output of the Results ===
    getResults( sLattice, converter, iT, superGeometry, timer, stlReader );
  }
  timer.stop();
  timer.printSummary();
}

- Alex
June 3, 2024 at 4:57 pm #8760
aseidler (Participant)

I forgot to mention that my MPI is built with CUDA support:

mca:mpi:base:param:mpi_built_with_cuda_support:value:true
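For reference, that line can be reproduced with ompi_info; the grep pattern follows Open MPI's parsable output format:

# Ask Open MPI whether it was built with CUDA support
ompi_info --parsable --all | grep mpi_built_with_cuda_support
# -> mca:mpi:base:param:mpi_built_with_cuda_support:value:true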
June 4, 2024 at 1:40 pm #8761
Yuji (Participant)

Dear @aseidler,

Could you try mpirun with "--mca btl_smcuda_use_cuda_ipc 0"? For example:

mpirun -np 2 --mca btl_smcuda_use_cuda_ipc 0 bash -c 'export CUDA_VISIBLE_DEVICES=${OMPI_COMM_WORLD_LOCAL_RANK}; ./cavity3d'

We discussed a similar topic in https://www.openlb.net/forum/topic/multi-gpus-calculation/
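In batch scripts, the same MCA parameter can also be set through the environment, using Open MPI's standard OMPI_MCA_ prefix convention:

# Equivalent to passing --mca btl_smcuda_use_cuda_ipc 0 to mpirun
export OMPI_MCA_btl_smcuda_use_cuda_ipc=0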
June 18, 2024 at 10:55 am #8834
aseidler (Participant)

Dear Yuji,

I got the bug under control and the simulation now runs on multiple GPUs. It was a problem with the HPC at Dresden University of Technology: the UCX-CUDA module has to be loaded separately.

For my future colleagues using Dresden's HPC, here is what needs to be set up. You need to load the following modules:

ml release/23.04 GCC/11.3.0 OpenMPI/4.1.4 CUDA/11.7 UCX-CUDA

My configuration looks like this:
# Example of a build configuration for OpenLB 1.7 with CUDA and OpenMPI
CXX := nvcc -ccbin=mpicxx
CC := nvcc -ccbin=mpicc

CXXFLAGS := -O3
CXXFLAGS += -std=c++17

PARALLEL_MODE := MPI
#MPIFLAGS := -lmpi_cxx -lmpi

PLATFORMS := CPU_SISD GPU_CUDA

CUDA_ARCH := 70 # or 80 for Ampere

FLOATING_POINT_TYPE := float

USE_EMBEDDED_DEPENDENCIES := ON
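For completeness, the rebuild sequence after switching modules looks roughly like this (a sketch; job submission details depend on the cluster):

# Load the tool chain including UCX-CUDA, then rebuild the case
ml release/23.04 GCC/11.3.0 OpenMPI/4.1.4 CUDA/11.7 UCX-CUDA
make clean
make
# then launch with one rank per GPU as discussed above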