Running examples on multiple GPUs
OpenLB – Open Source Lattice Boltzmann Code › Forums › on OpenLB › General Topics › Running examples on multiple GPUs
- This topic has 4 replies, 2 voices, and was last updated 8 months, 3 weeks ago by Danial.Khazaeipoul.
June 20, 2024 at 3:57 pm #8840 Danial.Khazaeipoul (Participant)
Dear community,
I am trying to run an example on a cluster with 2 GPUs allocated. However, I get the following error when I follow the instructions in the config file and run the example as follows:
mpirun -np 2 bash -c 'export CUDA_VISIBLE_DEVICES=${OMPI_COMM_WORLD_LOCAL_RANK}; ./risingBubble3d'
There are not enough slots available in the system to satisfy the 2
slots that were requested by the application:

  bash

Either request fewer slots for your application, or make more slots
available for use.

A "slot" is the Open MPI term for an allocatable unit where we can
launch a process. The number of slots available are defined by the
environment in which Open MPI processes are run:

  1. Hostfile, via "slots=N" clauses (N defaults to number of
     processor cores if not provided)
  2. The --host command line parameter, via a ":N" suffix on the
     hostname (N defaults to 1 if not provided)
  3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
  4. If none of a hostfile, the --host command line parameter, or an
     RM is present, Open MPI defaults to the number of processor cores

In all the above cases, if you want Open MPI to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.

Alternatively, you can use the --oversubscribe option to ignore the
number of available slots when deciding the number of processes to
launch.

Here is the nvidia-smi output:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-SXM4-40GB          Off |   00000000:01:00.0 Off |                    0 |
| N/A   30C    P0             56W /  400W |       0MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100-SXM4-40GB          Off |   00000000:C1:00.0 Off |                    0 |
| N/A   29C    P0             50W /  400W |       0MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

June 20, 2024 at 4:28 pm #8841 Adrian (Keymaster)
How are you scheduling this? (e.g., which parameters do you define in your SLURM script or similar?)
The error message itself explains the problem: MPI cannot find the two slots it needs (e.g., because only one task is requested in the scheduling script).
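Whether MPI sees one slot or two can be checked from inside the session before launching anything. The following is a minimal diagnostic sketch, assuming a SLURM-managed cluster: SLURM_JOB_ID, SLURM_NTASKS and SLURM_CPUS_ON_NODE are standard SLURM environment variables, but whether they are set in a VNC-based interactive session is site-specific, and the function name report_slots is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the resources visible to this shell session.
# Open MPI derives its "slots" from the resource manager (if one is
# detected) or, failing that, from the number of processor cores.
report_slots() {
  echo "SLURM_JOB_ID=${SLURM_JOB_ID:-unset}"
  echo "SLURM_NTASKS=${SLURM_NTASKS:-unset}"
  echo "SLURM_CPUS_ON_NODE=${SLURM_CPUS_ON_NODE:-unset}"
  # nproc counts only the CPUs this process is allowed to run on
  echo "usable cores: $(nproc)"
  # List the GPUs if the NVIDIA driver tools are installed
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,name --format=csv,noheader
  fi
}

report_slots
```

If this prints SLURM_NTASKS=1 or only one usable core, that would be consistent with the error above: two GPUs are visible, but only one MPI slot was granted.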
June 20, 2024 at 8:09 pm #8842 Danial.Khazaeipoul (Participant)
Currently, I am requesting an interactive allocation on the cluster via VNC, so I am not submitting the job through a SLURM script. Instead, I am running "mpirun" directly, as if I were on a local PC with two NVIDIA cards, as shown in the nvidia-smi output.
The cluster runs the Rocky Linux operating system.
June 20, 2024 at 8:13 pm #8844 Adrian (Keymaster)
You should also be able to request more than a single task for an interactive session. Alternatively, you could provide a hostfile or use oversubscription, as mentioned in the error message (however, I am not sure whether the latter option will actually use the cores if they were not requested).
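The hostfile route could look roughly like the sketch below. It writes a hostfile declaring two slots on the local node (Open MPI hostfile syntax "host slots=N") and reuses the per-rank GPU selection from the config-file command. It only launches if mpirun and the risingBubble3d binary are present. Assumptions: whether Open MPI honours such a hostfile inside a scheduler-managed session depends on the installation, and the salloc line in the comment assumes a SLURM cluster whose GPU resource is named "gpu", which varies between sites.

```shell
#!/usr/bin/env bash
# Sketch: offer Open MPI two slots on the local node via a hostfile.
# With SLURM, two tasks could instead be requested up front, e.g.:
#   salloc --ntasks=2 --gres=gpu:2   (the gres name is site-specific)

# Open MPI hostfile syntax: "<hostname> slots=<N>"
printf 'localhost slots=2\n' > hostfile

# Same per-rank GPU selection as the config-file command; single quotes
# keep ${OMPI_COMM_WORLD_LOCAL_RANK} unexpanded until each rank's bash runs
per_rank_gpu='export CUDA_VISIBLE_DEVICES=${OMPI_COMM_WORLD_LOCAL_RANK}; ./risingBubble3d'

if command -v mpirun >/dev/null 2>&1 && [ -x ./risingBubble3d ]; then
  mpirun --hostfile hostfile -np 2 bash -c "$per_rank_gpu"
  # Equivalent without a file: mpirun --host localhost:2 -np 2 bash -c "$per_rank_gpu"
else
  echo "mpirun or ./risingBubble3d not found; hostfile written to ./hostfile"
fi
```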
June 20, 2024 at 8:30 pm #8845 Danial.Khazaeipoul (Participant)
OK, with the command below, both GPUs are now 100% utilized:
mpirun --oversubscribe --bind-to none -np 1 -x CUDA_VISIBLE_DEVICES=0 ./risingBubble3d : -np 1 -x CUDA_VISIBLE_DEVICES=1 ./risingBubble3d
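The MPMD command above assigns one GPU to each rank by hand. A small wrapper script could do the same mapping automatically from the node-local rank, in the spirit of the config-file command. This allows a single "-np 2" launch that scales to more GPUs without editing the command line. This is a sketch: the file name gpu_wrapper.sh is hypothetical, and it relies on Open MPI setting OMPI_COMM_WORLD_LOCAL_RANK for every launched process.

```shell
#!/usr/bin/env bash
# gpu_wrapper.sh (hypothetical name): run the given command on the GPU whose
# index equals this process's node-local MPI rank.
# Open MPI sets OMPI_COMM_WORLD_LOCAL_RANK for each launched process;
# fall back to GPU 0 when the script is run outside mpirun.
export CUDA_VISIBLE_DEVICES="${OMPI_COMM_WORLD_LOCAL_RANK:-0}"

# Replace this shell with the real program (if one was given), so that
# signals sent by mpirun reach the solver directly.
if [ "$#" -gt 0 ]; then
  exec "$@"
fi
```

Usage, after chmod +x gpu_wrapper.sh: mpirun --oversubscribe --bind-to none -np 2 ./gpu_wrapper.sh ./risingBubble3d. The wrapper only replaces the per-rank -x settings; --oversubscribe (or a hostfile, or a two-task allocation) is still needed as long as MPI sees only one slot.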
Currently running:
[MpiManager] Sucessfully initialized, numThreads=2
[ThreadPool] Sucessfully initialized, numThreads=1
[Directories] Directory ./tmp/ created.
[Directories] Directory ./tmp/imageData/ created.
[Directories] Directory ./tmp/imageData/data/ created.
[Directories] Directory ./tmp/vtkData/ created.
[Directories] Directory ./tmp/vtkData/data/ created.
[Directories] Directory ./tmp/gnuplotData/ created.
[Directories] Directory ./tmp/gnuplotData/data/ created.
[UnitConverter] ----------------- UnitConverter information -----------------
[UnitConverter] -- Parameters:
[UnitConverter] Resolution: N= 40
[UnitConverter] Lattice velocity: latticeU= 0.00266056
[UnitConverter] Lattice relaxation frequency: omega= 1.99521
[UnitConverter] Lattice relaxation time: tau= 0.5012
[UnitConverter] Characteristical length(m): charL= 0.0261
[UnitConverter] Characteristical speed(m/s): charU= 1
[UnitConverter] Phys. kinematic viscosity(m^2/s): charNu= 9.80996e-05
[UnitConverter] Phys. density(kg/m^d): charRho= 1332
[UnitConverter] Characteristical pressure(N/m^2): charPressure= 0
[UnitConverter] Mach number: machNumber= 0.00460823
[UnitConverter] Reynolds number: reynoldsNumber= 266.056
[UnitConverter] Knudsen number: knudsenNumber= 1.73205e-05
[UnitConverter]
[UnitConverter] -- Conversion factors:
[UnitConverter] Voxel length(m): physDeltaX= 0.0006525
[UnitConverter] Time step(s): physDeltaT= 1.73602e-06
[UnitConverter] Velocity factor(m/s): physVelocity= 375.86
[UnitConverter] Density factor(kg/m^3): physDensity= 1332
[UnitConverter] Mass factor(kg): physMass= 3.70038e-07
[UnitConverter] Viscosity factor(m^2/s): physViscosity= 0.245249
[UnitConverter] Force factor(N): physForce= 80.1159
[UnitConverter] Pressure factor(N/m^2): physPressure= 1.88173e+08
[UnitConverter] -------------------------------------------------------------
[SuperGeometry3D] cleaned 0 outer boundary voxel(s)
[SuperGeometry3D] cleaned 0 inner boundary voxel(s)
[SuperGeometryStatistics3D] updated
[SuperGeometry3D] the model is correct!
[CuboidGeometry3D] ---Cuboid Stucture Statistics---
[CuboidGeometry3D] Number of Cuboids: 2
[CuboidGeometry3D] Delta (min): 0.0006525
[CuboidGeometry3D] (max): 0.0006525
[CuboidGeometry3D] Ratio (min): 0.800499
[CuboidGeometry3D] (max): 1.24922
[CuboidGeometry3D] Nodes (min): 41216400
[CuboidGeometry3D] (max): 41319441
[CuboidGeometry3D] Weight (min): 41216400
[CuboidGeometry3D] (max): 41319441
[CuboidGeometry3D] --------------------------------
[SuperGeometryStatistics3D] materialNumber=1; count=82329759; minPhysR=(0,0,0.0006525); maxPhysR=(0.2088,0.2088,0.521348)
[SuperGeometryStatistics3D] materialNumber=2; count=103041; minPhysR=(0,0,0); maxPhysR=(0.2088,0.2088,0)
[SuperGeometryStatistics3D] materialNumber=3; count=103041; minPhysR=(0,0,0.522); maxPhysR=(0.2088,0.2088,0.522)
[SuperGeometryStatistics3D] countTotal[1e6]=82.5358
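The geometry statistics in this log can be cross-checked with a few lines of shell arithmetic, which shows how the domain was split across the two GPUs. The sketch uses only numbers copied from the log. The 321 x 321 x 801 node grid is inferred, not printed: the extents 0.2088 m and 0.522 m divided by physDeltaX = 0.0006525 m give 320 and 800 cells, hence 321 and 801 nodes.

```shell
#!/usr/bin/env bash
# Numbers copied from the OpenLB log above
nodes_min=41216400      # smaller cuboid ("Nodes (min)")
nodes_max=41319441      # larger cuboid ("Nodes (max)")
material1=82329759      # materialNumber=1 (interior, 0 < z < 0.522)
material2=103041        # materialNumber=2 (plane z = 0)
material3=103041        # materialNumber=3 (plane z = 0.522)

# Node grid inferred from the extents and physDeltaX (see lead-in)
nx=321; ny=321; nz=801

echo "cuboid sum:   $((nodes_min + nodes_max))"
echo "material sum: $((material1 + material2 + material3))"
echo "grid nodes:   $((nx * ny * nz))"
echo "z-planes per cuboid: $((nodes_min / (nx * ny))) and $((nodes_max / (nx * ny)))"
```

All three totals come out as 82535841, matching countTotal[1e6]=82.5358. The two cuboids (one per GPU) hold 400 and 401 z-planes, so they differ by exactly one 321 x 321 plane, and the load is split almost evenly between the GPUs.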