users:slurm

Differences

This shows you the differences between two versions of the page.

users:slurm [2016/04/01 13:28] – [Examples] ecalore → users:slurm [2016/11/07 13:56] (current) – [Running 16 MPI job using 16 GPUs on one node] ecalore
Line 68:
 #SBATCH --error=my_job_name-%j.err
 #SBATCH --output=my_job_name-%j.out
-#SBATCH --ntasks 4
+#SBATCH --ntasks=4
 #SBATCH --ntasks-per-node=4
  
Line 82:
 <code bash>
 #!/bin/bash
-#SBATCH --ntasks 32
+#SBATCH --ntasks=32
 #SBATCH --ntasks-per-node=16
  
Line 95:
 We also load the cuda and openmpi [[users:software|modules]].
  
-Remember that SLURM will decide which GPUs to reserve for you; it is therefore your program's duty to select the correct device IDs, otherwise the GPUs cannot be accessed. The list of the reserved device IDs is in the //CUDA_VISIBLE_DEVICES// environment variable. In this example we pass this variable as an argument to our program.
+Remember that SLURM will decide which GPUs to reserve for you; it is therefore your program's duty to select the correct device IDs, otherwise the GPUs cannot be accessed. The list of the reserved device IDs is in the [[https://devblogs.nvidia.com/parallelforall/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/|$CUDA_VISIBLE_DEVICES]] environment variable. If you are developing your own software, we suggest reading this variable from your code (e.g. in C using the [[http://linux.die.net/man/3/getenv|getenv()]] function).
  
 <code bash>
Line 101:
 #SBATCH --error=gpu-test-%j.err
 #SBATCH --output=gpu-test-%j.out
-#SBATCH --ntasks 16
+#SBATCH --ntasks=16
 #SBATCH --ntasks-per-node=16
 #SBATCH --partition=longrun
Line 109:
 module load openmpi
  
-srun ./my_program $CUDA_VISIBLE_DEVICES
+srun ./my_program
 </code>
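
For example, a minimal C sketch of reading this variable with getenv(), as suggested above (only an illustration; how the device is actually selected depends on your program):

<code c>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* SLURM exports the reserved device IDs as a comma-separated list,
       e.g. "0,2"; if the variable is unset, no GPUs were reserved. */
    const char *visible = getenv("CUDA_VISIBLE_DEVICES");

    if (visible == NULL) {
        fprintf(stderr, "CUDA_VISIBLE_DEVICES is not set\n");
        return 1;
    }

    printf("Reserved GPU device IDs: %s\n", visible);

    /* A real program would now select its device (e.g. with
       cudaSetDevice()); note that the CUDA runtime renumbers the
       visible devices starting from 0. */
    return 0;
}
</code>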
+
+===== Additional Examples =====
+
+You can find additional generic examples here:
+
+[[https://www.hpc2n.umu.se/batchsystem/examples_scripts|High Performance Computing Center North]]
+
+[[http://www.ceci-hpc.be/slurm_faq.html|Consortium des Équipements de Calcul Intensif]]
  
 ===== Meaning of the most common options =====