The examples below assume you are submitting the job from the same directory your program is located in; otherwise you need to give its full path.
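
For reference, a job script like the ones below is usually submitted with //sbatch// and monitored with //squeue// (the script name here is only a placeholder):

<code bash>
# Submit the job script to SLURM (placeholder file name)
sbatch my_job_script.sh

# List your own pending and running jobs
squeue -u $USER
</code>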
  
<WRAP center round important 80%>
By default all jobs run on the //shortrun// [[users:partitions|partition]], which is limited to 30 minutes. If you need more time, you have to specify a different partition explicitly.
</WRAP>
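
For example, a longer job could be sent to another partition and given an explicit time limit with directives like the following (a sketch; //longrun// is the partition used in the GPU example below, and the time limit value is only illustrative):

<code bash>
#SBATCH --partition=longrun
#SBATCH --time=02:00:00
</code>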

==== Running an OpenMP job on one node ====

In this script we request a single task (one process), which will use 4 cores (4 threads).
  
<code bash>
#!/bin/bash
#SBATCH --job-name=my_job_name
#SBATCH --error=my_job_name-%j.err
#SBATCH --output=my_job_name-%j.out
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=100

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

./my_program
</code>
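
Here //my_program// is assumed to be an OpenMP-enabled executable. As a sketch, a C source file using OpenMP could be built with GCC before submission (the file name is hypothetical):

<code bash>
# -fopenmp enables OpenMP support in GCC
gcc -fopenmp -O2 -o my_program my_program.c
</code>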

==== Running 4 MPI jobs on one node ====

In this script we request 4 tasks, all of them on the same node.
We also load the openmpi [[users:software|module]].

<WRAP center round important 80%>
Remember that in this case, if //my_program// is not an MPI program, //ntasks// independent instances of it (i.e. 4 in this case) will be run.
</WRAP>

<code bash>
#!/bin/bash
#SBATCH --job-name=my_job_name
#SBATCH --error=my_job_name-%j.err
#SBATCH --output=my_job_name-%j.out
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=4

module load openmpi

srun ./my_program
</code>
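
As a quick illustration of the note above: replacing the program with a non-MPI command such as //hostname// simply runs it once per task, so with the 4 tasks requested above you get four identical lines of output.

<code bash>
# Prints the node's hostname 4 times, once per task
srun hostname
</code>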
  
==== Running 32 MPI jobs on two nodes ====

In this script we request 32 tasks, 16 of them on each node (i.e. we are requesting 2 nodes). We also load the openmpi [[users:software|module]].
  
<code bash>
#!/bin/bash
#SBATCH --ntasks=32
#SBATCH --ntasks-per-node=16

module load openmpi

srun ./my_program
</code>
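
If you prefer, the number of nodes can also be requested explicitly; the following directives are a sketch of an equivalent request:

<code bash>
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
</code>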
  
==== Running 16 MPI jobs using 16 GPUs on one node ====
  
In this script we request 16 GPUs and 16 tasks, all of them on the same node. Moreover, we request the job to be enqueued in the //longrun// [[users:partitions|partition]].
We also load the cuda and openmpi [[users:software|modules]].
  
Remember that SLURM decides which GPUs to reserve for you, so it is your program's responsibility to select the correct device IDs; otherwise the GPUs may not be accessible. The list of reserved device IDs is in the [[https://devblogs.nvidia.com/parallelforall/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/|$CUDA_VISIBLE_DEVICES]] environment variable. If you are developing your own software, we suggest reading this variable from your code (e.g. in C using the [[http://linux.die.net/man/3/getenv|getenv()]] function).
  
<code bash>
#!/bin/bash
#SBATCH --job-name=gpu-test
#SBATCH --error=gpu-test-%j.err
#SBATCH --output=gpu-test-%j.out
#SBATCH --ntasks=16
#SBATCH --ntasks-per-node=16
#SBATCH --partition=longrun
#SBATCH --gres=gpu:16

module load cuda
module load openmpi
  
srun ./my_program
</code>
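
As a minimal check, you can also print the reserved device IDs from inside the job script itself, for example right before launching your program:

<code bash>
# Shows the GPU device IDs that SLURM reserved for this job
echo "$CUDA_VISIBLE_DEVICES"
</code>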

===== Additional Examples =====

You can find additional generic examples here:

[[https://www.hpc2n.umu.se/batchsystem/examples_scripts|High Performance Computing Center North]]

[[http://www.ceci-hpc.be/slurm_faq.html|Consortium des Équipements de Calcul Intensif]]
  
===== Meaning of the most common options =====