
Submitting a Job with SLURM

The workload manager adopted on the COKA cluster is SLURM. The suggested way to submit a job with SLURM is to prepare a job submission file, also known as a job file or submission script. The main purpose of this file is to ask the job scheduler for resources and to indicate the program to be run. It can also be used to set environment variables, compile your code, etc.

It is recommended to ask only for the resources you actually need; asking for more will likely make your job wait longer before being scheduled.

After creating the job submission file (e.g. my_job_file.slurm), you can submit it with the following command:

sbatch my_job_file.slurm
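
If the submission succeeds, sbatch replies with the numeric ID assigned to the job, for example (the ID below is just a placeholder):

Submitted batch job 12345

This ID is the value substituted for %j in the output and error file names used in the examples below.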

In this file all the lines starting with #SBATCH will be evaluated by SLURM, while all the other lines will be interpreted as in an ordinary shell script.

Remember that all the environment variables set in your shell when you submit the job file will be inherited by the environment where your program runs, unless you specify #SBATCH --export=NONE. See the example below.
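
For instance, a minimal sketch of a job file that starts from a clean environment (the job name and program are placeholders):

#!/bin/bash
#SBATCH --job-name=clean_env_job
#SBATCH --ntasks=1
#SBATCH --export=NONE

# Nothing is inherited from the submitting shell, so set here whatever the program needs
export OMP_NUM_THREADS=1

./my_program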

To query the status of your jobs you can use this SLURM command:

squeue
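
For instance, to show only your own jobs, or a single job given its ID (the ID below is just a placeholder):

squeue -u $USER
squeue -j 12345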

Examples

After logging into the front-end node, as described in Users Login, you can create in your home directory a job submission file for each of your programs, such as the ones reported in the examples below.

The examples below assume you are submitting the job from the same directory your program is located in; otherwise you need to give its full path.

By default, all jobs will run on the shortrun partition, which is limited to 30 minutes. If needed, a different partition has to be specified explicitly.
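
You can list the partitions available to your user, together with their time limits, using the sinfo command; to request a partition other than the default, add the corresponding directive to your job file (longrun, used in the GPU example below, is shown here only as an illustration):

#SBATCH --partition=longrun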

Running an OpenMP job on one node

In this script we request to launch 1 task (1 process), which will use 4 cores (4 threads).

#!/bin/bash
#SBATCH --job-name=my_job_name
#SBATCH --error=my_job_name-%j.err
#SBATCH --output=my_job_name-%j.out
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=100
 
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
 
./my_program

Running a 4-task MPI job on one node

In this script we request 4 tasks, all of them on the same node. We also load the openmpi module.

Remember that if my_program is not an MPI program, ntasks independent instances of it (i.e. 4 in this case) will be run.

#!/bin/bash
#SBATCH --job-name=my_job_name
#SBATCH --error=my_job_name-%j.err
#SBATCH --output=my_job_name-%j.out
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=4
 
module load openmpi
 
srun ./my_program

Running a 32-task MPI job on two nodes

In this script we request 32 tasks, 16 of them on each node (i.e. we are requesting 2 nodes). We also load the openmpi module.

#!/bin/bash
#SBATCH --ntasks=32
#SBATCH --ntasks-per-node=16
 
module load openmpi
 
srun ./my_program
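
Equivalently, the allocation can be expressed in terms of nodes instead of the total number of tasks (a minimal variant of the directives above; srun will still launch 2 x 16 = 32 tasks):

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16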

Running a 16-task MPI job using 16 GPUs on one node

In this script we request 16 GPUs and 16 tasks, all of them on the same node. Moreover, we request the job to be enqueued in the longrun partition. We also load the cuda and openmpi modules.

Remember that SLURM decides which GPUs to reserve for you, so it is your program's duty to select the correct device IDs; otherwise, the GPUs may not be accessible. The list of the reserved device IDs is stored in the $CUDA_VISIBLE_DEVICES environment variable. If you are developing your own software, we suggest reading this variable from your code (e.g. in C using the getenv() function).

#!/bin/bash
#SBATCH --job-name=gpu-test
#SBATCH --error=gpu-test-%j.err
#SBATCH --output=gpu-test-%j.out
#SBATCH --ntasks=16
#SBATCH --ntasks-per-node=16
#SBATCH --partition=longrun
#SBATCH --gres=gpu:16
 
module load cuda
module load openmpi
 
srun ./my_program
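
As a quick check, the reserved device IDs can also be printed from within the job script itself, just before launching the program (a minimal sketch):

# Show which GPU IDs SLURM reserved for this job
echo "CUDA_VISIBLE_DEVICES = $CUDA_VISIBLE_DEVICES"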

Additional Examples

Meaning of the most common options

  • Specify the number of tasks to be spawned by the application; one can think of a task as a process (e.g. an MPI process):
#SBATCH --ntasks=$(NUMBER_OF_TASKS)
  • Specify how many cores will be used to run each task (process); for example, imagine your application uses pthreads and each process spawns 2 threads. Then, if you want to map each thread to a different core, you should set this value to 2:
#SBATCH --cpus-per-task=$(CPUS_PER_TASK)
  • Specify the partition where to run the job. You can check which partitions are available for your user with the sinfo command:
#SBATCH --partition=$(PARTITION)
  • Specify the name of the job:
#SBATCH --job-name=$(JOB_NAME)
  • Specify the file where the stderr will be redirected; if not set, stderr will be redirected to stdout:
#SBATCH --error=err/$(JOB_NAME)-%j.err
  • Specify the file where the stdout will be redirected; if not set, stdout will be redirected to a file called slurm-$(JOB_ID).out, placed in the folder from which you submitted the job:
#SBATCH --output=out/$(JOB_NAME)-%j.out
  • Specify the directory where your job script will start its execution:
#SBATCH --workdir=$(PATH_TO_BINARIES)
  • Specify to SLURM the number of GPUs needed to run your application (e.g. OpenCL or CUDA applications):
#SBATCH --gres=gpu:$(NUMBER_OF_GPUs)
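
Putting the options above together, a generic job file template might look like the following (all values are placeholders to be adapted to your application; here 4 MPI tasks with 2 cores each are requested on the default shortrun partition):

#!/bin/bash
#SBATCH --job-name=my_job_name
#SBATCH --error=my_job_name-%j.err
#SBATCH --output=my_job_name-%j.out
#SBATCH --partition=shortrun
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=100

module load openmpi

# One thread per core reserved for each task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

srun ./my_program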