Artemis uses a modified version of PBS Pro to manage and queue jobs. Some minimal example scripts are provided below.
To submit these scripts to the job scheduler, save them to a text file and submit them using the `qsub` command.
An example single core script:
```shell
#!/bin/bash
#PBS -P PANDORA
#PBS -l select=1:ncpus=1:mem=4GB
#PBS -l walltime=10:00:00

cd "$PBS_O_WORKDIR"
my_program > results.out
```
Save this script to a file (for example `MyScript.pbs`), then submit it to the PBS scheduler using `qsub`:
```shell
[abcd1234@login1 ~] qsub MyScript.pbs
```
See the following table for a line-by-line explanation of the above PBS script.
Line | Explanation
---|---
`#!/bin/bash` | Your shell. This is safe to use as-is.
`#PBS -P PANDORA` | Your short project name, as specified in your DashR project. Replace PANDORA with your short project name.
`#PBS -l select=1:ncpus=1:mem=4GB` | Select one chunk of compute resources with 1 core and 4 GB of memory.
`#PBS -l walltime=10:00:00` | The length of time to reserve the requested compute resources.
`cd "$PBS_O_WORKDIR"` | Changes directory to where you ran `qsub`.
`my_program > results.out` | Any commands you need to run. Replace this line with the commands you require.
There are other `#PBS` directives that you may optionally use. For a full list, see the PBS Professional user manual.
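For illustration, a few commonly used optional directives are sketched below. The email address and job name are placeholders, and the queue name `defaultQ` is an assumption; consult the PBS Professional user manual for the authoritative list.

```shell
#PBS -N MyJobName           # name your job (shown in qstat output)
#PBS -j oe                  # merge standard error into standard output
#PBS -M you@example.com     # placeholder: email address for notifications
#PBS -m abe                 # email on job abort, begin, and end
#PBS -q defaultQ            # assumption: name of the default queue
```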
OpenMP is a shared memory parallelism model, so OpenMP parallelised programs can only be run on one chunk (a “chunk” is, effectively, a compute node) with up to 24, 32 or 64 cores, depending on the node.
```shell
#!/bin/bash
#PBS -P PANDORA
#PBS -l select=1:ncpus=4:mem=4GB
#PBS -l walltime=10:00:00

cd "$PBS_O_WORKDIR"
my_program > results.out
```
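Most OpenMP programs decide how many threads to start from the `OMP_NUM_THREADS` environment variable, so it is worth setting it to match the `ncpus` you requested. The sketch below assumes PBS exports `NCPUS` into the job environment with the requested core count; if it does not on your system, set the value by hand.

```shell
# Assumption: PBS exports NCPUS with the ncpus value you requested.
# Fall back to 4 (the value requested above) if NCPUS is unset.
export OMP_NUM_THREADS="${NCPUS:-4}"
echo "OpenMP will use $OMP_NUM_THREADS threads"
```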
MPI (Message Passing Interface) allows jobs to communicate across several nodes. A basic MPI job script is shown below.
```shell
#!/bin/bash
#PBS -P PANDORA
#PBS -l select=1:ncpus=4:mpiprocs=4:mem=4GB
#PBS -l walltime=10:00:00

cd "$PBS_O_WORKDIR"
module load mpich
module load siesta
mpirun -np 4 siesta < h2o.fdf > h2o.out
```
The above job script requests one chunk of compute resources with 4 cores and 4 GB of memory. All 4 cores are made available to MPI, as specified by the `mpiprocs=4` option. It is recommended to set `ncpus` and `mpiprocs` to the same value, unless you know your program can use them differently.
Load the correct MPI implementation for your program. Most modules automatically load the appropriate MPI implementation. However, you can load your own MPI implementation if you are compiling your own MPI programs, or if you know an alternative implementation works for your program.
With MPI, you can request more than one chunk of compute resources:
```shell
#!/bin/bash
#PBS -P PANDORA
#PBS -l select=5:ncpus=8:mpiprocs=8:mem=4GB
#PBS -l walltime=10:00:00

cd "$PBS_O_WORKDIR"
module load mpich
module load siesta
mpirun -np 40 siesta < h2o.fdf > h2o.out
```
This script requests 5 chunks, each with 8 cores (all available to MPI) and 4 GB of memory, so your job will be allocated 40 cores and 20 GB of memory in total.
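As a quick sanity check, the job's totals follow directly from the `select` line; requested resources multiply per chunk:

```shell
# Values taken from select=5:ncpus=8:mem=4GB in the script above
chunks=5
ncpus_per_chunk=8
mem_per_chunk_gb=4

total_cores=$((chunks * ncpus_per_chunk))
total_mem_gb=$((chunks * mem_per_chunk_gb))
echo "total cores: $total_cores, total memory: ${total_mem_gb}GB"   # -> 40 cores, 20GB
```

Note that `mpirun -np 40` in the script matches this total core count.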
To run a job requesting a GPU, request ngpus=1 in your select statement. Your job will be automatically routed to the GPU queue.
```shell
#!/bin/bash
#PBS -P Project
#PBS -l select=1:ncpus=9:ngpus=1:mem=30GB
#PBS -l walltime=10:00:00
#PBS -j oe

module load cuda/9.1.85
cd "$PBS_O_WORKDIR"
/path/to/my_cuda_executable arg1 arg2 > output.log
```
If you have a GPU program that has been compiled with MPI and CUDA directives on Artemis, you can run this program using more than 4 GPUs across multiple nodes. Programs that would benefit from having more than 4 GPUs are ones whose bottleneck is computation on the GPUs themselves. If your bottleneck is GPU-GPU communication, you will achieve better performance by using 4 GPUs on a single compute node.
The below script will request 8 GPUs across two compute nodes. Note that `mpiprocs` is set to the number of GPUs per node, not the number of CPUs per node, and that we need to use a CUDA-aware MPI library (openmpi-gcc/3.0.0-cuda in this case).
```shell
#!/bin/bash
#PBS -P Project
#PBS -l select=2:ncpus=36:ngpus=4:mpiprocs=4:mem=185GB
#PBS -l walltime=10:00:00
#PBS -j oe

module load openmpi-gcc/3.0.0-cuda
module load cuda/9.1.85
cd "$PBS_O_WORKDIR"
mpirun -np 8 /path/to/my_cuda_executable arg1 arg2 > output.log
```
You can only use GPUs assigned to your job. Attempts to use GPUs not assigned to your job will fail. If your job cannot identify available GPUs on its own, the following information may help identify available GPUs.
```shell
# Count the GPUs assigned to this job and build a matching
# CUDA_VISIBLE_DEVICES list (e.g. "0,1,2,3" for 4 GPUs):
NGPUS=$(grep -c '^c 195:[0-9] rwm' "/cgroup/devices/pbspro/$PBS_JOBID/devices.list")
CUDA_VISIBLE_DEVICES=$(seq -s, 0 $((NGPUS - 1)))
```

Alternatively, list the GPU device files assigned to your job:

```shell
# Translate each "c 195:N rwm" entry into its /dev/nvidiaN device path:
GPUDEVS=$(sed -n 's#^c 195:\([0-9]\) rwm$#/dev/nvidia\1#g p' "/cgroup/devices/pbspro/$PBS_JOBID/devices.list")
```
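As a sketch of how these snippets behave, the following simulates the `devices.list` a job granted two GPUs would see. The temporary file and its contents are illustrative stand-ins for the real cgroup file, which only exists inside a running PBS job.

```shell
# Mock devices.list resembling what PBS writes for a 2-GPU job
tmpfile=$(mktemp)
printf 'c 195:0 rwm\nc 195:1 rwm\n' > "$tmpfile"

# Same parsing as above, pointed at the mock file
NGPUS=$(grep -c '^c 195:[0-9] rwm' "$tmpfile")
CUDA_VISIBLE_DEVICES=$(seq -s, 0 $((NGPUS - 1)))
GPUDEVS=$(sed -n 's#^c 195:\([0-9]\) rwm$#/dev/nvidia\1#g p' "$tmpfile")

echo "$CUDA_VISIBLE_DEVICES"   # -> 0,1
echo "$GPUDEVS"                # -> /dev/nvidia0 and /dev/nvidia1, one per line
rm -f "$tmpfile"
```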