Array jobs


PBS array jobs are a convenient way of running many computations with a single script. They are particularly suitable for jobs which need to be run many times, but with different parameters or on different input files. To submit an array job, you need to include this PBS directive in your PBS script:

#PBS -J start_index-end_index:increment

For example:

#PBS -J 1-10

will start an array job with 10 elements. Each element is assigned a single value of $PBS_ARRAY_INDEX from 1 to 10. The array indices must be integers. Another example is:

#PBS -J 1-10:2

This directive will start an array job containing 5 elements, with values of $PBS_ARRAY_INDEX of 1, 3, 5, 7, and 9.
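
To check which indices a range will produce before submitting, the same start, end, and increment can be passed to the standard seq utility. This is a minimal local sketch, not a PBS command; the variable names start, end, and step are illustrative only.

```shell
#!/bin/bash
# Preview the indices that "#PBS -J 1-10:2" would generate.
# start, end and step are illustrative names, not PBS variables.
start=1
end=10
step=2

# seq FIRST INCREMENT LAST prints one index per line;
# tr joins them with spaces for a compact display.
indices=$(seq "$start" "$step" "$end" | tr '\n' ' ')
indices=${indices% }   # drop the trailing space left by tr

# Count the elements the array job would contain.
count=$(seq "$start" "$step" "$end" | wc -l | tr -d ' ')

echo "Indices: $indices"   # Indices: 1 3 5 7 9
echo "Elements: $count"    # Elements: 5
```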

Two common uses of array jobs are to run a program multiple times with different input parameters or different input files. Some example array job PBS scripts are shown below.

Single Parameter Change

This script will start an array job with 10 elements. Each element will be assigned a different value of $PBS_ARRAY_INDEX from 1 to 10. In this case, $PBS_ARRAY_INDEX is passed directly as a command line argument to the program called my_program, and each element writes to its own output file.

#!/bin/bash
#PBS -P PANDORA
#PBS -l select=1:ncpus=1:mem=4GB
#PBS -l walltime=10:20:10
#PBS -J 1-10

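# Run from the directory the job was submitted from
cd "$PBS_O_WORKDIR"
# Each element writes to its own output file, e.g. output.1 ... output.10
my_program "$PBS_ARRAY_INDEX" > "output.$PBS_ARRAY_INDEX"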
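
Because the scheduler is what sets $PBS_ARRAY_INDEX, the body of the job can be dry-run locally by setting the variable by hand. The sketch below is one way to do that, assuming a stand-in shell function named my_program (hypothetical; it only echoes its argument) and a temporary directory in place of $PBS_O_WORKDIR.

```shell
#!/bin/bash
# Local dry run of the array job body; no scheduler is involved.
# my_program is a hypothetical stand-in for the real executable.
my_program() {
    echo "running with parameter $1"
}

workdir=$(mktemp -d)      # scratch directory instead of $PBS_O_WORKDIR
cd "$workdir" || exit 1

# Emulate the 10 array elements created by "#PBS -J 1-10".
for PBS_ARRAY_INDEX in $(seq 1 10); do
    my_program "$PBS_ARRAY_INDEX" > "output.$PBS_ARRAY_INDEX"
done

# One output file per element is expected.
count=$(ls output.* | wc -l | tr -d ' ')
echo "output files: $count"
cat output.3
```

If the loop produces the expected files locally, the same two lines of job body can be submitted with the #PBS -J directive in place of the loop.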

Multiple Parameter Changes

The script in this section runs the program my_program with two parameters. Because $PBS_ARRAY_INDEX holds only a single integer, you will need a lookup table to convert each index into multiple parameters.

In the following example, all parameters required for the array job are stored in a file called job_params, one line per array element. Each element reads the line whose number matches its $PBS_ARRAY_INDEX, and then passes the parameters on that line to my_program.

An example job_params file:

1 0.1
2.4524 0.2
3.45 0.3
4.4 0.4

and an example PBS script called array-script.pbs:

#!/bin/bash
#PBS -P PANDORA
#PBS -l select=1:ncpus=1:mem=4GB
#PBS -l walltime=15:20:10
#PBS -J 1-4

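cd "$PBS_O_WORKDIR"
# Print line number $PBS_ARRAY_INDEX of job_params
params=$(sed "${PBS_ARRAY_INDEX}q;d" job_params)
# Split the line on whitespace into a bash array
read -r -a param_array <<< "$params"
my_program "${param_array[0]}" "${param_array[1]}"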
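
The lookup step can be checked on its own, without the scheduler. The sketch below writes the example job_params file to a temporary directory, sets PBS_ARRAY_INDEX by hand, and extracts the matching line. It uses sed -n with the p command, which selects the same line as the "q;d" form; the names tmpdir and params_file are illustrative.

```shell
#!/bin/bash
# Check the parameter lookup locally. PBS_ARRAY_INDEX is set by hand here,
# whereas the scheduler sets it for each element of a real array job.
tmpdir=$(mktemp -d)
params_file="$tmpdir/job_params"
printf '%s\n' "1 0.1" "2.4524 0.2" "3.45 0.3" "4.4 0.4" > "$params_file"

PBS_ARRAY_INDEX=3

# Print only line number $PBS_ARRAY_INDEX of the file.
params=$(sed -n "${PBS_ARRAY_INDEX}p" "$params_file")

# Split the line on whitespace into a bash array.
read -r -a param_array <<< "$params"

echo "first=${param_array[0]} second=${param_array[1]}"   # first=3.45 second=0.3
```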

Multiple Input Files

Array jobs are also commonly used to run the same program on multiple input files. The example here runs a computation on every file in a directory. The script first saves a list of the files in the directory to a file called file_list, then counts the entries in that list and stores the count in a variable called num_files. Next, it writes a PBS array script called run_array_job to disk; this script runs my_program once per array element, using the file on the matching line of file_list as input. Finally, it submits run_array_job to the scheduler with the qsub command.

#!/bin/bash

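# Remove any list left over from a previous run (-f: no error if absent)
rm -f file_list
# List regular files here, excluding the two files this script itself creates
find . -maxdepth 1 -type f ! -name file_list ! -name run_array_job | sed 's|^\./||' > file_list
num_files=$(wc -l < file_list | tr -d ' ')

# Write the array job script. An escaped \$ is expanded when the job runs;
# ${num_files} is expanded now, while the script is being written.
cat > run_array_job << EOF
#!/bin/bash
#PBS -P PANDORA
#PBS -l select=1:ncpus=1:mem=4GB
#PBS -l walltime=15:20:10
#PBS -J 1-${num_files}

cd "\$PBS_O_WORKDIR"
filename=\$(sed "\${PBS_ARRAY_INDEX}q;d" file_list)
my_program < "\$filename" > "\${filename}.output"
EOF
qsub run_array_job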
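
Before submitting, it can help to see which file each array index would pick up. The following is a minimal dry run in a scratch directory, under a few assumptions: the input file names are made up for illustration, file_list is excluded from the search so the list does not contain itself, and the list is sorted so the index-to-file mapping is stable between runs. Nothing is submitted to the scheduler.

```shell
#!/bin/bash
# Dry run of the file-list step in a scratch directory; nothing is submitted.
# The input file names below are made up for illustration.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch sample_a.dat sample_b.dat sample_c.dat

# List regular files, excluding the list itself, sorted for a stable order.
find . -maxdepth 1 -type f ! -name file_list | sed 's|^\./||' | sort > file_list
num_files=$(wc -l < file_list | tr -d ' ')
echo "num_files=$num_files"

# Show which file each array index would process.
for PBS_ARRAY_INDEX in $(seq 1 "$num_files"); do
    filename=$(sed -n "${PBS_ARRAY_INDEX}p" file_list)
    echo "index $PBS_ARRAY_INDEX -> $filename"
done
```

If the printed mapping includes files that should not be processed, such as scripts or outputs from earlier runs, they can be filtered out with additional ! -name tests before the job is submitted.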