Array jobs
PBS array jobs are a convenient way of running many computations with a single script. They are particularly suitable for jobs which need to be run many times, but with different parameters or on different input files. To submit an array job, you need to include this PBS directive in your PBS script:
#PBS -J start_index-end_index:increment
For example:
#PBS -J 1-10
will start an array job with 10 elements. Each job will be assigned a single value of $PBS_ARRAY_INDEX from 1 to 10. The array indices must be integers. Another example is:
#PBS -J 1-10:2
This directive will start an array job containing 5 elements, with values of $PBS_ARRAY_INDEX of 1, 3, 5, 7, and 9.
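To check which indices a range will produce before submitting, the start-end:increment rule described above can be reproduced locally with seq. This is only a sketch: the helper name array_indices is hypothetical and is not a PBS command.

```shell
#!/bin/bash
# Hypothetical helper (not part of PBS): list the indices that
# "#PBS -J start-end:increment" would generate.
array_indices() {
    local start=$1 end=$2 step=${3:-1}   # increment defaults to 1
    seq "$start" "$step" "$end"
}

array_indices 1 10 2   # prints 1 3 5 7 9, one per line
```

Counting the lines of output (for example with wc -l) gives the number of elements in the array job.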
Two common uses of array jobs are to run a program multiple times with different input parameters or different input files. Some example array job PBS scripts are shown below.
Single Parameter Change
This script will start an array job with 10 elements. Each element is assigned a different value of $PBS_ARRAY_INDEX from 1 to 10. In this case, $PBS_ARRAY_INDEX is used directly as a command line argument to the program called my_program, and each element writes its results to its own output file.
#!/bin/bash
#PBS -P PANDORA
#PBS -l select=1:ncpus=1:mem=4GB
#PBS -l walltime=10:20:10
#PBS -J 1-10

cd "$PBS_O_WORKDIR"
my_program $PBS_ARRAY_INDEX > output.$PBS_ARRAY_INDEX
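Before submitting, it can help to dry-run the per-element command on a login node by setting $PBS_ARRAY_INDEX by hand. The sketch below assumes a stand-in shell function in place of the real my_program and writes to a scratch directory created with mktemp. Both are illustrative choices, not part of the original script.

```shell
#!/bin/bash
# Hypothetical stand-in for my_program: prints the square of its argument.
my_program() {
    echo $(( $1 * $1 ))
}

# Scratch directory so the dry run does not touch real output files.
scratch=$(mktemp -d)

# PBS sets PBS_ARRAY_INDEX inside each array element; here a loop fakes it.
for PBS_ARRAY_INDEX in 1 2 3; do
    my_program "$PBS_ARRAY_INDEX" > "$scratch/output.$PBS_ARRAY_INDEX"
done

ls "$scratch"   # one output.N file per index
```

If each index produces the expected output file, the same command line should behave the same way inside the array job.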
Multiple Parameter Changes
This script will run the program my_program with two parameters. Because $PBS_ARRAY_INDEX is only a single value, you need a lookup table to map each index to a set of parameters.
In the following example, all parameters required for the array job are stored in a file called job_params, one line per array element. The array job will read the line corresponding to $PBS_ARRAY_INDEX, and then pass these parameters to my_program.
An example job_params file:
1 0.1
2.4524 0.2
3.45 0.3
4.4 0.4
and an example PBS script called array-script.pbs:
#!/bin/bash
#PBS -P PANDORA
#PBS -l select=1:ncpus=1:mem=4GB
#PBS -l walltime=15:20:10
#PBS -J 1-4

cd "$PBS_O_WORKDIR"
# Print only line number $PBS_ARRAY_INDEX of job_params
params=$(sed "${PBS_ARRAY_INDEX}q;d" job_params)
# Split the line into an array on whitespace
param_array=( $params )
my_program "${param_array[0]}" "${param_array[1]}"
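The lookup step can be tried outside PBS. The sketch below recreates the example job_params table in a temporary file, derives the array range from its line count, and reads one line into two named variables. The names first and second and the hard-coded index are assumptions for illustration. Reading into named variables is shown as an alternative to the array used in the script above.

```shell
#!/bin/bash
# Recreate the example job_params table in a temporary file.
params_file=$(mktemp)
printf '%s\n' '1 0.1' '2.4524 0.2' '3.45 0.3' '4.4 0.4' > "$params_file"

# The array range should match the number of lines in the table.
num_lines=$(wc -l < "$params_file" | tr -d ' ')
echo "#PBS -J 1-${num_lines}"

# Set by PBS in a real job; hard-coded here for illustration.
PBS_ARRAY_INDEX=3

# Select the matching line and split it into two named variables.
read -r first second << EOF
$(sed "${PBS_ARRAY_INDEX}q;d" "$params_file")
EOF

echo "my_program $first $second"
```

Here the third line of the table is selected, so the last command prints my_program 3.45 0.3.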
Multiple input files
Another common use of array jobs is to run the same program on multiple input files. The example here will run computations on all files in a directory. This script first saves a list of files in the directory to a file called file_list. The number of files in this list is then counted and saved to a variable called num_files. A PBS array script called run_array_job, which runs my_program with each file in the directory as input, is written to disk. At the end of the script, the script submits run_array_job to the scheduler using the qsub command.
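#!/bin/bash
# List the regular files in this directory, skipping the two files this script writes.
# Every other file found here, including this script if it is stored here, is treated as input.
rm -f file_list
find . -maxdepth 1 -type f ! -name file_list ! -name run_array_job | sed 's|^\./||' > file_list
num_files=$(wc -l < file_list | tr -d ' ')

# Write the array job script: ${num_files} is expanded now, escaped variables when the job runs.
cat > run_array_job << EOF
#!/bin/bash
#PBS -P PANDORA
#PBS -l select=1:ncpus=1:mem=4GB
#PBS -l walltime=15:20:10
#PBS -J 1-${num_files}

cd "\$PBS_O_WORKDIR"
filename=\$(sed "\${PBS_ARRAY_INDEX}q;d" file_list)
my_program < "\$filename" > "\${filename}.output"
EOF

qsub run_array_job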
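The generator script relies on how an unquoted here-document treats variables: an unescaped ${num_files} is expanded when the file is written, while an escaped \$PBS_ARRAY_INDEX is written out literally and only evaluated when the job runs. The sketch below shows the difference in isolation. The value of num_files and the temporary output file are assumptions for illustration.

```shell
#!/bin/bash
num_files=3                 # assumed value, for illustration only
generated=$(mktemp)         # stands in for run_array_job

# Unescaped variables expand now; escaped ones are written out literally.
cat > "$generated" << EOF
#PBS -J 1-${num_files}
echo "element \$PBS_ARRAY_INDEX of ${num_files}"
EOF

cat "$generated"
```

The generated file contains the literal range 1-3, but still refers to $PBS_ARRAY_INDEX by name, so each array element fills in its own index at run time.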