Automating data transfers with dtq

dtq can be used to schedule data transfers and compute jobs that run sequentially. For example, you could set up jobs that do the following:

  1. Submit a dtq job to copy data to Artemis for processing.
  2. Submit a processing job that automatically runs after the dtq job successfully completes.
  3. Upon successful completion of the processing job:
    1. Have the processing job copy the resultant data to another location on Artemis; or
    2. If the result of processing is to be copied to a destination that is not accessible from the compute nodes, then submit another dtq job to copy this data to its remote location.
  4. Optionally, delete the data on Artemis that was used for processing.

This entire process can be automated using three PBS scripts: copy-in.pbs, process-data.pbs and copy-out.pbs.

copy-in.pbs:

#!/bin/bash
#PBS -P PANDORA
#PBS -q dtq
#PBS -l select=1:ncpus=1:mem=4gb,walltime=00:10:00

# Copy the input data from RDS into the project's scratch space.
rsync -avxP /rds/PRJ-PANDORA/input_data /scratch/PANDORA/

process-data.pbs:

#!/bin/bash
#PBS -P PANDORA
#PBS -q defaultQ
#PBS -l select=1:ncpus=4:mem=10gb,walltime=20:00:00

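# Exit with a non-zero status if the input directory is missing, so that
# jobs depending on this one with afterok do not run.
cd /scratch/PANDORA/input_data || exit 1
my-program < input.inp > output.out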

copy-out.pbs:

#!/bin/bash
#PBS -P PANDORA
#PBS -q dtq
#PBS -l select=1:ncpus=1:mem=4gb,walltime=00:10:00

# Copy the contents of the working directory (inputs and results) back to RDS.
rsync -avxP /scratch/PANDORA/input_data/ /rds/PRJ-PANDORA/output_data
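
Step 4 in the list above (optionally deleting the data on Artemis) is not shown in the scripts. One way it might be added to copy-out.pbs is sketched below. This is an assumption-laden sketch rather than a recommended procedure: the function name copy_then_clean is hypothetical, and it deletes the source directory only if rsync reports success.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original scripts):
# copy a directory with rsync, then delete the source only if the copy succeeded.
copy_then_clean() {
    local src=$1 dest=$2

    # Refuse to run with missing arguments, so rm is never given an empty path.
    if [ -z "$src" ] || [ -z "$dest" ]; then
        echo "usage: copy_then_clean SOURCE DEST" >&2
        return 2
    fi

    if rsync -avxP "$src" "$dest"; then
        # The copy succeeded, so the scratch copy can be removed.
        rm -rf -- "$src"
    else
        echo "rsync failed; leaving $src in place" >&2
        return 1
    fi
}
```

In copy-out.pbs, the rsync line could then be replaced with a call such as copy_then_clean /scratch/PANDORA/input_data/ /rds/PRJ-PANDORA/output_data. Because the deletion is irreversible, it is worth checking the copied data by hand the first few times before relying on this.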

You can then submit the three scripts to the scheduler, using the -W depend=afterok option so that each job starts only after the jobs it depends on have completed successfully:

[abcd1234@login1]$ qsub copy-in.pbs
1260945.pbsserver
[abcd1234@login1]$ qsub -W depend=afterok:1260945 process-data.pbs
1260946.pbsserver
[abcd1234@login1]$ qsub -W depend=afterok:1260945:1260946 copy-out.pbs
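
The job IDs in the commands above were typed by hand. Since qsub prints the ID of each new job on standard output, the same chain could be submitted from a small wrapper that captures those IDs. The following is a minimal sketch: the function name submit_chain is hypothetical, it assumes the three scripts are in the current directory, and it passes the full ID as printed by qsub (for example 1260945.pbsserver) to the depend option.

```shell
#!/bin/bash
# Hypothetical helper: submit the three scripts as one dependency chain.
# qsub prints the new job's ID on standard output, which is captured here.
submit_chain() {
    local copy_in process copy_out

    # Step 1: the data transfer job has no dependencies.
    copy_in=$(qsub copy-in.pbs) || return 1

    # Step 2: processing starts only if the copy-in job succeeds.
    process=$(qsub -W depend=afterok:"$copy_in" process-data.pbs) || return 1

    # Step 3: copy-out waits for both earlier jobs, as in the commands above.
    copy_out=$(qsub -W depend=afterok:"$copy_in":"$process" copy-out.pbs) || return 1

    # Report the three job IDs in submission order.
    echo "$copy_in $process $copy_out"
}
```

Calling submit_chain on a login node would submit all three jobs and print their IDs on one line. If any qsub call fails, the function stops and returns a non-zero status, but jobs that were already submitted stay in the queue.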

After the three jobs have been submitted, running qstat -u abcd1234 should show output similar to this:

[abcd1234@login1]$ qstat -u abcd1234
pbsserver:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1260945.pbsserv abcd1234 dtq      copy-in.pb    --    1   1 4096mb 01:00 R   --
1260946.pbsserv abcd1234 small    process-da    --    1   1    2gb 01:00 H   --
1260947.pbsserv abcd1234 dtq      copy-out.p    --    1   1 4096mb 01:00 H   --

Note that process-data.pbs and copy-out.pbs are both in the H state, which means they’re being “held” by the scheduler until the previous jobs have successfully completed.
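
To check which jobs are still held without reading the whole table, the qstat output could be filtered on the state column. The sketch below assumes the column layout shown above, where the state (S) is the tenth whitespace-separated field and job names contain no spaces. The function name held_jobs is hypothetical.

```shell
#!/bin/bash
# Hypothetical filter: read `qstat -u` output on standard input and print
# the IDs of jobs whose state column (assumed to be field 10) is H.
held_jobs() {
    awk '$10 == "H" { print $1 }'
}
```

For example, qstat -u abcd1234 | held_jobs would print one job ID per line. Header and separator lines are skipped because their tenth field is never H.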