Automating data transfers with dt-script


The data transfer script dt-script simplifies transferring data and submitting compute jobs that run on the transferred data. It is especially useful if you need to transfer large amounts of data from RCOS (/wiki/spaces/RC/pages/229113857) to Artemis before processing it.

The syntax of dt-script is:

dt-script -P <project> -f <from> -t <to> -j <job.pbs>

The script uses rsync to copy the source directory <from> to the destination directory <to>, then submits the PBS job script job.pbs, which starts only once the copy has completed successfully.
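
To make the copy-then-run behaviour concrete, the following is a rough manual approximation of the pattern, not the actual contents of dt-script. It assumes the copy runs as a job in the dtq queue, that plain "rsync -a" is an adequate set of flags, and that qsub prints the new job ID on stdout. The function name copy_then_run and the job name copy-data are hypothetical.

```shell
#!/bin/bash
# Sketch only: a manual approximation of the copy-then-run pattern.
# Assumptions (not taken from dt-script itself): the copy job goes to
# the dtq queue and "rsync -a" is an adequate set of flags.
copy_then_run() {
    local project=$1 from=$2 to=$3 job=$4
    local copy_id

    # Submit the copy as its own PBS job; qsub reads the script from stdin.
    copy_id=$(
        qsub -P "$project" -q dtq -N copy-data <<EOF
rsync -a "$from" "$to"
EOF
    ) || return 1

    # Hold the compute job until the copy job exits successfully.
    qsub -P "$project" -W depend=afterok:"$copy_id" "$job"
}

# Usage (on Artemis):
#   copy_then_run PANDORA /rds/PRJ-PANDORA/mydata /scratch/PANDORA/ run-processing.pbs
```

In practice dt-script handles this for you, along with the additional checks and options described on this page.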

The arguments this script accepts are shown in the following table:

Short argument           Long argument                 Description
-----------------------  ----------------------------  -----------
-f                       --from                        The source of the data
-t                       --to                          The destination of the data
-P <project>             --project <project>           All PBS jobs require a project to run under. Specify it here.
-notest                  --skip                        Skip the test that the source is readable. Useful when dt-script is called from a node on which the /rds source is not available
-w <walltime>            --walltime <walltime>         Wall time required (default 24:00:00, i.e. 1 day)
-ncpus <ncpus>           --ncpus <ncpus>               CPU cores required (default 1)
-mem <mem>               --mem <mem>                   RAM required (default 4GB)
-n                       --name                        Set the copy job name (default “dt-script”)
-rf <rsync extra flags>  --rflags <rsync extra flags>  Any extra rsync flags you may require
-j <pbs job script>      --job <pbs job script>        The PBS job script to run after the copy. If no job script is specified, no subsequent job is run
-jf <flags>              --jobflags <flags>            Any extra ‘qsub’ flags that may be needed to run the PBS job script specified with --job
-d <depend option>       --depend <depend option>      The dependency type for the job script (default “afterok”). You may change this to “afterany” or “afternotok”
-l <logfile>             --log <logfile>               A log file for stdout and stderr from the rsync command, so that you need not wait for the PBS output files
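
As an illustration of combining several of these options, the sketch below wraps a copy-only invocation with a custom job name, a shorter wall time, and an rsync log file. The function name stage_input, the job name stage-input, the two-hour wall time, and the log path are hypothetical choices; the flags themselves are those listed in the table above.

```shell
#!/bin/bash
# Hypothetical wrapper: copy-only transfer with a few optional settings.
# No --job is given, so no follow-up job is submitted after the copy.
stage_input() {
    local project=$1 from=$2 to=$3
    dt-script --project "$project" \
        --from "$from" \
        --to "$to" \
        --name stage-input \
        --walltime 02:00:00 \
        --log "$HOME/stage-input.log"
}

# Usage (on Artemis):
#   stage_input PANDORA /rds/PRJ-PANDORA/mydata /scratch/PANDORA/
```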

The script returns the PBS job ID of the last job it submits as follows:

  • If no PBS job script is specified, the PBS job ID of the dtq job is returned and may be used as a dependency of subsequent jobs.
  • If a PBS job script is specified, the PBS job ID of that job is returned.
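
Because the job ID is printed on stdout, it can be captured in a shell variable and passed to a later qsub call as a dependency. The following is a minimal sketch, assuming dt-script prints only the job ID on stdout; submit_after_copy is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: queue a copy-only transfer, then make a later job wait for it.
# "submit_after_copy" is a hypothetical helper name.
submit_after_copy() {
    local project=$1 from=$2 to=$3 next_job=$4
    local copy_id

    # With no --job given, dt-script prints the ID of the dtq copy job.
    copy_id=$(dt-script -P "$project" -f "$from" -t "$to") || return 1

    # Standard PBS dependency syntax: run only if the copy exits cleanly.
    qsub -P "$project" -W depend=afterok:"$copy_id" "$next_job"
}

# Usage (on Artemis):
#   submit_after_copy PANDORA /rds/PRJ-PANDORA/mydata /scratch/PANDORA/ next-step.pbs
```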

The source code of dt-script is available to all Artemis users. The path to the script is /usr/local/bin/dt-script. Feel free to make a copy of this script if you would like to modify it for your needs.

Example dt-script workflow

An example dt-script workflow, using the example project PANDORA, is shown below:

[abcd1234@login2]$ dt-script -P PANDORA \
--from /rds/PRJ-PANDORA/mydata \
--to /scratch/PANDORA/ \
--job /scratch/PANDORA/run-processing.pbs
1261577.pbsserver
[abcd1234@login2]$ qstat -u abcd1234

pbsserver:
                                                             Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1261576.pbsserv abcd1234 dtq      dt-script     --    1   1    4gb 24:00 Q   --
1261577.pbsserv abcd1234 small    process-da    --    1   1    2gb 01:00 H   --
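
One way to check that the processing job succeeded is to inspect its recorded exit status once it has finished. The sketch below assumes the scheduler is PBS Professional with job history enabled, so that "qstat -x -f" still reports finished jobs; the helper name job_exit_status is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the recorded exit status of a finished PBS job.
# Assumes job history is enabled so "qstat -x" can still see the job.
job_exit_status() {
    qstat -x -f "$1" | awk -F' = ' '/Exit_status/ { print $2; exit }'
}

# Usage (on Artemis), with the job ID returned by dt-script:
#   [ "$(job_exit_status 1261577.pbsserver)" = "0" ] && echo "processing succeeded"
```

An exit status of 0 only shows that the job script itself exited cleanly, so it is still worth checking the job's output files.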

After verifying the processing job ran successfully, you can transfer data back to RCOS using another dt-script command:

[abcd1234@login2]$ dt-script -P PANDORA \
--from /scratch/PANDORA/mydata/ \
--to /rds/PRJ-PANDORA/mydata_output
1261588.pbsserver
[abcd1234@login2]$ qstat -u abcd1234

pbsserver:
                                                             Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1261588.pbsserv abcd1234 dtq      dt-script     --    1   1    4gb 24:00 Q   --

Finally, you can remove any temporary data from Artemis after checking all data was successfully transferred to RCOS:

[abcd1234@login2]$ rm /scratch/PANDORA/mydata/*
[abcd1234@login2]$ rmdir /scratch/PANDORA/mydata
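
If you prefer an explicit check before deleting, a recursive comparison can guard the removal. This is a minimal sketch, assuming both directories are visible from the node you are on and that reading every file is acceptable (it can be slow for large datasets); remove_if_copied is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: delete a scratch directory only when an identical
# copy exists at the destination.
remove_if_copied() {
    local src=$1 dest=$2

    # diff -r walks both trees; -q reports only whether files differ.
    if diff -rq "$src" "$dest" >/dev/null; then
        rm -r -- "$src"
    else
        echo "Differences found; keeping $src" >&2
        return 1
    fi
}

# Usage:
#   remove_if_copied /scratch/PANDORA/mydata /rds/PRJ-PANDORA/mydata_output
```

Unlike the "rm" and "rmdir" pair above, "rm -r" also removes subdirectories and hidden files, so double-check the path before running it.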