Automating data transfers with dt-script
A data transfer script, called dt-script, has been created to simplify transferring data and submitting compute jobs that process the transferred data. This tool is especially useful if you need to transfer large amounts of data from RCOS to Artemis before processing it.
The syntax of dt-script is:

    dt-script -P <project> -f <from> -t <to> -j <job.pbs>
This script uses rsync to copy the source directory <from> to the destination directory <to>, and then submits the PBS job script job.pbs, which starts once the copy completes successfully.
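To make the copy-then-run behaviour concrete, the chain that dt-script automates can be sketched by hand with two qsub calls. This is only an illustration, not the source of dt-script: the helper name copy_then_run, the rsync flag -a and the use of qsub reading the copy command from standard input are all assumptions.

```shell
# Hypothetical helper sketching the copy-then-run chain; not the real dt-script source.
copy_then_run() {
    project=$1 from=$2 to=$3 job=$4

    # Step 1: submit the rsync copy to the data transfer queue (dtq).
    # qsub reads the job script from standard input and prints the new job ID.
    # The rsync flag (-a) is an assumption; dt-script may use different flags.
    copy_id=$(printf 'rsync -a "%s" "%s"\n' "$from" "$to" |
        qsub -P "$project" -q dtq -N dt-script) || return 1

    # Step 2: submit the processing job, held until the copy exits successfully.
    qsub -P "$project" -W depend=afterok:"$copy_id" "$job"
}
```

Calling copy_then_run PANDORA /rds/PRJ-PANDORA/mydata /scratch/PANDORA/ job.pbs would print the job ID of the processing job, mirroring what dt-script returns when a job script is given.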
The arguments this script accepts are shown in the following table:
Short argument | Long argument | Description |
---|---|---|
-f | --from | The source of the data |
-t | --to | The destination of the data |
-P <project> | --project <project> | All PBS jobs require a project to run under. Specify it here. |
-notest | --skip | Skip the test that the source is readable. Useful if dt-script is called from a node on which the source (for example /rds) is not available. |
-w <walltime> | --walltime <walltime> | Walltime required (default 24:00:00, i.e. 1 day) |
-ncpus <ncpus> | --ncpus <ncpus> | CPU cores required (default 1) |
-mem <mem> | --mem <mem> | RAM required (default 4GB) |
-n | --name | Set the copy job name (default “dt-script”) |
-rf <rsync extra flags> | --rflags <rsync extra flags> | Any extra rsync flags you may require |
-j <pbs job script> | --job <pbs job script> | The PBS job script to run after the copy. If no job script is specified, no subsequent job is run. |
-jf <flags> | --jobflags <flags> | Any extra ‘qsub’ flags that may be needed to run the PBS job script specified with --job |
-d <depend option> | --depend <depend option> | The default depend option is “afterok”. You may change this to “afterany” or “afternotok” with this option. |
-l <logfile> | --log <logfile> | Rather than wait for the PBS output files, you may specify a log file for stdout and stderr from the rsync command here |
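As an illustration of how the optional arguments in the table combine, the sketch below wraps one dt-script call that requests a longer walltime and more memory, names the copy job and writes the rsync output to a log file. The wrapper name submit_big_copy, the paths, the job name copy-mydata and the log location are hypothetical; only the flags come from the table.

```shell
# Hypothetical wrapper: the paths, job name and log location are illustrative only.
submit_big_copy() {
    # Request 48 hours and 8GB for the copy instead of the 24:00:00 / 4GB defaults,
    # and log rsync's stdout and stderr to a file in the home directory.
    dt-script --project PANDORA \
        --from /rds/PRJ-PANDORA/mydata \
        --to /scratch/PANDORA/ \
        --walltime 48:00:00 \
        --mem 8GB \
        --name copy-mydata \
        --log "$HOME/copy-mydata.log" \
        --job /scratch/PANDORA/run-processing.pbs
}
```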
The script returns the PBS job ID of the last job it submits as follows:
- If no PBS job script is specified, the PBS job ID of the dtq job is returned and may be used as a dependency of subsequent jobs.
- If a PBS job script is specified, the PBS job ID of that job is returned.
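Because the job ID is printed on standard output, it can be captured in a shell variable and passed to qsub as a dependency. The following is a minimal sketch under that assumption; the helper name copy_then_submit and its arguments are hypothetical, while -W depend=afterok:<id> is standard PBS dependency syntax.

```shell
# Hypothetical helper: copy data with dt-script, then queue a follow-up job on the copy.
copy_then_submit() {
    project=$1 from=$2 to=$3 next_job=$4

    # With no --job given, dt-script prints the job ID of the dtq copy job.
    copy_id=$(dt-script -P "$project" -f "$from" -t "$to") || return 1

    # Start the follow-up job only if the copy job exits successfully.
    qsub -P "$project" -W depend=afterok:"$copy_id" "$next_job"
}
```

The same pattern extends to longer chains, since each qsub call also prints the ID of the job it submits.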
The source code of dt-script is available to all Artemis users. The path to the script is /usr/local/bin/dt-script. Feel free to make a copy of this script if you would like to modify it for your needs.
Example dt-script workflow
An example dt-script workflow, using the example project PANDORA, is shown below:
    [abcd1234@login2]$ dt-script -P PANDORA \
        --from /rds/PRJ-PANDORA/mydata \
        --to /scratch/PANDORA/ \
        --job /scratch/PANDORA/run-processing.pbs
    1261577.pbsserver
    [abcd1234@login2]$ qstat -u abcd1234

    pbsserver:
    Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
    --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
    1261576.pbsserv abcd1234 dtq      dt-script      --   1   1    4gb 24:00 Q    --
    1261577.pbsserv abcd1234 small    process-da     --   1   1    2gb 01:00 H    --
After verifying that the processing job ran successfully, you can transfer data back to RCOS using another dt-script command:
    [abcd1234@login2]$ dt-script -P PANDORA \
        --from /scratch/PANDORA/mydata/ \
        --to /rds/PRJ-PANDORA/mydata_output
    1261588.pbsserver
    [abcd1234@login2]$ qstat -u abcd1234

    pbsserver:
                                                                Req'd  Req'd   Elap
    Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
    --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
    1261588.pbsserv abcd1234 dtq      dt-script      --   1   1    4gb 24:00 Q    --
Finally, you can remove any temporary data from Artemis after checking all data was successfully transferred to RCOS:
    [abcd1234@login2]$ rm /scratch/PANDORA/mydata/*
    [abcd1234@login2]$ rmdir /scratch/PANDORA/mydata
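One way to make the "checking all data was successfully transferred" step explicit is to compare the two directory trees before deleting anything. The helper below is a sketch rather than part of dt-script: the name remove_if_copied is hypothetical, it assumes both directories are readable from the node where it runs, and it uses rm -r instead of the rm and rmdir pair shown above.

```shell
# Hypothetical helper: delete a scratch directory only if it matches its copy.
remove_if_copied() {
    src=$1 copy=$2

    # diff -r compares the two trees recursively and exits 0 only if they match.
    if diff -r "$src" "$copy" >/dev/null 2>&1; then
        rm -r "$src"
    else
        echo "Differences found; keeping $src" >&2
        return 1
    fi
}
```

For the workflow above, this would be called as remove_if_copied /scratch/PANDORA/mydata /rds/PRJ-PANDORA/mydata_output. Note that diff reads every file in both trees, so the check can take a long time on large datasets.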