Comprehensive training guide
See the training guide with Sydney-specific information on using NCI systems and services:
https://sydney-informatics-hub.github.io/usyd-gadi-onboarding-guide/
Job Submission
NCI Gadi uses the same job scheduler as Artemis, but a more modern version (PBSpro 2024.1.1 versus PBSPro_13.1.0). Configuration and the user experience are largely the same, with some slight differences in directive syntax, as the examples below show.
Gadi
```
#!/bin/bash
#PBS -P PANDORA
#PBS -l ncpus=1
#PBS -l mem=4GB
#PBS -l walltime=10:00:00

module load program

cd "$PBS_O_WORKDIR"
my_program
```
Artemis
```
#!/bin/bash
#PBS -P PANDORA
#PBS -l select=1:ncpus=1:mem=4GB
#PBS -l walltime=10:00:00

module load program

cd "$PBS_O_WORKDIR"
my_program
```
Storage
Storage options differ between the two systems. /scratch on Gadi is essentially unlimited, but has an aggressive deletion policy for unused data. You can request a /scratch quota increase by contacting help@nci.org.au. For more persistent storage, use the /g/data/<project> directory; /g/data quota increases are handled by the Scheme Manager.
NCI Gadi
```
/scratch/<NCIproject>
/g/data/<NCIproject>
```
Artemis
```
/scratch/<RDSproject>
/project/<RDSproject>
```
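To check current usage and quotas for the Gadi storage areas above, the NCI-provided lquota utility can be run on a login node. A minimal sketch, using a placeholder project code aa00:

```
# Report usage and quota for each of your projects on /scratch and /g/data
lquota

# Check how much of that usage is yours (hypothetical project aa00)
du -sh /scratch/aa00/$USER
```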
Connect to Sydney Research Data Storage (RDS)
NCI Gadi
```
sftp <unikey>@research-data-ext.sydney.edu.au:/rds/PRJ-<project>
```
Artemis
```
/rds/PRJ-<project>
```
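For scripted transfers from Gadi, the sftp connection above can also be driven from a batch file of commands. A minimal sketch with placeholder values (unikey abcd1234, NCI project aa00, RDS project PRJ-Example); batch mode expects non-interactive authentication, so run the same commands in an interactive sftp session if that is not set up, and run transfers from a login node or copyq job because compute nodes have no internet access:

```
# transfer.sftp -- commands executed on the RDS end of the connection;
# lcd sets the local (Gadi-side) directory, get/put move files
cat > transfer.sftp <<'EOF'
lcd /scratch/aa00/abcd1234
cd /rds/PRJ-Example/raw_data
get sample_01.fastq.gz
put results_summary.tsv
EOF

sftp -b transfer.sftp abcd1234@research-data-ext.sydney.edu.au
```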
Walltime
All queues on Gadi have at most a 48-hour walltime, in contrast to 21 days on Artemis. This is primarily to make resource sharing easier and to limit wasted compute time (for example when a node fails or a job is not behaving as expected). Tips for running jobs in a short-walltime environment:
- Enable checkpointing in your software.
- Break long-running jobs into shorter chunks of work.
- Make use of dependent compute jobs (-W depend=afterok:jobid); a sketch follows this list.
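A minimal sketch of the last two tips combined, splitting a workflow into stages submitted as separate jobs, each held until the previous one exits successfully (the script names are placeholders):

```
# Submit stage 1 and capture its job ID (qsub prints something like 12345678.gadi-pbs)
jobid1=$(qsub stage1.pbs)

# Stage 2 is held until stage 1 finishes with exit status 0
jobid2=$(qsub -W depend=afterok:${jobid1} stage2.pbs)

# Stage 3 likewise waits on stage 2
qsub -W depend=afterok:${jobid2} stage3.pbs
```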
Internet Access
Compute nodes on Gadi do not have access to the internet. If a job needs to download or upload data:
- Use copyq; a sketch follows this list.
- Use ARE jobs.
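A minimal copyq job sketch, assuming a placeholder NCI project code aa00 and a placeholder download URL; copyq jobs run on nodes with external network access but must request a single CPU, so keep them to data movement only:

```
#!/bin/bash
#PBS -P aa00
#PBS -q copyq
#PBS -l ncpus=1
#PBS -l mem=4GB
#PBS -l walltime=02:00:00
#PBS -l wd

# copyq nodes can reach the internet, unlike the compute queues
wget -O reference.fa.gz https://example.org/path/to/reference.fa.gz
```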
Job Arrays
PBS job arrays (#PBS -J 1-10) are not permitted on Gadi, so other means of parallel task execution are required. An approach using OpenMPI and the custom utility nci-parallel is demonstrated in this example parallel job.
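The general shape of such a job is sketched below. The project code, queue, core counts, module version, and nci-parallel options are illustrative only; check the nci-parallel module's documentation on Gadi before relying on them. cmds.txt is a plain text file with one shell command per line, each taking the place of one index of a job array:

```
#!/bin/bash
#PBS -P aa00
#PBS -q normal
#PBS -l ncpus=96
#PBS -l mem=380GB
#PBS -l walltime=02:00:00
#PBS -l wd

# Version is an assumption; check `module avail nci-parallel` on Gadi
module load nci-parallel/1.0.0a
module load openmpi

# One core per task; increase for multi-threaded tasks
export ncores_per_task=1

# Spread the commands listed in cmds.txt across all requested cores,
# killing any single task that runs longer than --timeout seconds
mpirun -np $((PBS_NCPUS / ncores_per_task)) \
    nci-parallel --input-file cmds.txt --timeout 4000
```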