GPU Cluster
Sydney GPU Cluster - Early access available
Researchers can get early access to the Sydney GPU Cluster through the Sydney Informatics Hub.
To onboard your research group, please fill out this onboarding form.
For enquiries and support, please use the support form: https://sydney.au1.qualtrics.com/jfe/form/SV_5mXyhFZsPIwZDBs
Standard Acknowledgement:
The authors acknowledge the use of the Sydney GPU Cluster, a service provided by the Sydney Informatics Hub, a Core Research Facility of the University of Sydney.
Run:ai Interface
Early access is available now! To get onboarded, you will need to provide the Sydney Informatics Hub with your Dashr Portal research project code, plus PC/RC account codes for usage beyond any testing allocation, and we will manually add you to the system.
User onboarding guide for USyd researchers
University of Sydney researchers can use the cluster through NVIDIA Run:ai (documentation available here).
Coming soon: automated role-based access control via your research project on the University's Dashr Portal (ETA: May 2026).
This provides:
Interactive workspaces for data exploration, visualisation, analysis, Jupyter notebooks, and more.
AI model inference workloads to run predictions on your data using models from Hugging Face, NVIDIA NIM, or any custom model (see the example sketch after this list).
Custom workloads for other GPU-accelerated use cases such as:
Structural and synthetic biology workflows e.g. AlphaFold3, ProteinMPNN etc.
Numerical modelling and simulation for physics, engineering, chemistry
Decision optimisation, e.g. linear programming using NVIDIA cuOpt
Transcription and translation of audio and text data
Omics sequencing pipelines making use of libraries such as NVIDIA Clara Parabricks
Model training workflows for training and fine-tuning AI and ML models.
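As an illustrative sketch of the inference use case above, the snippet below runs a Hugging Face model on a cluster GPU. It assumes a workspace image with PyTorch and the transformers library installed; the model name is just an example, not a cluster default.

```python
# Minimal sketch: GPU inference with a Hugging Face model (assumes the
# container image provides torch and transformers; model name is illustrative).
import torch
from transformers import pipeline

device = 0 if torch.cuda.is_available() else -1  # 0 = first allocated GPU, -1 = CPU

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=device,
)

print(classifier(["The Sydney GPU Cluster makes this analysis much faster."]))
```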
Services
We will progressively roll out managed services backed by compute on the GPU cluster. These include:
Now available
Jupyter and Marimo Notebooks
JupyterLab notebooks (for Python, R, and Julia) and Marimo notebooks and apps (for Python) are now available on the GPU cluster.
These enable interactive use of the cluster for researchers in a web interface.
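For example, a quick sanity check you might run in a notebook cell to confirm the workspace can see its allocated GPUs (a sketch, assuming PyTorch is available in the notebook image):

```python
# Quick GPU sanity check for an interactive workspace (assumes PyTorch is installed).
import subprocess

import torch

print(torch.cuda.is_available())          # True if a GPU is allocated and visible
print(torch.cuda.device_count())          # number of GPUs allocated to this workspace
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA H200"

# nvidia-smi gives a fuller picture of memory and utilisation
subprocess.run(["nvidia-smi"], check=False)
```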
Gradio Apps
Gradio apps (in Python), including those you might find on Hugging Face Spaces, can also run on the Sydney GPU Cluster.
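A minimal sketch of the kind of Gradio app that can be hosted on the cluster (assumes the gradio package is installed in the image; the port and the toy function are illustrative):

```python
# Minimal Gradio app sketch (package: gradio; port is illustrative).
import gradio as gr

def greet(name: str) -> str:
    """Toy function standing in for a GPU-backed model call."""
    return f"Hello, {name}!"

demo = gr.Interface(fn=greet, inputs="text", outputs="text")

if __name__ == "__main__":
    # Bind to all interfaces so the cluster can expose the app outside the pod.
    demo.launch(server_name="0.0.0.0", server_port=7860)
```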
Coming Soon
OpenWebUI
An interactive chat platform built on OpenWebUI, enabling live analysis and AI code execution.
LabelStudio
A data labelling platform built on Label Studio, for images, text, video, audio, and other data types, backed by AI-assisted labelling models running on the cluster, e.g. Segment Anything 2.
Globus Data Transfer
Triton Inference Server
Cluster Specifications
DGX H200 GPU nodes
3 x NVIDIA DGX H200 nodes with 8 x H200 per node, for a total of 24 H200 GPUs.
| | H200 GPUs | VRAM (GB) | CPU cores | RAM (GB) | Disk (TB) | Compute | Service Units (SU) |
|---|---|---|---|---|---|---|---|
| Per chunk | 1/14 | 10.07 | 1 | 19.6 | 0.3 | 2.4 | 36 |
| Per H200 GPU | 1 | 141 | 14 | 275 | 3.8 | 34 | 504 |
| Per DGX node (8x H200 GPUs) | 8 | 1128 | 112 | 2200 | 30.7 | 272 | 4032 |
| Whole cluster | 24 | 3384 | 336 | 6600 | 92.2 | 816 | 12096 |
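As a rough illustration of how the figures above scale, the sketch below estimates the resources and service units for a requested number of GPUs, using the per-H200-GPU row of the table (the helper function and its name are ours, not part of the cluster tooling):

```python
# Hypothetical helper: scale the per-H200-GPU row of the table above.
PER_GPU = {"vram_gb": 141, "cpu_cores": 14, "ram_gb": 275, "disk_tb": 3.8, "su": 504}

def estimate_allocation(gpus: float) -> dict:
    """Scale the per-GPU figures by the requested GPU fraction or count."""
    return {key: round(value * gpus, 2) for key, value in PER_GPU.items()}

# One chunk is 1/14 of a GPU, so this reproduces the "Per chunk" row (~36 SU).
print(estimate_allocation(1 / 14))
print(estimate_allocation(8))  # one full DGX node
```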
Storage
DDN EXAScaler Parallel File System - 1 Petabyte
Provides high-performance shared storage to DGX nodes
Connected via 8x 200GbE Active Optical Cables
1 x Controller: DDN ES400NVX2-NDR200-SE with:
2 x SE2420-EBOD NVMe Expansion Enclosures with 24 NVMe drives each
Total of 48 x 30.72TB QLC NVMe G4 4K SSD drives
Networking
Cumulus Linux
Runs on NVIDIA Spectrum-2 200GbE switches
Supports MLAG, VLANs, and routing configurations
InfiniBand Networking for high-speed GPU-to-GPU communication
Out-of-Band (OOB) Management
Operating System
NVIDIA Base Command Manager with DGX OS 7
Installed on head nodes, Kubernetes master nodes, and DGX nodes
Includes built-in LDAP authentication
Job Scheduler & Cluster Management
Three master nodes for managing the cluster
DGX nodes function as Kubernetes worker nodes
NVIDIA Run:ai is installed on Kubernetes for workload scheduling and AI workload orchestration
Includes namespaces, secrets, backend, and API server configurations
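For orientation, the sketch below shows how a GPU workload ultimately maps onto a Kubernetes pod that requests an nvidia.com/gpu resource on the DGX worker nodes. It uses the Kubernetes Python client with purely illustrative names (namespace, pod name, image tag); in normal use Run:ai submits and schedules workloads on your behalf.

```python
# Sketch only: how a GPU workload maps to a Kubernetes pod on the DGX worker nodes.
# In practice Run:ai handles submission and scheduling; the namespace, pod name,
# and image below are purely illustrative.
from kubernetes import client, config

config.load_kube_config()  # assumes you have been issued cluster credentials

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="gpu-smoke-test", namespace="my-project"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="cuda-check",
                image="nvidia/cuda:12.4.1-base-ubuntu22.04",  # illustrative image
                command=["nvidia-smi"],
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1"}  # request one H200 GPU
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="my-project", body=pod)
```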