

Sydney GPU Cluster

Coming Soon!

How to get access

The access model is coming soon. In the meantime, please fill in the expression of interest form.

Contact sih.info@sydney.edu.au with any questions.

Inference endpoints

Coming soon!

Raw GPU Access

Raw GPU access will be provided through Run:AI (http://run.ai).

Machine Specs

Operating System

Job Scheduler & Cluster Management

  • Kubernetes (K8s)

    • Three master nodes for managing the cluster

    • DGX nodes function as Kubernetes worker nodes

  • Run:AI

    • Installed on Kubernetes to schedule and orchestrate AI workloads (see the submission sketch after this list)

    • Includes namespaces, secrets, backend, and API server configurations
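
Below is a minimal sketch of what a GPU request looks like at the Kubernetes layer, using the official kubernetes Python client. The pod name, namespace, and container image are placeholders, and on this cluster workloads are expected to be submitted through Run:AI rather than directly to the API server, so treat this as an illustration of the underlying scheduling request rather than the supported submission path.

# Minimal sketch: asking Kubernetes to schedule a pod with one GPU.
# Assumes the kubernetes Python client and a valid kubeconfig; the
# namespace and image are placeholders, not cluster-specific values.
from kubernetes import client, config

config.load_kube_config()  # reads credentials from ~/.kube/config

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="gpu-smoke-test"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="cuda",
                image="nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04",
                command=["nvidia-smi"],
                # Request one of the eight H200s on a DGX node
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1"}
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="my-project", body=pod)

Run:AI layers projects, quotas, and fair-share scheduling on top of this same mechanism.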

GPU nodes

3 x NVIDIA DGX H200 with:

  • 8 x H200 per node

  • 1128 GB VRAM per node (8 x 141 GB HBM3e)
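
As a quick sanity check from inside a running job, the sketch below enumerates the GPUs the job can see. It assumes a container image with a CUDA-enabled PyTorch build (an assumption, not a published cluster image); on a full DGX H200 node it should report eight devices of roughly 141 GB each.

# Minimal sketch: list visible GPUs and their memory from inside a job.
# Assumes a CUDA-enabled PyTorch build in the container image.
import torch

assert torch.cuda.is_available(), "No CUDA devices visible to this job"

total_gib = 0.0
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    mem_gib = props.total_memory / 1024**3
    total_gib += mem_gib
    print(f"GPU {i}: {props.name}, {mem_gib:.0f} GiB")

print(f"Total visible VRAM: {total_gib:.0f} GiB")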

CPU nodes

5 x Dell PowerEdge R760 Server with:

  • Primary and Secondary Head Nodes with High-Availability configuration for failover protection

  • Three master nodes for managing Kubernetes

  • 2 x Intel Xeon Gold 5418Y, 2.0 GHz, 24 cores / 48 threads

  • 16 x 32 GB RDIMM, 5600 MT/s, dual rank

  • BOSS-N1 controller card with 2 x 480 GB M.2 (RAID 1)

  • 6.4 TB Enterprise NVMe Mixed Use AG Drive, U.2 Gen4, with carrier

  • Broadcom 57416 Dual Port 10GbE BASE-T Adapter, OCP NIC 3.0

  • Mellanox ConnectX-6 DX Dual Port 100GbE QSFP56 Network Adapter

Storage

  • DDN EXAScaler Parallel File System 1 PB

    • Provides high-performance shared storage to DGX nodes

    • Connected via 8x 200GbE Active Optical Cables

    • Controller: 1 x DDN ES400NVX2-NDR200-SE
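
The client mount point for the EXAScaler file system has not been published yet. Assuming a hypothetical /scratch mount, a quick capacity check from inside a job could look like the following sketch.

# Minimal sketch: check capacity on the shared file system.
# The /scratch path is an assumption; the real mount point will be
# announced with the access model.
import shutil

usage = shutil.disk_usage("/scratch")
print(f"Total: {usage.total / 1024**4:.2f} TiB")
print(f"Used:  {usage.used / 1024**4:.2f} TiB")
print(f"Free:  {usage.free / 1024**4:.2f} TiB")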

Networking

  • Cumulus Linux

    • Runs on NVIDIA Spectrum-2 200GbE switches

    • Supports MLAG, VLANs, and routing configurations

  • InfiniBand networking for high-speed GPU-to-GPU communication (see the NCCL sketch after this list)

  • Out-of-Band (OOB) Management
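
An NCCL all-reduce is the usual smoke test for the GPU fabric. The sketch below assumes a CUDA-enabled PyTorch build and a torchrun launcher, both of which are assumptions rather than cluster-specific instructions; within a node NCCL uses NVLink, and across nodes it picks up the InfiniBand fabric automatically when available.

# Minimal sketch: NCCL all-reduce across all GPUs in a job.
# Launch with torchrun, which sets RANK, WORLD_SIZE, and LOCAL_RANK, e.g.:
#   torchrun --nproc_per_node=8 allreduce_test.py
import os

import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

# Each rank contributes a tensor of ones; after the all-reduce every
# rank holds the world size in each element.
x = torch.ones(1024, device="cuda")
dist.all_reduce(x, op=dist.ReduceOp.SUM)
assert x[0].item() == dist.get_world_size()

if dist.get_rank() == 0:
    print(f"all_reduce OK across {dist.get_world_size()} GPUs")
dist.destroy_process_group()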
