SDSC Voyager Habana Training and Inference Processor based AI System

2FA/MFA RP account needed

Voyager is an innovative AI system designed specifically for science and engineering research at scale. It supports research that increasingly depends on artificial intelligence and deep learning as a critical element of the experimental and/or computational work.


File Transfer

  • Supported Method: GLOBUS (COMING SOON)
  • Data Transfer Node URL: https://www.sdsc.edu/systems/voyager/user_guide.html#narrow-wysiwyg-5

Storage

File System

home
  • Path: /home/username
  • Quota: 200GB
  • Purge: -
  • Backup: The /home and /voyager/projects file systems ARE NOT backed up.
  • Notes: The home directory is limited in space and should be used only for source code storage. Users should keep usage in $HOME under 200GB.

scratch
  • Path: -
  • Quota: -
  • Purge: -
  • Backup: Users are responsible for backing up all important data to protect against data loss at SDSC.
  • Notes: The compute nodes on Voyager have access to fast flash storage. The latency to the SSDs is several orders of magnitude lower than that of spinning disk (<100 microseconds vs. milliseconds), making them ideal for user-level checkpointing and applications

ceph
  • Path: /voyager/ceph/users/username
  • Quota: 3 PB
  • Purge: -
  • Backup: System is NOT backed up.
  • Notes: Every Voyager node has access to a 3 PB Ceph parallel file system with 140 GB/second performance storage. /voyager/ceph/users/$USER IS NOT an archival file system.

projects
  • Path: /voyager/projects/project/username
  • Quota: 153TB
  • Purge: -
  • Backup: Users are responsible for backing up all important data to protect against data loss at SDSC.
  • Notes: NFS-mounted project space.

External Storage

  • Ceph Parallel File System (/voyager/ceph): This is the primary high-performance storage for large datasets. It is "external" in that it resides on a dedicated storage cluster accessible by all nodes.
  • Project Storage (NFS): Shared storage used for collaborative projects, providing a single scalable namespace accessible across multiple SDSC systems.
  • Home Directory (/home): Persistent network storage for source code and small files, limited to 200 GB.
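
Because the home directory is capped at 200 GB, it can help to check usage before copying data into it. The following is a minimal sketch, not an SDSC-provided tool: the function names are hypothetical, the quota is assumed to be 200 decimal gigabytes, and summing apparent file sizes may differ from how the file system itself accounts for quota.

```python
import os

# Assumption: the 200 GB home quota is counted in decimal gigabytes.
HOME_QUOTA_BYTES = 200 * 10**9


def directory_usage_bytes(root):
    """Sum the apparent sizes of all files under root (symlinks are not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            path = os.path.join(dirpath, fname)
            try:
                total += os.lstat(path).st_size
            except OSError:
                # File vanished or is unreadable; skip it.
                pass
    return total


def usage_report(root, quota_bytes=HOME_QUOTA_BYTES):
    """Return (bytes used, fraction of quota used) for the tree under root."""
    used = directory_usage_bytes(root)
    return used, used / quota_bytes
```

Calling `usage_report(os.path.expanduser("~"))` returns the bytes in use and the fraction of the assumed quota; large datasets would normally belong on the Ceph file system rather than in /home.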

Jobs

Voyager runs Kubernetes. Kubernetes is an open-source platform for managing containerized workloads and services. A Kubernetes cluster consists of a set of worker machines, called nodes, that run containerized applications. The application workloads are executed by placing containers into Pods to run on nodes. The resources required by the Pods are specified in YAML files. 
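
To make the Pod description concrete, here is a minimal sketch that builds a Pod manifest in Python and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. Everything specific in it is an assumption for illustration: the Pod name, the placeholder image, and the `hl-smi` command are hypothetical choices, and `habana.ai/gaudi` is the resource name advertised by Habana's Kubernetes device plugin, which may or may not match Voyager's configuration. Consult the Voyager user guide for the settings the system actually requires.

```python
import json


def build_gaudi_pod(name, image, command, gaudi_count=8):
    """Return a Kubernetes Pod manifest (as a dict) that requests Gaudi devices."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Run the container once; do not restart it when it exits.
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "command": command,
                    "resources": {
                        # Assumption: devices are exposed as the extended
                        # resource "habana.ai/gaudi"; extended resources are
                        # requested under "limits".
                        "limits": {"habana.ai/gaudi": gaudi_count},
                    },
                }
            ],
        },
    }


manifest = build_gaudi_pod(
    name="gaudi-example",  # hypothetical Pod name
    image="registry.example.org/placeholder:latest",  # hypothetical image
    command=["hl-smi"],  # Habana's device status tool, used here as a smoke test
)
print(json.dumps(manifest, indent=2))
```

Saving the printed output to a file and passing it to `kubectl apply -f` would submit the Pod, assuming the image exists and the account has access to a namespace on the cluster.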

For compute, inference, or gaudi examples, see Basic Jobs.

Queue specifications

inference
  • Purpose: Dedicated to Habana Gaudi model inference, using 2 first-generation nodes.
  • CPUs: 2
  • GPUs: -
  • RAM: 3.2TB

gaudi
  • Purpose: Designed for high-performance AI training using 42 Intel Habana Gaudi nodes, each with 8 training processors.
  • CPUs: 2
  • GPUs: -
  • RAM: 6.4TB

compute
  • Purpose: Includes 36 Intel x86 nodes for general-purpose pre- and post-processing of data.
  • CPUs: 2
  • GPUs: -
  • RAM: 3.2TB
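
The gaudi queue figures (42 nodes with 8 training processors each) give 336 training processors in total. The short sketch below works through that arithmetic and estimates how many nodes a job would span; the function name is hypothetical, and it assumes a job fills whole nodes of 8 processors, which may not reflect how the scheduler actually places Pods.

```python
import math

GAUDI_NODES = 42          # from the gaudi queue description
PROCESSORS_PER_NODE = 8   # training processors per Gaudi node

TOTAL_PROCESSORS = GAUDI_NODES * PROCESSORS_PER_NODE  # 42 * 8 = 336


def nodes_needed(processors):
    """Number of Gaudi nodes needed for a job, assuming whole-node packing."""
    if processors < 1:
        raise ValueError("a job needs at least one processor")
    if processors > TOTAL_PROCESSORS:
        raise ValueError("request exceeds the processors in the gaudi queue")
    return math.ceil(processors / PROCESSORS_PER_NODE)


print(TOTAL_PROCESSORS)   # 336
print(nodes_needed(64))   # 8
```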