Voyager is an innovative AI system designed specifically for science and engineering research at scale. It supports research in science and engineering that increasingly depends on artificial intelligence and deep learning as a critical element of the experimental and/or computational work.
Login to SDSC Voyager AI System
Voyager uses ssh key pairs for access.
Approved users will need to send their SSH public key to consult@sdsc.edu to gain access to the system.
To log in to Voyager from the command line, use the hostname:
login.voyager.sdsc.edu
The following are examples of Secure Shell (ssh) commands that may be used to log in:
ssh <your_username>@login.voyager.sdsc.edu
ssh -l <your_username> login.voyager.sdsc.edu
Notes and hints
- Voyager does not maintain local passwords; your public key must be appended to your ~/.ssh/authorized_keys file to enable access from authorized hosts. RSA, ECDSA, and ed25519 keys are accepted. Make sure the private key on your local machine is protected by a strong passphrase.
- You can use ssh-agent or keychain to avoid repeatedly typing the private key passphrase (see the sketch after this list).
- Hosts that connect over SSH more than ten times per minute may be blocked for a short period of time.
- Do not use the login node for computationally intensive processes, as a host for running workflow management tools, as a primary data transfer node for large or numerous data transfers, or as a server providing other services accessible to the Internet. The login nodes are meant for file editing, simple data analysis, and other tasks that use minimal compute resources. All computationally demanding jobs should be run through Kubernetes.
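As a minimal sketch using standard OpenSSH tools (the key file name is an illustrative choice, not a Voyager requirement), generating a key and loading it into an agent looks like this; the public key is what you send to consult@sdsc.edu:

```
# Generate an ed25519 key pair; choose a strong passphrase when prompted.
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519_voyager

# Start an agent for this shell session and add the key once,
# so the passphrase is not requested on every login.
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_ed25519_voyager

# This is the public key to send to consult@sdsc.edu.
cat ~/.ssh/id_ed25519_voyager.pub
```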
MFA
Voyager does not maintain local passwords and relies entirely on SSH key pairs. Public keys must be appended to ~/.ssh/authorized_keys. RSA, ECDSA, and ed25519 keys are accepted, and a strong passphrase on the private key is required.
SSH Login
$ ssh <your_username>@login.voyager.sdsc.edu
File Transfer
| Supported Methods | Data Transfer Node | URL |
|---|---|---|
| GLOBUS (COMING SOON) | — | — |
Storage
File System
| Directory | Path | Quota | Purge | Backup | Notes |
|---|---|---|---|---|---|
| home | /home/username | 200 GB | - | /home and /voyager/projects file systems ARE NOT backed up | The home directory is limited in space and should be used only for source code storage. Users have access to 200 GB in /home and should keep $HOME usage under that quota. |
| scratch | - | - | - | Users are responsible for backing up all important data to protect against data loss at SDSC. | The compute nodes on Voyager have access to fast flash storage. The latency to the SSDs is several orders of magnitude lower than that of spinning disk (<100 microseconds vs. milliseconds), making them ideal for user-level checkpointing and applications that need fast I/O. |
| ceph | /voyager/ceph/users/username | 3 PB | - | System is NOT backed up | Every Voyager node has access to a 3 PB Ceph parallel file system with 140 GB/s performance (/voyager/ceph/users/$USER). This IS NOT an archival file system. |
| projects | /voyager/projects/project/username | 153 TB | - | Users are responsible for backing up all important data to protect against data loss at SDSC. | NFS-mounted project space |
External Storage
- Ceph Parallel File System (/voyager/ceph): This is the primary high-performance storage for large datasets. It is "external" in that it resides on a dedicated storage cluster accessible by all nodes.
- Project Storage (NFS): Shared storage used for collaborative projects, providing a single scalable namespace accessible across multiple SDSC systems.
- Home Directory (/home): Persistent network storage for source code and small files, limited to 200 GB (see the usage check below).
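As a minimal sketch (assuming the paths listed above and standard coreutils on the login node), you can check how much of each area you are using before launching jobs:

```
# Summarize usage of the home directory (200 GB quota).
du -sh "$HOME"

# Summarize usage of your Ceph area; the path follows the storage table above.
du -sh /voyager/ceph/users/"$USER"

# Show overall capacity and free space of the underlying file systems
# (assumes /voyager/ceph is a mount point visible from the login node).
df -h "$HOME" /voyager/ceph
```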
Jobs
Voyager runs Kubernetes. Kubernetes is an open-source platform for managing containerized workloads and services. A Kubernetes cluster consists of a set of worker machines, called nodes, that run containerized applications. The application workloads are executed by placing containers into Pods to run on nodes. The resources required by the Pods are specified in YAML files.
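As a minimal sketch of what such a YAML file looks like (the pod name, container image, and resource amounts here are placeholders, not Voyager defaults), a simple Pod that requests CPU and memory can be written and submitted like this:

```
# Write a minimal Pod manifest; names, image, and resource amounts are illustrative only.
cat <<'EOF' > example-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  restartPolicy: Never
  containers:
  - name: main
    image: ubuntu:22.04
    command: ["bash", "-c", "echo hello from voyager"]
    resources:
      requests:
        cpu: "2"
        memory: 8Gi
      limits:
        cpu: "2"
        memory: 8Gi
EOF

# Submit the Pod, watch its status, read its output, then clean up.
kubectl apply -f example-pod.yaml
kubectl get pods
kubectl logs example-pod
kubectl delete pod example-pod
```

Gaudi training and inference jobs additionally request the Habana accelerator resource exposed by the device plugin; the exact resource name and recommended container images are covered in the job examples linked below.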
For compute, inference, or gaudi examples, see: Basic Jobs
Queue specifications
| Name | Purpose | CPUs | GPUs | RAM | Jobs (30 days) | Wait Time (30-day trend) | Wall Time (30-day trend) |
|---|---|---|---|---|---|---|---|
| inference | Dedicated for Habana Gaudi model inference, utilizing 2 first-generation nodes. | 2 | - | 3.2TB | — | — | — |
| gaudi | Designed for high-performance AI training using 42 Intel Habana Gaudi nodes, each with 8 training processors | 2 | - | 6.4TB | — | — | — |
| compute | Includes 36 Intel x86 nodes for general-purpose pre/post-data processing. | 2 | - | 3.2TB | — | — | — |
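The queue names above correspond to different node types in the cluster. As a hedged sketch (the exact node labels and allocatable resource names are site-specific and should be verified on the system), you can inspect what the cluster exposes with standard kubectl commands:

```
# List the nodes in the cluster and their status.
kubectl get nodes

# Show the labels attached to each node; labels distinguish node types
# (e.g., Gaudi training vs. x86 compute) and can be used in a Pod's nodeSelector.
kubectl get nodes --show-labels

# Describe a single node to see its allocatable CPUs, memory, and accelerator resources.
kubectl describe node <node-name>
```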