Expanse is a supercomputing cluster managed by the San Diego Supercomputer Center (SDSC). Expanse contains installs and modules for commonly used packages in bioinformatics, molecular dynamics, machine learning, quantum chemistry, structural mechanics, and visualization, and will continue to support Singularity-based containerization. Expanse also provides composable software that lets you treat the hardware like building blocks: you can bundle RAM, processors, and software containers orchestrated with Kubernetes into a “virtual cluster” customized for your project, then save that composition and re-use or tweak it later. Expanse will also feature direct scheduler integration with the major cloud providers, leveraging high-speed networks to ease data movement to and from the cloud.
Login to Expanse CPU
There are two methods for logging into Expanse: through the web-based user portal, or via SSH to the login nodes. Having an allocation for Expanse is a prerequisite for accessing the system, and you can obtain an allocation through ACCESS. If you log in through the web-based portal, you will be taken to a Globus login page, where you must select ACCESS CI as your organization and then use your ACCESS credentials.
To access the system through the SSH login nodes you must first set up two-factor authentication (2FA) with Expanse. Once you have downloaded your 2FA app of choice (Google Authenticator, Duo Mobile, LastPass Authenticator, etc.), go to https://passive.sdsc.edu/ and log in with Globus using your ACCESS credentials. Then click the button labeled “Manage 2FA” and complete the steps to register. You can then close the webpage, though the changes may take up to 15 minutes to take effect on the system.
SSH login is broken into two steps:
- You are first prompted for a password; enter your ACCESS portal password.
- You are then prompted a second time; enter the TOTP code from your authenticator app.
There is a way to bypass the first step and go straight to TOTP: if you have an SSH public key registered on Expanse and the corresponding private key loaded in your ssh-agent, you will be prompted only for the TOTP code.
To add a key, append your public key to your ~/.ssh/authorized_keys file on Expanse; this enables access from authorized hosts without entering your password. RSA, ECDSA, and Ed25519 keys are accepted. Make sure the private key on your local machine is protected with a strong passphrase.
- You can use ssh-agent or keychain to avoid repeatedly typing the private key password.
- Hosts that connect via SSH more frequently than ten times per minute may be blocked for a short period of time.
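The key setup described above can be sketched as follows; the key filename, passphrase, and comment are illustrative, and in practice you would append the printed public key to ~/.ssh/authorized_keys on Expanse.

```shell
# Generate an Ed25519 keypair with a passphrase-protected private key.
# The filename, passphrase, and comment here are illustrative.
ssh-keygen -t ed25519 -f ./expanse_key -N 'use-a-strong-passphrase' -C 'expanse-access'

# This is the line you would append to ~/.ssh/authorized_keys on Expanse:
cat ./expanse_key.pub

# Load the private key into ssh-agent so you are not re-prompted for the
# passphrase (and can skip straight to the TOTP prompt on Expanse):
# eval "$(ssh-agent -s)" && ssh-add ./expanse_key
```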
SSH Login
$ ssh <your_username>@login.expanse.sdsc.edu
File Transfer
Data Movement
Globus: SDSC Collections, Data Movers and Mount Points
All of Expanse's Lustre filesystems are accessible via the SDSC Expanse-specific collections (SDSC HPC - Expanse Lustre; *SDSC HPC - Projects). The following table shows the mount points on the data mover nodes that serve as the backend for these collections.
| Machine | Location on machine | Location on Globus/Data Movers |
|---|---|---|
| *Expanse | /expanse/projects | / |
| Expanse | /expanse/lustre/projects | /projects/... |
| Expanse | /expanse/lustre/scratch | /scratch/... |
The web portal also provides the ability to download and upload files in your directories directly within the browser.
It is also possible to use scp to transfer files between your machine and Expanse. Use a regular scp command with the remote address [username]@login.expanse.sdsc.edu:path_to_file, substituting your ACCESS username for username; you will go through the usual login process to complete the transfer.
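As a sketch, the commands look like the following; the filenames are hypothetical, and <your_username> should be replaced with your ACCESS username.

```shell
# Upload a local archive to your Expanse home directory:
scp ./inputs.tar.gz <your_username>@login.expanse.sdsc.edu:~/

# Download a results file from Expanse back to the current directory:
scp <your_username>@login.expanse.sdsc.edu:~/results.csv ./
```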
Storage
File System
| Directory | Path | Quota | Purge | Backup | Notes |
|---|---|---|---|---|---|
| Scratch Lustre | /expanse/lustre/scratch | 10 TB | 90 days after allocation expiration. | No backups stored. | This is not an archival file system, it is not backed up, and will be purged according to purge policy. |
| Scratch Compute Node | /scratch/$USER/job_$SLURM_JOB_ID | 1 TB | Purged at end of job. | No backups stored. | Users only have access to these local SSDs on the compute node during job execution. |
Jobs
The job charge for a compute node is 128 SUs per hour of run time (128 cores in one node × 1 hour = 128 SUs).
Each standard compute node has ~256 GB of memory and 128 cores.
- Each standard node core is allocated 1 GB of memory by default; users should explicitly include the --mem directive to request additional memory.
- Maximum available memory per compute node: --mem=249208M
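As a sketch, a job script requesting the full memory of a standard node might include the following directives; the job name, walltime, and final command are illustrative.

```shell
#!/bin/bash
# Sketch: request all usable memory on one standard compute node.
#SBATCH --job-name="maxmem"        # illustrative name
#SBATCH --partition=compute
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=128
#SBATCH --mem=249208M              # max available memory per node
#SBATCH --account=<<project>>
#SBATCH -t 00:10:00

free -h    # report memory visible to the job
```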
Requesting interactive resources using srun
You can request an interactive session using the srun command. The following example will request one regular compute node, 4 cores, in the debug partition for 30 minutes.
srun --partition=debug --pty --account=<<project>> --nodes=1 --ntasks-per-node=4 \
--mem=8G -t 00:30:00 --wait=0 --export=ALL /bin/bash
Expanse uses the Simple Linux Utility for Resource Management (SLURM) batch environment. When you run in the batch mode, you submit jobs to be run on the compute nodes using the sbatch command as described below. Remember that computationally intensive jobs should be run only on the compute nodes and not the login nodes.
Expanse places limits on the number of jobs queued and running on a per group (allocation) and partition basis. Please note that submitting a large number of jobs (especially very short ones) can impact the overall scheduler response for all users. If you are anticipating submitting a lot of jobs, please contact the SDSC consulting staff before you submit them. We can work to check if there are bundling options that make your workflow more efficient and reduce the impact on the scheduler.
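One common bundling option is a SLURM job array, which submits many related short runs as a single scheduler entry. The script below is a sketch: the program name, input-file layout, and resource figures are assumptions, not Expanse-specific values.

```shell
#!/bin/bash
# Sketch: bundle 100 short runs into one job-array submission.
#SBATCH --job-name="bundle"
#SBATCH --partition=shared
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=2G
#SBATCH --account=<<project>>
#SBATCH --array=1-100%10      # 100 tasks, at most 10 running at once
#SBATCH -t 00:10:00

# Each array task processes one input file (hypothetical program and naming):
./process_input input_${SLURM_ARRAY_TASK_ID}.dat
```

A job array counts as a single submission for scheduler bookkeeping, which is why it reduces the impact of many short jobs.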
The limits for each partition are noted in the table below. Partition limits are subject to change based on Early User Period evaluation.
| Partition Name | Max Walltime | Max Nodes/Job | Max Running Jobs | Max Running + Queued Jobs | Charge Factor | Notes |
|---|---|---|---|---|---|---|
| compute | 48 hrs | 32 | 32 | 64 | 1 | Exclusive access to regular compute nodes; limit applies per group |
| ind-compute | 48 hrs | 32 | 16 | 32 | 1 | Exclusive access to Industry compute nodes; limit applies per group |
| shared | 48 hrs | 1 | 4096 | 4096 | 1 | Single-node jobs using fewer than 128 cores |
| ind-shared | 48 hrs | 1 | 2048 | 2048 | 1 | Single-node Industry jobs using fewer than 128 cores |
| gpu | 48 hrs | 4 | 4 | 8 (32 Tres GPU) | 1 | Used for exclusive access to the GPU nodes |
| ind-gpu | 48 hrs | 2 | 4 | 4 (8 Tres GPU) | 1 | Exclusive access to the Industry GPU nodes |
| nairr-gpu | 48 hrs | 4 | 4 | 8 (32 Tres GPU) | 1 | Exclusive access to the NAIRR GPU nodes |
| gpu-shared | 48 hrs | 1 | 24 | 24 (24 Tres GPU) | 1 | Single-node job using fewer than 4 GPUs |
| ind-gpu-shared | 48 hrs | 1 | 24 | 24 (24 Tres GPU) | 1 | Single-node job using fewer than 4 Industry GPUs |
| nairr-gpu-shared | 48 hrs | 1 | 16 | 16(16 Tres GPU) | 1 | Single-node job using fewer than 4 NAIRR GPUs |
| large-shared | 48 hrs | 1 | 1 | 4 | 1 | Single-node jobs using large memory up to 2 TB (minimum memory required 256G) |
| debug | 30 min | 2 | 1 | 2 | 1 | Priority access to shared nodes set aside for testing of jobs with short walltime and limited resources |
| gpu-debug | 30 min | 2 | 1 | 2 | 1 | Priority access to gpu-shared nodes set aside for testing of jobs with short walltime and limited resources; max two gpus per job |
| preempt | 7 days | 32 | — | 128 | 0.8 | Non-refundable discounted jobs to run on free nodes that can be pre-empted by jobs submitted to any other queue |
| gpu-preempt | 7 days | 1 | — | 24 (24 Tres GPU) | 0.8 | Non-refundable discounted jobs to run on unallocated nodes that can be pre-empted by higher priority queues |
Requesting interactive resources using srun
The following example will request a GPU node, 10 cores, 1 GPU and 96 GB of memory in the debug partition for 30 minutes. To ensure the GPU environment is properly loaded, please be sure to run both the module purge and module restore commands.
login01$ srun --partition=gpu-debug --pty --account=<<project>> --ntasks-per-node=10 \
    --nodes=1 --mem=96G --gpus=1 -t 00:30:00 --wait=0 --export=ALL /bin/bash
srun: job 1336890 queued and waiting for resources
srun: job 1336890 has been allocated resources
exp-7-59$ module purge
exp-7-59$ module restore
Resetting modules to system default. Resetting $MODULEPATH back to system default.
All extra directories will be removed from $MODULEPATH.
Submitting Jobs Using sbatch
Jobs can be submitted to the SLURM partitions using the sbatch command as follows:
sbatch jobscriptfile
where jobscriptfile is the name of a UNIX format file containing special statements (corresponding to sbatch options), resource specifications and shell commands. Several example SLURM scripts are given below:
BASIC MPI JOB
#!/bin/bash
#SBATCH --job-name="hellompi"
#SBATCH --output="hellompi.%j.%N.out"
#SBATCH --partition=compute
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=128
#SBATCH --mem=0
#SBATCH --account=<<project>>
#SBATCH --export=ALL
#SBATCH -t 01:30:00

# This job runs with 2 nodes, 128 cores per node for a total of 256 tasks.
module purge
module load cpu

# Load module file(s) into the shell environment
module load gcc
module load mvapich2
module load slurm

srun --mpi=pmi2 -n 256 ../hello_mpi
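Assuming the script above is saved as hellompi.sb (the filename is illustrative), it can be submitted and monitored with the standard SLURM commands:

```shell
sbatch hellompi.sb      # submit; SLURM replies with the assigned job ID
squeue -u $USER         # list your queued and running jobs
scancel <jobid>         # cancel a job by ID if needed
```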
Queue specifications
| Name | Purpose | CPUs | GPUs | RAM | Jobs (30 days) | Wait Time (30-day trend) | Wall Time (30-day trend) |
|---|---|---|---|---|---|---|---|
| Expanse Compute Node | Compute Node Usage | 128 AMD EPYC 7742 | — | 256 GB DDR4 DRAM | — | — | — |
Datasets
| Name | Description |
|---|---|
| OpenTopography | OpenTopography provides efficient, user-friendly access to high-resolution topography data, processing tools, and resources to advance understanding of the Earth's surface, vegetation, and built environment. |
| OpenAltimetry | OpenAltimetry is a web based data visualization and discovery tool for exploring surface elevation profiles over time using satellite altimetry data from NASA's ICESat and ICESat-2 missions. |
| OpenForest4D | OpenForest4D is a web-based platform that leverages multi-source remote sensing data and artificial intelligence to generate on-demand, research-grade estimates of forest structure and above-ground biomass in four dimensions for global forest monitoring. |