Expanse is a supercomputing cluster managed by the San Diego Supercomputer Center (SDSC). Expanse contains installs and modules for commonly used packages in bioinformatics, molecular dynamics, machine learning, quantum chemistry, structural mechanics, and visualization, and will continue to support Singularity-based containerization. Expanse also provides composable software that lets you treat the hardware like building blocks: you can bundle RAM, processors, and software containers orchestrated with Kubernetes into a “virtual cluster” customized for your project, then save that composition and re-use or tweak it later. Expanse will also feature direct scheduler integration with the major cloud providers, leveraging high-speed networks to ease data movement to and from the cloud.
Login to Expanse CPU
There are two methods for logging into Expanse: through the web-based user portal, or via SSH to the login nodes. Having an allocation for Expanse is a prerequisite for accessing the system, and you can obtain an allocation through ACCESS. If you log in through the web-based portal, you will be taken to a Globus login page, where you must select ACCESS CI as your organization and then use your ACCESS credentials.
To access the system through the SSH login nodes you must first set up two-factor authentication (2FA) with Expanse. Once you have downloaded your 2FA app of choice (Google Authenticator, Duo Mobile, LastPass Authenticator, etc.), go to https://passive.sdsc.edu/ and log in with Globus using your ACCESS credentials. Then click the button labeled “Manage 2FA” and complete the steps to register. You can then close the webpage, though the changes may take up to 15 minutes to take effect on the system.
SSH login is broken into two steps:
- You are first prompted for a password; enter your ACCESS portal password.
- You are then prompted a second time; enter the TOTP code from your authenticator app.
There is a way to bypass the first step and go straight to TOTP: if you have an SSH public key registered on Expanse and the corresponding private key loaded in your ssh-agent, you will be prompted only for the TOTP code.
To add a key, append your public key to your ~/.ssh/authorized_keys file on Expanse; this enables access from authorized hosts without entering your password. RSA, ECDSA, and Ed25519 keys are accepted. Make sure the private key on your local machine is protected with a strong passphrase.
- You can use ssh-agent or keychain to avoid repeatedly typing the private key password.
- Hosts that connect via SSH more frequently than ten times per minute may be blocked for a short period of time.
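The key setup described above can be sketched as follows; the key filename, passphrase, and comment are illustrative, and in practice you would append the printed public key to ~/.ssh/authorized_keys on Expanse.

```shell
# Generate an Ed25519 keypair with a passphrase-protected private key.
# The filename, passphrase, and comment here are illustrative.
ssh-keygen -t ed25519 -f ./expanse_key -N 'use-a-strong-passphrase' -C 'expanse-access'

# This is the line you would append to ~/.ssh/authorized_keys on Expanse:
cat ./expanse_key.pub

# Load the private key into ssh-agent so you are not re-prompted for the
# passphrase (and can skip straight to the TOTP prompt on Expanse):
# eval "$(ssh-agent -s)" && ssh-add ./expanse_key
```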
SSH Login
$ ssh <your_username>@login.expanse.sdsc.edu
File Transfer
Data Movement
Globus: SDSC Collections, Data Movers and Mount Points
All of Expanse's Lustre filesystems are accessible via the SDSC Expanse-specific collections (SDSC HPC - Expanse Lustre; *SDSC HPC - Projects). The following table shows the mount points on the data mover nodes that serve as the backend for these collections.
| Machine | Location on machine | Location on Globus/Data Movers |
|---|---|---|
| *Expanse | /expanse/projects | / |
| Expanse | /expanse/lustre/projects | /projects/... |
| Expanse | /expanse/lustre/scratch | /scratch/... |
The web portal also provides the ability to download and upload files in your directories directly within the browser.
It is also possible to use scp to transfer files between your machine and Expanse. Use a regular scp command with the remote address [username]@login.expanse.sdsc.edu:path_to_file, substituting your ACCESS username for username; you will go through the usual login process to complete the transfer.
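As a sketch, the commands look like the following; the filenames are hypothetical, and <your_username> should be replaced with your ACCESS username.

```shell
# Upload a local archive to your Expanse home directory:
scp ./inputs.tar.gz <your_username>@login.expanse.sdsc.edu:~/

# Download a results file from Expanse back to the current directory:
scp <your_username>@login.expanse.sdsc.edu:~/results.csv ./
```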
Storage
File System
| Directory | Path | Quota | Purge | Backup | Notes |
|---|---|---|---|---|---|
| Scratch Lustre | /expanse/lustre/scratch | 10 TB | 90 days after allocation expiration. | No backups stored. | This is not an archival file system, it is not backed up, and will be purged according to purge policy. |
| Scratch Compute Node | /scratch/$USER/job_$SLURM_JOB_ID | 1 TB | Purged at end of job. | No backups stored. | Users only have access to these local SSDs on the compute node during job execution. |
Jobs
The job charge for a compute node is 128 SUs per hour of run time (128 cores in one node × 1 hour = 128 SUs).
Each standard compute node has ~256 GB of memory and 128 cores.
- Each standard node core is allocated 1 GB of memory by default; users should explicitly include the --mem directive to request additional memory.
- Maximum available memory per compute node: --mem=249208M
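As a sketch, a job script requesting the full memory of a standard node might include the following directives; the job name, walltime, and final command are illustrative.

```shell
#!/bin/bash
# Sketch: request all usable memory on one standard compute node.
#SBATCH --job-name="maxmem"        # illustrative name
#SBATCH --partition=compute
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=128
#SBATCH --mem=249208M              # max available memory per node
#SBATCH --account=<<project>>
#SBATCH -t 00:10:00

free -h    # report memory visible to the job
```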
Requesting interactive resources using srun
You can request an interactive session using the srun command. The following example will request one regular compute node, 4 cores, in the debug partition for 30 minutes.
srun --partition=debug --pty --account=<<project>> --nodes=1 --ntasks-per-node=4 \
--mem=8G -t 00:30:00 --wait=0 --export=ALL /bin/bash
Expanse uses the Simple Linux Utility for Resource Management (SLURM) batch environment. When you run in the batch mode, you submit jobs to be run on the compute nodes using the sbatch command as described below. Remember that computationally intensive jobs should be run only on the compute nodes and not the login nodes.
Expanse places limits on the number of jobs queued and running on a per group (allocation) and partition basis. Please note that submitting a large number of jobs (especially very short ones) can impact the overall scheduler response for all users. If you are anticipating submitting a lot of jobs, please contact the SDSC consulting staff before you submit them. We can work to check if there are bundling options that make your workflow more efficient and reduce the impact on the scheduler.
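One common bundling option is a SLURM job array, which submits many related short runs as a single scheduler entry. The script below is a sketch: the program name, input-file layout, and resource figures are assumptions, not Expanse-specific values.

```shell
#!/bin/bash
# Sketch: bundle 100 short runs into one job-array submission.
#SBATCH --job-name="bundle"
#SBATCH --partition=shared
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=2G
#SBATCH --account=<<project>>
#SBATCH --array=1-100%10      # 100 tasks, at most 10 running at once
#SBATCH -t 00:10:00

# Each array task processes one input file (hypothetical program and naming):
./process_input input_${SLURM_ARRAY_TASK_ID}.dat
```

A job array counts as a single submission for scheduler bookkeeping, which is why it reduces the impact of many short jobs.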
The limits for each partition are noted in the table below. Partition limits are subject to change based on Early User Period evaluation.
| Partition Name | Max Walltime | Max Nodes/Job | Max Running Jobs | Max Running + Queued Jobs | Charge Factor | Notes |
|---|---|---|---|---|---|---|
| compute | 48 hrs | 32 | 32 | 64 | 1 | Exclusive access to regular compute nodes; limit applies per group |
| ind-compute | 48 hrs | 32 | 16 | 32 | 1 | Exclusive access to Industry compute nodes; limit applies per group |
| shared | 48 hrs | 1 | 4096 | 4096 | 1 | Single-node jobs using fewer than 128 cores |
| ind-shared | 48 hrs | 1 | 2048 | 2048 | 1 | Single-node Industry jobs using fewer than 128 cores |
| gpu | 48 hrs | 4 | 4 | 8 (32 Tres GPU) | 1 | Used for exclusive access to the GPU nodes |
| ind-gpu | 48 hrs | 2 | 4 | 4 (8 Tres GPU) | 1 | Exclusive access to the Industry GPU nodes |
| nairr-gpu | 48 hrs | 4 | 4 | 8 (32 Tres GPU) | 1 | Exclusive access to the NAIRR GPU nodes |
| gpu-shared | 48 hrs | 1 | 24 | 24 (24 Tres GPU) | 1 | Single-node job using fewer than 4 GPUs |
| ind-gpu-shared | 48 hrs | 1 | 24 | 24 (24 Tres GPU) | 1 | Single-node job using fewer than 4 Industry GPUs |
| nairr-gpu-shared | 48 hrs | 1 | 16 | 16(16 Tres GPU) | 1 | Single-node job using fewer than 4 NAIRR GPUs |
| large-shared | 48 hrs | 1 | 1 | 4 | 1 | Single-node jobs using large memory up to 2 TB (minimum memory required 256G) |
| debug | 30 min | 2 | 1 | 2 | 1 | Priority access to shared nodes set aside for testing of jobs with short walltime and limited resources |
| gpu-debug | 30 min | 2 | 1 | 2 | 1 | Priority access to gpu-shared nodes set aside for testing of jobs with short walltime and limited resources; max two gpus per job |
| preempt | 7 days | 32 | — | 128 | 0.8 | Non-refundable discounted jobs to run on free nodes that can be pre-empted by jobs submitted to any other queue |
| gpu-preempt | 7 days | 1 | — | 24 (24 Tres GPU) | 0.8 | Non-refundable discounted jobs to run on unallocated nodes that can be pre-empted by higher priority queues |
Requesting interactive resources using srun
The following example will request a GPU node, 10 cores, 1 GPU and 96 GB of memory in the debug partition for 30 minutes. To ensure the GPU environment is properly loaded, please be sure to run both the module purge and module restore commands.
login01$ srun --partition=gpu-debug --pty --account=<<project>> --ntasks-per-node=10 \
    --nodes=1 --mem=96G --gpus=1 -t 00:30:00 --wait=0 --export=ALL /bin/bash
srun: job 1336890 queued and waiting for resources
srun: job 1336890 has been allocated resources
exp-7-59$ module purge
exp-7-59$ module restore
Resetting modules to system default. Resetting $MODULEPATH back to system default.
All extra directories will be removed from $MODULEPATH.
Submitting Jobs Using sbatch
Jobs can be submitted to the SLURM partitions using the sbatch command as follows:
sbatch jobscriptfile
where jobscriptfile is the name of a UNIX format file containing special statements (corresponding to sbatch options), resource specifications and shell commands. Several example SLURM scripts are given below:
BASIC MPI JOB
#!/bin/bash
#SBATCH --job-name="hellompi"
#SBATCH --output="hellompi.%j.%N.out"
#SBATCH --partition=compute
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=128
#SBATCH --mem=0
#SBATCH --account=<<project>>
#SBATCH --export=ALL
#SBATCH -t 01:30:00

# This job runs with 2 nodes, 128 cores per node for a total of 256 tasks.
module purge
module load cpu

# Load module file(s) into the shell environment
module load gcc
module load mvapich2
module load slurm

srun --mpi=pmi2 -n 256 ../hello_mpi
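Assuming the script above is saved as hellompi.sb (the filename is illustrative), it can be submitted and monitored with the standard SLURM commands:

```shell
sbatch hellompi.sb      # submit; SLURM replies with the assigned job ID
squeue -u $USER         # list your queued and running jobs
scancel <jobid>         # cancel a job by ID if needed
```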
Queue specifications
| Name | Purpose | CPUs | GPUs | RAM | Jobs (30 days) | Wait Time (30-day trend) | Wall Time (30-day trend) |
|---|---|---|---|---|---|---|---|
| Expanse Compute Node | Compute Node Usage | 128 AMD EPYC 7742 | — | 256 GB DDR4 DRAM | — | — | — |
Datasets
| Name | Description |
|---|---|
| OpenTopography | OpenTopography provides efficient, user-friendly access to high-resolution topography data, processing tools, and resources to advance understanding of the Earth's surface, vegetation, and built environment. |
| OpenAltimetry | OpenAltimetry is a web based data visualization and discovery tool for exploring surface elevation profiles over time using satellite altimetry data from NASA's ICESat and ICESat-2 missions. |
| OpenForest4D | OpenForest4D is a web-based platform that leverages multi-source remote sensing data and artificial intelligence to generate on-demand, research-grade estimates of forest structure and above-ground biomass in four dimensions for global forest monitoring. |