AI Institutes Cyberinfrastructure

Tags: ai, machine-learning, gpu, python

A gathering place for AI researchers to find curated information about using ACCESS resources for AI applications and research.

NCSA Delta GPU (Delta GPU)

The Delta GPU resource comprises 4 different node configurations intended to support accelerated computation across a broad range of domains such as soft-matter physics, molecular dynamics, replica-exchange molecular dynamics, machine learning, deep learning, natural language processing, textual analysis, visualization, ray tracing, and accelerated analysis of very large in-memory datasets. Delta is designed to support the transition of applications from CPU-only to GPU or hybrid CPU-GPU models.

Delta GPU capacity is predominantly provided by 200 single-socket nodes, each configured with 1 AMD EPYC 7763 (“Milan”) processor with 64 cores per socket (64 cores per node) at 2.45 GHz and 256 GB of DDR4-3200 RAM. Half of these single-socket GPU nodes (100 nodes) are configured with 4 NVIDIA A100 GPUs with 40 GB HBM2 RAM and NVLink (400 A100 GPUs in total); the remaining half (100 nodes) are configured with 4 NVIDIA A40 GPUs with 48 GB GDDR6 RAM and PCIe 4.0 (400 A40 GPUs in total).

Rounding out the GPU resource are 6 additional large-memory, “dense” GPU nodes, each containing 8 GPUs in a dual-socket CPU configuration (128 cores per node) with 2 TB of DDR4-3200 RAM, but otherwise configured similarly to the single-socket GPU nodes. Among the “dense” GPU nodes, 5 employ NVIDIA A100 GPUs (40 A100 GPUs in the “dense” configuration) and 1 employs AMD MI100 GPUs (8 MI100 GPUs in total) with 32 GB HBM2 RAM.

A 1.6 TB NVMe solid-state disk is available as local scratch space during job execution on each GPU node type. All Delta GPU compute nodes are interconnected with each other and with the Delta storage resource by a 100 Gb/s HPE Slingshot network fabric. A Delta GPU allocation grants access to all types of Delta GPU nodes (both the GPUs and the CPUs on those nodes); a separate Delta CPU allocation is needed to use the CPU-only nodes.
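
Because a Delta GPU job may land on A100, A40, or MI100 nodes depending on what is requested, it can be useful to confirm at job start which accelerators are actually visible. The following is a minimal sketch, assuming a Python environment with PyTorch installed (ROCm builds of PyTorch also expose the MI100s through the torch.cuda namespace); it simply reports what the runtime sees:

    # inventory_gpus.py -- report the accelerators visible to this job.
    import torch

    if not torch.cuda.is_available():
        print("No GPUs visible to this process.")
    else:
        for i in range(torch.cuda.device_count()):
            props = torch.cuda.get_device_properties(i)
            mem_gb = props.total_memory / 1024**3
            print(f"GPU {i}: {props.name}, {mem_gb:.0f} GB")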

Purdue Anvil GPU

Purdue's Anvil GPU cluster comprises 16 GPU nodes (each with 128 cores, 256 GB of memory, and four NVIDIA A100 Tensor Core GPUs), providing 1.5 PF of single-precision performance to support machine learning and artificial intelligence applications. All CPU cores are AMD's "Milan" architecture running at 2.0 GHz, and all nodes are interconnected by a 100 Gbps HDR InfiniBand fabric. Scratch storage consists of a 10+ PB parallel filesystem with over 3 PB of flash drives. Storage for active projects is provided by Purdue's Research Data Depot, and data archival is available via Purdue's Fortress tape archive. The operating system is CentOS 8, and the batch scheduling system is Slurm.
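
Because Anvil schedules work through Slurm, a GPU job is typically submitted as a batch script. Below is a minimal Python sketch that writes and submits such a script; the partition name and account are placeholders, so check Anvil's documentation for the values that match your allocation:

    # submit_gpu_job.py -- write a minimal Slurm batch script and submit it.
    # The partition name ("gpu") and account ("myalloc") are assumptions;
    # substitute the values that match your Anvil allocation.
    import subprocess

    script = """#!/bin/bash
    #SBATCH --partition=gpu
    #SBATCH --account=myalloc
    #SBATCH --nodes=1
    #SBATCH --gpus-per-node=1
    #SBATCH --time=00:30:00

    nvidia-smi
    """

    with open("gpu_job.sh", "w") as f:
        f.write(script)

    subprocess.run(["sbatch", "gpu_job.sh"], check=True)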

Indiana Jetstream2 GPU

Jetstream2 is a user-friendly cloud environment designed to give researchers and students on-demand access to computing and data analysis resources, as well as to support gateway and other infrastructure projects. This entry covers the GPU-specific Jetstream2 resources only. Jetstream2 GPU is a hybrid-cloud platform that provides flexible, on-demand, programmable cyberinfrastructure tools ranging from interactive virtual machine services to a variety of infrastructure and orchestration services for research and education. This portion of the resource is allocated separately from the primary resource and contains 360 NVIDIA A100 GPUs: 4 GPUs per node, with 128 AMD Milan cores and 512 GB of RAM per node, connected to the spine by 100 Gbps Ethernet.
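
Jetstream2 is built on OpenStack, so instances can be provisioned programmatically as well as interactively. The following is a minimal sketch using the openstacksdk Python client; the cloud name, image, flavor, and keypair below are placeholder assumptions, so consult the Jetstream2 documentation for current values:

    # launch_gpu_vm.py -- provision a GPU instance on an OpenStack cloud.
    import openstack

    # Assumes a "jetstream2" entry in your clouds.yaml credentials file.
    conn = openstack.connect(cloud="jetstream2")

    server = conn.create_server(
        name="gpu-test",
        image="Featured-Ubuntu22",  # hypothetical image name
        flavor="g3.medium",         # hypothetical GPU flavor name
        key_name="my-keypair",      # an existing SSH keypair
        wait=True,
    )
    print(server.status)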

PSC Bridges-2 GPU-AI (Bridges-2 GPU Artificial Intelligence)

Bridges-2 Accelerated GPU (GPU) nodes are optimized for scalable artificial intelligence (AI; deep learning). Each Bridges-2 GPU node contains 8 NVIDIA Tesla V100-32GB SXM2 GPUs, providing 40,960 CUDA cores and 5,120 tensor cores per node. In addition, each node holds 2 Intel Xeon Gold 6248 CPUs, 512 GB of DDR4-2933 RAM, and 7.68 TB of NVMe SSD. The nodes are connected to Bridges-2's other compute nodes and to its Ocean parallel filesystem and archive by two HDR-200 InfiniBand links, providing 400 Gbps of bandwidth to enhance the scalability of deep learning training. The Bridges-2 GPU-AI resource also contains 9 HPE servers with 8 Volta GPUs each, along with an NVIDIA DGX-2.
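
The per-node totals follow directly from the published per-GPU figures for the V100 (5,120 CUDA cores and 640 tensor cores per GPU), as this quick check shows:

    # Per-node core totals for a Bridges-2 GPU node with 8 V100s.
    GPUS_PER_NODE = 8
    CUDA_CORES_PER_V100 = 5120    # published V100 specification
    TENSOR_CORES_PER_V100 = 640   # published V100 specification

    print(GPUS_PER_NODE * CUDA_CORES_PER_V100)    # 40960 CUDA cores
    print(GPUS_PER_NODE * TENSOR_CORES_PER_V100)  # 5120 tensor cores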

SDSC Expanse GPU

Expanse GPU is a Dell integrated cluster with NVIDIA V100 GPUs with NVLink, interconnected with Mellanox HDR InfiniBand in a hybrid fat-tree topology. There are a total of 52 nodes, each with four V100 SXM2 GPUs (with NVLink connectivity) and two 20-core Xeon 6248 CPUs. Full bisection bandwidth is available at the rack level (52 CPU nodes, 4 GPU nodes), with HDR100 connectivity to each node; HDR200 switches are used at the rack level, with 3:1 oversubscription across racks. In addition, Expanse has four 2 TB large-memory nodes. The system also features 12 PB of Lustre-based performance storage (140 GB/s aggregate) and 7 PB of Ceph-based object storage.
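
With four NVLink-connected V100s per node, a training job can parallelize across all local GPUs without touching the interconnect. A minimal single-node sketch, assuming a PyTorch environment (torch.nn.DataParallel is shown for brevity; DistributedDataParallel is generally preferred for performance):

    # single_node_dp.py -- replicate a model across all local GPUs.
    import torch
    import torch.nn as nn

    model = nn.Linear(1024, 1024)
    if torch.cuda.device_count() > 1:
        # Each input batch is split across the node's GPUs (e.g., 4 V100s).
        model = nn.DataParallel(model)
    model = model.to("cuda" if torch.cuda.is_available() else "cpu")

    x = torch.randn(64, 1024, device=next(model.parameters()).device)
    y = model(x)    # forward pass, scattered and gathered automatically
    print(y.shape)  # torch.Size([64, 1024])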
