10/07/25 - 2:30 PM - 5:00 PM EDT
Location
Virtual - Zoom
This short course (2.5 hours) covers the fundamentals of High Performance Computing (HPC) and Artificial Intelligence (AI), and explains why HPC is important for AI. Participants will learn to use the scikit-learn and PyTorch libraries to build, train, and evaluate machine learning and deep learning models in JupyterLab on ACES, an HPC cluster. The course also covers distributed training strategies, with a focus on PyTorch Distributed Data Parallel (DDP). The hands-on exercises progress step by step: starting with CPU-based training, moving to a single GPU, scaling up to multiple GPUs on a single node, and finally extending to multi-node distributed training.
Learn more at https://hprc.tamu.edu/training/aces_ai4faculty.html