Knowledge Base Resources

These resources have been contributed and “vetted” by the community of cyberinfrastructure professionals (researchers, research computing facilitators, research software engineers, and HPC system administrators) who participate in programs such as this one, which are supported by the ConnectCI community management platform. Additional Knowledge Base Resources are always welcome!

HPC University

HPC University Resources

A comprehensive list of training resources from the HPC University. HPCU is a virtual organization whose primary goal is to provide a cohesive, persistent, and sustainable on-line environment to share educational and training materials for a continuum of high performance computing environments that span desktop computing capabilities to the highest-end of computing facilities offered by HPC centers.

debugging hpc-operations professional-development training workforce-development compiling matlab python r mpi

3 Likes

Type

learning

Level

Cornell Virtual Workshop

Cornell Virtual Workshop is a comprehensive training resource for high performance computing topics. The Cornell University Center for Advanced Computing (CAC) is a leader in the development and deployment of Web-based training programs. Our Cornell Virtual Workshop learning platform is designed to enhance the computational science skills of researchers, accelerate the adoption of new and emerging technologies, and broaden the participation of underrepresented groups in science and engineering. Over 350,000 unique visitors have accessed Cornell Virtual Workshop training on programming languages, parallel computing, code improvement, and data analysis. The platform supports learning communities around the world, with code examples from national systems such as Frontera, Stampede2, and Jetstream2.

jetstream matlab cloud-computing data-analysis performance-tuning parallelization file-transfer globus slurm training cuda matlab python r mpi

1 Like

Type

learning

Level

Attention, Transformers, and LLMs: a hands-on introduction in Pytorch

This workshop focuses on developing an understanding of the fundamentals of attention and the transformer architecture so that you can understand how LLMs work and use them in your own projects.

ai deep-learning machine-learning neural-networks pytorch

1 Like

Type

learning

Level

Using Linux commands in a python script (and the difference between the subprocess and os python modules)

Using Linux Commands in a Python Script

Learn how to use Linux commands in a Python script. Specifically, learn how to use Python's subprocess and os modules to run shell commands (which in turn run Linux commands) from a script that runs on a cluster.
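As a rough illustration of the difference between the two modules, here is a minimal sketch. It is not taken from the linked resource. The helper names run_with_subprocess and run_with_os_system are made up for this example, and it assumes a Unix-like system where the echo command is available.

```python
import os
import subprocess


def run_with_subprocess(args):
    """Run a command given as a list of arguments; no shell is involved.

    Returns the captured standard output and the exit code.
    """
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.strip(), result.returncode


def run_with_os_system(command):
    """Hand a command string to the shell.

    The command's output goes straight to the terminal; only the exit
    status comes back to Python.
    """
    return os.system(command)


output, code = run_with_subprocess(["echo", "hello from subprocess"])
print(output, code)

status = run_with_os_system("echo hello from os.system")
print(status)
```

The subprocess version gives the script the command's output as a string, which is usually what you want when the result feeds into later Python code. The os.system version is shorter but only reports whether the command succeeded.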

cluster-management programming python

1 Like

Type

learning

Level

Introduction to Deep Learning in Pytorch

This workshop series introduces the essential concepts in deep learning and walks through the common steps in a deep learning workflow, from data loading and preprocessing to training and model evaluation. Throughout the sessions, students participate in writing and executing simple deep learning programs using PyTorch, a popular Python library for developing, training, and deploying deep learning models.

ai deep-learning image-processing machine-learning neural-networks pytorch gpu

1 Like

Type

learning

Level

File management of Visual Studio Code on clusters

VS Code installation

Visual Studio Code, commonly known as VSCode, is a popular tool used by programmers worldwide. It serves as a text editor and an Integrated Development Environment (IDE) that supports a wide variety of programming languages. One of its key features is its extensive library of extensions. These extensions add to the basic functionality of VSCode, making coding more efficient and convenient.

However, there's a catch. When these extensions are installed and used frequently, they generate a multitude of files. These files are typically stored in a folder named .vscode-extension within your home directory. On a cluster computing facility such as the FASTER and Grace clusters at Texas A&M University, there's a limit on how many files you can have in your home directory. For instance, the file number limit could be 10000, while the .vscode-extension directory can hold around 4000 temporary files even with just a few extensions. If the number of files in your home directory surpasses this limit because of VSCode extensions, you might face some issues. This restriction can discourage users from taking full advantage of the extensive features and extensions offered by the VSCode editor.

To overcome this, we can move the .vscode-extension directory to the scratch space. The scratch space is another area in the cluster where you can store files, and it usually has a much higher limit on the number of files than the home directory. We can perform this move smoothly using a feature called symbolic links (or symlinks for short). Think of a symlink as a shortcut or a reference that points to another file or directory located somewhere else.

Here's a step-by-step guide on how to move the .vscode-extension directory to the scratch space and create a symbolic link to it in your home directory:

1. Copy the .vscode-extension directory to the scratch space. Using the cp command, you can copy the .vscode-extension directory (along with all its contents) to the scratch space:

    cp -r ~/.vscode-extension /scratch/user

   Don't forget to replace /scratch/user with the actual path to your scratch directory.

2. Remove the original .vscode-extension directory. Once you've confirmed that the directory has been copied successfully to the scratch space, you can remove the original directory from your home space using the rm command:

    rm -r ~/.vscode-extension

   It's important to make sure that the directory has been copied to the scratch space successfully before deleting the original.

3. Create a symbolic link in the home directory. Lastly, create a symbolic link in your home directory that points to the .vscode-extension directory in the scratch space:

    ln -s /scratch/user/.vscode-extension ~/.vscode-extension

By following this process, all the files generated by VSCode extensions will be stored in the scratch space. This prevents your home directory from exceeding its file limit. Now, when you access ~/.vscode-extension, the system will automatically redirect you to the directory in the scratch space, thanks to the symlink. This method ensures that you can use VSCode and its various extensions without worrying about hitting the file limit in your home directory.
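The same copy, remove, and link steps can also be sketched in Python. This is only an illustration under stated assumptions: relocate_with_symlink and demo are hypothetical helper names, and the demo runs entirely inside a temporary directory with stand-in "home" and "scratch" folders, so it touches no real files. The directory name .vscode-extension is taken from the text above. Check which directory your own setup actually uses (VS Code's Remote-SSH extension, for example, typically writes to ~/.vscode-server).

```python
import shutil
import tempfile
from pathlib import Path


def relocate_with_symlink(source: Path, target_parent: Path) -> Path:
    """Copy `source` into `target_parent`, delete the original, and leave
    a symlink at the old location pointing to the new copy."""
    destination = target_parent / source.name

    # Step 1: copy the directory and its contents (like `cp -r`).
    shutil.copytree(source, destination)

    # Confirm the copy holds as many entries as the original before deleting.
    original_count = sum(1 for _ in source.rglob("*"))
    copied_count = sum(1 for _ in destination.rglob("*"))
    if original_count != copied_count:
        raise RuntimeError("Copy looks incomplete; original left untouched.")

    # Step 2: remove the original directory (like `rm -r`).
    shutil.rmtree(source)

    # Step 3: create the symlink at the old path (like `ln -s`).
    source.symlink_to(destination, target_is_directory=True)
    return destination


def demo():
    """Run the three steps on throwaway directories and report the result."""
    with tempfile.TemporaryDirectory() as tmp:
        home = Path(tmp) / "home"        # stand-in for your home directory
        scratch = Path(tmp) / "scratch"  # stand-in for /scratch/user
        extensions = home / ".vscode-extension"
        extensions.mkdir(parents=True)
        scratch.mkdir()
        (extensions / "example.txt").write_text("extension data")

        relocate_with_symlink(extensions, scratch)

        # The old path is now a link, and files are still reachable through it.
        return extensions.is_symlink(), (extensions / "example.txt").read_text()


print(demo())
```

To apply this to a real account, you would pass your actual home-directory folder and scratch path instead of the temporary ones, ideally after closing VSCode so that no files change during the copy.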

faster file-limit scratch file-transfer

0 Likes

Type

learning

Level

Biopython Tutorial

The Biopython Tutorial and Cookbook website is a dedicated online resource for users in the field of computational biology and bioinformatics. It provides a collection of tutorials and practical examples focused on using the Biopython library. The tutorials cover various aspects of Biopython and cater to users with different levels of expertise. The site also includes code snippets, examples, and solutions to common challenges in computational biology.

bioinformatics genomics python

0 Likes

Type

learning

Level

NCSA HPC-Moodle

NCSA HPC-Moodle

Self-paced tutorials on high-end computing topics such as parallel computing, multi-core performance, and performance tools. Some of the tutorials also offer digital badges.

training workforce-development

0 Likes

Type

learning

Level

Understanding LLM Fine-tuning

The Ultimate Guide to LLM Fine Tuning: Best Practices & Tools

With the recent rise of LLMs, many businesses are looking for ways to adopt these models and fine-tune them on specific data sets to ensure accuracy. When fine-tuned, these models can be well suited to the specific needs of a company. This site explains when, how, and why models should be trained, and goes over various strategies for LLM fine-tuning.

big-data training

0 Likes

Type

learning

Level

NERSC Training and Tutorials

A comprehensive collection of NERSC-developed training and tutorial events, offered on regular schedules. All sessions are archived, including slide decks, video recordings, and software examples as available. Some examples of past training and tutorial topics: Deep Learning for Sciences Webinar Series; BerkeleyGW Tutorial Workshop; VASP Trainings; Timemory Software Monitoring Tutorial, April 2021; HPCToolkit to Measure and Analyzing GPU Applications Performance Tutorial; Totalview Tutorial; NVidia HPCSDK - OpenMP Target Offload Training; Parallelware Training Series; ARM Debugging and Profiling Tools Tutorial; Roofline on NVIDIA GPUs; GPUs for Science events; 3-part OpenACC Training Series; 9-part CUDA Training Series.

training

0 Likes

Type

learning

Level

Why 'N How: Martinos Center for Biomedical Imaging:

Why 'N How: Martinos

The Why & How seminar series is designed to introduce research assistants, graduate students, and postdoctoral and clinical fellows – really, anyone who is interested – to the many tools used in medical imaging. These include software tools and most of the major imaging modalities wielded by investigators (MRI, PET, EEG, MEG, optical, TMS and others). As the name of the series suggests, the talks cover both the reasons researchers might need a particular tool and the nuts and bolts of how to apply it. You can watch videos of the overviews below.

image-processing

0 Likes

Type

learning

Level

Header-only C++ JSON library

JSON is a lightweight format for storing and transporting data, for example in a config file. This library is header-only, and has easy-to-read documentation. It is a C++ library.

resources c++

0 Likes

Type

learning

Level

Research Software Development in JupyterLab: A Platform for Collaboration Between Scientists and RSEs

JupyterLabIDE GitHub Repository

Iterative programming takes place when you can explore your code and play with your objects and functions without needing to save, recompile, or leave your development environment. This has traditionally been achieved with a REPL or an interactive shell. The magic of Jupyter Notebooks is that the interactive shell is saved as a persistent document, so you don't have to flip back and forth between your code files and the shell in order to program iteratively. There are several editors and IDEs intended for notebook development, but JupyterLab is a natural choice because it is free and open source and most closely related to the Jupyter Notebook and IPython projects. The chief motivation of this repository is to enable an IDE-like development environment through the use of extensions. There are also expositional notebooks to show off the usefulness of these features.

0 Likes

Type

learning

Level

A guide to pip in Python

Pip Guide

pip stands for "pip installs packages". It's the go-to package manager for Python, allowing developers to install, update, and manage software libraries and dependencies used in Python projects. With just a few commands in your terminal or command prompt, pip makes it effortless to fetch libraries from the Python Package Index (PyPI) and integrate them into your projects. This guide will walk you through the basics of pip, from installation to advanced package management.
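The following is a minimal sketch of the everyday pip operations mentioned above (install, update, list, inspect), written as a small Python helper rather than taken from the linked guide. The function name pip_command is hypothetical, and numpy is used only as an example package name. The helper just builds the command; nothing is installed unless you choose to run it.

```python
import sys


def pip_command(action, *packages, upgrade=False):
    """Build an argument list that runs pip with the current interpreter.

    Using `sys.executable -m pip` ensures the packages are managed for the
    same Python installation that is running this script.
    """
    command = [sys.executable, "-m", "pip", action]
    if upgrade:
        command.append("--upgrade")
    command.extend(packages)
    return command


# Equivalent to: python -m pip install numpy
print(pip_command("install", "numpy"))

# Equivalent to: python -m pip install --upgrade numpy
print(pip_command("install", "numpy", upgrade=True))

# Equivalent to: python -m pip list  /  python -m pip show numpy
print(pip_command("list"))
print(pip_command("show", "numpy"))

# To actually execute one of these, pass it to subprocess.run, e.g.
#   import subprocess
#   subprocess.run(pip_command("list"), check=True)
```

Invoking pip through the interpreter in this way is a common habit on clusters, where several Python installations or environments may be on the path at once.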

pip software-installation

0 Likes

Type

learning

Level

Automated Machine Learning Book

Automated Machine Learning: Methods, Systems, Challenges

The authoritative book on automated machine learning, which allows practitioners without ML expertise to develop and deploy state-of-the-art machine learning approaches. Describes the background of techniques used in detail, along with tools that are available for free.

ai data-analysis deep-learning machine-learning neural-networks python r

0 Likes

Type

learning

Level

Python Data and Viz Training (CCEP Program)

5 Days of recordings of Python data analysis and visualization training.

data-science python

0 Likes

Type

learning

Level

Active inference textbook

Active Inference: The Free Energy Principle in Mind, Brain, and Behavior

This textbook is the first comprehensive treatment of active inference, an integrative perspective on brain, cognition, and behavior used across multiple disciplines including computational neurosciences, machine learning, artificial intelligence, and robotics. It was published in 2022 and it's open access at this time. The contents in this textbook should be educational to those who want to understand how the free energy principle is applied to the normative behavior of living organisms and who want to widen their knowledge of sequential decision making under uncertainty.

ai machine-learning neural-networks

0 Likes

Type

learning

Level

CHARMM Links to Install, Run, and Troubleshoot MD Simulations

CHARMM (Chemistry at HARvard Macromolecular Mechanics) is a widely distributed molecular simulation program with a broad array of applications. CHARMM can set up and run simulations on both biological and materials systems, contains a comprehensive set of analysis tools, and has high performance on a variety of platforms. Here you will find links to the CHARMM website, forum, and registration/download page.

charmm molecular-dynamics namd computational-chemistry

0 Likes

Type

learning

Level

Texas A&M HPRC Training Site

Texas A&M Research Computing Training Resources

Training Resources and Courses offered by Texas A&M's Research Computing Group

ACES TAMU

0 Likes

Type

learning

Level

Scipy Lecture Notes

https://lectures.scientific-python.org/

Comprehensive tutorials and lecture notes covering various aspects of scientific computing using Python and SciPy.

visualization data-analysis machine-learning python

0 Likes

Type

learning

Level

CMake Tutorials

CMake Tutorials

CMake is an open-source, cross-platform tool used to manage the software build process. This tutorial takes you through how to use CMake from the very basics with example projects.

training compiling

0 Likes

Type

learning

Level

Oak Ridge Leadership Computing Facility (OLCF) Training Events and Archive

Upcoming training events and archives of training materials detailing general HPC best practices as well as how to use OLCF resources and services.

training

0 Likes

Type

learning

Level

GIS: Projections and their distortions

Map Projections

In GIS, projections take something plotted on a globe and convert it to a flat map that we can print or show on a screen. Unfortunately, they also introduce distortions to the objects and features on the map. These distortions are not only visual: the results of spatial attribute calculations, such as distance and area, will also reflect them. Below is a link to a quick primer on projections, types of distortions that can occur, and suggestions on how to choose a correct projection for your work.
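To make the effect on distance and area concrete, here is a small sketch using the spherical Mercator projection as an example. Mercator is chosen here only for illustration and is not necessarily the projection discussed in the linked primer. The function names are hypothetical. For this projection, lengths on the map are stretched by a factor of 1/cos(latitude) relative to the globe, and areas by the square of that factor.

```python
import math


def mercator_linear_scale(latitude_deg):
    """Factor by which the spherical Mercator projection stretches
    lengths at the given latitude (1.0 means no distortion)."""
    return 1.0 / math.cos(math.radians(latitude_deg))


def mercator_area_scale(latitude_deg):
    """Factor by which areas are inflated at the given latitude.

    Mercator stretches equally in both directions at a point, so the
    area factor is the square of the linear factor."""
    return mercator_linear_scale(latitude_deg) ** 2


# Compare distortion at the equator with distortion at higher latitudes.
for latitude in (0, 30, 60, 80):
    linear = mercator_linear_scale(latitude)
    area = mercator_area_scale(latitude)
    print(f"latitude {latitude:>2} deg: length x{linear:.2f}, area x{area:.2f}")
```

At the equator the factors are 1, while at 60 degrees latitude lengths are doubled and areas quadrupled, which is why distance or area measured directly in Mercator map units can be badly wrong far from the equator.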

gis

0 Likes

Type

learning

Level

Thrust resources

Thrust is a CUDA library that optimizes parallelization on the GPU for you. The Thrust tutorial is great for beginners. The documentation is helpful for anyone using Thrust.

parallelization gpu resources

0 Likes

Type

learning

Level

How the Little Jupyter Notebook Became a Web App: Managing Increasing Complexity with nbdev

Tutorial Site

A tutorial entitled "How the Little Jupyter Notebook Became a Web App: Managing Increasing Complexity with nbdev", presented at SciPy 2023 in Austin, TX. The tutorial is hosted as a series of Jupyter Notebooks, which can be accessed at the click of a button using Binder. See the README for more information.

0 Likes

Type

learning

Level