Knowledge Base Resources

These resources have been contributed and “vetted” by the community of cyberinfrastructure professionals (researchers, research computing facilitators, research software engineers, and HPC system administrators) who participate in programs, such as this one, that are supported by the ConnectCI community management platform. Additional Knowledge Base Resources are always welcome!

AI/ML TechLab - Accelerating AI/ML Workflows on a Composable Cyberinfrastructure

This technology lab contains a set of sessions to help a new user start an AI project on the ACES cluster, a composable accelerator testbed at Texas A&M University. You will learn how to create and activate a virtual environment, manipulate and visualize data with Pandas and Matplotlib, use Scikit-learn for linear regression and classification applications, and use PyTorch to create and train a simple image classification model with deep neural networks (DNNs).

ACES documentation TAMU ai visualization deep-learning machine-learning neural-networks login authentication composable-systems gpu nvidia slurm bash modules vim anaconda conda programming python scikit-learn

Type

documentation

Level

AI Institutes Cyberinfrastructure Documents: SAIL Meeting

Materials from the SAIL meeting (https://aiinstitutes.org/2023/06/21/sail-2023-summit-for-ai-leadership/). A space where AI researchers can learn about using ACCESS resources for AI applications and research.

access-account ai data-analysis machine-learning

Type

learning

Level

Jetstream2 Docs Site

Jetstream2 Docs Site

Jetstream2 makes cutting-edge high-performance computing and software easy to use for your research regardless of your project’s scale, even if you have limited experience with supercomputing systems. Cloud-based and on-demand, the 24/7 system includes discipline-specific apps. You can even create virtual machines that look and feel like your lab workstation or home machine, with thousands of times the computing power.

jetstream

Type

documentation

Level

DAGMan for orchestrating complex workflows on HTC resources (High Throughput Computing)

DAGMan (Directed Acyclic Graph Manager) is a meta-scheduler for HTCondor. It manages dependencies between jobs at a higher level than the HTCondor Scheduler. It is a workflow management system developed by the High-Throughput Computing (HTC) community, specifically for managing large-scale scientific computations and data analysis tasks. It enables users to define complex workflows as directed acyclic graphs (DAGs). In a DAG, nodes represent individual computational tasks, and the directed edges represent dependencies between the tasks. DAGMan manages the execution of these tasks and ensures that they are executed in the correct order based on their dependencies. The primary purpose of DAGMan is to simplify the management of large-scale computations that consist of numerous interdependent tasks. By defining the dependencies between tasks in a DAG, users can easily express the order of execution and allow DAGMan to handle the scheduling and coordination of the tasks. This simplifies the development and execution of complex scientific workflows, making it easier to manage and track the progress of computations.
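
The dependency ordering described above can be illustrated with a small sketch. This is not DAGMan itself: it uses Python's standard `graphlib` module as a stand-in, and the job names (`prepare`, `simulate_a`, `simulate_b`, `merge`) are hypothetical. It only shows how a DAG of dependencies determines which tasks may run, and when.

```python
from graphlib import TopologicalSorter

# Hypothetical diamond-shaped workflow (assumption, not from DAGMan docs):
# each key maps a job to the set of jobs it depends on.
dependencies = {
    "prepare": set(),
    "simulate_a": {"prepare"},
    "simulate_b": {"prepare"},
    "merge": {"simulate_a", "simulate_b"},
}


def execution_batches(deps):
    """Group jobs into batches; jobs in one batch have no unmet dependencies."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose parents have all finished
        batches.append(ready)
        sorter.done(*ready)  # mark them finished so their children become ready
    return batches


batches = execution_batches(dependencies)
print(batches)
```

Here `simulate_a` and `simulate_b` land in the same batch because neither depends on the other, while `merge` waits for both. A workflow manager makes the same kind of decision when it submits jobs to the underlying scheduler.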

open-science-grid

Type

tool

Level

InsideHPC

InsideHPC HomePage

InsideHPC is an informational site that offers videos, research papers, articles, and other resources focused on machine learning and quantum computing, among other topics within high-performance computing.

ai machine-learning community-outreach

Type

website

Level

Displaying Scientific Data with Tableau

Displaying Scientific Data with Tableau

Tableau is a popular and capable software product for creating charts that present data and dashboards that allow you to explore data. It is typically used to present business or statistical data, but can also create compelling visualizations of scientific data. However, scientific data is often generated or stored in formats that are not immediately accessible by Tableau. This seminar will explore the data formats that work best with Tableau and the available mechanisms for generating scientific data in (or converting it to) those formats so that you can apply the full power of Tableau to create the best possible visualizations of your data.

big-data data-analysis training workforce-development

Type

video_link

Level

ACCESS KB Guide - Expanse

ACCESS KB Guide

Expanse at SDSC is a cluster designed by Dell and SDSC that delivers 5.16 peak petaflops and offers Composable Systems and Cloud Bursting.

expanse composable-systems gpu

Type

documentation

Level

Trusted CI

Trusted CI

The mission of Trusted CI is to lead in the development of an NSF Cybersecurity Ecosystem with the workforce, knowledge, processes, and cyberinfrastructure that enables trustworthy science and NSF’s vision of a nation that is a global leader in research and innovation.

cybersecurity training

Type

website

Level

Chameleon

Chameleon User Guide

Chameleon is an NSF-funded testbed system for Computer Science experimentation. It is designed to be deeply reconfigurable, with a wide variety of capabilities for researching systems, networking, distributed and cluster computing, and security.

data-sharing data-reproducibility

Type

documentation

Level

Jetstream Home

https://jetstream-cloud.org

jetstream

Type

website

Level

Neurostars

Neurostars

A question and answer forum for neuroscience researchers, infrastructure providers and software developers.

documentation image-processing data-sharing psychology

Type

website

Level

Guide to building AirSim on Linux machines

Build AirSim on Linux

This article provides step-by-step instructions on how to build AirSim, a simulator for autonomous vehicles, on Linux. It includes both Docker and host machine setup options, along with details on building Unreal Engine, AirSim, and the Unreal environment. It also provides guidance on how to use AirSim once it is set up.

documentation github github-pages hardware unix-environment

Type

documentation

Level

Solving differential equations with Physics-informed Neural Network

solving DE with neural networks

Differential equations, the backbone of countless physical phenomena, have traditionally been solved using numerical methods or analytical techniques. However, the advent of deep learning introduces an intriguing alternative: Physics-Informed Neural Networks (PINNs). By leveraging the representational power of neural networks and integrating physical laws (like differential equations), PINNs offer a novel approach to solving complex problems. This guide walks through an implementation of a PINN to solve DEs such as the logistic equation.
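
As a rough illustration of how a physical law enters the training objective, the logistic equation can be turned into a PINN loss along the following lines. This is a sketch under stated assumptions, not necessarily the guide's formulation: the growth rate $r$, initial value $u_0$, collocation points $t_1, \dots, t_N$, and the equal weighting of the two terms are all choices made here for illustration, and $\hat{u}_\theta$ denotes the network's output.

```latex
% Assumed problem: logistic growth with rate r and initial value u_0
\[
  \frac{du}{dt} = r\,u\,(1 - u), \qquad u(0) = u_0
\]

% Assumed loss: mean squared ODE residual at N collocation points,
% plus a penalty on the initial condition
\[
  \mathcal{L}(\theta)
  = \frac{1}{N} \sum_{i=1}^{N}
    \left(
      \frac{d\hat{u}_\theta}{dt}(t_i)
      - r\,\hat{u}_\theta(t_i)\bigl(1 - \hat{u}_\theta(t_i)\bigr)
    \right)^{2}
  + \bigl(\hat{u}_\theta(0) - u_0\bigr)^{2}
\]

% Closed-form solution, useful for checking a trained network
\[
  u(t) = \frac{u_0}{u_0 + (1 - u_0)\,e^{-r t}}
\]
```

The derivative of the network output is typically obtained by automatic differentiation, so no labeled solution data is required. The loss is small only when the network approximately satisfies both the equation and the initial condition.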

neural-networks

Type

learning

Level

Applications of Machine Learning in Engineering and Parameter Tuning Tutorial

Applications of ML in Engineering and Parameter Tuning Tutorial (RMACC 2019)

Slides for a tutorial on Machine Learning applications in Engineering and parameter tuning given at the RMACC conference 2019.

data-analysis machine-learning python

Type

learning

Level

R for Research Scientists

R for Research Scientists GitHub Repository

A book for researchers who contribute code to R projects. This booklet is the result of my work with the Social Cognition for Social Justice lab. It was developed in response to questions I was getting from students: both grad students who were making software design decisions and undergraduates who were using things like version control for the first time. Although many tutorials and resources exist for these topics, there was not a single source that I thought covered just enough material to build up to the workflow used by the lab without extraneous detail.

software-carpentry workforce-development r

Type

learning

Level

Understanding LLM Fine-tuning

The Ultimate Guide to LLM Fine Tuning: Best Practices & Tools

With the recent rise of LLMs, many businesses are looking at ways to adopt these models and fine-tune them on specific data sets to ensure accuracy. When fine-tuned, these models can be well suited to the specific needs of a company. This site explains explicitly when, how, and why models should be trained. It goes over various strategies for LLM fine-tuning.

big-data training

Type

learning

Level

What is fairness in ML?

Building ML models for everyone: understanding fairness in machine learning

This article discusses the importance of fairness in machine learning and provides insights into how Google approaches fairness in their ML models. The article covers several key topics:

Introduction to fairness in ML: It provides an overview of why fairness is essential in machine learning systems, the potential biases that can arise, and the impact of biased models on different communities.
Defining fairness: The article discusses various definitions of fairness, including individual fairness, group fairness, and disparate impact. It explains the challenges in achieving fairness due to trade-offs and the need for thoughtful considerations.
Addressing bias in training data: It explores how biases can be present in training data and offers strategies to identify and mitigate these biases. Techniques like data preprocessing, data augmentation, and synthetic data generation are discussed.
Fairness in ML algorithms: The article examines the potential biases that can arise from different machine learning algorithms, such as classification and recommendation systems. It highlights the importance of evaluating and monitoring models for fairness throughout their lifecycle.
Fairness tools and resources: It showcases various tools and resources available to practitioners and developers to help measure, understand, and mitigate bias in machine learning models. Google's TensorFlow Extended (TFX) and What-If Tool are mentioned as examples.
Google's approach to fairness: The article highlights Google's commitment to fairness and the steps they take to address fairness challenges in their ML models. It mentions the use of fairness indicators, ongoing research, and partnerships to advance fairness in AI.

Overall, the article provides a comprehensive overview of fairness in machine learning and offers insights into Google's approach to building fair ML models.
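
To make one of the group-fairness notions mentioned above concrete, here is a minimal sketch of a disparate impact ratio. It is not taken from the article or from any Google tool. The function names, group labels, and prediction lists are hypothetical, and the metric is computed here simply as each group's positive-prediction rate divided by that of a chosen reference group.

```python
def selection_rate(predictions):
    """Fraction of positive (1) predictions within one group."""
    return sum(predictions) / len(predictions)


def disparate_impact(preds_by_group, reference_group):
    """Ratio of each group's selection rate to the reference group's rate."""
    ref_rate = selection_rate(preds_by_group[reference_group])
    return {
        group: selection_rate(preds) / ref_rate
        for group, preds in preds_by_group.items()
    }


# Hypothetical binary model outputs for two groups (illustrative data only).
preds = {
    "group_a": [1, 1, 1, 0, 1, 0, 1, 1, 0, 1],  # 7 of 10 predicted positive
    "group_b": [1, 0, 0, 1, 0, 0, 1, 0, 0, 0],  # 3 of 10 predicted positive
}

ratios = disparate_impact(preds, reference_group="group_a")
print(ratios)
```

A ratio well below 1 for a group indicates that the model selects that group less often than the reference group. Whether that gap is acceptable depends on the application and on which fairness definition is being applied.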

ai visualization data-analysis deep-learning machine-learning

Type

documentation

Level

Beautiful Soup - Simple Python Web Scraping

Beautiful Soup Docs

This package lets you easily scrape websites and extract information based on html tags and various other metadata found in the page. It can be useful for large-scale web analysis and other tasks requiring automated data gathering.

documentation ai big-data data-sharing data-transfer data-wrangling

Type

tool

Level

Using Dask on HPC Systems

A tutorial on the effective use of Dask on HPC resources. The four-hour tutorial will be split into two sections, with early topics focused on novice Dask users and later topics focused on intermediate usage on HPC and associated best practices. The knowledge areas covered include (but are not limited to):

Beginner section: high-level collections including dask.array and dask.dataframe; distributed Dask clusters using HPC job schedulers; Earth Science data analysis using Dask with Xarray; using the Dask dashboard to understand your computation.
Intermediate section: optimizing the number of workers and memory allocation; choosing appropriate chunk shapes and sizes for Dask collections; querying resource usage and debugging errors.
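
The chunk-size topic comes down to simple arithmetic, sketched below in plain Python without Dask. The helper `chunk_summary`, the array shape, and the candidate chunk shapes are hypothetical. The calculation assumes a regularly chunked array of 8-byte elements, such as float64.

```python
import math


def chunk_summary(shape, chunks, itemsize=8):
    """Return (number of chunks, bytes in one full chunk) for regular chunking."""
    # Chunks needed along each axis, rounding up for a partial final chunk.
    n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))
    chunk_bytes = math.prod(chunks) * itemsize
    return n_chunks, chunk_bytes


shape = (100_000, 100_000)  # hypothetical 2-D float64 array, 80 GB in total
for chunks in [(1_000, 1_000), (10_000, 10_000)]:
    n_chunks, chunk_bytes = chunk_summary(shape, chunks)
    print(chunks, n_chunks, "chunks of", chunk_bytes / 1e6, "MB")
```

The first option yields 10,000 chunks of 8 MB each and the second yields 100 chunks of 800 MB each. Many small chunks increase task-scheduling overhead, while a few large chunks may not fit comfortably in each worker's memory, which is the trade-off the tutorial's intermediate section addresses.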

training jupyterhub python

Type

learning

Level

Biopython Tutorial

The Biopython Tutorial and Cookbook website is a dedicated online resource for users in the field of computational biology and bioinformatics. It provides a collection of tutorials and practical examples focused on using the Biopython library. The website offers a series of tutorials that cover various aspects of Biopython, catering to users with different levels of expertise. It also includes code snippets, examples, and solutions to common challenges in computational biology.

bioinformatics genomics python

Type

learning

Level

Neural Networks in Julia

Neural Networks in Julia using Flux.jl

Making a neural network has never been easier! The following link directs users to the Flux.jl package, the easiest way of programming a neural network using the Julia programming language. Julia is the fastest-growing programming language for AI/ML, and this package provides a faster alternative to Python's TensorFlow and PyTorch, with 100% native Julia code and GPU support.

ai deep-learning machine-learning neural-networks julia

Type

tool

Level

NCSA HPC-Moodle

NCSA HPC-Moodle

Self-paced tutorials on high-end computing topics such as parallel computing, multi-core performance, and performance tools. Some of the tutorials also offer digital badges.

training workforce-development

Type

learning

Level

ACCESS Support Portal

ACCESS Support Portal

affinity-group pegasus ACCESS-website ondemand

Type

website

Level

Rockfish at Johns Hopkins University

Rockfish Resources and Documentation

Resources and User Guide available at Rockfish

rockfish

Type

documentation

Level

Globus Documentation

Globus Documentation

Globus is a data transfer, sharing, automation, and discovery service used by hundreds of thousands of researchers to manage "big data" at universities, research labs, and national systems such as ACCESS. The Globus documentation website provides how-to guides, reference documentation, and examples for Globus's web application, command-line interface, Python software development kit (SDK), and APIs.

cloud-storage data-sharing data-management data-management-software data-transfer data-wrangling file-transfer globus dtn python data-security data-compliance federated-authentication secure-data-architecture

Type

documentation

Level