Knowledge Base Resources

Contributed by cyberinfrastructure professionals (researchers, research computing facilitators, research software engineers and HPC system administrators), these resources are shared through the ConnectCI community platform. Add resources you find helpful!

Introduction to Deep Learning in PyTorch

This workshop series introduces the essential concepts in deep learning and walks through the common steps in a deep learning workflow, from data loading and preprocessing to training and model evaluation. Throughout the sessions, students participate in writing and executing simple deep learning programs using PyTorch, a popular Python library for developing, training, and deploying deep learning models.

ai deep-learning image-processing machine-learning neural-networks pytorch gpu

1 Like

Type

learning

Level

DeapSECURE – Data-Enabled Advanced Computational Training Platform for Cybersecurity Research and Education

DeapSECURE lesson modules

DeapSECURE is a training program to infuse high-performance computational techniques into cybersecurity research and education. It is an NSF-funded project of the ODU School of Cybersecurity along with the Department of Electrical and Computer Engineering and the Information Technology Services at ODU. The DeapSECURE team has developed six non-degree training modules to expose cybersecurity students to advanced CI platforms and techniques rooted in big data, machine learning, neural networks, and high-performance programming. Techniques taught in DeapSECURE workshops are rather general and transferable to other areas including science, engineering, finance, linguistics, etc. All lesson materials are made available as open-source educational resources.

ai deep-learning machine-learning neural-networks visualization big-data data-analysis jekyll batch-jobs slurm bash ssh training workforce-development python scikit-learn cybersecurity

1 Like

Type

learning

Level

Attention, Transformers, and LLMs: a hands-on introduction in PyTorch

This workshop focuses on developing an understanding of the fundamentals of attention and the transformer architecture so that you can understand how LLMs work and use them in your own projects.
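As a rough illustration of the attention mechanism this workshop covers, the sketch below implements scaled dot-product attention in plain Python rather than PyTorch. It is not taken from the workshop materials; the function names and the toy query, key, and value vectors are assumptions chosen for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Each output is the attention-weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy inputs (hypothetical): two tokens with 2-dimensional embeddings.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
outputs, weights = scaled_dot_product_attention(queries, keys, values)
```

Each row of `weights` sums to 1, and each query places most of its weight on the key it matches, so the corresponding output leans toward that key's value vector.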

ai deep-learning machine-learning neural-networks pytorch

1 Like

Type

learning

Level

PyTorch for Deep Learning and Natural Language Processing

Introduction to PyTorch for Deep Learning

PyTorch is a Python library that supports accelerated GPU processing for Machine Learning and Deep Learning. In this tutorial, I will teach the basics of PyTorch from scratch. I will then explore how to use it for some ML projects such as Neural Networks, Multi-layer perceptrons (MLPs), Sentiment analysis with RNN, and Image Classification with CNN.

ai big-data data-analysis deep-learning machine-learning neural-networks

1 Like

Type

documentation

Level

AI for improved HPC research - Cursor and Termius - PowerPoint

PowerPoint - Cursor and Termius benefits for HPC

These slides introduce Termius and Cursor, two new freemium apps that use AI to make work more efficient, and show how they can be used for faster HPC research.

documentation ai machine-learning ssh programming programming-best-practices python terminal-emulation-and-window-management

0 Likes

Type

presentation

Level

Automated Machine Learning Book

Automated Machine Learning: Methods, Systems, Challenges

The authoritative book on automated machine learning, which allows practitioners without ML expertise to develop and deploy state-of-the-art machine learning approaches. Describes the background of techniques used in detail, along with tools that are available for free.

ai data-analysis deep-learning machine-learning neural-networks python r

0 Likes

Type

learning

Level

Introduction to Probabilistic Graphical Models

https://ermongroup.github.io/cs228-notes/

This website summarizes the notes of Stanford's introductory course on probabilistic graphical models. It starts from the very basics and concludes by explaining from first principles the variational auto-encoder, an important probabilistic model that is also one of the most influential recent results in deep learning.

ai machine-learning

0 Likes

Type

learning

Level

InsideHPC

InsideHPC HomePage

InsideHPC is an informational site that offers videos, research papers, articles, and other resources focused on machine learning and quantum computing, among other topics within high performance computing.

ai machine-learning community-outreach

0 Likes

Type

website

Level

Recommended Libraries for Cyberinfrastructure Users Developing Jupyter Notebooks

Recommended Libraries for Cyberinfrastructure Users Developing Jupyter Notebooks

This repository contains information about Jupyter Widgets and how they can be used to develop interactive workflows, data dashboards, and web applications that can be run on HPC systems and science gateways. Easy-to-build web applications are not only useful for scientists. They can also be used by software engineers and system admins who want to quickly create tools for file management and more!

0 Likes

Type

website

Level

Awesome Jupyter Widgets (for building interactive scientific workflows or science gateway tools)

Awesome Jupyter Widgets List

A curated list of awesome Jupyter widget packages and projects for building interactive visualizations for Python code.

0 Likes

Type

learning

Level

AI/ML TechLab - Accelerating AI/ML Workflows on a Composable Cyberinfrastructure

This technology lab contains a set of sessions to help a new user start an AI project on the ACES cluster, a composable accelerator testbed at Texas A&M University. You will learn how to create and activate a virtual environment, manipulate and visualize data with Pandas and Matplotlib, use Scikit-learn for linear regression and classification applications, and use PyTorch to create and train a simple image classification model with deep neural networks (DNN).

ACES documentation TAMU ai visualization deep-learning machine-learning neural-networks login authentication composable-systems gpu nvidia slurm bash modules vim anaconda conda programming python scikit-learn

0 Likes

Type

documentation

Level

Scikit-Learn: Easy Machine Learning and Modeling

Scikit-learn

Scikit-learn is a free software machine learning library for Python. It offers a variety of methods you can apply to data, from linear regression and classifiers to gradient boosting and random forests. It is very useful when you want to analyze small amounts of data quickly.

documentation ai plotting visualization big-data data-analysis deep-learning image-processing machine-learning monte-carlo neural-networks vectorization

0 Likes

Type

tool

Level

Neural Networks in Julia

Neural Networks in Julia using Flux.jl

Making a neural network has never been easier! The following link directs users to the Flux.jl package, an easy way of programming a neural network in the Julia programming language. Julia is a fast-growing language for AI/ML, and this package offers an alternative to Python's TensorFlow and PyTorch that is written entirely in Julia and supports GPUs.

ai deep-learning machine-learning neural-networks julia

0 Likes

Type

tool

Level

Python Tools for Data Science

Python Tools for Data Science

Python has become a very popular programming language and software ecosystem for work in Data Science, integrating support for data access, data processing, modeling, machine learning, and visualization. In this webinar, we will describe some of the key Python packages that have been developed to support that work, and highlight some of their capabilities. This webinar will also serve as an introduction and overview of topics addressed in two Cornell Virtual Workshop tutorials, available at https://cvw.cac.cornell.edu/pydatasci1 and https://cvw.cac.cornell.edu/pydatasci2

ai machine-learning big-data data-analysis data-wrangling data-science training workforce-development python scikit-learn sql

0 Likes

Type

video_link

Level

Research Software Development in JupyterLab: A Platform for Collaboration Between Scientists and RSEs

JupyterLabIDE GitHub Repository

Iterative programming takes place when you can explore your code and play with your objects and functions without needing to save, recompile, or leave your development environment. This has traditionally been achieved with a REPL or an interactive shell. The magic of Jupyter Notebooks is that the interactive shell is saved as a persistent document, so you don't have to flip back and forth between your code files and the shell in order to program iteratively. There are several editors and IDEs that are intended for notebook development, but JupyterLab is a natural choice because it is free and open source and most closely related to the Jupyter Notebook and IPython projects. The chief motivation of this repository is to enable an IDE-like development environment through the use of extensions. There are also expositional notebooks to show off the usefulness of these features.

0 Likes

Type

learning

Level

Data Imputation Methods for Climate Data and Mortality Data

These slides and videos introduce how to use the K-Nearest-Neighbors method to impute climate data and how to use Bayesian spatio-temporal models in R-INLA to impute mortality data. The demos will be added soon.
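To give a feel for K-Nearest-Neighbors imputation in general, here is a minimal plain-Python sketch. It is not the method from the slides, whose details are not given here: the distance (root mean square difference over columns observed in both rows), the choice of `k`, and the toy temperature and humidity readings are all assumptions.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.

    Distance between two rows uses only the columns observed in both.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # A neighbor must be a different row that has column j observed.
                if m == i or other[j] is None:
                    continue
                shared = [
                    (a, b) for a, b in zip(row, other)
                    if a is not None and b is not None
                ]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled

# Hypothetical readings: [temperature, humidity]; the last humidity is missing.
readings = [[20.0, 50.0], [21.0, 52.0], [35.0, 20.0], [20.5, None]]
completed = knn_impute(readings, k=2)
```

The missing humidity is filled from the two rows whose temperatures are closest to 20.5, so the hot, dry outlier row does not influence the estimate.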

allocation-value documentation ai plotting visualization data-analysis machine-learning

0 Likes

Type

video_link

Level

Intro to Machine Learning on HPC

Intro to Machine Learning on HPC

This tutorial introduces machine learning on high performance computing (HPC) clusters. While it focuses on the HPC clusters at The University of Arizona, the content is generic enough that it can be used by students from other institutions.

ai supervised-learning unsupervised-learning deep-learning machine-learning neural-networks

0 Likes

Type

documentation

Level

Fairness and Machine Learning

Fairness and Machine Learning

The "Fairness and Machine Learning" book offers a rigorous exploration of fairness in ML and is suitable for researchers, practitioners, and anyone interested in understanding the complexities and implications of fairness in machine learning.

ai data-analysis deep-learning machine-learning data-science

0 Likes

Type

documentation

Level

Machine Learning with scikit-learn

scikit-learn tutorial

In the realm of Python-based machine learning, Scikit-Learn stands out as one of the most powerful and versatile tools available. This introductory post serves as a gateway to understanding Scikit-Learn through explanations of introductory ML concepts along with implementation examples in Python.

ai big-data machine-learning

0 Likes

Type

learning

Level

Implementing Markov Processes with Julia

Markov Decision Processes in Julia

The following link provides an easy method of implementing Markov Decision Processes (MDPs) in the Julia computing language. MDPs are a class of models for stochastic situations in which the actor has some level of control. For example, at a low level MDPs can be used to control an inverted pendulum, while in higher-level decision making they can decide when to take evasive action in air traffic management. MDPs can also be extended to the partially observable domain to form the Partially Observable Markov Decision Process (POMDP). This link contains a wealth of information showing how one can easily implement basic POMDP and MDP algorithms and apply well-known online and offline solvers.
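To make the idea of an MDP concrete, the sketch below solves a tiny one with value iteration, a standard offline method. It is written in plain Python and does not use or mirror the API of the linked Julia package; the two-state "cool"/"hot" problem, its rewards, and the discount factor are hypothetical.

```python
def value_iteration(states, actions, transitions, rewards, gamma=0.9, tol=1e-8):
    """Solve a finite MDP.

    transitions[s][a] is a list of (probability, next_state) pairs and
    rewards[s][a] is the immediate reward for taking action a in state s.
    """
    def q_value(s, a, values):
        # Expected return of taking a in s, then following the current values.
        return rewards[s][a] + gamma * sum(
            p * values[s2] for p, s2 in transitions[s][a]
        )

    values = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(q_value(s, a, values) for a in actions)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    policy = {s: max(actions, key=lambda a: q_value(s, a, values)) for s in states}
    return values, policy

# Hypothetical toy problem: an engine that can run slow or fast.
states = ["cool", "hot"]
actions = ["slow", "fast"]
transitions = {
    "cool": {"slow": [(1.0, "cool")], "fast": [(0.5, "cool"), (0.5, "hot")]},
    "hot": {"slow": [(0.5, "cool"), (0.5, "hot")], "fast": [(1.0, "hot")]},
}
rewards = {
    "cool": {"slow": 1.0, "fast": 2.0},
    "hot": {"slow": 1.0, "fast": -10.0},
}
values, policy = value_iteration(states, actions, transitions, rewards)
```

The resulting policy runs fast while the engine is cool and slows down once it is hot, which shows how the stochastic transitions and the rewards trade off.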

ai machine-learning julia

0 Likes

Type

tool

Level

Active inference textbook

Active Inference: The Free Energy Principle in Mind, Brain, and Behavior

This textbook is the first comprehensive treatment of active inference, an integrative perspective on brain, cognition, and behavior used across multiple disciplines including computational neurosciences, machine learning, artificial intelligence, and robotics. It was published in 2022 and it's open access at this time. The contents in this textbook should be educational to those who want to understand how the free energy principle is applied to the normative behavior of living organisms and who want to widen their knowledge of sequential decision making under uncertainty.

ai machine-learning neural-networks

0 Likes

Type

learning

Level

Factor Graphs and the Sum-Product Algorithm

https://ieeexplore.ieee.org/document/910572

A tutorial paper that presents a generic message-passing algorithm, the sum-product algorithm, that operates in a factor graph. Following a single, simple computational rule, the sum-product algorithm computes either exactly or approximately various marginal functions derived from the global function. A wide variety of algorithms developed in artificial intelligence, signal processing, and digital communications can be derived as specific instances of the sum-product algorithm, including the forward/backward algorithm, the Viterbi algorithm, the iterative "turbo" decoding algorithm, Pearl's (1988) belief propagation algorithm for Bayesian networks, the Kalman filter, and certain fast Fourier transform (FFT) algorithms.
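The message-passing rule can be illustrated on a very small factor graph. The plain-Python sketch below is not taken from the paper: the chain of three binary variables and the factor tables are made-up assumptions. It computes the marginal of the middle variable from two incoming messages and compares it with brute-force summation over the global function.

```python
from itertools import product

# Hypothetical factor graph: a chain x1 - x2 - x3 of binary variables with
# global function g(x1, x2, x3) = f1(x1) * f12(x1, x2) * f23(x2, x3).
f1 = [0.6, 0.4]
f12 = [[0.9, 0.1], [0.2, 0.8]]  # indexed as f12[x1][x2]
f23 = [[2.0, 1.0], [1.0, 3.0]]  # indexed as f23[x2][x3]

def normalize(vec):
    total = sum(vec)
    return [v / total for v in vec]

def marginal_x2_sum_product():
    # Message from factor f12 to x2: sum over x1 of f1(x1) * f12(x1, x2),
    # where f1(x1) is the message that x1 forwards from its leaf factor.
    msg_left = [sum(f1[a] * f12[a][b] for a in range(2)) for b in range(2)]
    # Message from factor f23 to x2: sum over x3 of f23(x2, x3); the leaf
    # variable x3 sends the unit message.
    msg_right = [sum(f23[b][c] for c in range(2)) for b in range(2)]
    # The marginal at x2 is the product of its incoming messages.
    return normalize([msg_left[b] * msg_right[b] for b in range(2)])

def marginal_x2_brute_force():
    # Sum the global function over every configuration of the other variables.
    totals = [0.0, 0.0]
    for a, b, c in product(range(2), repeat=3):
        totals[b] += f1[a] * f12[a][b] * f23[b][c]
    return normalize(totals)

by_messages = marginal_x2_sum_product()
by_enumeration = marginal_x2_brute_force()
```

On a tree-structured graph like this chain the two results agree exactly (up to floating-point rounding), while the message-passing version avoids enumerating every joint configuration.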

access-account ai machine-learning

0 Likes

Type

documentation

Level

AI-powered VS Code Editor

Cursor - AI code editor

Cursor: The AI-Powered Code Editor

Cursor is an AI-first code editor designed to change the way developers write, debug, and understand code. Built upon the premise of pair-programming with artificial intelligence, Cursor uses advanced AI models to offer real-time coding assistance, bug detection, and code generation.

How Cursor Benefits High-Performance Computing (HPC) Work:

1. Efficient Code Development: With AI-assisted code generation, researchers and developers in the HPC realm can quickly write optimized code for simulations, data processing, or modeling tasks, reducing the time to deployment.
2. Debugging Assistance: Handling complex datasets and simulations often leads to intricate bugs. Cursor's capability to automatically investigate errors and determine root causes can save crucial time in the HPC workflow.
3. Tailored Code Suggestions: Cursor's AI provides context-specific code suggestions by understanding the entire codebase. For HPC applications where performance is paramount, this means receiving recommendations that align with optimization goals.
4. Improved Code Quality: With AI-driven bug scanning and linter checks, Cursor helps ensure that HPC codes are not only fast but also robust and free of common errors.
5. Easy Integration: As a fork of VS Code, Cursor allows seamless migration, so developers working in HPC can swiftly bring over their existing VS Code setups and extensions.

In essence, for HPC tasks that demand speed, precision, and robustness, Cursor acts as a co-pilot, guiding developers towards efficient and optimized coding solutions. It is free to use if you provide your own OpenAI API key.

ai machine-learning workflow natural-language-processing programming python sas

0 Likes

Type

tool

Level

AI Institutes Cyberinfrastructure Documents: SAIL Meeting

Materials from the SAIL meeting (https://aiinstitutes.org/2023/06/21/sail-2023-summit-for-ai-leadership/). A space where AI researchers can learn about using ACCESS resources for AI applications and research.

access-account ai data-analysis machine-learning

0 Likes

Type

learning

Level

What is fairness in ML?

Building ML models for everyone: understanding fairness in machine learning

This article discusses the importance of fairness in machine learning and provides insights into how Google approaches fairness in their ML models. The article covers several key topics:

- Introduction to fairness in ML: It provides an overview of why fairness is essential in machine learning systems, the potential biases that can arise, and the impact of biased models on different communities.
- Defining fairness: The article discusses various definitions of fairness, including individual fairness, group fairness, and disparate impact. It explains the challenges in achieving fairness due to trade-offs and the need for thoughtful considerations.
- Addressing bias in training data: It explores how biases can be present in training data and offers strategies to identify and mitigate these biases. Techniques like data preprocessing, data augmentation, and synthetic data generation are discussed.
- Fairness in ML algorithms: The article examines the potential biases that can arise from different machine learning algorithms, such as classification and recommendation systems. It highlights the importance of evaluating and monitoring models for fairness throughout their lifecycle.
- Fairness tools and resources: It showcases various tools and resources available to practitioners and developers to help measure, understand, and mitigate bias in machine learning models. Google's TensorFlow Extended (TFX) and What-If Tool are mentioned as examples.
- Google's approach to fairness: The article highlights Google's commitment to fairness and the steps they take to address fairness challenges in their ML models. It mentions the use of fairness indicators, ongoing research, and partnerships to advance fairness in AI.

Overall, the article provides a comprehensive overview of fairness in machine learning and offers insights into Google's approach to building fair ML models.
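As one concrete example of a group-fairness measurement of the kind the article discusses, the sketch below computes per-group selection rates and their ratio, often called the disparate impact ratio. It is a plain-Python illustration, not Google's tooling or the article's own method; the function names, predictions, and group labels are hypothetical.

```python
def selection_rates(predictions, groups):
    """Fraction of positive predictions (1) within each group."""
    counts, positives = {}, {}
    for pred, group in zip(predictions, groups):
        counts[group] = counts.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + pred
    return {g: positives[g] / counts[g] for g in counts}

def disparate_impact_ratio(predictions, groups):
    """Lowest group selection rate divided by the highest.

    A value of 1.0 means every group is selected at the same rate.
    """
    rates = selection_rates(predictions, groups)
    highest = max(rates.values())
    if highest == 0:
        # No group receives any positive prediction, so the rates are equal.
        return 1.0
    return min(rates.values()) / highest

# Hypothetical binary model outputs for two groups of four people each.
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
rates = selection_rates(predictions, groups)
ratio = disparate_impact_ratio(predictions, groups)
```

Here group "a" is selected three times as often as group "b", so the ratio falls well below 1.0. A single number like this is only a starting point, since different fairness definitions can conflict with one another.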

ai visualization data-analysis deep-learning machine-learning

0 Likes

Type

documentation

Level