Knowledge Base Resources

These resources have been contributed and “vetted” by the community of cyberinfrastructure professionals (researchers, research computing facilitators, research software engineers, and HPC system administrators) who participate in programs such as this one, which are supported by the ConnectCI community management platform. Additional Knowledge Base Resources are always welcome!

What is fairness in ML?

Building ML models for everyone: understanding fairness in machine learning

This article discusses the importance of fairness in machine learning and provides insights into how Google approaches fairness in its ML models. The article covers several key topics:
Introduction to fairness in ML: It provides an overview of why fairness is essential in machine learning systems, the potential biases that can arise, and the impact of biased models on different communities.
Defining fairness: The article discusses various definitions of fairness, including individual fairness, group fairness, and disparate impact. It explains the challenges in achieving fairness due to trade-offs and the need for thoughtful consideration.
Addressing bias in training data: It explores how biases can be present in training data and offers strategies to identify and mitigate these biases. Techniques like data preprocessing, data augmentation, and synthetic data generation are discussed.
Fairness in ML algorithms: The article examines the potential biases that can arise in different machine learning applications, such as classification and recommendation systems. It highlights the importance of evaluating and monitoring models for fairness throughout their lifecycle.
Fairness tools and resources: It showcases various tools and resources available to practitioners and developers to help measure, understand, and mitigate bias in machine learning models. Google's TensorFlow Extended (TFX) and What-If Tool are mentioned as examples.
Google's approach to fairness: The article highlights Google's commitment to fairness and the steps it takes to address fairness challenges in its ML models. It mentions the use of fairness indicators, ongoing research, and partnerships to advance fairness in AI.
Overall, the article provides a comprehensive overview of fairness in machine learning and offers insights into Google's approach to building fair ML models.
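As one concrete illustration of the group-fairness notions mentioned in this summary, disparate impact is often summarized as the ratio of favorable-outcome rates between two groups. The sketch below is plain Python on made-up data. The function names `selection_rate` and `disparate_impact_ratio` and the 0.8 threshold (the commonly cited "four-fifths rule") are assumptions for illustration and are not taken from the article.

```python
def selection_rate(predictions):
    """Fraction of favorable (1) predictions in a list of 0/1 labels."""
    if not predictions:
        raise ValueError("predictions must be non-empty")
    return sum(predictions) / len(predictions)


def disparate_impact_ratio(preds_group_a, preds_group_b):
    """Ratio of the lower selection rate to the higher one (1.0 means parity)."""
    rate_a = selection_rate(preds_group_a)
    rate_b = selection_rate(preds_group_b)
    high = max(rate_a, rate_b)
    if high == 0:
        return 1.0  # neither group receives favorable predictions
    return min(rate_a, rate_b) / high


# Hypothetical model outputs (1 = favorable decision) for two groups.
group_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # 7 of 10 favorable
group_b = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]  # 3 of 10 favorable

ratio = disparate_impact_ratio(group_a, group_b)
flagged = ratio < 0.8  # assumed threshold (the "four-fifths rule")
print(f"disparate impact ratio = {ratio:.2f}, flagged = {flagged}")
```

A single ratio like this is only one of several fairness definitions, and the definitions can conflict, which is the trade-off the article describes.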

ai visualization data-analysis deep-learning machine-learning

Type

documentation

Level

Numba: Compiler for Python

Numba Compiler

Numba is a Python compiler designed to accelerate numerical and array operations, letting users improve application performance by writing high-performance functions in Python itself. It uses LLVM to translate Python functions into optimized machine code, which can approach the speed of languages like C, C++, and Fortran. Noteworthy features include dynamic code generation at import time or runtime, support for both CPU and GPU hardware, and integration with the Python scientific software ecosystem, particularly NumPy.
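A minimal sketch of the usage pattern described here, assuming the `numba` package is installed. The fallback decorator and the function name `sum_of_squares` are hypothetical and exist only to keep the example runnable where Numba is absent.

```python
try:
    from numba import njit  # compiles the decorated function on its first call
except ImportError:
    # Assumed fallback for illustration: run as ordinary Python without Numba.
    def njit(func):
        return func


@njit
def sum_of_squares(n):
    """Sum i*i for i in [0, n) with an explicit loop, the kind of code Numba targets."""
    total = 0
    for i in range(n):
        total += i * i
    return total


print(sum_of_squares(10))
```

With Numba present, the first call includes compilation time, so any timing comparison should use a second call.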

vectorization optimization performance-tuning parallelization

Type

documentation

Level

Running Particle-in-Cell Simulations on HPC

WarpX website

WarpX is an advanced particle-in-cell code used to model particle accelerators; it needs to be run on HPC systems. This website contains a tutorial on how to build WarpX on various HPC systems, such as those at NERSC, along with examples of how to set up post-processing and visualization tools for different physics cases.

github github-pages novel-accelerators

Type

documentation

Level

Vulkan Support Survey across Systems

It's not uncommon to see beautiful visualizations in HPC center galleries, but the majority of these are either rendered off the HPC system or created using programs that run on OpenGL or custom rasterization techniques. To put it simply, the next generation of graphics provided by OpenGL's successor, Vulkan, is strangely absent in the supercomputing world. The aim of this survey of available resources is to determine which systems can support Vulkan workflows and programs. This will help users get past some of the first hurdles in using Vulkan in HPC contexts.

anvil matlab darwin expanse xsede c++

Type

documentation

Level

ACCESS Guide (originally given at Duke OIT)

Using Jetstream 2 for Duke members (written for Duke OIT)

A guide for Duke OIT on how to advise Duke University members on using ACCESS and applying allocation credits to Jetstream 2. It can also be used by non-Duke members. Assumes the reader has basic knowledge of ACCESS.

ACCESS-credits adding-users allocation-management jetstream cloud-computing login ACCESS-website project-management cilogon

Type

documentation

Level

Guide to building AirSim on Linux machines

Build AirSim on Linux

This article provides step-by-step instructions on how to build AirSim, a simulator for autonomous vehicles, on Linux. It includes both Docker and host machine setup options, along with details on building Unreal Engine, AirSim, and the Unreal environment. It also provides guidance on how to use AirSim once it is set up.

documentation github github-pages hardware unix-environment

Type

documentation

Level

Introduction to GPU/Parallel Programming using OpenACC

Intro to OpenACC

Introduction to the basics of OpenACC.

gpu c c++ compiling fortran

Type

presentation

Level

Contributing cycles to the Open Science Grid

Contributing cycles to the Open Science Grid

documentation open-science-grid

Type

documentation

Level

Intro to Statistical Computing with Stan

The Stan language is used to specify a (Bayesian) statistical model as an imperative program that calculates the log probability density function. Here are some useful links to start exploring this statistical programming language, along with a Python interface to Stan.
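To make "calculating the log probability density function" concrete, here is a plain-Python sketch (not Stan code, and not the API of any Stan interface) of the quantity a very simple model would accumulate: a normal likelihood with unit standard deviation plus a normal prior on the mean. The model, the data, and the names `normal_lpdf` and `log_density` are assumptions chosen for illustration.

```python
import math


def normal_lpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def log_density(mu, data):
    """Log prior plus log likelihood, accumulated statement by statement."""
    target = normal_lpdf(mu, 0.0, 10.0)      # prior: mu ~ Normal(0, 10)
    for y in data:
        target += normal_lpdf(y, mu, 1.0)    # likelihood: y ~ Normal(mu, 1)
    return target


data = [1.2, 0.8, 1.1, 0.9]  # made-up observations
for mu in (0.0, 1.0, 2.0):
    print(mu, round(log_density(mu, data), 3))
```

A sampler or optimizer only needs to evaluate a function like `log_density` at candidate parameter values; values of `mu` near the data mean score highest here.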

data-analysis machine-learning monte-carlo python

Type

documentation

Level

Pandas - Python

Pandas Docs

pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language. It lets you store data in DataFrames, which are easy to manage and display and which carry column names and data types.

documentation ai big-data data-analysis

Type

documentation

Level

ParaView UArizona HPC links (beginner)

These links take you to visualization resources supported by the University of Arizona's HPC visualization consultant (rtdatavis.github.io). The links are specific to the ParaView program and the workflows that have been used by researchers at the University of Arizona. Some of the linked pages are very beginner friendly: getting started, working with cameras and keyframes for rendering, visualizing external files (NetCDF climate data), graphs, and data exporting. Many of the workflows involve using remote desktops via the Open OnDemand interface, but if this isn't set up at your university you can use ParaView locally on a desktop. Feel free to post at https://ask.cyberinfrastructure.org/ if you need assistance getting a ParaView GUI open for your work on HPC.

visualization

Type

documentation

Level

AHPCC documentation

Arkansas High Performance Computing Center

This link leads to the documentation website for using AHPCC.

Type

documentation

Level

Chameleon

Chameleon User Guide

Chameleon is an NSF-funded testbed system for Computer Science experimentation. It is designed to be deeply reconfigurable, with a wide variety of capabilities for researching systems, networking, distributed and cluster computing and security.

data-sharing data-reproducibility

Type

documentation

Level

Machine Learning in Astrophysics

Machine learning is becoming increasingly important in fields with large datasets, such as astrophysics. AstroML is a Python module for machine learning and data mining built on NumPy, SciPy, scikit-learn, matplotlib, and astropy, providing a range of statistical and machine learning routines for analyzing astronomical data in Python. In particular, it has loaders for many open astronomical datasets, with examples of how to visualize such large and complicated datasets.

plotting big-data image-processing machine-learning astrophysics

Type

documentation

Level

Rockfish at Johns Hopkins University

Rockfish Resources and Documentation

Resources and user guide for Rockfish.

rockfish

Type

documentation

Level

Neocortex Documentation

Neocortex Documentation

Neocortex is a new supercomputing cluster at the Pittsburgh Supercomputing Center (PSC) that features groundbreaking AI hardware from Cerebras Systems.

documentation ai deep-learning neural-networks hardware

Type

documentation

Level

UCLA Extended Reality (XR) collaboration resources and Workshop

Extended Reality (XR) Resource workshop/Guide for Building collaboration

Comprehensive Extended Reality (XR) collaboration resources for building high-performance extended reality (XR), augmented reality (AR), virtual reality (VR), and mixed reality campus teams. The tags set are a small subset of the topics covered.

documentation neural-networks

Type

presentation

Level

Moving-Lid-Driven Flow Simulation by Finite Difference Method

Finite Difference Implementation for Flow Inside a Cavity With a Lid Moving Above

The listed repository contains C++ code that models the flow inside a cavity with a lid moving above it from left to right, discretizing the incompressible Navier-Stokes (N-S) equations with the finite difference method. Artificial viscosity is added to the governing equations to increase stability. To solve the resulting system of algebraic equations, both the point Jacobi method and the symmetric Gauss-Seidel method are used for the iteration process.
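The point Jacobi iteration named in this description can be sketched on a small system. This is a plain-Python illustration on a made-up, diagonally dominant 3x3 system, not the repository's C++ code; the function name `jacobi`, the tolerance, and the test matrix are assumptions.

```python
def jacobi(A, b, tol=1e-10, max_iter=500):
    """Solve A x = b by point Jacobi iteration; returns (x, sweeps used)."""
    n = len(b)
    x = [0.0] * n
    for sweep in range(1, max_iter + 1):
        x_new = []
        for i in range(n):
            # Only values from the previous sweep are used (this is what makes it Jacobi).
            off_diag = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x_new.append((b[i] - off_diag) / A[i][i])
        change = max(abs(x_new[i] - x[i]) for i in range(n))
        x = x_new
        if change < tol:
            return x, sweep
    raise RuntimeError("Jacobi iteration did not converge")


# Made-up diagonally dominant system with exact solution [1, 2, 3].
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]

solution, sweeps = jacobi(A, b)
print([round(v, 6) for v in solution], "after", sweeps, "sweeps")
```

A Gauss-Seidel sweep differs only in that it uses already-updated entries of `x` within the same sweep; the symmetric variant follows each forward sweep with a backward one.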

fluid-dynamics

Type

documentation

Level

Bioinformatics Workflow Management with Nextflow

Nextflow is an open-source, domain-specific language and workflow manager designed for the execution and coordination of scientific and data-intensive computational workflows. It was created to address the challenges researchers and scientists face when dealing with complex and scalable computational pipelines, particularly in fields such as bioinformatics, genomics, and data analysis. Some links to get started are provided here.

cloud-computing parallelization data-management bioinformatics training

Type

documentation

Level

Intro to Machine Learning on HPC

Intro to Machine Learning on HPC

This tutorial introduces machine learning on high performance computing (HPC) clusters. While it focuses on the HPC clusters at The University of Arizona, the content is generic enough that it can be used by students from other institutions.

ai supervised-learning unsupervised-learning deep-learning machine-learning neural-networks

Type

documentation

Level

CUDA Toolkit Documentation

CUDA Toolkit Documentation

NVIDIA CUDA Toolkit Documentation: If you are working with NVIDIA GPUs in HPC, the NVIDIA CUDA Toolkit is essential. You can access the CUDA Toolkit documentation, including programming guides and API references, at the linked website.

documentation c c++ fortran python

Type

documentation

Level

Representation Learning in Deep Learning

Representation Learning in Deep Learning

Representation learning is a fundamental concept in machine learning and artificial intelligence, particularly in the field of deep learning. At its core, representation learning involves the process of transforming raw data into a form that is more suitable for a specific task or learning objective. This transformation aims to extract meaningful and informative features or representations from the data, which can then be used for various tasks like classification, clustering, regression, and more.

deep-learning image-processing machine-learning neural-networks

Type

documentation

Level

Official Documentation for PyTorch and NumPy

The official documentation for PyTorch, a tensor-based machine learning framework, and NumPy, which provides ndarrays that are useful for building tensors when implementing neural networks. Both libraries can be installed with pip.

deep-learning neural-networks pytorch python

Type

documentation

Level

EasyBuild Documentation

EasyBuild is a software installation framework that allows administrators to easily build and install software on high-performance computing (HPC) systems. It supports a wide range of software packages, toolchains, and compilers. Supported software is found in the EasyConfigs repository, one of several repositories in the EasyBuild project.

easybuild

Type

documentation

Level

Introductory Tutorial to Numpy and Pandas for Data Analysis

Numpy and Pandas for Data Analysis

In this tutorial, I present an overview, with many examples, of the use of NumPy and Pandas for data analysis. Beginners in data analysis can find it incredibly helpful, and anyone who already has experience in data analysis and needs a refresher can find value in it as well. I discuss the use of NumPy for analyzing 1D and 2D data and give an introduction to using Pandas to manipulate CSV files.
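A small sketch of the kind of 1D and 2D NumPy analysis such a tutorial covers, assuming `numpy` is installed. The arrays and variable names are made up for illustration and are not taken from the tutorial.

```python
import numpy as np

# 1D data: select the values above the mean with a boolean mask.
temps = np.array([21.5, 23.0, 19.8, 25.1, 22.4])
above_mean = temps[temps > temps.mean()]

# 2D data: rows are samples, columns are features.
table = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 30.0]])
col_means = table.mean(axis=0)   # one mean per column
row_sums = table.sum(axis=1)     # one sum per row

print(above_mean, col_means, row_sums)
```

The `axis` argument is the main thing that changes when moving from 1D to 2D data: `axis=0` reduces down the rows and `axis=1` reduces across the columns.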

ai big-data data-analysis vectorization

Type

documentation

Level