Knowledge Base Resources

These resources have been contributed and “vetted” by the community of cyberinfrastructure professionals (researchers, research computing facilitators, research software engineers, and HPC system administrators) who participate in programs such as this one, which are supported by the ConnectCI community management platform. Additional Knowledge Base Resources are always welcome!

Fairness and Machine Learning

Fairness and Machine Learning

The "Fairness and Machine Learning" book offers a rigorous exploration of fairness in ML and is suitable for researchers, practitioners, and anyone interested in understanding the complexities and implications of fairness in machine learning.

ai data-analysis deep-learning machine-learning data-science

0 Likes

Type

documentation

Level

Awesome Jupyter Widgets (for building interactive scientific workflows or science gateway tools)

Awesome Jupyter Widgets List

A curated list of awesome Jupyter widget packages and projects for building interactive visualizations for Python code

0 Likes

Type

learning

Level

Automated Machine Learning Book

Automated Machine Learning: Methods, Systems, Challenges

The authoritative book on automated machine learning, which allows practitioners without ML expertise to develop and deploy state-of-the-art machine learning approaches. Describes the background of techniques used in detail, along with tools that are available for free.

ai data-analysis deep-learning machine-learning neural-networks python r

0 Likes

Type

learning

Level

Data Imputation Methods for Climate Data and Mortality Data

These slides and videos introduce how to use the K-Nearest-Neighbors method to impute climate data and how to use Bayesian spatio-temporal models in R-INLA to impute mortality data. The demos will be added soon.
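As a rough illustration of the K-Nearest-Neighbors idea described in this entry, the sketch below fills a missing reading with the average of that column over the k most similar rows. It is not taken from the slides or videos: the `knn_impute` function, the station readings, and the choice of k = 2 are all hypothetical, and only the Python standard library is used.

```python
import math


def knn_impute(rows, k=2):
    """Return a copy of rows where each None is replaced by the mean of that
    column over the k nearest rows that have the column observed."""

    def distance(a, b):
        # Compare only the columns that both rows have observed.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows with column j observed and a finite distance.
            candidates = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            candidates = [c for c in candidates if c[0] != math.inf]
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


# Hypothetical readings: [temperature in deg C, relative humidity in %].
stations = [
    [10.0, 80.0],
    [11.0, 78.0],
    [25.0, 40.0],
    [10.5, None],  # humidity is missing for this station
]

filled = knn_impute(stations, k=2)
print(filled[3])
```

Here the last station is closest in temperature to the first two stations, so its humidity is filled with the average of their humidity values. In practice the columns would usually be scaled before computing distances, since K-Nearest-Neighbors is sensitive to units.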

allocation-value documentation ai plotting visualization data-analysis machine-learning

0 Likes

Type

video_link

Level

Advanced Mathematical Optimization Techniques

https://scipy-lectures.org/advanced/mathematical_optimization/

Mathematical optimization deals with the problem of numerically finding the minimums or maximums of a function. This tutorial provides Python solutions to optimization problems, with examples.
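To make "numerically finding a minimum" concrete, here is a minimal sketch of golden-section search for a function of one variable. The linked tutorial works with `scipy.optimize`; this standard-library version is a stand-in chosen to show the idea, and the function names, test functions, and intervals are hypothetical. It assumes the function has a single minimum inside the given interval.

```python
import math


def golden_section_minimize(f, lo, hi, tol=1e-7):
    """Approximate the minimizer of a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # The minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # The minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


def parabola(x):
    # Minimum at x = 2.
    return (x - 2.0) ** 2 + 1.0


def hill(x):
    # Maximum at x = -1.
    return 3.0 - (x + 1.0) ** 2


x_min = golden_section_minimize(parabola, 0.0, 5.0)
# A maximum of hill is a minimum of its negation.
x_max = golden_section_minimize(lambda x: -hill(x), -4.0, 2.0)
print(x_min, x_max)
```

Finding a maximum is handled by minimizing the negated function, which is why optimization libraries typically expose only minimizers.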

optimization python

0 Likes

Type

learning

Level

Examples of Thrust code for GPU Parallelization

thrust_ex.txt

Some examples of Thrust code. To compile, download the CUDA compiler from NVIDIA. This code was tested with CUDA 9.2 but is likely compatible with other versions. Before compiling, rename the file from thrust_ex.txt to thrust_ex.cu. Any device (GPU) code that is run through a Thrust transform is automatically parallelized on the GPU; host (CPU) code is not. Thrust code can also be compiled to run on a CPU for practice.

parallelization gpu cuda

0 Likes

Type

learning

Level

Jetstream2 Status

Jetstream2 Status

Jetstream2 makes cutting-edge high-performance computing and software easy to use for your research regardless of your project's scale, even if you have limited experience with supercomputing systems. Cloud-based and on-demand, the 24/7 system includes discipline-specific apps. You can even create virtual machines that look and feel like your lab workstation or home machine, with thousands of times the computing power.

jetstream

0 Likes

Type

website

Level

OnShape FeatureScripts: Custom features for everyone

OnShape FeatureScripts

OnShape FeatureScripts allow users to create their own features via OnShape's programming language. The user can make these as simple or complex as they need, and they can save tons of time for heavy OnShape users or complex projects!

documentation materials-science particle-physics

0 Likes

Type

tool

Level

Paraview UArizona HPC links (advanced)

These links take you to visualization resources supported by the University of Arizona's HPC visualization consultant ([rtdatavis.github.io](http://rtdatavis.github.io/)). They are specific to the ParaView program and the workflows that have been used by researchers at the University of Arizona. They differ from the beginner ParaView links from the University of Arizona in that they cover more complex workflows: how to use ParaView from the terminal (pvpython) and how to leverage HPC resources for headless batch rendering. The batch rendering tutorial is significantly more complex than the others, so if you find yourself stuck, please post on https://ask.cyberinfrastructure.org/ and I will try to troubleshoot with you.

visualization

0 Likes

Type

documentation

Level

Chameleon

Chameleon User Guide

Chameleon is an NSF-funded testbed system for Computer Science experimentation. It is designed to be deeply reconfigurable, with a wide variety of capabilities for researching systems, networking, distributed and cluster computing and security.

data-sharing data-reproducibility

0 Likes

Type

documentation

Level

RRCoP Resources Page

RRCoP External resources Page

A very helpful list of the communities that collaborate with the Regulated Research Community of Practice (RRCoP).

community-outreach cybersecurity

0 Likes

Type

website

Level

CMake Tutorials

CMake Tutorials

CMake is an open-source tool used to manage the software build process across operating systems. This tutorial takes you through how to use CMake, starting from the very basics, with example projects.

training compiling

0 Likes

Type

learning

Level

Examples of code using JSON nlohmann header only Library for C++

This code showcases how to work with the header-only nlohmann JSON library for C++. To compile, rename json_test.txt to json_test.cpp and test.txt to test.json. You must also download the header files from https://github.com/nlohmann/json. Compilation instructions are at the bottom of json_test. This code is very helpful for creating config files, for example.

c++

0 Likes

Type

learning

Level

Optimizing Research Workflows - A Documentation of Snakemake

https://snakemake.readthedocs.io/en/stable/

Snakemake is a powerful and versatile workflow management system that simplifies the creation, execution, and management of data analysis pipelines. It uses a user-friendly, Python-based language to define workflows, making it particularly valuable for automating and reproducibly managing complex computational tasks in research and data analysis.

documentation data-analysis data-reproducibility workflow bioinformatics data-science python

0 Likes

Type

documentation

Level

Better Scientific Software (BSSw)

The Better Scientific Software (BSSw) project provides a community to collaborate and learn about best practices in scientific software development. Software, the foundation of discovery in computational science and engineering, faces increasing complexity in computational models and computer architectures. BSSw provides a central hub for the community to address pressing challenges in software productivity, quality, and sustainability.

community-outreach project-management research-facilitation workforce-development

0 Likes

Type

website

Level

Neural Networks in Julia

Neural Networks in Julia using Flux.jl

Making a neural network has never been easier! The following link directs users to the Flux.jl package, an easy way to program a neural network in the Julia programming language. Julia is a fast-growing language for AI/ML, and this package offers an alternative to Python's TensorFlow and PyTorch that is written entirely in Julia and supports GPUs.

ai deep-learning machine-learning neural-networks julia

0 Likes

Type

tool

Level

Ask.CI Q&A Platform for Research Computing

Ask.CI

resources programming-best-practices

0 Likes

Type

website

Level

Time-Series LSTMs Python Walkthrough

A walkthrough (with a Google Colab link) on how to implement your own LSTM to observe time-dependent behavior.

ai deep-learning machine-learning neural-networks pytorch python

0 Likes

Type

website

Level

Neurodesk

Neurodesk

Neurodesk provides a containerised data analysis environment to facilitate reproducible analysis of neuroimaging data. Analysis pipelines for neuroimaging data typically rely on specific versions of packages and software, and are dependent on their native operating system. These dependencies mean that a working analysis pipeline may fail or produce different results on a new computer, or even on the same computer after a software update. Neurodesk provides a platform in which anyone, anywhere, using any computer can reproduce your original research findings given the original data and analysis code.

psychology containers software-installation version-control

0 Likes

Type

website

Level

ACCESS KB Guide - DELTA

ACCESS KB Guide - DELTA

NCSA is the home of Delta, a computing and data resource that balances cutting-edge GPU and CPU architectures and pairs them with a non-POSIX file system that exposes a POSIX-like interface. Delta allows applications to reap the benefits of modern file systems without rewriting code.

delta

0 Likes

Type

documentation

Level

Open Storage Network

Open Storage Network

The Open Storage Network, a national resource available through the XSEDE resource allocation system, is a high-quality, sustainable, distributed storage cloud for the research community.

data-management data-retention open-storage-network storage hpc-storage

0 Likes

Type

website

Level

Performance Engineering Of Software Systems

MIT Performance Engineering Of Software Systems Homepage

A class from MIT OpenCourseWare that takes a hands-on approach to building scalable, high-performance software systems. Topics include performance analysis, algorithmic techniques for high performance, instruction-level optimizations, caching optimizations, parallel programming, and building scalable systems.

optimization parallelization training

0 Likes

Type

learning

Level

Setting up PyFR flow solver on clusters

PyFR installation to local machine

These instructions were tested on the FASTER and Grace cluster computing facilities at Texas A&M University. However, the process can be applied to other clusters with similar environments. For local installation, please refer to the PyFR documentation. Please note that these instructions were valid at the time of writing; depending on when you run them, the versions of the modules may need to be updated.

1. Loading Modules

The first step involves loading the pre-installed software libraries required for PyFR. Execute the following commands in your terminal to load these modules:

    module load foss/2022b
    module load libffi/3.4.4
    module load OpenSSL/1.1.1k
    module load METIS/5.1.0
    module load HDF5/1.13.1

2. Python Installation from Source

Choose a location for the Python 3.11.1 installation, preferably in a .local directory. Navigate to the directory containing the Python 3.11.1 source code, then configure and install Python:

    cd $INSTALL/Python-3.11.1/
    ./configure --prefix=$LOCAL --enable-shared --with-system-ffi --with-openssl=/sw/eb/sw/OpenSSL/1.1.1k-GCCcore-11.2.0/ PKG_CONFIG_PATH=$LOCAL/pkgconfig LDFLAGS=/usr/lib64/libffi.so.6.0.2
    make clean; make -j20; make install;

3. Virtual Environment Setup

A virtual environment allows you to isolate the Python packages for this project from others on your system. Create and activate a virtual environment using:

    pip3.11 install virtualenv
    python3.11 -m venv pyfr-venv
    . pyfr-venv/bin/activate

4. Install PyFR Dependencies

Several Python packages are required for PyFR. Install these packages using the following commands:

    pip3 install --upgrade pip
    pip3 install --no-cache-dir wheel
    pip3 install --no-cache-dir botorch pandas matplotlib pyfr
    pip3 uninstall -y pyfr

5. Install PyFR from Source

Finally, navigate to the directory containing the PyFR source code, and then install PyFR:

    cd /scratch/user/sambit98/github/PyFR/
    python3 setup.py develop

Congratulations! You've successfully set up PyFR on the FASTER and Grace cluster computing facilities. You should now be able to use PyFR for your computational fluid dynamics simulations.

faster fluid-dynamics c++cuda python mpi software-installation

0 Likes

Type

learning

Level

How the Little Jupyter Notebook Became a Web App: Managing Increasing Complexity with nbdev

Tutorial Site

A tutorial entitled "How the Little Jupyter Notebook Became a Web App: Managing Increasing Complexity with nbdev", presented at SciPy 2023 in Austin, TX. The tutorial is delivered as a series of Jupyter Notebooks, which can be opened with the click of a button using Binder. See the README for more information.

0 Likes

Type

learning

Level

Machine Learning in R online book

Flexible and Robust Machine Learning Using mlr3 in R

The free online book for the mlr3 machine learning framework for R. It gives a comprehensive overview of the package and its ecosystem, and is suitable for everyone from beginners to experts. You'll learn how to build and evaluate machine learning models, build complex machine learning pipelines, tune their performance automatically, and explain how machine learning models arrive at their predictions.

data-analysis machine-learning r

0 Likes

Type

learning

Level