Knowledge Base Resources

These resources have been contributed and “vetted” by the community of cyberinfrastructure professionals (researchers, research computing facilitators, research software engineers, and HPC system administrators) participating in programs such as this one, which are supported by the ConnectCI community management platform. Additional Knowledge Base Resources are always welcome!


ACCESS Pegasus Documentation


The documentation provides an overview of using Pegasus, a workflow management system, on ACCESS resources for high throughput computing (HTC) workloads, covering logging in, workflow creation, resource configuration, and monitoring options.

pegasus

Type: documentation

GIS: Geocoding Services

Geocoding is the process of converting a street address into coordinates that can be plotted on a map. This conversion typically requires an API call to a remote server hosted by an organization or institution. The server compares the address attributes you provide against the data it holds and returns a best estimate of the coordinates for that location. Many geocoding services are available, and they differ in world coverage, result quality, and rate limits. For R, the "tidygeocoder" package provides an easy way to connect to these services. As an additional benefit, its documentation gives a good summary of the available geocoding services and links to their documentation. The link to the documentation for the geocoding services accessible through "tidygeocoder" is provided below. For Python, the geopy package provides connections to various geocoding services. The link to its documentation is also included below.
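As a rough illustration of the request and response cycle described above, the Python sketch below builds a query URL for one public geocoding service (the Nominatim search endpoint of OpenStreetMap) and extracts coordinates from a JSON reply. It uses only the standard library rather than geopy. The function names, the sample address, and the sample reply are hypothetical, and the coordinates in the reply are made up for illustration, not a real service result. No network call is made.

```python
import json
from urllib.parse import urlencode

# Public Nominatim search endpoint (one of many geocoding services).
NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_geocode_url(address: str) -> str:
    """Return the URL that asks the service for the best match to `address`."""
    params = {"q": address, "format": "json", "limit": 1}
    return f"{NOMINATIM_SEARCH}?{urlencode(params)}"


def parse_best_match(response_text: str):
    """Return (latitude, longitude) from a JSON reply, or None if no match."""
    results = json.loads(response_text)
    if not results:
        return None
    best = results[0]
    # Nominatim returns coordinates as strings in the "lat" and "lon" fields.
    return float(best["lat"]), float(best["lon"])


# Hypothetical reply with made-up coordinates, standing in for a real response.
sample_reply = '[{"lat": "38.8300", "lon": "-77.3100", "display_name": "Example"}]'

url = build_geocode_url("1600 Pennsylvania Ave NW, Washington, DC")
coords = parse_best_match(sample_reply)
print(url)
print(coords)
```

In practice the URL would be fetched with an HTTP client that identifies itself and stays within the service's rate limit; packages such as geopy and tidygeocoder wrap these details.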

gis

Type: documentation

Managing Python Packages on an HPC Cluster

Python Packages on HPC

This workshop covers the different ways Python packages can be managed in a cluster environment using conda and Python virtual environments, both in batch mode from the command line and with Jupyter Notebooks and JupyterLab on the cluster. The examples are run on the GMU HOPPER Cluster.
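A minimal sketch of the virtual-environment approach, using Python's standard `venv` module (the programmatic equivalent of `python -m venv`). The directory name is a hypothetical placeholder, the workshop's own examples for HOPPER may differ, and conda environments are not shown.

```python
import sys
import tempfile
import venv
from pathlib import Path

# Hypothetical location; on a cluster this would usually be a project or home directory.
env_dir = Path(tempfile.mkdtemp()) / "demo-env"

# with_pip=False keeps the sketch fast and offline; set it to True to bootstrap pip.
venv.create(env_dir, with_pip=False)

# The interpreter lives in bin/ on Linux and macOS, Scripts/ on Windows.
bin_dir = env_dir / ("Scripts" if sys.platform == "win32" else "bin")
config_file = env_dir / "pyvenv.cfg"

print("environment created at:", env_dir)
print("config present:", config_file.exists())
```

Keeping one environment per project isolates its package versions from the system Python and from other projects, which is the main reason for using environments on a shared cluster.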

documentation pytorch data-science ondemand batch-jobs job-submission slurm environment-modules anaconda jupyterhub python library-paths dependencies pip version-control

Type: documentation

Data Visualization tools for Python

Matplotlib Docs

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. It simplifies analyzing and presenting your data and works with Python, which many people already know.

documentation python

Type: documentation

Useful R Packages for Data Science and Statistics

https://www.udacity.com/blog/2021/01/best-r-packages-for-data-science.html

This Udacity article lists the most frequently used R packages for data science and statistics. For each package, the article links to its official documentation. It is a great starting point if you want to begin your data science journey in R.

plotting visualization data-analysis machine-learning data-science r

Type: documentation

DARWIN Documentation Pages

DARWIN Documentation

DARWIN (Delaware Advanced Research Workforce and Innovation Network) is a big data and high-performance computing system designed to catalyze Delaware research and education.

darwin big-data

Type: documentation

Contributing cycles to the Open Science Grid


documentation open-science-grid

Type: documentation

Intro to Statistical Computing with Stan

The Stan language is used to specify a (Bayesian) statistical model with an imperative program calculating the log probability density function. Here are some useful links to start your exploration of this statistical programming language, and a Python interface to Stan.
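To make the idea of "an imperative program calculating the log probability density" concrete, here is a small sketch in plain Python (not Stan syntax and not the Stan Python interface). It accumulates a running `target` value one statement at a time, in the same spirit as a Stan model block. The model, the prior, and the data are hypothetical choices made for illustration.

```python
import math


def normal_lpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def log_density(mu, sigma, y):
    """Accumulate the unnormalized log posterior, statement by statement."""
    target = 0.0
    target += normal_lpdf(mu, 0.0, 10.0)   # prior: mu ~ normal(0, 10)
    for y_i in y:                          # likelihood: y ~ normal(mu, sigma)
        target += normal_lpdf(y_i, mu, sigma)
    return target


y_obs = [1.2, 0.8, 1.1, 0.9]  # made-up observations
near = log_density(1.0, 0.5, y_obs)  # mu close to the data
far = log_density(5.0, 0.5, y_obs)   # mu far from the data
print(near, far)
```

A sampler only needs this function (and, for gradient-based methods, its derivatives) to explore the posterior, which is why specifying the log density is enough to define the model.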

data-analysis machine-learning monte-carlo python

Type: documentation

Spack Documentation

Spack is a package manager for supercomputers that can help administrators install scientific software and libraries for multiple complex software stacks.

spack

Type: documentation

AHPCC Documentation

Arkansas High Performance Computing Center

This link points to the documentation website for using AHPCC.

Type: documentation

Chameleon

Chameleon User Guide

Chameleon is an NSF-funded testbed system for Computer Science experimentation. It is designed to be deeply reconfigurable, with a wide variety of capabilities for researching systems, networking, distributed and cluster computing and security.

data-sharing data-reproducibility

Type: documentation

The Official Documentation of Pandas

pandas documentation

Pandas is one of the most essential Python libraries for data analysis and manipulation. It provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language. The official documentation serves as an in-depth guide to using this powerful tool, including explanations and examples.

plotting visualization

Type: documentation

Moving-Lid-Driven Flow Simulation by Finite Difference Method

Finite Difference Implementation for Flow Inside a Cavity with a Lid Moving Above

The listed repository contains C++ code that models the flow inside a cavity with a lid moving above from left to right by discretizing the incompressible Navier-Stokes (N-S) equations with the finite difference method. Artificial viscosity has been added to the governing equations to increase stability. To solve the resulting system of algebraic equations, both the Point Jacobi method and the Symmetric Gauss-Seidel method have been used for the iteration process.
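The two iteration schemes named above can be sketched on a much simpler problem. The Python example below is not taken from the repository and does not solve the cavity flow; it applies point Jacobi and symmetric Gauss-Seidel sweeps to a 1D Poisson model problem discretized with second-order finite differences. The grid size, tolerance, and function names are assumptions made for illustration.

```python
def jacobi_sweep(u, rhs):
    """One point-Jacobi update: every new value uses only old neighbours."""
    n = len(u)
    new = u[:]
    for i in range(1, n - 1):
        new[i] = 0.5 * (u[i - 1] + u[i + 1] + rhs[i])
    return new


def sgs_sweep(u, rhs):
    """One symmetric Gauss-Seidel update: a forward pass, then a backward
    pass, each using the freshest neighbour values available."""
    n = len(u)
    new = u[:]
    for i in range(1, n - 1):
        new[i] = 0.5 * (new[i - 1] + new[i + 1] + rhs[i])
    for i in range(n - 2, 0, -1):
        new[i] = 0.5 * (new[i - 1] + new[i + 1] + rhs[i])
    return new


def solve(sweep, rhs, tol=1e-10, max_iter=100_000):
    """Iterate until the largest change between sweeps drops below tol."""
    u = [0.0] * len(rhs)
    for iteration in range(1, max_iter + 1):
        new = sweep(u, rhs)
        change = max(abs(a - b) for a, b in zip(new, u))
        u = new
        if change < tol:
            return u, iteration
    return u, max_iter


# Model problem: -u'' = 2 on (0, 1) with u(0) = u(1) = 0; exact solution u = x(1 - x).
n = 21
h = 1.0 / (n - 1)
rhs = [2.0 * h * h] * n  # h^2 * f at every node (boundary entries are unused)

u_jacobi, it_jacobi = solve(jacobi_sweep, rhs)
u_sgs, it_sgs = solve(sgs_sweep, rhs)
print("Jacobi iterations:", it_jacobi, " SGS iterations:", it_sgs)
```

Each symmetric Gauss-Seidel sweep costs two passes over the grid, but it reuses updated values immediately and so needs fewer sweeps than Jacobi on this problem.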

fluid-dynamics

Type: documentation

Bioinformatics Workflow Management with Nextflow

Nextflow is an open-source, domain-specific language and workflow manager designed for the execution and coordination of scientific and data-intensive computational workflows. It was created to address the challenges researchers and scientists face when dealing with complex and scalable computational pipelines, particularly in fields such as bioinformatics, genomics, and data analysis. Some links to get started are provided here.

cloud-computing parallelization data-management bioinformatics training

Type: documentation

Jetstream2 Docs Site


Jetstream2 makes cutting-edge high-performance computing and software easy to use for your research regardless of your project’s scale, even if you have limited experience with supercomputing systems. Cloud-based and on-demand, the 24/7 system includes discipline-specific apps. You can even create virtual machines that look and feel like your lab workstation or home machine, with thousands of times the computing power.

jetstream

Type: documentation

Singularity/Apptainer User Manuals

Singularity/Apptainer is a free and open-source container platform that allows users to build and run containers on high-performance computing resources. SingularityCE is the community edition of Singularity maintained by Sylabs, a company that also offers commercial Singularity products and services. Apptainer is a fork of Singularity maintained by the Linux Foundation, a community of developers and users who are passionate about open-source software.

containers singularity

Type: documentation

CUDA Toolkit Documentation


If you are working with GPUs in HPC, the NVIDIA CUDA Toolkit is essential. The CUDA Toolkit documentation, including programming guides and API references, is available at the linked website.

documentation c c++ fortran python

Type: documentation

ACCESS KB Guide - Anvil


Purdue University is the home of Anvil, a powerful supercomputer that provides advanced computing capabilities to support a wide range of computational and data-intensive research spanning from traditional high-performance computing to modern artificial intelligence applications.

anvil

Type: documentation

Representation Learning in Deep Learning


Representation learning is a fundamental concept in machine learning and artificial intelligence, particularly in the field of deep learning. At its core, representation learning involves the process of transforming raw data into a form that is more suitable for a specific task or learning objective. This transformation aims to extract meaningful and informative features or representations from the data, which can then be used for various tasks like classification, clustering, regression, and more.

deep-learning image-processing machine-learning neural-networks

Type: documentation

Running Particle-in-Cell Simulations on HPC

WarpX website

WarpX is an advanced particle-in-cell code used to model particle accelerators, which needs to be run on HPC systems. The website contains tutorials on how to build WarpX on various HPC systems such as NERSC, along with examples of how to set up post-processing and visualization tools for different physics cases.

github github-pages novel-accelerators

Type: documentation

Info about the retirement of R GIS packages rgdal, rgeos, and maptools in 2023

The R GIS packages "rgdal", "rgeos", and "maptools" are set to be archived and will no longer be supported by the end of 2023. Many other R GIS packages are built on top of these packages, including "sp" and "raster". The recommended replacement for "sp" is "sf", and the recommended replacement for "raster" is "terra". Below are links to published articles regarding this transition. Links to the documentation for the recommended new packages, "sf" and "terra", are also included.

Type: documentation

AI/ML TechLab - Accelerating AI/ML Workflows on a Composable Cyberinfrastructure

This technology lab contains a set of sessions to help a new user start an AI project on the ACES cluster, a composable accelerator testbed at Texas A&M University. You will learn how to create and activate a virtual environment, manipulate and visualize data with Pandas and Matplotlib, use Scikit-learn for linear regression and classification applications, and use PyTorch to create and train a simple image classification model with deep neural networks (DNNs).

ACES documentation TAMU ai visualization deep-learning machine-learning neural-networks login authentication composable-systems gpu nvidia slurm bash modules vim anaconda conda programming python scikit-learn

Type: documentation

A survey on datasets for fairness-aware machine learning


The research paper provides an overview of various datasets that have been used to study fairness in machine learning. It discusses the characteristics of these datasets, such as their size, diversity, and the fairness-related challenges they address. The paper also examines the different domains and applications covered by these datasets.

ai data-analysis deep-learning data-science

Type: documentation

Samtools Documentation

https://www.htslib.org/doc/

Samtools is a suite of programs for interacting with high-throughput sequencing data, especially in the SAM/BAM format. It offers various utilities for processing, analyzing, and managing sequence data generated from next-generation sequencing (NGS) experiments. Samtools is widely used in bioinformatics and genomics research for tasks such as read alignment, variant calling, and data manipulation.
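For readers unfamiliar with the SAM format mentioned above, the following Python sketch splits one alignment line into its 11 mandatory tab-separated fields and checks two bits of the FLAG field. It is only an illustration of the file layout, not part of Samtools, and the alignment record is made up; for real data, Samtools itself (for example `samtools view`) is the appropriate tool.

```python
# Names of the 11 mandatory, tab-separated SAM alignment fields.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

FLAG_UNMAPPED = 0x4   # read is unmapped
FLAG_REVERSE = 0x10   # read aligned to the reverse strand


def parse_sam_line(line: str) -> dict:
    """Split one alignment line into its mandatory fields (optional tags dropped)."""
    values = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, values[:11]))
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[key] = int(record[key])
    return record


# Made-up alignment record for illustration.
line = "read1\t16\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII"
rec = parse_sam_line(line)
is_mapped = not (rec["FLAG"] & FLAG_UNMAPPED)
is_reverse = bool(rec["FLAG"] & FLAG_REVERSE)
print(rec["RNAME"], rec["POS"], is_mapped, is_reverse)
```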

documentation data-analysis bioinformatics data-science genomics

Type: documentation

Pandas - Python

Pandas Docs

pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language. It lets you store data in easy to manage and display data frames, with column names and datatypes.

documentation ai big-data data-analysis

Type: documentation