Knowledge Base Resources

Use these community-vetted resources. Contributions of additional Knowledge Base resources are always welcome.

GIS: Projections and their distortions

Map Projections

In GIS, projections convert features plotted on a globe into a flat map that can be printed or shown on a screen. Unfortunately, every projection also distorts the objects and features on the map. The distortion is not only visual: spatial attribute calculations such as distance and area inherit it as well. The linked resource is a quick primer on projections, the types of distortion that can occur, and how to choose an appropriate projection for your work.
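To make the distance and area distortion concrete, here is a minimal Python sketch for a single projection, the spherical Mercator, whose linear scale factor at latitude φ is sec φ and whose area scale factor is therefore sec² φ. The function name is hypothetical, and the sphere model is a simplification; real GIS work would rely on a projection library such as pyproj rather than this formula.

```python
import math


def mercator_scale_factors(lat_deg: float) -> tuple[float, float]:
    """Return (linear, areal) scale factors of the spherical Mercator projection."""
    phi = math.radians(lat_deg)
    k = 1.0 / math.cos(phi)  # linear scale factor, the same in every direction (conformal)
    return k, k * k          # areal distortion is the square of the linear factor


for lat in (0, 30, 45, 60, 75):
    k, area = mercator_scale_factors(lat)
    print(f"lat {lat:>2}°: distances x{k:.2f}, areas x{area:.2f}")
```

At 60° latitude, for example, distances are stretched by a factor of 2 and areas by a factor of 4, which is why Mercator maps make high-latitude regions look much larger than they are.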

gis

Type

learning

Awesome Jupyter Widgets (for building interactive scientific workflows or science gateway tools)

Awesome Jupyter Widgets List

A curated list of awesome Jupyter widget packages and projects for building interactive visualizations for Python code

Type

learning

MATLAB with other Programming Languages

Using MATLAB with Other Programming Languages

MATLAB is a widely used tool for data analysis and other computational work. This tutorial shows how to use MATLAB together with other programming languages, including C, C++, Fortran, Java, and Python.

c c++ fortran java matlab python

Type

tool

A survey on datasets for fairness-aware machine learning

A survey on datasets for fairness-aware machine learning

The research paper provides an overview of various datasets that have been used to study fairness in machine learning. It discusses the characteristics of these datasets, such as their size, diversity, and the fairness-related challenges they address. The paper also examines the different domains and applications covered by these datasets.

ai data-analysis deep-learning data-science

Type

documentation

ACCESS KB Guide - Anvil

ACCESS KB Guide - Anvil

Purdue University is the home of Anvil, a powerful supercomputer that provides advanced computing capabilities to support a wide range of computational and data-intensive research spanning from traditional high-performance computing to modern artificial intelligence applications.

anvil

Type

documentation

Git Branching Workflow and Maneuvers

Two resources that (1) present and defend a Git branching workflow for stable, collaborative Git-based projects ("A Successful Git Branching Model") and (2) map "What do you want to do?" to the commands needed to accomplish it ("Git Flight Rules").

github git

Type

learning

TensorFlow for Deep Neural Networks

TensorFlow Docs

TensorFlow is a powerful deep learning framework developed by Google. These docs cover its Python package, which is easy to use and can be used to train highly capable models.

documentation faster tensorflow

Type

tool

ParaView UArizona HPC links (advanced)

These links point to visualization resources supported by the University of Arizona's HPC visualization consultant ([rtdatavis.github.io](http://rtdatavis.github.io/)). They are specific to ParaView and to workflows used by researchers at the University of Arizona. Unlike the beginner ParaView links from the University of Arizona posted on ACCESS CI, these cover more complex workflows: using ParaView from the terminal (pvpython) and leveraging HPC resources for headless batch rendering. The batch rendering tutorial is significantly more complex than the others, so if you get stuck, please post on https://ask.cyberinfrastructure.org/ and I will try to troubleshoot with you.

visualization

Type

documentation

An Introduction to the Julia Programming Language

The Julia programming language is one of the fastest-growing languages for AI/ML development. Its syntax is similar to Python's, its performance can approach that of C++, and it is open source and reproducible across platforms and environments. The linked introduction covers Julia's basic syntax, data structures, key functions, and a few important packages.

ai data-analysis machine-learning julia

Type

learning

Samtools Documentation

https://www.htslib.org/doc/

Samtools is a suite of programs for interacting with high-throughput sequencing data, especially alignments in the SAM/BAM format. It offers various utilities for processing, analyzing, and managing sequence data generated by next-generation sequencing (NGS) experiments. Samtools is widely used in bioinformatics and genomics research for tasks such as sorting, indexing, filtering, and inspecting read alignments and preparing them for downstream analyses like variant calling.
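Samtools' subcommands operate on the SAM text format and its binary BAM form. As a hedged illustration of that format, the Python sketch below splits a SAM alignment line into its first mandatory fields and decodes the bitwise FLAG field, the same information `samtools flags` reports and `samtools view -f`/`-F` filter on. The bit values follow the SAM specification; the function names are hypothetical, and for real BAM files a library such as pysam is the usual route.

```python
# Bit values of the FLAG field as defined in the SAM format specification,
# named to mirror the labels used by `samtools flags`.
SAM_FLAGS = {
    0x1: "PAIRED",
    0x2: "PROPER_PAIR",
    0x4: "UNMAP",
    0x8: "MUNMAP",
    0x10: "REVERSE",
    0x20: "MREVERSE",
    0x40: "READ1",
    0x80: "READ2",
    0x100: "SECONDARY",
    0x200: "QCFAIL",
    0x400: "DUP",
    0x800: "SUPPLEMENTARY",
}


def decode_flag(flag: int) -> list[str]:
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def parse_sam_line(line: str) -> dict:
    """Split one tab-delimited SAM alignment line into its first six mandatory fields."""
    fields = line.rstrip("\n").split("\t")
    return {
        "qname": fields[0],
        "flag": int(fields[1]),
        "rname": fields[2],
        "pos": int(fields[3]),  # 1-based leftmost mapping position
        "mapq": int(fields[4]),
        "cigar": fields[5],
    }


record = parse_sam_line("read1\t99\tchr1\t100\t60\t50M\t=\t300\t250\tACGT\tIIII")
print(record["rname"], record["pos"], decode_flag(record["flag"]))
```

FLAG 99 is 0x1 + 0x2 + 0x20 + 0x40, so this read is the first mate of a properly paired read whose mate maps to the reverse strand.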

documentation data-analysis bioinformatics data-science genomics

Type

documentation

GPU Computing Workshop Series for the Earth Science Community

GPU training series for scientists, software engineers, and students, with emphasis on Earth science applications. The content of this course is coordinated with the 6-month series of GPU training sessions starting in February 2022. The NVIDIA High Performance Computing Software Development Kit (NVHPC SDK) and the CUDA Toolkit are the primary software requirements for this training; both are available as loadable modules on NCAR's HPC clusters. This software is also free to download from NVIDIA via the NVHPC SDK Current Release Downloads page and the CUDA Toolkit downloads page. The provided code is written specifically to build and run on NCAR's Casper HPC system but may be adapted to other systems or personal machines. Material will be updated as appropriate for the future deployment of NCAR's Derecho cluster and as technology progresses.

optimization performance-tuning profiling parallelization github pytorch tensorflow oceanography gpu hpc-arch-and-perf training c c++ fortran cuda jupyterhub programming programming-best-practices python

Type

learning

Charliecloud User Group

Charliecloud User Group

Announcements for users and developers of Charliecloud, which provides lightweight, user-defined software stacks for high-performance computing.

containers

Type

mailing_list

Contributing cycles to the Open Science Grid

Contributing cycles to the Open Science Grid

documentation open-science-grid

Type

documentation

Data Imputation Methods for Climate Data and Mortality Data

These slides and videos introduce how to use the K-Nearest-Neighbors method to impute climate data and how to use Bayesian spatio-temporal models in R-INLA to impute mortality data. Demos will be added soon.
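The K-Nearest-Neighbors idea behind the climate-data part can be sketched in a few lines. This is a minimal Python illustration under assumptions not taken from the slides: stations are located by planar (x, y) coordinates, distance is Euclidean, and each gap is filled with the unweighted mean of the k nearest stations that have observations. The function name `knn_impute` is hypothetical; real climate workflows often use great-circle distances, distance weighting, or a library such as scikit-learn's `KNNImputer`.

```python
import math
from typing import Optional


def knn_impute(
    coords: list[tuple[float, float]],
    values: list[Optional[float]],
    k: int = 3,
) -> list[float]:
    """Fill each missing value (None) with the mean of its k nearest observed neighbors.

    Only originally observed values are used as neighbors, so imputed values
    never feed into later imputations.
    """
    observed = [i for i, v in enumerate(values) if v is not None]
    if not observed:
        raise ValueError("need at least one observed value")
    filled = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(v)
            continue
        # Rank observed stations by Euclidean distance to station i, keep the k closest.
        nearest = sorted(observed, key=lambda j: math.dist(coords[i], coords[j]))[:k]
        filled.append(sum(values[j] for j in nearest) / len(nearest))
    return filled


stations = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (0.5, 0.5)]
temps = [10.0, 12.0, 14.0, 30.0, None]  # last station is missing a reading
print(knn_impute(stations, temps, k=3))
```

With k = 3, the missing station at (0.5, 0.5) takes the mean of its three close neighbors (12.0) and ignores the distant station reporting 30.0.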

allocation-value documentation ai plotting visualization data-analysis machine-learning

Type

video_link

Rockfish at Johns Hopkins University

Rockfish Resources and Documentation

Resources and the user guide for Rockfish.

rockfish

Type

documentation

Installing Rocky Linux Operating System

Installing Rocky Linux 9

Rocky Linux is an open-source enterprise operating system that is compatible with Red Hat Enterprise Linux (RHEL). It is a community-driven project that provides a stable and reliable platform for production workloads. It is one of the leading alternatives to CentOS Linux, which reached end of life (EoL) in 2024 as development shifted to CentOS Stream.

unix-environment software-installation

Type

learning

QGIS Processing Executor

QGIS processing from the command line

Running QGIS tools from the command line

gis

Type

documentation

Campus Research Computing Consortium (CaRCC)

CaRCC

CaRCC – the Campus Research Computing Consortium – is an organization of dedicated professionals developing, advocating for, and advancing campus research computing and data and associated professions. Vision: CaRCC advances the frontiers of research by improving the effectiveness of research computing and data (RCD) professionals, including their career development and visibility, and their ability to deliver services and resources for researchers. CaRCC connects RCD professionals and organizations around common objectives to increase knowledge sharing and enable continuous innovation in research computing and data capabilities.

community-outreach professional-development research-facilitation workforce-development

Type

website

Resources on active inference

Active inference institute website

Active inference is an emerging field of study in machine learning and computational neuroscience. This website belongs to the Active Inference Institute, which was established a few years ago, and offers a wide variety of resources for understanding the theory of active inference and for participating in a worldwide active inference community.

Type

website

MDAnalysis - Python library for the analysis of molecular dynamics simulations

MDAnalysis

MDAnalysis is a Python-based library of tools for the analysis of molecular dynamics simulations. It can read and write many popular simulation formats, including those of CHARMM, LAMMPS, GROMACS, AMBER, and more. This link contains the documentation pages for all MDAnalysis functions as well as links to tutorials using Jupyter Notebooks.
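A typical first analysis of an MD trajectory is the root-mean-square deviation (RMSD) of atomic positions from a reference structure, RMSD = sqrt((1/N) * sum_i |r_i - r_i,ref|^2). MDAnalysis ships its own RMSD tools in `MDAnalysis.analysis.rms`; the pure-Python sketch below only shows the underlying formula. It assumes coordinates are plain (x, y, z) tuples rather than MDAnalysis AtomGroup objects, and it skips the centering and optimal superposition that a real analysis usually applies first.

```python
import math

Coord = tuple[float, float, float]


def rmsd(a: list[Coord], b: list[Coord]) -> float:
    """Root-mean-square deviation between two equally sized sets of 3-D coordinates.

    No centering or optimal superposition is applied: the two structures are
    compared in the coordinate frame they are given in.
    """
    if len(a) != len(b) or not a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(squared / len(a))


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]  # every atom shifted by 1 along z
print(rmsd(reference, frame))  # 1.0
```

Because a rigid shift like this one still yields a nonzero RMSD, production analyses typically superimpose each frame onto the reference before computing it.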

computational-chemistry materials-science python

Type

tool

Building Anaconda Navigator applications

Building Anaconda Navigator applications

This tutorial explains how to create an Anaconda Navigator Application (app) for JupyterLab. It is intended for users of Windows, macOS, and Linux who want to generate an Anaconda Navigator app conda package from a given recipe. Prior knowledge of conda-build or conda recipes is recommended.

compiling conda programming programming-best-practices

Type

tool

Numba: Compiler for Python

Numba Compiler

Numba is a Python compiler designed for accelerating numerical and array operations, enabling users to improve their application's performance by writing high-performance functions in Python itself. It uses LLVM to translate Python code into optimized machine code, achieving speeds comparable to languages like C, C++, and Fortran. Noteworthy features include code generation at import time or runtime, support for both CPU and GPU hardware, and seamless integration with the Python scientific software ecosystem, particularly NumPy.
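The typical Numba pattern is to decorate a loop-heavy numerical function with `@njit`, after which the first call compiles it to machine code. The Python sketch below applies that pattern to a Leibniz-series approximation of pi. It uses a scalar loop rather than a NumPy array to keep it dependency-free, and the fallback decorator in the `except` branch is an assumption added so the sketch also runs, uncompiled, where Numba is not installed.

```python
import math

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs without Numba:
    # this stand-in "decorator" returns the plain Python function unchanged.
    def njit(func):
        return func


@njit
def leibniz_pi(n_terms):
    """Approximate pi with the Leibniz series 4 * (1 - 1/3 + 1/5 - ...).

    With Numba installed, this loop is compiled to machine code on the first
    call; later calls with the same argument types reuse the compiled version.
    """
    total = 0.0
    sign = 1.0
    for k in range(n_terms):
        total += sign / (2 * k + 1)
        sign = -sign
    return 4.0 * total


approx = leibniz_pi(1_000_000)
print(approx, abs(approx - math.pi))
```

Timing the second call of `leibniz_pi` with and without Numba is a simple way to see the speedup, since the first call also includes compilation time.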

vectorization optimization performance-tuning parallelization

Type

documentation

Trusted CI Resources Page

Trusted CI Resources Page

A helpful list of external resources compiled by Trusted CI.

cybersecurity

Type

website

Weka

Weka is a collection of machine learning algorithms for data mining tasks. It contains tools for data preparation, classification, regression, clustering, association rules mining, and visualization.
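Weka's native data format is ARFF: a `@RELATION` line, one `@ATTRIBUTE` declaration per column, then comma-separated rows after `@DATA`, with `?` marking missing values. As a hedged data-preparation sketch, the hypothetical Python helper below writes numeric attributes plus a nominal class attribute. It does not quote names or values containing spaces or commas, which real data may require.

```python
def to_arff(relation: str, numeric_attrs: list[str], class_name: str,
            class_values: list[str], rows: list[tuple]) -> str:
    """Render rows as a Weka ARFF document: numeric attributes followed by a nominal class."""
    lines = [f"@RELATION {relation}", ""]
    for name in numeric_attrs:
        lines.append(f"@ATTRIBUTE {name} NUMERIC")
    # Nominal attributes list their allowed values inside braces, e.g. {yes,no}.
    lines.append(f"@ATTRIBUTE {class_name} {{{','.join(class_values)}}}")
    lines += ["", "@DATA"]
    for row in rows:
        # Missing values (None) are written as '?', which is how ARFF marks them.
        lines.append(",".join("?" if v is None else str(v) for v in row))
    return "\n".join(lines) + "\n"


text = to_arff("weather", ["temperature", "humidity"], "play", ["yes", "no"],
               [(85, 85, "no"), (70, None, "yes")])
print(text)
```

The resulting text can be saved with a `.arff` extension and opened in the Weka Explorer.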

big-data data-analysis machine-learning weka data-science java

Type

tool

What is fairness in ML?

Building ML models for everyone: understanding fairness in machine learning

This article discusses the importance of fairness in machine learning and provides insights into how Google approaches fairness in their ML models. The article covers several key topics:

Introduction to fairness in ML: It provides an overview of why fairness is essential in machine learning systems, the potential biases that can arise, and the impact of biased models on different communities.

Defining fairness: The article discusses various definitions of fairness, including individual fairness, group fairness, and disparate impact. It explains the challenges in achieving fairness due to trade-offs and the need for thoughtful considerations.

Addressing bias in training data: It explores how biases can be present in training data and offers strategies to identify and mitigate these biases. Techniques like data preprocessing, data augmentation, and synthetic data generation are discussed.

Fairness in ML algorithms: The article examines the potential biases that can arise from different machine learning algorithms, such as classification and recommendation systems. It highlights the importance of evaluating and monitoring models for fairness throughout their lifecycle.

Fairness tools and resources: It showcases various tools and resources available to practitioners and developers to help measure, understand, and mitigate bias in machine learning models. Google's TensorFlow Extended (TFX) and What-If Tool are mentioned as examples.

Google's approach to fairness: The article highlights Google's commitment to fairness and the steps they take to address fairness challenges in their ML models. It mentions the use of fairness indicators, ongoing research, and partnerships to advance fairness in AI.

Overall, the article provides a comprehensive overview of fairness in machine learning and offers insights into Google's approach to building fair ML models.
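Group fairness and disparate impact both reduce to comparing how often each group receives the positive outcome. The Python sketch below computes two common metrics for binary predictions: the demographic parity difference, P(y_hat=1 | unprivileged) - P(y_hat=1 | privileged), and the disparate impact ratio, the quotient of the same two rates. The function names and the privileged/unprivileged framing are illustrative assumptions, not Google's tooling; tools such as Fairness Indicators and the What-If Tool compute richer versions of such metrics.

```python
def selection_rate(preds: list[int], groups: list[str], group: str) -> float:
    """Fraction of members of `group` that received the positive prediction (1)."""
    members = [p for p, g in zip(preds, groups) if g == group]
    if not members:
        raise ValueError(f"no samples for group {group!r}")
    return sum(members) / len(members)


def group_fairness(preds: list[int], groups: list[str],
                   unprivileged: str, privileged: str) -> tuple[float, float]:
    """Return (demographic parity difference, disparate impact ratio).

    The ratio is undefined (NaN) when the privileged group's selection rate is 0.
    """
    r_u = selection_rate(preds, groups, unprivileged)
    r_p = selection_rate(preds, groups, privileged)
    ratio = r_u / r_p if r_p > 0 else float("nan")
    return r_u - r_p, ratio


preds = [1, 0, 1, 1, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
diff, ratio = group_fairness(preds, groups, unprivileged="b", privileged="a")
print(f"parity difference {diff:+.2f}, disparate impact {ratio:.2f}")
```

Here group "b" is selected at a rate of 0.25 versus 0.75 for group "a", giving a ratio of about 0.33, well below the 0.8 "four-fifths" threshold often cited as a rule of thumb.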

ai visualization data-analysis deep-learning machine-learning

Type

documentation
