Knowledge Base Resources

These resources have been contributed and “vetted” by the community of cyberinfrastructure professionals (researchers, research computing facilitators, research software engineers and HPC system administrators) that are participating in programs such as this one, that are supported by the ConnectCI community management platform. Additional Knowledge Base Resources are always welcome!

Add a Resource

Bash shell tutorial

Bash shell tutorial

Training materials for using the bash (and zsh) shell.

bash

0 Likes

Type

learning

Level

Linux Tutorial from Ryan's Tutorials

The following pages are intended to give you a solid foundation in how to use the terminal, to get the computer to do useful work for you. You won't be a Unix guru at the end but you will be well on your way and armed with the right knowledge and skills to get you there if that's what you want (which you should because that will make you even more awesome). Here you will learn the Linux command line (Bash) with our 13 part beginners tutorial. It contains clear descriptions, command outlines, examples, shortcuts and best practice. At first, the Linux command line may seem daunting, complex and scary. It is actually quite simple and intuitive (once you understand what is going on that is), and once you work through the following sections you will understand what is going on. Unix likes to take the approach of giving you a set of building blocks and then letting you put them together. This allows us to build things to suit our needs. With a bit of creativity and logical thinking, mixed in with an appreciation of how the blocks work, we can assemble tools to do virtually anything we want. The aim is to be lazy. Why should we do anything we can get the computer to do for us? The only reason I can think of is that you don't know how (but after working through these pages you will know how, so then there won't be a good reason). A question that may have crossed your mind is "Why should I bother learning the command line? The Graphical User Interface is much easier and I can already do most of what I need there." To a certain extent you would be right, and by no means am I suggesting you should ditch the GUI. Some tasks are best suited to a GUI, word processing and video editing are great examples. At the same time, some tasks are more suited to the command line, data manipulation (reporting) and file management are some good examples. Some tasks will be just as easy in either environment. Think of the command line as another tool you can add to your belt. As always, pick the best tool for the job.

file-systems bash unix-environment

0 Likes

Type

learning

Level

Machine Learning with sci-kit learn

scikit learn tutorial

In the realm of Python-based machine learning, Scikit-Learn stands out as one of the most powerful and versatile tools available. This introductory post serves as a gateway to understanding Scikit-Learn through explanations of introductory ML concepts along with implementations examples in Python.

ai big-data machine-learning

0 Likes

Type

learning

Level

Data Visualization Tools for Julia

Plots.jl is the most widely used plotting library for the Julia programming language. It's known for being especially powerful in its versatility and intuitiveness. It's limited set of dependencies and wide applicability across different graphics packages make it especially helpful in visualizing the results of your latest Julia implementation. However, there are still multiple options available for Julia programmers to visualize their datasets. The second link details a comparison against a variety of Julia packages.

plotting visualization julia

0 Likes

Type

tool

Level

Active inference textbook

Active Inference: The Free Energy Principle in Mind, Brain, and Behavior

This textbook is the first comprehensive treatment of active inference, an integrative perspective on brain, cognition, and behavior used across multiple disciplines including computational neurosciences, machine learning, artificial intelligence, and robotics. It was published in 2022 and it's open access at this time. The contents in this textbook should be educational to those who want to understand how the free energy principle is applied to the normative behavior of living organisms and who want to widen their knowledge of sequential decision making under uncertainty.

ai machine-learning neural-networks

0 Likes

Type

learning

Level

Python

Introduction to Python - Texas A&M

Python course offered by Texas A&M HPRC

python

0 Likes

Type

learning

Level

CMake Tutorials

CMake Tutorials

CMake is an open-source tool used to manage the build process in operating systems. This tutorial takes you through how to use CMake from the very basics with example projects.

training compiling

0 Likes

Type

learning

Level

GDAL Multi-threading

GDAL Multi-threading

Multi-threading guidance when using GDAL.

parallelization gis

0 Likes

Type

learning

Level

Horovod: Distributed deep learning training framework

Horovod

Horovod is a distributed deep learning training framework. Using horovod, a single-GPU training script can be scaled to train across many GPUs in parallel. The library supports popular deep learning framework such as TensorFlow, Keras, PyTorch, and Apache MXNet.

deep-learning distributed-computing gpu

0 Likes

Type

tool

Level

Scikit-Learn: Easy Machine Learning and Modeling

Scikit-learn

Scikit-learn is free software machine learning library for Python. It has a variety of features you can use on data, from linear regression classifiers to xg-boost and random forests. It is very useful when you want to analyze small parts of data quickly.

documentation ai plotting visualization big-data data-analysis deep-learning image-processing machine-learning monte-carlo neural-networks vectorization

0 Likes

Type

tool

Level

Introduction to Probabilistic Graphical Models

https://ermongroup.github.io/cs228-notes/

This website summarizes the notes of Stanford's introductory course on probabilistic graphical models. It starts from the very basics and concludes by explaining from first principles the variational auto-encoder, an important probabilistic model that is also one of the most influential recent results in deep learning.

ai machine-learning

0 Likes

Type

learning

Level

Advanced Mathematical Optimization Techniques

https://scipy-lectures.org/advanced/mathematical_optimization/

Mathematical optimization deals with the problem of finding numerically minimums or maximums of a functions. This tutorial provides the Python solutions for the optimization problems with examples.

optimization python

0 Likes

Type

learning

Level

Gaussian 16

Gaussian 16 is a computational chemistry package that is used in predicting molecular properties and understanding molecular behavior at a quantum mechanical level.

gaussian computational-chemistry

0 Likes

Type

tool

Level

Oakridge Leadership Computing Facility (OLCF) Training Events and Archive

Upcoming training events and archives of training materials detailing general HPC best practices as well as how to use OLCF resources and services.

training

0 Likes

Type

learning

Level

Header-only C++ JSON library

JSON is a lightweight format for storing and transporting data, for example in a config file. This library is header-only, and has easy-to-read documentation. It is a C++ library.

resources c++

0 Likes

Type

learning

Level

Fundamentals of R Programming

This course is an introduction to the R programming language and covers the fundamental concepts needed to operate in the R environment. This course was taught for the ACCESS community on September 26, 2023, but the materials for the course are still available on the ACES cluster and can be completed independently. All materials are presented as learnR notebooks and cover several topics, including data types, variables, built-in functions, data structures, and plotting.

ACES TAMU plotting data-analysis r

0 Likes

Type

learning

Level

GIS: What is a Geodetic Datums?

What are Geodetic Datums?

Often when working with GIS, or spatial data, one encounters the word "datum" and it may require that you choose a "datum" when doing GIS computation tasks. Below is a short video on what are datums from NOAA and UCAR.

arcgis gis

0 Likes

Type

learning

Level

MDAnalysis - Python library for the analysis of molecular dynamics simulations

MDAnalysis

MDAnalysis is a python based library of tools for the analysis of molecular dynamics simulations. It is able to read and write many popular simulation formats including CHARMM, LAMMPS, GROMACS, and AMBER and more. This link contains the documentation pages of all MDAnalysis functions and has links to tutorials using Jupyter Notebooks.

computational-chemistry materials-science python

0 Likes

Type

tool

Level

MATLAB bioinformatics toolbox

https://www.mathworks.com/products/bioinfo.html

Bioinformatics Toolbox provides algorithms and apps for Next Generation Sequencing (NGS), microarray analysis, mass spectrometry, and gene ontology. Using toolbox functions, you can read genomic and proteomic data from standard file formats such as SAM, FASTA, CEL, and CDF, as well as from online databases such as the NCBI Gene Expression Omnibus and GenBank.

visualization data-analysis bioinformatics genomics matlab

0 Likes

Type

tool

Level

NCSA HPC-Moodle

NCSA HPC-Moodle

Self-paced tutorials on high-end computing topics such as parallel computing, multi-core performance, and performance tools. Some of the tutorials also offer digital badges.

training workforce-development

0 Likes

Type

learning

Level

Metadata Systems

Metadata Systems

Metadata is a vital topic in libraries and librarianship, encompassing structured information used for accessing digital resources. The definition of metadata varies but is essentially data about data. It has evolved beyond simply describing metadata schemas and now focuses on topics like interoperability, non-descriptive metadata (administrative and preservation metadata), and the effective application of metadata schemas for user discovery. Interoperability, the ability to seamlessly exchange metadata between systems, is a major concern. Different levels of interoperability are examined, including schema-level, record-level, and repository-level. Challenges to interoperability include variations in standards, collaboration barriers, and costs.Metadata management is discussed in terms of the holistic management of metadata across an entire library. Steps include analyzing metadata requirements, adopting schema, creating metadata content, delivery/access, evaluation, and maintenance. Administrative metadata, which encompasses ownership and production information, is becoming more critical, particularly for electronic resource licensing. Preservation metadata is also gaining importance in ensuring the long-term viability of digital objects.

metadata

0 Likes

Type

learning

Level

File management of Visual Studio Code on clusters

VS Code installation

Visual Studio Code, commonly known as VSCode, is a popular tool used by programmers worldwide. It serves as a text editor and an Integrated Development Environment (IDE) that supports a wide variety of programming languages. One of its key features is its extensive library of extensions. These extensions add on to the basic functionalities of VSCode, making coding more efficient and convenient. However, there's a catch. When these extensions are installed and used frequently, they generate a multitude of files. These files are typically stored in a folder named .vscode-extension within your home directory. On a cluster computing facility such as the FASTER and Grace clusters at Texas A&M University, there's a limitation on how many files you can have in your home directory. For instance, the file number limit could be 10000, while the .vscode-extension directory can hold around 4000 temporary files even with just a few extensions. Thus, if the number of files in your home directory surpasses this limit due to VSCode extensions, you might face some issues. This restriction can discourage users from taking full advantage of the extensive features and extensions offered by the VSCode editor. To overcome this, we can shift the .vscode-extension directory to the scratch space. The scratch space is another area in the cluster where you can store files and it usually has a much higher limit on the number of files compared to the home directory. We can perform this shift smoothly using a feature called symbolic links (or symlinks for short). Think of a symlink as a shortcut or a reference that points to another file or directory located somewhere else. Here's a step-by-step guide on how to move the .vscode-extension directory to the scratch space and create a symbolic link to it in your home directory: 1. Copy the .vscode-extension directory to the scratch space: Using the cp command, you can copy the .vscode-extension directory (along with all its contents) to the scratch space. Here's how: cp -r ~/.vscode-extension /scratch/user Don't forget to replace /scratch/user with the actual path to your scratch directory. 2. Remove the original .vscode-extension directory: Once you've confirmed that the directory has been copied successfully to the scratch space, you can remove the original directory from your home space. You can do this using the rm command: rm -r ~/.vscode-extension It's important to make sure that the directory has been copied to the scratch space successfully before deleting the original. 3. Create a symbolic link in the home directory: Lastly, you'll create a symbolic link in your home directory that points to the .vscode-extension directory in the scratch space. You can do this as follows: ln -s /scratch/user/.vscode-extension ~/.vscode-extension By following this process, all the files generated by VSCode extensions will be stored in the scratch space. This prevents your home directory from exceeding its file limit. Now, when you access ~/.vscode-extension, the system will automatically redirect you to the directory in the scratch space, thanks to the symlink. This method ensures that you can use VSCode and its various extensions without worrying about hitting the file limit in your home directory.

faster file-limit scratch file-transfer

0 Likes

Type

learning

Level

Awesome Jupyter Widgets (for building interactive scientific workflows or science gateway tools)

Awesome Jupyter Widgets List

A curated list of awesome Jupyter widget packages and projects for building interactive visualizations for Python code

0 Likes

Type

learning

Level

Weka

Weka is a collection of machine learning algorithms for data mining tasks. It contains tools for data preparation, classification, regression, clustering, association rules mining, and visualization.

big-data data-analysis machine-learning weka data-science java

0 Likes

Type

tool

Level

UNIX/command line basics tutorial

UNIX/command line basics tutorial

Introductory training materials for working on the UNIX command line.

bash

0 Likes

Type

learning

Level