Using Dask on HPC Systems
A tutorial on the effective use of Dask on HPC resources. The four-hour tutorial is split into two sections: early topics are aimed at novice Dask users, and later topics cover intermediate usage on HPC and associated best practices. The knowledge areas covered include (but are not limited to):
Beginner section
High-level collections including dask.array and dask.dataframe
Distributed Dask clusters using HPC job schedulers
Earth Science data analysis using Dask with Xarray
Using the Dask dashboard to understand your computation
Intermediate section
Optimizing the number of workers and memory allocation
Choosing appropriate chunk shapes and sizes for Dask collections
Querying resource usage and debugging errors
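Several of the intermediate topics (chunk shapes and sizes, memory per worker) come down to simple arithmetic. The sketch below is plain Python, not Dask code. For a hypothetical daily (time, lat, lon) float64 array, it computes how many chunks a given chunk shape produces and how much memory each full chunk occupies. The array size and the candidate chunk shapes are illustrative assumptions, not recommendations from the tutorial. On a real Dask array, the `numblocks` and `chunksize` attributes report the corresponding chunk grid and per-chunk shape.

```python
import math

def chunk_grid(shape, chunks):
    """Number of chunks along each axis; the last chunk on an axis may be partial."""
    return tuple(math.ceil(n / c) for n, c in zip(shape, chunks))

def chunk_mib(chunks, itemsize=8):
    """Memory of one full chunk in MiB (itemsize=8 bytes corresponds to float64)."""
    return math.prod(chunks) * itemsize / 2**20

# Hypothetical array: 10 years of daily data on a 0.25-degree global grid, float64.
shape = (3650, 720, 1440)

# Hypothetical candidate chunk shapes to compare.
for chunks in [(365, 720, 1440), (365, 180, 360), (30, 720, 1440)]:
    grid = chunk_grid(shape, chunks)
    print(f"chunks={chunks}: grid={grid}, {math.prod(grid)} chunks, "
          f"{chunk_mib(chunks):.1f} MiB per full chunk")
```

Comparing candidate shapes this way makes the trade-off visible. Very large chunks may not fit comfortably in a worker's memory, while very small chunks create many tasks and more scheduling overhead.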
Official Documentation for PyTorch and NumPy
The official documentation for PyTorch, a tensor-based machine learning framework, and NumPy, which provides the n-dimensional array (ndarray) that is useful for building tensors when implementing neural networks (NNs). Both libraries can be installed with pip.
Fundamentals of Cloud Computing
An introduction to cloud computing.
Git Branching Workflow and Maneuvers
Two resources:
1.) One presents and defends a git branching workflow for stable, collaborative git-based projects ("A Successful Git Branching Model").
2.) The other maps "What do you want to do?" to the commands needed to accomplish it ("Git Flight Rules").
A visual introduction to Gaussian Belief Propagation
This website is an interactive introduction to Gaussian Belief Propagation (GBP), a probabilistic inference algorithm that operates by passing messages between the nodes of arbitrarily structured factor graphs. GBP is a special case of loopy belief propagation; its updates rely only on local information and converge independently of the message schedule. The article's key argument is that, given recent trends in computing hardware, GBP has the right computational properties to act as a scalable, distributed probabilistic inference framework for future machine learning systems.
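To make the message-passing idea concrete, here is a minimal sketch of GBP on the simplest possible graph, a chain of scalar variables. It is written in plain Python and is not taken from the article. The model (a Gaussian prior on each variable plus pairwise "difference" factors) and all function names are illustrative assumptions. Each Gaussian is stored in information form (eta, lam), and a factor-to-variable message marginalizes out the factor's other variable with a Schur complement. Because a chain is a tree, the GBP means should match a direct solve of the joint linear system once messages have propagated along the whole chain.

```python
# Hypothetical model: x_i ~ N(prior_mean[i], 1/prior_prec[i]), and each neighbouring
# pair has a factor (x_{i+1} - x_i - d) ~ N(0, 1/w). Information form: lam = precision,
# eta = lam * mean.

def pairwise_factor(d, w):
    """Information form of w/2 * (x_j - x_i - d)^2 over the pair (x_i, x_j)."""
    return [-w * d, w * d], [[w, -w], [-w, w]]

def factor_to_var(eta, lam, in_eta, in_lam, t):
    """Message from a two-variable factor to its variable t (0 or 1): add the incoming
    message on the other variable o, then marginalize o out (Schur complement)."""
    o = 1 - t
    lam_oo = lam[o][o] + in_lam
    eta_o = eta[o] + in_eta
    return (eta[t] - lam[t][o] * eta_o / lam_oo,
            lam[t][t] - lam[t][o] * lam[o][t] / lam_oo)

def gbp_chain(prior_mean, prior_prec, d, w, iters):
    n = len(prior_mean)
    unary = [(m * p, p) for m, p in zip(prior_mean, prior_prec)]
    factors = [pairwise_factor(d, w) for _ in range(n - 1)]  # factor k joins x_k, x_{k+1}
    # msgs[k][t]: message from factor k to its t-th variable (x_k if t == 0, else x_{k+1}).
    msgs = [[(0.0, 0.0), (0.0, 0.0)] for _ in range(n - 1)]

    def belief(i, skip=None):
        """Unary term plus incoming factor messages at x_i, optionally excluding factor `skip`."""
        e, l = unary[i]
        for k, t in ((i - 1, 1), (i, 0)):  # factor on the left, factor on the right
            if 0 <= k < n - 1 and k != skip:
                e += msgs[k][t][0]
                l += msgs[k][t][1]
        return e, l

    for _ in range(iters):
        new = []
        for k, (eta, lam) in enumerate(factors):
            # Variable-to-factor messages: each endpoint's belief without factor k.
            e0, l0 = belief(k, skip=k)
            e1, l1 = belief(k + 1, skip=k)
            new.append([factor_to_var(eta, lam, e1, l1, 0),
                        factor_to_var(eta, lam, e0, l0, 1)])
        msgs = new  # synchronous ("flooding") schedule
    return [e / l for e, l in (belief(i) for i in range(n))]

def exact_means(prior_mean, prior_prec, d, w):
    """Direct solve of the joint information form, for comparison."""
    n = len(prior_mean)
    A = [[0.0] * n for _ in range(n)]
    b = [m * p for m, p in zip(prior_mean, prior_prec)]
    for i, p in enumerate(prior_prec):
        A[i][i] += p
    for k in range(n - 1):
        eta, lam = pairwise_factor(d, w)
        for a in range(2):
            b[k + a] += eta[a]
            for c in range(2):
                A[k + a][k + c] += lam[a][c]
    # Gaussian elimination without pivoting (A is symmetric positive definite here).
    for col in range(n):
        for row in range(col + 1, n):
            f = A[row][col] / A[col][col]
            for c in range(col, n):
                A[row][c] -= f * A[col][c]
            b[row] -= f * b[col]
    x = [0.0] * n
    for row in reversed(range(n)):
        x[row] = (b[row] - sum(A[row][c] * x[c] for c in range(row + 1, n))) / A[row][row]
    return x

prior_mean = [0.0, 1.0, 5.0, 3.0]  # hypothetical data
prior_prec = [1.0, 0.5, 0.2, 1.0]
print(gbp_chain(prior_mean, prior_prec, d=1.0, w=2.0, iters=10))
print(exact_means(prior_mean, prior_prec, d=1.0, w=2.0))
```

Every update above uses only quantities attached to one factor and its two neighbours, which is the locality property the article emphasizes.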
GIS: What Is a Geodetic Datum?
When working with GIS or other spatial data, you will often encounter the word "datum", and many GIS computations require you to choose one. Below is a short video from NOAA and UCAR explaining what datums are.
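A horizontal geodetic datum pairs a reference ellipsoid with a definition of how that ellipsoid is positioned relative to the Earth. The minimal sketch below illustrates only the ellipsoid part. It converts the same latitude, longitude, and height to Earth-centred Cartesian (ECEF) coordinates on two ellipsoids, WGS 84 and Clarke 1866 (the ellipsoid used by NAD27), using the standard geodetic-to-ECEF formulas. The sample point is hypothetical. This is not a datum transformation: converting between real datums also involves shifts and rotations that the sketch does not model.

```python
import math

# Reference ellipsoids as (semi-major axis a in metres, flattening f).
WGS84 = (6378137.0, 1 / 298.257223563)
CLARKE_1866 = (6378206.4, (6378206.4 - 6356583.8) / 6378206.4)  # f from a and b

def geodetic_to_ecef(lat_deg, lon_deg, h, ellipsoid):
    """Convert geodetic latitude/longitude (degrees) and ellipsoidal height (m)
    to Earth-centred Cartesian X, Y, Z (m) on the given ellipsoid."""
    a, f = ellipsoid
    e2 = f * (2 - f)  # first eccentricity squared
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = a / math.sqrt(1 - e2 * math.sin(lat) ** 2)  # prime vertical radius of curvature
    x = (n + h) * math.cos(lat) * math.cos(lon)
    y = (n + h) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - e2) + h) * math.sin(lat)
    return x, y, z

# Hypothetical point: identical coordinate numbers interpreted on two ellipsoids.
p1 = geodetic_to_ecef(40.0, -105.0, 1600.0, WGS84)
p2 = geodetic_to_ecef(40.0, -105.0, 1600.0, CLARKE_1866)
print(f"separation: {math.dist(p1, p2):.1f} m")
```

The nonzero separation is why GIS software asks which datum your coordinates refer to: the same numbers describe different physical locations.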
DeepChem
DeepChem is an open-source library built on TensorFlow and PyTorch. It is helpful for applying machine learning algorithms to molecular data.
ACCESS - Video for new ACCESS users
A short video on how to exchange ACCESS credits and connect to Jetstream2. Note that it was created for Duke users but applies to all users.
Educause HEISC-800-171 Community Group
The purpose of this group is to provide a forum to discuss NIST 800-171 compliance. Participants are encouraged to collaborate and share effective practices and resources that help higher education institutions prepare for and comply with the NIST 800-171 standard as it relates to Federal Student Aid (FSA), CMMC, DFARS, NIH, and NSF activities.
Research Software Engineering Training Materials
An ongoing collection of RSE training material, workshops, and resources. We are compiling this list as a starting point for future activities. We are especially seeking material that goes beyond basic research computing competency (e.g., what The Carpentries does so well) and is general enough to span multiple domains. Tools and technologies specific to a single domain, or applicable to only one subset of computing (e.g., HPC), are typically too narrowly focused. When in doubt, submit it for inclusion, or reach out and we’d be happy to discuss.
Anvil Home Page
FreeSurfer Tutorials
The official MGH/Harvard tutorial page for FreeSurfer. The FreeSurfer group has designed a series of tutorials for using FreeSurfer and for getting acquainted with the concepts needed for its various modes of MRI data analysis and processing. The tutorials are meant to be followed in a terminal window, where commands can be copied and pasted instead of typed.
AHPCC Documentation
The documentation website for using AHPCC resources.
Trinity Tutorial for Transcriptome Assembly
Trinity is one of the most popular tools for assembling transcripts from RNA-Seq short reads. This tutorial covers the basic usage of Trinity, best practices, and common problems.
Oak Ridge Leadership Computing Facility (OLCF) Training Events and Archive
Upcoming training events and archives of training materials detailing general HPC best practices as well as how to use OLCF resources and services.
Intro to Statistical Computing with Stan
The Stan language is used to specify a (Bayesian) statistical model as an imperative program that calculates the model's log probability density function. Here are some useful links to start exploring this statistical programming language, along with a Python interface to Stan.
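A Stan model boils down to a function that computes the log density of the parameters given the data (up to a constant); Stan's samplers then explore that function. The plain-Python sketch below is not Stan code and does not use any Stan interface. It writes that function out by hand for a hypothetical model: data y ~ normal(mu, sigma), with priors mu ~ normal(0, 10) and sigma ~ exponential(1). The equivalent Stan model block is shown in the docstring. One caveat: Stan's `~` statements drop constant terms, and Stan adds a Jacobian adjustment when it transforms constrained parameters such as sigma, so Stan's internal value would differ from this function by those terms.

```python
import math

def normal_lpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) at x, including the normalizing constant."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)

def exponential_lpdf(x, rate):
    """Log density of Exponential(rate) at x >= 0."""
    return math.log(rate) - rate * x

def log_density(mu, sigma, y):
    """Log joint density log p(mu, sigma, y), i.e. the log posterior up to a constant.

    Mirrors this (hypothetical) Stan program:
        data { int<lower=0> N; vector[N] y; }
        parameters { real mu; real<lower=0> sigma; }
        model {
          mu ~ normal(0, 10);
          sigma ~ exponential(1);
          y ~ normal(mu, sigma);
        }
    """
    if sigma <= 0:
        return -math.inf  # outside the support declared by real<lower=0> sigma
    lp = normal_lpdf(mu, 0.0, 10.0) + exponential_lpdf(sigma, 1.0)
    lp += sum(normal_lpdf(yi, mu, sigma) for yi in y)
    return lp

y = [2.1, 1.7, 2.4, 1.9]  # hypothetical data
print(log_density(2.0, 0.5, y), log_density(0.0, 0.5, y))
```

Parameter values close to the data receive a much higher log density, which is the signal a sampler such as Stan's uses to concentrate draws there.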
Scikit-Learn: Easy Machine Learning and Modeling
Scikit-learn is a free machine learning library for Python. It offers a wide range of models you can apply to your data, from linear and logistic regression to gradient boosting and random forests. It is especially useful for quickly analyzing small to moderately sized datasets.
Tutorial on Building and Using OpenMP Programs
The following link explains how to use the OpenMP API and its syntax. It also includes several exercises to help learners become familiar with this widely used tool for multithreaded parallel programming.
ACCESS Events and Training
Listing of upcoming ACCESS related events and training activities.
Data Analysis with R for Educators
This webinar series is an orientation to R. We start with an overview of R’s history and its place in the larger data science ecosystem. Next, we introduce the RStudio user interface and how to access R’s excellent documentation. Finally, we present the fundamental concepts you need to use the R environment and language for data analysis. Along the way, we compare R script files (.R) with R Notebook (.Rmd) files and show how the features of R Notebook support better communication and encourage more dynamic engagement with statistical analysis and code. Familiarity with tabular data analysis using statistical software, database tools, or spreadsheet programs is helpful.
Workshop materials, including setup directions and slides, are available at https://github.com/CornellCAC/r_for_edu/. The RStudio Cloud project used in the workshop is https://rstudio.cloud/project/4044219.
The Official Documentation of Pandas
Pandas is one of the most essential Python libraries for data analysis and manipulation. It provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language. The official documentation is an in-depth guide to this powerful tool, with explanations and examples.
Containerization Explained
Containerization is a software development method in which applications are packaged into standard units for development, shipment, and deployment.
Jetstream2 Docs Site
Jetstream2 makes cutting-edge high-performance computing and software easy to use for your research regardless of your project’s scale, even if you have limited experience with supercomputing systems. Cloud-based and on-demand, the 24/7 system includes discipline-specific apps. You can even create virtual machines that look and feel like your lab workstation or home machine, with thousands of times the computing power.