Big Data Research at the University of Colorado Boulder
Background: Big data, defined as having high volume, complexity or velocity, have the potential to greatly accelerate research discovery. Such data can be challenging to work with and require research support and training to address technical and ethical challenges surrounding big data collection, analysis, and publication.
Methods: The present study was conducted via a series of semi-structured interviews to assess big data methodologies employed by CU Boulder researchers across a broad sample of disciplines, with the goal of illuminating how they conduct their research; identifying challenges and needs; and providing recommendations for addressing them.
Findings: Key results and conclusions from the study indicate: gaps in awareness of existing big data services provided by CU Boulder; open questions surrounding big data ethics, security and privacy issues; a need for clarity on how to attribute credit for big data research; and a preference for a variety of training options to support big data research.
The Theory Behind Neural Networks (Very Simplified)
This video by the YouTube channel 3Blue1Brown provides a very simplified introduction to the theory behind neural networks. The tutorial is well suited to those who don't have much linear algebra or machine learning background and are eager to step into the realm of ML!
Molecular Dynamics Tutorials for Beginners
Links to MD tutorials for beginners across various simulation platforms.
Creating a Mobile Application
Walks through in detail how to build an application that runs on Android and iOS devices, using Qt Creator to develop Qt Quick applications. Covers setup, creation, configuration, optimization, and deployment. This provides the fundamentals; browse the rest of the site for more specifics.
MPI Resources
A workshop for beginner and intermediate MPI students that includes helpful exercises, along with the Open MPI documentation.
DAGMan for orchestrating complex workflows on HTC resources (High Throughput Computing)
DAGMan (Directed Acyclic Graph Manager) is a meta-scheduler for HTCondor. It manages dependencies between jobs at a higher level than the HTCondor Scheduler.
It is a workflow management system developed by the High-Throughput Computing (HTC) community, specifically for managing large-scale scientific computations and data analysis tasks. It enables users to define complex workflows as directed acyclic graphs (DAGs). In a DAG, nodes represent individual computational tasks, and the directed edges represent dependencies between the tasks. DAGMan manages the execution of these tasks and ensures that they are executed in the correct order based on their dependencies.
The primary purpose of DAGMan is to simplify the management of large-scale computations that consist of numerous interdependent tasks. By defining the dependencies between tasks in a DAG, users can easily express the order of execution and allow DAGMan to handle the scheduling and coordination of the tasks. This simplifies the development and execution of complex scientific workflows, making it easier to manage and track the progress of computations.
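The idea of running tasks in dependency order can be illustrated with a minimal Python sketch. This is not DAGMan's implementation or its input file format; it only shows the general concept using the standard library's `graphlib` module (Python 3.9+). The job names and the `execution_waves` helper are hypothetical, chosen for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key is a job, each value is the set of jobs
# it depends on (its parents in the DAG).
dependencies = {
    "prepare": set(),
    "simulate_a": {"prepare"},
    "simulate_b": {"prepare"},
    "analyze": {"simulate_a", "simulate_b"},
}


def execution_waves(deps):
    """Group jobs into waves; each job's dependencies finish in earlier waves."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose parents are all done
        waves.append(ready)
        sorter.done(*ready)  # mark the wave as finished to unlock its children
    return waves


waves = execution_waves(dependencies)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Jobs in the same wave have no dependencies on one another, so a scheduler would be free to run them concurrently.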
Trinity Tutorial for Transcriptome Assembly
Trinity is one of the most popular tools for assembling transcripts from RNA-Seq short reads. In this tutorial, we will cover the basic usage of Trinity, best practices, and common problems.
Python
Python course offered by Texas A&M HPRC
GIS: Projections and their distortions
In GIS, projections take something plotted on a globe and convert it to a flat map that we can print or show on a screen. Unfortunately, projecting also introduces distortions to the objects and features on the map. This not only distorts the objects visually; the results of any spatial attribute calculations (such as distance and area) will also reflect this distortion. Below is a link to a quick primer on projections, the types of distortion that can occur, and suggestions on how to choose a correct projection for your work.
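As a rough illustration of how distortion affects distance and area calculations, the sketch below uses the spherical Mercator projection, chosen here only as a familiar example. In that projection the linear scale factor at a given latitude is 1 / cos(latitude), so areas are inflated by the square of that factor. The function names are hypothetical.

```python
import math


def mercator_scale_factor(latitude_deg):
    """Linear scale factor of the spherical Mercator projection: 1 / cos(latitude)."""
    return 1.0 / math.cos(math.radians(latitude_deg))


def mercator_area_inflation(latitude_deg):
    """Factor by which small areas are exaggerated at the given latitude."""
    return mercator_scale_factor(latitude_deg) ** 2


for latitude in (0, 30, 60):
    scale = mercator_scale_factor(latitude)
    area = mercator_area_inflation(latitude)
    print(f"latitude {latitude:2d}: distance x{scale:.2f}, area x{area:.2f}")
```

Near the equator the map is close to true scale, while at 60 degrees latitude distances appear about twice as long and areas about four times as large as they are on the globe.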
DeepChem
DeepChem is an open-source library built on TensorFlow and PyTorch. It is helpful in applying machine learning algorithms to molecular data.
Master's in Data Science Program Guide - TechGuide
A master’s degree in data science helps prepare professionals to take the next career step. This article will focus primarily on data science, a graduate degree in this field, and a data scientist or data analyst career. With many employers preferring a master’s degree in data science for those seeking to fill roles as data scientists or analysts, we will discuss the data science master’s degree in detail.
Educause HEISC-800-171 Community Group
The purpose of this group is to provide a forum to discuss NIST 800-171 compliance. Participants are encouraged to collaborate and share effective practices and resources that help higher education institutions prepare for and comply with the NIST 800-171 standard as it relates to Federal Student Aid (FSA), CMMC, DFARS, NIH, and NSF activities.
Running Particle-in-Cell Simulations on HPC
WarpX is an advanced particle-in-cell code used to model particle accelerators, and it needs to be run on HPC systems. This website contains a tutorial on how to build WarpX on various HPC systems such as NERSC, along with examples of how to set up post-processing and visualization tools for different physics cases.
ACCESS Events and Training
Listing of upcoming ACCESS-related events and training activities.
Framework to help in scaling Machine Learning/Deep Learning/AI/NLP Models to Web Application level
This framework helps scale Machine Learning/Deep Learning/Artificial Intelligence/Natural Language Processing models to the web application level in almost no time.
OnShape Documentation
This contains documentation for getting started with OnShape for CAD. OnShape is cloud-hosted CAD software that lets you collaborate with others as you would in a Google Doc, with the power and capabilities of other CAD software such as Solidworks or Inventor.
National Public Radio (NPR)
Covers the pluses and challenges of mentor selection. Offers tips for acquiring a mentor (finding, asking) and for being a good mentee. SMART framework mentioned. Discrimination mentioned. Difference between mentor and sponsor underlined. More than one mentor encouraged. Good tips.
Introduction to Probabilistic Graphical Models
This website summarizes the notes of Stanford's introductory course on probabilistic graphical models.
It starts from the very basics and concludes by explaining from first principles the variational auto-encoder, an important probabilistic model that is also one of the most influential recent results in deep learning.
Regular Expressions
- Learn Regular Expressions with simple, interactive exercises
- An online tool to learn, build, & test Regular Expressions
- An online tool that lets you enter your own text and regular expressions to see what matches
Regular expressions (sometimes referred to as RegEx) are an incredibly powerful tool for defining string patterns for "find" or "find and replace" operations on strings, or for input validation. Regular expressions are used in search engines, in the search and replace dialogs of word processors and text editors, and in text-processing Linux utilities such as sed and awk. They are supported in many programming languages, including Python, R, Perl, Java, and others.
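The three uses mentioned above (find, find and replace, and input validation) can be sketched with Python's built-in `re` module. The sample text and the patterns are illustrative assumptions; in particular, the email pattern is deliberately simplified and is not a complete validator.

```python
import re

# Hypothetical sample text for illustration.
text = "Contact alice@example.org or bob@example.com by 2024-03-15."

# Find: collect every substring that looks like an email address
# (simplified pattern, not a full email validator).
email_pattern = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
emails = email_pattern.findall(text)

# Find and replace: rewrite ISO dates (YYYY-MM-DD) as DD/MM/YYYY
# by reordering the captured groups.
reformatted = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)


def is_valid_zip(value):
    """Input validation: accept a 5-digit US ZIP code, optionally with a +4 suffix."""
    return re.fullmatch(r"\d{5}(-\d{4})?", value) is not None


print(emails)
print(reformatted)
print(is_valid_zip("80309"), is_valid_zip("8030"))
```

`re.fullmatch` is used for validation because it requires the whole string to match, whereas `findall` and `sub` look for the pattern anywhere inside the text.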
NCSA HPC-Moodle
Self-paced tutorials on high-end computing topics such as parallel computing, multi-core performance, and performance tools. Some of the tutorials also offer digital badges.
Optimizing Research Workflows - A Documentation of Snakemake
Snakemake is a powerful and versatile workflow management system that simplifies the creation, execution, and management of data analysis pipelines. It uses a user-friendly, Python-based language to define workflows, making it particularly valuable for automating and reproducibly managing complex computational tasks in research and data analysis.
Singularity/Apptainer User Manuals
Singularity/Apptainer is a free and open-source container platform that allows users to build and run containers on high performance computing resources.
SingularityCE is the community edition of Singularity maintained by Sylabs, a company that also offers commercial Singularity products and services.
Apptainer is a fork of Singularity, maintained by the Linux Foundation, a community of developers and users who are passionate about open source software.
marimo | a next generation python notebook
Introductory seminar on marimo, a new reactive Python notebook, given by a marimo ambassador.
Probabilistic Semantic Data Association for Collaborative Human-Robot Sensing
Humans cannot always be treated as oracles for collaborative sensing. Robots thus need to maintain beliefs over unknown world states when receiving semantic data from humans, as well as account for possible discrepancies between human-provided data and these beliefs. To this end, this paper introduces the problem of semantic data association (SDA) in relation to conventional data association problems for sensor fusion. It then develops a novel probabilistic semantic data association (PSDA) algorithm to rigorously address SDA in general settings. Simulations of a multi-object search task show that PSDA enables robust collaborative state estimation under a wide range of conditions.
Beautiful Soup - Simple Python Web Scraping
This package lets you easily scrape websites and extract information based on HTML tags and other metadata found in the page. It can be useful for large-scale web analysis and other tasks requiring automated data gathering.