R for Data Science
R for Data Science is a comprehensive resource for individuals looking to harness the power of the R programming language for data analysis, visualization, and statistical modeling. Whether you're a beginner or an experienced data scientist, this guide will help you unlock the full potential of R in the realm of data science.
Python Data and Viz Training (CCEP Program)
RMACC Website
Rocky Mountain Advanced Computing Consortium Website
A survey on datasets for fairness-aware machine learning
The research paper provides an overview of various datasets that have been used to study fairness in machine learning. It discusses the characteristics of these datasets, such as their size, diversity, and the fairness-related challenges they address. The paper also examines the different domains and applications covered by these datasets.
Weka
Weka is a collection of machine learning algorithms for data mining tasks. It contains tools for data preparation, classification, regression, clustering, association rules mining, and visualization.
Python Tools for Data Science
Python has become a very popular programming language and software ecosystem for work in Data Science, integrating support for data access, data processing, modeling, machine learning, and visualization. In this webinar, we will describe some of the key Python packages that have been developed to support that work, and highlight some of their capabilities. This webinar will also serve as an introduction and overview of topics addressed in two Cornell Virtual Workshop tutorials, available at https://cvw.cac.cornell.edu/pydatasci1 and https://cvw.cac.cornell.edu/pydatasci2
GIS: Projections and their distortions
In GIS, projections take something plotted on a globe and convert it to a flat map that we can print or show on a screen. Unfortunately, every projection also introduces distortions to the objects and features on the map. This not only distorts the objects visually; the results of any spatial attribute calculations (such as distance and area) will also reflect the distortion. Below is a link to a quick primer on projections, the types of distortion that can occur, and suggestions on how to choose a suitable projection for your work.
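As a rough illustration of how a projection distorts distance and area, the sketch below uses the spherical Mercator projection, where the linear scale factor at latitude φ is 1/cos(φ) and area is stretched by the square of that factor. The function names are hypothetical, chosen for this example only; the linked primer may use a different projection or notation.

```python
import math


def mercator_scale_factor(lat_deg: float) -> float:
    """Linear scale factor of the spherical Mercator projection at a latitude.

    A value of 2.0 means a short line on the map appears twice as long
    as the same line drawn at the equator.
    """
    return 1.0 / math.cos(math.radians(lat_deg))


def mercator_area_inflation(lat_deg: float) -> float:
    """Area inflation is the square of the linear scale factor."""
    return mercator_scale_factor(lat_deg) ** 2


if __name__ == "__main__":
    # Compare distortion at a few latitudes (hypothetical sample values).
    for lat in (0, 30, 60):
        k = mercator_scale_factor(lat)
        print(f"lat={lat:>2} deg  length x{k:.3f}  area x{k * k:.3f}")
```

This is why distance or area measured directly on a Mercator map grows increasingly unreliable away from the equator, and why an equal-area or equidistant projection is often a better choice for such calculations.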
Expanse Home Page
Expanse at SDSC is a cluster designed by Dell and SDSC delivering 5.16 peak petaflops, and offers Composable Systems and Cloud Bursting.
Representation Learning in Deep Learning
Representation learning is a fundamental concept in machine learning and artificial intelligence, particularly in the field of deep learning. At its core, representation learning involves the process of transforming raw data into a form that is more suitable for a specific task or learning objective. This transformation aims to extract meaningful and informative features or representations from the data, which can then be used for various tasks like classification, clustering, regression, and more.
Singularity/Apptainer User Manuals
Singularity/Apptainer is a free and open-source container platform that allows users to build and run containers on high performance computing resources.
SingularityCE is the community edition of Singularity maintained by Sylabs, a company that also offers commercial Singularity products and services.
Apptainer is a fork of Singularity, maintained by the Linux Foundation, a community of developers and users who are passionate about open source software.
Fine-tuning LLMs with PEFT and LoRA
As LLMs get larger, full fine-tuning becomes difficult to run on consumer hardware. Storing and deploying the tuned models can also be expensive. PEFT (parameter-efficient fine-tuning) fine-tunes only a small number of model parameters while freezing most parameters of the pretrained LLM. It aims for performance comparable to full fine-tuning while training only a small fraction of the parameters. This source explains the approach, goes over LoRA diagrams, and includes a code walkthrough.
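To make the "small number of trainable parameters" idea concrete, here is a minimal pure-Python sketch of the low-rank update used by LoRA: the pretrained weight matrix W stays frozen, and only two small matrices A (rank x d_in) and B (d_out x rank) are trained, so the layer computes W x + scale * B (A x). The helper names, the `scale` argument, and the layer sizes are assumptions made for illustration; this is not the API of the PEFT library or the code from the linked walkthrough.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, scale=1.0):
    """Frozen base output plus the scaled low-rank correction B (A x)."""
    base = matvec(W, x)                 # frozen pretrained weights
    update = matvec(B, matvec(A, x))    # trainable low-rank path
    return [b + scale * u for b, u in zip(base, update)]


def full_finetune_params(d_out, d_in):
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable parameters when only A (rank x d_in) and B (d_out x rank) are updated."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    # Hypothetical layer size and rank, chosen only for illustration.
    d_out, d_in, rank = 4096, 4096, 8
    full = full_finetune_params(d_out, d_in)
    lora = lora_params(d_out, d_in, rank)
    print(f"full: {full:,}  LoRA: {lora:,}  ratio: {lora / full:.4%}")
```

Because B is typically initialized to zeros, the adapted layer starts out identical to the pretrained one, and only the small A and B matrices need to be stored per fine-tuned task.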
UCLA Extended Reality (XR) collaboration resources and Workshop
Comprehensive Extended Reality (XR) collaboration resources for building high-performance extended reality (XR), augmented reality (AR), virtual reality (VR), and mixed reality campus teams. The tags set are a small subset of the topics covered.
Astronomy data analysis with astropy
Astropy is a community-driven package that offers core functionalities needed for astrophysical computations and data analysis. From coordinate transformations to time and date handling, unit conversions, and cosmological calculations, Astropy ensures that astronomers can focus on their research without getting bogged down by the intricacies of programming. This guide walks you through practical usage of astropy from CCD data reduction to computing galactic orbits of stars.
Automated Machine Learning Book
The authoritative book on automated machine learning, which allows practitioners without ML expertise to develop and deploy state-of-the-art machine learning approaches. Describes the background of techniques used in detail, along with tools that are available for free.
Long Tales of Science: A podcast about women in HPC
A series of interviews with women in the HPC community.
Scipy Lecture Notes
Comprehensive tutorials and lecture notes covering various aspects of scientific computing using Python and Scipy.
Implementing Markov Processes with Julia
The following link provides an easy method of implementing Markov Decision Processes (MDPs) in the Julia computing language. MDPs are a framework for modeling decisions in stochastic situations where the actor has some level of control. For example, at a low level MDPs can be used to control an inverted pendulum, but applied to higher-level decision making they can also decide when to take evasive action in air traffic management. MDPs can also be extended to the partially observable domain to form the Partially Observable Markov Decision Process (POMDP). This link contains a wealth of information showing how one can easily implement basic POMDP and MDP algorithms and apply well-known online and offline solvers.
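For readers new to MDPs, the sketch below shows value iteration, a standard offline solver, on a tiny two-state problem. It is written in plain Python for illustration rather than with the Julia tooling the link describes, and the states, actions, probabilities, and rewards are hypothetical values invented for this example.

```python
def q_value(transitions, values, state, action, gamma):
    """Expected discounted return of taking `action` in `state`."""
    return sum(
        prob * (reward + gamma * values[next_state])
        for prob, next_state, reward in transitions[state][action]
    )


def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Repeat Bellman optimality updates until values change by less than `tol`."""
    values = {state: 0.0 for state in transitions}
    while True:
        new_values = {
            state: max(
                q_value(transitions, values, state, action, gamma)
                for action in transitions[state]
            )
            for state in transitions
        }
        delta = max(abs(new_values[s] - values[s]) for s in transitions)
        values = new_values
        if delta < tol:
            break
    # Greedy policy: in each state pick the action with the highest Q-value.
    policy = {
        state: max(
            transitions[state],
            key=lambda action: q_value(transitions, values, state, action, gamma),
        )
        for state in transitions
    }
    return values, policy


# transitions[state][action] = list of (probability, next_state, reward).
# A hypothetical toy problem: reaching and holding "high" earns reward.
TOY_MDP = {
    "low": {
        "wait": [(1.0, "low", 0.0)],
        "climb": [(0.8, "high", 0.0), (0.2, "low", 0.0)],
    },
    "high": {
        "wait": [(1.0, "high", 1.0)],
        "climb": [(1.0, "high", 0.0)],
    },
}

if __name__ == "__main__":
    values, policy = value_iteration(TOY_MDP)
    print(values)
    print(policy)
```

The stochastic "climb" action is what makes this an MDP rather than a simple shortest-path problem: the solver has to weigh the 20% chance of staying in "low" against the discounted reward available in "high".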
CHARMM Links to Install, Run, and Troubleshoot MD Simulations
CHARMM (Chemistry at HARvard Macromolecular Mechanics) is a widely distributed molecular simulation program with a broad array of applications. CHARMM can set up and run simulations on both biological and materials systems, contains a comprehensive set of analysis tools, and has high performance on a variety of platforms. Here you will find links to the CHARMM website, forum, and registration/download page.
Pandas - Python
pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language. It lets you store data in easy to manage and display data frames, with column names and datatypes.
ACCESS HPC Workshop Series
Monthly workshops sponsored by ACCESS on a variety of HPC topics organized by Pittsburgh Supercomputing Center (PSC). Each workshop will be telecast to multiple satellite sites and workshop materials are archived.
Managing and Optimizing Your Jobs on HPC
An overview of tools and methods to manage and optimize jobs and HPC workflows.
ConnectCI
Connect.Cyberinfrastructure is a family of portals, each representing a program that serves a segment of the research computing and data community. Each portal provides program-specific information, as well as a custom "view" into a common database. The portal was originally developed to support project workflows and a knowledge base of self-service learning resources for the Northeast Cyberteam. Subsequently, it was expanded to support multiple cyberteams and other research computing communities of practice. We welcome additional communities; please contact us if you are interested in participating. Central to the Portal is an extensive and ever-evolving tagging infrastructure, which informs every aspect of the Portal. The tag taxonomy was initially developed by the Northeast Cyberteam to categorize subject matter relevant to practitioners of Research Computing Facilitation, and it keeps changing because new technology is frequently introduced in the domains that characterize the field of research computing.
Contributing cycles to the Open Science Grid
MATLAB with other Programming Languages
MATLAB is a useful tool for data analysis and other computational work. This tutorial takes you through using MATLAB with other programming languages, including C, C++, Fortran, Java, and Python.
NITRC
The Neuroimaging Tools and Resources Collaboratory (NITRC) is a neuroimaging informatics knowledge environment for MR, PET/SPECT, CT, EEG/MEG, optical imaging, clinical neuroinformatics, imaging genomics, and computational neuroscience tools and resources.