Managing Python Packages on an HPC Cluster
This workshop covers the different ways Python packages can be managed in a cluster environment with conda and Python virtual environments, both in batch mode from the command line and from Jupyter Notebooks and JupyterLab running on the cluster. The examples are run on the GMU HOPPER cluster.
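As a rough illustration of the virtual-environment side of this topic, the sketch below uses Python's standard-library `venv` module, the same machinery behind `python -m venv <dir>`. It is not taken from the workshop itself, the conda route is not shown, and the directory name `myproject-env` is hypothetical. `with_pip=False` keeps the sketch fast and offline; a real project environment would normally be created with pip so packages can be installed into it.

```python
import os
import tempfile
import venv
from pathlib import Path


def make_project_env(env_dir: Path) -> Path:
    """Create an isolated virtual environment and return its interpreter path."""
    # with_pip=False keeps this sketch offline; pass with_pip=True in practice
    # so that `pip install ...` works inside the new environment.
    venv.create(env_dir, with_pip=False, clear=True)
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe_name = "python.exe" if os.name == "nt" else "python"
    return env_dir / bin_dir / exe_name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical environment name, used only for illustration.
        env_dir = Path(tmp) / "myproject-env"
        python_path = make_project_env(env_dir)
        print("pyvenv.cfg present:", (env_dir / "pyvenv.cfg").exists())
        print("interpreter:", python_path)
```

Batch jobs can then call the environment's own interpreter directly instead of relying on whatever `python` happens to be first on the `PATH`.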
Useful R Packages for Data Science and Statistics
This Udacity article lists the R packages most frequently used for data science and statistics and links to the official documentation for each one. It is a good starting point if you want to begin your data science journey in R.
Awesome Jupyter Widgets (for building interactive scientific workflows or science gateway tools)
A curated list of awesome Jupyter widget packages and projects for building interactive visualizations for Python code.
Python Data and Viz Training (CCEP Program)
HPCwire
HPCwire is a prominent news and information source for the HPC community. Its website offers articles, analysis, and reports on HPC technologies, applications, and industry trends.
R for Data Science
R for Data Science is a comprehensive guide to using the R programming language for data analysis, visualization, and statistical modeling. It is written to be useful to beginners and experienced data scientists alike.
Research Software Development in JupyterLab: A Platform for Collaboration Between Scientists and RSEs
Iterative programming takes place when you can explore your code and experiment with your objects and functions without needing to save, recompile, or leave your development environment. This has traditionally been achieved with a REPL or an interactive shell. The magic of Jupyter Notebooks is that the interactive shell session is saved as a persistent document, so you don't have to flip back and forth between your code files and the shell in order to program iteratively.
There are several editors and IDEs intended for notebook development, but JupyterLab is a natural choice because it is free and open source and is the most closely related to the Jupyter Notebook and IPython projects. The chief motivation of this repository is to enable an IDE-like development environment through JupyterLab extensions. It also includes expository notebooks that demonstrate the usefulness of these features.
Recommended Libraries for Cyberinfrastructure Users Developing Jupyter Notebooks
This repository contains information about Jupyter Widgets and how they can be used to develop interactive workflows, data dashboards, and web applications that can be run on HPC systems and science gateways. Easy-to-build web applications are not only useful for scientists; they can also be used by software engineers and system administrators who want to quickly create tools for file management and more!
Samtools Documentation
Samtools is a suite of programs for interacting with high-throughput sequencing data, especially alignments stored in the SAM, BAM, and CRAM formats. It offers utilities for viewing, sorting, indexing, filtering, and summarizing the aligned reads produced by next-generation sequencing (NGS) experiments. Samtools is widely used in bioinformatics and genomics research as part of pipelines for read mapping, variant calling, and general data manipulation.
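To show what the SAM format that Samtools operates on looks like, here is a small pure-Python sketch that parses one alignment line. It relies only on the SAM specification's 11 mandatory tab-separated columns and two FLAG bits (0x4 unmapped, 0x10 reverse strand); it does not use or reproduce Samtools itself, and the read names and sequences are made up. The `count_mapped` helper (a hypothetical name) loosely mirrors what `samtools view -c -F 4` reports for a small text SAM file.

```python
from dataclasses import dataclass

# The 11 mandatory, tab-separated columns of a SAM alignment line.
SAM_FIELDS = ("qname", "flag", "rname", "pos", "mapq", "cigar",
              "rnext", "pnext", "tlen", "seq", "qual")

FLAG_UNMAPPED = 0x4   # segment is unmapped
FLAG_REVERSE = 0x10   # SEQ is reverse complemented


@dataclass
class SamRecord:
    qname: str
    flag: int
    rname: str
    pos: int    # 1-based leftmost mapping position (0 if unmapped)
    mapq: int
    cigar: str

    @property
    def is_unmapped(self) -> bool:
        return bool(self.flag & FLAG_UNMAPPED)

    @property
    def is_reverse(self) -> bool:
        return bool(self.flag & FLAG_REVERSE)


def parse_sam_line(line: str) -> SamRecord:
    """Parse the first six mandatory columns of one SAM alignment line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < len(SAM_FIELDS):
        raise ValueError(f"expected at least 11 columns, got {len(cols)}")
    return SamRecord(qname=cols[0], flag=int(cols[1]), rname=cols[2],
                     pos=int(cols[3]), mapq=int(cols[4]), cigar=cols[5])


def count_mapped(lines) -> int:
    """Count alignment lines without the unmapped bit, skipping '@' header lines."""
    return sum(1 for ln in lines
               if ln.strip() and not ln.startswith("@")
               and not parse_sam_line(ln).is_unmapped)


if __name__ == "__main__":
    example = "read1\t16\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII"
    rec = parse_sam_line(example)
    print(rec.rname, rec.pos, rec.is_reverse, rec.is_unmapped)  # chr1 100 True False
```

For real BAM or CRAM files, which are compressed binary formats, you would use Samtools itself or a binding library rather than hand-parsing text.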
How the Little Jupyter Notebook Became a Web App: Managing Increasing Complexity with nbdev
A tutorial titled "How the Little Jupyter Notebook Became a Web App: Managing Increasing Complexity with nbdev", presented at SciPy 2023 in Austin, TX. The tutorial is hosted as a series of Jupyter Notebooks that can be launched with a single click using Binder. See the README for more information.
Weka
Weka is a collection of machine learning algorithms for data mining tasks. It contains tools for data preparation, classification, regression, clustering, association rules mining, and visualization.
Python Tools for Data Science
Python has become a very popular programming language and software ecosystem for work in Data Science, integrating support for data access, data processing, modeling, machine learning, and visualization. In this webinar, we will describe some of the key Python packages that have been developed to support that work, and highlight some of their capabilities. This webinar will also serve as an introduction and overview of topics addressed in two Cornell Virtual Workshop tutorials, available at https://cvw.cac.cornell.edu/pydatasci1 and https://cvw.cac.cornell.edu/pydatasci2
Optimizing Research Workflows - A Documentation of Snakemake
Snakemake is a powerful and versatile workflow management system that simplifies the creation, execution, and management of data analysis pipelines. It uses a user-friendly, Python-based language to define workflows, making it particularly valuable for automating and reproducibly managing complex computational tasks in research and data analysis.
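The core idea behind rule-based workflow managers such as Snakemake is that each step declares the files it needs and the files it produces, and the engine works backward from a requested target to decide which steps to run and in what order. The plain-Python sketch below illustrates only that dependency-ordering idea with the standard-library `graphlib.TopologicalSorter`; it is NOT Snakemake syntax or its internal algorithm, and every file name in `RULES` is hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical rules: each output file maps to the input files it needs.
# Files that never appear as a key (raw inputs) must already exist.
RULES = {
    "results/summary.txt": ["results/counts.tsv"],
    "results/counts.tsv": ["data/sample_A.bam", "data/sample_B.bam"],
    "data/sample_A.bam": ["raw/sample_A.fastq"],
    "data/sample_B.bam": ["raw/sample_B.fastq"],
}


def build_order(target: str, rules: dict[str, list[str]]) -> list[str]:
    """Return the outputs needed to build `target`, each after its inputs."""
    graph: dict[str, list[str]] = {}
    stack = [target]
    # Walk backward from the target, collecting every upstream dependency.
    while stack:
        node = stack.pop()
        if node in graph:
            continue
        deps = rules.get(node, [])
        graph[node] = deps      # graphlib expects node -> predecessors
        stack.extend(deps)
    # static_order() yields each node only after all of its predecessors.
    order = TopologicalSorter(graph).static_order()
    # Raw inputs have no rule to run, so keep only producible files.
    return [f for f in order if f in rules]


if __name__ == "__main__":
    for step in build_order("results/summary.txt", RULES):
        print("build", step)
```

The relative order of independent steps (the two BAM files here) is not fixed, which is exactly the freedom a workflow manager uses to run such steps in parallel, for example as separate cluster jobs.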
A survey on datasets for fairness-aware machine learning
The research paper provides an overview of various datasets that have been used to study fairness in machine learning. It discusses the characteristics of these datasets, such as their size, diversity, and the fairness-related challenges they address. The paper also examines the different domains and applications covered by these datasets.
NumPy - a Python Library
NumPy is a Python package that stores data in typed, fixed-size arrays and runs many math operations in compiled C code, which makes numerical work in Python efficient. It is especially useful for matrix manipulation and matrix operations.
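A minimal sketch of what that looks like in practice, assuming NumPy is installed: the arrays below each have a single fixed dtype (`float64`), and both the matrix product `A @ B` and the elementwise expression `2.0 * A + 1.0` run without any explicit Python loop. The pure-Python `matmul_loops` helper is a hypothetical name included only to show the loop-based equivalent that NumPy replaces.

```python
import numpy as np


def matmul_loops(a, b):
    """Naive pure-Python matrix multiply, for comparison only."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


# Typed arrays: every element is a 64-bit float.
A = np.arange(6, dtype=np.float64).reshape(2, 3)   # [[0,1,2],[3,4,5]]
B = np.arange(6, dtype=np.float64).reshape(3, 2)   # [[0,1],[2,3],[4,5]]

C = A @ B                 # matrix product computed in compiled code
scaled = 2.0 * A + 1.0    # elementwise arithmetic, no Python loop

if __name__ == "__main__":
    print(C)              # [[10. 13.] [28. 40.]]
    print(np.allclose(C, matmul_loops(A.tolist(), B.tolist())))  # True
```

For a 2x3 by 3x2 product the speed difference is negligible; the benefit of the compiled, typed path grows with array size.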
Research Software Engineering Training Materials
An ongoing collection of RSE training material, workshops, and resources. We are compiling this list as a starting point for future activities. We are especially seeking material that goes beyond basic research computing competency (e.g., what The Carpentries does so well) and is general enough to span multiple domains. Specific tools and technologies used only in one domain, or applicable to only one subset of computing (e.g., HPC), are typically too narrowly focused. When in doubt, submit it to be included or reach out and we’d be happy to discuss.
Fairness and Machine Learning
The "Fairness and Machine Learning" book offers a rigorous exploration of fairness in ML and is suitable for researchers, practitioners, and anyone interested in understanding the complexities and implications of fairness in machine learning.
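As a concrete taste of the kind of criterion discussed in the fairness literature, the sketch below computes the demographic parity (statistical parity) difference: the gap between the highest and lowest rate of positive predictions across groups. This is one simple group-fairness measure chosen for illustration, not a summary of the book's framework, and the predictions and group labels in the example are made up.

```python
from collections import defaultdict


def selection_rates(y_pred, groups):
    """Fraction of positive predictions (value 1) within each group."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for pred, group in zip(y_pred, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(y_pred, groups):
    """Largest gap in selection rate between any two groups (0 means parity)."""
    rates = selection_rates(y_pred, groups)
    return max(rates.values()) - min(rates.values())


if __name__ == "__main__":
    # Made-up predictions for two groups of four people each.
    y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
    groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
    print(selection_rates(y_pred, groups))                # {'a': 0.75, 'b': 0.25}
    print(demographic_parity_difference(y_pred, groups))  # 0.5
```

A value of 0 would mean every group receives positive predictions at the same rate; whether that is the right target for a given application is exactly the kind of question the book examines.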