- Version control with Git: Understand the benefits of an automated version control system and the basics of how such systems work. Configure Git the first time it is used on a computer and understand the meaning of the --global configuration flag. Create a local Git repository and describe the purpose of the .git directory. Go through the modify-add-commit cycle for one or more files, explain where information is stored at each stage of that cycle, and distinguish between descriptive and non-descriptive commit messages.
- Open OnDemand Documentation Repository: This is the main documentation repo for the Open OnDemand portal, which enables researchers to access HPC resources from a familiar web interface.
- GIS: Geocoding Services: Geocoding is the process of converting a street address into coordinates that can be plotted on a map. This conversion typically requires an API call to a remote server hosted by an organization or institution. The remote server compares the address attributes you provide against the data it holds and returns a best estimate of the coordinates for that location. Many geocoding services are available, and they differ in world coverage, quality of results, and rate limits. For R, the "tidygeocoder" package provides an easy way to connect to these services. As an additional benefit, its documentation gives a good summary of the available geocoding services and links to their documentation. The link to the documentation for the geocoding services accessible through "tidygeocoder" is provided below. For Python, the geopy package provides connections to various geocoding services. The link to the documentation for this package is also included below.
- Discover Data Science: Discover Data Science is all about making connections between prospective students and educational opportunities in an exciting, fast-growing field: data science.
- AWS Tutorial For Beginners: An AWS Tutorial for Beginners is a course that teaches the basics of Amazon Web Services (AWS), a cloud computing platform that offers a wide range of services, including compute, storage, networking, databases, analytics, machine learning, and artificial intelligence.
- Wiki for Onboarding onto the C3DDB Cluster at MGHPCC: This is a resource for researchers and students looking to onboard onto the c3ddb cluster at MGHPCC. The code section contains example job submission scripts for the different queues on c3ddb.
- Paraview UArizona HPC links (advanced): These links take you to visualization resources supported by the University of Arizona's HPC visualization consultant ([rtdatavis.github.io](http://rtdatavis.github.io/)). The following links are specific to the Paraview program and the workflows that have been used by researchers at the U of Arizona. They differ from the beginner Paraview ACCESS CI links from the University of Arizona in that they cover more complex workflows. The links explain how to use Paraview from the terminal (pvpython) and the steps to leverage HPC resources for headless batch rendering. The batch rendering tutorial is significantly more complex than the others, so if you find yourself stuck, please post on https://ask.cyberinfrastructure.org/ and I will try to troubleshoot with you.
- Official Documentation for PyTorch and NumPy: The official documentation for PyTorch, a tensor-based machine learning framework, and NumPy, which provides ndarrays that are useful for building tensors when implementing neural networks. Both libraries can be installed with pip.
- ACCESS KB Guide - Expanse: Expanse at SDSC is a cluster designed by Dell and SDSC that delivers 5.16 peak petaflops and offers Composable Systems and Cloud Bursting. This documentation describes how to use the Expanse cluster, with some specific information for people with ACCESS accounts.
- ACCESS Campus Champion Example Allocation: ACCESS asks that proposals be written following NSF proposal guidelines. The link provides an example of an ACCESS proposal using an NSF LaTeX template. The request is at the DISCOVER level, which is appropriate for Campus Champions. The file is 2 pages: the first page details the motivation, approach, and resources requested, and the second page is a 1-page bio.
- C Programming: "These notes are part of the UW Experimental College course on Introductory C Programming. They are based on notes prepared (beginning in Spring, 1995) to supplement the book The C Programming Language, by Brian Kernighan and Dennis Ritchie, or K&R as the book and its authors are affectionately known. (The second edition was published in 1988 by Prentice-Hall, ISBN 0-13-110362-8.) These notes are now (as of Winter, 1995-6) intended to be stand-alone, although the sections are still cross-referenced to those of K&R, for the reader who wants to pursue a more in-depth exposition." C is a low-level programming language that provides a deep understanding of how a computer's memory and hardware work. This knowledge can be valuable when optimizing apps for performance or when dealing with resource-constrained environments. C is often used as the foundation for cross-platform libraries and frameworks. Learning C can allow you to develop libraries that can be used across different platforms, including iOS, Android, and desktop environments.
- Vulkan Support Survey across Systems: It's not uncommon to see beautiful visualizations in HPC center galleries, but the majority of these are either rendered off the HPC system or created using programs that run on OpenGL or custom rasterization techniques. To put it simply, the next generation of graphics provided by OpenGL's successor, Vulkan, is strangely absent in the supercomputing world. The aim of this survey of available resources is to determine which systems can support Vulkan workflows and programs. This will help users get past some of the first hurdles in using Vulkan in HPC contexts.
- Neural Networks in Julia: Making a neural network has never been easier! The following link directs users to the Flux.jl package, the easiest way of programming a neural network using the Julia programming language. Julia is the fastest growing software language for AI/ML, and this package provides a faster alternative to Python's TensorFlow and PyTorch, with 100% Julia-native code and GPU support.
- Raftlib: Open Source library for concurrent data processing pipelines: RaftLib is an open-source C++ library that provides a framework for implementing parallel and concurrent data processing pipelines. It is designed to simplify the development of high-performance data processing applications by abstracting away the complexities of parallelism, concurrency, and data flow management. It enables stream/data-flow parallel computation by linking parallel compute kernels together using simple right shift operators, similar to C++ streams for string manipulation. RaftLib eliminates the need for explicit use of traditional threading libraries such as pthreads, std::thread, or OpenMP, which can lead to non-deterministic behavior when misused.
- ACCESS Getting Started Quick-Guide: A step-by-step guide to getting your first allocation for ACCESS computing and storage resources.
- Jetstream2 Docs Site: Jetstream2 makes cutting-edge high-performance computing and software easy to use for your research regardless of your project’s scale, even if you have limited experience with supercomputing systems. Cloud-based and on-demand, the 24/7 system includes discipline-specific apps. You can even create virtual machines that look and feel like your lab workstation or home machine, with thousands of times the computing power.
- fast.ai: fast.ai offers many tools to people working with machine learning and artificial intelligence, including tutorials on PyTorch, its own library built on PyTorch, news articles, and other resources for diving into this realm.
- Network Science Textbook:
"Network Science" by Albert-László Barabási is a textbook that introduces the interdisciplinary field of network science. This field explores the connections and relationships between different entities, which can be anything from people in a social network to computers on the internet.
Description of the Textbook
The book is designed for a broad audience, including students and professionals in physics, computer science, engineering, economics, and social sciences. It covers a wide range of topics, from the "six degrees of separation" concept to the spread of viruses like Ebola. The textbook is structured to be accessible to both undergraduate and graduate students, with more complex mathematical details separated into "Advanced Topics" sections. It also offers extensive online resources, including films and software for network analysis.
The core idea of the book is that networks are everywhere, and understanding their structure and dynamics can provide valuable insights into a variety of complex systems. It uses real-world examples to illustrate key concepts and emphasizes the analysis of real network data.
Role in AI and Machine Learning
Network science plays a significant role in AI and machine learning by providing a framework for analyzing and understanding complex, interconnected data. Here's how it helps:
- Data Representation: Many real-world datasets can be represented as networks, such as social networks, transaction networks, and biological networks. Network science provides the tools to model and analyze this data, which can then be used to train machine learning models.
- Feature Engineering: Network properties, such as a node's centrality or the structure of its local neighborhood, can be used as features in machine learning models. This can help improve the performance of tasks like fraud detection, recommendation systems, and churn prediction.
- Graph Neural Networks (GNNs): GNNs are a class of deep learning models that are specifically designed to work with graph-structured data. They are heavily influenced by concepts from network science, such as message passing and neighborhood aggregation. GNNs have achieved state-of-the-art results on a variety of tasks, including node classification, link prediction, and graph classification.
- Understanding Complex Systems: Network science can be used to understand the behavior of complex systems, such as the spread of information or disease. This understanding can then be used to build more accurate AI and machine learning models.
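The feature engineering idea above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not code from the textbook: the edge list is a hypothetical toy graph, and the helper names (`build_adjacency`, `degree_centrality`, `local_clustering`, `node_features`) are invented for this example. It computes two common per-node network properties, normalized degree centrality and the local clustering coefficient, which could then be appended to a feature table for a machine learning model.

```python
from itertools import combinations

# Hypothetical toy graph: an undirected edge list invented for illustration.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]


def build_adjacency(edge_list):
    """Map each node to the set of its neighbors (undirected graph)."""
    adjacency = {}
    for u, v in edge_list:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def degree_centrality(adjacency):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


def local_clustering(adjacency):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    result = {}
    for node, nbrs in adjacency.items():
        k = len(nbrs)
        if k < 2:
            # Fewer than two neighbors: no neighbor pairs to check.
            result[node] = 0.0
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adjacency[u])
        result[node] = 2 * links / (k * (k - 1))
    return result


def node_features(edge_list):
    """Return {node: (degree_centrality, local_clustering)} for every node."""
    adjacency = build_adjacency(edge_list)
    deg = degree_centrality(adjacency)
    clu = local_clustering(adjacency)
    return {node: (deg[node], clu[node]) for node in sorted(adjacency)}


features = node_features(edges)
for node, (deg, clu) in features.items():
    print(f"{node}: degree_centrality={deg:.2f}, clustering={clu:.2f}")
```

For real datasets, a dedicated graph library would likely be a better choice than hand-rolled helpers; the sketch only shows what such features measure.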
Who Can Benefit and How?
A wide range of people can benefit from reading "Network Science," including:
- Data Scientists and Machine Learning Engineers: This book provides a strong foundation in network science, which is becoming increasingly important for working with graph-structured data. It can help them develop new features, build more accurate models, and gain a deeper understanding of their data.
- Computer Scientists and Software Engineers: The book can help them design more robust and efficient networked systems, such as communication networks and distributed systems.
- Social Scientists and Economists: The book can help them understand the structure and dynamics of social and economic networks, which can be used to study a variety of phenomena, such as the spread of fads and the stability of financial markets.
- Biologists and Medical Researchers: The book can help them understand the structure and function of biological networks, such as gene regulatory networks and protein-protein interaction networks. This can lead to new insights into diseases and the development of new drugs.
In short, anyone who is interested in understanding the interconnectedness of the world around them can benefit from reading "Network Science." It provides a powerful set of tools and concepts that can be applied to a wide variety of problems.
- QGIS Processing Executor: Running QGIS tools from the command line.
- The Official Documentation of Pandas: Pandas is one of the most essential Python libraries for data analysis and manipulation. It provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language. The official documentation serves as an in-depth guide to using this powerful tool, including explanations and examples.
- Master’s in Cybersecurity Degree Essentials: Offers comprehensive information on various master's degree options in cybersecurity, including program details, admission requirements, and career opportunities, helping students make informed decisions about pursuing an advanced degree in cybersecurity.
- Numpy - a Python Library: NumPy is a Python package that leverages types and compiled C code to make many math operations in Python efficient. It is especially useful for matrix manipulation and operations.
- MATLAB bioinformatics toolbox: Bioinformatics Toolbox provides algorithms and apps for Next Generation Sequencing (NGS), microarray analysis, mass spectrometry, and gene ontology. Using toolbox functions, you can read genomic and proteomic data from standard file formats such as SAM, FASTA, CEL, and CDF, as well as from online databases such as the NCBI Gene Expression Omnibus and GenBank.
- How to Build a Great Relationship with a Mentor: Emphasizes the benefits of being mentored. Describes how to identify and choose a mentor. Suggests a path forward. Written from the mentee's side; it does not cover the mentor's perspective or the two-way relationship.
- Spack Documentation: Spack is a package manager for supercomputers that can help administrators install scientific software and libraries for multiple complex software stacks.