Automate your workflow

ACCESS Pegasus

About ACCESS Pegasus

Run Jobs and Workflows on ACCESS Resources from a Single Entry Point

Pegasus workflow
  • Get started quickly with sample workflows using a Python API
  • Construct, submit, and monitor workflows from a Jupyter Notebook
  • Track workflows and debug them when failures occur
  • Perform simple interactions on the command line

We are continually developing ACCESS Pegasus. Currently you can run high-throughput workflows of jobs that each fit on a single compute node (single-core, multi-core, or single-node MPI jobs). For workflows with larger MPI jobs, reach out to us.

Powerful Features

Data Management
Pegasus handles data transfers, input data selection and output registration by adding them as auxiliary jobs to the workflow.
Error Recovery
Pegasus handles errors by retrying tasks, checkpointing at the workflow level, re-mapping portions of the workflow, and using alternative data sources for data staging.
Provenance Tracking
Pegasus allows users to trace the history of a workflow and its outputs, including information about data sources and software used.
Heterogeneous Environments
Pegasus can execute workflows in a variety of distributed computing environments, such as HPC clusters, Amazon EC2, Google Cloud, and the Open Science Grid.

Workflows

Why Use Workflows

Reproducibility
Scientific workflows allow researchers to document and reproduce their analyses, ensuring their validity.
Automation
Workflows automate repetitive and time-consuming tasks, reducing the workload of researchers.
Scalability
Workflows scale to handle large data sets and complex analyses, enabling scientists to tackle bigger research problems.
Reusability
Workflows can be used to build libraries of reusable code and tools that can be adapted by other researchers.

View Workflow Examples

We have Jupyter-based training notebooks available that walk you through creating a simple diamond workflow (and more complex ones) using the Pegasus Python API and executing workflows on ACCESS resources.



Single Job

Example


Set of Independent jobs

Example


Split/Merge Workflow

Example


Get Started with ACCESS Pegasus

To get started you only need some Python/Jupyter Notebook knowledge, some experience using a terminal window, and an ACCESS allocation.   
Find out about getting an ACCESS Allocation.

Setup

The first time you log on, you need to specify which allocations you have. Log on with your ACCESS ID and use Open OnDemand to get set up.

Login to ACCESS

Single Sign On with your ACCESS ID

All registered users with an active allocation automatically have an ACCESS Pegasus account.


Configure resources once

Use the Open OnDemand instance at each resource provider to install SSH keys and determine your allocation ID.

Run Workflows on ACCESS


1. Create the workflow

  • Use the Pegasus API in a Jupyter Notebook, or start from our examples
  • Submit your workflow for execution

2. Provision compute resources

  • Use the HTCondor Annex tool to provision pilot jobs on your allocated ACCESS resources

3. Monitor the execution

  • Follow the workflow execution within the notebook or in the terminal
  • Use the terminal to see which resources you have brought in


Support

Tutorial Video
A step-by-step tutorial video on how to use ACCESS Pegasus to run workflows.
Watch
Documentation
Detailed documentation about setting up ACCESS Pegasus and using it.
Read
More Help
Links to more help and places to ask questions related to ACCESS Pegasus.
Get Help
Pegasus Affinity Group
The hub for the ACCESS Pegasus community, with news, Slack, an email list, and GitHub.
Join us