Browse examples of current and complete engagements.
Prediction of Polymerization of the Yersinia pestis Type III Secretion System
Nova Southeastern University

<p>Yersinia pestis, the bacterium that causes bubonic plague, uses a type III secretion system (T3SS) to inject toxins into host cells. The structure of the Y. pestis T3SS needle has not yet been modeled using AI or solved by cryo-EM, although T3SS structures in homologous bacteria have been solved using cryo-EM. Previously, we created possible hexamers of the Y. pestis T3SS needle protein, YscF, using ColabFold and AlphaFold2 Colab on Google Colab in an effort to understand more about the needle structure and calcium regulation of secretion. Hexamers and mutated hexamers were designed using data from a wet lab experiment by Torruellas et al. (2005). T3SS structures in homologous organisms show a 22- or 23-mer structure in which rings of hexamers interlock in layers. When we attempted folding with more than six monomers, we instead observed larger single rings of monomers, which revealed the inaccuracies of these online systems. Creating a more accurate complete needle structure requires different software that is capable of building a helical, polymerized needle. The number of atoms in the predicted final needle is more than our computational infrastructure can handle, so we need the computational resources of a supercomputer. We have hypothesized two ways to direct the folding that have the potential to result in a more accurate needle structure. The first option involves fusing the current hexamer structure into one protein chain, so that the software recognizes the hexamer as one protein. This should make it easier to connect multiple hexamers together. Alternatively, or additionally, the cryo-EM structures of the T3SS of Shigella flexneri and Salmonella enterica Typhimurium can be used as models to guide the construction of the Y. pestis T3SS needle. The full AlphaFold library or a program like RoseTTAFold could help us predict protein-protein interactions more accurately for large structures.
Based on our needs, we have identified TAMU ACES, Rockfish, and Stampede-2 as promising resources for this project. The generated model of the Y. pestis T3SS YscF needle will provide insight into a possible structure of the needle.</p>

Status: Recruiting
Run Markov Chain Monte Carlo (MCMC) in Parallel for Evolutionary Study
Texas Tech University

<p>My ongoing project uses species trait values (as data matrices) and their corresponding phylogenetic relationships (as a distance matrix) to reconstruct the evolutionary history of the smoke-induced seed germination trait. The results are expected to improve predictions of which untested species could benefit from smoke treatment, which could promote germination success of native species in ecological restoration. The computational resources allocated for this project come from the high-memory partition of the Ivy cluster at our HPCC (CentOS 8, Slurm 20.11, 1.5 TB memory/node, 20 cores/node, 4 nodes). However, given that I have over 1300 species to analyze, using the maximum amount of resources to speed up the data analysis is a challenge for two reasons: (1) the ancestral state reconstruction (the evolutionary history of plant traits) requires Bayesian Markov Chain Monte Carlo (MCMC), which runs for more than 10 million steps and, according to experienced evolutionary biologists, could take a traditional single-core simulation up to 6 months to run; and (2) my data contain over 1300 native species, with about 500 polymorphic points (phylogenetic uncertainty), which would need a large number of random simulations to give statistical strength. For instance, if I use 100 simulations for each of the 500 uncertainty points, I would have 50,000 simulated trees. Based on my previous experience with simulations, I could write code to analyze the 50,000 simulated trees in parallel, but even with this parallelization the long MCMC runs would still require 50,000 cores running for up to 6 months. Given this computational and evolutionary research challenge, my current work is focused on finding a suitable parallelization method for the MCMC steps themselves. I hope to discuss my project with computational experts.</p>
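<p>As a rough illustration of the arithmetic and the tree-level parallelism described above, the following is a minimal Python sketch, not the project's actual code. The target density is a toy standard normal standing in for a real tree's posterior, and the names <code>log_target</code>, <code>run_chain</code>, <code>total_trees</code>, and <code>run_all</code> are hypothetical. It shows why one independent chain per tree is easy to distribute, while the steps inside a single chain remain sequential.</p>

```python
import math
import random
from concurrent.futures import ProcessPoolExecutor


def log_target(x):
    # Toy stand-in (assumption) for one tree's log-posterior:
    # a standard normal log-density, up to an additive constant.
    return -0.5 * x * x


def run_chain(seed, n_steps=2000, step_size=1.0):
    """Run one Metropolis chain and return the mean of its samples."""
    rng = random.Random(seed)
    x = 0.0
    total = 0.0
    for _ in range(n_steps):
        # Each step depends on the previous state, so a single chain
        # cannot be split across cores without a different algorithm.
        proposal = x + rng.gauss(0.0, step_size)
        log_ratio = log_target(proposal) - log_target(x)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        total += x
    return total / n_steps


def total_trees(sims_per_point, n_points):
    # Number of simulated trees, e.g. 100 simulations x 500 points.
    return sims_per_point * n_points


def run_all(seeds, max_workers=4):
    # Independent chains (one per simulated tree) run side by side.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_chain, seeds))
```

<p>Here <code>total_trees(100, 500)</code> gives the 50,000 trees mentioned above. Calling <code>run_all(range(8))</code> from a script, under an <code>if __name__ == "__main__":</code> guard, would run eight chains across four worker processes. On a Slurm cluster the same pattern would more likely be expressed as many separate jobs, which is an assumption about the setup rather than something stated in the project description.</p>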

Status: In Progress
AI for Business
San Diego State University

<p>The research focus is to apply the pre-training techniques of Large Language Models to the encoding process of the Code Search Project, in order to improve the existing model and develop a new code search model. The assistant will explore a transformer or equivalent model (such as GPT-3.5) with fine-tuning, which can help achieve state-of-the-art performance on NLP tasks. The research also aims to test and evaluate various state-of-the-art models to find the most promising ones.</p>

Status: In Progress
Web Deployment for Undergraduates
Southern Oregon University

<p>Issue: I am teaching a web development course (MERN stack) and do not have a deployment set up. I would like to use Jetstream2 because it offers instances with IP addresses. I have enough allocations for 20 students to each have their own instance. I was able to build a MERN stack and deploy a static React site, but the backend MongoDB piece, while connected, did not serve up data, and I cannot figure out why. The Jetstream help desk suggested trying a Docker container, which makes sense, but if I can’t get it working one way, I’ll need help getting it working the other.</p>

Status: In Progress
GPU-accelerated Ice Sheet Flow Modeling
University of North Dakota

<p>Sea levels are rising (3.7 mm/year and increasing)! The primary contributor to rising sea levels is enhanced polar ice discharge due to climate change. However, the dynamic response of polar ice sheets to climate change remains a fundamental uncertainty in future projections. Computational cost limits the simulated time spans over which models can be run to narrow the uncertainty in future sea level rise predictions. The project's overarching goal is to leverage GPU hardware capabilities to significantly reduce that computational cost and so narrow the uncertainty. Solving the time-independent stress balance equations to predict ice velocity, or flow, is the most computationally expensive part of ice-sheet simulations in terms of both memory and execution time. The PI has developed a preliminary GPU implementation of ice-sheet flow for real-world glaciers. This project aims to investigate the GPU implementation further, identify bottlenecks, and implement changes that justify it on price-to-performance metrics relative to a "standard" CPU implementation. In addition, the project aims to develop a performance-portable, hardware- (or architecture-) agnostic implementation.</p>

Status: Finishing Up
Adapting a GEOspatial Agent-based model for Covid Transmission (GeoACT) for general use
University of California San Diego

<p>GeoACT (GEOspatial Agent-based model for Covid Transmission) is designed to simulate a range of intervention scenarios to help schools evaluate their COVID-19 plans to prevent super-spreader events and outbreaks. It consists of several modules, which compute infection risks in classrooms and on school buses, given specific classroom layouts, student population, and school activities. The first version of the model was deployed on the Expanse (and earlier, COMET) resource at SDSC and accessed via the Apache Airavata portal (geoact.org). The second version is a rewrite of the model that makes it easier to adjust to new strains, vaccines, and boosters, and to include detailed user-defined school schedules, school floor plans, and local community transmission rates. This version is nearing completion. We’ll use Expanse to run additional scenarios using the enhanced model and the newly added meta-analysis module. The current goal is to make the model more general so that it can be used for other health emergencies. GeoACT has been in the news, e.g., <a href="https://ucsdnews.ucsd.edu/feature/uc-san-diego-data-science-undergrads-… San Diego Data Science Undergrads Help Keep K-12 Students COVID-Safe</a>, and <a href="https://www.hpcwire.com/2022/01/13/sdsc-supercomputers-helped-enable-sa… Supercomputers Helped Enable Safer School Reopenings</a> (HPCWire 2022 Editors' Choice Award).</p>

Status: Finishing Up
Exploring Small Metal Doped Magnesium Hydride Clusters for Hydrogen Storage Materials
Murray State University

<p>Solid metal hydrides are attractive candidates for hydrogen storage materials. Magnesium has the benefit of being inexpensive, abundant, and non-toxic. However, the application of magnesium hydrides is limited by their hydrogen sorption kinetics. Doping magnesium hydrides with transition metal atoms mitigates this drawback, but much is still unknown about the process and about the best choice of dopant type and concentration.</p><p>In this position, the student will study magnesium hydride clusters doped with early transition metals (e.g., Ti and V) as model systems for real-world hydrogen storage materials. Specifically, we will search each cluster's potential energy surface for local and global minima and explore how cluster size and dopant concentration affect different properties. The results from this investigation will then be compared with related cluster systems.</p><p>The student will begin by performing a literature search for this system, which will allow the student to pick an appropriate level of theory for the investigation. This level will be chosen by performing calculations on the MgM, MgH, and MH (M = Ti and V) diatomic species (and select other sizes based on the results of the literature search) and comparing the predictions with experimentally determined spectroscopic data (e.g., bond length, stretching frequency, etc.). The student will then perform theoretical chemistry calculations using the Gaussian 16 and NBO 7 programs on the Expanse cluster housed at the San Diego Supercomputer Center (SDSC) through ACCESS allocation CHE-130094. First, the student will generate candidate structures for each cluster size and composition using two global optimization procedures. One program uses the artificial bee colony algorithm, whereas the second, a basin hopping program, is written in Fortran and compiled in-house. Additional structures will be generated by hand from our prior knowledge. All candidate structures will then be further optimized by the student at the appropriate level determined at the start of the semester. Higher-level (e.g., double hybrid density functional theory) calculations will also be performed as further confirmation of the predicted results. Various results will be visualized with the Avogadro, Gabedit, and GaussView programs on local machines.</p>
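<p>For readers unfamiliar with basin hopping, the following is a minimal Python sketch of the general idea on a toy one-dimensional surface. It is not the group's in-house Fortran program, and the <code>energy</code> function, step sizes, and temperature are illustrative assumptions. The search alternates a random hop with a local minimization, then accepts or rejects the new minimum using a Metropolis criterion.</p>

```python
import math
import random


def energy(x):
    # Hypothetical 1-D multi-well surface standing in for a cluster's
    # potential energy surface (assumption, for illustration only).
    return 0.1 * x * x + math.sin(3.0 * x)


def local_minimize(f, x, step=0.01, iters=2000, h=1e-6):
    # Crude gradient descent using a central-difference derivative.
    for _ in range(iters):
        grad = (f(x + h) - f(x - h)) / (2.0 * h)
        x -= step * grad
    return x


def basin_hopping(f, x0, n_hops=200, hop_size=1.5, temperature=1.0, seed=0):
    """Return (best_x, best_energy) found by a simple basin hopping search."""
    rng = random.Random(seed)
    x = local_minimize(f, x0)
    e = f(x)
    best_x, best_e = x, e
    for _ in range(n_hops):
        # Perturb the current minimum, then relax into the nearest basin.
        trial = local_minimize(f, x + rng.uniform(-hop_size, hop_size))
        e_trial = f(trial)
        # Metropolis test on the minimized energies: downhill moves are
        # always accepted, uphill moves only with some probability.
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / temperature):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e
```

<p>Calling <code>basin_hopping(energy, 4.0)</code> starts in a high-energy basin and hops toward lower ones. For real clusters, the coordinate would be the full set of atomic positions and the local minimization would be a geometry optimization at the chosen level of theory.</p>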

Status: Finishing Up
Investigation of robustness of state of the art methods for anxiety detection in real-world conditions
University of Illinois at Urbana-Champaign

<p>I am new to ACCESS. I have a little past experience running code on NCSA's Blue Waters. As a self-taught programmer, I would find it valuable to learn from an experienced mentor.</p><p>Here's an overview of my project:</p><p>Anxiety detection is a topic that is actively studied, but current methods struggle to generalize and perform outside of controlled lab environments. I propose to critically analyze state-of-the-art detection methods to quantify the failure modes of existing applied machine learning models and to introduce methods that improve robustness to real-world challenges. The aim is to start the study by performing sensitivity analysis of the existing best-performing models, then testing existing hypotheses about why these models fail in the real world. We predict that this will lead us to understand more deeply why models fail and to use explainability to design better in-lab experimental protocols and machine learning models that perform better in real-world scenarios. Findings will dictate future directions, which may include improving personalized health detection, carefully designing experimental protocols that enable transfer learning to extend the reach of existing anxiety detection models, and using explainability techniques to inform better sensing methods and hardware.</p>

Status: Complete