Exploring Small Metal-Doped Magnesium Hydride Clusters for Hydrogen Storage Materials
Solid metal hydrides are attractive candidates for hydrogen storage materials. Magnesium has the benefit of being inexpensive, abundant, and non-toxic. However, the application of magnesium hydrides is limited by their slow hydrogen sorption kinetics. Doping magnesium hydrides with transition metal atoms mitigates this limitation, but much is still unknown about the process or the best choice of dopant type and concentration.
In this position, the student will study magnesium hydride clusters doped with early transition metals (e.g., Ti and V) as model systems for real-world hydrogen storage materials. Specifically, we will search each cluster's potential energy surface for local and global minima and explore how cluster size and dopant concentration affect various properties. The results from this investigation will then be compared with those of related cluster systems.
The student will begin by performing a literature search for this system, which will allow the student to pick an appropriate level of theory for this investigation. This level will be chosen by performing calculations on the MgM, MgH, and MH (M = Ti and V) diatomic species (and select other sizes based on the results of the literature search) and comparing the predictions with experimentally determined spectroscopic data (e.g., bond lengths, stretching frequencies, etc.). The student will then perform theoretical chemistry calculations using the Gaussian 16 and NBO 7 programs on the Expanse cluster housed at the San Diego Supercomputer Center (SDSC) through ACCESS allocation CHE-130094. First, the student will generate candidate structures for each cluster size and composition using two global optimization procedures: one program utilizes the artificial bee colony algorithm, whereas the second, a basin hopping program, is written and compiled in-house in Fortran. Additional structures will be generated by hand from our prior knowledge. All candidate structures will then be further optimized by the student at the level of theory determined at the start of the semester. Higher-level calculations (e.g., double-hybrid density functional theory) will also be performed as further confirmation of the predicted results. Results will be visualized with the Avogadro, Gabedit, and GaussView programs on local machines.
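As an illustrative sketch of the global-optimization step (and not the group's in-house Fortran code), a basin-hopping search can be prototyped with SciPy on a toy Lennard-Jones cluster; the cluster size, potential, and optimizer settings below are placeholder assumptions.

```python
# Minimal basin-hopping sketch on a toy Lennard-Jones cluster.
# Illustrative only: the real workflow uses in-house Fortran code plus
# DFT re-optimization in Gaussian 16; the potential and parameters here
# are placeholders.
import numpy as np
from scipy.optimize import basinhopping

def lj_energy(flat_coords):
    """Total Lennard-Jones energy (epsilon = sigma = 1) of an N-atom cluster."""
    coords = flat_coords.reshape(-1, 3)
    energy = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            r = np.linalg.norm(coords[i] - coords[j])
            energy += 4.0 * (r**-12 - r**-6)
    return energy

rng = np.random.default_rng(0)
n_atoms = 7                                    # placeholder cluster size
x0 = rng.uniform(-2.0, 2.0, size=3 * n_atoms)  # random starting geometry

result = basinhopping(lj_energy, x0, niter=200,
                      minimizer_kwargs={"method": "L-BFGS-B"})
print("Lowest energy found:", result.fun)
# The corresponding geometry (result.x) would then be re-optimized at the
# DFT level chosen from the diatomic benchmark calculations.
```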
It is anticipated that the student working on this project will have already been trained in the computer models and completed the safety training and requirements for work with PI Lyon. Regardless, refresher training will be provided by the mentor at the start of the semester to quickly bring the student up to speed. This training may include the “Expanse 101 Webinar” recorded at the San Diego Supercomputer Center on 10/8/2020 and Cornell University’s “Introduction to Linux Virtual Workshop,” both of which are available online. Any additional new or refresher training needed by the student throughout the semester will be provided by the mentor. Although scheduled weekly meetings will be held between the student and mentor to discuss progress and any problems encountered, in-person interactions will likely occur more frequently, every day or two, throughout the semester, as has been done in the lab in the past. Dr. Lyon has previously mentored several undergraduate student projects using similar techniques.
Analysis of Indian Court Data with ChatGPT
India is facing an air pollution crisis. It is home to 11 of the 15 most polluted cities in the world (World Air Quality Report, 2021). The costs are high: air pollution accounted for 1.6 million premature deaths and a loss of 0.3-0.9 percent of GDP in a single year. The failure of executive and legislative policies has pushed citizens to approach India’s courts for solutions. This project studies the overall impact of judicial policies on air quality. We take a novel approach to estimating the impact of environmental litigation on environmental as well as human capital outcomes. We begin by constructing a unique database of all cases pertaining to air pollution that have been heard in the higher judiciary of India over the past 30 years. This unique data set consists of approximately 7,500 court orders. For the 2,500 cases that directly cited the Air Act, we rely on manual reading, interpretation, and categorization by a team of law students. For these cases, we are also using ChatGPT to determine whether a particular judgment is likely to have a positive impact on the environment. In addition to the environmental impact of cases, we combine data on judgments with additional data on the characteristics of judges as well as air pollution levels. In the coming months, we will compare the analysis of the court cases by ChatGPT with that of the human coders and identify similarities and discrepancies. With this in hand, we will examine the impact of “pro-green” cases on actual environmental outcomes. Given the significance of these cases in Indian law and society, we seek to prepare a comprehensive database that can be made available to other researchers in this area. Given that 84 million court cases are now electronically available from the Indian court system, our methodology presents a unique opportunity to leverage new tools of AI to understand how Indian courts dispense justice and how the real-life impacts of these decisions can be studied by researchers.
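As a hedged sketch of the planned comparison between ChatGPT and the human coders (the file name, column names, and label values below are hypothetical, not the project's actual data schema), agreement could be summarized with simple metrics such as raw agreement and Cohen's kappa:

```python
# Hypothetical sketch: compare ChatGPT's "pro-green" labels with human coders.
# The file "air_act_cases.csv" and its columns are placeholders.
import pandas as pd
from sklearn.metrics import cohen_kappa_score, confusion_matrix

cases = pd.read_csv("air_act_cases.csv")   # one row per court order
human = cases["human_label"]               # e.g., "pro_green" / "not_pro_green"
gpt = cases["chatgpt_label"]

print("Agreement rate:", (human == gpt).mean())
print("Cohen's kappa:", cohen_kappa_score(human, gpt))
print(confusion_matrix(human, gpt, labels=["pro_green", "not_pro_green"]))

# Disagreements can be pulled out for manual review to identify systematic
# discrepancies between the model and the human coders.
disagreements = cases[human != gpt]
print(disagreements[["case_id", "human_label", "chatgpt_label"]].head())
```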
Investigation of the robustness of state-of-the-art methods for anxiety detection in real-world conditions
I am new to ACCESS. I have a little past experience running code on NCSA's Blue Waters. As a self-taught programmer, I would find it valuable to learn from an experienced mentor.
Here's an overview of my project:
Anxiety detection is a topic that is actively studied but struggles to generalize and perform well outside of controlled lab environments. I propose to critically analyze state-of-the-art detection methods, quantify the failure modes of existing applied machine learning models, and introduce methods to make them robust to real-world challenges. The aim is to start the study by performing a sensitivity analysis of existing best-performing models and then testing existing hypotheses about why these models fail in the real world. We predict that this will lead us to a deeper understanding of why models fail and allow us to use explainability to design better in-lab experimental protocols and machine learning models that perform better in real-world scenarios. The findings will dictate future directions, which may include improving personalized health detection, carefully designing experimental protocols that enable transfer learning to expand the reach of existing anxiety detection models, and using explainability techniques to inform better sensing methods and hardware, among other directions.
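As a rough illustration of the proposed sensitivity analysis (the synthetic data, feature names, model, and perturbation scale are all placeholder assumptions, not the project's actual signals or models), one can perturb individual input features and measure how often a trained classifier's predictions flip:

```python
# Hypothetical perturbation-based sensitivity sketch for an anxiety classifier.
# The synthetic data, feature names, and noise scale stand in for the
# physiological signals and models studied in the project.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))   # e.g., HR, EDA, skin temperature, accelerometer features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
baseline = model.predict(X_test)

# Sensitivity = fraction of test predictions that flip when one feature is
# perturbed with noise, mimicking real-world sensor degradation.
for i, name in enumerate(["HR", "EDA", "TEMP", "ACC"]):
    X_noisy = X_test.copy()
    X_noisy[:, i] += rng.normal(scale=1.0, size=len(X_test))
    flipped = (model.predict(X_noisy) != baseline).mean()
    print(f"{name}: {flipped:.2%} of predictions change")
```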
GPU-accelerated ice sheet flow modeling
Sea levels are rising (3.7 mm/year and increasing)! The primary contributor to rising sea levels is enhanced polar ice discharge due to climate change. However, the dynamic response of the ice sheets to climate change remains a fundamental uncertainty in future projections. Computational cost limits the simulation time over which models can run to narrow the uncertainty in future sea level rise predictions. The project's overarching goal is to leverage GPU hardware capabilities to significantly reduce this computational cost and narrow the uncertainty in future sea level rise predictions. Solving the time-independent stress balance equations to predict ice velocity, or flow, is the most computationally expensive part of ice-sheet simulations in terms of both computer memory and execution time. The PI has developed a preliminary GPU implementation of ice-sheet flow for real-world glaciers. This project aims to investigate the GPU implementation further, identify bottlenecks, and implement changes that justify it on price-to-performance metrics relative to a "standard" CPU implementation. In addition, the project will develop a performance-portable, hardware- (or architecture-) agnostic implementation.
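For illustration only, and not the PI's actual implementation, a GPU-resident iterative stencil solve of the kind that dominates such simulations can be sketched in a few lines with CuPy; the grid size, iteration count, and boundary values are placeholders:

```python
# Illustrative GPU sketch: a Jacobi-type iterative stencil solve with CuPy,
# standing in for the far more involved ice-sheet stress-balance solver.
# Grid size, iteration count, and boundary values are placeholders.
import cupy as cp

nx, ny = 1024, 1024
u = cp.zeros((nx, ny), dtype=cp.float64)
u[0, :] = 1.0                          # placeholder boundary condition

for _ in range(500):                   # fixed iteration count for the demo
    u[1:-1, 1:-1] = 0.25 * (u[2:, 1:-1] + u[:-2, 1:-1] +
                            u[1:-1, 2:] + u[1:-1, :-2])

cp.cuda.Stream.null.synchronize()      # make sure the GPU work has finished
print(float(u.sum()))
# Timing the same loop written with NumPy on a CPU gives a first
# price-to-performance data point of the kind the project aims to collect.
```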
Run Markov Chain Monte Carlo (MCMC) in Parallel for Evolutionary Study
My ongoing project is focused on using species trait values (as data matrices) and the corresponding phylogenetic relationships (as a distance matrix) to reconstruct the evolutionary history of the smoke-induced seed germination trait. The results of this project are expected to increase the predictability of which untested species could benefit from smoke treatment, which could promote the germination success of native species in ecological restoration. The computational resources allocated for this project pull from the high-memory partition of our Ivy cluster of HPCC (CentOS 8, Slurm 20.11, 1.5 TB memory/node, 20 cores/node, 4 nodes). However, given that I have over 1,300 species to analyze, using the maximum amount of resources to speed up the data analysis is a challenge for two reasons: (1) the ancestral state reconstruction (the evolutionary history of plant traits) uses Markov Chain Monte Carlo (MCMC) in a Bayesian framework, which runs for more than 10 million steps and, according to experienced evolutionary biologists, could take a traditional single-core simulation up to 6 months to run; and (2) my data contain over 1,300 native species with about 500 polymorphic points (phylogenetic uncertainty), which require a large number of random simulations to provide statistical strength. For instance, if I use 100 simulations for each of the 500 uncertainty points, I would have 50,000 simulated trees. Based on my previous experience with simulations, I could design code to analyze the 50,000 simulated trees in parallel, but even with this parallelization the long-run MCMC would still require 50,000 cores running for up to 6 months. Given this computational and evolutionary research challenge, my current work is focused on finding a suitable parallelization method for the MCMC steps. I hope to discuss my project with computational experts.
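As a minimal sketch of the embarrassingly parallel part of the workflow (the per-tree analysis function and core count below are placeholders, and this pattern does not by itself shorten a single long MCMC chain), the 50,000 simulated trees could be distributed over the cores of one node with Python's multiprocessing:

```python
# Hedged sketch: distribute per-tree analyses across cores with multiprocessing.
# "analyze_tree" is a placeholder for the real ancestral-state reconstruction.
from multiprocessing import Pool

def analyze_tree(tree_id):
    # Placeholder: load simulated tree `tree_id`, run the reconstruction,
    # and return a summary statistic.
    return tree_id % 7

if __name__ == "__main__":
    tree_ids = range(50_000)            # one task per simulated tree
    with Pool(processes=20) as pool:    # e.g., 20 cores on one Ivy node
        results = pool.map(analyze_tree, tree_ids, chunksize=500)
    print(len(results), "trees analyzed")
```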
Adapting a GEOspatial Agent-based model for Covid Transmission (GeoACT) for general use
GeoACT (GEOspatial Agent-based model for Covid Transmission) is designed to simulate a range of intervention scenarios to help schools evaluate their COVID-19 plans to prevent super-spreader events and outbreaks. It consists of several modules that compute infection risks in classrooms and on school buses, given specific classroom layouts, student populations, and school activities. The first version of the model was deployed on the Expanse (and, earlier, Comet) resource at SDSC and accessed via the Apache Airavata portal (geoact.org). The second version is a rewrite of the model that makes it easier to adjust to new strains, vaccines, and boosters, and includes detailed user-defined school schedules, school floor plans, and local community transmission rates. This version is nearing completion. We will use Expanse to run additional scenarios using the enhanced model and the newly added meta-analysis module. The current goal is to make the model more general so that it can be used for other health emergencies. GeoACT has been in the news, e.g., https://ucsdnews.ucsd.edu/feature/uc-san-diego-data-science-undergrads-help-keep-k-12-students-covid-safe and https://www.hpcwire.com/2022/01/13/sdsc-supercomputers-helped-enable-safer-school-reopenings/ (HPCWire 2022 Editors' Choice Award).
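As a purely generic toy example, not GeoACT's actual risk model, the basic flavor of an agent-based classroom transmission step can be sketched as follows; the class size, contact structure, and transmission probability are invented placeholders:

```python
# Generic toy agent-based transmission step, illustrative only; this is NOT
# GeoACT's risk model. Class size, contact structure, and the per-contact
# transmission probability are placeholders.
import random

random.seed(0)
n_students = 25
infected = [False] * n_students
infected[0] = True                        # one index case in the classroom

p_contact_transmission = 0.02             # placeholder per-contact probability

for day in range(5):                      # simulate one school week
    newly_infected = []
    for i, sick in enumerate(infected):
        if not sick:
            continue
        for j in range(n_students):       # everyone shares the same room
            if not infected[j] and random.random() < p_contact_transmission:
                newly_infected.append(j)
    for j in newly_infected:
        infected[j] = True

print(sum(infected), "of", n_students, "students infected after one week")
```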