MATCH Engagements

MATCHPlus Engagements

Analysis of Indian Court Data with ChatGPT
Shareen Joshi
Georgetown University
Status: Reviewing Applicants

India is facing an air pollution crisis. It is home to 11 of the world’s 15 most polluted cities (World Air Quality Report, 2021). The costs are high: air pollution accounted for 1.6 million premature deaths and a loss of 0.3-0.9 percent of GDP in a single year. The failure of executive and legislative policies has pushed citizens to approach India’s courts for solutions. This project studies the overall impact of judicial policies on air quality. We take a novel approach to estimating the impact of environmental litigation on environmental as well as human capital outcomes. We begin by constructing a unique database of all air pollution cases heard in the higher judiciary of India over the past 30 years. This data set consists of approximately 7,500 court orders. For 2,500 cases that directly cited the Air Act, we rely on manual reading, interpretation and categorization by a team of law students. For these cases, we are also using ChatGPT to determine whether a particular judgment is likely to have a positive impact on the environment. We also combine the data on judgments with data on the characteristics of judges and on air pollution levels. In the coming months, we will compare ChatGPT’s analysis of the court cases with that of the human coders and identify similarities and discrepancies. With this in hand, we will examine the impact of “pro-green” cases on actual environmental outcomes. Given the significance of these cases in Indian law and society, we seek to prepare a comprehensive database that can be made available to other researchers in this area.
With 84 million court cases now electronically available from India’s court system, our methodology presents a unique opportunity to leverage new AI tools to understand how Indian courts dispense justice and to study the real-life impacts of their decisions.
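
The comparison between ChatGPT and the human coders could be summarized with an inter-rater agreement statistic. The following is a minimal sketch, assuming each judgment receives a binary "pro-green" label (1) or not (0) from each source. It uses Cohen's kappa as one possible measure; the project description does not say which measure will be used, and the label lists below are made-up illustrative values, not project data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two raters over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    # Share of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


def disagreements(labels_a, labels_b):
    """Indices of items the two raters labeled differently, for manual review."""
    return [i for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a != b]


# Hypothetical labels for eight judgments (1 = "pro-green", 0 = otherwise).
human_labels = [1, 1, 0, 0, 1, 0, 1, 0]
chatgpt_labels = [1, 0, 0, 0, 1, 0, 1, 1]

kappa = cohens_kappa(human_labels, chatgpt_labels)
to_review = disagreements(human_labels, chatgpt_labels)
```

The list returned by `disagreements` points to the judgments where the two sources differ, which is where a closer reading of discrepancies would start.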

Investigation of robustness of state of the art methods for anxiety detection in real-world conditions
Abdulrahman Alkurdi
University of Illinois at Urbana-Champaign
Status: In Progress

I am new to ACCESS. I have a little bit of past experience running code on NCSA's Blue Waters. As a self-taught programmer, it would be interesting to learn from an experienced mentor. 

Here's an overview of my project:

Anxiety detection is a topic that is actively studied, but existing methods struggle to generalize and perform outside of controlled lab environments. I propose to critically analyze state-of-the-art detection methods to quantify the failure modes of existing applied machine learning models and to introduce methods that improve robustness to real-world challenges. The aim is to start the study by performing a sensitivity analysis of the existing best-performing models, then testing existing hypotheses about why these models fail in the real world. We predict that this will lead us to understand more deeply why models fail and to use explainability to design better in-lab experimental protocols and machine learning models that can perform better in real-world scenarios. Findings will dictate future directions that may include improving personalized health detection, carefully designing experimental protocols that empower transfer learning to expand the reach of existing anxiety detection models, using explainability techniques to inform better sensing methods and hardware, and other interesting future directions.

GPU-accelerated ice sheet flow modeling
Anjali Sandip
University of North Dakota
Status: In Progress

Sea levels are rising (3.7 mm/year and increasing)! The primary contributor to rising sea levels is enhanced polar ice discharge due to climate change. However, the dynamic response of ice sheets to climate change remains a fundamental uncertainty in future projections. Computational cost limits the simulated time span over which models can be run to narrow the uncertainty in future sea level rise predictions. The project's overarching goal is to leverage GPU hardware capabilities to significantly alleviate the computational cost and narrow the uncertainty in future sea level rise predictions. Solving time-independent stress balance equations to predict ice velocity or flow is the most computationally expensive part of ice-sheet simulations in terms of computer memory and execution time. The PI developed a preliminary ice-sheet flow GPU implementation for real-world glaciers. This project aims to investigate the GPU implementation further, identify bottlenecks, and implement changes that justify it on price-to-performance metrics against a "standard" CPU implementation. In addition, it aims to develop a performance-portable, hardware (or architecture) agnostic implementation.

Run Markov Chain Monte Carlo (MCMC) in Parallel for Evolutionary Study
Yanni Chen
Texas Tech University
Status: In Progress

My ongoing project is focused on using species trait values (as data matrices) and their corresponding phylogenetic relationships (as a distance matrix) to reconstruct the evolutionary history of the smoke-induced seed germination trait. The results of this project are expected to increase the predictability of which untested species could benefit from smoke treatment, which could promote germination success of native species in ecological restoration. The computational resources allocated for this project come from the high-memory partition of our Ivy cluster of HPCC (CentOS 8, Slurm 20.11, 1.5 TB memory/node, 20 cores/node, 4 nodes). However, given that I have over 1,300 species to analyze, using the maximum amount of resources to speed up the data analysis is a challenge for two reasons: (1) the ancestral state reconstruction (the evolutionary history of plant traits) needs to use Markov Chain Monte Carlo (MCMC) in Bayesian statistics, which runs more than 10 million steps and, according to experienced evolutionary biologists, could take a traditional single-core simulation up to 6 months to run; and (2) my data contain over 1,300 native species, with about 500 polymorphic points (phylogenetic uncertainty), which would need a large number of random simulations to give statistical strength. For instance, if I use 100 simulations for each of the 500 uncertainty points, I would have 50,000 simulated trees. Based on my previous experience with simulations, I could write code to analyze the 50,000 simulated trees in parallel, but even with this parallelization the long-running MCMC would still require 50,000 cores running for up to 6 months. Given this computational and evolutionary research challenge, my current work is focused on discovering a suitable parallelization method for the MCMC steps. I hope to discuss my project with computational experts.
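
The steps within a single MCMC chain depend on one another, so they cannot simply be split across cores. One commonly used workaround is to run several shorter, independent chains with different seeds and pool their samples after discarding burn-in. The following is a minimal sketch of that idea, not the project's method: `log_posterior` is a placeholder standard-normal target standing in for the real trait model on a tree, and all function names and parameter values are hypothetical.

```python
import math
import random
from concurrent.futures import ProcessPoolExecutor

# Workload size quoted in the text: 100 simulations for each of 500 uncertain points.
N_SIMULATED_TREES = 100 * 500


def log_posterior(theta):
    """Placeholder target (standard normal); a real run would score a trait model on a tree."""
    return -0.5 * theta * theta


def run_chain(seed, n_steps=5000, burn_in=1000, step_size=1.0):
    """Random-walk Metropolis chain; returns the samples kept after burn-in."""
    rng = random.Random(seed)  # a separate generator per chain keeps chains independent
    theta = 0.0
    current = log_posterior(theta)
    kept = []
    for step in range(n_steps):
        proposal = theta + rng.gauss(0.0, step_size)
        candidate = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        if step >= burn_in:
            kept.append(theta)
    return kept


def run_chains_parallel(seeds, workers=4):
    """Run one independent chain per seed in separate processes and pool the samples."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        chains = list(pool.map(run_chain, seeds))
    return [sample for chain in chains for sample in chain]
```

A call such as `run_chains_parallel(range(8))` should sit under an `if __name__ == "__main__":` guard so that worker processes can start cleanly. Each chain still pays its own burn-in, so this approach shortens wall-clock time only when burn-in is short relative to the full run, and convergence of the pooled chains would need to be checked for the real model.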

Adapting a GEOspatial Agent-based model for Covid Transmission (GeoACT) for general use
Ilya Zaslavsky
University of California San Diego
Status: In Progress

GeoACT (GEOspatial Agent-based model for Covid Transmission) is designed to simulate a range of intervention scenarios to help schools evaluate their COVID-19 plans to prevent super-spreader events and outbreaks. It consists of several modules, which compute infection risks in classrooms and on school buses, given specific classroom layouts, student population, and school activities. The first version of the model was deployed on the Expanse (and earlier, COMET) resource at SDSC and accessed via the Apache Airavata portal. The second version is a rewrite of the model which makes it easier to adjust to new strains, vaccines and boosters, and to include detailed user-defined school schedules, school floor plans, and local community transmission rates. This version is nearing completion. We’ll use Expanse to run additional scenarios using the enhanced model and the newly added meta-analysis module. The current goal is to make the model more general so that it can be used for other health emergencies. GeoACT has been in the news (HPCWire 2022 Editors' Choice Award).

Sample Engagements from the Northeast and CAREERS Cyberteams

High Performance Computing vs Quantum Computing for Neural Networks supporting Artificial Intelligence
Pace University
Status: Complete

A personalized learning system that adapts to learners' interests, needs, prior knowledge, and available resources is possible with artificial intelligence (AI) that utilizes natural language processing in neural networks. These deep learning neural networks can run on high performance computers (HPC) or on quantum computers (QC). Both HPC and QC are emergent technologies. The ultimate goal of this project is to understand both systems well enough to select which is more effective for a deep learning AI program, and to demonstrate that understanding through example. The entry to learning technologies such as HPC and QC is narrow at present because it relies on classical education methods and mentoring. The gap between the number of knowledge workers needed, which is high, and the number of people with the expertise to teach, which is growing at a much slower rate, is widening. Here, an AI cognitive agent, trained via deep learning neural networks, can help in emergent technology subjects by assisting the instructor-learner pair with adaptive wisdom. We are building the foundations for this AI cognitive agent in this project.

The role of the student facilitator will involve optimizing a deep learning neural network, comparing and contrasting the newest technologies, such as a quantum computer (and/or a quantum computer simulator) and a high performance computer, and showing the efficiency of the different computing approaches. The student facilitator will perform these tasks at the rate described in the proposal. Milestone work will be displayed and shared publicly by posting Jupyter Notebooks on Google Colab, linked to regular GitHub uploads.

Developing Computational Labs for Upper Level Physical Chemistry II Course
Bridgewater State University
Status: Complete

Out of all the upper level chemistry courses, physical chemistry is the only course that provides an in-depth insight into the fundamental principles underpinning the concepts taught in various sub-disciplines of chemistry. Further, physical chemistry provides a connection between the microscopic and macroscopic worlds of chemistry through mathematical models and experimental methods to test the validity of those models. Therefore, computational techniques are a perfect vehicle to teach the content of a physical chemistry course to undergraduate students. Additionally, the American Chemical Society recommends that computational chemistry be incorporated into the undergraduate chemistry curriculum. At Bridgewater State University (BSU), physical chemistry is a two-semester course referred to as 'physical chemistry I' and 'physical chemistry II'. While the overarching goal is to develop computational experiments (referred to as 'dry labs'), the project proposed here focuses on designing and developing dry labs for the 'Physical Chemistry II' course at BSU. The inherently theoretical nature of this course, along with its connection to a wide range of spectroscopic techniques commonly used by chemists and physicists, makes this course a perfect choice for assessing BSU students' reception to the idea of dry labs. It should be noted that there are no computational experiments in the current physical chemistry curriculum (both I and II) at BSU. The proposed project focuses on developing 4-6 computational experiments to be introduced (in spring 2018) either as stand-alone dry-lab experiments or as companions to existing experiments. These dry labs will be developed on the Gaussian 09 platform, which is currently installed on the C3DDB server at MGHPCC. Finally, I also expect to make these experiments available to other New England instructors who teach physical chemistry II or an equivalent course and are interested in incorporating computational chemistry into their curriculum.

UVM Art and AI Initiative
University of Vermont
Status: Complete

The UVM Art and AI Initiative is exploring approaches to artistic image production, comparing the results of StyleGAN and Genetic Algorithms*. More broadly, the project explores emerging artistic practices with Machine Learning and AI while referencing an artistic lineage to the artists Wassily Kandinsky, John Cage and Yoko Ono; these artists employ(ed) instructions and systems in their non-digital artworks. Kandinsky distinguished systems and developed a science of aesthetics with the basic elements of point, line and plane; Cage used the oracle 'I Ching' like a computer to inform his compositional decisions; Ono writes poetic scores that turn her audience into active participants when they follow a series of imaginative instructions. Through this ongoing research and practice, we intend to join the larger conversation about art and A.I. and design a new curriculum for UVM undergraduate students.

This work began in February 2020 and is led by Jennifer Karson of UVM’s Department of Art and Art History and the CEMS UVM FabLab. The team has included three UVM students: two graduate students in data science and one undergraduate mechanical engineering student. The team currently uses RunwayML for the StyleGAN experiments and Processing, an open-source language and development environment built on top of the Java programming language, for Genetic Algorithms.

Additional summer funding ($2,000) is sought for one of the UVM Art and A.I. Initiative student coders. The funding will assist the team in reaching a short-term goal to present initial findings this July at Alife 2020 Montreal; a longer-term goal is to create an art installation for the UVM Fleming Museum of Art in the spring of 2021. This is a unique opportunity to exhibit as part of the statewide project 2020 Vision: Seeing the World through Technology, alongside the work of Casey Reas, internationally renowned computer artist and co-founder of the Processing programming language.

Milestone 1:

Genetic Algorithms: Develop successful genetic algorithm code that meets compositional standards (color, architecture, appropriate datasets) while creating new compositions from the elements of existing hand-drawn compositions. The program should output image files that can be stored and printed at high resolutions on paper.

StyleGAN: Transition from RunwayML to coding in Python and using the VACC computer cluster. The process should output image files that can be stored and printed at high resolutions on paper to be exhibited.

Milestone 2:

Genetic Algorithms: Create an interactive version of the program that allows for audience participation and can be exhibited in a museum gallery and online.

StyleGAN: Develop a video that can be exhibited in a museum gallery and online.

*Our Genetic Algorithm base code was developed by Daniel Shiffman.