MATCH Engagements

MATCHPlus Engagements

Transient cooling of a moving composite spherical droplet at high temperature with phase change and non-homogeneous boundary conditions
Western New England University

The objective of this project is to develop a model that can be used to facilitate the development of a process for industrial-scale production of high-temperature phase change materials (PCMs). The numerical model will first be validated against simple boundary conditions for which an analytical solution exists. It will then be used to predict the cooling curves of the PCM droplets under different process attributes.
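
A minimal validation sketch of the kind described above, comparing an explicit finite-difference solution of 1-D radial heat conduction in a solid sphere against the classical series solution for a fixed surface temperature. All physical parameters here are invented placeholders, and phase change and droplet motion are omitted.

```python
import numpy as np

alpha, R = 1e-7, 1e-4          # thermal diffusivity (m^2/s) and radius (m) -- assumed
T0, Ts = 1000.0, 300.0         # initial and surface temperatures (K) -- assumed
N = 100
r = np.linspace(0.0, R, N + 1)
dr = r[1] - r[0]
dt = 0.1 * dr**2 / alpha       # stable explicit time step
T = np.full(N + 1, T0)
T[-1] = Ts                     # fixed surface temperature (simple boundary condition)

t, t_end = 0.0, 0.005
while t < t_end:
    lap = np.zeros_like(T)
    # spherical Laplacian: T'' + (2/r) T'
    lap[1:-1] = (T[2:] - 2 * T[1:-1] + T[:-2]) / dr**2 \
              + (2.0 / r[1:-1]) * (T[2:] - T[:-2]) / (2 * dr)
    lap[0] = 6.0 * (T[1] - T[0]) / dr**2   # symmetry condition at r = 0
    T[:-1] += alpha * dt * lap[:-1]
    t += dt

# analytical series solution at the sphere's center
Fo = alpha * t / R**2
theta = 2.0 * sum((-1) ** (n + 1) * np.exp(-(n * np.pi) ** 2 * Fo) for n in range(1, 50))
print("numerical center temperature: ", T[0])
print("analytical center temperature:", Ts + (T0 - Ts) * theta)
```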

  • bash
  • batch-jobs
  • job-sizing
  • parameter-sweeps
  • performance-tuning
Using station data and downscaled reanalysis to assess the occurrence of extreme weather
University of Maine, Augusta

Meteorological observations across North America and Europe suggest a significant increase in the frequency and intensity of extreme weather (heat waves, cold waves, precipitation events) coincident with the satellite-measured major decline of Arctic sea ice over the past decade. This project will assess the occurrence and impact of extreme weather events across Northern New England using both station data and climate reanalysis models. Weather and climate are critically important across Northern New England, owing to its economy's heavy reliance on natural resources. In particular, this project involves mining multi-terabyte databases and model outputs to visualize data in a variety of formats.
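
A small sketch of the kind of exceedance analysis this involves: flag extreme-heat days as days whose maximum temperature exceeds the station's 95th percentile, then count them per year. The synthetic record below is a stand-in for real station data.

```python
import numpy as np
import pandas as pd

# synthetic stand-in for a daily station record
rng = np.random.default_rng(0)
dates = pd.date_range("1990-01-01", "2020-12-31", freq="D")
tmax = 15 + 12 * np.sin(2 * np.pi * dates.dayofyear / 365.25) + rng.normal(0, 4, dates.size)
df = pd.DataFrame({"date": dates, "tmax": tmax})

threshold = df["tmax"].quantile(0.95)          # station-specific extreme threshold
df["extreme_heat"] = df["tmax"] > threshold
per_year = df.groupby(df["date"].dt.year)["extreme_heat"].sum()
print(per_year.tail())                         # look for a trend in extreme days
```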

  • data-management
Parallel computing for interactions between fluids and flexible structures with application to suspended longline aquaculture farms
University of Maine, Augusta

The structural dynamics of aquaculture farms in unsteady flow are essential for assessing their performance and resilience under environmental change. Moreover, the feedback of the farms on the flow is significant for the environment, ecology, and coastal management, including hydrodynamic impacts, habitat resilience, nutrient transport, wave attenuation, and coastal erosion control. The computational fluid dynamics (CFD) method is used to analyze the interaction between aquaculture farms and the flow. Longline aquaculture farms such as kelp farms and mussel farms consist of multiple flexible structures such as mussel droppers and kelp blades, and simulating hundreds or thousands of these highly deformable structures in a fluid-structure interaction (FSI) computation is time-consuming. Therefore, computer science research and parallel computing implementation are essential to make progress on this project. The computer science aspects we initially envision are converting the FSI code from MATLAB to C++, as well as parallelizing the code. If you have any ideas beyond that, we would love to hear them.
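
One natural parallelization pattern, sketched below with mpi4py: distribute the many independent flexible structures across MPI ranks, with each rank advancing its own structures for one coupling step. The solve_structure function is a hypothetical stand-in for the structural solver; the real code would also exchange flow data each step.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_structures = 1000
my_ids = range(rank, n_structures, size)       # round-robin assignment of structures

def solve_structure(i, flow_field):
    # placeholder: advance structure i one time step under the given flow
    return {"id": i, "drag": 0.0}

flow_field = None                              # shared flow solution, broadcast each step
local_results = [solve_structure(i, flow_field) for i in my_ids]

# gather per-structure forces on rank 0 to feed back into the flow solver
all_results = comm.gather(local_results, root=0)
if rank == 0:
    flat = [r for chunk in all_results for r in chunk]
    print(f"collected forces for {len(flat)} structures")
```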

  • matlab
  • mpi
  • programming
Understanding Covid-19 Pandemic through Social Media Discussion
Bryant University

Dr. Li has been collecting COVID-19 tweets since March 2020 and currently has about 1.2 billion tweets. She is still collecting tweets and expects to have more in the future. This project focuses on understanding the impact of the COVID-19 pandemic through social media discussion on Twitter. The following topics will be explored: 1) What are the top topics discussed regarding COVID-19, and how has the discussion of those topics changed over time? 2) What is the sentiment/emotion of each topic by time, location, and gender? 3) How can misinformation/fake news about COVID-19 be identified?

The student will work on this project from start to finish using various data analytic methods, including data exploration, topic modeling, natural language processing, and machine learning.
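
As an illustration of one step (topic modeling), here is a minimal scikit-learn sketch on a few stand-in tweets. Real work at the 1.2-billion-tweet scale would need streaming or distributed tooling; nothing here reflects the actual dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

tweets = [
    "masks and social distancing in public spaces",
    "vaccine trial results announced today",
    "schools moving to remote learning again",
]
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(tweets)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-5:]]   # top words per topic
    print(f"topic {k}: {top}")
```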

  • ai
  • data-analysis
  • natural-language-processing
  • programming
  • programming-best-practices
  • python
Coral Genomics: An assessment of metabolic pathways and genes influencing coral bleaching. Implications for the development of hydra as a model organism.
Wilmington University

Project Description

This project is expected to be a multi-semester effort to broadly assess genes and metabolic functions found in humans, corals (Acropora palmata, A. cervicornis, A. millepora), and the hydra Aiptasia pallida.

The first goal is to train students in basic coral biology and the issues influencing coral decline. Students will also be trained to perform BLAST analyses.
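
For a sense of what a BLAST analysis looks like in practice, here is a hedged sketch that shells out to the BLAST+ command-line tool. The query file and database name are placeholders and assume a local BLAST+ install with a formatted nucleotide database.

```python
import subprocess

result = subprocess.run(
    ["blastn",
     "-query", "coral_gene.fasta",   # hypothetical query sequence
     "-db", "aiptasia_db",           # hypothetical local database
     "-outfmt", "6",                 # tabular output
     "-evalue", "1e-5"],
    capture_output=True, text=True, check=True)

for line in result.stdout.splitlines()[:5]:
    print(line)                      # top hits: query, subject, %identity, ...
```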

Second, as a lab, we aim to characterize our own strain of Aiptasia, including:

  1. Develop inoculated and bleached strains, and conduct imaging of those strains.
  2. Conduct genetic sequencing of the hydra. The initial goal is to provide gene-sequencing training for both the instructor and student and to gather preliminary sequencing data. This would be a new competency for the lab.
  3. Test novel substances, hypothesized to influence both growth rate and bleaching resilience, on the hydra.

This will include development of experimental design.
Aiptasia pallida is a model organism for coral reef studies, and these cnidarians are an interesting potential model organism for humans, in comparison to the fruit fly. Implications here include:
* potential insight into coral bleaching mechanisms;
* insight into human evolution and the assessment of potential model organisms;
* development of hydra as a model organism for both corals and humans. Hydra are already well established as coral models, although this role needs significant further development.

Lastly, students will examine the DARWIN cluster and how it might aid project development.

  • genomics
Model Mie scattering and light propagation through a highly scattering medium using Monte Carlo simulation
Southern Connecticut State University

In this project, we will first use numerical approaches to model light scattering off single particles using Monte Carlo simulation, obtaining results that follow Rayleigh and Mie scattering. The program will then be extended to simulate light propagation in a highly scattering turbid medium, such as biological tissue, which consists of various arrangements of particles and bulk geometry, and to calculate the light distribution in the medium and on its boundary. The program will eventually be used for imaging tumors in biological tissue, which will be achieved by solving an inverse problem.
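
A toy version of the starting point, under strong simplifying assumptions: a Monte Carlo photon random walk through a homogeneous scattering/absorbing slab with isotropic scattering. Real Mie phase functions are strongly forward-peaked, and the coefficients below are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
mu_s, mu_a = 10.0, 0.1        # scattering/absorption coefficients (1/mm) -- assumed
mu_t = mu_s + mu_a
thickness = 5.0               # slab thickness (mm)
n_photons = 20_000

transmitted = 0
for _ in range(n_photons):
    z, uz = 0.0, 1.0                              # start at surface, heading inward
    while True:
        step = -np.log(rng.random()) / mu_t       # sample free path length
        z += uz * step
        if z > thickness:
            transmitted += 1                      # photon exits the far side
            break
        if z < 0 or rng.random() < mu_a / mu_t:   # escaped backward or absorbed
            break
        uz = 2.0 * rng.random() - 1.0             # isotropic new direction cosine

print("transmittance:", transmitted / n_photons)
```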

  • matlab
  • monte-carlo
  • programming
  • scheduling
Developing subject-specific models for the study of traumatic brain injury
Robert Morris University

Our research group is developing a workflow for generating subject-specific finite element head models from medical imaging data. These head models are applied to simulate blunt impact and blast loading scenarios with a goal of predicting the severity of brain injury. The model generation process includes segmenting the functional and structural regions of the brain, generating an appropriate finite element mesh, and incorporating structural details from medical imaging data. We are seeking a student who is interested in learning the steps in the workflow and developing new methods for improving the anatomical accuracy of these models.
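
As a toy illustration of the first workflow step (segmentation), the sketch below thresholds a synthetic image into connected regions. Real head-model segmentation works on medical imaging data with far more sophisticated methods; this only shows the shape of the operation.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(1)
image = ndimage.gaussian_filter(rng.random((128, 128)), sigma=4)  # synthetic "scan"

mask = image > image.mean()               # crude two-class segmentation
labels, n_regions = ndimage.label(mask)   # connected anatomical-like regions
print("regions found:", n_regions)
```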

  • finite-element-analysis
  • image-processing
  • unix-environment
  • workflow
Computational Simulation of One-Dimensional Porous Polymers
The University at Albany, SUNY

In this project, we are using computation to optimize the design of one-dimensional porous polymers. We aim to relate chemical motifs to pore size and internal surface area. Our workflow consists of polymer design, construction, and incorporation into quasi-amorphous periodic cells that enable determination of pore size and surface area via established Monte Carlo methods. We are currently using Materials Studio for our work; however, we are seeking a mentor who can offer expertise in streamlining our workflow via scripts, leveraging high-performance computing resources to which we currently lack access, and identifying other software options that may improve our ability to screen large numbers of polymers with a minimum of manual intervention.
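
In the spirit of the pore-characterization step, here is a toy Monte Carlo sketch that estimates the void fraction of a periodic cell of random spheres by random probe insertion. The geometry and sizes are invented for illustration and have no connection to the actual polymers.

```python
import numpy as np

rng = np.random.default_rng(0)
cell = 20.0                              # cubic cell edge (angstroms) -- assumed
centers = rng.random((50, 3)) * cell     # 50 stand-in "atoms"
radius = 1.7                             # probe-excluded radius -- assumed

probes = rng.random((20_000, 3)) * cell
d = probes[:, None, :] - centers[None, :, :]
d -= cell * np.round(d / cell)           # minimum-image convention for periodicity
occupied = (np.linalg.norm(d, axis=2) < radius).any(axis=1)
print("void fraction ~", 1.0 - occupied.mean())
```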

  • computational-chemistry
  • materials-science
  • monte-carlo
Hybrid systems and LIFE methods for Mycobacterium tuberculosis
Rutgers-Camden

Mycobacterium tuberculosis infects one third of the world's population, and current therapies involve up to four antibiotics and six months of treatment. Using MTB gene expression data for the main available drugs, KEGG and other pathway databases, and the Linear-in-Flux-Expression (LIFE) methodology, we aim to evaluate the potential effectiveness of drug combination therapies. We can do this by simulating the evolution of metabolites with the LIFE technique.
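
As a toy stand-in for such a simulation: in the LIFE formulation the dynamics are linear in the fluxes, so for fixed fluxes the metabolite evolution reduces to a linear system. The sketch below integrates dx/dt = Ax on a three-metabolite chain with invented rate constants, not an actual MTB pathway.

```python
import numpy as np
from scipy.integrate import solve_ivp

# chain x0 -> x1 -> x2 with outflow from x2; rate constants are assumed
k01, k12, k2out = 1.0, 0.5, 0.2
A = np.array([[-k01,  0.0,    0.0],
              [ k01, -k12,    0.0],
              [ 0.0,  k12, -k2out]])

sol = solve_ivp(lambda t, x: A @ x, (0.0, 20.0), [1.0, 0.0, 0.0])
print("final metabolite levels:", sol.y[:, -1])
```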

Another goal is to include hybrid methods to model metabolic pathway changes in MTB due to the immune system, drug action, and other environmental conditions. Large-scale metabolic and gene-regulation network dynamics will be used to assess drug treatment.

  • matlab
Bearing Condition Monitoring using Machine Learning
Western New England University

Machine failure and downtime were considerably lower for the less sophisticated machines developed during the first two industrial revolutions. Modern manufacturing facilities use highly complex and advanced machines that require continuous health-monitoring systems. Bearings are widely used in rotating equipment and machines to support loads and reduce friction. The presence of micron-sized defects on the mating surfaces of bearing components can lead to failure over time. Bearing health can be monitored by analyzing vibration signals acquired with an accelerometer and developing a machine learning framework for feature extraction and classification of bearing conditions. Large defects on bearing elements can be detected and identified by time-domain and frequency-domain analysis of the vibration signals. However, it is difficult to detect local bearing defects at their initial stage, either because of their small size or the presence of noise. In the proposed project, detection of local defects such as cracks and pits on bearing races will be carried out using machine learning. As a pilot project, simulated data of bearing conditions will be generated from MATLAB Simulink models and used to develop machine learning based predictive-maintenance and condition-monitoring algorithms. The trained model will be evaluated against real bearing data and ground-truth results. The project will first be implemented on a local machine and, once successfully developed, will be ported to a cluster.

The machine learning framework will include functions for exploring, extracting, and ranking features using data-based and model-based techniques, including statistical, spectral, and time-series analysis. Bearing health will be monitored by extracting features from vibration data using frequency and time-frequency methods. The student will learn how to organize and analyze sensor data imported from local files, cloud storage, and distributed file systems, and will learn the complete machine learning pipeline: data import, filtering, feature extraction, data distribution, training, validation, and testing of multiple machine learning algorithms, as well as working with clusters. The developed machine learning pipeline will be shared with the research community, and the work will be published in a conference proceeding. The project requires MATLAB toolboxes for signal processing, machine learning, predictive maintenance, statistical analysis, and deep learning. Future work includes using large datasets of real and simulated bearing data for predictive maintenance with a cluster-based machine learning framework. Estimated defect sizes will be predicted, compared, and validated against measured crack widths or pit diameters.
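
A small sketch of the feature-extraction step: compute simple time-domain and frequency-domain features from a synthetic vibration signal. The signal model and "defect tone" below are invented for illustration, not taken from the project's Simulink models; features like these would feed the classifiers.

```python
import numpy as np

fs = 10_000                                  # sampling rate (Hz) -- assumed
t = np.arange(0, 1.0, 1 / fs)
healthy = np.sin(2 * np.pi * 60 * t) + 0.1 * np.random.randn(t.size)
faulty = healthy + 0.5 * np.sin(2 * np.pi * 157 * t)   # hypothetical defect frequency

def features(x):
    rms = np.sqrt(np.mean(x ** 2))
    kurtosis = np.mean((x - x.mean()) ** 4) / np.var(x) ** 2   # impulsiveness indicator
    spectrum = np.abs(np.fft.rfft(x))
    peak_freq = np.fft.rfftfreq(x.size, 1 / fs)[spectrum.argmax()]
    return {"rms": rms, "kurtosis": kurtosis, "peak_freq_hz": peak_freq}

print("healthy:", features(healthy))
print("faulty: ", features(faulty))
```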

  • ai
  • data-analysis
  • data-transfer
  • machine-learning
  • neural-networks
  • programming
Data Presentation for the Living Bridge
University of New Hampshire

The Memorial Bridge between Kittery, ME and Portsmouth, NH has several sensors on and below it that collect data and record it in raw form in an on-bridge database. This project is extracting data from the on-bridge database into a new researcher-facing database that will offer publicly available dataset extractions and on-server calculations for researchers who wish to understand the work being done on the Living Bridge. The work involves understanding the sensor data on the bridge and the existing database schema, developing a new schema for the researcher-facing database, and populating that database with data from the bridge. Future work will involve presenting and visualizing the bridge data as parameters are entered.
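
A hedged sketch of the extract/transform/load step at the heart of this work: copy raw sensor rows into a researcher-facing schema. SQLite and all table and column names below are stand-ins for the actual on-bridge and researcher databases.

```python
import sqlite3

raw = sqlite3.connect(":memory:")        # stand-in for the on-bridge database
raw.execute("CREATE TABLE sensor_log (sensor TEXT, ts TEXT, raw_value REAL)")
raw.executemany("INSERT INTO sensor_log VALUES (?, ?, ?)",
                [("strain_01", "2021-06-01T00:00:00", 12.7),
                 ("tide_01",   "2021-06-01T00:00:00", 1.3)])

research = sqlite3.connect(":memory:")   # stand-in for the researcher-facing database
research.execute("""CREATE TABLE readings (
    sensor_id TEXT, recorded_at TEXT, value REAL)""")

rows = raw.execute("SELECT sensor, ts, raw_value FROM sensor_log")
research.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
research.commit()
print(research.execute("SELECT COUNT(*) FROM readings").fetchone()[0], "rows copied")
```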

  • data-wrangling
  • storage
  • hardware
  • hpc-storage
  • vpn
  • ssh
  • distributed-computing
Big Data Portal for Sharing Real-world Bioinformatics Data Sets to the Public Domain
University of Maine, Augusta

This project aims to facilitate the sharing of large data sets for research and education across Maine as well as across the Open Storage Network. It is the intention of the Mount Desert Island Biological Laboratory (MDIBL) to make data files and metadata publicly available in exchange for free access. This data is of interest and value to data science faculty at the University of Maine, Augusta, for teaching and research as part of a system-wide data science degree.

The project requires the development of a front-end and back-end system, preferably written in Go and deployed in a container, preferably Docker. The end result will allow uploading, downloading, metadata tagging, and HPC job submissions that use the data.
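
A very small sketch of the upload-and-tag surface such a portal might expose. The project prefers Go; Flask is used here only to keep the examples in one language, and every route and field is an assumption.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)
catalog = {}   # in-memory stand-in for a real metadata store

@app.route("/datasets/<name>", methods=["PUT"])
def upload(name):
    data = request.get_data()
    catalog[name] = {"size": len(data), "tags": request.args.getlist("tag")}
    return jsonify(catalog[name]), 201

@app.route("/datasets/<name>", methods=["GET"])
def describe(name):
    return jsonify(catalog.get(name, {}))

if __name__ == "__main__":
    app.run()
```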

  • big-data
  • bioinformatics
  • data-management
  • data-wrangling
  • hpc-storage
  • metadata
  • science-gateway
  • storage
Deep Learning High-Resolution Land Cover Mapping for Vermont
University of Vermont

Executive Summary

Funding is requested from the Northeast CyberTeam to support an undergraduate intern who will help advance remote sensing deep learning workflows supporting Vermont’s high-resolution land cover initiative. The internship will be based out of the University of Vermont Spatial Analysis Laboratory (SAL) and supervised by SAL Director and faculty member Jarlath O’Neil-Dunne. This internship will make extensive use of the Vermont Advanced Computing Core (VACC), particularly the DeepGreen GPU cluster.

Background

The State of Vermont is under both regulatory and public pressure to improve the water quality of Lake Champlain. State agencies must have access to high-resolution land cover information that is detailed enough to provide parcel-level quantification of land cover features. The University of Vermont, with funding from the State of Vermont, led the development of the 2016 statewide, high-resolution land cover dataset. This 2016 land cover dataset is the most accurate, detailed, and comprehensive land cover map ever made of Vermont. The existing workflows employed to develop this land cover dataset are slow and expensive, running on individual desktop computer workstations. Moreover, the land cover dataset was already out of date the moment it was produced.

In February 2020, a meeting was held with state agency representatives, the Vermont Advanced Computing Core, and the Spatial Analysis Laboratory. State agencies voiced their desire for an approach to land cover mapping that would allow more rapid updates of high-resolution land cover products and capture fine-scale changes that could influence water quality, such as the construction of a new building.

Activity

This project will focus on integrating deep learning approaches into the SAL's feature extraction workflows. Deep learning has shown tremendous potential for mapping land cover from high-resolution remotely sensed datasets. Deep learning techniques by themselves may not always be optimal for updating existing land cover datasets, as false change can result from differences in the source data or from errors in the mapping itself. We propose to leverage deep learning to more efficiently update the State's high-resolution land cover maps through a hybrid approach. Our desire is to take advantage of the potential that deep learning offers while still employing the methodologies that ensure quality specifications are met. The goal of this hybrid approach is a faster, more efficient, and more accurate way to update existing high-resolution land cover products. High-performance computing will be employed to tackle the most computationally intensive aspect of deep learning, the model training process. The trained models will then be integrated into the existing workflows to identify areas of change, which will be combined with the existing high-resolution land cover to enable rapid updating of the statewide land cover dataset. This project will leverage the University of Vermont's recent investments in high-performance computing architecture: DeepGreen, an NSF-funded supercomputer, will be employed.

The phases for this project are: 1) deep learning system design, 2) deep learning system development, 3) deep learning system implementation, 4) integration of deep learning into object-based feature extraction workflow, 5) production of an updated statewide land cover map. The software technologies employed will include TensorFlow and eCognition for feature extraction and ArcGIS for visualization.
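
As a minimal sketch of the model at the core of phases 1-3, here is a tiny fully convolutional Keras network mapping an image tile to per-pixel class probabilities. The tile size, band count, and class count are placeholders; the production models would be far deeper.

```python
import tensorflow as tf

n_bands, n_classes = 4, 8   # e.g. RGB+NIR in, land cover classes out -- assumed

model = tf.keras.Sequential([
    tf.keras.Input(shape=(256, 256, n_bands)),
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(n_classes, 1, activation="softmax"),  # per-pixel classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```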

This project is incredibly valuable to the State of Vermont, which is struggling to meet regulatory requirements to reduce non-point source pollution in Lake Champlain, the state's largest lake, which extends into New York and Quebec. Access to current, accurate high-resolution land cover is imperative if the State is to decide how best to reduce non-point source pollution and fund those activities. Furthermore, the State has no dedicated remote sensing scientists on staff and lacks the computing and technical resources to carry out land cover mapping at this scale. The intern funded by this project will work with a talented team of individuals who are internationally recognized for their expertise in automated feature extraction.

  • arcgis
  • big-data
  • distributed-computing
  • geographic-information-system
  • image-processing
  • machine-learning
  • python
Medical Diagnostics of Chest X-Ray Images with Deep Convolutional Neural Networks (CNN)
Kean University

Our research group has been applying machine learning and deep learning models to build predictive analytics for real-world applications in health science. Last summer, we did preliminary data exploration and model synthesis on chest X-ray images to predict lung disease. In medical diagnostics, chest radiography is one of the most powerful methods for identifying lung diseases such as edema and pneumonia. However, manual examination of chest X-rays demands expert time, which is expensive and taxing on stakeholders. Promising developments in computer vision applications of artificial intelligence (AI) suggest that at least preliminary screening can be conducted expeditiously for large numbers of X-rays, to the benefit of patients, medical practitioners, and providers. Several groups are currently developing AI platforms to predict lung diseases from chest radiography. The Stanford machine learning group recently published a large chest X-ray dataset, with each image labeled for multiple possible conditions. One of our goals is to optimize deep convolutional neural networks (CNNs) based on ResNet (Residual Network) models, which have been shown to be accurate for image detection and segmentation, to make multi-class predictions on the chest X-ray images. In the first step, the student will check how ResNet models perform on this multi-class prediction task across different sub-datasets to estimate the minimal number of images required to build the deep learning model. In the next stage, the student will explore the impact of label uncertainty: one of the challenges with this dataset is the uncertainty associated with its labels. The student will train the model without uncertainty and then add the uncertainty to understand its impact on prediction. The student will then examine several ResNet variants to improve efficiency. All of this work will be benchmarked on the GPU resources at the Rutgers campus cluster and the Pittsburgh Supercomputing Center (PSC). We have already built the software stack and tested a simple CNN model on the Bridges-2 cluster at PSC.
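
A hedged Keras sketch of the multi-label setup described above: a ResNet backbone with a sigmoid output per condition, so each X-ray can carry several labels at once. The condition count and input size are assumptions, not the actual dataset's values.

```python
import tensorflow as tf

n_conditions = 14   # assumed number of labeled conditions

base = tf.keras.applications.ResNet50(
    include_top=False, weights=None, input_shape=(224, 224, 3), pooling="avg")
out = tf.keras.layers.Dense(n_conditions, activation="sigmoid")(base.output)
model = tf.keras.Model(base.input, out)

# per-label binary cross-entropy makes this multi-label rather than multi-class
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```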

  • ai
  • big-data
  • gpu

MATCHPremier Engagements

Optimization and Parallelization of a Numerical Gravitational-Wave Model

Next-generation gravitational-wave (GW) detectors, such as the Laser Interferometer Space Antenna (LISA), will detect GW signals from extreme mass-ratio inspirals. High-fidelity, fast GW models are essential for achieving the full scientific potential of LISA. We have developed a high-accuracy, data-driven (surrogate) model for LISA-type sources. The code is currently in a Jupyter notebook, but to enable data analysis studies, we require the model to operate as an optimized, stand-alone library. This project aims to accomplish this goal by porting the model into two publicly available, community-driven packages: GWSurrogate and the Black Hole Perturbation Toolkit. In this project, the student will port the model to these existing codebases before optimizing it. The model data will be stored in HDF5 file format. One of the main computational bottlenecks is likely to be the large matrix-vector multiplication required to compute each harmonic mode. The student will explore offloading this cost to a GPU through the cupy package, along with parallelization over mode computations. Code profiling will also be carried out to identify other parts of the code that could benefit from further optimization.
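
A sketch of the GPU-offload idea with cupy (shapes are illustrative, and a CUDA-capable GPU is assumed): the per-mode matrix-vector products can be batched into a single matrix-matrix product on the device.

```python
import numpy as np
import cupy as cp

n_times, n_basis, n_modes = 10_000, 200, 50
B = cp.asarray(np.random.rand(n_times, n_basis))        # basis matrix on the GPU
coeffs = cp.asarray(np.random.rand(n_modes, n_basis))   # per-mode coefficients

modes = B @ coeffs.T          # all harmonic modes in one product, shape (n_times, n_modes)
result = cp.asnumpy(modes)    # copy back to the host only when needed
print(result.shape)
```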

  • gravitational-waves
Statistical Analysis of criminal cases in the United States District Court of Puerto Rico
Salve Regina University

For the purposes of submitting an amicus brief to the US Supreme Court, the Puerto Rico Association of Criminal Defense Lawyers (PRACDL) compiled several indictments and docket sheets from the PACER system. Data from these documents were extracted and analyzed alongside sociodemographic data from the US Census. Nevertheless, there is still an opportunity to continue analyzing the remaining data to present a visual representation of not only the types of cases seen in this court but also the length of time that cases remain "open", the percentage of persons represented by a court-appointed attorney, the average length of sentences, the number of persons granted bail, and the number of persons with bail violations and the reasons for those violations, among others. An understanding of these data will facilitate related future social justice projects in this jurisdiction.
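
A small pandas sketch of one statistic listed above, the distribution of how long cases stay open. The rows below are invented stand-ins for data extracted from PACER, and the column names are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "filed":  pd.to_datetime(["2018-01-10", "2018-03-02", "2019-07-21"]),
    "closed": pd.to_datetime(["2018-09-01", "2019-01-15", "2020-02-02"]),
    "appointed_counsel": [True, False, True],   # hypothetical column
})
df["days_open"] = (df["closed"] - df["filed"]).dt.days
print(df["days_open"].describe())
print(df.groupby("appointed_counsel")["days_open"].median())
```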

  • ai
  • data-analysis
  • machine-learning
  • python
UVM Art and AI Initiative
University of Vermont

The UVM Art and AI Initiative is exploring approaches to artistic image production, comparing the results of StyleGAN and Genetic Algorithms*. More broadly, the project explores emerging artistic practices with machine learning and AI while referencing an artistic lineage to the artists Wassily Kandinsky, John Cage, and Yoko Ono, who employ(ed) instructions and systems in their non-digital artworks. Kandinsky distinguished systems and developed a science of aesthetics with the basic elements of point, line, and plane; Cage used the oracle I Ching like a computer to inform his compositional decisions; Ono writes poetic scores that turn her audience into active participants as they follow a series of imaginative instructions. Through this ongoing research and practice, we intend to join the larger conversation about art and AI and to design new curricula for UVM undergraduate students.

This work began in February 2020 and is led by Jennifer Karson of UVM’s Department of Art and Art History and the CEMS UVM FabLab. The team has included three UVM students: two graduate students in data science and one undergraduate mechanical engineering student. The team currently uses RunwayML for the StyleGAN experiments and Processing, an open-source language and development environment built on top of the Java programming language, for Genetic Algorithms.

Additional summer funding ($2,000) is sought for one of the UVM Art and AI Initiative student coders. The funding will assist the team in reaching a short-term goal of presenting initial findings this July at ALIFE 2020 in Montreal; a longer-term goal is to create an art installation for the UVM Fleming Museum of Art in the spring of 2021. This is a unique opportunity to exhibit as part of the statewide project 2020 Vision: Seeing the World through Technology, alongside the work of Casey Reas, the internationally renowned computer artist and co-founder of the Processing programming language.

Milestone 1:

Genetic Algorithms: Develop genetic algorithm code that meets compositional standards (color, architecture, appropriate datasets) while creating new compositions from the elements of existing hand-drawn compositions. The program should output image files that can be stored and printed at high resolutions on paper. (A toy sketch of such a loop appears after the footnote below.)

StyleGAN: Transition from RunwayML to coding in Python, employing the VACC computer cluster. The process should output image files that can be stored and printed at high resolution on paper for exhibition.

Milestone 2:

Genetic Algorithms: Create an interactive version of the program that allows for audience participation and can be exhibited in a museum gallery and online.


StyleGAN: Develop a video that can be exhibited in a museum gallery and online.

*Our Genetic Algorithm base code was developed by Daniel Shiffman.
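
The toy genetic-algorithm loop below shows the shape of the Milestone 1 work: evolve a population of "compositions" (here just parameter vectors) toward a fitness function. The fitness used is a placeholder for the compositional standard, and none of this is the Shiffman-derived base code.

```python
import random

def fitness(genome):
    return -sum((g - 0.5) ** 2 for g in genome)   # stand-in compositional score

def mutate(genome, rate=0.2):
    return [min(1.0, max(0.0, g + random.gauss(0, 0.05)))
            if random.random() < rate else g for g in genome]

random.seed(0)
pop = [[random.random() for _ in range(8)] for _ in range(50)]
for _ in range(100):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                            # elitist selection
    pop = parents + [mutate(random.choice(parents)) for _ in range(40)]

best = max(pop, key=fitness)
print("best fitness:", fitness(best))
```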

  • image-processing
Developing Computational Labs for Upper Level Physical Chemistry II Course
Bridgewater State University

Of all the upper-level chemistry courses, physical chemistry is the only one that provides in-depth insight into the fundamental principles underpinning the concepts taught in the various sub-disciplines of chemistry. Further, physical chemistry connects the microscopic and macroscopic worlds of chemistry through mathematical models and experimental methods that test the validity of those models. Computational techniques are therefore a perfect vehicle for teaching physical chemistry content to undergraduate students. Additionally, the American Chemical Society recommends that computational chemistry be incorporated into the undergraduate chemistry curriculum. At Bridgewater State University (BSU), physical chemistry is a two-semester course referred to as 'Physical Chemistry I' and 'Physical Chemistry II'. While the overarching goal is to develop computational experiments (referred to as 'dry labs'), the project proposed here focuses on designing and developing dry labs for the Physical Chemistry II course at BSU. The inherently theoretical nature of this course, along with its connection to the wide range of spectroscopic techniques commonly used by chemists and physicists, makes it a perfect choice for assessing BSU students' receptiveness to the idea of dry labs. It should be noted that there are currently no computational experiments in the physical chemistry curriculum (I or II) at BSU. The proposed project focuses on developing 4-6 computational experiments to be introduced (in spring 2018) either as stand-alone dry-lab experiments or as accompaniments to currently existing experiments. These dry labs will be developed on the Gaussian 09 platform, which is currently installed on the C3DDB server at MGHPCC. Finally, I expect to make these experiments available to other New England instructors teaching Physical Chemistry II or an equivalent course who are interested in incorporating computational chemistry into their curriculum.
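
For a flavor of what a dry-lab exercise involves, here is a hedged Python sketch that writes a Gaussian 09 input deck for a geometry optimization plus frequency calculation on water. The route section and molecule are a generic example, not one of the planned experiments.

```python
route = "#P B3LYP/6-31G(d) Opt Freq"   # assumed method/basis for illustration
title = "Water optimization dry lab"
charge, mult = 0, 1
atoms = [("O", 0.000,  0.000,  0.117),
         ("H", 0.000,  0.757, -0.470),
         ("H", 0.000, -0.757, -0.470)]

with open("water_opt.gjf", "w") as f:
    f.write(f"{route}\n\n{title}\n\n{charge} {mult}\n")
    for sym, x, y, z in atoms:
        f.write(f"{sym:2s} {x:10.4f} {y:10.4f} {z:10.4f}\n")
    f.write("\n")   # Gaussian input files end with a blank line
```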

  • computational-chemistry
  • gaussian
High Performance Computing vs Quantum Computing for Neural Networks supporting Artificial Intelligence
Pace University

A personalized learning system that adapts to learners' interests, needs, prior knowledge, and available resources is possible with artificial intelligence (AI) that utilizes natural language processing in neural networks. These deep learning neural networks can run on high-performance computers (HPC) or on quantum computers (QC); both are emergent technologies. Understanding both systems well enough to select which is more effective for a deep learning AI program, and showing that understanding through example, is the ultimate goal of this project. The entry path into technologies such as HPC and QC is narrow at present because it relies on classical education methods and mentoring. The gap between the knowledge workers needed, who are in high demand, and those with the expertise to teach them, who are produced at a much slower rate, is widening. Here, an AI cognitive agent trained via deep learning neural networks can help in emergent-technology subjects by assisting the instructor-learner pair with adaptive wisdom. We are building the foundations for this AI cognitive agent in this project.

The role of the student facilitator will involve optimizing a deep learning neural network, comparing and contrasting the newest technologies, such as a quantum computer (and/or a quantum computer simulator) and a high-performance computer, and demonstrating the efficiency of the different computing approaches. The student facilitator will perform these tasks at the rate described in the proposal. Milestone work will be displayed and shared publicly via Jupyter Notebooks posted on Google Colab and linked to regular GitHub uploads.
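
A minimal sketch of the benchmarking harness idea: time the same small learning task under different configurations. Both runs below are classical scikit-learn models standing in for the HPC and quantum-simulator backends, which would slot into the same timing pattern.

```python
import time
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

def benchmark(name, model):
    t0 = time.perf_counter()
    model.fit(X, y)
    print(f"{name}: {time.perf_counter() - t0:.2f}s, train acc {model.score(X, y):.3f}")

benchmark("small MLP ", MLPClassifier(hidden_layer_sizes=(32,), max_iter=300))
benchmark("larger MLP", MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=300))
```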

  • ai
  • big-data
  • deep-learning
  • github
  • hpc-cluster-architecture
  • hpc-operations
  • jupyterhub
  • machine-learning
  • natural-language-processing
  • proposal-development
  • python
  • quantum-mechanics
  • research-facilitation
  • research-grants
  • resources
  • scikit-learn
  • singularity
  • technical-training-for-hpc
  • vectorization