Submission information
Submission Number: 17
Submission ID: 34
Submission UUID: ff4a13e3-3528-40bd-931f-af61c546c22a
Submission URI: /form/project
Created: Tue, 09/03/2019 - 13:23
Completed: Tue, 09/03/2019 - 13:26
Changed: Thu, 04/28/2022 - 13:11
Remote IP address: 130.215.55.243
Submitted by: Northeast Cyberteam
Language: English
Is draft: No
Webform: Project
    
| Project Title | Machine learning for material property prediction | 
|---|---|
| Program | Northeast | 
| Project Image |   | 
| Tags | big-data (4), data-wrangling (6), computational-chemistry (81), molecular-dynamics (288), machine-learning (272), python (69), gpu (80) | 
| Status | Complete | 
| Project Leader | Liping Yu | 
| Email | liping.yu@maine.edu | 
| Mobile Phone | |
| Work Phone | (207) 581-1029 | 
| Mentor(s) | Chris Wilson, Larry Whitsel, Bruce Segee | 
| Student-facilitator(s) | Michael Butler | 
| Mentee(s) | |
| Project Description | Density Functional Theory based methods for calculating material properties from first principles require large computation facilities and significant computation time. This project aims to develop novel machine learning models and workflows in order to better predict material properties in a fraction of the computation time that current techniques require. --- Traditionally, the physical properties of solids are calculated by solving the many-body Schrödinger equation, using atomic compositions as inputs. Such methods are often computationally demanding, and the data generated in this way are disconnected from each other. Although massive amounts of physical property data for materials have been accumulated, the calculation of properties for a new compound does not benefit from them. This CITeam project is aimed at developing machine learning models, software, and workflows that can be used in the development of a material descriptor, which can in turn be used to accurately predict the properties of novel materials. The development of such software is a key first step in high-throughput discovery of novel energy materials. This research integrates theory, computation, and data mining. The results and machine learning workflows generated under this project will benefit researchers in multiple disciplines such as physics, chemistry, materials science, and computational science. This project will require significant computation time and hardware resources and will thus benefit greatly from having access to the UMaine supercomputer. | 
| Project Deliverables | 1) Software: a) A main script from which specified ML models and material descriptors can be deployed on the computing cluster. This script will parallelize the training of the ML models and marshal the necessary compute resources. b) Individual functions/scripts/classes implementing each ML model. c) Scripts to parse material structure files and produce material descriptors. d) Scripts to compare ML model and descriptor pairs and analyze their efficacy. 2) A report evaluating the efficacy of a variety of machine learning models and material descriptors in predicting new material properties. | 
| Student Research Computing Facilitator Profile | Graduate student studying physics with further background in computer science, machine learning, and high performance computing. (Michael Butler, UMaine Orono) | 
| Mentee Research Computing Profile | |
| Student Facilitator Programming Skill Level | Practical applications | 
| Mentee Programming Skill Level | |
| Project Institution | University of Maine Orono | 
| Project Address | 105 Bennett Hall, Orono, Maine 04469 | 
| Anchor Institution | NE-University of Maine | 
| Preferred Start Date | 01/01/2019 | 
| Start as soon as possible. | No | 
| Project Urgency | Already behind; start date is flexible | 
| Expected Project Duration (in months) | |
| Launch Presentation | |
| Launch Presentation Date | |
| Wrap Presentation | |
| Wrap Presentation Date | |
| Project Milestones | |
| Github Contributions | |
| Planned Portal Contributions (if any) | A module on implementing machine learning models on an HPC cluster. | 
| Planned Publications (if any) | The discovery of a novel, improved material descriptor or machine learning model would have a very high probability of being published. | 
| What will the student learn? | The student will learn how to build and efficiently train large, robust machine learning models in a distributed HPC environment. | 
| What will the mentee learn? | |
| What will the Cyberteam program learn from this project? | The Cyberteam will learn how to better support a distributed machine learning environment so that future researchers will have an easier time writing and deploying such software. | 
| HPC resources needed to complete this project? | Training models will require tens to hundreds of hours of HPC time. Further validation of those models should take about as long as training. CUDA-capable nodes would be helpful but are not necessary. | 
| Notes | |
| What is the impact on the development of the principal discipline(s) of the project? | This project advanced the prediction of material properties using neural networks, with impact on physics. | 
| What is the impact on other disciplines? | The project helped make advancements in computer science by utilizing neural networks in a novel way. | 
| Is there an impact on physical resources that form infrastructure? | |
| Is there an impact on the development of human resources for research computing? | The faculty researcher built a regional network of collaborators through the ERN as a result of this work. | 
| Is there an impact on institutional resources that form infrastructure? | |
| Is there an impact on information resources that form infrastructure? | |
| Is there an impact on technology transfer? | |
| Is there an impact on society beyond science and technology? | |
| Lessons Learned | |
| Overall results | |
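
The Project Deliverables row describes a main script that trains several ML models in parallel and scripts that compare model and descriptor pairs. The following is only a minimal sketch of that comparison step, not the project's actual code. Every name in it is hypothetical: the two toy descriptors, the two toy models, and the synthetic target function are assumptions made for illustration, and a thread pool stands in for whatever scheduler or process pool a real cluster deployment would use.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from math import dist


def synthetic_target(comp):
    """Made-up property value; stands in for real DFT-computed data."""
    return 0.5 * comp["A"] + 1.0 * comp["B"] - 0.5


# Hypothetical compositions of two placeholder elements "A" and "B".
COMPOSITIONS = [{"A": a, "B": b}
                for a, b in [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (1, 3)]]
DATA = [(c, synthetic_target(c)) for c in COMPOSITIONS]


def count_descriptor(comp):
    """Descriptor 1: raw atom counts."""
    return [float(comp["A"]), float(comp["B"])]


def fraction_descriptor(comp):
    """Descriptor 2: atomic fractions (discards overall scale)."""
    total = comp["A"] + comp["B"]
    return [comp["A"] / total, comp["B"] / total]


class MeanModel:
    """Baseline: always predicts the mean of the training targets."""
    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, x):
        return self.mean_


class NearestNeighborModel:
    """1-nearest-neighbor regression with Euclidean distance."""
    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, x):
        i = min(range(len(self.X_)), key=lambda j: dist(self.X_[j], x))
        return self.y_[i]


def loo_mae(descriptor, model_cls, data):
    """Leave-one-out mean absolute error for one descriptor/model pair."""
    X = [descriptor(c) for c, _ in data]
    y = [t for _, t in data]
    errors = []
    for i in range(len(data)):
        model = model_cls().fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        errors.append(abs(model.predict(X[i]) - y[i]))
    return sum(errors) / len(errors)


def compare_pairs(descriptors, models, data):
    """Score every (descriptor, model) pair concurrently; best pair first."""
    pairs = list(product(descriptors, models))
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(lambda p: loo_mae(p[0], p[1], data), pairs))
    results = [(d.__name__, m.__name__, s) for (d, m), s in zip(pairs, scores)]
    return sorted(results, key=lambda r: r[2])


DESCRIPTORS = [count_descriptor, fraction_descriptor]
MODELS = [MeanModel, NearestNeighborModel]

if __name__ == "__main__":
    for d, m, mae in compare_pairs(DESCRIPTORS, MODELS, DATA):
        print(f"{d:20s} {m:22s} LOO-MAE = {mae:.3f}")
```

Because each pair is scored independently, the same structure should carry over to a process pool or a batch scheduler when the models are expensive to train; only the mapping step in `compare_pairs` would need to change.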