Submission information
Submission Number: 129
Submission ID: 226
Submission UUID: 555a8c58-506b-47fa-8113-46e20a781360
Submission URI: /form/project
Created: Tue, 11/23/2021 - 15:06
Completed: Tue, 11/23/2021 - 15:06
Changed: Mon, 06/03/2024 - 13:02
Remote IP address: 128.118.7.103
Submitted by: Rob Mathers
Language: English
Is draft: No
Webform: Project
Project Title: Calculation of Polymer Hydrophobicity
Program: CAREERS (323)
Project Image: {Empty}
Tags: cleanup (368), git (457), github (490), optimization (509), python (69)
Status: Complete

Project Leader
--------------
Project Leader: Rob Mathers
Email: rtm11@psu.edu
Mobile Phone: 412-779-7738
Work Phone: {Empty}

Project Personnel
-----------------
Mentor(s): Thomas Langford (510)
Student-facilitator(s): Sander Cohen-Janes (1684)
Mentee(s): {Empty}

Project Information
-------------------
Project Description: After writing Python code to calculate physical properties of polymer molecules in 2021, we are interested in cleaning up the code, addressing some calculation issues, and putting the code on GitHub. The code is written using an open-source cheminformatics package called RDKit. Prior to using RDKit, we had been using commercial software (Materials Studio, Chem3D) from 2014 to 2019. The physical property of interest relates to hydrophobicity, or the oil-like characteristics of polymers. Our method is inspired by the medicinal chemistry approach of describing drug-like molecules using partition coefficients. These coefficients, which are often referred to as LogP values, can be positive or negative. Positive LogP values indicate oil solubility, while negative LogP values suggest water-soluble molecules. Since the 1980s, the pharmaceutical industry has spawned many computational methods to calculate LogP. Our method constructs SMILES strings for a short segment of a polymer. These SMILES strings represent chemical structures using ASCII symbols. Then, we use RDKit to convert the SMILES string to a 3D molecule, optimize the conformation, and calculate the surface area (SA). Afterwards, we calculate LogP. The resulting ratio of LogP/SA has provided predictive capability in a number of collaborative projects. Since 2015, we have published 18 journal articles that use LogP and LogP/SA values.
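
The workflow described above (build a SMILES string for a short polymer segment, generate a 3D structure, optimize it, compute SA and LogP) might be sketched roughly as follows. This is a minimal illustration and not the project's actual code. The helper names oligomer_smiles and logp_per_sa are hypothetical, and the use of a single conformer, MMFF optimization, the FreeSASA-based surface area, and the Crippen LogP estimator are all assumptions; the description does not say which RDKit routines the project uses.

```python
def oligomer_smiles(repeat_unit: str, n: int) -> str:
    """Return SMILES for a linear chain of n repeat units (hypothetical helper).

    Assumes the repeat unit is written head-to-tail, so that plain string
    concatenation bonds the tail of one unit to the head of the next.
    Chain ends are left as implicit hydrogens.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return repeat_unit * n


def logp_per_sa(smiles: str, seed: int = 42) -> float:
    """Estimate LogP/SA for one conformer (illustrative assumptions throughout)."""
    # RDKit is imported here so the SMILES helper above works without it.
    from rdkit import Chem
    from rdkit.Chem import AllChem, Crippen, rdFreeSASA

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    mol = Chem.AddHs(mol)  # explicit hydrogens are needed for a sensible 3D geometry

    # Generate one 3D conformer; EmbedMolecule returns -1 on failure.
    if AllChem.EmbedMolecule(mol, randomSeed=seed) == -1:
        raise RuntimeError("3D embedding failed")

    # Relax the geometry with the MMFF94 force field (assumed choice).
    AllChem.MMFFOptimizeMolecule(mol)

    # Solvent-accessible surface area from per-atom radii (assumed SA definition).
    radii = rdFreeSASA.classifyAtoms(mol)
    surface_area = rdFreeSASA.CalcSASA(mol, radii)

    # Crippen atom-contribution LogP (assumed estimator).
    logp = Crippen.MolLogP(mol)
    return logp / surface_area


if __name__ == "__main__":
    # Styrene repeat unit written head-to-tail; chosen only as an example.
    styrene = "CC(c1ccccc1)"
    for n in (1, 2, 4):
        smiles = oligomer_smiles(styrene, n)
        print(n, smiles, logp_per_sa(smiles))
```

A single conformer is used here only to keep the sketch short. A real calculation would likely sample several conformers and keep the lowest-energy one.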

Project Information Subsection
------------------------------
Project Deliverables: The goals of the project include the following:
1) Clean up code
2) Optimize calculation method
   a. Reduce time needed for jobs
   b. Automatically adjust to available hardware resources
   c. Determine how many conformations are needed
3) Data output
   a. Output graph or list values (CSV, etc.)
   b. Provide options for selecting axes for graph (x-axis could be number of monomer units (N), 1/N, log N, N^(1/3), N^(1/2), etc.)
4) Put code on GitHub
Project Deliverables: {Empty}
Student Research Computing Facilitator Profile:
- Experience with Python
- Experience with or interested in learning Git and using GitHub
Mentee Research Computing Profile: {Empty}
Student Facilitator Programming Skill Level: Some hands-on experience
Mentee Programming Skill Level: {Empty}
Project Institution: Penn State-New Kensington
Project Address: New Kensington, Pennsylvania
Anchor Institution: CR-Penn State
Preferred Start Date: {Empty}
Start as soon as possible.: Yes
Project Urgency: Already behind3Start date is flexible
Expected Project Duration (in months): 5
Launch Presentation: https://support.access-ci.org/system/files/webform/project/226/Sander_Cohen-Janes_Launch_Presentation.pdf
Launch Presentation Date: 07/20/2022
Wrap Presentation: {Empty}
Wrap Presentation Date: {Empty}
Project Milestones:
- Milestone Title: Clean up the code
  Milestone Description: Currently, the code needs to be cleaned up and possibly reorganized or restructured.
  Completion Date Goal: 2022-06-30
- Milestone Title: Optimize calculation method
  Milestone Description: While running the code, we have noticed that calculating LogP is relatively fast. However, calculating surface area (SA) is much slower, with most of the computational time spent finding the best conformation (i.e., lowest energy).
  If possible, we would like to reduce calculation time by automatically adjusting to available hardware resources and finding a better way to conduct the conformation search.
  Completion Date Goal: 2022-08-11
- Milestone Title: Data output
  Milestone Description: Currently, the code outputs a graph. We would like options to save a graph, list values (CSV, etc.), or select axes for graphing the data.
  Completion Date Goal: 2022-09-30
- Milestone Title: Post code on GitHub
  Milestone Description: Get code ready and make it available on GitHub.
  Completion Date Goal: 2022-10-30
GitHub Contributions: {Empty}
Planned Portal Contributions (if any): {Empty}
Planned Publications (if any): {Empty}
What will the student learn?: {Empty}
What will the mentee learn?: {Empty}
What will the Cyberteam program learn from this project?: {Empty}
HPC resources needed to complete this project?: {Empty}
Notes: {Empty}

Final Report
------------
What is the impact on the development of the principal discipline(s) of the project?: {Empty}
What is the impact on other disciplines?: {Empty}
Is there an impact on physical resources that form infrastructure?: {Empty}
Is there an impact on the development of human resources for research computing?: {Empty}
Is there an impact on institutional resources that form infrastructure?: {Empty}
Is there an impact on information resources that form infrastructure?: {Empty}
Is there an impact on technology transfer?: {Empty}
Is there an impact on society beyond science and technology?: {Empty}
Lessons Learned: {Empty}
Overall results: From the Project PI: "So far, we have created The Hydrophobicity Project on GitHub to build a community of users. Sander's code is available on this site (https://github.com/TheHydrophobicityProject). We have been testing and using the code since last October. A manuscript is in preparation."
From the student facilitator: "This was the first project I worked on where I used version control. Now I can't live without it.
This was also the first project in which I was creating a "product" for general consumption, rather than internal tools. It was useful to collaborate with people that weren't active coders on the project so I could get usability feedback off of which I could iterate. Those are two things I will keep with me regardless of the other technologies I use in my future projects."