Submission information
Submission Number: 61
Submission ID: 92
Submission UUID: ef874fa2-e0e2-4987-a7dc-f0d9cacd71d9
Submission URI: /form/project
Created: Wed, 08/12/2020 - 13:54
Completed: Wed, 08/12/2020 - 15:10
Changed: Wed, 07/06/2022 - 15:10
Remote IP address: 165.230.224.100
Submitted by: Galen Collier
Language: English
Is draft: No
Webform: Project
Project Title | End-to-end learning of protein-protein interactions |
---|---|
Program | CAREERS |
Project Image | |
Tags | bash, bioinformatics, computational-chemistry, debugging, machine-learning, programming, programming-best-practices, python, scripting, slurm, software-installation, tensorflow, tuning |
Status | Halted |
Project Leader | Guillaume Lamoureux (guillaume.lamoureux@rutgers.edu) |
Mobile Phone | |
Work Phone | |
Mentor(s) | Galen Collier |
Student-facilitator(s) | |
Mentee(s) | |
Project Description | Protein-protein interactions (PPIs) are involved in numerous fundamental biological processes. A model that can reliably predict whether two proteins interact, and predict the effect of protein variation on an existing interaction, would open up new avenues for systems biology and protein design. Current state-of-the-art PPI prediction models rely on sequence similarity with proteins known to interact, which intrinsically limits their accuracy for the protein variants of interest in cancer or viral/bacterial infection. The goal of the project is to train deep learning models for PPI prediction in the absence of structural information about the protein complex. We have recently developed models to predict the structure of any complex formed by two proteins A and B of known structure (see our preprint “Protein-protein docking using learned three-dimensional representations”, https://www.biorxiv.org/content/10.1101/738690v2). We now aim to develop models that generate the structure of the AB complex in a single step, without explicitly searching for the optimal relative orientations of the two proteins, and that predict the binding affinity of proteins A and B directly from their structures. Such models have two main advantages: (1) they are much more computationally efficient, since they avoid a costly grid search in the space of translations and rotations, and (2) they are differentiable, which means they can be used as building blocks for larger neural architectures that, for instance, also predict the structures of the individual proteins A and B themselves. This project is enabled by the development of TorchProteinLibrary, a computationally efficient library of differentiable primitives for deep neural network models of protein structure (see our preprint “TorchProteinLibrary: A computationally efficient, differentiable representation of protein structure”, https://arxiv.org/abs/1812.01108). The library implements the functionality needed to perform end-to-end learning of protein structure prediction. |
Project Deliverables | Research workflow development: successful training of deep learning models for PPI prediction in the absence of structural information about the protein complex. Communication of the findings in the form of presentations and/or publications. |
Student Research Computing Facilitator Profile | Graduate or undergraduate student; interested in structural biology research; experienced Linux or Unix user; comfortable working in a remote Linux environment (HPC cluster); some experience with Python programming. Structural modeling experience (understanding of general concepts) and familiarity with machine learning concepts will be helpful. |
Mentee Research Computing Profile | |
Student Facilitator Programming Skill Level | Practical applications |
Mentee Programming Skill Level | |
Project Institution | Rutgers University–Camden |
Project Address | 303 Cooper St, Camden, New Jersey 08102 |
Anchor Institution | CR-Rutgers |
Preferred Start Date | 09/01/2020 |
Start as soon as possible. | No |
Project Urgency | 3 (on a scale from "Already behind" to "Start date is flexible") |
Expected Project Duration (in months) | |
Launch Presentation | |
Launch Presentation Date | |
Wrap Presentation | |
Wrap Presentation Date | |
Project Milestones | |
Github Contributions | |
Planned Portal Contributions (if any) | |
Planned Publications (if any) | |
What will the student learn? | |
What will the mentee learn? | |
What will the Cyberteam program learn from this project? | Effort involved in recruiting and training junior-level research software engineers. |
HPC resources needed to complete this project? | |
Notes | |
What is the impact on the development of the principal discipline(s) of the project? | |
What is the impact on other disciplines? | |
Is there an impact on physical resources that form infrastructure? | |
Is there an impact on the development of human resources for research computing? | |
Is there an impact on institutional resources that form infrastructure? | |
Is there an impact on information resources that form infrastructure? | |
Is there an impact on technology transfer? | |
Is there an impact on society beyond science and technology? | |
Lessons Learned | |
Overall results |