Submission Number: 27
Submission ID: 44
Submission UUID: a95c8eb3-6b76-4498-ba27-c393179db12f
Submission URI: /form/project

Created: Tue, 09/03/2019 - 14:08
Completed: Tue, 09/03/2019 - 14:10
Changed: Fri, 07/10/2020 - 17:05

Remote IP address: 130.215.55.243
Submitted by: Northeast Cyberteam
Language: English

Is draft: No
Webform: Project
Project Title Genetics and Big Data
Program Northeast
Project Leader Dawei Li
Email Dawei.Li@uvm.edu
Mobile Phone
Work Phone
Mentor(s) Katia Bulekova
Student-facilitator(s) Abigail Waters
Mentee(s)
Project Description The overarching goal of this project is to identify a predictive, quantitative framework describing individual differences in genetic, epigenetic, cognitive, and behavioral markers of emotion-cognition regulation in response to academically stressful situations. Each year, large numbers of young adults drop out of college and university because of self-sabotaging and seemingly irrational behaviors when faced with academic stressors. This proposal takes a cross-disciplinary approach to understanding neurobiological function and the resulting behaviors across a spectrum of neurotypical and neuro-atypical young adults, the latter defined as those with diagnosed learning disabilities such as dyslexia, ADHD, and college-able autism. The project partnership includes faculty and students from the University of Vermont (sequencing data analyses), Landmark College (research subject recruitment), the University of New Hampshire (research subject recruitment), the University of Maine (model simulation), and the Vermont Genetics Network. Dawei's group has done some trial work at MGHPCC and has been very pleased with the results. He would like to scale up: currently, processing one sample uses 2 TB of storage and takes 5 days with 64 GB of memory and 12 cores. The planned project has 3,000 samples, so the total storage will be 2 TB × 3,000 = 6 PB. Computational time is estimated at 5 days × 3,000 = 15,000 computing days if the samples are run one at a time on a single machine with 64 GB of memory and 12 cores.
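The scaling arithmetic in the description can be made explicit with a small calculator. This is a minimal sketch, assuming that samples can be processed independently of one another (an assumption, not stated in the submission), so wall-clock time shrinks with the number of concurrent jobs while total node-days, core-hours, and storage stay the same. The names `PerSampleCost` and `project_totals` are hypothetical, and terabytes are converted to petabytes in decimal units (1 PB = 1,000 TB), matching the 6 PB figure above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerSampleCost:
    """Resources for one sample (hypothetical structure for illustration)."""
    storage_tb: float  # disk footprint of one processed sample, in TB
    node_days: float   # wall-clock days to process one sample on one machine
    memory_gb: int     # memory requested per job
    cores: int         # cores requested per job


def project_totals(cost: PerSampleCost, n_samples: int, concurrent_jobs: int = 1) -> dict:
    """Scale per-sample costs to the whole project.

    ASSUMPTION: samples are independent, so they can run in parallel
    "waves" of `concurrent_jobs`; queue wait and I/O contention are ignored.
    """
    if concurrent_jobs < 1:
        raise ValueError("concurrent_jobs must be at least 1")
    total_storage_tb = cost.storage_tb * n_samples
    total_node_days = cost.node_days * n_samples
    waves = -(-n_samples // concurrent_jobs)  # ceiling division
    return {
        "storage_pb": total_storage_tb / 1000,            # decimal TB -> PB
        "node_days": total_node_days,
        "core_hours": total_node_days * 24 * cost.cores,
        "wall_clock_days": waves * cost.node_days,
        "peak_memory_gb": concurrent_jobs * cost.memory_gb,
    }


if __name__ == "__main__":
    cost = PerSampleCost(storage_tb=2, node_days=5, memory_gb=64, cores=12)
    print(project_totals(cost, 3000))                       # serial run
    print(project_totals(cost, 3000, concurrent_jobs=100))  # 100 jobs at once
```

With the figures from the description, a serial run gives 6 PB, 15,000 node-days, and 4,320,000 core-hours; running 100 jobs at a time would, under the independence assumption, cut wall-clock time to 150 days while requiring 6,400 GB of memory in aggregate.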
Project Deliverables
Student Research Computing Facilitator Profile We recommend a graduate student with expertise in handling large data sets. An interest in biology would make the project more enjoyable but is not required.
Mentee Research Computing Profile
Student Facilitator Programming Skill Level
Mentee Programming Skill Level
Project Institution UVM
Project Address University of Vermont
Burlington, Vermont 05405
Anchor Institution NE-University of Vermont
Preferred Start Date 07/10/2017
Start as soon as possible.
Project Urgency
Expected Project Duration (in months)
Launch Presentation
Launch Presentation Date
Wrap Presentation
Wrap Presentation Date
Project Milestones
Github Contributions
Planned Portal Contributions (if any)
Planned Publications (if any)
What will the student learn? (To be provided by Stephen.)
What will the mentee learn?
What will the Cyberteam program learn from this project?
HPC resources needed to complete this project?
Notes Note from Stephen: It remains unclear to me why he needs so much storage and whether he is appropriately compressing files, removing temporary files, etc. If I understand his workflow correctly, he uses a series of programs he has collected from others; it is unclear whether he uses scripts to link them together.
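The storage concerns in the note (uncompressed outputs, leftover temporary files, and programs run by hand rather than linked by a script) can be illustrated with a small driver. This is a minimal sketch under stated assumptions, not Dawei's actual workflow: `run_pipeline` and the one-file-in, one-file-out `Step` interface are hypothetical. It chains the steps in order, keeps every intermediate file in a temporary directory that is deleted automatically (even if a step fails), and stores only the final result, gzip-compressed.

```python
import gzip
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Sequence

# HYPOTHETICAL interface: each step reads one input file and writes one
# output file. A real tool would be wrapped, e.g. with subprocess.run.
Step = Callable[[Path, Path], None]


def run_pipeline(sample_input: Path, steps: Sequence[Step], final_output: Path) -> Path:
    """Run `steps` in order and keep only a gzip-compressed final result.

    Intermediate files live in a TemporaryDirectory, which is removed
    when the `with` block exits, including when a step raises.
    """
    with tempfile.TemporaryDirectory(prefix="pipeline_") as tmp:
        current = sample_input
        for i, step in enumerate(steps):
            out = Path(tmp) / f"step{i}.out"
            step(current, out)  # each step consumes the previous output
            current = out
        final_output.parent.mkdir(parents=True, exist_ok=True)
        # Stream the last intermediate into a compressed final file.
        with open(current, "rb") as src, gzip.open(final_output, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return final_output
```

On a cluster, `tempfile.TemporaryDirectory(dir=...)` could point the intermediates at node-local scratch space instead of shared storage. Whether compression helps depends on the file formats involved; some sequencing formats are already compressed.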
What is the impact on the development of the principal discipline(s) of the project?
What is the impact on other disciplines?
Is there an impact on physical resources that form infrastructure?
Is there an impact on the development of human resources for research computing?
Is there an impact on institutional resources that form infrastructure?
Is there an impact on information resources that form infrastructure?
Is there an impact on technology transfer?
Is there an impact on society beyond science and technology?
Lessons Learned
Overall results