Submission Number: 82
Submission ID: 116
Submission UUID: 26441743-38b1-4c88-b997-bd9a19bc0115
Submission URI: /form/project

Created: Sat, 12/19/2020 - 14:01
Completed: Sat, 12/19/2020 - 14:18
Changed: Fri, 06/24/2022 - 22:01

Remote IP address: 67.176.36.130
Submitted by: Anita Schwartz
Language: English

Is draft: No
Webform: Project
Project Title Project WiCCED ML
Program CAREERS
Project Image WiCCED.png
Tags docker (35), hadoop (12), jupyterhub (214), kubernetes (210), machine-learning (272), programming-best-practices (49), python (69), r (32), software-installation (211), spark (91), tensorflow (51), tuning (217)
Status Complete
Project Leader Tina Callahan
Email tina.callahan@udel.edu
Mobile Phone
Work Phone
Mentor(s) Christina Callahan, Matt Shatley
Student-facilitator(s) Rainier Delarosa
Mentee(s)
Project Description NSF EPSCoR-funded Project WiCCED aims to assess threats and develop solutions to mitigate the human, agricultural, and natural pressures threatening water security in Delaware’s changing coastal environment. The WiCCED Data Core team provides an important platform for storage and dissemination of data produced from diverse resources and develops advanced data analysis techniques and data-backed decision-support systems.

The student working on this project will interface with faculty, staff, and students to … enable access/retrieval of data from a big data system (GeoMesa/Accumulo-based) for machine learning projects. This student will review, test, and create code to access data via Python and R, documenting their process for others to follow. They will also assist in retrieving/converting data to researchers' desired file formats. Time permitting, the student will also assist with data ingest.
Software utilized:
Python
R
GeoMesa
Accumulo
Hadoop/MapReduce
Spark/PySpark
Jupyter (notebook)
Kubernetes
Project Deliverables Updated Python code uploaded to the GitHub repository
User-based documentation created and in the GitHub wiki
Project Deliverables
Student Research Computing Facilitator Profile - Grad or undergrad
- Experienced Linux or Unix user
- Comfortable working in a remote Linux environment (HPC cluster)
- Some experience with Python and/or R programming
- Modeling experience (understanding general concepts) will be helpful
- Familiarity with machine learning concepts will be helpful
Mentee Research Computing Profile
Student Facilitator Programming Skill Level Some hands-on experience
Mentee Programming Skill Level
Project Institution University of Delaware
Project Address Newark, Delaware. 19716
Anchor Institution CR-University of Delaware
Preferred Start Date 03/01/2021
Start as soon as possible. No
Project Urgency 3 (on a scale from "Already behind" to "Start date is flexible")
Expected Project Duration (in months)
Launch Presentation
Launch Presentation Date 03/10/2021
Wrap Presentation
Wrap Presentation Date 07/14/2021
Project Milestones
  • Milestone Title: Beginning
    Milestone Description: Review Python code, become familiar with and set up project, and test setup; CAREERS project Launch presentation
    Completion Date Goal: 2021-04-01
    Actual Completion Date: 2021-05-03
  • Milestone Title: Middle
    Milestone Description: Rewrite existing API as needed, document inside code, upload code to GitHub.
    Completion Date Goal: 2021-06-14
    Actual Completion Date: 2021-07-02
  • Milestone Title: End
    Milestone Description: Develop wiki documentation with use-cases for users to understand and use the Python API with user testing; CAREERS project Wrap-up presentation.
    Completion Date Goal: 2021-07-14
    Actual Completion Date: 2021-07-14
Github Contributions
Planned Portal Contributions (if any)
Planned Publications (if any)
What will the student learn? End-user support, Python/R programming, Utilizing Big Data software, Data Ingest/Engineering experience
What will the mentee learn?
What will the Cyberteam program learn from this project? Effort involved in recruiting and training
HPC resources needed to complete this project? NRP/PRP cluster access will be sufficient. University of Delaware Caviness cluster and DARWIN cluster access if available.
Notes
What is the impact on the development of the principal discipline(s) of the project? Improves data access capabilities for researchers (specifically those participating in the Data Core Working Group of the NSF EPSCoR-funded Project WiCCED).

Also serves as a means of technical documentation (of sorts).
What is the impact on other disciplines? Improved ability to leverage data holdings for ML and data analytics (in any discipline).
Is there an impact on physical resources that form infrastructure? No impact on physical resources.
Is there an impact on the development of human resources for research computing? Not applicable for this particular project.
Is there an impact on institutional resources that form infrastructure? Not applicable.
Is there an impact on information resources that form infrastructure? Development of the Python API and documentation serves as a means to improve access to data holdings based on the software infrastructure.
Is there an impact on technology transfer? The documentation created for this project and the Python API code were both submitted to GitHub. This will assist in the transfer of technology and software knowledge to those members working with data on the current hardware/software configuration.
Is there an impact on society beyond science and technology? In general, the CAREERS project is valuable for mentee development and for applied learning opportunities that cannot be replicated in a classroom setting.
The Project WiCCED ML CAREERS project results will contribute to the efficiency of data access and analysis for Project WiCCED participants. This, in turn, will have an impact on the science that is produced to inform state and local policy with regard to water availability and security in Delaware.
Lessons Learned Overall, the project resulted in both a Python API uploaded to GitHub and the development of a GitHub wiki to improve researcher access to current data holdings relating to Project WiCCED. The student facilitator from the project learned how to:
Review and test existing code
Recognize which new features were needed and add new methods
Document inline code so that the code is more meaningful to future users
Submit code to GitHub and collaborate
Develop a GitHub wiki and technical documentation, including style formatting
Review, test, and edit technical documentation
Overall, the facilitator developed working knowledge of GitHub, Python libraries, and Project WiCCED data access needs.

On a more personal note, the student facilitator learned more about his learning style and that his lack of self-confidence could become a communication barrier. An improved review of learning styles before matching facilitators to a project might help to reduce possible personal impediments.
Overall results Contributions made to the project:
Improvement of an existing Python API that includes inline documentation: https://github.com/mshatley/epscor
A Getting Started technical guide to assist researchers with configuration and access to the Python API: https://github.com/mshatley/epscor/wiki/Getting-Started
A GitHub wiki that documents usage of the methods within the Python API: https://github.com/mshatley/epscor/wiki