Submission Number: 82
Submission ID: 116
Submission UUID: 26441743-38b1-4c88-b997-bd9a19bc0115
Submission URI: /form/project

Created: Sat, 12/19/2020 - 14:01
Completed: Sat, 12/19/2020 - 14:18
Changed: Fri, 06/24/2022 - 22:01

Remote IP address: 67.176.36.130
Submitted by: Anita Schwartz
Language: English

Is draft: No
Webform: Project
Project WiCCED ML
CAREERS
WiCCED.png
docker (35), hadoop (12), jupyterhub (214), kubernetes (210), machine-learning (272), programming-best-practices (49), python (69), r (32), software-installation (211), spark (91), tensorflow (51), tuning (217)
Complete

Project Leader

Tina Callahan
{Empty}
{Empty}

Project Personnel

Rainier Delarosa
{Empty}

Project Information

NSF EPSCoR-funded Project WiCCED aims to assess threats and develop solutions to mitigate the human, agricultural, and natural pressures threatening water security in Delaware’s changing coastal environment. The WiCCED Data Core team provides an important platform for storage and dissemination of data produced from diverse resources and develops advanced data analysis techniques and data-backed decision-support systems.

The student working on this project will interface with faculty, staff, and students to … enable access/retrieval of data from a big data system (GeoMesa/Accumulo-based) for machine learning projects. This student will review, test, and create code to access data via Python and R, documenting their process for others to follow. They will also assist in retrieving data and converting it to researchers' desired file formats. Time permitting, the student will also assist with data ingest.
Software utilized:
Python
R
GeoMesa
Accumulo
Hadoop/MapReduce
Spark/PySpark
Jupyter (notebook)
Kubernetes
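
As an illustration of the "converting data to researchers' desired file formats" task described above, here is a minimal sketch using only the Python standard library. It is not the project's API: the record layout, the field names (station, lon, lat, salinity), the sample values, and the helper names to_csv and to_geojson are all hypothetical, and the records stand in for rows that would actually be retrieved from the GeoMesa/Accumulo store.

```python
import csv
import io
import json

# Hypothetical, made-up records standing in for rows retrieved from the data store.
SAMPLE_RECORDS = [
    {"station": "A1", "lon": -75.75, "lat": 39.68, "salinity": 0.4},
    {"station": "B2", "lon": -75.12, "lat": 38.78, "salinity": 28.9},
]


def to_csv(records):
    """Serialize flat dicts to CSV text; the first record's keys form the header."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_geojson(records, lon_key="lon", lat_key="lat"):
    """Build a GeoJSON FeatureCollection of Point features from flat dicts."""
    features = []
    for record in records:
        # Everything except the coordinate fields becomes a feature property.
        properties = {
            key: value
            for key, value in record.items()
            if key not in (lon_key, lat_key)
        }
        features.append(
            {
                "type": "Feature",
                "geometry": {
                    "type": "Point",
                    # GeoJSON orders coordinates as [longitude, latitude].
                    "coordinates": [record[lon_key], record[lat_key]],
                },
                "properties": properties,
            }
        )
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    print(to_csv(SAMPLE_RECORDS))
    print(json.dumps(to_geojson(SAMPLE_RECORDS), indent=2))
```

Keeping each output format in its own small function would let a researcher request CSV for tabular tools or GeoJSON for mapping tools from the same retrieved records.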

Project Information Subsection

Updated Python code uploaded to the GitHub repository
User-based documentation created and in the GitHub wiki
{Empty}
- Grad or undergrad
- Experienced Linux or Unix user
- Comfortable working in a remote Linux environment (HPC cluster)
- Some experience with Python and/or R programming
- Modeling experience (understanding general concepts) will be helpful
- Familiarity with machine learning concepts will be helpful
{Empty}
Some hands-on experience
{Empty}
University of Delaware
Newark, Delaware 19716
CR-University of Delaware
03/01/2021
No
Already behind
3
Start date is flexible
{Empty}
03/10/2021
07/14/2021
  • Milestone Title: Beginning
    Milestone Description: Review Python code, become familiar with and set up project, and test setup; CAREERS project Launch presentation
    Completion Date Goal: 2021-04-01
    Actual Completion Date: 2021-05-03
  • Milestone Title: Middle
    Milestone Description: Rewrite existing API as needed, document inside code, upload code to GitHub.
    Completion Date Goal: 2021-06-14
    Actual Completion Date: 2021-07-02
  • Milestone Title: End
    Milestone Description: Develop wiki documentation with use-cases for users to understand and use the Python API with user testing; CAREERS project Wrap-up presentation.
    Completion Date Goal: 2021-07-14
    Actual Completion Date: 2021-07-14
{Empty}
{Empty}
End-user support, Python/R programming, Utilizing Big Data software, Data Ingest/Engineering experience
{Empty}
Effort involved in recruiting and training
NRP/PRP cluster access will be sufficient. University of Delaware Caviness cluster and DARWIN cluster access if available.
{Empty}

Final Report

Improves data access capabilities for researchers (specifically, those participating in the Data Core Working Group of the NSF EPSCoR-funded Project WiCCED).

Also serves as a means of technical documentation (of sorts).
Improved ability to leverage data holdings for ML and data analytics (in any discipline).
No impact on physical resources.
Not applicable for this particular project.
Not applicable.
Development of the Python API and its documentation serves as a means to improve access to data holdings based on the software infrastructure.
The documentation created for this project and the Python API code were submitted to GitHub. This will assist in the transfer of technology and software knowledge to members working with data on the current hardware/software configuration.
In general, the CAREERS project is valuable for mentee development and for applied learning opportunities that cannot be replicated in a classroom setting.
The Project WiCCED ML CAREERS project results will contribute to the efficiency of data access and analysis for Project WiCCED participants. This, in turn, will have an impact on the science that is produced to inform state and local policy with regard to water availability and security in Delaware.
Overall, the project resulted in both a Python API uploaded to GitHub and the development of a GitHub wiki to improve researcher access to current data holdings relating to Project WiCCED. The student facilitator from the project learned how to:
Review and test existing code
Recognize what new features were needed and add new methods
Document inline code so that the code is more meaningful to future users
Submit code to GitHub and collaborate
Develop a GitHub wiki and technical documentation, including style formatting
Review, test, and edit technical documentation
Overall, the facilitator developed working knowledge of GitHub, Python libraries, and Project WiCCED data access needs.

On a more personal note, the student facilitator learned more about his learning style and that his lack of self-confidence could become a communication barrier. An improved review of learning styles before matching facilitators to a project might help to reduce possible personal impediments.
Contributions made to the project:
Improvement of an existing Python API that includes inline documentation: https://github.com/mshatley/epscor
A Getting Started technical guide to assist researchers with configuration and access to the Python API: https://github.com/mshatley/epscor/wiki/Getting-Started
A GitHub wiki that documents the usage of the methods within the Python API: https://github.com/mshatley/epscor/wiki