Submission information
Submission Number: 82
Submission ID: 116
Submission UUID: 26441743-38b1-4c88-b997-bd9a19bc0115
Submission URI: /form/project
Created: Sat, 12/19/2020 - 14:01
Completed: Sat, 12/19/2020 - 14:18
Changed: Fri, 06/24/2022 - 22:01
Remote IP address: 67.176.36.130
Submitted by: Anita Schwartz
Language: English
Is draft: No
Webform: Project
Project Title: Project WiCCED ML
Program: CAREERS (323)
Project Image: https://support.access-ci.org/system/files/webform/project/116/WiCCED.png
Tags: docker (35), hadoop (12), jupyterhub (214), kubernetes (210), machine-learning (272), programming-best-practices (49), python (69), r (32), software-installation (211), spark (91), tensorflow (51), tuning (217)
Status: Complete

Project Leader
--------------
Project Leader: Tina Callahan
Email: tina.callahan@udel.edu
Mobile Phone: {Empty}
Work Phone: {Empty}

Project Personnel
-----------------
Mentor(s): Christina Callahan (583), Matt Shatley (585)
Student-facilitator(s): Rainier Delarosa (570)
Mentee(s): {Empty}

Project Information
-------------------
Project Description: NSF EPSCoR-funded Project WiCCED aims to assess threats and develop solutions to mitigate the human, agricultural, and natural pressures threatening water security in Delaware’s changing coastal environment. The WiCCED Data Core team provides an important platform for storage and dissemination of data produced from diverse resources and develops advanced data analysis techniques and data-backed decision-support systems. The student working on this project will interface with faculty, staff, and students to … enable access/retrieval of data from a big data system (GeoMesa/Accumulo-based) for machine learning projects. This student will review, test, and create code to access data via Python and R, documenting their process for others to follow. They will also assist in retrieving and converting data to the researchers' desired file format. Time permitting, the student will also assist with data ingest.
Software utilized: Python, R, GeoMesa, Accumulo, Hadoop/MapReduce, Spark/PySpark, Jupyter (notebook), Kubernetes

Project Information Subsection
------------------------------
Project Deliverables:
- Updated Python code uploaded to the GitHub repository
- User-based documentation created in the GitHub wiki
Project Deliverables: {Empty}
Student Research Computing Facilitator Profile:
- Grad or undergrad
- Experienced Linux or Unix user
- Comfortable working in a remote Linux environment (HPC cluster)
- Some experience with Python and/or R programming
- Modeling experience (understanding general concepts) will be helpful
- Familiarity with machine learning concepts will be helpful
Mentee Research Computing Profile: {Empty}
Student Facilitator Programming Skill Level: Some hands-on experience
Mentee Programming Skill Level: {Empty}
Project Institution: University of Delaware
Project Address: Newark, Delaware 19716
Anchor Institution: CR-University of Delaware
Preferred Start Date: 03/01/2021
Start as soon as possible.: No
Project Urgency: Already behind3Start date is flexible
Expected Project Duration (in months): {Empty}
Launch Presentation: https://support.access-ci.org/system/files/webform/project/116/Project%20WiCCED%20ML_%20Careers.project.launch%20%28Rainier%20Delarosa%29.pdf
Launch Presentation Date: 03/10/2021
Wrap Presentation: https://support.access-ci.org/system/files/webform/project/116/Careers%20Wrap%20Up%20Presentation%20%28Rainier%20Delarosa%29.pdf
Wrap Presentation Date: 07/14/2021
Project Milestones:
- Milestone Title: Beginning
  Milestone Description: Review Python code, become familiar with and set up project, and test setup; CAREERS project Launch presentation
  Completion Date Goal: 2021-04-01
  Actual Completion Date: 2021-05-03
- Milestone Title: Middle
  Milestone Description: Rewrite existing API as needed, document inside code, upload code to GitHub.
  Completion Date Goal: 2021-06-14
  Actual Completion Date: 2021-07-02
- Milestone Title: End
  Milestone Description: Develop wiki documentation with use-cases for users to understand and use the Python API with user testing; CAREERS project Wrap-up presentation.
  Completion Date Goal: 2021-07-14
  Actual Completion Date: 2021-07-14
Github Contributions: {Empty}
Planned Portal Contributions (if any): {Empty}
Planned Publications (if any): {Empty}
What will the student learn?: End-user support, Python/R programming, Utilizing Big Data software, Data Ingest/Engineering experience
What will the mentee learn?: {Empty}
What will the Cyberteam program learn from this project?: Effort involved in recruiting and training
HPC resources needed to complete this project?: NRP/PRP cluster access will be sufficient. University of Delaware Caviness cluster and DARWIN cluster access if available.
Notes: {Empty}

Final Report
------------
What is the impact on the development of the principal discipline(s) of the project?: Improves data access capabilities for researchers (specifically those researchers participating in the Data Core Working Group of the NSF EPSCoR-funded Project WiCCED). Also serves as a means of technical documentation (of sorts).
What is the impact on other disciplines?: Improved ability to leverage data holdings for ML and data analytics (in any discipline).
Is there an impact on physical resources that form infrastructure?: No impact on physical resources.
Is there an impact on the development of human resources for research computing?: Not applicable for this particular project.
Is there an impact on institutional resources that form infrastructure?: Not applicable.
Is there an impact on information resources that form infrastructure?: Development of the Python API and documentation serves as a means to improve access to data holdings based on the software infrastructure.
Is there an impact on technology transfer?: The documentation created for this project, as well as the Python API code, was submitted to GitHub. This will assist in transferring technology and software knowledge to members working with data on the current hardware/software configuration.
Is there an impact on society beyond science and technology?: In general, the CAREERS project is valuable for mentee development and for applied learning opportunities that cannot be replicated in a classroom setting. The Project WiCCED ML CAREERS project results will contribute to the efficiency of data access and analysis for Project WiCCED participants. This, in turn, will have an impact on the science that is produced to inform state and local policy with regard to water availability and security in Delaware.
Lessons Learned: Overall, the project resulted in both a Python API uploaded to GitHub and the development of a GitHub wiki to improve researcher access to current data holdings relating to Project WiCCED. The student facilitator from the project learned how to:
- Review and test existing code
- Recognize which new features were needed and add new methods
- Document code inline so that it is more meaningful to future users
- Submit code to GitHub and collaborate
- Develop a GitHub wiki and technical documentation, including style formatting
- Review, test, and edit technical documentation
Overall, the facilitator developed working knowledge of GitHub, Python libraries, and Project WiCCED data access needs. On a more personal note, the student facilitator learned more about his learning style and that his lack of self-confidence could become a communication barrier. An improved review of learning styles before matching facilitators to a project might help to reduce possible personal impediments.
Overall results: Contributions made to the project:
- Improvement of an existing Python API that includes inline documentation: https://github.com/mshatley/epscor
- A Getting Started technical guide to assist researchers with configuration and access to the Python API: https://github.com/mshatley/epscor/wiki/Getting-Started
- GitHub wiki that documents the usages of the methods within the Python API: https://github.com/mshatley/epscor/wiki