Submission information
Submission Number: 156
Submission ID: 1744
Submission UUID: ed99afdf-d8fb-4166-aa1c-be2ff5a75ac8
Submission URI: /form/project
Created: Mon, 10/31/2022 - 20:56
Completed: Mon, 10/31/2022 - 20:56
Changed: Mon, 06/03/2024 - 13:05
Submitted by: Stanley Nwoji
Language: English
Is draft: No
Webform: Project
Project Title | Natural Language Processing of a Low Resource Language (Igbo, an African Language) |
---|---|
Program | CAREERS |
Project Image | |
Tags | |
Status | Complete |
Project Leader | Stanley Nwoji |
Email | SNwoji@harrisburgu.edu |
Mobile Phone | +1443.941.4064 |
Work Phone | +1717.901.5152 |
Mentor(s) | Iheb Abdellatif |
Student-facilitator(s) | Atajan Abdyyev |
Mentee(s) | |
Project Description | Only 20 languages fall into the high-resource category, yet most natural language processing (NLP) advances have been made in those 20 languages, leaving out thousands of low-resource languages spoken by millions of people around the world. This is not only a technological problem; it is also a question of equity. This study seeks to help fill that gap. One cause of the gap is the lack of corpora and other linguistic resources for low-resource languages. To address this, we will create a corpus of Igbo, an African language, and analyze it with machine learning and deep learning NLP techniques. The outcome of this project could support applications such as text categorization, information extraction, summarization, dialogue systems, and machine translation in the Igbo language. We have already started building the Igbo_News corpus with Sketch Engine. |
Project Deliverables | Phase 1: development of the Igbo corpora from news content (~4 weeks); cleaning of the corpora (~3 weeks); statistical analysis of the corpora (~4 weeks). Phase 2: text categorization using the corpora (~3 weeks); information extraction on the corpora (~3 weeks); machine translation using the corpora (~4 weeks). |
Student Research Computing Facilitator Profile | The student facilitator should possess the following: 1. High emotional intelligence to work with other students, the mentor, and the PI; 2. Experience with the Python language; 3. Experience with or interest in NLP techniques and analyses; 4. Good writing and communication skills. |
Mentee Research Computing Profile | |
Student Facilitator Programming Skill Level | Practical applications |
Mentee Programming Skill Level | |
Project Institution | |
Project Address | |
Anchor Institution | CR-Penn State |
Preferred Start Date | |
Start as soon as possible. | No |
Project Urgency | Already behind / 3 / Start date is flexible |
Expected Project Duration (in months) | 6 |
Launch Presentation | |
Launch Presentation Date | 09/13/2023 |
Wrap Presentation | |
Wrap Presentation Date | 05/08/2024 |
Project Milestones | |
Github Contributions | |
Planned Portal Contributions (if any) | |
Planned Publications (if any) | |
What will the student learn? | |
What will the mentee learn? | |
What will the Cyberteam program learn from this project? | |
HPC resources needed to complete this project? | |
Notes | |
What is the impact on the development of the principal discipline(s) of the project? | |
What is the impact on other disciplines? | |
Is there an impact on physical resources that form infrastructure? | |
Is there an impact on the development of human resources for research computing? | |
Is there an impact on institutional resources that form infrastructure? | |
Is there an impact on information resources that form infrastructure? | |
Is there an impact on technology transfer? | |
Is there an impact on society beyond science and technology? | |
Lessons Learned | |
Overall results |