Submission information
Submission Number: 130
Submission ID: 228
Submission UUID: b482ba13-6da9-4325-a459-2fffd9c0c277
Submission URI: /form/project
Created: Sun, 12/05/2021 - 13:17
Completed: Sun, 12/05/2021 - 13:17
Changed: Tue, 08/09/2022 - 15:16
Remote IP address: 74.103.220.121
Submitted by: Gaurav Khanna
Language: English
Is draft: No
Webform: Project
Project Title | Understanding Covid-19 Pandemic through Social Media Discussion |
---|---|
Program | CAREERS |
Project Image | |
Tags | ai (271), data-analysis (422), natural-language-processing (274), programming (5), programming-best-practices (49), python (69) |
Status | Complete |
Project Leader | Suhong Li |
Email | sli@bryant.edu |
Mobile Phone | |
Work Phone | |
Mentor(s) | Suhong Li |
Student-facilitator(s) | Brenna Rojek |
Mentee(s) | |
Project Description | Dr. Li has been collecting covid-19 tweets since March 2020 and currently has about 1.2 billion tweets. She is still collecting tweets and expects to have more in the future. This project focuses on understanding the impact of the covid-19 pandemic through social media discussion on Twitter. The following topics will be explored: 1) What are the top topics discussed regarding covid-19, and how has the discussion of these topics changed over time? 2) What is the sentiment/emotion around each topic by time, location, and gender? 3) How can misinformation/fake news about covid-19 be identified? The student will work on this project from start to finish using various data analytics methods, including data exploration, topic modelling, natural language processing, and machine learning. |
Project Deliverables | |
Student Research Computing Facilitator Profile | |
Mentee Research Computing Profile | |
Student Facilitator Programming Skill Level | |
Mentee Programming Skill Level | |
Project Institution | Bryant University |
Project Address | |
Anchor Institution | CR-University of Rhode Island |
Preferred Start Date | |
Start as soon as possible. | No |
Project Urgency | 3 (on a scale from "Already behind" to "Start date is flexible") |
Expected Project Duration (in months) | |
Launch Presentation | |
Launch Presentation Date | 03/09/2022 |
Wrap Presentation | |
Wrap Presentation Date | 07/20/2022 |
Project Milestones | |
Github Contributions | |
Planned Portal Contributions (if any) | |
Planned Publications (if any) | |
What will the student learn? | |
What will the mentee learn? | |
What will the Cyberteam program learn from this project? | |
HPC resources needed to complete this project? | |
Notes | |
What is the impact on the development of the principal discipline(s) of the project? | This project focuses on understanding the impact of the covid-19 pandemic through social media discussion on Twitter and explores a dataset of over 13 million tweets containing keywords related to covid-19 and ‘vaccine’ or ‘vax’, spanning March 2020 to February 2022. Due to the size of the data, the analysis was done on the Unity cluster. Various analyses, including topic modelling and emotion analysis, were conducted to understand how the topic of the vaccine was discussed on Twitter, how the discussion changed over time, and what emotions people expressed about this topic and how they differed by time and location. The project explores the possibilities and challenges of running state-of-the-art natural language processing algorithms on a large data set using HPC. |
What is the impact on other disciplines? | This project contributes to knowledge in the fields of psychology and health care. The results will provide insights into people’s attitudes and emotions toward covid-19 vaccination and how those emotions differ by time and location. These findings help in understanding the psychological impact of the pandemic and may facilitate the adoption of covid-19 vaccination. |
Is there an impact on physical resources that form infrastructure? | None |
Is there an impact on the development of human resources for research computing? | |
Is there an impact on institutional resources that form infrastructure? | None |
Is there an impact on information resources that form infrastructure? | None |
Is there an impact on technology transfer? | None |
Is there an impact on society beyond science and technology? | As mentioned previously, the project is timely and will deepen our understanding of the impact of the covid-19 pandemic by identifying the dominant topics discussed and the emotions people associate with them. |
Lessons Learned | The student (Brenna Rojek) working on this project learned state-of-the-art natural language processing algorithms and learned to use a GPU cluster. Due to the large data size, processing all of the data took a very long time (more than one week). A better approach needs to be developed so that the processing scales better in the future. |
Overall results | Four emotions (joy, optimism, sadness, and anger) were extracted from each tweet using the Cardiff NLP emotion model from Hugging Face. The results show that the dominant emotions regarding covid-19 are anger and sadness. In addition, people’s emotions toward covid-19 vaccination changed over time: anger in the discussion of covid-19 vaccination has increased substantially since August 2021. Some states (Arizona, Wyoming, and Florida) also show a higher level of anger than other states. https://public.tableau.com/app/profile/brenna.rojek/viz/shared/KYCRFGDWT |
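
The emotion analysis summarized above (one emotion label per tweet, then a comparison by time and location) could be sketched roughly as follows. This is a minimal illustration and not the project's actual code. The model identifier `cardiffnlp/twitter-roberta-base-emotion`, the record layout (`month`, `state`, `emotion`), and all function names are assumptions. The sample records are made-up toy data, not project results.

```python
from collections import Counter, defaultdict

# The four emotions named in the report; assumed to match the model's label names.
EMOTIONS = ("anger", "joy", "optimism", "sadness")


def classify_tweets(texts, batch_size=32):
    """Return one emotion label per tweet.

    Assumption: uses the Hugging Face `transformers` pipeline with the
    Cardiff NLP emotion checkpoint. Needs `transformers` installed and a
    model download, so it is not called in the toy example below.
    """
    from transformers import pipeline  # lazy import: aggregation works without it

    classifier = pipeline(
        "text-classification",
        model="cardiffnlp/twitter-roberta-base-emotion",  # assumed checkpoint
    )
    # Batching matters on a GPU cluster; truncation guards against long inputs.
    results = classifier(list(texts), batch_size=batch_size, truncation=True)
    return [result["label"] for result in results]


def emotion_shares(records, key):
    """Group records by `key` ('month' or 'state') and return the share of
    each emotion within every group as {group: {emotion: share}}."""
    counts = defaultdict(Counter)
    for record in records:
        counts[record[key]][record["emotion"]] += 1
    shares = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        shares[group] = {emotion: counter[emotion] / total for emotion in EMOTIONS}
    return shares


def dominant_emotion(shares):
    """Pick the emotion with the largest share in each group."""
    return {group: max(dist, key=dist.get) for group, dist in shares.items()}


# Toy records for illustration only (hypothetical, not taken from the dataset).
sample = [
    {"month": "2021-07", "state": "AZ", "emotion": "joy"},
    {"month": "2021-07", "state": "RI", "emotion": "optimism"},
    {"month": "2021-07", "state": "RI", "emotion": "joy"},
    {"month": "2021-08", "state": "AZ", "emotion": "anger"},
    {"month": "2021-08", "state": "AZ", "emotion": "anger"},
    {"month": "2021-08", "state": "RI", "emotion": "sadness"},
]

by_month = emotion_shares(sample, "month")
print(dominant_emotion(by_month))
```

Keeping classification separate from aggregation means the slow GPU step can run once per batch of tweets, while the grouping by month or state can be rerun cheaply on the stored labels.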