|Home Institution:|Sonoma State University|
|Mentor:|Dr. Christie Nelson|
|Project:|Topic Modeling and Sentiment Analysis of COVID-19 Press Conference Transcripts|
This project focuses on the COVID-19 pandemic. Our goals are to apply topic modeling to state-level press conference transcripts and to assess the sentiment they express toward COVID-19, in order to predict outbreaks or declines in COVID-19 cases.
This was a very productive week as I spent most of my time reading articles related to machine learning techniques as well as sentiment analysis tied to social media. Additionally, I was able to create and refine the methodology for this project with the help and guidance of my mentor, Dr. Christie Nelson.
This week, one of my goals was to obtain datasets relevant to my project. I am happy with the datasets I found, which cover a wide time frame. The next step is to understand how to apply machine learning techniques to the data.
I used Latent Dirichlet Allocation (LDA), a machine learning technique for topic modeling, to analyze the datasets I found for my project, and I carried out the analysis in R. This was my first time working with machine learning, and I was happy that I was able to develop a running model for the data.
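To make the LDA step concrete, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling. It is an illustration only: the project itself ran LDA in R, while this sketch uses Python with the standard library, and the toy documents, the function name `lda_gibbs`, and the hyperparameter values are assumptions made up for the example rather than anything from the project.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling. `docs` is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens in doc d assigned to topic k
    topic_word = [Counter() for _ in range(num_topics)]  # times word w is assigned to topic k
    topic_total = [0] * num_topics                       # total tokens assigned to topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the collapsed conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """Return the n most frequent words with a nonzero count in each topic."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]


# Hypothetical, already-tokenized transcripts (not project data).
docs = [
    ["cases", "testing", "hospital", "cases"],
    ["testing", "hospital", "masks"],
    ["schools", "reopening", "business"],
    ["business", "reopening", "schools", "schools"],
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
print(top_words(topic_word))
```

The per-document topic counts in `doc_topic` and the per-topic word counts in `topic_word` are the raw material for the kinds of summaries and visualizations a topic model supports. Results on a corpus this small are not meaningful and vary with the seed.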
I finished running LDA on my first dataset and then had to determine which visualization techniques were best suited to the data. Additionally, this week I read a handful of articles on resource allocation models and health surveillance. As a result of reading these articles, I developed a better understanding of the potential next steps for the project.
Since I obtained a new dataset this week, I cleaned most of the data and ran LDA in R. This dataset covers 22 states, so processing it, including converting the files to the proper type and cleaning them, was a lengthy procedure. I was also able to generate relevant word frequency visualizations for the data. The goal for the upcoming week is to have four more states analyzed.
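The cleaning and word frequency steps described above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the R workflow used in the project: the stopword list is a tiny made-up sample, the names `clean_tokens` and `sample` are hypothetical, and the sample sentence is invented rather than taken from any transcript.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "and", "to", "of", "a", "in", "is", "we", "are", "that", "for", "our"}


def clean_tokens(text):
    """Lowercase the text, keep alphabetic words, drop stopwords and short tokens."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOPWORDS and len(w) > 2]


# Invented sample text standing in for a press conference transcript.
sample = (
    "We are expanding testing across the state. "
    "Testing sites and hospital capacity are our priority."
)
freq = Counter(clean_tokens(sample))
print(freq.most_common(3))
```

The resulting counts are what a bar chart or word cloud of word frequencies would be drawn from.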
I continued running LDA and generating word frequency visualizations for the remaining datasets I found. The goal is to complete this phase of the project this week or next week. The subsequent goal is to conduct sentiment analysis and map those results.
At the beginning of the week, I brainstormed ways that I could use the results I had found. To help with this process, I referred to prior studies. Towards the end of the week, my mentor and I developed a plan for running sentiment analysis.
With the help of various online resources, I was able to run sentiment analysis for one of the states. I may look for additional ways to conduct sentiment analysis, since there are many possible approaches to this type of analysis.
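One of the simplest approaches to sentiment analysis is lexicon-based scoring, sketched below. The log does not say which method was used for the project, so this is only an example of one option, written in Python rather than R. The word lists, the function name `sentiment_score`, and the example sentences are all hypothetical.

```python
import re

# Hypothetical mini-lexicons; published sentiment lexicons are far larger.
POSITIVE = {"progress", "improving", "hope", "safe", "recovered", "encouraging"}
NEGATIVE = {"deaths", "outbreak", "surge", "crisis", "worse", "shortage"}


def sentiment_score(text):
    """Return (positive - negative) / (positive + negative) word counts, or 0.0 if none match."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    matched = pos + neg
    return 0.0 if matched == 0 else (pos - neg) / matched


# Invented sentences, not quotes from any press conference.
examples = [
    "We are seeing encouraging progress, and hospitalizations are improving.",
    "The surge has caused a shortage of beds, but there is hope.",
    "The briefing will begin at noon.",
]
scores = [sentiment_score(s) for s in examples]
print(scores)
```

Scores range from -1 (only negative words matched) to 1 (only positive words matched). Computing a score per transcript or per state gives values that could then be mapped or compared over time.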
Time has flown by! It is hard to believe that the program is now complete. I have learned so much, including strengthening my skills with R! During the week, I finalized my report and presentation.
This work was carried out while the author, Therese Azevedo, was a Rutgers IC CAE Research Fellow participating in the 2020 DIMACS REU program, supported through the DIA grant for Intelligence Community Centers for Academic Excellence - Critical Technology Studies Program.