Research Project: Covid-19 Data Analysis and Risk Management Assessment

DIMACS REU

Logs by Rachael Tovar

Weekly Updates

Week 1: (5.26.2020 - 5.31.2020)
My mentor, Dr. Nelson, and I discussed potential research projects and decided on researching resource allocation during the Covid-19 pandemic. The first step was background research on a variety of subjects: medical ethics in general, various forms of text regression, and medical ethics in a pandemic specifically. With this information, I started to create various model plans for my research project and began creating a presentation illustrating my work for my peers.
Week 2: (6.1.2020 - 6.7.2020)
The beginning of week two marked the finalization of my presentation, as well as presenting my work from week one to my peers. This week will encompass the bulk of my data collection using a variety of tools and resources, such as the Twitter API (application programming interface) and the CDC's comprehensive database. I am also utilizing Tableau software to understand and visualize the data.
Week 3: (6.8.2020 - 6.14.2020)
Week three went deeper into the data gathering process. Population, state area, and county area were collected to calculate the population density of each county and each state. Tableau was used to visualize how the rate of positive test results increased as time progressed, as well as to visualize mobility patterns for each county. This week also marked the start of writing code to average weekly data for states and counties (starting on February 15th, 2020). Weekly data includes (for states):
  1. Mobility Changes from baseline
    • Retail
    • Grocery
    • Parks
    • Transit
    • Workplaces
    • Residential
    • Averages of all mobility data
  2. Positive and Negative Test Results
  3. Regulations from State Authorities
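The weekly averaging code mentioned above can be sketched as follows. This is a minimal illustration, assuming daily mobility values (percent change from baseline) are already in a flat list starting February 15th, 2020; the example numbers are hypothetical:

```python
from statistics import mean

def weekly_averages(daily_values, days_per_week=7):
    """Collapse a list of daily mobility changes (percent from baseline)
    into per-week averages, starting from the first entry."""
    weeks = []
    for start in range(0, len(daily_values), days_per_week):
        chunk = daily_values[start:start + days_per_week]
        weeks.append(mean(chunk))
    return weeks

# Hypothetical example: 14 days of retail mobility change -> 2 weekly averages
retail = [-5, -6, -4, -7, -8, -6, -6, -20, -22, -25, -24, -23, -21, -19]
print(weekly_averages(retail))  # [-6, -22]
```

The same helper applies unchanged to each mobility category (grocery, parks, transit, etc.) and to test-result counts.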
Week 4: (6.15.2020 - 6.21.2020)
Data collection for this week focused mainly on medical facilities and care facilities: what qualified as a medical/care facility, what PPE said facilities needed, and when to use the PPE. Another aspect of the data collection is finding out how to safely and effectively use PPE (optimization of PPE), since the need for PPE increases as exposure risks increase. Information of this nature was collected from CDC and OSHA guidelines.
Week 5: (6.22.2020 - 6.28.2020)
Data cleaning occurred in week five. For counties, weekly entries were made and any missing data was filled in. I was approved as a Twitter developer and created an app that allowed me to use the Twitter API (application programming interface). I collected data sets of Tweets relating to Covid-19; these data sets contained Tweet IDs (a unique number that identifies each Tweet). Twitter does not allow the distribution of full JSON data sets containing all of a Tweet's information to third parties, but it does allow the distribution of data sets containing only Tweet IDs. These data sets are said to be filled with "dehydrated" Tweets, and this week I was able to "hydrate" them to look at all of the information they contained. Each hydrated Tweet now contains:
  • Date posted and time stamp
  • Coordinates of where the Tweet was made
  • The text of the Tweet itself and hashtags
  • Description of the users (bios)
  • Screen names of users
  • Follower count of users
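Hydration itself goes through Twitter's API, which accepts Tweet IDs only in limited batches (the v1.1 statuses/lookup endpoint takes up to 100 IDs per request). A minimal sketch of the batching step, independent of any particular client library; the IDs here are placeholders:

```python
def batch_ids(tweet_ids, batch_size=100):
    """Split a list of dehydrated Tweet IDs into batches sized for
    the hydration endpoint (100 IDs per lookup request)."""
    return [tweet_ids[i:i + batch_size]
            for i in range(0, len(tweet_ids), batch_size)]

ids = [str(n) for n in range(250)]  # placeholder Tweet IDs
batches = batch_ids(ids)
print(len(batches), [len(b) for b in batches])  # 3 [100, 100, 50]
```

Each batch would then be submitted to the lookup endpoint (for example, via a hydration tool such as twarc) and the returned JSON stored for analysis.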
With this information I will be able to perform LDA to help in the resource allocation part of my project.
Week 6: (6.29.20 - 7.5.20)
This week marked the start of statistical modeling with LDA (Latent Dirichlet Allocation), which draws connections between different subjects based on importance and frequency in the text it is performed on. For the newly rehydrated Tweets, I performed LDA on weekly collections. Then, I performed LDA again on the weeks, disregarding "grab" words, that is, words likely to be the most frequent in the text but the least helpful in providing insight into the different needs of the people who sent the Tweets. These words included "corona", "coronavirus", "covid", "pandemic", and the different variants such as "sarscov2", "nCov", "covid-19", "ncov2019", and "2019ncov". After disregarding these words, different word-frequency information emerged: words like "mask" and "help" appeared as LDA topics.
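The grab-word filtering applied before LDA can be sketched as a simple token filter. This is a simplified illustration (a real preprocessing pass would also handle stop words, stemming, etc.), using the grab words listed above:

```python
import re

# "Grab" words: frequent but uninformative terms dropped before LDA
GRAB_WORDS = {"corona", "coronavirus", "covid", "pandemic",
              "sarscov2", "ncov", "covid-19", "ncov2019", "2019ncov"}

def filter_grab_words(tweet_text):
    """Lowercase, tokenize, and drop grab words from a Tweet's text."""
    tokens = re.findall(r"[a-z0-9-]+", tweet_text.lower())
    return [t for t in tokens if t not in GRAB_WORDS]

print(filter_grab_words("Coronavirus update: wear a mask, COVID-19 help needed"))
# ['update', 'wear', 'a', 'mask', 'help', 'needed']
```

The filtered token lists for a given week would then be fed to an LDA implementation to extract that week's topics.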
Week 7: (7.6.20 - 7.12.20)
After LDA processing, word frequency was taken into account: iterating through the text of the Twitter data and calculating word frequency to create word clouds for each week of the data. Just like the LDA step, this word-frequency count disregarded "grab" words. After plotting the number of geo-tagged Tweets per week against the number of cases in the US, there appeared to be an inverse relationship: where there were spikes in cases, there were dips in the number of Tweets. This may indicate a lag in public response to the number of cases.
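The per-week word-frequency counting behind the word clouds can be sketched with a Counter; the tokenization and grab-word set are simplified here, and the sample Tweets are hypothetical:

```python
import re
from collections import Counter

GRAB_WORDS = {"corona", "coronavirus", "covid", "pandemic"}

def weekly_word_counts(tweets):
    """Count word frequency across a week's Tweets, skipping grab words.
    The resulting counts feed directly into a word-cloud generator."""
    counts = Counter()
    for text in tweets:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token not in GRAB_WORDS:
                counts[token] += 1
    return counts

week = ["masks help", "please wear masks", "covid masks now"]
print(weekly_word_counts(week).most_common(1))  # [('masks', 3)]
```

A word-cloud library can consume these counts directly, sizing each word by its frequency.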
Week 8: (7.13.20 - 7.19.20)
After performing LDA and word frequency analysis, data was collected regarding state mandates. Data sets were downloaded, as well as transcripts from state governors, to create a large database of mandates with the dates they were issued by each state. These mandates included:
  • Masks Usage and Enforcement
  • School Closures
  • Stay-At-Home orders
  • Non-Essential Business Closures
  • Restaurant Closures
  • Bar Closures
These state mandates were then put through a grading system I created, based on the risk of person-to-person transmission.
  • Masks Usage and Enforcement
    1. No mandates issued
    2. Mask mandate
    3. Mask mandate enforced
  • School Closures
    1. No mandates issued
    2. Closed K-12s
    3. Closed Day cares
    4. Reopened Day cares
  • Stay-At-Home orders
    1. No Mandates issued
    2. Stay-At-Home mandate
    3. Ended or Relaxed mandate
  • Non-Essential Business Closures
    1. No Mandates issued
    2. Closure of all non-essential business
    3. Re-open with no masks, employees or otherwise
    4. Re-open with masks, employees only
    5. Re-open with masks, employees and public
    6. Re-open with masks, employees and public enforced
  • Restaurant Closures
    1. No Mandates issued
    2. Closure of food establishments, except take-out
    3. Re-open with no masks, employees or otherwise
    4. Re-open with masks, employees only
    5. Re-open with masks, employees and public
    6. Re-open with masks, employees and public enforced
  • Bar Closures
    1. No Mandates issued
    2. Closure of all non-essential business or re-closure
    3. Re-open with no masks, employees or otherwise
    4. Re-open with masks, employees only
    5. Re-open with masks, employees and public
    6. Re-open with masks, employees and public enforced
States with higher grades had more mandates to combat person-to-person transmission.
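Applying this grading system can be sketched as a lookup table: each mandate category maps its current stage to a numeric grade, and the grades sum to a state score. The table below is an illustrative subset of the scales above, and the state entry is hypothetical:

```python
# Illustrative subset of the grading scales; stage names are shorthand
GRADES = {
    "mask":      {"none": 1, "mandate": 2, "enforced": 3},
    "school":    {"none": 1, "k12_closed": 2, "daycare_closed": 3,
                  "daycare_reopened": 4},
    "stay_home": {"none": 1, "mandate": 2, "ended_or_relaxed": 3},
}

def state_score(mandates):
    """Sum the grade of each mandate category a state has issued."""
    return sum(GRADES[cat][stage] for cat, stage in mandates.items())

# Hypothetical state: enforced masks, day cares closed, stay-at-home in effect
example_state = {"mask": "enforced", "school": "daycare_closed",
                 "stay_home": "mandate"}
print(state_score(example_state))  # 3 + 3 + 2 = 8
```

Scoring every state this way yields a single comparable number per state per week.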
Week 9: (7.20.20 - 7.24.20)
Week 9 entailed creating a risk calculator for individuals in counties. The risk was calculated by taking the weekly count of cases seen in each county, dividing by the county's population, and multiplying by 100. These calculations were then put into visualization software.
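In formula form, the county risk described above is (weekly cases / county population) × 100, i.e. weekly cases per 100 residents. A minimal sketch, with hypothetical numbers:

```python
def county_risk(weekly_cases, population):
    """Weekly cases per 100 residents: (cases / population) * 100."""
    return weekly_cases / population * 100

# Hypothetical county: 150 new cases this week, population 50,000
print(county_risk(150, 50_000))  # about 0.3 cases per 100 residents
```

Computing this per county per week gives the time series that was then loaded into the visualization software.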