Research Project: Covid-19 Data Analysis and Risk Management Assessment

DIMACS REU

Logs by Rachael Tovar

Weekly Updates

Week 1: (5.26.2020 - 5.31.2020)
My mentor, Dr. Nelson, and I discussed potential research projects and decided on researching resource allocation during the Covid-19 pandemic. The first step was background research on a variety of subjects: medical ethics in general, various forms of text regression, and medical ethics in a pandemic specifically. With this information, I started to create various model plans for my research project and began creating a presentation illustrating my work for my peers.
Week 2: (6.1.2020 - 6.7.2020)
The beginning of week two marked the finalization of my presentation, as well as presenting my work from week one to my peers. This week will encompass the bulk of my data collection using a variety of tools and resources, such as the Twitter API (application programming interface) and the CDC's comprehensive database. I am also utilizing Tableau software to understand and visualize the data.
Week 3: (6.8.2020 - 6.14.2020)
Week three went deeper into the data gathering process. Population, state area, and county area were collected to calculate the population density of each county and each state. Tableau was used to visualize how the rate of positive test results increased as time progressed, as well as to visualize mobility patterns for each county. This week also marked the start of writing code to average weekly data for states and counties (starting on February 15th, 2020). Weekly data includes (for states):
  1. Mobility Changes from baseline
    • Retail
    • Grocery
    • Parks
    • Transit
    • Workplaces
    • Residential
    • Averages of all mobility data
  2. Positive and Negative Test Results
  3. Regulations from State Authorities
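The weekly averaging code mentioned above can be sketched as follows. This is a minimal illustration, assuming daily mobility values (percent change from baseline) are already in a flat list starting February 15th, 2020; the example numbers are hypothetical:

```python
from statistics import mean

def weekly_averages(daily_values, days_per_week=7):
    """Collapse a list of daily mobility changes (percent from baseline)
    into per-week averages, starting from the first entry."""
    weeks = []
    for start in range(0, len(daily_values), days_per_week):
        chunk = daily_values[start:start + days_per_week]
        weeks.append(mean(chunk))
    return weeks

# Hypothetical example: 14 days of retail mobility change -> 2 weekly averages
retail = [-5, -6, -4, -7, -8, -6, -6, -20, -22, -25, -24, -23, -21, -19]
print(weekly_averages(retail))  # [-6, -22]
```

The same helper applies unchanged to each mobility category (grocery, parks, transit, etc.) and to test-result counts.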
Week 4: (6.15.2020 - 6.21.2020)
Data collection for this week focused mainly on medical facilities and care facilities: what qualified as a medical/care facility, what PPE said facilities needed, and when to use the PPE. Another aspect of the data collection is finding out how to safely and effectively use PPE (optimization of PPE), since the need for PPE increases as exposure risks increase. Information of this nature was collected from CDC and OSHA guidelines.
Week 5: (6.22.2020 - 6.28.2020)
Data cleaning occurred in week five. For counties, weekly entries were made and any missing data was filled in. I was approved as a Twitter developer and created an app that allowed me to use the Twitter API (application programming interface). I collected data sets of Tweets relating to Covid-19; these data sets contained Tweet IDs (a unique number that identifies each Tweet). Twitter does not allow the distribution of full JSON data sets containing all of a Tweet's information to third parties, but it does allow the distribution of data sets containing only Tweet IDs. These data sets are said to be filled with "dehydrated" Tweets, and this week I was able to "hydrate" them to look at all of the information they contained. Each hydrated Tweet now contains:
  • Date posted and time stamp
  • Coordinates of where the Tweet was made
  • The text of the Tweet itself and hashtags
  • Description of the users (bios)
  • Screen names of users
  • Follower count of users
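Hydration itself goes through Twitter's API, which accepts Tweet IDs only in limited batches (the v1.1 statuses/lookup endpoint takes up to 100 IDs per request). A minimal sketch of the batching step, independent of any particular client library; the IDs here are placeholders:

```python
def batch_ids(tweet_ids, batch_size=100):
    """Split a list of dehydrated Tweet IDs into batches sized for
    the hydration endpoint (100 IDs per lookup request)."""
    return [tweet_ids[i:i + batch_size]
            for i in range(0, len(tweet_ids), batch_size)]

ids = [str(n) for n in range(250)]  # placeholder Tweet IDs
batches = batch_ids(ids)
print(len(batches), [len(b) for b in batches])  # 3 [100, 100, 50]
```

Each batch would then be submitted to the lookup endpoint (for example, via a hydration tool such as twarc) and the returned JSON stored for analysis.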
With this information I will be able to perform LDA to help in the resource allocation part of my project.
Week 6: (6.29.20 - 7.5.20)
This week marked the start of statistical modeling with LDA (Latent Dirichlet Allocation), which draws connections between different subjects based on importance and frequency in the text it is performed on. For the newly rehydrated Tweets, I performed LDA on weekly collections. Then, I performed LDA again on the weeks, disregarding "grab" words, that is, words likely to be the most frequent in the text but the least helpful in providing insight into the different needs of the people who sent the Tweets. These words included "corona", "coronavirus", "covid", "pandemic", and the different variants such as "sarscov2", "nCov", "covid-19", "ncov2019", and "2019ncov". After disregarding these words, different word-frequency information emerged: words like "mask" and "help" appeared as LDA topics.
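The grab-word filtering applied before LDA can be sketched as a simple token filter. This is a simplified illustration (a real preprocessing pass would also handle stop words, stemming, etc.), using the grab words listed above:

```python
import re

# "Grab" words: frequent but uninformative terms dropped before LDA
GRAB_WORDS = {"corona", "coronavirus", "covid", "pandemic",
              "sarscov2", "ncov", "covid-19", "ncov2019", "2019ncov"}

def filter_grab_words(tweet_text):
    """Lowercase, tokenize, and drop grab words from a Tweet's text."""
    tokens = re.findall(r"[a-z0-9-]+", tweet_text.lower())
    return [t for t in tokens if t not in GRAB_WORDS]

print(filter_grab_words("Coronavirus update: wear a mask, COVID-19 help needed"))
# ['update', 'wear', 'a', 'mask', 'help', 'needed']
```

The filtered token lists for a given week would then be fed to an LDA implementation to extract that week's topics.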
Week 7: (7.6.20 - 7.12.20)
After LDA processing, word frequency was taken into account: iterating through the text of the Twitter data and calculating word frequency to create word clouds for each week of the data. Just like the LDA step, this word-frequency count disregarded "grab" words. After plotting the number of geo-tagged Tweets per week against the number of cases in the US, there appeared to be an inverse relationship: where there were spikes in cases, there were dips in the number of Tweets. This may indicate a lag in public response to the number of cases.
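The per-week word-frequency counting behind the word clouds can be sketched with a Counter; the tokenization and grab-word set are simplified here, and the sample Tweets are hypothetical:

```python
import re
from collections import Counter

GRAB_WORDS = {"corona", "coronavirus", "covid", "pandemic"}

def weekly_word_counts(tweets):
    """Count word frequency across a week's Tweets, skipping grab words.
    The resulting counts feed directly into a word-cloud generator."""
    counts = Counter()
    for text in tweets:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token not in GRAB_WORDS:
                counts[token] += 1
    return counts

week = ["masks help", "please wear masks", "covid masks now"]
print(weekly_word_counts(week).most_common(1))  # [('masks', 3)]
```

A word-cloud library can consume these counts directly, sizing each word by its frequency.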
Week 8: (7.13.20 - 7.19.20)
After performing LDA and word frequency analysis, data was collected regarding state mandates. Data sets were downloaded, as well as transcripts from state governors, to create a large database of mandates with the dates they were issued by each state. These mandates included:
  • Masks Usage and Enforcement
  • School Closures
  • Stay-At-Home orders
  • Non-Essential Business Closures
  • Restaurant Closures
  • Bar Closures
These state mandates were then put through a grading system I created, based on the risk of person-to-person transmission.
  • Masks Usage and Enforcement
    1. No mandates issued
    2. Mask mandate
    3. Mask mandate enforced
  • School Closures
    1. No mandates issued
    2. Closed K-12s
    3. Closed Day cares
    4. Reopened Day cares
  • Stay-At-Home orders
    1. No Mandates issued
    2. Stay-At-Home mandate
    3. Ended or Relaxed mandate
  • Non-Essential Business Closures
    1. No Mandates issued
    2. Closure of all non-essential business
    3. Re-open with no masks, employees or otherwise
    4. Re-open with masks, employees only
    5. Re-open with masks, employees and public
    6. Re-open with masks, employees and public enforced
  • Restaurant Closures
    1. No Mandates issued
    2. Closure of food establishments, except take-out
    3. Re-open with no masks, employees or otherwise
    4. Re-open with masks, employees only
    5. Re-open with masks, employees and public
    6. Re-open with masks, employees and public enforced
  • Bar Closures
    1. No Mandates issued
    2. Closure of all non-essential business or re-closure
    3. Re-open with no masks, employees or otherwise
    4. Re-open with masks, employees only
    5. Re-open with masks, employees and public
    6. Re-open with masks, employees and public enforced
States with higher grades had more mandates to combat person-to-person transmission.
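Applying this grading system can be sketched as a lookup table: each mandate category maps its current stage to a numeric grade, and the grades sum to a state score. The table below is an illustrative subset of the scales above, and the state entry is hypothetical:

```python
# Illustrative subset of the grading scales; stage names are shorthand
GRADES = {
    "mask":      {"none": 1, "mandate": 2, "enforced": 3},
    "school":    {"none": 1, "k12_closed": 2, "daycare_closed": 3,
                  "daycare_reopened": 4},
    "stay_home": {"none": 1, "mandate": 2, "ended_or_relaxed": 3},
}

def state_score(mandates):
    """Sum the grade of each mandate category a state has issued."""
    return sum(GRADES[cat][stage] for cat, stage in mandates.items())

# Hypothetical state: enforced masks, day cares closed, stay-at-home in effect
example_state = {"mask": "enforced", "school": "daycare_closed",
                 "stay_home": "mandate"}
print(state_score(example_state))  # 3 + 3 + 2 = 8
```

Scoring every state this way yields a single comparable number per state per week.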
Week 9: (7.20.20 - 7.24.20)
Week 9 entailed creating a risk calculator for individuals in counties. The risk was calculated by taking the weekly count of cases seen in each county, dividing by the county's population, and multiplying by 100. These calculations were then put into visualization software.
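In formula form, the county risk described above is (weekly cases / county population) × 100, i.e. weekly cases per 100 residents. A minimal sketch, with hypothetical numbers:

```python
def county_risk(weekly_cases, population):
    """Weekly cases per 100 residents: (cases / population) * 100."""
    return weekly_cases / population * 100

# Hypothetical county: 150 new cases this week, population 50,000
print(county_risk(150, 50_000))  # about 0.3 cases per 100 residents
```

Computing this per county per week gives the time series that was then loaded into the visualization software.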