Name:	Michael Bsales
Email:	mbsales@nd.edu
Home Institution:	University of Notre Dame
Project:	Projects in Data Analysis
Mentor:	Dr. Christie Nelson

About My Project:

I will be researching current and prospective security check system (SCS) protocols and technologies used by mass transit systems, with a focus on the challenges posed by subway systems. My goal is to find and recommend a security check system that reduces the threat of terrorism and mass violence in metro systems without reducing transit system throughput or ridership.

Research Log

Week 1:

After moving in, we began reading literature pertaining to metro system security checks, computer vision on public transport, and security check systems at event venues. The goal of this initial literature review was to better formulate our problem statement and prepare our initial presentation for the REU cohort by understanding what is currently done in public transit SCS and what may be possible in the present and near future.

Week 2:

The presentation, given on Monday to the entire DIMACS REU cohort, made it clear that our project's goal of improving safety in public transit resonates easily with people. We continued our literature review while also diving into our available datasets on terrorism, crime, and general transit. Through this week's research, we have narrowed in on a summer-long goal: creating a mass-casualty crime risk model that will help direct finite resources to the most at-risk subway stations.
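One common way to structure a station-level risk model is the multiplicative threat × vulnerability × consequence formulation used in homeland-security risk assessment; this log does not say whether our model takes that form. The sketch below assumes hypothetical per-station scores that have already been estimated (threat and vulnerability on a 0 to 1 scale, consequence in arbitrary units) and gives one screening unit to each of the highest-risk stations. `Station` and `allocate` are illustrative names, and the numbers are toy values.

```python
from dataclasses import dataclass


@dataclass
class Station:
    """Hypothetical per-station inputs; real estimates would come from the crime and transit data."""
    name: str
    threat: float         # assumed likelihood an attack is attempted (0-1)
    vulnerability: float  # assumed chance an attempt succeeds given current screening (0-1)
    consequence: float    # assumed impact if it succeeds, e.g. scaled ridership

    @property
    def risk(self) -> float:
        # Multiplicative threat x vulnerability x consequence score.
        return self.threat * self.vulnerability * self.consequence


def allocate(stations: list[Station], units: int) -> list[str]:
    """Give one screening unit to each of the `units` highest-risk stations."""
    ranked = sorted(stations, key=lambda s: s.risk, reverse=True)
    return [s.name for s in ranked[:units]]


if __name__ == "__main__":
    # Toy numbers for illustration only.
    stations = [
        Station("Station A", threat=0.2, vulnerability=0.5, consequence=100),
        Station("Station B", threat=0.1, vulnerability=0.9, consequence=300),
        Station("Station C", threat=0.4, vulnerability=0.3, consequence=50),
    ]
    print(allocate(stations, units=2))  # ['Station B', 'Station A']
```

Greedy top-k is the simplest possible allocation rule: it ignores diminishing returns from placing a second unit at the same station and any interaction between neighboring stations, both of which a fuller model would need to handle.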

Week 3:

This week we focused on risk modeling and security check technologies. I investigated multiple types of risk models for transit and other contexts. I also researched screening alternatives to walk-through metal detectors, looking for advantages in screening time and robustness.

Week 4:

I used this week to dive headlong into our data. I started cleaning 8 datasets, wrangling, disentangling, and reconnecting them to make sense of them. I then began making simple visualizations in Tableau. This was my first time working with maps in Tableau, but after a short tutorial from their website I was able to create some interesting maps illustrating the prevalence of certain 911 dispatch events across NYC.

Week 5:

When I tried to expand last week's visualizations to include hate crimes, I found that reorganizing the way I cleaned and segmented my datasets would be very helpful. So I broke the 911 dispatch dataset up by event description, such as Train Order/Maintenance Sweeps or Suspicious Packages. This let me work with smaller datasets that are easier to load and use. The NYC Hate Crime data I have access to could be very useful, as many mass-casualty events are motivated by perpetrator biases. However, that dataset doesn't include any location data. I found that the Hate Crime dataset originated from another NYC crime dataset that records all felonies, misdemeanors, and other violations in NYC. I've managed to join these datasets on a few common attributes, losing only a small number of hate crimes in the process.
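The two data steps described above, splitting the dispatch data by event type and attaching location fields to hate crimes through a join on shared attributes, can be sketched in plain Python. The field name `event_description` and the join keys passed in are hypothetical stand-ins for the real NYC column names, and the join assumes the chosen key combination identifies at most one complaint record.

```python
from collections import defaultdict


def split_by_event(rows, key="event_description"):
    """Partition dispatch records into one list per event type.

    `key` is a hypothetical column name for the event description field.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def attach_locations(hate_crimes, complaints, keys):
    """Inner-join hate crime records to complaint records on shared columns.

    Returns (joined, unmatched). Assumes each key combination identifies at
    most one complaint; if it does not, the last complaint seen wins.
    """
    index = {tuple(c[k] for k in keys): c for c in complaints}
    joined, unmatched = [], []
    for crime in hate_crimes:
        match = index.get(tuple(crime[k] for k in keys))
        if match is None:
            unmatched.append(crime)  # records "lost" in the join
        else:
            joined.append({**match, **crime})  # hate crime fields take precedence
    return joined, unmatched
```

Reporting `len(unmatched)` next to the join result makes it easy to confirm how many hate crimes are lost. With pandas, `DataFrame.merge` with `how="left"` and `indicator=True` gives the same matched/unmatched breakdown through its `_merge` column.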

Week 6:

This week I greatly expanded the number of visualizations I was able to make. Before I can predict and model the data statistically, I have to understand it visually; I cannot ask good questions without some baseline knowledge. Looking into the hate crimes, I noticed a huge increase in Anti-Asian hate crimes in March, which aligned with a well-reported national uptick in such crimes. I could also see where those crimes occurred. I wonder if a risk model for train stations could incorporate the risk to vulnerable populations and temporary changes in bias.
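A jump like the one noticed in March could be flagged automatically by comparing each month's count for a bias motive against a trailing baseline. This is a rule of thumb, not a statistical test, and the field names (`year`, `month`, `bias`), the 2× ratio, and the three-month window are all assumptions chosen for illustration.

```python
from collections import Counter
from statistics import mean


def monthly_counts(records, bias):
    """Count records with the given bias motive per (year, month)."""
    return Counter((r["year"], r["month"]) for r in records if r["bias"] == bias)


def month_range(start, end):
    """Every (year, month) from start to end inclusive, so empty months count as zero."""
    y, m = start
    out = []
    while (y, m) <= end:
        out.append((y, m))
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return out


def flag_spikes(counts, months, ratio=2.0, window=3):
    """Flag months whose count is at least `ratio` times the mean of the previous `window` months."""
    series = [counts.get(m, 0) for m in months]
    flagged = []
    for i in range(window, len(series)):
        baseline = mean(series[i - window:i])
        if baseline > 0 and series[i] >= ratio * baseline:
            flagged.append(months[i])
    return flagged
```

Iterating over a full month range, rather than only the months present in the data, keeps the baseline window aligned to calendar months.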

Week 7:

Following the trail of assessing station risk based on vulnerable communities, I am focusing on adding census data to my dataset. I found an API on the FCC's website that returns the census block for a given latitude/longitude coordinate. This is great, as census blocks in New York City are geographically precise but still contain a lot of people. This should let me use census blocks as a rough clustering method and make some quick and easy visualizations combining 911 dispatch, hate crime, and census data. However, the massive size of my datasets (29 million+ instances) means that it's not quite as simple as I once thought. I've used multithreading and some other techniques to speed the process up by about 98%, but 29 million API calls still take a long time!
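The geocoding step parallelizes well because each lookup is an independent, I/O-bound HTTP request. The sketch below assumes the FCC Census Block API endpoint `https://geo.fcc.gov/api/census/block/find` with `latitude`, `longitude`, and `format=json` parameters and a JSON response containing `Block.FIPS`; the FCC's current documentation should be checked before relying on that. It also de-duplicates rounded coordinates before querying, one plausible way (not necessarily the technique used here) to cut the number of calls when many dispatch events share a location. `fcc_block_fips` and `blocks_for_points` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumed endpoint for the FCC Census Block API; verify against current FCC docs.
FCC_BLOCK_URL = "https://geo.fcc.gov/api/census/block/find"


def fcc_block_fips(lat, lon, timeout=10.0):
    """Return the census block FIPS code for (lat, lon), or None if absent.

    Assumes the response JSON looks like {"Block": {"FIPS": "..."}, ...}.
    """
    query = urllib.parse.urlencode({"latitude": lat, "longitude": lon, "format": "json"})
    with urllib.request.urlopen(f"{FCC_BLOCK_URL}?{query}", timeout=timeout) as resp:
        payload = json.load(resp)
    return payload.get("Block", {}).get("FIPS")


def blocks_for_points(points, lookup=fcc_block_fips, max_workers=16, precision=5):
    """Map each (lat, lon) point to a census block code.

    Coordinates are rounded and de-duplicated so each distinct location is
    looked up once; the unique lookups then run concurrently on a thread pool.
    """
    keys = [(round(lat, precision), round(lon, precision)) for lat, lon in points]
    unique = list(dict.fromkeys(keys))  # de-duplicate, keeping first-seen order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda p: lookup(*p), unique))
    cache = dict(zip(unique, results))
    return [cache[k] for k in keys]


if __name__ == "__main__":
    # Offline demo with a fake lookup instead of real API calls.
    fake = lambda lat, lon: "A" if lat > 40.75 else "B"
    pts = [(40.76, -73.98), (40.70, -74.01), (40.76, -73.98)]
    print(blocks_for_points(pts, lookup=fake))  # ['A', 'B', 'A']
```

Rounding to five decimal places (about a meter of latitude) risks snapping a point that sits right on a block boundary into the neighboring block, in exchange for far fewer requests. Passing a fake `lookup` keeps the function testable offline, and `max_workers` should stay modest so the service is not flooded.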

Week 8:

Since the cohort of Prague students is leaving at the end of this week, the final REU presentations are at the end of week 8. We used the middle of the week to build a comprehensive slide deck and presentation of everything we have done. We are designing the slide deck to serve two purposes: as the final presentation for the REU and as a summary of all of the work we have done. As such, many pieces of the long-form presentation won't be included when we present on Friday, including plenty of in-depth literature review that I hope to include in my paper!

Week 9:

The final week of the DIMACS REU! To wrap up this tremendous program, we are writing our final paper. Until I wrote it all up and reviewed it, I did not realize how much I had done this summer. I never thought I would learn and understand so much about security check systems, transit security, or risk assessment systems.

Acknowledgements:

This project has been made possible by the NSF HDR TRIPODS award CCF-1934924.

About Me: