Data Analysis:

Improving Security for Mass Transit in NJ

Project Summary

Public transit, especially high-traffic subway and train stations, is highly vulnerable to mass-casualty and terrorist attacks under current surveillance and security measures. Since the start of the pandemic, violent crime (e.g., shootings) has increased, affecting public transportation use. By exploring new and emerging screening technologies, along with publicly available NYC and NJ crime, transportation, and census datasets, we can create a predictive model that assesses which stations are at greater risk, using a risk analysis model to make these predictions. Overall, the end goal of this project is to use these predictions, together with an understanding of different types of active and passive security measures, to improve station security without adversely affecting passenger throughput.


Thank you to the HDR TRIPODS award CCF-1934924 for funding this project and the DIMACS REU program, and to my mentor, Dr. Christie Nelson, for this opportunity.

Week 1: May 31 - June 4

All the DIMACS participants moved into the apartments and attended various introductory meetings and workshops. After settling in, Michael and I met with our mentor, Dr. Nelson, who gave us literature to review to help fine-tune our group project. We were asked to explore current approaches to this problem, so we reviewed topics such as aviation-style screening measures, machine learning for public transit security around the world, and passive security measures.

Week 2: June 5 - June 11

On Monday, all the REU participants gave short presentations explaining their projects and their goals for the program. Michael and I explained that we would focus on improving safety in public transit, specifically on detecting terrorist acts. Throughout the week, we continued our literature review and started our data review, which consisted of identifying which publicly available datasets we could use, such as crime, demographic, terrorism, and general public transit data for the NYC and NJ regions. We spent most of the week refining our summer-long goal for the project: creating a mass-casualty crime risk model for high-risk subway and train stations.

Week 3: June 12 - June 18

Building on the last two weeks of literature review, we began researching different types of security measures and risk models for analyzing the probability of terrorist/mass-casualty attacks in public transit. Dr. Nelson gave us some resources on well-known, widely used risk analysis models, such as the Risk Terrain Model and the Route Risk Model. I researched these models as well as a Road Risk Model; although it is used for crash prediction, the risk model in this paper [Paper link] helped us understand how to structure our own model. We also emailed a few researchers in the field of transportation security to ask about their work and their risk analysis models.

Week 4: June 19 - June 25

Once we had a clearer idea of how to approach the risk model for terrorist/mass-casualty attacks, we split up the datasets by region, NYC and NJ. I am working with the NJ data; the biggest difference from the NYC data is that very little NJ data is available. These datasets include subway/train routes, stations, census information, traffic density, etc. Combining these datasets lets us merge, filter, and search through them to find important patterns and trends in public transit. I worked extensively with FME (Feature Manipulation Engine) to find these trends and with Tableau to visualize the datasets.

Week 5: June 26 - July 2

During our weekly meeting, Dr. Nelson showed us more datasets that could be used for NJ. Until now, the issue was that data on NJ public transportation, and on the state overall, is very limited compared with more populous states like New York. However, I was able to work around this problem when I started exploring a dataset on hate crimes in NJ. This dataset has 20,000 entries from 1991 to 2020, which helped further support certain crime trends before and after COVID. Merging this hate crime dataset with other datasets, such as domestic violence and gun violence data, will be very useful when creating the risk analysis model. For now, I am cleaning up all the crime datasets, using a Python script to find the latitude and longitude coordinates of each crime.

Week 6: July 3 - July 7

Now that all the datasets (crime, census, transportation, etc.) are correctly formatted, I can focus on creating the risk analysis model. Using the crime datasets, I will assign weights based on the severity of each crime and find the distance of certain crimes from train stations throughout NJ. Once this is done, I can determine what security measures NJ needs for its stations, and, using the knowledge gained from our literature review, explain which types of security measures are best suited to them. This past week, I spent time coding and experimenting with different models and algorithms to find the best approach to this problem.
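As a rough illustration of the severity-weighting and distance step described above, the sketch below scores each station by summing the severity weights of nearby crimes, discounted linearly with great-circle (haversine) distance. This is a minimal sketch under stated assumptions, not the project's actual model: the weights in `SEVERITY_WEIGHTS`, the 1 km radius, the linear decay, the helper names `haversine_km` and `station_risk`, and the sample stations and coordinates are all hypothetical.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius in kilometres

# Hypothetical severity weights (assumption); unlisted crime types default to 1.0.
SEVERITY_WEIGHTS = {"terrorism": 5.0, "gun_violence": 3.0, "hate_crime": 2.0}


@dataclass
class Station:
    name: str
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def station_risk(station: Station, crimes, radius_km: float = 1.0) -> float:
    """Sum severity weights of crimes within radius_km, scaled down linearly with distance.

    `crimes` is an iterable of (crime_type, lat, lon) tuples (hypothetical format).
    """
    score = 0.0
    for crime_type, lat, lon in crimes:
        d = haversine_km(station.lat, station.lon, lat, lon)
        if d <= radius_km:
            score += SEVERITY_WEIGHTS.get(crime_type, 1.0) * (1 - d / radius_km)
    return score


if __name__ == "__main__":
    # Made-up stations and crime records, for illustration only.
    stations = [Station("Station A", 40.7357, -74.1724), Station("Station B", 40.2206, -74.7597)]
    crimes = [
        ("gun_violence", 40.7360, -74.1720),
        ("hate_crime", 40.7400, -74.1700),
        ("terrorism", 40.2210, -74.7600),
    ]
    # Print stations from highest to lowest risk score.
    for st in sorted(stations, key=lambda st: station_risk(st, crimes), reverse=True):
        print(f"{st.name}: {station_risk(st, crimes):.2f}")
```

Linear decay is only one option; an exponential decay or fixed distance bands would slot into `station_risk` the same way, and in a real model the weights would need to be justified from the crime data and the literature.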

Week 7: July 10 - July 14

The first step in creating a risk model is clustering the crime datasets in a meaningful way. This week, I did this by applying k-means clustering to the hate crime, gun violence, and terrorism datasets individually, which shows how densely crimes are concentrated and where they tend to occur in NJ. The next step is to cluster the merged crime datasets and then group all the crimes by their nearest transit station.
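The clustering step above can be sketched with plain Lloyd's k-means on (latitude, longitude) pairs. This is an illustrative sketch only; the project's actual code and parameters are not shown here, and in practice a library implementation such as scikit-learn's `sklearn.cluster.KMeans` would be the usual choice. Treating degrees as flat x/y coordinates is a simplifying assumption that distorts east-west distances somewhat, so at NJ's scale it is only an approximation of true distance. The function name `kmeans` and the sample coordinates are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Lloyd's k-means on (lat, lon) tuples, treating degrees as planar coordinates.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    labels = None
    for _ in range(max_iters):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (
                    sum(lat for lat, _lon in members) / len(members),
                    sum(lon for _lat, lon in members) / len(members),
                )
    return centroids, labels


if __name__ == "__main__":
    # Made-up incident coordinates forming two well-separated groups.
    incidents = [(40.00, -74.00), (40.01, -74.01), (40.02, -74.00),
                 (41.00, -75.00), (41.01, -75.01), (41.02, -75.00)]
    centroids, labels = kmeans(incidents, k=2)
    print(centroids)
    print(labels)
```

Choosing k is a judgment call; comparing the within-cluster spread across several values of k (the "elbow" heuristic) is a common starting point.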

Week 8: July 17 - July 21

This week, Michael and I worked on finishing some visualizations for the final DIMACS presentations. With Dr. Nelson's input, we created an excellent presentation highlighting the key findings about public transit in the NYC and NJ regions. The final presentations took place on Thursday and Friday mornings. After the Thursday presentations, we were given our DIMACS shirts, and since then, we have all been trying to solve the puzzle on them!

Week 9: July 24 - July 29

This is the final week of the project. I am finishing the last-minute requirements for DIMACS, but there is no need to worry about my progress on this project: Michael and I are going to continue working with Dr. Nelson to complete this risk analysis model :)

Results: Data Analysis and Visuals

The images below are visualizations of some of the NJ datasets. I created these images using FME, Tableau, and Python.

About Nithya Nalluri

Name: Nithya Nalluri
Home Institution: The College of New Jersey
Project: Data Analysis
Mentor: Dr. Christie Nelson
REU Student Collaborator: Michael Bsales