Week 1: May 31 - June 4
All the DIMACS participants moved into the apartments and attended introductory meetings and workshops. After settling in, Michael and I met with our mentor, Dr. Nelson, who gave us several papers to review to help refine our group project. We were asked to survey current approaches to public transit security through a literature review, so we read about aviation-style screening measures, machine learning for public transit security around the world, and passive security measures.
Week 2: June 5 - June 11
On Monday, all the REU participants gave short presentations explaining their projects and their goals for the program. Michael and I explained that we would focus on improving safety in public transit, specifically on detecting terrorist acts. Throughout the week, we continued our literature review and began a data review: researching which publicly available datasets we could use, such as crime, demographic, terrorism, and general public transit data for the NYC and NJ regions. We spent most of the week refining our summer-long goal for the project: to create a mass-casualty crime risk model for high-risk subway and train stations.
Week 3: June 12 - June 18
Building on the last two weeks of literature review, we began researching different types of security measures and risk models for analyzing the probability of terrorist/mass casualty attacks in public transit. Dr. Nelson gave us resources on some well-known and widely used risk analysis models, such as the Risk Terrain Model and the Route Risk Model. I researched these models as well as a Road Risk Model; although the Road Risk Model is used for crash prediction, the model in this paper [Paper link] helped us understand how to start structuring our own. We also emailed a few researchers in the field of transportation security to ask about their work and their risk analysis models.
Week 4: June 19 - June 25
Once we had a clearer idea of how to approach the risk model for terrorist/mass casualty attacks, we split up the datasets by region, NYC and NJ. I am working with the NJ data; the biggest difference from the NYC data is that much less of it is available. These datasets include subway/train routes, stations, census information, traffic density, etc. Combining these datasets lets us merge, filter, and search through them to find important patterns and trends in public transit. I worked heavily with FME (Feature Manipulation Engine) to find these trends and with Tableau to create visualizations of the datasets.
Week 5: June 26 - July 2
During our weekly meeting, Dr. Nelson showed us more datasets that could be used for NJ. Until now, the issue was that data on NJ public transportation, and on the state overall, is very limited because of its smaller population compared to more populous states like New York. However, I was able to partly overcome this problem by exploring a dataset on hate crimes in NJ. This dataset has 20,000 entries from 1991 to 2020, which helped support certain trends in crime before and after the onset of COVID. Merging this hate crime dataset with other datasets, such as domestic violence and gun violence data, should be very useful when creating the risk analysis model. For now, I am cleaning up all the crime datasets by using a Python script to find the latitude and longitude coordinates of each crime.
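The coordinate lookup could be sketched as follows. This is a minimal, hypothetical version, not the actual script: it assumes each crime record has a municipality field and that a table mapping municipality names to coordinates is already available (the real script may instead have queried a geocoding service). The names `geocode_records` and `municipality` are illustrative. Records that cannot be matched are returned separately for manual review rather than guessed.

```python
from typing import Dict, List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so ' Town  A ' matches 'town a'."""
    return " ".join(name.lower().split())


def geocode_records(
    records: List[Dict[str, str]],
    lookup: Dict[str, Coord],  # hypothetical municipality -> coordinates table
    key: str = "municipality",  # hypothetical field name in each crime record
) -> Tuple[List[Dict[str, object]], List[Dict[str, str]]]:
    """Attach 'lat'/'lon' to each record whose location is in the lookup table.

    Returns (geocoded, unmatched) so unmatched rows can be reviewed by hand.
    """
    table = {normalize(k): v for k, v in lookup.items()}
    geocoded: List[Dict[str, object]] = []
    unmatched: List[Dict[str, str]] = []
    for rec in records:
        coord = table.get(normalize(rec.get(key, "")))
        if coord is None:
            unmatched.append(rec)  # no match: keep for manual review
        else:
            lat, lon = coord
            geocoded.append({**rec, "lat": lat, "lon": lon})
    return geocoded, unmatched
```

For example, a record whose municipality is `" town  a "` still matches a table entry `"Town A"`, because both sides are normalized before the lookup.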
Week 6: July 3 - July 7
Now that all the datasets (crime, census, transportation, etc.) are correctly formatted, I can focus on creating the risk analysis model. Using the crime datasets, I will weight each crime by its severity and compute its distance from train stations throughout NJ. Once this is done, I can determine which NJ stations need additional security measures and, drawing on our literature review, explain which types of security measures suit them best. This past week, I spent time coding and experimenting with different models and algorithms to find the best approach to this problem.
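One simple way to combine severity weights with distance to stations is a distance-decayed sum. For a station s and crimes c with severity weight w_c at great-circle distance d(s, c), the score is R(s) = sum of w_c * (1 - d(s, c) / r) over all crimes with d(s, c) <= r. The sketch below is hypothetical: the severity weights, the 1 km radius r, the linear decay, and the function names are illustrative assumptions, not the project's calibrated values. Distances use the haversine formula with a mean Earth radius of 6371 km.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Hypothetical severity weights; a real model would need to calibrate these.
SEVERITY_WEIGHTS: Dict[str, float] = {
    "terrorism": 10.0,
    "gun_violence": 5.0,
    "hate_crime": 3.0,
}


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def station_risk(
    station: Tuple[float, float],
    crimes: List[Tuple[float, float, str]],  # (lat, lon, crime type)
    radius_km: float = 1.0,  # hypothetical radius r
) -> float:
    """Sum severity weights of crimes within radius_km, decaying linearly with distance.

    Crime types missing from SEVERITY_WEIGHTS get weight 0.
    """
    s_lat, s_lon = station
    score = 0.0
    for lat, lon, kind in crimes:
        d = haversine_km(s_lat, s_lon, lat, lon)
        if d <= radius_km:
            score += SEVERITY_WEIGHTS.get(kind, 0.0) * (1 - d / radius_km)
    return score
```

A crime at the station contributes its full weight and one exactly r away contributes nothing; replacing the linear decay with another kernel (for example, an exponential one) would only change the line that updates `score`.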
Week 7: July 10 - July 14
The first step in creating a risk model is clustering the crime datasets in a meaningful way. This week, I did this by applying k-means clustering to the hate crime, gun violence, and terrorism datasets individually, which shows where crimes are concentrated in NJ. The next step is to cluster the merged crime dataset and then group all the crimes by their nearest transit station.
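The clustering step can be illustrated with a plain-Python version of Lloyd's k-means algorithm. This is a sketch rather than the code used in the project (a library implementation such as scikit-learn's `KMeans` would be the usual choice), and the function names and parameters are hypothetical. It clusters raw (latitude, longitude) pairs using squared Euclidean distance, which is only an approximation over a region the size of NJ, since a degree of longitude is shorter than a degree of latitude at that latitude.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance on raw degrees (rough approximation)."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def _assign(points: Sequence[Point], centroids: List[Point]) -> List[int]:
    """Assignment step: index of the closest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: _sq_dist(p, centroids[j])) for p in points]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100, seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means on (lat, lon) pairs. Returns (centroids, labels)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Initialize centroids with k distinct input points chosen at random.
    centroids: List[Point] = [(p[0], p[1]) for p in rng.sample(list(points), k)]
    for _ in range(max_iter):
        labels = _assign(points, centroids)
        # Update step: move each centroid to the mean of its members.
        new_centroids: List[Point] = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                n = len(members)
                new_centroids.append((sum(m[0] for m in members) / n, sum(m[1] for m in members) / n))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, _assign(points, centroids)
```

Grouping crimes by their nearest transit station is essentially the assignment step of this algorithm with the station locations held fixed as centroids, so no update step is needed there.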
Week 8: July 17 - July 21
This week, Michael and I finished some visualizations in Tableau and added them to the website. We met with Dr. Nelson to discuss how to structure our final paper, which presents our Risk Analysis Model for NJ Public Transit. So far, we have cleaned all the data and laid out the structure of the analysis model. We also received feedback from researchers in transportation security, who explained how they built their models and the gaps in their approaches. The final step is to put everything together in our paper and get feedback from our peers.
Week 9: July 24 - July 28
This week, we focused on creating visualizations of the different crime datasets and writing the final sections of our paper. I added the finishing touches to the website, including new graphs, a link to our final paper, and our weekly updates. Michael and I also rehearsed our final presentation, which we will give to the DIMACS faculty next week.
Week 10: July 31 - August 4
Michael and I gave our final presentation on the last day of the program. It covered the problem, our goal, and how we used risk analysis and machine learning models to address it. The presentation went well, and we received a lot of positive feedback from our peers and mentors. Overall, this was a great experience that helped me understand how to approach real-world problems using data analysis and machine learning.
Results: Data Analysis and Visuals
The images below are visualizations of some of the NJ datasets. I created these images using FME, Tableau, and Python.