DIMACS
DIMACS REU 2018

General Information

Student: Ryan Gross
Office: CORE 448
School: Rutgers University
Degree: B.A. Mathematics and Statistics, Minor in Economics
E-mail: ryan.gross@rutgers.edu
Project: Approximate computing: An effective likelihood-free method with statistical guarantees

Project Description

Background
Approximate Bayesian computation (ABC) is a powerful likelihood-free method that has grown increasingly popular since its early applications in population genetics. However, the theoretical justification for Bayesian inference becomes complicated when ABC is used with a non-sufficient summary statistic. In this project, we study a new computational technique called 'approximate confidence distribution computing' (ACC) that provides theoretical support for using non-sufficient summary statistics in likelihood-free methods.
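To make the idea concrete, here is a minimal sketch of plain ABC rejection sampling on a toy problem. It is not the ACC method from [1] and is not code from this project: the normal model, the uniform prior, the tolerance, and the function name `abc_rejection` are all assumptions chosen for illustration. The summary statistic is the sample median, which is not sufficient for a normal mean, so it mirrors the situation described above.

```python
import random
import statistics


def abc_rejection(observed, prior_sampler, simulator, summary, tolerance, n_draws, rng):
    """Keep prior draws whose simulated summary lands within `tolerance` of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)                   # propose a parameter from the prior
        fake = simulator(theta, len(observed), rng)  # simulate data; no likelihood is evaluated
        if abs(summary(fake) - s_obs) <= tolerance:  # compare summaries only
            accepted.append(theta)
    return accepted


rng = random.Random(0)

# Toy "observed" data: 50 draws from Normal(3, 1). The true mean is hypothetical.
observed = [rng.gauss(3.0, 1.0) for _ in range(50)]

accepted = abc_rejection(
    observed,
    prior_sampler=lambda r: r.uniform(-10.0, 10.0),  # assumed flat prior
    simulator=lambda theta, n, r: [r.gauss(theta, 1.0) for _ in range(n)],
    summary=statistics.median,                       # non-sufficient summary
    tolerance=0.1,
    n_draws=10000,
    rng=rng,
)

posterior_mean = statistics.fmean(accepted)
print(len(accepted), "draws accepted; approximate posterior mean:", posterior_mean)
```

The accepted draws approximate a posterior that is conditional on the summary statistic rather than on the full data, which is where the theoretical complications come from when that summary is not sufficient.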
Data
Madison Square Park is an outdoor public park in the heart of midtown New York City traversed by thousands of people every day. Park directors have provided us with a dataset concerning daily entrance/exit counts from 2015-2017. However, the dataset is incomplete; due to financial constraints, only 2 of the 9 entrances at a time are equipped with infrared sensors that measure traffic through the particular entrance. One counter is stationary at a single entrance, and the other rotates among the other 8 entrances on a biweekly basis. Moreover, seasonal patterns, special events, and inconsistencies with the counters introduce further variability into the data.
Goals
Given the incomplete data, our goal is to estimate the total number of people using the park over a given time period. We must find an appropriate method to model the pedestrian traffic data, and then utilize approximate computing techniques (ABC, ACC) to run simulations that produce an accurate estimate.


Weekly Log

Pre-REU:

Prior to the official start of the REU program, I met several times with my mentors, Dr. Minge Xie and Suzanne Thornton. They introduced me to approximate computing, specifically their research on the ACC method. We also looked at the Madison Square Park data and brainstormed ways to approach this project. I started to explore the datasets and improve the layout, ultimately combining everything into a single dataset that would be easy to analyze and run simulations on in the future.

  • Main Accomplishments
    • Developed an understanding of the purpose and goals for this project
    • Cleaned up the park datasets
    • Began learning and practicing with R, the programming language we will mainly use for our statistical analysis and approximate computing methods
  • Goals for week 1
    • Use statistical methods and programming to analyze the dataset
Week 1:

Having been introduced to the problem and the dataset before the start of the REU, I conducted an initial analysis of the data. This included calculating summary statistics, identifying trends and inconsistencies, and comparing time series steps. We will come back to this analysis once we begin the approximate computing stage of the project, as the ACC method relies on prior estimates to improve its computational efficiency. I also began reading papers on approximate computing techniques, focusing on Minge and Suzanne's paper on the ACC method [1]. With this information, I created an initial presentation describing the background and goals of the project, to be presented next week. Finally, I began learning the basics of HTML in order to create this website.

  • Main Accomplishments
    • Ran an initial data analysis on the Madison Square Park entry/exit data
    • Developed better understanding of approximate computing methods
    • Created initial presentation and website
  • Goals for next week
    • Explore potential models on pedestrian traffic flow
    • Build off of the initial data analysis to determine the best summary statistics and time series steps used in the ACC method
Week 2:
This week, we focused on the modeling aspect of the project. I began reading papers on potential models that could accurately describe the pedestrian traffic flow through each entrance to the park. Many models have applications to pedestrian traffic flow, including stochastic transition models and spatial point processes, among others. Once a suitable model is chosen, we will use approximate computing methods to run simulations with the actual data and produce an accurate estimate. I also continued to build on the data analysis from last week, now calculating and comparing the rates of entry/exit through each location. These rates will likely be used as the guiding summary statistics for the ACC method.
  • Main Accomplishments
    • Developed better understanding of pedestrian traffic flow models
    • Analyzed rates of entry/exit to add to the summary statistics
  • Goals for next week
    • Focus in on specific models that could accurately fit our data
    • Build off of the initial data analysis to determine the best summary statistics and time series steps used in the ACC method
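As a rough illustration of the rate calculation described for this week, the sketch below averages daily entry and exit counts per entrance. The record layout, the entrance names, the counts, and the function name `mean_daily_rates` are hypothetical and are not taken from the park dataset; the actual analysis for this project is being done in R.

```python
from collections import defaultdict


def mean_daily_rates(records):
    """Return {entrance: (mean entries per day, mean exits per day)}.

    `records` is an iterable of (entrance, entries, exits) tuples,
    one tuple per day on which that entrance had a counter.
    """
    totals = defaultdict(lambda: [0, 0, 0])  # [entries, exits, days counted]
    for entrance, entries, exits in records:
        t = totals[entrance]
        t[0] += entries
        t[1] += exits
        t[2] += 1
    return {name: (t[0] / t[2], t[1] / t[2]) for name, t in totals.items()}


# Made-up counts for two entrances, for illustration only.
records = [
    ("north", 100, 80),
    ("north", 200, 120),
    ("south", 50, 70),
]
rates = mean_daily_rates(records)
print(rates)
```

Averaging over counted days only matters here because the rotating counter visits each entrance for a limited number of days, so entrances are observed for different lengths of time.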
Week 3:
Having been unable to home in on a specific model for our data, we sought help in understanding pedestrian traffic flow, given that Dr. Xie and Suzanne do not have much experience in this field. Fortunately, Dr. Benedetto Piccoli of Rutgers University-Camden, who has written multiple texts on pedestrian traffic, agreed to meet with us to discuss the problem. After learning about our particular problem, he determined that a multiscale micro-macro model would be appropriate. After our discussion, I sent him a more thorough report detailing our problem and the analysis done so far, and he plans to send us further information about the model and relevant code. At the end of the week, I read further into Dr. Piccoli's text [2], and I am awaiting the next steps of this project.
  • Main Accomplishments
    • Decided on the model to be used for our data
    • Began learning about pedestrian traffic flow through the lens of a multiscale model
  • Goals for next week
    • Receive the model code from Dr. Piccoli and learn how to compile it
    • Fit the model to our particular problem to set it up for running simulations on our data
Week 4:
This was a slow week of research while waiting to receive the code for the multiscale model. In the meantime, I continued to read about multiscale modeling for traffic flow through Dr. Piccoli's paper [3]. I also found out that the code would be written in C, so I began to learn the basics of C and how to compile through the command line. Moreover, I used this time to focus on GRE studying and exploring my outside interest in sports analytics using R. Finally, at the end of the week Dr. Piccoli sent us his code, which I will dive into next week.
  • Main Accomplishments
    • Learned more about multiscale modeling in relation to traffic flow
    • Learned how to compile C code, to be used in running our model
  • Goals for next week
    • Fit the model to our park dimensions and run simulations on our data
Week 5:
After receiving the model code, the first thing I noticed was that all comments and variable names were written in Italian, so I relied heavily on Google Translate while reading through the code. Then, I ran the base model before making any modifications, locating and fixing the errors in the code. Finally, the model produced two images of vector fields that illustrated inflow and outflow of pedestrian traffic within a rectangular grid. With this setup, the next step was to add obstacles to the field to match the walkways and barriers present in Madison Square Park. After doing so, I produced vector fields that seemed to approximate the directional flow of traffic in the park. However, this did not account for the specific rates of inflow and outflow through each entrance/exit, which will be the next step of the process.
  • Main Accomplishments
    • Ran Dr. Piccoli's code of a multiscale model and developed an understanding of how it works
    • Adjusted the code to fit the dimensions of our park
  • Goals for next week
    • Adjust the rates of flow through each entrance/exit to match the rates present in our data
    • Use the model to run simulations and compare to our data
Week 6:
I started this week by readjusting the park model and refining the code. Then, I attempted to run the program that creates a visual simulation of the vector field as it moves through the park. However, this simulation was not working properly, as it seemed unable to read the data files output by the main code. As a result, much of my week was spent troubleshooting this code. Meanwhile, I read Dr. Piccoli's relevant papers in more depth to better understand the background concepts used to construct this model.
  • Main Accomplishments
    • Adjusted model and worked on running the simulation
  • Goals for next week
    • Second presentation (prepare for and present)
    • Fix simulation program
Week 7:
Most of this week was spent preparing for the second presentations. This required me to go through and organize all the progress I have made thus far and condense it into a relatively short presentation. In doing so, I recalled several useful insights from my initial data analysis (from before we moved on to the modeling and simulation), which I can use to further guide my current model and the approximate computing later on. Furthermore, after I emailed Dr. Piccoli to update him on my progress and questions, he sent a missing file that is needed to run the simulations. I didn't have time this week to test it out, but hopefully it will fix the issues. Finally, we gave our second presentations to the full REU over the course of two days. My presentation went well, and I enjoyed learning about everyone else's projects as well.
  • Main Accomplishments
    • Organized my progress to set up the last remaining weeks of the program
    • Prepared for and gave second presentations
  • Goals for next week
    • Fix simulations using updated code
    • Collaborate with mentors on my progress and next steps
Week 8:
This week was spent working with the simulations, which were now working correctly. In particular, when the code runs properly, it outputs a "movie" made up of hundreds of time steps showing objects (pedestrians) moving through a grid (the park). The initial conditions, including the density and location of the objects, can be adjusted, as can the locations and sizes of the obstacles. I also went back to the initial analysis to determine the relative rates of entrance/exit through the entrances. Then, I scaled the sizes of the corresponding entrances to reflect those relative rates, which allows more pedestrians to pass through the more popular gates. This scaling can be adjusted further as the ACC method is used to produce better estimates. I updated Dr. Piccoli on this progress, and we scheduled a meeting with him next week at his office at Rutgers-Camden. Finally, I began working on the final report, as next week is the official last week of the REU program.
  • Main Accomplishments
    • Successfully ran the simulation
    • Adjusted the park model and initial conditions of the simulation to better reflect our initial data
  • Goals for next week
    • Meet with Dr. Piccoli
    • Work on final report
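The entrance scaling described for this week can be sketched as a simple proportional allocation. This is only an illustration of the idea, not the C model code: the function name `scale_entrance_widths`, the gate names, the rates, and the total width in grid cells are all hypothetical.

```python
def scale_entrance_widths(rates, total_width, min_width=1):
    """Split `total_width` grid cells among entrances in proportion to their rates.

    Each entrance gets at least `min_width` cells so that it stays open.
    Because of rounding and that minimum, the widths may not sum exactly
    to `total_width`.
    """
    total_rate = sum(rates.values())
    if total_rate <= 0:
        raise ValueError("rates must sum to a positive number")
    return {
        name: max(min_width, round(total_width * rate / total_rate))
        for name, rate in rates.items()
    }


# Made-up relative rates for two gates sharing 40 grid cells of opening.
widths = scale_entrance_widths({"gate_a": 300, "gate_b": 100}, total_width=40)
print(widths)

# A very quiet gate still keeps the minimum width of one cell.
narrow = scale_entrance_widths({"gate_a": 99, "gate_b": 1}, total_width=10)
print(narrow)
```

Keeping a minimum width is a design choice for the sketch: a gate with almost no recorded traffic would otherwise round down to zero cells and be closed off in the simulation.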
Week 9:
This week, Dr. Xie, Suzanne, and I traveled to Rutgers-Camden to meet with Dr. Piccoli in person for the first time. We informed him of our progress and received his insight on the traffic flow portion of the project. Together, we outlined the next steps of the project, which include diving deeper into the ABC and ACC methods to produce better estimates from the data. This was also the final week of the REU program, so I wrapped up my remaining responsibilities, including working on the final report. However, we plan to continue working on this particular project through the rest of the summer and into the fall semester. Overall, the REU program was a great experience, and I learned a lot about data analysis, statistical research methods, and the nature of academic research as a profession.
  • Main Accomplishments
    • Met with the whole team, reviewed progress, and planned next steps
    • Worked on the final report
  • Goals for future
    • Continue updating park model and approximate computing methods
    • Put together research paper and submit for publication
Post-REU:

References


Presentations


Additional Information