Nigel Seymour DIMACS REU 2024

DIMACS REU 2024

General Information

Student:	Nigel Seymour
Office:	440 CoRE Building
School:	University of Maryland, Baltimore County
E-mail:	nigels1@umbc.edu
Mentor:	Anand Sarwate

Project Description

Differential Privacy and Visualization:
Differential privacy is a mathematical framework for quantifying the privacy risks of computations on sensitive data about individuals. It is a statistical/probabilistic notion of privacy that can be used to generate privacy-protecting summaries of data. The US Census Bureau used differential privacy to publish results from the 2020 decennial census, and many government agencies are trying to learn how to use differential privacy in their own work. In many cases, private data is summarized through visualizations and descriptive statistics, yet much of the work on differential privacy has focused on machine learning and data publishing. The student on this project will survey existing work on differential privacy and private visualization, choose a type of visualization, and evaluate the methods on data sets. The goal is to implement different data visualization techniques and see how privacy affects our qualitative (visual) understanding of the structure of data.
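To make the idea of a privacy-protecting summary concrete, here is a minimal sketch of a differentially private histogram built with the Laplace mechanism, a standard building block from the differential privacy literature. It is an illustration under stated assumptions, not the method used in this project. The function names, the fixed bin list, and the toy data are all hypothetical. Python's `random` module is used for readability rather than security.

```python
import random

# Illustrative sketch only: hypothetical helper names, not a production DP library.

def laplace_noise(scale, rng):
    # The difference of two independent Exponential(rate = 1/scale) draws
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_histogram(values, bins, epsilon, rng=None):
    """Return epsilon-differentially private counts for a fixed list of bins.

    Adding or removing one person's record changes at most one count by 1,
    so the L1 sensitivity is 1 and Laplace noise with scale 1/epsilon suffices.
    The bins must be chosen without looking at the data.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()  # not cryptographically secure; fine for a demo
    counts = {b: 0 for b in bins}
    for v in values:
        if v in counts:  # records outside the chosen bins are ignored
            counts[v] += 1
    return {b: c + laplace_noise(1 / epsilon, rng) for b, c in counts.items()}

# Hypothetical toy data: one categorical attribute per person.
data = ["A", "B", "B", "C", "C", "C"]
for eps in (0.1, 1.0, 10.0):
    noisy = private_histogram(data, bins=["A", "B", "C"], epsilon=eps, rng=random.Random(0))
    print(eps, {b: round(c, 2) for b, c in noisy.items()})
```

Smaller values of epsilon give stronger privacy but noisier counts. This is exactly the trade-off that shows up visually when the noisy counts are drawn as bars. Noisy counts can come out negative. Clipping them at 0 before plotting is post-processing and does not weaken the privacy guarantee.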

Weekly Log

Week 1:

On my first day I moved into the apartment, met other DIMACS peers, and played some games with them. The following day I met with my PI, and we discussed the project and our goals for the summer. The first goal is to learn different ways to measure correlation and then implement and test them on data. Before I could start on that, I had to read up on differential privacy, and I also read a few chapters of a probability textbook. Over the weekend I worked on my presentation for the following week. I also explored the campus by scooter :)

Week 2:

In my second week I finally met with my mentor in person rather than on Zoom. This week I learned about different types of correlation, including the “Big Three”: Pearson’s correlation coefficient, Spearman’s, and Kendall’s. Pearson’s correlation is the most common way of measuring linear correlation. The coefficient is a number between -1 and 1 that measures the direction and strength of the linear relationship between two variables. If the coefficient is between 0 and 1, the correlation is positive: as one variable increases, the other tends to increase as well. If the coefficient is between -1 and 0, the variables are negatively correlated (anticorrelated): as one increases, the other tends to decrease. If the Pearson correlation coefficient is 0, there is no linear relationship between the variables.

Pearson's coefficient: $r = \frac{\sum_i (x_i - m_x)(y_i - m_y)}{\sqrt{\sum_i (x_i - m_x)^2 \sum_i (y_i - m_y)^2}}$, where $m_x$ and $m_y$ are the means of the $x_i$ and the $y_i$.

Now for the fun stuff. On Saturday my mentor invited me to go hiking with him and some of the people in his lab. I loved the sights, and we got great pictures out of it. The next day, a few other people from DIMACS and I took the train to Brooklyn, where we crossed the famous bridge and went to the flea market. I definitely want to visit New York again, and I hope I get the chance.

Week 3:

This week, I attended several lectures in the "Current Trends in Mathematics" series held by DIMACS. The talks were highly engaging and covered topics ranging from the relative complexity of mathematical problems to Lagrangians and Markov numbers. On the research side, my mentor and I decided it was time to implement the correlations I had been studying. He mentioned having a dataset of Home Mortgage Disclosure Act (HMDA) data, so I decided to use it to investigate whether there is a correlation between mortgage approvals and race. To do this, I had to download specific application records and learn how to manage this large dataset.

Week 4:

This week in JupyterLab, I made significant progress analyzing Census data. I focused on extracting key demographics such as age, marital status, education, sex, wages, and race. I also created tables and graphs to show how the data is distributed across these factors. Next, I looked more closely at the relationship between wages and age. Using my own implementation of the Pearson correlation coefficient, I found a very weak negative correlation of -0.006, which means there is essentially no linear relationship. I double-checked this value against the Pearson correlation in the pandas library. Finally, I used Matplotlib to create a scatter plot comparing the wage distributions of males and females. Overall, this week's work gave me a better understanding of the Census data and strengthened my data analysis skills.

Week 5:

This week, I tackled a new challenge in analyzing the sensitivity of Pearson correlation. I explored how removing a single data point (agent) affects the correlation coefficient. This involved manual calculations and valuable guidance from my mentor's graduate student.

Week 6:

This week, I kept working on the same problem but didn’t quite solve it. On the bright side, I learned a lot about differential privacy from my mentor. We also hosted students from the NYC Discrete Math REU and shared food, research, and our overall experiences.

Week 7:

This week, the highlight was visiting Nokia Bell Labs. We attended talks and explored the famous Anechoic Chamber, once the quietest room in the world. Although it's now equipped with more technology, it still holds that unique allure. Omar also hosted a graduate student panel, offering insights into applications and grad life. On the research front, I concentrated on data visualization and began planning my final presentation for next week. Over the weekend, I headed to NYC with some friends from the REU and met up with others from a local program. We enjoyed great food, explored a museum, and had an all-around good time.

Week 8:

This week, I dedicated time to perfecting my presentation for the REU. I'm grateful to my labmates for organizing a mock session, which really helped calm my nerves and gave me valuable feedback on communicating my research more effectively. The presentations on Wednesday and Thursday were impressive and showcased everyone's hard work. Afterward, we celebrated with lunch, especially since some colleagues were heading to Prague. It was so nice meeting all the people from Charles University; the summer would not have been the same without them. Over the weekend, I finally fulfilled a summer goal: attending a Yankees game. Despite a tough 9-1 loss to the Tampa Bay Rays, I had a fantastic time. BUT I got a hat and an Aaron Judge jersey, so the day was not completely lost.

Week 9:

This week marked my final days at the DIMACS REU. I had an in-person meeting with my mentor to discuss finalizing my report, summarizing the work I did over the summer. Afterward, my labmates and I enjoyed some friendly competition in multiple games of ping pong, celebrating our time together. I want to express my heartfelt gratitude to all of them for their invaluable assistance during this research experience. I hope to continue collaborating with them for the rest of the summer. The other REU participants and I went on a final walk, which was a lovely way to reflect on our journey. I am deeply thankful to everyone at DIMACS for making my first research experience so incredible. A special thank you to my mentor, who managed to meet with me regularly despite a busy travel schedule. This has truly been an amazing experience, and I wish I could stay longer. Alas, it’s time to return to Maryland. Goodbye, DIMACS.

Acknowledgments

I would like to express my deepest gratitude to Dr. Sarwate for his invaluable guidance and support throughout my research. My thanks also go to the DIMACS REU program for providing a platform to conduct and develop my research skills. Additionally, I am grateful for the financial support provided by the National Science Foundation under grant number CNS-2150186, which made this research possible. Thank you all for your contributions to my academic and professional growth.