DIMACS
DIMACS REU 2017

General Information

me
Student: Akilesh Tangella
Office: CoRE Building Room 448
School: University of Pennsylvania
E-mail: akilesh@seas.upenn.edu
Project: How to Provide Distributed Differential Privacy?

Project Description

Differential Privacy on Graphs, Recommendations, Distributed Differential Privacy


Weekly Log

Week 1/2:
Going through some parts of differentially privacy book by Aaron Roth and Cynthia Dwork, reading paper on differential privacy on graphs by Adam Sealfon, reading paper on frequency estimation in histograms with differential privacy by Adam Smith and Raef Bassily. Working on creating notes on these papers, along with more in-depth notes on the local differential privacy model and recommendation systems. Watched a talk about matrix completion and its relation to recommendation systems by Ankur Moitra (but have to do more background reading on this topic). Met with my mentor Dr. Muthukrishnan (Muthu). Met with Dr. Rebecca Wright, the director of DIMACS who will also be helping to mentor me (and is a PI on the grant supporting the project) and also in contact with Dr. Paul Kantor who was the initial PI on the grant that is suporting me. Dr. Kantor sent me a poster about previous work relating to privacy and recommendation systems.
Week 3:
This week I have continued working on creating notes for the papers assigned to me and have also discovered other useful papers: a paper on applying differential privacy to the Netflix problem by McSherry and Mironov and a paper discussing the limits of differential privacy on social networks (graphs, where concept of differential privacy is edge differential privacy) by Korlova et al. I have also continued to update my notes and also have read notes from a Stanford machine learning class (CS229) because recommendation algorithms, at their core, are a machine learning problem.
Week 4: After doing some more research I have come up with two potential problems that I will try approaching. I am not sure if these are good problems to look at so I emailed my mentor for more advice. Here they are:1) The ubiquitous Netflix problem can be treated as being a matrix completion problem, where the input is a sparsely populated USERS-RATING matrix and there is a guarantee that this matrix has some structure (i.e. it has low rank). There is work done by Hardt and Roth applying differential privacy to topics related to matrix completion, but sometimes there are not just two dimensions (USERS and RATING), but there may be more, such as in the Groupon problem where there is an extra dimension of TIME. In this case, the problem changes from matrix completion to tensor completion. Recently, there has been work in the field of tensor completion using the sum-of-squares hiearchy (which I am not currently familiar with) -- one such paper is by Moitra and Barak (https://arxiv.org/pdf/1501.06521.pdf). However, I have not seen tensor completion applied to differential privacy so this could be one area to explore. 2) For the recommendation problem on graphs, in for instance, the social network setting, there is a nice work by Korlova et al. (http://theory.stanford.edu/~korolova/Personalized_social_recommendations_accurate_or_private.pdf), in which vertices represent people and edges represent the "links" or "friendships" between them, and recommendations are made not based on a person's individual features, but on solely on their friends. In the paper by Korlova et al. friendships (or edges) are sensitive and knowledge of their existence or lack thereof needs to be protected -- therefore, neighboring databases in the context of differential privacy are graphs that differ in one edge. One reason behind this formulation is that if you have only friend and this is known to an adversary, then for instance, if we are recommending products, since your recommendations are exactly what your friend buys, when an adversary sees what is recommended to you, your friend's purchase history is compromised. However, many times on social media nowadays, one can publicly view who is friends with who, and furthermore, it is unlikely that a person has an extremely low number of friends. Therefore, maybe another way to view privacy on a graph is have the topology of the graph (in other words, who is friends with who) be public (quite similar to the paper by Adam Sealfon), and their be a edge weight function on the graph that needs to be protected from adversaries representing how much a person trusts another person's recommendations -- these weights are then used to come up with better recommendations by weighting the purchases of people who the person trusts more. There is one difference, that will make this setting more cumbersome than the Sealfon paper, which is that the weight function is asymmetric since person A may trust person B more than person B trusts person A.
Week 5/6: I am lumping weeks 5 and 6 together since week 6 has been cut short by Independence day. In week 5, we visited IBM Research Labs in Yorktown Heights -- it was pretty cool, especially since we got to see the facilities where they are making quantum computers. I also learned about Yao's Minimax principle -- which is a method for proving lower bounds in randomized algorithms (for fun...). In terms of research I have begin done further work on the problems listed in week 4 and watched this talk by Moritz Hardt: https://simons.berkeley.edu/talks/moritz-hardt-2013-12-14

Presentations


Additional Information