General Information
Student: 
Zexi Song 
Office: 
CoRE 537 
School: 
Bard College 
Email: 
zs0811 (at) bard (dot) edu 
Project: 
Generating Private Synthetic Data 
Mentor: 
Dr. Anand Sarwate 
Project Description
We are gathering more and more digital information about people. As this data becomes more and more detailed, it’s easier and easier to “deanonymize” or reidentify individuals from their data alone. However, for most applications all we want is populationlevel statistics.
To ensure individual privacy while at the same time utilize the richness of the data, our approach is to publish “synthetic” data: we learn patterns and correlations in the population and then generate fake individuals/records so that the population statistics are similar. The trick is to learn the patterns and correlations in a privacypreserving way. The privacy model we will use is differential privacy, a new privacy framework that is rapidly gaining popularity.
The hope is to understand how much privacy is possible through this approach as well as to produce a toolkit that can be used by others to explore synthetic data generation. The work is a blend of probability, randomized algorithms, with some programming/simulation thrown in for good measure.
Weekly Log
 Week 1:

 Met with Dr. Sarwate and the group.
 Explored literature on differential privacy, probabilistic graphical models and the CLThres algorithm.
 Gave the first presentation.
 Week 2:

 Continued literature review from Week 1.
 Worked on the implementation of ChowLiu algorithm in Python.
 Studied the connection between sensitivity and differential privacy.
 Week 3:

 Implemented the simulation of ChowLiu algorithm based on doubly symmetric binary source.
 Extended the scope of the code to process generic discrete random variables.
 Worked on the sensitivity of mutual information.
 Week 4:

 Continued working on the sensitivity of mutual information.
 Selected and preprocessed data from UCI Machine Learning Repository.
 Week 5:

 Proved the sensitivity of the entropy function.
 Used the sensitivity of the entropy function to give a loose upper bound for the sensitivity of the plugin estimate of the mutual information function.
 Worked towards a better upper bound.
 Week 6:

 Proved a tight upper bound for the sensitivity of the plugin estimate of the mutual information function.
 Implemented the simulation of differentially private ChowLiu algorithm in Python, based on the planted tree model.
 Prepared for the second presentation
 Week 7:

 Evaluated our algorithm on the Adult dataset and recovered a robust tree structure.
 Gave the second presentation.
 Week 8:

 Worked on the extension of differentially private graphical models to continuous data.
 Gave the sensitivities of sample mean, sample variance and sample covariance of continuous data.
 Week 9:

 Worked on the sensitivity of continuous mutual information function based on the results from Week 8.
 Wrote up the final report.
Presentations