DIMACS REU 2022

Lakshay Patel

Math and Philosophy student at UC Berkeley

lakshay[at]berkeley[dot]edu

Project Title: Locality-Sensitive Hashing in Hamming Space

Week 1

On Monday I gave an introductory presentation on dimensionality reduction in Hamming space to my REU cohort (link below). I met with my mentor Karthik and his graduate student Surya to discuss dimensionality reduction for euclidian space and whether Euclidian results can be extended to Hamming space. This week I:

  1. Spent time understanding the basics of coding theory with an emphasis on properties and examples of linear codes in Hamming space
  2. Started writing up a proof of Johnson and Lindenstrauss' lemma for Euclidian space and a distorted version of this lemma for Hamming space.

Week 2

On Tuesday I met with Karthik and Surya to discuss upper and lower bounds for dimensionality reduction in Hamming space. No one has found any lower or upper bound yet, so we will focus on measuring the distortion of a particular dimensionality reduction technique using perfect codes. This week I:

  1. Finished a write-up of the Johnson-Lindenstrauss lemma and the random subsampling version for Hamming space
  2. Started writing a matlab script to understand the distortion of distances via Hamming decoding

Week 3

On Monday I met with Karthik and Surya to discuss ideal properties the Hammimg code should have in order for it to be useful in dimensionality reduction. We also discussed the Binary Golay Code. This week I:

  1. Finished the matlab script I started writing last week. After running the script to see certain distributions of mappings, I realized that I could calculate these by hand. This allowed me to quickly address (2):
  2. Compared the Binary Golay Code to the 7-4-3 Hamming Code

Week 4

On monday I met with Karthilk and Suriya to discuss what our dimensionality reduction theorem could look like and different ways of quantifying the distortion of a map. This week I:

  1. Tried to formulate the theorem for dimesionalty reduction in hamming space
  2. Looked at guarantees on distortion of reduction via random subsampling

Week 5

Our hamming code approach seems comparable to reduction via random subsampling. On Tuesday I met with Karthik and Suriya to discuss the problems with random subsampling and whether there are reduction schemes that preserve small distances between strings. This week I focused on finding maps that preserve small distances. Although the hamming code approach didn't work, it led me to a different approach that I think will work well (pending some calculations).

Week 6

This week I met with Karthik and Surya a couple times to talk about what the hamming code approach does, if not dimensionality reduction. I have a more precise statement for a dimensionality reduction algorithm that preserves point-sets with very small distances. We thought the hamming code procedure could be used for clustering but it doesn't seem to work very well.

Week 7

This week I met with Karthik to discuss the final presentation and final paper. Since our clustering idea didn't end up working, I spent most of my time preparing the final presentation that I gave on Thursday.

Week 8

This was the last week of our REU. On Monday Surya helped me finalize some proofs for my paper and I spent the rest of the week finishing my write-up.

Acknowledgements

I thank my mentor Dr. Karthik Srikanta and his graduate student Surya Teja Gavva. My Work is supported by NSF grant CCF-1852215 and the 2022 DIMACS REU program.