Ashwin Padaki, DIMACS REU 2023

DIMACS REU 2023

General Information

Student:	Ashwin Padaki
Office:	417
School:	Columbia University
Email:	apadaki22 (at) gmail (dot) com
Project:	Hardness of Approximation for Clustering Objectives [arXiv]
Partners:	Kyrylo Karlov, Styopa Zharkov

Project Description

Clustering is one of the most prevalent problems in computer science, with applications ranging from machine learning to bioinformatics. For many objective functions, finding an optimal clustering is NP-hard, which motivates the study of the problem's hardness of approximation. There are many gaps between known hardness results and existing approximation algorithms, an example of which is minimum diameter clustering in the L₂ metric. We aim to close gaps of this nature, thus advancing our understanding of clustering.

Weekly Log

Week 1 (5/31-6/4): We went over some foundational results in the hardness of clustering, particularly those concerned with the diameter objective function. I learned about basic reductions from vertex cover and graph coloring to approximate clustering. I was also introduced to existing approximation algorithms, the most well-known of which is Gonzalez's 2-approximation algorithm. After getting up to speed, I spent the week helping compile a document of known hardness results and proofs, brainstorming potential approaches to closing some hardness gaps, and preparing for our initial presentation.
Week 2 (6/5-6/11): We organized our understanding of the different parameters involved in clustering. The two main parameters are the number of clusters (either a fixed constant or potentially unbounded) and the L_p metric space of choice. In the setting of an unbounded number of clusters, hardness within a factor of 2 is known for both the L₁ and L_∞ metrics, making it a tight bound due to Gonzalez's Algorithm. Meanwhile, only hardness within a factor of 2 · cos(10°) ≈ 1.97 is known for the L₂ metric. After unsuccessfully trying to tighten this L₂ gap with a modified construction, we decided to move on and consider the setting of a constant number of clusters. A simple reduction from graph coloring shows hardness within a factor of sqrt(5/4) ≈ 1.12, and we were unable to find a way to improve this bound.
Week 3 (6/12-6/18): We began this week realizing that our struggle to push the loose L₂ lower bound was a sign that perhaps its upper bound was less rigid than we had thought. We achieved our first nontrivial result of the program, an improved √3-approximation algorithm for L₂ clustering with a constant number of clusters. By the end of the week, we were able to extend this algorithm to essentially achieve an approximation ratio of √2. Simultaneously, we began to consider ideas in coding theory to establish hardness bounds for the L₁ metric in the setting of a constant number of clusters.
Week 4 (6/19-6/25): This week, we established our first nontrivial result on the side of hardness of approximation, specifically hardness within a factor of 5/4 for a constant number of clusters in L₁. We did this by exhibiting a particular hard instance of 3-coloring that can be embedded nicely with a Hadamard-like code. We considered ways to improve this bound even further, but decided that we would likely need a different hard instance of 3-coloring to reduce from. On the algorithmic side of things, we were able to work out an exact clustering algorithm when the number of dimensions of our pointset, in addition to the number of clusters, is constant (in any L_p space).
Week 5 (6/26-7/2): We spent the start of this week committed to finding the appropriate hard instance of graph coloring for proving hardness of clustering in L₁. Each of our ideas failed, but we slowly began to suspect the existence of a structural barrier to our techniques. We observed that certain graphs, including paths of seven vertices, cannot be naively embedded in L₁. Since it is actually polynomially easy to 3-color graphs that do not contain these particular induced subgraphs, we concluded that our current line of techniques could not give us hardness within a factor of 2. By reframing this notion of naive embeddability as a linear program, we were able to computationally determine an exact threshold ratio of 5/3, beyond which proving hardness would necessitate a new approach.
Week 6 (7/3-7/9): This week, we made significant advances in our hardness results. In L₂, we constructed a nice geometric object on the surface of a sphere that gave us naturally embeddable hard instances of 3-coloring; we used it to obtain hardness of essentially sqrt(1 + 1/√2) ≈ 1.306. Meanwhile in L₁, we devised a simple embeddability gadget to obtain hardness within a factor of 4/3. By employing a more complicated gadget related to the Chvátal graph (my favorite graph of the summer), we were able to further improve this factor to 3/2. Despite all this progress, we still considered these results minor optimizations, as our techniques were still subject to the 5/3 barrier.
Week 7 (7/10-7/16): This week, we tried to think of ways to achieve hardness within a factor of 2 in L₁, which would also imply hardness within a factor of √2 in L₂, thus solving both of our problems in a single stroke. Even stronger than our condition for clearing the 5/3 barrier, we realized that our hard instance of 3-coloring would need unbounded odd girth. After observing an even more restrictive condition, we realized that the gulf between our current techniques and the ones needed to close the hardness gap was wider than we had anticipated. In the interest of time, we decided to pause thinking about new ideas and begin working on our final report and presentation.
Week 8 (7/17-7/23): The bulk of this week was spent crafting and preparing for our final presentation. After presenting, we got started on our final project writeup. On Friday, I flew to Prague as part of the exchange program with Charles University.
Week 9 (7/24-7/30): We spent this week completing our final writeup. Organizing all of our results and explaining them neatly turned out to be quite valuable. In particular, we realized that the proof of hardness in L₁ was much more ad hoc than the proof of hardness in L₂, perhaps a sign that we need to start from a more general framework in the former case. Not much research was done this week, as in addition to the writeup we attended seminars at Charles University in Ramsey Theory, discrete geometry and algorithmic game theory. My favorite talk of the week involved extremal properties of visibility problems.
Week 10 (7/31-8/4): This was the final week of the program, and it coincided with a combinatorics workshop at Charles University. Much of the content seemed to be geared towards experts in particular subfields of combinatorics, but it was definitely nice to learn about the various active areas of combinatorics research. We spent our free time this week doing a ton of sightseeing. Most of our tourism was inside of Prague, but my favorite excursion was actually a hike outside of the city, to Koněprusy Caves.

Presentations

Additional Information

Mentors: Karthik C.S. , Henry Fleischmann
Acknowledgements: This work was carried out while I was a participant in the 2023 DIMACS REU program at Rutgers University, supported by NSF grant CNS-2150186.