Benjamín Benčík, DIMACS REU 2024

General Information

Student:	Benjamín Benčík
Office:	CoRE 444
School:	Faculty of Mathematics and Physics, Charles University
Contact:	bencikben@gmail.com
Project:	Memory-query tradeoffs for property testing
Mentor:	Sumegha Garg

I am part of a group of students from Charles University that includes Todor Antic, Adam Dzavoronok, Jelena Glisic, Robert Jaworski, Tymur Kotkov, Sofia Kotsiubynska, Julia Krizanova, Volodymyr Kuznietsov, Tymofii Reizin and Jakub Sosovicka, Filip Uradnik. Patrik Zavoral.

About me

I study Computer Science and I am interested in cryptography combinatorics and complexity theory. For more info check out awesome personal page

Project Description

Property testing is a natural notion of approximation for decision problems, where given a property (or decision), the task is to distinguish whether a given instance has this property or is "far" from any instance having the property. Such tasks are widely used in various areas of TCS, such as coding theory, hardness of approximation and graph algorithms. In this project, we will focus on property testing on graphs, and explore the impact of memory constraints on the complexity of property testing. One of main measures of complexity for a property testing algorithm is its query complexity, which is the maximum number of input elements queried to determine whether the graph satisfies a given property.

Research Log

Week 1

Before the start of the project I have read Property testing by Dana Ron, which offers a comprehensive summary of the field. In the first week project I have looked at the results discovered in the field and tried to estimate memory complexity for the existing property testing algorithms.

bipartitness[1]: \(\widetilde{\mathcal{O}}(\epsilon^{-2})\)

k-colorability[1]: \(poly(\widetilde{\mathcal{O}}(k^4 / \epsilon^{6}))\)

p-clique[1]: \( \widetilde{\mathcal{O}}(p / \epsilon^{2}) \)

connectivity[2]: \( \mathcal{O}\left(\frac{d}{M/N \cdot \epsilon}\right) \)

eulerian[2]: \( \widetilde{\mathcal{O}}\left(\frac{d}{M/N \cdot \epsilon/2}\right) \)

The notation with ~ hide logarithmic factors. The memory complexity is given in terms of distance parameter \( \epsilon \), which is the maximum distance from the property.

We have not yet decided on the exact objective of the project, but next week I will decide which of the properties might make the most sense to study in terms of the memory complexity. The project will have mainly exploratory basis and we I focus on constructing lower bounds rather than designing a specific algorithm.

Week 2

Most of the properties I have looked at had trivial lower bounds for memory complexity. We have decided to study algorithms that decide if a graph has: triangle, \(k\)-clique or \(k\)-star. So far the bounds that I have achieved to show are only trivial.

Moreover I have read additional resources on property testing and techniques used in the field. I have looked at first three lectures of Information theory in computer science and the streaming model for graphs described in Data Streaming Algorithms.

Week 3

I have looked for known tester of the selected properties. My main focus was on the triangle-freeness property tester utilizing the Szemerédi’s Regularity Lemma. I have tried to come up with indistinguishable distributions for the lower bound on testing star-freeness but did not show anything better than the trivial bounds. With my mentor we have made the problem more specific and I started working testers for the star-freeness property on more specific graph families.

Week 4

This week my main focus was on the star-freeness property. I have tried to construct 2 testers where one optimized the query complexity and the other memory complexity. The aim is to be able to show some trade-off and prove lower bound for query complexity \( \times \) memory complexity. To be more specific I have tried to distinguish the following families:

random graph with \(n\) vertices and upper bound on degree \(3/4 n\)
the same as the previous family but one vertex is connected to the whole graph

I was able to find algorith with complexities \( \mathcal{O}(n^2), \mathcal{O}(\log{n}) \) and a second one with complexities \( \mathcal{O}(\log{n}), \mathcal{O}(n\log{n}) \) and prove their validity. However, my mentor eventually came up with an algorithm that achieved the optimum in both cases \( \mathcal{O}(n), \mathcal{O}(\log{n}) \) meaning that there was no tradeoff. I provided thorough analysis and proves in a technial writeup.

Week 5

I have studied a problem that tries to distinguish random graphs with degree bounds \(1/4n\) and \(3/4n\). In the design of various testers the gap between "memory optimal" and "query optimal" aproaches was small. That being said I did not find the memory size to make a huge trade-off for the property testing problems that I have studied. Usually the memory effiecient algorithm was counting and could get aways with constant memory. For this the section on Aproximating average degree from [3] was very helpful. On other hand I was struggling to get a significantly more efficient for large memory.\ I have looked at other approaches, I found out that graph adjacency matrix can be subjected to spectral analysis which offers some insight about the graph however the query complexity for approximating eigenvectors is expensive as well.

I have tried to consider different models. My mentor suggested looking at a semi-randomness model where tester can query a vertex \(v\) and recieves a random edge connect to \(v\). In this setting I have tried to distinguish two families:

Random graph where each edge is present with probability 1/2
Partition the vertex set into two random sets of size n/2. Both these sets have n/2 sized cliques.

Week 6

I came up with algorithm for the graph families that I have constructed the previous week but it mainly relied on the fact that the cliques were isolated. This instance of the problem is relatively easy and it can be computed in \(\sqrt{n}\) instances. We have tried to design a harder problem a specified graph families where the problem is to distinguish clique of size \(n/4\). Observation to this problem is that sampling into a random sets is probabily not going to make a valid tester since, it is not propbable that we will obtain an edge between those sets.

Week 7

We spent the week mostly by wrapping our notes up, and by preparing the final presentation.

After REU Logs

28.10.2024

During the rest of the summer I was studying for the final graduation test then I was moving to Technical University of Munich. Currectly we are working on \(n/3\)-clique tester for the semi-randomness model. Querying \(v\) gives another vertex \(u\) and a bit indicating if there exists an edge \((u, v)\). We managed to come up with algorithm for the property tester and this week I have been writing the analysis for this algorithm. We constructed stronger algorithm that can have access to the whole neighbourood and then showed that the quantity computed in our algorithm is a correct approximation of the idead quantity. We have also shown that the quantity is a good approximation of the number of triangles in the graph. The algorithm is based on the idea of counting triangles in the graph and then using the fact that the number of triangles is a good approximation of the number of \(n/3\)-cliques.

08.04.2025

Long time no see. The project is not more collaborative as Eric Van Grinsven has joined the team. Since the last time we needed to change the we are working with, which made the analysis for our algorithm more complicated. However we are getting closer to proving that the algorithm making \(\mathcal{O}(\sqrt{n})\) queries is possible. As the project is getting more complex maybe logging the progress is not such a bad idea after all. I will be more frequent:)

22.04.2025

Finally we are getting back to the paper on Streaming Lower Bounds for Approximating MAX-CUT, which might help us the establish the lower bounds for our problem. We have read parts of the paper with and tried to understand their techique although I still lack intuition for the analysis of boolean functions. I want to read also parts of Fourier analysis for probabilistic communication complexity. Tomorrow we will have a meeting where we will discuss the application of this technique to our problem.

📝 Streaming Max Cut Section 5 Proof Technique

Resources

Property Testing and Its Connection to Learning and Approximation 98 - Oded Goldreich, Shafi Goldwasser, Dana Ron
Property Testing in Bounded Degree Graphs 97 - Oded Goldreich, Dana Ron
Introduction to Property Testing 2017 - Oded Goldreich

Acknowledgements

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 823748.