About Me

Email: lgknopp@uvm.edu
Office: CoRE 417
Institution: University of Vermont
Mentor: Prof. Sumegha Garg
I am a Computer Science and Mathematics student at the University of Vermont who loves complexity theory, probability, machine learning, unsupervised learning, neural networks, and AI ethics. When I'm not poring over textbooks and papers or furiously writing on whiteboards, you can find me organizing student-centered AI ethics talks at UVM, cooking falafel or massaman curry, or volunteering for Girls Who Code in the Burlington, VT area.
Project Description
Title: Property Testing Algorithms Under Space Constraints Applied to PAC Learning
This summer, I am working on proving memory-sample lower bounds for testing realizability in PAC learning frameworks using streaming and property testing techniques. In particular, I'm investigating whether general testers must obey a tradeoff of the form \( m \cdot s = \Omega(\mathrm{VCdim}(H)^2) \).
Weekly Research Updates
Week 1: May 27 - May 30
I started reviewing topics in computational learning theory and probability theory that I hadn't been exposed to yet. Professor Garg and I met for the first time, discussed realizability in computational learning theory, and talked through what it means for a distribution of labeled examples to be realizable by a hypothesis class. I started writing my project presentation for next week, and am doing an in-depth review of computational learning theory this week using textbooks and lectures from Cambridge and Oxford. I also built this website!
Week 2: June 2 - June 6
I delivered my presentation on realizability testers, giving the other REU participants some background in learning theory as well as my task for the remainder of the summer. I started to explore proofs of memory-sample lower bounds using reductions to problems in information theory. I also worked on proving that the number of samples sufficient to test realizability with confidence \(1 - \delta\) matches the bound in Corollary 3.2 of Understanding Machine Learning: From Theory to Algorithms.
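As a quick sanity check on that bound, here is a small sketch (my own illustration, not code from the project) that evaluates the finite-class realizable sample complexity \( m \ge \frac{1}{\varepsilon}\ln\frac{|H|}{\delta} \) for some example parameter values:

```python
import math

def sample_upper_bound(h_size, epsilon, delta):
    """Sufficient sample size for a finite hypothesis class H in the
    realizable PAC setting: m >= ceil( ln(|H| / delta) / epsilon ).
    This equals (1/eps) * (ln|H| + ln(1/delta)), rounded up."""
    return math.ceil(math.log(h_size / delta) / epsilon)

# Illustrative numbers (assumed, not from the write-up):
# |H| = 2**10 binary functions, epsilon = 0.1, delta = 0.05
print(sample_upper_bound(2**10, 0.1, 0.05))  # 100 samples suffice
```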
Week 3: June 9 - June 13
This week, I wrote a proof establishing that realizability testers have an upper bound on sample complexity of \( m = \frac{1}{\varepsilon} \left( \ln |H| + \ln \frac{1}{\delta} \right) \). In addition, I started reading papers on information-theoretic techniques for proving lower bounds. I have practiced these techniques by writing multiple problem reductions to INDEX, a canonical problem in information theory used to prove lower bounds on the samples/memory of other, more complex problems. We believe that realizability (as a problem) can be reduced to INDEX.
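For readers unfamiliar with INDEX, here is a toy illustration (my own sketch, not part of the reductions above): Alice holds an n-bit string \(x\), Bob holds an index \(i\), and only Alice may send a message. The trivial protocol below forwards all n bits, which is optimal up to constants, since one-way INDEX is known to require \( \Omega(n) \) bits of communication:

```python
# One-way INDEX: Alice speaks once, then Bob must output x[i].
def alice_message(x: str) -> str:
    # Trivial protocol: Alice forwards her entire n-bit input.
    return x

def bob_output(message: str, i: int) -> int:
    # Bob reads off the i-th bit of Alice's message.
    return int(message[i])

x, i = "10110", 3
print(bob_output(alice_message(x), i))  # recovers the bit x[3] = 1
```

Reductions to INDEX typically show that a low-memory streaming algorithm for the target problem would yield a too-short one-way message for INDEX, contradicting this \( \Omega(n) \) bound.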
Week 4: June 16 - June 19
I started to conjecture about low-sample, low-space realizability testers. My belief is that explicit label testing (checking for samples with matching x values but mismatching y labels) is the most sample-efficient algorithm for testing realizability, with the hypothesis class being the set of binary functions on some finite input set. I believe the lower bound on sample complexity here is \( \Omega(\frac{\delta}{\varepsilon}) \). I'll spend the remainder of this week proving this lower bound, and then think about how the samples and storage used by the explicit label testing algorithm lead to a tight memory-sample lower bound for realizability testing.
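The explicit label testing idea above can be sketched in a few lines (an illustrative implementation under my own naming, not the project's code): scan the samples, remember the label seen for each x, and reject as soon as two samples share an x but disagree on y.

```python
def explicit_label_test(samples):
    """samples: iterable of (x, y) pairs with y in {0, 1}.
    Returns True if no contradictory labels are observed (consistent
    with realizability by some binary function on the input set),
    False as soon as a contradiction appears.  Storing the seen
    x -> y map uses memory linear in the number of distinct x values."""
    seen = {}
    for x, y in samples:
        if x in seen and seen[x] != y:
            return False  # matching x, mismatching y: not realizable
        seen[x] = y
    return True

print(explicit_label_test([(1, 0), (2, 1), (1, 0)]))  # True
print(explicit_label_test([(1, 0), (2, 1), (1, 1)]))  # False
```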
Funding and Acknowledgements
Thank you to Prof. Sumegha Garg, Dr. Lazaros Gallos, and Lawrence Frolov for their mentorship and support throughout the program. Thank you also to Charles University in Prague and the DIMACS REU as a whole for their funding and accommodations. This work was supported by NSF grant CCF-2447342.