2022 DIMACS REU Projects

DIMACS is very proud of the achievements of the 2022 DIMACS REU students during their summer research experience. This page includes the abstracts of the final papers that the students submitted to the program, summarizing their summer research.




Enver Aman (Rutgers University-New Brunswick NJ)
Markoff Surfaces and Strong Approximation
Mentor: Alex Kontorovich, Mathematics

Abstract: The Markoff equation is the equation $x^2 + y^2 + z^2 = 3xyz$; we are interested in the solutions of this equation over $\mathbb{Z}$ or $\mathbb{F}_p$, for a prime $p$. The Strong Approximation Conjecture states that, for each prime $p$ and each nonzero solution $(x_1, x_2, x_3)$ to the Markoff equation modulo $p$, there is a corresponding solution $(x_1', x_2', x_3')$ to the same equation over $\mathbb{Z}$ such that $x_j \equiv x_j' \pmod{p}$. We worked on the properties of these solutions.
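For readers who want to experiment, here is a minimal illustrative sketch (ours, not the project's code): integral Markoff triples grow out of the fundamental solution $(1,1,1)$ under the Vieta involutions such as $x \mapsto 3yz - x$, and solutions modulo $p$ can be enumerated by brute force.

```python
# Illustrative sketch (ours, not the project's code): Markoff triples over Z
# via Vieta involutions, plus brute-force enumeration of solutions mod p.

def vieta_neighbors(t):
    """Each move replaces one coordinate, e.g. x -> 3yz - x; this preserves
    x^2 + y^2 + z^2 = 3xyz because x and 3yz - x are the two roots of the
    equation viewed as a quadratic in x."""
    x, y, z = t
    return [(3*y*z - x, y, z), (x, 3*x*z - y, z), (x, y, 3*x*y - z)]

def markoff_triples(bound):
    """Search from the fundamental solution (1,1,1), keeping entries <= bound."""
    seen, stack = {(1, 1, 1)}, [(1, 1, 1)]
    while stack:
        for s in vieta_neighbors(stack.pop()):
            s = tuple(sorted(s))
            if s not in seen and max(s) <= bound:
                seen.add(s)
                stack.append(s)
    return sorted(seen)

def solutions_mod_p(p):
    """All nonzero solutions of x^2 + y^2 + z^2 = 3xyz over F_p (brute force)."""
    return [(x, y, z) for x in range(p) for y in range(p) for z in range(p)
            if (x, y, z) != (0, 0, 0) and (x*x + y*y + z*z - 3*x*y*z) % p == 0]

print(markoff_triples(100))      # (1, 1, 1), (1, 1, 2), (1, 2, 5), (1, 5, 13), ...
print(len(solutions_mod_p(7)))   # strong approximation predicts each one lifts to Z
```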




Jan Bronec (Charles University (Prague, Czech Republic))
Semi-external approach for fixed point decomposition
Mentor: James Abello, Computer Science

Abstract: Decomposition of a graph into subgraphs that are invariant with respect to degree peeling, i.e., fixed points, is a useful method for community detection in networks and for visualisation of massive graphs. However, the analysis of massive graphs runs into memory size limitations. Existing algorithms for fixed point decomposition cannot be used when the whole graph does not fit into memory, since they would incur excessive I/O time. We present two methods for computing these decompositions, and we also propose a different decomposition that is not based on fixed points but can be computed more efficiently.
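As background, the sketch below shows ordinary in-memory degree peeling (core decomposition), the primitive underlying fixed point decomposition. The project's contribution is the semi-external setting, where the edge list lives on disk; this toy assumes the whole graph fits in memory.

```python
# Minimal in-memory sketch of iterative degree peeling (core decomposition);
# the semi-external methods in the paper address graphs too large for this.
from collections import defaultdict

def peel_values(adj):
    """Peel value of each vertex: repeatedly remove a minimum-degree vertex,
    recording the running maximum of the degrees seen at removal time."""
    deg = {v: len(ns) for v, ns in adj.items()}
    alive, value, k = set(adj), {}, 0
    while alive:
        v = min(alive, key=deg.get)      # naive O(n) pick; a bucket queue is O(m)
        k = max(k, deg[v])
        value[v] = k
        alive.remove(v)
        for u in adj[v]:
            if u in alive:
                deg[u] -= 1
    return value

adj = defaultdict(set)
for a, b in [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)]:   # a triangle with a tail
    adj[a].add(b); adj[b].add(a)
print(peel_values(adj))   # triangle vertices get 2, the tail vertices get 1
```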




Michael Bsales (University of Notre Dame IN) & Nithya Nalluri (The College of New Jersey NJ)
Data-Driven Security Measurements to Improve Safety in NYC and NJ Mass Transit
Mentor: Christie Nelson, Masters of Business and Science in Analytics and CCICADA

Abstract: Public transit in America has suffered from attacks and crimes in recent years, since transit systems are highly vulnerable to terrorist and mass-casualty attacks. These vulnerabilities stem from the lack of strict screening and content policing, unlike security at airports. Although current public transit is designed so that passengers can travel quickly and efficiently, there is no strong security system in place. Metro station security check systems (SCS) face great scrutiny from transit authorities and the public due to their required throughput, their risk factors, and the demand for safety. Modern SCS achieve safety through many layers of active and passive security checks. In order to implement strong security around public transit in America, it is important to understand the different types of transit stations, how they are operated, and the types of passengers that use them. This paper aims to develop an understanding of the current state of security check systems as applicable to high-traffic subway stations, and of what must change in the near future to establish effective security check systems in metro systems. We worked toward a proof-of-concept risk analysis model that uses crime data and other publicly available data on the NYC and NJ regions to predict which transit stations are more at risk than others. With these predictions, it would be up to local governments and stations to implement appropriate security measures.




Mihir Dhanakshirur (Indian Institute of Science)
Extension of the FKG Inequality
Mentor: Siddhartha Sahi, Mathematics

Abstract: The 1971 Fortuin-Kasteleyn-Ginibre (FKG) inequality for two monotone functions on a distributive lattice is well studied and has been shown to have many applications in statistical mechanics and other fields of mathematics. In 2008, Sahi conjectured an extended version of this inequality, called $(E_n)$, for all $n > 2$ monotone functions on a distributive lattice. We considered a special version, namely $(F_n)$, and examined its properties on the space of $n$ monotone Boolean functions. We showed that $(F_n)$ does not satisfy the quasi-concavity property by constructing a counterexample. We proved that $(F_3)$, over one-dimensional monotone Boolean functions, does not possess non-zero minima, and we examined a method to prove a more general version.
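For reference, the classical inequality being extended reads as follows (standard statement, not taken from the paper):

```latex
% The classical FKG inequality (1971): if \mu is a log-supermodular weight on
% a finite distributive lattice L, i.e.
%     \mu(x \vee y)\,\mu(x \wedge y) \ge \mu(x)\,\mu(y)  for all x, y in L,
% then any two monotone functions f, g : L \to \mathbb{R} are positively
% correlated:
\[
  \Bigl(\sum_{x \in L} f(x)\,g(x)\,\mu(x)\Bigr)
  \Bigl(\sum_{x \in L} \mu(x)\Bigr)
  \;\ge\;
  \Bigl(\sum_{x \in L} f(x)\,\mu(x)\Bigr)
  \Bigl(\sum_{x \in L} g(x)\,\mu(x)\Bigr).
\]
```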




Henry Fleischmann (University of Michigan-Ann Arbor MI)
On Inapproximability of Steiner Tree Computation in Hamming and Rectilinear Metrics
Mentor: Karthik Srikanta, Computer Science

Abstract: We consider the Steiner tree problem in Hamming space. While this problem is known to be APX-hard, no explicit hardness of approximation factor was known. We show that the Hamming Steiner tree problem is NP-hard to approximate within a factor of 1.004. In showing this, we derive a structural correspondence between vertex covers of 4-regular graphs and optimal Hamming Steiner trees of particular point configurations in Hamming space. As a corollary of our results, we show that the Rectilinear Steiner tree problem is NP-hard to approximate within a factor of 1.004. We also show analogous results for discrete variants of the Steiner tree problem: the Hamming and Rectilinear Discrete Steiner tree problems are NP-hard to approximate within a factor of 1.004.




Guillermo Gaboa (Charles University (Prague, Czech Republic))
Rate-1 Non-malleable Codes for Polysize Tampering
Mentor: Marshall Ball, Department of Computer Science, Courant Institute of Mathematical Sciences, NYU

Abstract: In this project, we attempted to construct a rate-1 compiler for non-malleable codes with respect to tampering by functions computable by circuits of polynomial size. We extend the ideas presented by Ball, Dachman-Soled and Loss, who presented such a code with rate 0. Our compiler does not preserve the non-malleability properties of the code, and we present an attack that demonstrates this. Finally, we present some possible improvements in this direction.




Leah Ghazali (University of Richmond VA)
A reanalysis of the US Census Bureau's Disclosure Avoidance System
Mentor: Ruobin Gong, Statistics

Abstract: Differential privacy is a promising mechanism for privacy protection, providing a way to calculate the maximum privacy risk incurred when an individual's data is included in a study. The US Census Bureau recently incorporated differential privacy to protect census data using the Disclosure Avoidance System (DAS). The DAS essentially consists of two steps: noise injection and post-processing. Researchers have analyzed the DAS to examine its effectiveness and found numerous biases, which they credited specifically to the post-processing step. However, the US Census Bureau did not release the census data following the noise injection but before the post-processing, so there was no way to differentiate the effects of each step. Through the production of noisy data and replication of research by Kenny et al. (2021, Science Advances), we confirmed that the observed biases were due to post-processing. More importantly, we found that the analyses we replicated presented misleading results. When analyzing the error of the DAS, the researchers used a fitted error, which they calculated using a generalized additive model (GAM). The model's predictor variables included the quantities plotted on the x-axes of their figures, essentially creating the illusion of stronger trends. We found that using the fitted model resulted in deceptive figures that exaggerate the effect of the biases. By replicating their study, we found that the biases are still present, though less intense. We hope that our findings will provide a more accurate depiction of the biases of the DAS.
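The direction of the post-processing bias is easy to reproduce in miniature. The sketch below (our illustration with assumed parameters, not the Bureau's DAS) injects mean-zero noise into a vector of true zero counts and then rounds and clamps to nonnegative integers, a crude stand-in for post-processing:

```python
# Miniature illustration (our assumptions, not the Bureau's DAS): noise
# injection alone is unbiased; the clamping stand-in for post-processing is
# what biases small counts upward.
import numpy as np

rng = np.random.default_rng(0)
true_counts = np.zeros(100_000)                       # many blocks with true count 0

after_noise = true_counts + rng.laplace(0.0, 2.0, size=true_counts.shape)
after_post = np.clip(np.round(after_noise), 0, None)  # simplistic post-processing

print(round(after_noise.mean(), 3))   # ~ 0.0: unbiased
print(round(after_post.mean(), 3))    # > 0: upward bias introduced downstream
```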




Jacob Gray (University of Massachusetts-Amherst MA) & Pengxiang Wang (University of Michigan-Ann Arbor MI) & Saachi Mutreja (University of California-Berkeley CA)
Equivalence between Restricted Non-interactive Statistical Zero-knowledge Classes
Mentor: Eric Allender, Computer Science

Abstract: This project primarily focused on the class NISZK_L, introduced in a 2020 paper by Eric Allender and REU students. Little was known about this class beyond the results in that paper and results from REU students the following summer, mainly comprising two complete problems, hardness results relating to MKTP, and equivalence to a weaker subclass of NISZK, NISZK_AC^0. This summer, we expanded these results, primarily showing equivalences between NISZK_AC^0, NISZK_L, and NISZK_PM, among an array of other related results.




Emma Hasson (Bard College at Simon's Rock MA)
Formalizing Elementary Analytic Number Theory in Lean
Mentor: Alex Kontorovich, Mathematics

Abstract: We formalized essential theorems in elementary analytic number theory in the Lean 3 theorem prover, with the eventual goal of contributing the work to Mathlib, the already vast library of formalized mathematics in Lean. We focused on Dirichlet's Hyperbola Method as applied to the divisor function, beginning with a mathematical description of the theorem.
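For context, the classical statement behind the hyperbola method applied to the divisor function is the following (standard form; not quoted from the paper):

```latex
% Writing d(n) for the number of divisors of n and gamma for Euler's
% constant, Dirichlet's hyperbola method gives
\[
  \sum_{n \le x} d(n)
  \;=\; \sum_{ab \le x} 1
  \;=\; 2 \sum_{a \le \sqrt{x}} \Bigl\lfloor \frac{x}{a} \Bigr\rfloor
        \;-\; \lfloor \sqrt{x} \rfloor^{2}
  \;=\; x \log x + (2\gamma - 1)\,x + O(\sqrt{x}).
\]
```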




Sycamore Herlihy (Worcester Polytechnic Institute MA)
Tumor evolutionary model incorporating spatial heterogeneity
Mentor: Subhajyoti De, Rutgers Cancer Institute of New Jersey

Abstract: Intratumoral heterogeneity (ITH) is a feature of the majority of human cancers. The capability to observe the spatiotemporal clonal dynamics in a patient's tumor is severely limited, raising the need for a mathematical model to further the study of ITH. We present STEDISH, a stochastic mathematical model of tumor growth incorporating spatiotemporal heterogeneity. STEDISH makes it possible to examine the patterns of spatial ITH under different models of tumor evolution and other constraints.




Samuel Hiken (Carleton College MN)
Approximation Algorithms for Token Swapping on Graphs
Mentor: Nicole Wein, DIMACS

Abstract: Consider the following problem: we are given a graph with $n$ vertices and $n$ distinct tokens. Given two arrangements of the tokens on the vertices, we wish to know the length of the shortest possible sequence of "edge-swaps" needed to get from one arrangement to the other. Here, an edge-swap refers to the act of swapping tokens that lie on adjacent vertices. This problem, known as token swapping, is APX-hard, and a constant-factor approximation has been known since 2016. This summer, I studied the approximability of token swapping on general graphs.
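To make the problem concrete, the following brute-force sketch (exponential, for intuition only; not an approximation algorithm) computes the exact token swapping distance on a small graph by BFS over token arrangements:

```python
# Brute-force token swapping distance by BFS over arrangements.
from collections import deque

def token_swapping_distance(edges, start, target):
    """start/target are tuples: position i holds token start[i]/target[i]."""
    dist, queue = {start: 0}, deque([start])
    while queue:
        arr = queue.popleft()
        if arr == target:
            return dist[arr]
        for u, v in edges:                 # one move = one edge-swap
            nxt = list(arr)
            nxt[u], nxt[v] = nxt[v], nxt[u]
            nxt = tuple(nxt)
            if nxt not in dist:
                dist[nxt] = dist[arr] + 1
                queue.append(nxt)
    return None                            # unreachable on a disconnected graph

path = [(0, 1), (1, 2), (2, 3)]            # the path P4
print(token_swapping_distance(path, (3, 2, 1, 0), (0, 1, 2, 3)))   # 6 swaps
```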




Iris Horng (University of Pennsylvania PA)
Topological Analysis of Connectome Data
Mentor: Jie Gao, Computer Science

Abstract: Topological Data Analysis is a promising approach for analyzing brain connectome data due to its ability to abstract underlying geometric structures in complex networks. This study investigates a suitable filtration for generating meaningful barcodes that represent the functional brain network of humans. A curvature-based filtration was also used to explore the neural network of C. elegans. Analysis of the barcodes established a pattern among brain networks. Based on these visualizations, we determined that differences in connectivity can be observed among various connectomes.
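A minimal version of such a pipeline is sketched below with the ripser.py package and synthetic data; the filtration and the data here are placeholders, not the study's actual connectome processing.

```python
# Hedged sketch: barcodes from a correlation-based dissimilarity matrix.
import numpy as np
from ripser import ripser

rng = np.random.default_rng(1)
activity = rng.standard_normal((90, 200))    # fake: 90 regions x 200 time points

corr = np.corrcoef(activity)                 # region-by-region correlation
dist = 1.0 - corr                            # correlation -> dissimilarity
np.fill_diagonal(dist, 0.0)

dgms = ripser(dist, maxdim=1, distance_matrix=True)["dgms"]
print("H0 bars:", len(dgms[0]), "H1 bars:", len(dgms[1]))
```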




Ishaan Ivaturi (Rutgers University-New Brunswick NJ)
Generating Meta-DAGs
Mentor: James Abello, Computer Science

Abstract: Graph Cities are a new framework for visualizing, exploring, and interpreting graphs with on the order of a billion edges. Graphs of this size are simply too large to be meaningfully processed by humans, from the macro scale down to the actual semantics. Graph Cities break such "huge" graphs into metaphorical buildings and meta-nodes that can be explored individually. In this manner, large graphs can be explored at different levels of detail to extract patterns at both the macro and micro scales. The focus of this paper is on speeding up the computation of the meta-graphs within each building, allowing a Graph City to be explored interactively rather than needing to be precomputed.




Liron Karpati (University of Maryland-College Park MD)
Curvature Motifs in Connectome Data: Pitfalls with Path Motifs
Mentor: Jie Gao, Computer Science

Abstract: Nervous systems are organized for efficient integration of information. It has been shown that, in neuronal connectomes of the C. elegans worm, high-degree nodes are highly interconnected, forming what is called a "rich-club" structure. This rich-club structure allows different neurons to reach one another in relatively few edge hops. The rich-club organization is a global architectural feature of the C. elegans connectome; how the local organization of the connectome supports integration has yet to be explored. By using the notion of coarse Ricci curvature (a generalization of Ricci curvature) and a path motif analysis, we found that the C. elegans connectome exhibits local weak-bridge structures. More importantly, our analysis highlights some critical pitfalls in hypothesis testing of graph properties. These pitfalls motivate the need for a principled investigation into statistical methods for determining the significance of graph properties.
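The sketch below illustrates one textbook variant of coarse (Ollivier) Ricci curvature on an unweighted graph, $\kappa(x,y) = 1 - W_1(m_x, m_y)/d(x,y)$ with $m_v$ uniform on the neighbors of $v$; conventions (e.g., laziness of the walk) vary, and the project's may differ.

```python
# Coarse (Ollivier) Ricci curvature of one edge via a small optimal-transport LP.
import networkx as nx
import numpy as np
from scipy.optimize import linprog

def ollivier_ricci(G, x, y):
    src, dst = list(G[x]), list(G[y])
    d = dict(nx.all_pairs_shortest_path_length(G))
    cost = np.array([[d[a][b] for b in dst] for a in src], dtype=float)
    m, n = cost.shape
    A_eq, b_eq = [], []
    for i in range(m):                       # row sums: each source ships 1/m
        row = np.zeros(m * n); row[i*n:(i+1)*n] = 1.0
        A_eq.append(row); b_eq.append(1.0 / m)
    for j in range(n):                       # column sums: each sink receives 1/n
        col = np.zeros(m * n); col[j::n] = 1.0
        A_eq.append(col); b_eq.append(1.0 / n)
    res = linprog(cost.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq))
    return 1.0 - res.fun / d[x][y]           # res.fun is the W1 distance

print(ollivier_ricci(nx.cycle_graph(6), 0, 1))   # ~0.0: long cycles are flat
```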




Vikram Kher (University of Southern California CA) & George Li (University of Maryland-College Park MD)
Exploring Fine-Grained Buy-Many Mechanisms
Mentor: Ariel Schvartzman, DIMACS

Abstract: Multi-item revenue-optimal mechanisms are known to be extremely complex, often offering buyers randomized lotteries of goods. In the standard buy-one model, it is known that optimal mechanisms can yield revenue infinitely higher than that of any "simple" mechanism, even for the case of just two items and a single buyer. We build off of the manuscript of Assadi and Schvartzman, who first defined the notion of buy-$k$ mechanisms, which smoothly interpolate between the classical buy-one mechanisms and the recently studied buy-many mechanisms. Buy-$k$ mechanisms allow the buyer to buy up to $k$ menu options. Our main result is that the revenue gap with respect to bundling, an extremely simple mechanism, is bounded by $O(2^n \cdot n^2)$ for any arbitrarily correlated distribution $D$ over $n$ items for buyers with arbitrary monotone valuations. This generalizes the known $O(n^2)$ bound for additive valuations. In addition, we conjecture that there exists a distribution $D$ over two items such that $\text{Buy}(k)\text{Rev}(D) > \text{Buy}(k+1)\text{Rev}(D)$. We make partial progress towards proving this conjecture by providing a candidate distribution $D$ and a candidate optimal mechanism $M_k$ that we believe witnesses such a gap.




Gaurav Kucheriya (Charles University (Prague, Czech Republic))
Visibility graphs of polygons
Mentor: James Abello, Computer Science

Abstract: The visibility graph recognition problem asks whether, for a given graph G, there is a polygon P having G as its visibility graph. The problem is not known to be in NP. Here we show that persistent graphs having a block of consecutive blocking vertices can be realized as polygons. We conjecture that this technique can be extended to realize a larger subclass of persistent graphs.




Zachary Lihn (Columbia University in the City of New York NY)
Realizing a Fake Projective Plane as a Degree 25 Surface in P5
Mentor: Lev Borisov, Mathematics

Abstract: Fake projective planes are smooth complex surfaces of general type with Betti numbers equal to those of the usual projective plane. Recent explicit constructions of fake projective planes embed them via their bicanonical embedding in $\mathbb{P}^9$. In this paper, we study Keum's fake projective plane (a = 7, p = 2, {7}, D327) and construct an embedding of it in $\mathbb{P}^5$. We also simplify the 84 cubic equations defining the fake projective plane in $\mathbb{P}^9$.




Max Lind (Princeton University NJ)
L-functions and the Langlands Program
Mentor: Alex Kontorovich, Mathematics

Abstract: Our goal was to introduce L-functions and explain their relevance to the Langlands conjectures. Along the way, we defined Godement-Jacquet L-functions, which generalize all other known L-functions.




Shivesh Mehrotra (Yale University CT)
Iteratively Estimating Error for the Least Absolute Shrinkage and Selection Operator
Mentor: Pierre Bellec, Statistics

Abstract: We studied how to estimate out-of-sample and generalization error for Least Absolute Shrinkage and Selection Operator (LASSO) at each iteration of the Iterative Shrinkage-Thresholding Algorithm (ISTA). Our goal was to derive novel estimators for the aforementioned quantities as well as study their derivative structures. Due to the iterative nature of the problem, there was a natural connection to multitask learning. Thus the approach developed is analogous to an approach used to estimate these quantities in multitask learning. The results were confirmed with initial simulations.
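For background, here is a minimal ISTA sketch for the LASSO objective $\frac{1}{2}\|y - Xb\|_2^2 + \lambda\|b\|_1$; the project's estimators attach to each iterate produced by a loop shaped like this one (the code itself is illustrative, not the project's).

```python
# Minimal ISTA for the LASSO: gradient step followed by soft-thresholding.
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(X, y, lam, n_iter=200):
    L = np.linalg.norm(X, 2) ** 2            # Lipschitz constant of the gradient
    b = np.zeros(X.shape[1])
    iterates = []
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)
        b = soft_threshold(b - grad / L, lam / L)
        iterates.append(b.copy())             # error is estimated at each iterate
    return iterates

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50))
beta = np.zeros(50); beta[:5] = 1.0           # sparse ground truth
y = X @ beta + 0.1 * rng.standard_normal(100)
print(ista(X, y, lam=5.0)[-1][:8].round(2))   # first 5 coordinates recovered
```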




Jachym Mierva (Charles University (Prague, Czech Republic)) & David Sychrovsky (Charles University (Prague, Czech Republic))
Understanding non-monotonicity for two independent items
Mentor: Ariel Schvartzman, DIMACS

Abstract: The sale of independent items is an often-studied problem. Myerson's lemma provides an efficient way to carry out an optimal sale of a single item. However, in a multi-item setting, the results can be non-intuitive. It has been shown that, in some cases, buyers having higher valuations for the items sold can lead to lower revenue for the seller. This non-monotonicity is not yet understood. In this paper, we provide conditions that guarantee monotonicity and an algorithm to generate non-monotone examples. We show that such examples are rare, even though the revenue can change significantly in some cases.
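A brute-force search like the following (our illustration, not the paper's algorithm; restricting candidate prices to support values and their sums is a heuristic) is one way to hunt for non-monotone examples with an additive buyer over a finite type distribution:

```python
# Revenue of the best deterministic menu {p1, p2, bundle pb} by brute force.
import itertools

def revenue(types, prices):
    """types: list of ((v1, v2), prob); the buyer picks the utility-best
    option, with ties broken in favor of the higher payment."""
    p1, p2, pb = prices
    total = 0.0
    for (v1, v2), q in types:
        options = [(0.0, 0.0),
                   (v1 - p1, p1), (v2 - p2, p2),
                   (v1 + v2 - p1 - p2, p1 + p2),   # both items separately
                   (v1 + v2 - pb, pb)]             # the bundle
        _, pay = max(options)
        total += q * pay
    return total

def best_menu(types):
    grid = sorted({v for (v1, v2), _ in types for v in (v1, v2, v1 + v2)})
    return max((revenue(types, p), p) for p in itertools.product(grid, repeat=3))

D = [((1, 1), 0.25), ((1, 2), 0.25), ((2, 1), 0.25), ((2, 2), 0.25)]
print(best_menu(D))   # best revenue and prices on this tiny example
```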




David Miksanik (Charles University (Prague, Czech Republic)) & Jan Soukup (Charles University (Prague, Czech Republic))
Non-manipulable tournament rules
Mentor: Ariel Schvartzman, DIMACS

Abstract: We consider the manipulability of tournament rules on $n$ teams, where a rule selects (possibly randomly) a single winner based on the results of all $\binom{n}{2}$ matches. A tournament rule is said to be $k$-SNM-$\alpha$ if no $k$ teams can increase their joint probability of winning by more than $\alpha$ by fixing the $\binom{k}{2}$ matches between them; hence, $k$-SNM-$\alpha$ is a measure of the manipulability of a tournament rule. Previous works show that among all Condorcet-consistent (i.e., undefeated teams always win with probability 1) and monotone (i.e., no team can increase its probability of winning by throwing matches) rules, the best we can hope for are $k$-SNM-$\frac{k-1}{2k-1}$ rules, and they give several examples of rules achieving this bound for $k=2$. In the case of $k=3$, only one rule achieving non-trivial non-manipulability was known, specifically $3$-SNM-$\frac{31}{60}$. Our main result is the existence of a new rule that is Condorcet-consistent, monotone, $2$-SNM-$\frac{1}{3}$, and $3$-SNM-$\frac{1}{2}$. Additionally, the analysis of this rule is tight and uses a new technique that may lead to similar rules achieving even stronger non-manipulability. Our second result shows that rules for tournaments on $n$ teams satisfying a stronger version of Condorcet-consistency can be extended to rules for any number of teams with only slightly worse manipulability. Finally, for every $d \geq 3$, we generalize the random single binary elimination bracket rule to a random single $d$-ary elimination bracket (RS$d$EB) rule (the complete binary tree is replaced by the complete $d$-ary tree). We show that, for every $k \geq 3$, the rule RS$k$EB is Condorcet-consistent, monotone, and $k$-SNM-$\alpha$ for some $\alpha < 1$ depending on $k$.
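The flavor of the bracket rules involved can be seen in the sketch below; note that how a $d$-ary node resolves its $d$ entrants is our assumption here (sequential winner-stays), which may differ from the RS$d$EB rule analyzed in the paper.

```python
# Sketch: seed teams uniformly at random into a d-ary single elimination
# bracket (with byes) and play it out with the given deterministic results.
import math, random

def rsdeb_winner(beats, n, d=2):
    """beats[i][j] == True iff team i beats team j."""
    rounds = max(1, math.ceil(math.log(n, d)))
    teams = list(range(n)) + [None] * (d ** rounds - n)   # None = bye
    random.shuffle(teams)
    while len(teams) > 1:
        nxt = []
        for g in range(0, len(teams), d):
            group = [t for t in teams[g:g + d] if t is not None]
            if not group:
                nxt.append(None)
                continue
            w = group[0]
            for t in group[1:]:          # OUR assumption: winner-stays in a node
                w = w if beats[w][t] else t
            nxt.append(w)
        teams = nxt
    return teams[0]

# A 3-cycle: 0 beats 1, 1 beats 2, 2 beats 0; by symmetry each wins ~1/3.
beats = [[False, True, False], [False, False, True], [True, False, False]]
random.seed(7)
wins = [0, 0, 0]
for _ in range(30_000):
    wins[rsdeb_winner(beats, 3)] += 1
print([round(w / 30_000, 3) for w in wins])
```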




Archana Mohandas (Massachusetts Institute of Technology MA)
Proving Descartes' Theorem in Lean
Mentor: Alex Kontorovich, Mathematics

Abstract: Developing a proof of the Soddy-Gossett Theorem that uses inversive geometry is a useful step towards advancing the Lean library and allowing more complex theorems to be proven in Lean. In this project, we produced a concise proof of the Soddy-Gossett Theorem using inversive geometry and began implementing the proof in Lean. Thus far, we have defined important concepts needed in the proof, such as the co-radius, inversive coordinates, the inner product, and the inner product matrix. However, the more complex preliminary lemmas and the main statement of the Soddy-Gossett Theorem are still in the process of being proved. A future goal after the conclusion of this project would be to complete the proof of the Soddy-Gossett Theorem in Lean and eventually extend the ideas of this proof to other theorems whose proofs can be simplified using inversive geometry.
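For reference, the statements in question are, in their standard forms, Descartes' Circle Theorem for four mutually tangent circles with curvatures $k_1, \dots, k_4$, and the Soddy-Gossett generalization to $n+2$ mutually tangent spheres in $\mathbb{R}^n$:

```latex
\[
  (k_1 + k_2 + k_3 + k_4)^2 \;=\; 2\,(k_1^2 + k_2^2 + k_3^2 + k_4^2),
  \qquad
  \Bigl(\sum_{i=1}^{n+2} k_i\Bigr)^{2} \;=\; n \sum_{i=1}^{n+2} k_i^{2}.
\]
```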




Derek Montanez (Texas State University TX)
Exploring Trade-offs Between Compression and Accuracy in Neural Networks
Mentor: Waheed Bajwa, Electrical and Computer Engineering

Abstract: Machine learning models, specifically neural networks, are at the heart of many important everyday tasks such as image classification and natural language processing. Advances in today's neural networks have led to very impressive results on a variety of tasks, but with these results comes the disadvantage of very large models, with parameter counts in the range of hundreds of millions to billions. There is much ongoing research on compressing large neural network models so that they perform as well as the original model or even better. A current goal is to compress deep neural networks so that they can be deployed and used on edge devices, where large models cannot be deployed due to memory and computational constraints. We explore the trade-offs between compression and accuracy in neural networks using the compression method of 'Pruning Filters for Efficient ConvNets'. Neural network compression methods reduce the memory footprint and computational cost of neural networks; they include, but are not limited to, pruning, quantization, and knowledge distillation. Our tests were conducted on the Fashion-MNIST and CIFAR10 image classification data sets, using the SCNNB convolutional neural network model.
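The criterion of 'Pruning Filters for Efficient ConvNets' ranks each convolutional filter by its L1 norm and removes the weakest. The sketch below shows that step in PyTorch for a single layer (illustrative only; rebuilding the downstream layers to accept fewer input channels is omitted):

```python
# Hedged sketch of L1-norm filter pruning for one Conv2d layer.
import torch
import torch.nn as nn

def prune_filters(conv: nn.Conv2d, keep_ratio: float) -> nn.Conv2d:
    w = conv.weight.data                           # shape (out, in, kH, kW)
    n_keep = max(1, int(keep_ratio * w.size(0)))
    scores = w.abs().sum(dim=(1, 2, 3))            # L1 norm of each filter
    keep = torch.topk(scores, n_keep).indices.sort().values
    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    pruned.weight.data = w[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()
    return pruned

conv = nn.Conv2d(3, 64, 3, padding=1)
print(prune_filters(conv, keep_ratio=0.5))   # Conv2d(3, 32, ...): half the filters
```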




Saachi Mutreja (University of California-Berkeley CA)
Hardness Amplification of One-Way Functions
Mentor: Periklis Papakonstantinou, Management Science and Information Systems

Abstract: In this project, I mainly explored how the hardness of one-way functions is preserved. In particular, given a circuit C that computes a one-way function of a certain hardness, what can one say about the existence of circuits that compute harder one-way functions? We investigated a possible approach to proving lower bounds on the hardness of one-way functions whose existence is implied by the existence of the circuit C.




Kabir Narayanan (Brown University RI) & Abigail Perryman (The University of Texas at Austin TX)
A Bohmian Analysis of Energy and Momentum in 1-D Scattering of Relativistic Particles
Mentor: Shadi Tahvildar-Zadeh, Mathematics

Abstract: In this paper, we lay the groundwork for a formal justification of the plane wave assumptions underlying Arthur Compton's scattering calculation. Using a recent relativistic formulation of Bohmian mechanics, we start by examining the behavior of a single free photon and a single free electron. We offer significant evidence that the electron wave function becomes asymptotically plane wave-like, meaning the particle's momentum and energy approach fixed values as assumed in Compton's calculations. We examine the effects the wave function's initial parameters have on the behavior of particle trajectories, and we explore different ways to visualize trajectories and this asymptotic behavior prior to interaction.




Lakshay Patel (University of California-Berkeley CA)
Dimensionality Reduction Using Error-Correcting Codes
Mentor: Karthik Srikanta, Computer Science

Abstract: The Johnson-Lindenstrauss lemma shows that for any set $S$ of $n$ points in $\mathbb{R}^d$ and any $\varepsilon > 0$, there is a map into $O(\log(n)\varepsilon^{-2})$-dimensional Euclidean space that distorts the pairwise $\ell_2$ distances of $S$ by a multiplicative factor of at most $(1+\varepsilon)$. This result is known to be optimal for $\ell_2$, whereas our understanding of dimensionality reduction in Hamming space is far from complete. In this project, we looked for dimensionality reduction methods in Hamming space, with a focus on an approach using the 7-4-3 Hamming code.
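As background for the coding-theoretic approach, the sketch below writes down one standard generator matrix of the 7-4-3 Hamming code and verifies its parameters by brute force (our illustration, not the project's construction):

```python
# The [7,4,3] Hamming code: 16 codewords, minimum pairwise distance 3.
import itertools
import numpy as np

G = np.array([[1, 0, 0, 0, 1, 1, 0],   # systematic generator [I_4 | A]
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])

codewords = [tuple(np.array(m) @ G % 2)
             for m in itertools.product([0, 1], repeat=4)]
min_dist = min(sum(a != b for a, b in zip(c, e))
               for c, e in itertools.combinations(codewords, 2))
print(len(codewords), min_dist)   # 16 3
```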




Travis Pence (University of South Carolina-Columbia SC)
Sequence-Dependent Geometric Modeling of Fluctuations of DNA
Mentor: Wilma Olson, Chemistry and Chemical Biology

Abstract: DNA minicircles 336 base pairs in length have recently been used to inject genes into mice. The step-parameter model describes a DNA strand's configuration in space and is implemented in the software emDNA, which currently finds the minimal energy state for an input sequence. This project extends emDNA with the emDNA-move tool, allowing for thermal fluctuation of constrained linear or circular sequences of DNA. The tool was developed in C++ and allows customization of the degree of random movement and the fluctuation length. This project also compares the deformability of each unique tetramer and the coupling of step parameters.
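A toy version of such thermal-fluctuation sampling (our illustration, not emDNA's algorithm; the parameter values are placeholders) is a Metropolis walk on a single twist-like step parameter under a harmonic elastic energy; emDNA-move works with the full set of coupled step parameters.

```python
# Metropolis sampling of one step parameter under E(x) = (k/2)(x - x0)^2.
import math, random

def metropolis(x0=34.3, k=0.05, step=1.0, n=100_000, beta=1.0, seed=0):
    rng = random.Random(seed)
    x, e, samples = x0, 0.0, []
    for _ in range(n):
        prop = x + rng.uniform(-step, step)            # customizable move size
        e_prop = 0.5 * k * (prop - x0) ** 2
        if e_prop <= e or rng.random() < math.exp(-beta * (e_prop - e)):
            x, e = prop, e_prop                        # accept the move
        samples.append(x)
    return samples

s = metropolis()
mean = sum(s) / len(s)
var = sum((v - mean) ** 2 for v in s) / len(s)
print(round(mean, 1), round(var, 1))   # variance should approach 1/(beta*k) = 20
```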




Liubov Samborska (Yale University CT)
Two-way Communication Complexity of Weighted Maximum Cut
Mentor: Sepehr Assadi, Computer Science

Abstract: In this project, we consider the well-known Weighted Max-Cut problem in the context of two-party communication. We establish a two-way randomized communication complexity lower bound of Ω(n^2) for Weighted Max-Cut on graphs with n vertices, which differs from the trivial upper bound by only a logarithmic factor. To establish the desired lower bound, we construct a chain of communication reductions: Disjointness to 3-Coloring, 3-Coloring to NAE 3-SAT, and finally NAE 3-SAT to Weighted Max-Cut. These reductions cannot be applied as black boxes, since the size of the constructed instances and the amount of communication used in each reduction affect the strength of the resulting lower bound. We therefore carry out the reductions in a white-box manner, establishing the Ω(n^2) communication complexity lower bound for Weighted Max-Cut as well as lower bounds for the intermediate problems of 3-Coloring and NAE 3-SAT in the reduction chain.




David Wang (Hofstra University NY)
Motifs in billion-edge graphs
Mentor: James Abello, Computer Science

Abstract: We worked on a method that allows for human visualization and exploration of large graphs (i.e., billion-edge graphs). Our method relies heavily on the core decomposition algorithm developed by Abello and Nakhimovich. Using this method, we create interactive Graph Cities, which make human interpretation of extremely large graphs possible. The Graph Cities interface also makes it easier to catalogue patterns and subgraphs of the parent supergraph. We also identify compelling avenues along which this work could continue.



Return to REU Home Page
DIMACS Home Page
DIMACS Contact Information
Page last modified on August 22, 2022.