2026 DIMACS REU Proposed Project Descriptions

Please revisit this page frequently, since additional projects may be posted through January.
The projects are indicative of the research planned by the mentors, but may be modified by the time that the REU starts.



We only receive applications through the NSF ETAP system. Use this link to apply: https://etap.nsf.gov/award/8213/opportunity/11675

*Note that you must be a U.S. Citizen or Permanent Resident to be eligible for these projects.



Mallows model for choice modeling

Project #: DC2026-01
Mentor: Shahrzad Haddadan, Management Sciences and Information Systems

Abstract: Choice modeling offers a framework for analyzing and predicting how individuals make decisions when presented with a set of alternatives (an assortment). In rank-based choice models, it is typically assumed that each customer possesses an underlying preference ordering over the available items and selects the most preferred (highest-ranked) item within any offered assortment.
The Mallows model is a distance-based probability distribution over permutations, often viewed as an analogue of the Gaussian distribution for scalar variables. It provides a principled mathematical framework for modeling probabilistic preferences and has been employed effectively in choice modeling, where it has demonstrated superior predictive performance relative to other classical models.
In this project, we study several generalizations of the Mallows model and their key properties, which can be leveraged to design algorithms for choice prediction, revenue management, and related applications.
Useful skills: interest in discrete mathematics; familiarity with probability theory, concentration inequalities and algorithm design; proficiency in a programming language such as Python; and experience with data cleaning and data mining techniques.
Reference: Generalized Top-k Mallows Model for Ranked Choices



Who is Really Top-K? Quantifying Ranking Stability and Ambiguity under Model Multiplicity

Project #: DC2026-02
Mentor: Lesia Semenova, Computer Science

Abstract: Ranking systems (like search engines, recommender systems, or LLM response rankers) often suffer from model multiplicity. This occurs when many different models achieve nearly the same performance but produce drastically different rankings for the top-k items. This creates a reliability crisis: is an item top ranked because it is truly the best, or simply because of an arbitrary choice among equally good models? This project investigates the Rashomon Set, which is the set of all models that achieve near-optimal loss. Instead of training just one model, we will analyze the entire set to distinguish between stable rankings (items that are always top-k) and ambiguous rankings (items that vary wildly across equally good models). We will build interpretable diagnostics that tell us when we can trust a ranking and when we cannot. This is critical for high-stakes applications like hiring algorithms or ordering fact-based answers in Large Language Models.
Prerequisites: Python and PyTorch, background in linear algebra, optimization, probability, and machine learning concepts.



The Many Faces of Truth: Diverse Reasoning in Interpretable Models

Project #: DC2026-03
Mentor: Lesia Semenova, Computer Science

Abstract: Interpretable models, such as Concept Bottleneck Models (CBMs), often enforce a single "best" explanation for a prediction, masking the reality that different combinations of concepts can be equally valid (the Rashomon Effect). This project focuses on algorithms to discover diverse feature bases, meaning finding disjoint or complementary sets of concepts that have near-optimal performance. We will primarily work with CBMs to formulate optimization constraints that force the model to learn multiple, distinct explanations for the same data. In a final exploratory stage, we may investigate if these diversity metrics can be applied to multimodal systems, specifically by analyzing if diverse reasoning paths for complex image and text data rely on distinct internal feature activations. This provides a rigorous algorithmic foundation while connecting to timely questions in interpretable and foundation models.
Prerequisites: Python and PyTorch, background in linear algebra, optimization, probability, and machine learning concepts



Sample augmented-online algorithms

Project #: DC2026-04
Mentor: Roie Levin, Computer Science

Abstract: Theory has been slow to explain the explosive impact of machine learning methods on algorithms. In the real world, practitioners often eschew textbook algorithms in favor of learned heuristics that train on historical input data. In recent years, the nascent field of algorithms with predictions has attempted to explain the discrepancy by designing algorithms that are augmented with the ability ask for advice about the optimal solution from a (potentially untrustworthy) black box oracle. A recent and natural twist is the with-a-sample paradigm. In this setting, we do not assume the algorithm is given oracle access, but instead that it has some amount of raw representative historical data on which to train. The new goal is to directly exploit this history to give better-than-worst-case performance on the true input data. One may view such with-a-sample algorithms as end-to-end solutions that combine the design of the black box learning algorithm with design of the policy for how to exploit the oracle.
In this project, we will investigate such questions in the context of specific concrete problems from the world of online and approximation algorithms. Some candidates include Online Steiner Forest, Weighted Tree Augmentation, and Set Cover.



Fairness in Multi-Agent Resource Allocation

Project #: DC2026-05
Mentor: Arpita Biswas, Computer Science

Abstract: Fairness is a fundamental concern in resource allocation problems, particularly when a limited set of indivisible resources must be distributed among agents with competing preferences,--a setting that is rich in combinatorial structure and algorithmic challenges. The computational fair allocation literature at the intersection of computer science, mathematics, and economics. The project explores a timely question: Can allowing bounded, cost-sensitive sharing of indivisible resources overcome classical impossibility results and lead to provably fairer outcomes? The project aims at investigating whether bounded sharing can improve fairness guarantees such as relaxed forms of envy-freeness, proportionality, and maximin share fairness. The work will blend theoretical reasoning, algorithm design, and computational experimentation, with opportunities to apply data-driven and multi-agent AI techniques to model and elicit preferences of heterogeneous agents. Possible directions include designing scalable algorithms for fair allocation under cost-sensitive sharing and combinatorial constraints, and leveraging modern Agentic AI tools for shared-cost and preference elicitation.



Truth Learning in a Social Network

Project #: DC2026-06
Mentor: Jie Gao, Computer Science

Abstract: Individuals gather information by both independent observations and communication with others in a social environment. Social Learning studies how dispersed and self-interested agents aggregate information. In this project we study whether social learning can enable truth learning. It is well known that learning can fail due to herding, where agents collectively focus on an early false consensus and ignore subsequent observations. Therefore, a general goal in social learning is to get around the issue of herding and enable truth learning. In this project we study the computational aspect of social learning with different observation models, the knowledge landscape, the environment of learning and social processes such as peer pressure, social influence, network effect as well as the existence of network dynamics and adversarial players. This is the fourth year of this summer REU project and the work in previous three years all led to publishable results at international academic conferences.



Compressing Large Foundation Models Using Low Separation Rank (LSR) Tensor Decomposition

Project #: DC2026-07
Mentor: Waheed Bajwa, Electrical and Computer Engineering

Abstract: Large foundation models like GPT-4 and Llama 3 have dramatically advanced AI capabilities, but their massive size, often billions of parameters, makes them difficult to run on limited hardware such as edge devices. While current compression techniques like Low-Rank Adaptation (LoRA) approximate weight updates using basic low-rank matrices, new research points to a more structured approach: Low Separation Rank (LSR).
LSR extends traditional tensor decompositions (like CANDECOMP/PARAFAC and Tucker) by using sums of separable components, such as Kronecker products, to represent high-dimensional tensors more efficiently. This project explores whether using LSR-based decompositions can outperform standard low-rank methods in terms of compression and performance. The work will focus on applying LSR constraints to the weight tensors of open-source foundation models (e.g., Llama-3-8B or similar), with the goal of significantly reducing their memory requirements while preserving accuracy. You will help develop custom tensor factorization modules, plug them into the PyTorch training loop, and evaluate the results on common NLP benchmarks.
Prerequisites: Students interested in this project are expected to have a good command of (numerical) linear algebra, calculus, and Python programming, and some familiarity with deep neural networks as well as one of the deep learning frameworks such as PyTorch and TensorFlow. Prior experience with cloud computing and/or cluster computing is useful, but is not a prerequisite.



Project at intersection of AI and economics

Project #: DC2026-08
Mentor: Xintong Wang, Computer Science

Abstract: The project will be at the intersection of artificial intelligence and economics, with an emphasis on studying multi-agent, game-theoretic environments where agents interact, often with the aid of algorithms, and agents' desires may not easily and reliably match up. Examples include online platforms, financial markets, a rush-hour commute, and a game of poker. Some concrete problems that the REU student may explore are (1) the formation of algorithmic collusion and how to mitigate or regulate it from an algorithmic perspective; (2) the use of LLM agent to improve the efficiency of negotiation tasks and bilateral trading (e.g., LLM for transition, value, or heuristic functions); and (3) some other strategic behavior on economic platforms or any multi-agent systems. Background knowledge in economics or game theory may be a plus but not necessary.



Multi-Robot Path Planning: Structural Studies and High-Performance Algorithms

Project #: DC2026-09
Mentor: Jingjin Yu, Computer Science

Abstract: Graph-based Multi-Robot Path Planning (MRPP), also known as Multi-Agent Path Finding (MAPF), seeks to quickly find high-quality paths for routing a large number of robots. MRPP/MAPF has numerous important applications, such as enabling the coordination of thousands of robots at Amazon warehouses. A significant amount of work has been done in the field, and many effective algorithms have been proposed; however, we are still in the relatively early stages of fully understanding the problem's structure and utilizing the insights to design more capable MRPP/MAPF algorithms. For example, current algorithms are mostly agnostic to the graph structure - can we leverage that information?
In this summer project, the student researcher will learn about the existing SOTA in MRPP/MAPF, seek a better understanding of particular structures, and leverage that understanding to design better algorithms. The research can be theoretical, algorithmic, or a mix of the two.
Some relevant papers:
https://arxiv.org/pdf/2002.04979
https://arxiv.org/pdf/2312.10887
https://arxiv.org/pdf/2201.08976.pdf (https://www.youtube.com/watch?v=v8WMkX0qxXg)



Genome folding and function: from DNA base pairs to nucleosome arrays and chromosomes

Project #: DC2026-10
Mentor: Wilma Olson, Chemistry and Chemical Biology

Abstract: Despite the astonishing amount of information now known about the sequential makeup and spatial organization of many genomes, there is much to learn about how the vast numbers of DNA base pairs in these systems contribute to their three-dimensional architectures and workings. The chemical structure of DNA contains information that dictates the observed spatial arrangements of base pairs within the double helix and the susceptibilities of different base sequences to interactions with proteins and other molecules. Our group has been developing software to analyze the spatial and energetic codes within known DNA structures and using these data in computational studies of larger nucleic acid systems. By representing the DNA bases as sets of geometric objects, we can capture the natural deformability of DNA as well as the structural details of ligand-bound interactions (1,2). There are now sufficient data to examine the influence of sequence context, i.e., the effects of immediate neighbors, on the apparent motions of the bases and to develop simple functions that capture this information. The observed behavior can be incorporated in computational treatments to improve our understanding of macromolecular systems in which DNA sequence and protein binding play important roles. For example, DNA elements that regulate gene expression often lie far apart along genomic sequences but come close together during genetic processing. The intervening residues form loops, mediated by proteins, which bind at the sequentially distant sites. We have developed methods that take account of both DNA and protein in simulations of loop formation and successfully captured experimental observations (3-5). Our recent studies of long-range communication between regulatory elements on short nucleosome-decorated DNA (6) provide a basis for linking the 'local' arrangements of nucleosomes in these systems to higher-order macromolecular structures, such as the looped architectures detected within chromosomes and the circular pieces of genomic DNA associated with a variety of cancers. We are developing new algorithms to characterize the spatial arrangements of nucleosome-decorated DNA circles and incorporating the collective information in nucleosome-level depictions of chromosomal DNA (7,8). The build-up of structural components in our treatment of DNA and chromatin offers new insight into the stepwise processes that underlie genome organization and processing.

References:
1. Young, R.T., L. Czapla, Z.O. Wefers†, B. Cohen†, W.K. Olson. (2022) Revisiting DNA sequence-dependent deformability in high-resolution structures: effects of flanking base pairs on dinucleotide morphology and global chain configuration. Life (Basel) 12:759. DOI: 10.3390/life12050759
2. Olson, W.K., Y. Li, M.O. Fenley. (2022) Insights into DNA solvation found in protein-DNA structures. Biophys. J. 121:4749-4758. DOI: 10.1016/j.bpj.2022.11.019
3. Wei, J., L. Czapla, M.A. Grosner, D. Swigon, W.K. Olson. (2014) DNA topology confers sequence specificity to nonspecific architectural proteins. Proc. Natl. Acad. Sci. USA 111:16742-16747. DOI: 10.1073/pnas.1405016111
4. Clauvelin, N., W.K. Olson. (2021) Synergy between protein positioning and DNA elasticity: energy minimization of protein-decorated DNA minicircles. J. Phys. Chem. B 125:2277-2287. DOI: 10.1021/acs.jpcb.0c11612
5. Young, R.T., N. Clauvelin, W.K. Olson. (2022) emDNA - A tool for modeling protein-decorated DNA loops and minicircles at the base-pair step level. J. Mol. Biol. 434:167558. DOI: 10.1016/j.jmb.2022.167558
6. Clauvelin, N., P. Lo†, O.I. Kulaeva, E.V. Nizovtseva, J. Dias, J. Zola, M. Parashar, V. Studitsky, W.K. Olson. (2015) Nucleosome positioning and composition modulate in silico chromatin flexibility. J. Phys.: Condens. Matter 27:064112. DOI: 10.1088/0953-8984/27/6/064112
7. Todolli, S., P.J. Perez, N. Clauvelin, W.K. Olson. (2017) Contributions of sequence to the higher-order structures of DNA. Biophys. J. 112:416-426. DOI: 10.1016/j.bpj.2016.11.017
8. Todolli, S., R.T. Young, A.S. Watkins†, A. Bu Sha†, J. Yager†, W.K. Olson. (2021) Surprising twists in nucleosomal DNA with implication for higher-order folding. J. Mol. Biol. 433:167121. DOI: 10.1016/j.jmb.2021.167121

†-Undergraduate research student



Understanding the capabilities and limitations of beyond-binary states in quantum computing

Project #: DC2026-11
Mentor: Yipeng Huang, Computer Science

Abstract: Quantum computing research is at a critical juncture. Prototype quantum computers are now at a level of reliability and scale where researchers can run small scale algorithms. The fundamentally new paradigm of computing may offer capabilities that would be unimaginable in existing classical computers. One principle for making quantum computers easier to build and easier to understand is to encode data in binary states (we call these qubit states). In such a simplification, only the lowest two states of a system are considered valid and participate in computation. With this simplification the quantum computer system only has to provide reliable means to transition between the two states, and to provide a way to observe and distinguish two states. Simplifying computing systems to only use two states has also been a useful simplification for classical computer systems.
Nonetheless, many of the physical and technological systems that researchers are exploring to build quantum computers offer more than just binary states. Systems that use three states are said to use "qutrit" states, and ones using in general d-number of states are said to use "qudit" states. Using these additional states can lead to interesting improvements upon known ways to use quantum computers. For example, qutrit states have been shown to enable more efficient communication of information, and have also been shown to enable larger capacity and higher reliability quantum computer programs.
This project intends to build on this progress in two fronts. First is in writing quantum computer programs that take advantage of qudit states. Second is in understanding the feasibility of using these states in real-world quantum computer prototypes. On the programming aspect, open source quantum computing frameworks such as Google Cirq offer some facilities for writing programs that use qudits. As part of this project you will write a few demonstration programs that show greater communication capabilities when quantum programs make use of qudits. On the prototypes aspect, the researchers who build quantum computers for public learning and experimentation (e.g., IBM Q) have shared some insights on how the quantum computer systems support qudit states. As part of this project you will help characterize how reliable these states are, how they may be used for programs, and what are the limitations preventing their use.
This interdisciplinary project will provide excellent exposure to computer science topics such as machine learning, quantum computing, and contributing to open source frameworks.



Building Safer Global AI Systems: Red Teaming Large Language Models Across Languages and Modalities

Project #: DC2026-12
Mentor: Vivek Singh, Library and Information Science

Abstract: This project focuses on strengthening AI security by red teaming large language models (LLMs), which means stress-testing them to uncover weaknesses, across multiple languages and modalities such as text, images, and speech. You will explore how adversarial scenarios can reveal hidden risks and help design strategies that make AI more robust, fair, and reliable for global users. It is a hands-on opportunity to learn cutting-edge techniques in AI safety while contributing to research that impacts real-world applications in healthcare, education, and beyond.
The ideal candidate will have Python programming experience and interest in AI security.



Kidney Exchange: Packing Cycles and Paths in Directed Graphs

Project #: DC2026-13
Mentor: Itai Feigenbaum, Computer Science (CUNY)

Abstract: A patient who requires a kidney transplant may have a proxy donor: a person who wishes to donate a kidney to the patient, but cannot due to medical incompatibility. However, the patient may be compatible with another patient's proxy donor. Kidney exchange programs allow patients to swap proxy donors, so that each swapping patient receives a compatible transplant. The program determines which swaps occur, with the objective of maximizing the number of transplants (or a related objective representing the total health benefit to patients from transplants). At the same time, the program must account for constraints such as simultaneity of surgeries, cancellations, participation incentives and more. Mathematically, the underlying problem is what's known as a packing problem on a directed graph: finding a set of vertex-disjoint directed cycles and paths which maximize an objective function under constraints. In this project, we will strive to design algorithms that solve this problem for new combinations of objectives and constraints, motivated by real-world challenges faced by kidney exchange programs.
Prerequisites: The only strict requirement is mathematical maturity, namely the ability to prove theorems. Knowledge of design and analysis of algorithms, basic discrete probability and rudimentary programming skills are all useful but not required. This project is suitable for a theoretically minded student who is considering pursuing a Ph.D. in computer science, mathematics or operations research in the future.
Background:
News story- https://www.youtube.com/watch?v=niiZNtqUMlk
Fun little puzzle- https://www.dcs.gla.ac.uk/research/algorithms/IP-MATCH/kidney_exchange_game/index.html
Concise summaries- https://en.wikipedia.org/wiki/Kidney_paired_donation, https://en.wikipedia.org/wiki/Optimal_kidney_exchange
Seminar videos- https://www.youtube.com/watch?v=6kQxq-EfVpQ, https://www.youtube.com/watch?v=4NbJTcfN6UA



Formalization of theorems in logic and complexity theory

Project #: DC2026-14
Mentor: Ayush Khaitan, Mathematics

Abstract: There is currently an effort underway to formalize theorems from complexity theory and logic and add them to Mathlib. We propose a project in which a student will try and autoformalize certain chapters from Marker's "Introduction to Logic", as part of this effort. We also hope to explore autoformalization in this context; i.e. building AI tools that can autonomously do so for long time horizons.



Circle packing and discrete geometry on surfaces

Project #: DC2026-15
Mentor: Feng Luo, Mathematics & Hongbin Sun, Mathematics & Zhenghao Rao, Mathematics

Abstract: A circle packing is an arrangement of circles in a given metric surface such that no two overlap and some of them are mutually tangent. Circle packing theory investigates the geometry and combinatorics of packings of arbitrarily sized circles: these give rise to discrete analogs of conformal mapping, Riemann surfaces, and the like. The subject was revolutionized by William Thurston in 1979 and has attracted much research attention since then. It can be considered as discrete analytic function theory. In this project, we will aim to prove the famous Koebe-Andreev-Thurston's circle packing of the unit disk and its generalizations. The topics to be covered include Schramm's rigidity of infinite circle packing, inversive circle packing, Rodin-Sullivan's solution to Thurston's conjecture on convergence, and the discrete uniformization theorem for circle packing.
The students should have a solid knowledge of surface topology, basic algebraic topology, and complex analysis. Some knowledge of Riemannian geometry of surfaces will be helpful.
References:
1. Introduction to Circle Packing: The Theory of Discrete Analytic Functions, by K. Stephenson. Cambridge University Press; Illustrated edition (April 18, 2005)
2. Notes on circle packing, by D. Glickenstein, B. Chow, F. Luo, to be distributed for REU students
3. Circle Packing: A Mathematical Tale, Kenneth Stephenson https://web.math.utk.edu/~kens/Notices_article.pdf
4. "The convergence of circle packings to the Riemann mapping", B. Rodin, D. Sullivan, J. Diff . Geom. 26 (1987), 349-360
5. "Rigidity of infinite disk patterns", He, Zheng-Xu, Ann. of Math. (2) 149 (1999), no. 1, 1-33.



Projects in Pure Math

Project #: DC2026-16
Mentor: Faculty of the Math Department, Mathematics

Abstract: Several projects will be run in 2026 by members of the pure math faculty, in addition to the ones described above. Applicants interested in a project in pure mathematics should list this project in their application and indicate their specific Math interests; further details on the projects will be worked out in the Spring.



Compactness in the mathematical universe

Project #: DC2026-17
Mentor: Filippo Calderoni, Mathematics & Cecelia Higgins, Mathematics

Abstract: Large cardinal axioms postulate the existence of combinatorial properties of infinity. These axioms are far beyond the usual axioms of mathematics and expand the burden of the mathematical universe. A well-studied consequence of large cardinal axioms is compactness. Compactness denotes the extent to which mathematical structures are determined by their local behavior. This project focuses on set theory, with an eye toward abstract algebra. We will examine some famous compactness properties for algebraic structures and investigate new ones in the context of infinite groups, modules, and other algebraic structures.
The student is expected to have some background in set theory and abstract algebra.



Spectral theory for infinite directed graphs

Project #: DC2026-18
Mentor: Cecelia Higgins, Mathematics & Filippo Calderoni, Mathematics

Abstract: At first glance, graph theory and linear algebra may seem quite unrelated; however, these two disciplines intersect in the rich area of spectral graph theory, where linear algebraic tools are applied to study the combinatorial properties of graphs. A key tool in this area is the adjacency matrix, whose entries encode information about which of the vertices in the graph are adjacent. The eigenvalues of the adjacency matrix confer important information about parameters such as the chromatic number of the associated graph.
The goal of this project is to extend several techniques from the spectral theory of finite graphs to a certain "nice" class of infinite graphs studied in descriptive set theory. We will be interested in particular in proving new bounds on a descriptive combinatorial parameter for infinite directed graphs, where the edge relation need not be symmetric.
Prerequisites: Familiarity with proof-based linear algebra and the basics of measure theory. Some prior experience with graph theory is helpful, but not required.



Constructing solutions to gauge theory equations

Project #: DC2026-19
Mentor: Paul Feehan, Mathematics

Abstract: Gauge theory (or "Yang-Mills theory") is a research area that lies at the interface between mathematics and high-energy physics. It is the language of mathematics used to help understand elementary particle interactions. Its importance for this purpose was discovered by Chen Ning Yang (Nobel Prize, 1957) and Robert Mills in 1953. Simon Donaldson (Fields Medal, 1986) discovered that Yang-Mills theory (in particular, spaces of solutions to certain types of Yang-Mills equations) could also be used to solve deep problems in mathematics, especially those involving the structure of smooth 4-dimensional manifolds. Edward Witten (Fields Medal, 1990) and Nathan Seiberg later discovered (1994) that simpler versions of the Yang-Mills equations (involving the Dirac operator) and their discovery led to new and simpler proofs of all the results obtained by Donaldson and others using his theory, along with many new ones as well, especially for symplectic 4-manifolds.
In this project, students will learn how to construct explicit solutions to some of these Yang-Mills equations (anti-self-dual or Seiberg-Witten) over simple 4-manifolds (like the sphere or complex projective space) and use techniques invented by Cliff Taubes to create new solutions over more complicated 4-manifolds using "gluing".
Students considering this project will typically be considering applications to PhD programs that emphasize differential geometry and topology, partial differential equations on manifolds, and mathematical physics.



Return to REU Home Page
DIMACS Home Page
DIMACS Contact Information
Page last modified on December 24, 2025.