**Please revisit this page frequently, since additional projects may be posted through January. Your application, including your preference for the offered projects, can be updated using the link you received after registration.**

**The projects are indicative of the research planned by the mentors, but may be modified by the time the REU starts.**

**Note that you must be a U.S. Citizen or Permanent Resident to be eligible for these projects.**

**The Minimum Circuit Size Problem**

**Project #: DC2020-01**

Mentor: **Eric Allender**, *Computer Science*

**Abstract:** The Minimum Circuit Size Problem (MCSP) is a computational problem in NP that is widely believed to be hard to compute, although many researchers suspect that it is NOT NP-complete. However, there is currently only weak evidence to support this suspicion. This REU project will endeavor to find stronger evidence, of the form "If MCSP is NP-complete, then something weird happens". Another possible direction involves clarifying the relative complexity of MCSP and other candidate NP-intermediate problems. The PI has recently proved some new lower bounds on the complexity of MCSP and related problems, and it should be possible to build on this recent progress. The PI has previously supervised research with REU students on this topic which has led to publications, and there is more work to be done that should be accessible to an REU student.

*References:*

Eric Allender, Rahul Ilango, Neekon Vafa: The Non-Hardness of Approximating Circuit Size, Proc. 14th International Computer Science Symposium in Russia (CSR 2019), Lecture Notes in Computer Science 11532, pp. 13-24.

Eric Allender, Joshua Grochow, Dieter van Melkebeek, Cristopher Moore, and Andrew Morgan: Minimum Circuit Size, Graph Isomorphism and Related Problems, SIAM Journal on Computing, Vol. 47(4), 2018, 1339-1372.

Eric Allender and Shuichi Hirahara: New Insights on the (Non-)Hardness of Circuit Minimization and Related Problems, ACM Transactions on Computation Theory (ToCT), 11,4 (2019) 27:1-27:27.

Eric Allender, Dhiraj Holden, and Valentine Kabanets: The Minimum Oracle Circuit Size Problem. Computational Complexity 26 (2017) 469-496.

Eric Allender and Bireswar Das: Zero Knowledge and Circuit Minimization. Information and Computation, 256 (2017) 2-8.

Cody D. Murray and Ryan Williams: On the (Non) NP-Hardness of Computing Circuit Complexity, Theory of Computing, 13 (2017) 1-22.

Michael Rudow: Discrete Logarithm and Minimum Circuit Size, Information Processing Letters 128 (2017) 1-4.

**Machine learning and SAT-solvers**

**Project #: DC2020-02**

Mentor: **Periklis Papakonstantinou**, *Management Science and Information Systems*

**Abstract:** Over the last 20 years, we have progressed from solving SAT instances with at most 200 variables to instances involving millions of variables and clauses, which has made SAT-solvers ubiquitous optimization tools. SAT-solvers are today the standard tools for solving constraint satisfaction problems. A wealth of problems from VLSI design and verification, motion planning, cryptanalysis, and various other areas are encoded as SAT formulas and solved with the help of competitive SAT-solvers. Despite this empirical success, the theoretical properties of SAT-solvers are poorly understood. This makes it an excellent opportunity for students new to research to explore a variety of mathematical questions ranging from "simple" to "tough". The project will be an exciting blend of combinatorics, computer science, and theoretical statistics.

The goal of this project is to employ machine learning techniques in order to understand the behavior of existing popular heuristics. The mentor has already obtained some preliminary experimental results, and the interested student is expected to extend these, potentially turning them into a full-blown research paper.
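To make the object of study concrete, here is a minimal sketch (not how competitive SAT-solvers actually work, which is the whole point of the project) of what a SAT instance looks like and how satisfiability can be checked by brute force over all assignments:

```python
from itertools import product

def brute_force_sat(num_vars, clauses):
    """Return a satisfying assignment (tuple of bools) or None.

    `clauses` is a CNF formula: each clause is a list of nonzero ints,
    where literal k means variable k-1 is True and -k means it is False
    (the DIMACS convention used by real SAT-solvers).
    """
    for assignment in product([False, True], repeat=num_vars):
        if all(any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return assignment
    return None

# (x1 or x2) and (not x1 or x2) and (not x2 or x3)
sat = brute_force_sat(3, [[1, 2], [-1, 2], [-2, 3]])
# (x1) and (not x1) has no satisfying assignment
unsat = brute_force_sat(1, [[1], [-1]])
```

This enumeration takes 2^n time; modern solvers handle millions of variables via heuristics (clause learning, restarts, branching rules) whose theoretical behavior is exactly what this project would probe.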

**Combinatorial financial stock options**

**Project #: DC2020-03**

Mentor: **David Pennock**, *DIMACS*

**Abstract:** A financial stock option pays max(0, *s-k*), where *s* is the future stock price and *k* is called the strike price. In today's options market, every option, for each stock and strike price, trades separately. In this project, we will implement a combined-market matching algorithm in which options on all stocks, including portfolios of stocks, and for all strikes, trade together. The implementation will involve an algorithm similar to a mixed integer linear program. The algorithm will have applications in finance and sports wagering.
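A small illustrative sketch of the payoff structure (the `basket_payoff` helper is hypothetical, introduced here only to show why portfolio options make combined matching nontrivial):

```python
def option_payoff(s, k):
    """Payoff of a call option: max(0, s - k)."""
    return max(0.0, s - k)

def basket_payoff(prices, k):
    """Payoff of an option on a portfolio: the strike applies to the
    *sum* of the underlying prices, not to each stock separately."""
    return max(0.0, sum(prices) - k)

# An option on a two-stock portfolio is NOT the sum of two single-stock
# options, so options on portfolios cannot be priced by combining
# independent per-stock markets -- hence a combined matching algorithm.
single = option_payoff(5.0, 4.0) + option_payoff(5.0, 4.0)  # 2.0
basket = basket_payoff([5.0, 5.0], 12.0)                    # 0.0
```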

**The wisdom of the market**

**Project #: DC2020-04**

Mentor: **David Pennock**, *DIMACS*

**Abstract:** A prediction market share pays $1 if a future event happens, like a wealth tax being adopted by 2022, and $0 if not. The market price can be thought of as a consensus estimate of the probability of the event. If all traders optimize according to the so-called Kelly criterion, then the market price evolves like a machine-learning expert algorithm, guaranteed to be not much worse than the most accurate trader. In this project, we will explore new variations on this theme. What happens if traders optimize in a different way? What if traders themselves learn over time how accurate they are compared to the market price? We will investigate the problem using simulation and theory.
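The kind of simulation the project envisions can be sketched in a few lines. This toy model assumes the known result that in a market of Kelly bettors the clearing price is the wealth-weighted average of the traders' beliefs, with wealth redistributed after each outcome according to predictive accuracy; the function and its parameters are illustrative, not the project's actual model:

```python
def kelly_market(beliefs, wealths, outcomes):
    """Toy prediction market of Kelly bettors.

    `beliefs[i]` is trader i's subjective probability of the event;
    `outcomes` is a sequence of realized True/False events. At each
    round the price is the wealth-weighted average belief, and each
    trader's wealth multiplies by (their probability of the realized
    outcome) / (the market's probability of it), so accurate traders
    accumulate wealth and pull the price toward their belief.
    """
    wealths = list(wealths)
    prices = []
    for happened in outcomes:
        total = sum(wealths)
        price = sum(w * p for w, p in zip(wealths, beliefs)) / total
        prices.append(price)
        for i, p in enumerate(beliefs):
            wealths[i] *= (p / price) if happened else ((1 - p) / (1 - price))
    return prices, wealths

# Two equally wealthy traders; the event keeps happening, so the
# trader who believed it was 90% likely gains wealth and the price
# drifts from the initial average 0.7 toward 0.9.
prices, final_wealths = kelly_market([0.9, 0.5], [1.0, 1.0], [True] * 20)
```

Note that total wealth is conserved each round, so the market only redistributes it; variations where traders adapt their beliefs over time are one of the directions the abstract mentions.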

**Brain-inspired algorithm for controlling a robotic head**

**Project #: DC2020-05**

Mentor: **Konstantinos Michmizos**, *Computer Science*

**Abstract:** Can you imagine an algorithm that moves a robot by emulating how our brain moves our body? We can. There are five areas of knowledge required for a successful presence in our lab: computer science, neuroscience, robotics, mathematics, and machine learning. In doing this project, you are expected to get a glimpse of at least three of these areas. This project will expose you to neuromorphic computing, the most brain-derived branch of AI, where computation is based purely on neurons. Specifically, you will learn how AI may emerge by combining simple computational principles that the brain uses, and by designing a spiking neural network (SNN). The SNN will control a robotic head that we have developed in the lab. In addition, your research will help us develop a robotic system that not only "works", but can teach us something about the brain's function and dysfunction.

**Automating Swarming Strategies: Foundations, Algorithms, and Applications**

**Project #: DC2020-06**

Mentor: **Jingjin Yu**, *Computer Science*

**Abstract:** A large number of autonomous agents (e.g., humans, robots, or a mixture of these) can carry out swarming strategies that are impossible for a single agent or a few agents to realize. For example, we may want to deploy autonomous robots to guard one or multiple regions for various reasons, including preventing malicious agents from entering these regions (e.g., intrusion detection) or preventing the spread of adverse events (e.g., fire, disease, and so on). For an effective deployment, the robots must be assigned to locations that "best" cover the target regions based on some objectives. Swarming scenarios, which include but are not limited to the ones described above, raise many intriguing (and challenging) theoretical and algorithmic questions. The REU project will carry out fundamental studies to characterize the structures induced by these problems and design optimal algorithms for solving them. Ideally, the REU student should either (1) be familiar with standard algorithms (e.g., perhaps the first chapters of CLRS) and comfortable with mathematical reasoning (i.e., reading/writing proofs), or (2) be comfortable coding in Python with math-related libraries such as NumPy.
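As a taste of the "best coverage" flavor of these assignment problems, here is a minimal sketch (with made-up candidate locations, not the project's actual formulation) of the classic greedy heuristic for maximum coverage, which achieves a (1 - 1/e) approximation of the optimum:

```python
def greedy_cover(candidates, num_robots):
    """Greedily place robots at candidate locations to maximize coverage.

    `candidates` maps a location name to the set of region cells that a
    robot stationed there would cover. At each step, pick the location
    covering the most not-yet-covered cells.
    """
    covered, chosen = set(), []
    for _ in range(num_robots):
        best = max(candidates, key=lambda loc: len(candidates[loc] - covered))
        if not candidates[best] - covered:
            break  # no location adds new coverage
        chosen.append(best)
        covered |= candidates[best]
    return chosen, covered

# Three hypothetical guard posts covering cells of a region grid.
cands = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6}}
chosen, covered = greedy_cover(cands, 2)
```

With two robots, greedy picks "A" (three new cells) and then "C" (three more), covering all six cells; characterizing when such heuristics are optimal, and designing better algorithms, is the kind of question the project studies.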

For some sample work from my group, please see https://arc.cs.rutgers.edu/pub.html and in particular

https://arxiv.org/pdf/1905.04434.pdf

https://arxiv.org/pdf/1807.03347.pdf

https://arxiv.org/pdf/1801.10465.pdf

https://arxiv.org/pdf/1711.07369.pdf

If you are interested in the topic, please reach out to me at jingjin.yu@cs.rutgers.edu to discuss your interests before applying to work with me.

**Motion planning for robot manipulation**

**Project #: DC2020-07**

Mentor: **Kostas Bekris**, *Computer Science*

**Abstract:** Motion planning for multi-robot manipulation (e.g., many robotic arms manipulating many objects) is challenging due to the many degrees of freedom involved. Desirable properties of algorithms for robot motion planning and manipulation tasks are completeness (a solution is guaranteed to be found if one exists), optimality (the solution found minimizes a cost function), and efficiency (the solution is found quickly). Although complete algorithms exist, based on cell decomposition, they are slow: their computation time and memory requirements grow exponentially with the number of degrees of freedom. So, in practice, the algorithms used generate roadmaps (graphs) for the robot's motion by sampling (often uniformly at random) possible robot configurations and interpolating or steering a path between nearby samples. The state-of-the-art roadmap algorithms, though faster, are not complete, but do guarantee weaker notions of completeness and optimality (i.e., guarantees w.r.t. resolution or given arbitrarily many samples/iterations).
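The sampling-based roadmap idea can be sketched in a few lines. This is a bare-bones probabilistic roadmap (PRM) over an obstacle-free unit square, purely illustrative: a real planner would work in a high-dimensional configuration space and reject samples and edges that collide with obstacles.

```python
import math
import random

def build_prm(num_samples, radius, seed=0):
    """Build a toy probabilistic roadmap over the unit square.

    Samples configurations uniformly at random and connects every pair
    closer than `radius`. (No obstacles here; collision checking is
    where the real cost and subtlety of PRM lies.)
    """
    rng = random.Random(seed)
    nodes = [(rng.random(), rng.random()) for _ in range(num_samples)]
    edges = {i: [] for i in range(num_samples)}
    for i in range(num_samples):
        for j in range(i + 1, num_samples):
            if math.dist(nodes[i], nodes[j]) <= radius:
                edges[i].append(j)
                edges[j].append(i)
    return nodes, edges

def connected(edges, start, goal):
    """Depth-first search: is there a roadmap path from start to goal?"""
    frontier, seen = [start], {start}
    while frontier:
        v = frontier.pop()
        if v == goal:
            return True
        for w in edges[v]:
            if w not in seen:
                seen.add(w)
                frontier.append(w)
    return False

nodes, edges = build_prm(200, 0.15)
```

The asymptotic guarantees mentioned above (probabilistic completeness, asymptotic optimality as in reference [1]) concern what happens to such roadmaps as the number of samples grows without bound.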

This line of work motivates a hybrid between discrete task planning and continuous motion planning. A task planner takes for granted a set of motion primitives and attempts to plan a sequence of these primitives that accomplish a given task. Each motion primitive is completed using the classical continuous motion planner. Due to the discrete nature of task planning, the techniques for optimality certificates used in motion planning are not directly applicable and new techniques need to be explored. In this REU project, the student will explore robot task and motion planning optimality guarantees. Based on the student's background or preference, the focus could range from purely theoretical to mostly practical (physical or simulated experiments).

References:

[1] Karaman, S., & Frazzoli, E. (2011). Sampling-based algorithms for optimal motion planning. The international journal of robotics research, 30(7), 846-894.

[2] Littlefield, Z., Krontiris, A., Kimmel, A., Dobson, A., Shome, R., & Bekris, K. E. (2014, October). An extensible software architecture for composing motion and task planners. In International Conference on Simulation, Modeling, and Programming for Autonomous Robots (pp. 327-339). Springer, Cham.

**Data Stories**

**Project #: DC2020-08**

Mentor: **James Abello**, *Computer Science*

**Abstract:** One of the central goals of this project is to imagine ways to free data from the somewhat necessary but "boring" sidebar graphs so that data can interact and express itself with confidence to human interpreters, i.e., providing artifacts that facilitate the creation of "*data-driven stories*". *What makes a story truly data-driven is that the data is the central fuel that drives the narrative*. Data can help "*narrate*" many types of stories. In this project we will initiate the development of a framework to help identify and generate different types of "data stories", similar in spirit to Christopher Booker's seven basic story plots [1,2]. One important project phase will focus on the analysis of a variety of "*data stories*" in order to identify suitable "*story modes*" and corresponding "*data types and characteristics*".

The "*story modes*" include:

a. Narrations of change over time,

b. Starting big (small) and drilling down (zooming out),

c. Contrasting Highlights,

d. Exploration of questions resulting when two divergent lines of data intersect and one overtakes the other,

e. Dissection of factors that contribute to form the big picture, and

f. Outliers profiling.

The "*data types*" may include numbers, characters, strings, sets, lists, trees, posets, lattices, graphs, hypergraphs, and simplicial complexes. Depending on the story context, each of these "*data types*" may possess its own intrinsic characteristics like *value, visual representability, variety, volume, velocity, volatility, veracity, and provenance*. Data types' susceptibility to change and their ability to "interact" with other data types may be determining factors of their "*story roles*".

Note: This project is part of a larger initiative to build data stories initially funded by the National Science Foundation.

Qualifications: Python and Java programming; JavaScript libraries for visualization such as D3.js [3]; classes in discrete math, graph theory, algorithms [4], statistics, and probability. Some knowledge of low-dimensional topology may be helpful but is not necessary. Excellent command of the English language.

References:

[1] http://mediashift.org/2015/06/exploring-the-7-different-types-of-data-stories/

[2] Christopher Booker, The Seven Basic Plots: Why we tell stories, Continuum, 2004

[3] Effective Visualization of Multi-Dimensional Data - A Hands-on Approach, Dipanjan Sarkar

[4] Algorithms, S. Dasgupta, C. Papadimitriou, U. Vazirani, McGraw Hill, 2008

**Deep Learning for Quality Prediction in Metal Additive Manufacturing**

**Project #: DC2020-09**

Mentor: **Weihong 'Grace' Guo**, *Industrial and Systems Engineering*

**Abstract:** Additive manufacturing, or three-dimensional (3D) printing, is a promising technology that enables the direct fabrication of complex shapes and eliminates the waste associated with traditional manufacturing. The wide adoption of 3D printing for functional parts is hampered by poor understanding of the process-quality causal relationship and the absence of an online data-driven method for predicting part quality. This study aims to develop methods for extracting spatiotemporal patterns from real-time, in-process thermal images and then correlating the patterns with part quality. Possible solutions include statistical methods in spatio-temporal modeling, such as Gaussian processes, and deep learning methods, such as CNNs and RNNs. A desirable novelty in the method is how to incorporate physics knowledge about the manufacturing process into deep learning. The student will have access to 3D printers and multiple measurement systems in the School of Engineering at Rutgers.

Requirements: data analytics, machine learning, Python or R

**Design and Evaluation of Intelligent Scheduling in a Green Data Center**

**Project #: DC2020-10**

Mentor: **Thu Nguyen**, *Computer Science* & **Peijian Wang**, *Computer Science*

**Abstract:** As data centers increasingly serve the IT and Internet industry, the power they consume has surged significantly. This translates into high operational cost and environmental problems (e.g., carbon emissions), which are attracting more and more attention from both academia and industry. Green data centers are a promising solution to this problem: fully or partially powered by renewable energy such as solar and wind, they can mitigate both energy cost and carbon emissions simultaneously. Although the idea of the green data center is sound, much effort is still needed to deal with the problems introduced by green energy, e.g., intermittency. We have built Parasol, a green micro-datacenter, which is partially powered by solar panels and partially free-cooled by an air-side economizer. The successful design of and research on Parasol have demonstrated the bright future of green data centers. However, five years have passed since our first design, and many emerging technologies are driving a re-design of Parasol. On the one hand, new computing hardware has become available, e.g., non-volatile memory, GPUs, and new energy storage devices. On the other hand, many new algorithms and software technologies have been proposed and have achieved success in many areas, e.g., SDN and AI. All of these technologies introduce both opportunities and challenges to green data center research, and we need to reconsider the design of Parasol from the perspective of both hardware and software. In this project, we will focus on one aspect of green data center research: how to design reinforcement-learning (RL) based scheduling. The prospective student will become familiar with the Parasol infrastructure and RL algorithms, and participate in the design and implementation of the system.
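To give a feel for RL-based scheduling, here is a toy tabular Q-learning sketch on a deliberately tiny made-up MDP: a scheduler decides whether to run or defer a queued job depending on whether solar power is available. Every number and state here is illustrative, not Parasol's actual model.

```python
import random

# Toy MDP: state 0 = solar power available, state 1 = grid power only;
# the two states simply alternate. Action 0 = run the queued job,
# action 1 = defer it. (Illustrative rewards only.)
REWARD = {(0, 0): 1.0,    # running on free solar energy
          (0, 1): -0.1,   # deferring costs a small delay penalty
          (1, 0): -1.0,   # running on grid power is expensive
          (1, 1): -0.1}

def q_learn(steps=5000, alpha=0.3, gamma=0.9, seed=0):
    """Tabular Q-learning with a uniformly random behavior policy
    (Q-learning is off-policy, so it still learns the greedy policy)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]
    state = 0
    for _ in range(steps):
        action = rng.randrange(2)
        nxt = 1 - state  # solar availability alternates deterministically
        target = REWARD[(state, action)] + gamma * max(q[nxt])
        q[state][action] += alpha * (target - q[state][action])
        state = nxt
    return q

q = q_learn()
```

After training, the learned Q-values prefer running the job when solar is available and deferring when only grid power is; the project's real setting adds workload dynamics, battery state, and cooling, which is where the research lies.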

**Learn to program a real high-speed Internet router**

**Project #: DC2020-11**

Mentor: **Srinivas Narayana**, *Computer Science* & **Richard Martin**, *Computer Science*

**Abstract:** Your Internet data traffic goes through a series of devices called routers (think of them like highway interchanges) on the way to your favorite website and back to your phone or laptop. Routers are high-speed devices, meant to move lots of data traffic very quickly from one place to another. In this project, you will learn how to program a real high-speed router (capable of processing 6 Terabits/s) to do your bidding, including making the router move traffic to the right place and ensuring that the network paths, i.e., the "highways" that your data moves through, do not get congested. You will work as part of a larger research effort intended to define mechanisms and policies that carve out University network resources for various users by programming routers appropriately. If you are successful, your software may eventually be deployed on a real multi-site University campus network.

Familiarity with computer programming is required. Familiarity with the Linux environment is a plus. Please contact Srinivas Narayana (srinivas.narayana@rutgers.edu, https://www.cs.rutgers.edu/~sn624/) if you are interested or have questions.

**Accelerate your network interface (NIC) to process data blazingly fast**

**Project #: DC2020-12**

Mentor: **Srinivas Narayana**, *Computer Science* & **Richard Martin**, *Computer Science*

**Abstract:** The Network Interface Card (NIC) is the "glue" between a computer (say, your laptop) and the rest of the Internet. The NIC sits in the path of the data moving in and out of your machine, allowing it to compute over the data as it transits. For example, a NIC might automatically compress data going in and out of your machine to save bandwidth and reduce download time. NICs have grown increasingly flexible over the years in the computations they support. In this project, we will explore (i) ways to get programs of interest up and running on real high-speed NICs (running at 100 Gigabit/s), and (ii) methods to optimize program speed. Your work is part of a larger research effort to help the computer networking research community understand the capabilities and limitations of hardware acceleration using NICs. Your software may eventually become part of a research network that flexibly processes data traffic for a real University campus.

Familiarity with computer programming is required. Familiarity with the Linux environment is a plus. Please contact Srinivas Narayana (srinivas.narayana@rutgers.edu, https://www.cs.rutgers.edu/~sn624/) if you are interested or have questions.

**Projects in Pure Math**

**Project #: DC2020-13**

Mentor: **Faculty of the Math Department**, *Mathematics*

**Abstract:** The Math REU Coordinator is Prof. Anders Buch. Several projects will be run in 2020 by members of the pure math faculty. Past topics have included Schubert Calculus (Buch), Lie algebras (Lepowsky), mathematical Yang-Mills theory (Woodward), etc. Applicants interested in a project in pure mathematics should list this project in their application and indicate their specific Math interests; further details on the projects will be worked out in the Spring.

**Problems in dynamical systems**

**Project #: DC2020-14**

Mentor: **Konstantin Mischaikow**, *Mathematics*

**Abstract:** The following projects will be supervised by Prof. Konstantin Mischaikow and members of his group, including Prof. Marcio Gameiro, Prof. Ewerton Vieira, Hill Assistant Prof. Shane Kepley, Hill Assistant Prof. William Cuello, and Postdoctoral Fellow Dr. Elena Queirolo:

- Studying switching systems via sloppy models

- Domain decomposition for Taylor series ODE solvers

- Finding Hopf bifurcations in the Lorenz84 system

- An alternative approach to predator-prey and multispecies stability analysis

- Identification of bifurcations using algebraic topology

For details, please see this pdf file.

**Coloring problems in graph theory**

**Project #: DC2020-15**

Mentor: **Bhargav Narayanan**, *Mathematics*

**Abstract:** Suppose we start with a coloring of a graph *G* (not necessarily proper) with three colors (red, blue, and green) and a parameter λ > 1. We update this coloring a fixed number of times as follows. Pick a vertex at random and resample its color based on its neighbors: if it has *B* blue neighbors, then the probability that its new color is blue is proportional to λ^{B}, and similarly for the other two colors. Since λ > 1, notice that vertices prefer to be colored similarly to the majority of their neighbors. How likely is the graph to be colored entirely blue at the end of this sequence of Glauber updates? What starting state makes this event most likely? Surely, the best starting point is to color everything blue! Annoyingly, this and a number of related problems of this flavor remain open. The use of computerized searches to find potential couplings is expected to be a good starting point for tackling such problems.
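The update rule described above is easy to implement directly, which is what makes computerized experimentation a natural entry point. A minimal sketch (the function names and the three-color encoding are just one possible choice):

```python
import random

def color_probs(counts, lam):
    """Resampling distribution for one vertex: the probability of color c
    is proportional to lam raised to the number of neighbors colored c.

    `counts` maps each color to the number of neighbors of that color.
    """
    weights = {c: lam ** n for c, n in counts.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

def glauber_step(graph, coloring, lam, rng):
    """One Glauber update: pick a uniformly random vertex and resample
    its color from the distribution above."""
    v = rng.choice(list(graph))
    counts = {"red": 0, "blue": 0, "green": 0}
    for u in graph[v]:
        counts[coloring[u]] += 1
    probs = color_probs(counts, lam)
    colors = list(probs)
    coloring[v] = rng.choices(colors, weights=[probs[c] for c in colors])[0]
```

For example, a vertex with two blue neighbors and no red or green ones, at λ = 2, becomes blue with probability 4/6 and each other color with probability 1/6. Repeating `glauber_step` many times and tracking the all-blue event from different starting colorings is exactly the experiment the abstract suggests.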

**Number Theory and Hyperbolic Geometry**

**Project #: DC2020-16**

Mentor: **Alex Kontorovich**, *Mathematics*

**Abstract:** Participants will study various topics at the interface of hyperbolic geometry and number theory. Projects will involve both theoretical and computational components. Topics of particular interest include, e.g., conjectures on the proximity of punctured Riemann surfaces in the Teichmüller metric, and the study of the dynamics of various modular maps on the integers.
No prior knowledge of these topics is assumed.

**Exploring notions of momentum for interacting quantum particles**

**Project #: DC2020-17**

Mentor: **Shadi Tahvildar-Zadeh**, *Mathematics* & **Michael Kiessling**, *Mathematics*

**Abstract:** In classical particle mechanics, momentum is a well-defined notion and has a precise value for each particle in the system. Not so in quantum mechanics, where kinematic notions such as position and momentum become fuzzy, due to the Uncertainty Principle. If we view the position of a single quantum particle as a random variable with a given probability distribution, one with density function ρ(*x*) = |ψ(*x*)|^{2}, where ψ is the quantum-mechanical wave function associated with the particle, it is natural to think of the momentum as also being given by a probability distribution, with density function ϱ(*k*) = |ψ̂(*k*)|^{2}, where ψ̂ denotes the Fourier transform of ψ. It is therefore possible to refer to the mean of the latter distribution as "the momentum of the particle" (it is in fact the expected value of the momentum operator).
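This first notion of momentum is directly computable. A minimal numerical sketch, assuming a Gaussian wave packet (all grid sizes and parameters below are illustrative): the mean of |ψ̂(k)|² recovers the packet's mean momentum k0.

```python
import numpy as np

# Gaussian wave packet with mean momentum k0:
#   psi(x) = exp(i*k0*x) * exp(-x^2 / (4*sigma^2)), then normalized.
N, L, k0, sigma = 1024, 40.0, 2.0, 1.0
x = np.linspace(-L / 2, L / 2, N, endpoint=False)
dx = x[1] - x[0]
psi = np.exp(1j * k0 * x) * np.exp(-x**2 / (4 * sigma**2))
psi /= np.sqrt(np.sum(np.abs(psi)**2) * dx)     # normalize so |psi|^2 integrates to 1

# Momentum-space density |psi_hat(k)|^2 via the discrete Fourier transform,
# with k the physical frequencies corresponding to the FFT bins.
psi_hat = np.fft.fft(psi)
k = 2 * np.pi * np.fft.fftfreq(N, d=dx)
density = np.abs(psi_hat)**2
mean_k = np.sum(k * density) / np.sum(density)  # expected momentum, approx k0
```

Here `mean_k` comes out very close to k0 = 2.0; comparing this Fourier-based notion with the alternatives below, in situations where they disagree, is the heart of the project.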

Alternatively, if the velocity of the particle is determined by the wave function, for example through a guiding law, then the momentum of a massive particle can be defined as the product of the mass and the velocity (appropriately modified in the relativistic setting). For massless particles, on the other hand, this does not seem to be a viable alternative.

For a system of *N*≥ 2 particles, the situation is more complicated, since there is just one wave function associated with the entire system, not *N* different wave functions, one for each particle. For spinless particles the conditional wave functions of individual particles (conditioned on the positions of all the other particles), are a viable alternative, but this is not the case for particles with spin degrees of freedom.

Participants in this REU project are asked to examine various notions of momentum that can be defined for individual particles in an N-body quantum-mechanical system in a variety of contrasting situations: relativistic vs. nonrelativistic, massive vs. massless, fermions vs. bosons, spinless vs. with spin, etc., and develop a list of "pros and cons" for each notion. Restricting themselves to simplest one-dimensional situations where such computations are feasible, they would then calculate quantum trajectories for each particle in an interacting system using randomly-distributed initial positions and appropriate guiding laws; explore to what extent the wave function of the system is in the scattering regime close to being locally a pure product or locally a plane wave; draw conclusions about which notion of momentum is more appropriate in each situation; and, if possible, declare a winner overall.

**Genome folding and function: from DNA base pairs to nucleosome arrays and chromosomes**

**Project #: DC2020-18**

Mentor: **Wilma Olson**, *Chemistry and Chemical Biology*

**Abstract:** Despite the astonishing amount of information now known about the sequential makeup and spatial organization of many genomes, there is much to learn about how the vast numbers of DNA base pairs in these systems contribute to their three-dimensional architectures and workings. The chemical structure of DNA contains information that dictates the observed spatial arrangements of base pairs within the double helix and the susceptibilities of different base sequences to interactions with proteins and other molecules. Our group has been developing software to analyze the spatial and energetic codes within known DNA structures and using these data in computational studies of larger nucleic acid systems (1-3). By representing the DNA bases as sets of geometric objects, we can capture the natural deformability of DNA as well as the structural details of ligand-bound interactions (4). There are now sufficient data to examine the influence of sequence context, i.e., the effects of immediate neighbors, on the apparent motions of the bases and to develop simple functions that capture this information. The observed behavior can be incorporated in computational treatments to improve our understanding of macromolecular systems in which DNA sequence and protein binding play important roles. For example, DNA elements that regulate gene expression often lie far apart along genomic sequences but come close together during genetic processing. The intervening residues form loops, mediated by proteins, which bind at the sequentially distant sites. We have developed methods that take account of both DNA and protein in simulations of loop formation and successfully captured experimental observations (2,3).

Our recent studies of long-range communication between regulatory elements on short nucleosome-decorated DNA (5,6) provide a basis for linking the 'local' arrangements of nucleosomes in these systems to higher-order macromolecular structures, such as the looped architectures detected within chromosomes (7). We are developing new algorithms to characterize the spatial arrangements of nucleosomes and incorporating the collective information in nucleosome-level depictions of chromosomal DNA (8,9). The build-up of structural components in our treatment of DNA and chromatin offers new insight into the stepwise processes that underlie genome organization and processing.

References:

1. Olson, W.K., A.A. Gorin, X.-J. Lu, L.M. Hock and V.B. Zhurkin. 1998. DNA sequence-dependent deformability deduced from protein-DNA crystal complexes. Proc. Natl. Acad. Sci., USA 95:11163-11168.

2. Czapla, L., D. Swigon and W.K. Olson. 2006. Sequence-dependent effects in the cyclization of short DNA. J. Chem. Theor. Comp. 2:685-695.

3. Wei, J., L. Czapla, M.A. Grosner, D. Swigon and W.K. Olson. 2014. DNA topology confers sequence specificity to nonspecific architectural proteins. Proc. Natl. Acad. Sci. USA 111:16742-16747.

4. Tolstorukov, M.Y., A.V. Colasanti, D. McCandlish, W.K. Olson and V.B. Zhurkin. 2007. A novel roll-and-slide mechanism of DNA folding in chromatin: implications for nucleosome positioning. J. Mol. Biol. 371:725-738

5. Kulaeva, O.I., G. Zheng, Y.S. Polikanov, A.V. Colasanti, N. Clauvelin, S. Mukhopadhyay, A.M. Sengupta, V.M. Studitsky and W.K. Olson. 2012. Internucleosomal interactions mediated by histone tails allow distant communication in chromatin. J. Biol. Chem. 287:20248-20257.

6. Nizovtseva, E.V., N. Clauvelin, S. Todolli, Y.S. Polikanov, O.I. Kulaeva, S. Wengrzynek, W.K. Olson and V.M. Studitsky. 2017. Nucleosome-free DNA regions differentially affect distant communication in chromatin. Nucleic Acids Res 45:3059-3067.

7. Rao, S.S.P., M.H. Huntley, N.C. Durand, E.K. Stamenova, I.D. Bochkov, J.T. Robinson, A.L. Sanborn, I. Machol, A.D. Omer, E.S. Lander and E.L. Aiden. 2014. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159:1665-1680.

8. Clauvelin, N., P. Lo, O.I. Kulaeva, E.V. Nizovtseva, J. Dias, J. Zola, M. Parashar, V. Studitsky and W.K. Olson. 2015. Nucleosome positioning and composition modulate in silico chromatin flexibility. J. Phys.: Condens. Matter 27:064112.

9. Todolli, S., P.J. Perez, N. Clauvelin and W.K. Olson. 2017. Contributions of sequence to the higher-order structures of DNA. Biophys J. 112:416-426.

**Deep genomic analysis of tumor specimens**

**Project #: DC2020-19**

Mentor: **Hossein Khiabanian**, *Rutgers Cancer Institute of New Jersey*

**Abstract:** Understanding complexity, dynamics, and stochastic patterns in genomic data - concepts native to physics and mathematics - is critical for elucidating how disease states originate and evolve. The main focus of this project is to use statistical and information-theoretic approaches to dissect the cellular and molecular heterogeneity that enables cancer cells to evolve. We use high-throughput DNA sequencing methods, and computationally integrate genetic and clinical data from large cohorts of cancer patients being treated under various precision medicine programs, with the aim of identifying markers of Darwinian selection using longitudinal data. Our ultimate goal is to develop novel statistical platforms for the fast translation of genomic data into clinical practice.

Proficiency in programming and statistical analysis is essential; knowledge of genomics and biology is not required but is highly valued.

**Genomic data-guided mathematical modeling of cancer**

**Project #: DC2020-20**

Mentor: **Subhajyoti De**, *Rutgers Cancer Institute of New Jersey*

**Abstract:** Cancer is the second leading cause of death in adults, claiming the lives of more than half a million Americans every year. A tumor starts from a single rogue cell in the body that becomes defective; the clones of that initial cell grow (or die out) and acquire new mutations stochastically, depending on certain parameters, and by the time of detection the tumor has ~10^{9} cells. During treatment, the patient initially responds and the tumor shrinks, but the remaining cells inevitably develop mutations that give them drug resistance, and the tumor starts to grow again until the patient dies. It is becoming apparent that computational algorithms and mathematical frameworks can identify subtle patterns in patient profiles, which can help with early detection and personalized treatment, improving patient outcomes. For instance, we are developing mathematical frameworks to model tumor development and the emergence of resistance against cancer drugs using the Galton-Watson branching process - an area of stochastic processes that also has applications in diverse areas (e.g., option pricing, predicting viral outbreaks, news propagation on social networks). We are combining mathematical models with clinical data from cancer patients to ask some fundamental questions: (i) how long does it take a tumor to grow, and can we detect cancer early? (ii) can we identify the emergence of resistance very early and change the treatment strategy accordingly? The project will aim to develop mathematical frameworks to model tumor growth using the branching process.
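One classical Galton-Watson computation gives a flavor of the framework: the probability that the clone founded by a single cell eventually dies out is the smallest fixed point of the offspring probability generating function. A minimal sketch (the offspring distribution below is purely illustrative):

```python
def extinction_probability(offspring_pmf, tol=1e-12, max_iter=10000):
    """Extinction probability of a Galton-Watson process: the smallest
    fixed point q = f(q) of the offspring generating function
    f(q) = sum_n p_n * q^n, found by iterating from q = 0.

    `offspring_pmf[n]` is the probability a cell leaves n daughter cells.
    """
    q = 0.0
    for _ in range(max_iter):
        new_q = sum(p * q**n for n, p in enumerate(offspring_pmf))
        if abs(new_q - q) < tol:
            return new_q
        q = new_q
    return q

# Illustrative supercritical case: a cell dies with probability 0.3 or
# divides into two daughters with probability 0.7. The clone dies out
# with probability 0.3/0.7; otherwise it grows without bound, which is
# the mathematical skeleton of tumor initiation and relapse.
q = extinction_probability([0.3, 0.0, 0.7])
```

When the mean offspring number is at most 1 (the subcritical or critical case), the same iteration returns 1: every clone eventually dies out, which is why only cells whose mutations push the process supercritical seed a tumor.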

Proficiency in computer programming (Java, C++, or Python) and familiarity with probability theory, concepts of stochastic processes, and optimization techniques are required. Knowledge of biology is not required.

**Data-driven precision medicine approaches to cancer**

**Project #: DC2020-21**

Mentor: **Subhajyoti De**, *Rutgers Cancer Institute of New Jersey*

**Abstract: **One in four people is expected to develop cancer in their lifetime. No two patients are alike, and we need better strategies to detect cancer early, find the right treatment for each patient, and monitor their outcomes using a personalized medicine approach. Data science approaches using state-of-the-art computational algorithms and statistical frameworks can identify subtle patterns by sifting through large amounts of information in patient profiles, which can help with early detection and personalized treatment, improving patient outcomes. Within the scope of the Precision Medicine Initiative at the Rutgers Cancer Institute, for each patient we have troves of data on genetic mutations as well as clinical and pathological reports. We are developing and implementing data science techniques to ask some fundamental questions: (i) can we distinguish aggressive tumors and treat patients accordingly? (ii) can we identify the emergence of resistance very early and change the treatment strategy accordingly? The project will aim to develop data science frameworks to perform integrative analysis of cancer genomics and clinical data.

Proficiency in computer programming (Java, C++, or Python), experience with statistical analysis software (R or SAS), and familiarity with concepts such as linear models, model selection, and support vector machines are preferred. Knowledge of biology is not required.

**Protecting Vulnerable Communities from Targeted Violence and Mass Casualty Attacks**

**Project #: DC2020-22**

Mentor: **Ava Majlesi**, *Miller Center for Community Protection & Resilience/Center for Critical Intelligence Studies* & **Sassi Rajput**, *Center for Critical Intelligence Studies*

**Abstract: **The overarching goal of the Miller Center for Community Protection and Resilience (Miller Center) and its partners is to assist vulnerable communities, particularly communities of faith, to enhance their safety and their standing in society by improving their relationships with law enforcement, with other government agencies, and with other vulnerable communities. This goal is achieved by identifying and disseminating best practices, offering police-community workshops, consulting with and assisting vulnerable populations on security and civil liberties issues, and engaging in research relevant to the protection of vulnerable populations.

The need for the establishment of a Center for Community Protection and Resilience dedicated to the protection of vulnerable populations is a direct outgrowth of Rutgers' work over the past several years on the Faith-Based Communities Security Program. That program, which was launched in May 2014 in the wake of a lethal terrorist attack on the Jewish Museum in Brussels, was founded in recognition of a rising tide of anti-Semitism in Europe and America and of intolerance generally. The reality that almost a billion people now live in countries where they were not born, coupled with the continued struggles of historic minority populations, means that we live as never before in a world of vulnerable populations. Events since the inception of the program - terrorist attacks in public settings such as stadiums, cafes, subway stations and airports, desecration of religiously affiliated buildings, schools and homes, and mass killings in churches, mosques, temples, and synagogues - have, if anything, served to underscore both the vulnerabilities of certain populations and the growing levels of violence - verging, in some cases, on outright genocide - directed at them.

The Center is in the process of developing an online guide of best practices to better protect vulnerable communities in the United States from mass casualty attacks and targeted violence. Student researchers will assist in updating this guide by employing quantitative analysis to assess the levels of preparedness and resilience demonstrated by houses of worship and vulnerable communities. Students will analyze social media and other components to further understand the relationships and networks between vulnerable communities, law enforcement, the public and the private sector in protecting communities and preventing incidents. Students will also assist in an effort to assess the impact of the Miller Center's training programs on communities served (e.g. Brussels, Belgium; Whitefish, Montana). This will include the development of survey instruments, qualitative interviews with program participants, and assessment of quantitative and qualitative data.

The Center for Critical Intelligence Studies, through the Intelligence Community Center for Academic Excellence, will fund three REU Fellows to work on this research effort with the Miller Center for summer 2020.

**Homeland Security and Policing Data Analysis**

**Project #: DC2020-23**

Mentor: **Christie Nelson**, *Masters of Business and Science in Analytics and CCICADA*

**Abstract: **The advancement and widespread use of technology across the nation have made digital forensics and data analysis an important component of the criminal justice system. The purpose of this project is to analyze digital forensics training and certification requirements for Department of Homeland Security investigative units and state and local law enforcement, identify opportunities and gaps in digital forensics training, and recommend digital forensics training and certification pathways. Another piece of this project is to use data gathering and data analysis to improve policing situational awareness: to understand what public data is available and how police can utilize this information through data mining and analysis.

This project is best suited to students with a background in: Math, Computer Science, Statistics, Data Mining/Machine Learning, Cyber Security, and related fields.

**Sum edge coloring of multigraphs**

**Project #: DC2020-24**

Mentor: **Atefeh Mohajeri**, *Math and Algorithmic Group - Nokia Bell Labs*

**Abstract: **We consider a scheduling problem that is presented by a conflict graph G = (V, E), where the vertex set V represents processors and an edge (v_{1}, v_{2}) ∈ E means that processors v_{1} and v_{2} are competing over a resource; i.e., they cannot run their jobs simultaneously. Our objective is to minimize the average response time, which is equivalent to minimizing the sum of completion times. In graph-theoretical language, one can rephrase this problem as follows: we want to color the vertices of G using positive integers such that vertices that share an edge receive different colors. The sum coloring problem asks for a coloring of the vertices of G such that the sum of the colors is minimum. A coloring that achieves this minimum is called an optimum sum coloring, and the number of colors in such a coloring is called the strength of the graph. The sum edge coloring problem and the edge strength of a graph are defined in a similar way. Both the sum (edge) coloring problem and computing the (edge) strength of a graph are NP-hard. We propose to study upper bounds on the strength of multigraphs (we allow multiple edges between two vertices). From the algorithmic point of view, we will explore classes of graphs for which there are polynomial-time algorithms to find a minimum sum edge coloring.
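
For intuition, the vertex version of the problem (the edge version is the same problem on the line graph) can be explored with a simple greedy heuristic. The sketch below is only exploratory, not an optimal algorithm; the star example is a standard illustration that the coloring order affects the total sum:

```python
def greedy_sum_coloring(adj, order):
    # Assign each vertex, in the given order, the smallest positive
    # integer not already used by one of its colored neighbors.
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 1
        while c in used:
            c += 1
        color[v] = c
    return color

# Star K_{1,3}: coloring the leaves first gives total 1+1+1+2 = 5,
# while coloring the center first gives total 1+2+2+2 = 7.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
leaves_first = greedy_sum_coloring(star, [1, 2, 3, 0])
center_first = greedy_sum_coloring(star, [0, 1, 2, 3])
assert sum(leaves_first.values()) == 5
assert sum(center_first.values()) == 7
```

Note that the star already shows that a proper coloring minimizing the number of colors need not minimize the sum: the sum-optimal coloring of K_{1,n} uses two colors but gives color 1 to the leaves, not to the center.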

Another component of the project will focus on exploring potential applications of neural networks to get approximation algorithms for some classes of graphs. This is motivated by the fact that deep neural networks have recently been shown to be effective in finding approximations for resource allocation problems.

**Decentralized Graph Coloring**

**Project #: DC2020-25**

Mentor: **Sepehr Assadi**, *Computer Science*

**Abstract: **It is well-known that any graph with maximum degree Δ can be properly colored with Δ+1 colors. There is a textbook greedy algorithm for this problem that simply iterates over the vertices in arbitrary order and assigns each vertex a color that does not appear among its neighbors. However, this algorithm is highly centralized and is thus not suitable in many distributed settings. Motivated by applications to channel selection, a highly decentralized algorithm for Δ+1 coloring is proposed in [2]. In this algorithm, we initially pick a color for each vertex independently and uniformly at random from {1,...,Δ+1}. Then, at any time step, a single conflicted vertex v, namely a vertex that is assigned the same color as one of its neighbors, is chosen uniformly at random. We then pick a color for v uniformly at random from {1,...,Δ+1} and repeat the same process until there are no conflicted vertices. It is shown in [2] that this algorithm can be implemented in a highly decentralized setting. The question now is how long this algorithm takes to converge to a proper Δ+1 coloring of G. It is already shown in [2] that this algorithm converges in expected O(n*Δ) steps (of recoloring). However, no lower bound better than Ω(n*logΔ) is known for this problem. Moreover, a recent work of [3] provides strong evidence that the correct answer may indeed be O(n*logΔ) steps. The goal of this project is to investigate this possibility and provide better upper (or lower) bounds on the expected runtime of this very natural algorithm. It is worth mentioning that recent work of the PI on sublinear algorithms for (Δ+1) coloring [1] suggests an approach for addressing this question.
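
The recoloring process described above is easy to experiment with. The sketch below is a direct, centralized simulation of the decentralized dynamics of [2], written only to illustrate the step count being studied:

```python
import random

def decentralized_coloring(adj, delta, seed=0):
    """Simulate the process of [2]: start from a uniformly random coloring
    with Delta+1 colors, then repeatedly pick a conflicted vertex uniformly
    at random and recolor it uniformly at random from {1, ..., Delta+1}.
    Returns the final proper coloring and the number of recoloring steps."""
    rng = random.Random(seed)
    colors = list(range(1, delta + 2))
    c = {v: rng.choice(colors) for v in adj}
    steps = 0
    while True:
        conflicted = [v for v in adj if any(c[u] == c[v] for u in adj[v])]
        if not conflicted:
            return c, steps
        v = rng.choice(conflicted)
        c[v] = rng.choice(colors)  # recolor v uniformly at random
        steps += 1

# 5-cycle: maximum degree Delta = 2, so 3 colors are available.
adj = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
coloring, steps = decentralized_coloring(adj, 2)
assert all(coloring[v] != coloring[u] for v in adj for u in adj[v])
```

Averaging `steps` over many seeds and graph families is a natural way to form conjectures about the true convergence rate before attempting a proof.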

[1] Sepehr Assadi, Yu Chen, Sanjeev Khanna: Sublinear Algorithms for (Δ + 1) Vertex Coloring. SODA 2019: 767-786

[2] Apurv Bhartia, Deeparnab Chakrabarty, Krishna Chintalapudi, Lili Qiu, Bozidar Radunovic, Ramachandran Ramjee: IQ-Hopping: distributed oblivious channel selection for wireless networks. MobiHoc 2016: 81-90

[3] Deeparnab Chakrabarty, Paul de Supinski: On a Decentralized (Δ+1)-Graph Coloring Algorithm. SOSA 2020.

**Deterministic Streaming Algorithms for Approximating Median**

**Project #: DC2020-26**

Mentor: **Sepehr Assadi**, *Computer Science*

**Abstract: **We are given a stream of numbers e_1, e_2, ..., e_n from a large universe U, and our goal is to output an epsilon-approximate median of these numbers, namely, any number among e_1, ..., e_n whose rank is between n/2 - epsilon*n and n/2 + epsilon*n. How much space is needed to achieve this task? This is a classical problem in the streaming model and has been studied extensively over the years. In particular, [2] gave a randomized algorithm that solves this problem by storing only O(1/epsilon) numbers from the stream, which can be shown to be optimal. However, for deterministic algorithms, the best known algorithm, due to [1], requires storing O((1/epsilon)*log(epsilon*n)) numbers. The goal of this project is to tighten the gap between the upper and lower bounds for deterministic algorithms for this problem. It is worth mentioning that the algorithm of [1] is quite intricate, and in fact simplifying (the analysis of) this algorithm alone would be a very nice contribution (as has already been asked in several places in the literature).
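
As a simple randomized baseline for experiments (this is uniform reservoir sampling, not the optimal algorithm of [2] nor the deterministic algorithm of [1]), one can sample a small set of stream elements and output the sample median; a sample of size Theta((1/epsilon^2)*log(1/delta)) yields an epsilon-approximate median with probability at least 1 - delta:

```python
import random

def streaming_approx_median(stream, k=1000, seed=3):
    # Maintain a uniform sample of k stream elements via reservoir
    # sampling (one pass, O(k) space), then return the sample median.
    rng = random.Random(seed)
    reservoir = []
    for i, x in enumerate(stream):
        if len(reservoir) < k:
            reservoir.append(x)
        else:
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = x
    reservoir.sort()
    return reservoir[len(reservoir) // 2]

n = 10**5
est = streaming_approx_median(range(n))
assert abs(est - n // 2) < 0.1 * n  # a 0.1-approximate median of 0..n-1
```

The interesting question in the project is precisely what happens when the algorithm is not allowed any randomness of this kind.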

[1] Michael Greenwald, Sanjeev Khanna: Space-Efficient Online Computation of Quantile Summaries. SIGMOD Conference 2001: 58-66

[2] Zohar S. Karnin, Kevin J. Lang, Edo Liberty: Optimal Quantile Approximation in Streams. FOCS 2016: 71-78

**Algebraic invariants of pretzel knots**

**Project #: DC2020-27**

Mentor: **Kristen Hendricks**, *Mathematics*

**Abstract: **Heegaard Floer homology is a powerful package of invariants of 3-manifolds and knots that has been extremely useful in answering deep topological questions in the past two decades. Although these invariants notionally come from counting solutions to partial differential equations in high-dimensional manifolds - which is hard! - one of the great things about them is that in certain special cases they can be computed much more straightforwardly. Therefore, the fundamental problem at the heart of this project is the sort of thing that's straightforward to describe and work through, even though there's lots of interesting context we can also discuss in parallel to working on it.

**Mutations of polynomials**

**Project #: DC2020-28**

Mentor: **Christopher Woodward**, *Mathematics*

**Abstract: **Mutations arise in mirror symmetry and so-called cluster theory when there are different natural coordinate systems on the same solution set. For example, consider the set of 2-dimensional subspaces of a 4-dimensional vector space. Taking two basis vectors we get a 2x4 matrix, and any two of the determinants of the 2x2 matrices inside the resulting 2x4 matrix give coordinates for the set. The various coordinate systems are related by mutations. We will explore this and other examples.

More concretely, a mutation of a polynomial f(x,y) in two variables x, y with factor h(x,y) is one obtained by replacing each monomial x^i y^j with x^i y^j h(x,y)^(l(i,j)), where l(i,j) is a power that depends linearly on the exponents i, j. For example, a mutation is given by replacing y with y(1+x)^(-1) and multiplying by (1+x)^(-2). This mutation applied to f(x,y) results in the polynomial f'(x,y) = (1+x) + 2y + y^2. If two polynomials are related by mutations, we say they are mutation-equivalent. Problem: classify the mutation-equivalence classes of polynomials. There is a conjecture of Petracci and collaborators about what the mutation-equivalence classes are. We will investigate a version of this conjecture.
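
Substitutions of the form y → y(1+x)^(-1) are easy to check computationally. In the sketch below the source polynomial f is chosen purely for illustration (it is not the unspecified f from the example above), with the exponent rule x^i y^j → x^i y^j (1+x)^(-j), so that the mutated expression is again a polynomial; exact rational arithmetic verifies the identity on a grid of points:

```python
from fractions import Fraction

def f(x, y):
    # Illustrative choice: f(x, y) = (1 + y*(1+x))^2.  Its y^j-coefficient
    # is divisible by (1+x)^j, so the mutation below lands on a polynomial.
    return (1 + y * (1 + x)) ** 2

def mutated(x, y):
    # Mutation with factor h = 1+x and l(i, j) = -j, i.e. y -> y/(1+x).
    return f(x, y / (1 + x))

def target(x, y):
    # Expected result of the mutation: (1 + y)^2 = 1 + 2y + y^2.
    return 1 + 2 * y + y ** 2

# Exact check at rational sample points (enough to certify equality of
# polynomials of this degree).
for a in range(1, 6):
    for b in range(1, 6):
        x, y = Fraction(a, 7), Fraction(b, 7)
        assert mutated(x, y) == target(x, y)
```

The same kind of exact experiment (with a computer algebra system for symbolic expansion) is a practical way to search for mutation-equivalences between candidate polynomials in the project.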

**Research into Smale's 7th problem**

**Project #: DC2020-29**

Mentor: **Michael Kiessling**, *Mathematics*

**Abstract: **

**Project #: DC2020-30**

Mentor: **Shay Moran**, *Google Mind, Princeton*

**Abstract: **

**Project #: DC2020-31**

Mentor: **Ron Holzman**, *Mathematics, Princeton University*

**Abstract: **

**Project #: DC2020-32**

Mentor: **Martin Farach-Colton**, *Computer Science*

**Abstract: **

**Project #: DC2020-33**

Mentor: **Daniel Ketover**, *Mathematics*

**Abstract: **

Return to REU Home Page

DIMACS Home Page

DIMACS Contact Information

Page last modified on January 21, 2020.
