2018 DIMACS REU Proposed Project Descriptions

Please revisit this page frequently, since additional projects may be posted through January. An application, including your preference for the offered projects, can be updated by using the link received after your registration.

*Note that you must be a U.S. Citizen or Permanent Resident to be eligible for these projects.

Characterizing the Quality of 3D-Printed Parts using Spatiotemporal Data Analytics

Project #: DC2018-01
Mentor: Weihong 'Grace' Guo, Industrial and Systems Engineering

Abstract: Additive manufacturing, or three-dimensional (3D) printing, is a promising technology that enables the direct fabrication of complex shapes and eliminates the waste associated with traditional manufacturing. A major issue that adversely affects its performance, and hence wider adoption, is that material solidification in the printing process yields geometric shape deviation. This study focuses on the analysis of 3D surface measurements of 3D-printed parts. Dimension, contour, and surface roughness are measured and represented in image data. We are developing methods for extracting spatiotemporal patterns from the measurement images and then correlating the patterns with part quality. The student will have access to the 3D printers in Rutgers ISE and the Keyence 3D measurement system in Dr. Guo's lab. Familiarity with R is a must.

Statistics, data analytics, R

The Minimum Circuit Size Problem: A possible NP-Intermediate Problem

Project #: DC2018-02
Mentor: Eric Allender, Computer Science

Abstract: The Minimum Circuit Size Problem (MCSP) is a computational problem in NP that is widely believed to be hard to compute, although many researchers suspect that it is NOT NP-complete. However, there is currently only weak evidence to support this suspicion. This REU project will endeavor to find stronger evidence, of the form "If MCSP is NP-complete, then something weird happens". Another possible direction involves clarifying the relative complexity of MCSP and other candidate NP-Intermediate problems. The PI has recently proved some new lower bounds on the complexity of MCSP and related problems, and it should be possible to build on this recent progress. The PI has previously supervised research with REU students on this topic which has led to publications, and there is more work to be done, which should be accessible to an REU student.

- Eric Allender, Joshua Grochow, Dieter van Melkebeek, Cristopher Moore, Andrew Morgan, Minimum Circuit Size, Graph Isomorphism and Related Problems, Innovations in Theoretical Computer Science (2018), to appear.
- Eric Allender, Dhiraj Holden, Valentine Kabanets: The Minimum Oracle Circuit Size Problem. Computational Complexity 26 (2017) 469-496.
- Eric Allender, Bireswar Das: Zero Knowledge and Circuit Minimization. MFCS (2) 2014: 25-32. (Expanded version accepted to the journal Information and Computation.)
- Cody D. Murray, Richard Ryan Williams: On the (Non) NP-Hardness of Computing Circuit Complexity, Theory of Computing, 13 (2017) 1-22.
- Michael Rudow: Discrete Logarithm and Minimum Circuit Size, Information Processing Letters 128 (2017) 1-4.

Genomic data-guided mathematical modeling of cancer

Project #: DC2018-03
Mentor: Subhajyoti De, Rutgers Cancer Institute of New Jersey

Abstract: Cancer is the second leading cause of death in adults, claiming the lives of more than half a million Americans every year. Tumor starts from a single rogue cell in the body, which becomes defective, and the clones of that initial cell start to grow (or die out), and acquire new mutations depending on certain parameters stochastically; and by the time of detection the tumor has ~109 cells. During treatment, the patient initially responds and the tumor shrinks, but remaining cells inevitably develop mutations that give them drug resistance, and the tumor starts to grow again until the patient dies. It is becoming apparent that computational algorithms and mathematical frameworks can identify subtle patterns in patient profiles, which can help with early detection and personalized treatment, improving their outcome. For instance, we are developing mathematical frameworks to model tumor development and emergence of resistance against cancer drugs, using Galton Watson branching process - an area of stochastic processes, which also has applications in diverse areas (e.g. option pricing, predicting viral outbreak, news propagation on social networks). We are combining mathematical models with clinical data from cancer patients to ask some fundamental questions - (i) how long does it take a tumor to grow, and can we detect cancer early? (ii) can we identify emergence of resistance very early and accordingly change the treatment strategy? The project will aim to develop mathematical frameworks to model tumor growth using branching process.
Proficiency in computer programming (Java, C++, or python), familiarity with probability theory, concepts of stochastic processes, and optimization techniques are required. Knowledge in Biology is not required.

Data-driven precision medicine approaches to cancer

Project #: DC2018-04
Mentor: Subhajyoti De, Rutgers Cancer Institute of New Jersey

Abstract: One in four people is expected to develop cancer in their lifetime. No two patients are alike, and we need better strategies to detect cancer early, find right treatment for the patients, and monitor their outcome using a personalized medicine approach. Data science approaches using state-of-the-art computational algorithms and statistical frameworks can identify subtle patterns sifting through large amount of information in patient profiles, which can help with early detection and personalized treatment, improving their outcome. Within the scope of the Precision Medicine Initiative at the Rutgers Cancer Institute, for each patient we have troves of data on genetic mutations, and clinical and pathological reports. We are developing and implementing data science techniques to ask some fundamental questions - (i) can we distinguish aggressive tumor and treat patients accordingly? (ii) can we identify emergence of resistance very early and accordingly change the treatment strategy? The project will aim to develop data science frameworks to perform integrative analysis of cancer genomics and clinical data.
Proficiency in computer programming (Java, C++, or python), statistical analysis software (R or SAS), familiarity with concepts such as linear models, model selection, support vector machines are preferred. Knowledge in Biology is not required.

Kinetic Hypergraphs for Online Social Networks

Project #: DC2018-05
Mentor: Mubbasir Kapadia, Computer Science

Abstract: In this project, we will develop graph-theoretic models of how information propagates in online social networks. In particular, we will use kinetic hypergraphs to model social networks at any scale, from interactions between individual users to large social communities.

Projects in Pure Math

Project #: DC2018-06
Mentor: Faculty of the Math Department, Mathematics

Abstract: The Math REU Coordinator is Prof. Anders Buch. Several projects will be run in 2018 by members of the pure math faculty. Past topics have included Schubert Calculus (Buch), Ricci flow(Song), Mathematical Physics and General Relativity (Soffer), and mathematical Yang-Mills theory (Woodward). Applicants interested in a project in pure mathematics should list this project in their application and indicate their specific Math interests; further details on the projects will be worked out in the Spring.

Approximate computing: An effective likelihood-free method with statistical guarantees

Project #: DC2018-07
Mentor: Minge Xie, Statistics and Biostatistics & Suzanne Thornton, Statistics and Biostatistics

Abstract: Approximate Bayesian computing (ABC) is a powerful likelihood-free method that has grown increasingly popular since early applications in population genetics. However, complications arise in the theoretical justification for Bayesian inference when using ABC with a non-sufficient summary statistic. In this project, we study a new computational technique called 'approximate confidence distribution computing' (ACC), that yields theoretical support for the use of non-sufficient summary statistics in likelihood-free methods. A general strategy of constructing a data-dependent prior will be explored, which can often increase the computing speed while ensuring the inferential integrity. We use numerical examples to illustrate the benefits of the ACC method, namely the potential for broader applications than ABC and the increased computing speed compared to ABC.

Psychological theory and statistical analysis for usable security

Project #: DC2018-08
Mentor: Janne Lindqvist, Electrical and Computer Engineering

Abstract: The Rutgers Human Computer Interaction and Security Engineering Lab is recruiting a summer intern through the DIMACS REU program. We will be recruiting a student to contribute to our research on the application of psychological theory to the study of user security. We will study the cognitive mechanisms responsible for password usability and the psychological principles driving security decisions. Our project will engage the summer intern in every step of the research process including research design, statistical analysis and provide the intern with a unique exposure to the application of psychology and statistics in human-computer interaction and usable security.

Iterative Data Complexes

Project #: DC2018-09
Mentor: James Abello, DIMACS

Abstract: One of the main goals of this project is to algorithmically explore data that can be represented as a collection of subsets from a specified universe of entities U. The exploration starts by finding a special maximal antichain of "data regions" from the input collection. This maximal antichain is obtained via an iterative edge decomposition proposed in [AQ13].
We want to address the following questions under the lenses of simplicial complexes.
a. What is the relation between the iterative edge decomposition found by the algorithm of [AQ13] and the concept lattice [AP06] associated with the "data regions" of the input collection?
b. Since each "layer value" obtained by the iterative edge decomposition is a lower bound on the layer connectivity, the k-connected components on each layer can be considered as "quasi-concepts". Under this view, what is the "semantic interpretation" of such quasi-concepts in the original data collection?
c. The Data Complexes used in this project can be embedded naturally in 3d space and a natural question is: what are the "natural" queries that will facilitate the navigation of this 3d embedding?
d. What are "intuitive" visual exploration primitives that will aid a user build a story that "summarizes" the input collection?
Required Background:
Since this project is a team effort, the following "collective team background" is necessary to ensure a positive outcome: Discrete Math, Algorithms, C/C++ and/or Java, JavaScript, Python, MapReduce-Hadoop foundations.
Team Deliverables
a. Extensions of our current pool of "Edge Decomposition based Graph Algorithms" to general Data Complexes.
b. Cloud based Implementations of the exploration algorithms
c. Semantic analysis of Hypergraph decomposition "layers" that are conducive to the creation of "data stories" that summarize the exploration algorithms findings.
d. Interface to drive user's exploration of very large data sets.
[AQ13] J. Abello, F. Quelroy, "Fixed Points of Degree Peeling", ASONAM 2013, Advances in Social Networks, IEEE/ACM International Conference, Niagara Falls, Canada, August 2013.
[AMS16] J. Abello, D. Mawhirter, Kevin Sun, "Taming a Graph Hairball: Local Exploration in a Global Context", Invited Chapter to appear in New Ideas in Business and Consumer Analytics, P. Moscato and N. Devries, Editors, Springer Verlag, 2016.
[AP06] J. Abello and A. Pogel, "Graph Partitions and Concept Lattices", In Discrete Methods in Epidemiology, vol. 70 of the AMS-DIMACS Series, Co-edited by J. Abello and G. Cormode, pp 115 - 138, 2006.
[AHK06] J. Abello, F. van Ham, and N. Krishnan, "Ask-GraphView-: A Large Scale Graph Visualization System", IEEE Transactions in Visualization and Computer Graphics, 12(5): 669-676 (2006)
[ALLM13] L. Aronshtam, N. Linial, T. Luczak, and R. Meshulam, "Collapsibility and vanishing of top homology in random simplicial complexes," Discrete Comput. Geom., vol. 49, no. 2, pp. 317-334, 2013.
[LP14] N. Linial and Y. Peled, "On the phase transition in random simplicial complexes," ArXiv e-prints, Oct. 2014.

Toward the Design Optimization of Multi-Robot Systems

Project #: DC2018-10
Mentor: Jingjin Yu, Computer Science

Abstract: Multi-robot and multi-vehicle systems have begun to prove their utility in recent years, with the most prominent example being the automation of warehouse systems by Amazon (see www.amazonrobotics.com, www.youtube.com/watch?v=Fr6Rco5A9SM). Other examples include autonomous vehicular systems and autonomous farming robots (www.public.harvestai.com). A key factor in rendering multi-robot systems more efficient is the opimization of the design of both the system (e.g., a "road network" for the robots or vehicles) and the algorithms for routing the robots within the system. As an everyday example, recent years have seen the rise of traffic circles to alleviate congestions at road intersections. These structures make it unnecessary for cars to wait for traffic signals and at the same time allows the card to avoid delay from using stop signs due to the start and stop of vehicles. There is a deeper connection here: the design of the system critically impacts the computational complexity of routing robots/vehicle through the system [1].
In this REU project, the student will explore the interplay between the system design and routing algorithms in optimizing the throughput of multi-robot systems. Depending on the student's background and preference, theoretical or practical aspects (or both) of the problem may be explored. At the theoretical end, one will explore the structure induced by the problem and how it may be exploited. At the more experimental end, simulation and experimental studies (https://youtu.be/n9DOtgHHyGY and [2]) may be carried out to gain empirical understanding of the structural behavior of multi-robot systems.

[1] J. Yu. Intractability of Optimal Multi-Robot Path Planning on Planar Graphs . IEEE Robotics and Automation Letters
[2] J. Yu and Daniela Rus. An Effective Algorithmic Framework for Near Optimal Multi-Robot Path Planning. The 2015 International Symposium on Robotics Research (ISRR 2015). Video: https://youtu.be/eg3tKIv2eiU

How to Google without Revealing What You Are Searching For?

Project #: DC2018-11
Mentor: Salim El Rouayheb, Electrical and Computer Engineering

Abstract: We will try to devise algorithms that answer this question using the theory of Private Information Retrieval (PIR). Strong mathematical background required with passion to develop novel algorithms. We will be using tools from linear algebra, cryptography, information theory and coding theory.

DNA Packaging and Processing

Project #: DC2018-12
Mentor: Wilma Olson, Chemistry and Chemical Biology

Abstract: All of the processes necessary for the survival of a living system hinge on its ability to store and read the genetic information encoded in its DNA. The interactions of proteins with DNA play key roles in these processes. Many regulatory proteins bind at separate, often widely spaced, sites on the double helix, tethering the intervening DNA into a loop [1]. Other architectural proteins deform the DNA at isolated sites of contact while concomitantly wrapping the DNA on their surfaces [2]. Despite the astonishing amount of information now known about the sequential make-up and spatial organization of genomic DNA, there is much to learn about how proteins contribute to the three-dimensional architectures and workings of long, spatially constrained DNA molecules. Pieces of DNA anchored by protein-protein contacts constitute discrete topological units. As long as the ends of the DNA stay in place and the double helix remains unbroken, the linking number Lk, or number of times the two strands of the double helix wrap around one another, is conserved. This constraint in Lk underlies the well known supercoiling of DNA: the stress induced by positioning the ends of the polymer in locations other than the natural (relaxed) state perturbs the overall coiling of the chain axis and/or the twisting of successive base-pairs in the intervening parts of the chain [3,4]. As a step toward understanding the effects of specific proteins on DNA organization and processing, we have been studying the spatial arrangements of successive base pairs in experimentally characterized protein-DNA structures [5,6]. There are now sufficient data to examine the influence of sequence context, i.e., the effects of immediate neighbors, on the apparent motions of the base pairs and to develop simple functions that capture this information. The observed behavior can be incorporated in existing computational treatments to improve our understanding of systems in which DNA sequence and protein binding play important roles, e.g., the looping of DNA mediated by the Escherichia coli Lac repressor and HU proteins [7,8] and the communication between regulatory proteins on chromatin [9,10]. A summer project could be devoted to the development and collection of new geometric information, to application of these results in computational studies of DNA loops and folds, and/or in analysis of simulated and experimental data.

Prerequisites: Students should have an interest (but not necessarily formal training) in molecular biology, familiarity with linear algebra in order to understand and develop parameters to describe DNA structure, and knowledge of basic chemistry and physics to understand the nature of the DNA molecules and elastic treatments of the double helix.

[1] Adhya S 1989 Multipartite genetic control elements: communication by DNA loop. Ann Rev Genet 23:227-250
[2] Luijsterburg MS, White MF, van Driel R, Dame RT 2008 The major architects of chromatin: architectural proteins in bacteria, archaea and eukaryotes. Crit Rev Biochem Mol Biol 43:393-418
[3] Bauer WR 1978 Structure and reactions of closed duplex DNA. Ann Rev Biophys Bioeng 7:287-313
[4] Clauvelin N, Tobias I, Olson WK 2012 Characterization of the geometry and topology of DNA pictured as a discrete collection of atoms. J Chem Theor Comp 8:1092-1107
[5] Lu X-J, Olson WK 2003 3DNA: a software package for the analysis, rebuilding, and visualization of three-dimensional nucleic acid structures. Nucleic Acids Res 31:5108-5121
[6] Olson WK, Gorin AA, Lu X-J, Hock LM, Zhurkin V B 1998 DNA sequence-dependent deformability deduced from protein-DNA crystal complexes. Proc Natl Acad Sci USA 95:11163-11168
[7] Wei, J, Czapla L, Grosner MA, Swigon D, Olson WK 2014 DNA topology confers sequence specificity to non-specific architectural proteins. Proc Natl Acad Sci USA 111:16742-16747
[8] Perez PJ, Clauvelin N, Grosner MA, Colasanti AV, Olson WK 2014 What controls DNA looping? Int J Mol Sci 15:15090-15108
[9] Clauvelin, N, Lo P, Kulaeva OI, Nizovtseva EV, Dias J, Zola J, Parashar M, Studitsky VM, Olson WK 2015 Nucleosome positioning and composition modulate in silico chromatin flexibility. J Phys: Condens Matter 27:064112
[10] Nizovtseva EV, Todolli S, Olson WK, Studitsky VM 2017 Towards quantitative analysis of gene regulation by enhancers. Epigenomics 9:1219-1231

Estimation and Modeling with Tensors

Project #: DC2018-13
Mentor: Anand Sarwate, Electrical and Computer Engineering

Abstract: Tensors are multi-way data arrays, a generalization of matrices to more than two dimensions. This project is on the fundamentals of using tensors to represent data. This can be challenging because the number of dimensions can make the number of parameters, or unknowns, very large. However, in many cases the tensors have some simpler structure that reflects local correlations. On the one hand, structured tensors are less complicated and statistical questions might be answerable with less data. On the other hand, the structure may hinder standard computational approaches. This project will entail studying and implementing methods for statistical estimation tasks involving structured and unstructured tensor data, with potential applications to network analysis, neuroimaging, and machine learning.

Robust Convergence of Machine Learning Algorithms

Project #: DC2018-14
Mentor: Pranjal Awasthi, Computer Science

Abstract: Machine Learning models are often trained using simple algorithms such as gradient descent. However, theoretically the performance of these algorithms is not well understood. One would like to characterize the precise conditions under which a simple algorithm converges to an optimal or a near optimal solution for a given problem. In recent years, we have managed to gain a partial understanding of this question. For instance, we now know that data generated from a mixture of well separated Gaussians can be clustered using the popular k-means algorithm.
Similarly, low rank incoherent matrices can be recovered from a small number of random entries using gradient descent. A limitation of these results is that they make strong assumptions on the underlying data generation process.
Real data is often noisy and hence it is important to study the performance of these algorithms under noise.
The goal of this project is to develop techniques to analyze the robustness of popular machine learning algorithms. The focus will be on studying the k-means algorithm for clustering and gradient descent based algorithms for matrix completion. This project will have both theoretical and empirical components and is ideal for undergraduates interested in algorithms, linear algebra, and probability.

[1] Clustering Semi-Random Mixtures of Gaussians. Awasthi, Pranjal and Vijayaraghavan, Aravindan. arXiv preprint arXiv:1711.08841. 2017
[2] Matrix completion has no spurious local minimum. Ge, Rong and Lee, Jason D and Ma, Tengyu. NIPS 2016.

Better Graph Partitioning Algorithms: A Reinforcement Learning Approach

Project #: DC2018-15
Mentor: Pranjal Awasthi, Computer Science

Abstract: Graph partitioning is one of the fundamental problems in computer science. At a high level the goal of graph partitioning is to divide a graph into similar looking pieces. Various objective functions have been studied in this context. For instance, the max-cut objective asks for a partition of a graph into two pieces such that the number edges going across the two pieces is as large as possible. On the other hand, the sparsest cut objective asks for a balanced partition such that the number of edges going across is as small as possible. While optimizing these objective functions is NP-hard, a powerful class of algorithms called semi-definite programs provide the best-known approximation guarantees for these problems. Furthermore, it is also known that these algorithms are optimal assuming the unique games conjecture, a central open question in this area.
The goal of this project is to investigate if there do exist algorithms better than semi-definite programs. The approach we will take is to use ideas from recent advances in reinforcement learning algorithms that have managed to achieve super human performance for games such as Go, Atari and many more. The project will investigate whether an appropriately set up learning algorithm can automatically discover better strategies then semi-definite programs. This project is perfect for undergraduates who like to build systems and are interested in understanding the inner workings of deep reinforcement learning.

[1] Mastering the game of Go with deep neural networks and tree search. Silver David et al. Nature 2016.

Theoretical properties of SAT-solvers

Project #: DC2018-16
Mentor: Periklis Papakonstantinou, Management Science and Information Systems

Abstract: Within the last 20 years, we progressed from solving SAT on at most 200 variables to instances involving millions of variables and clauses, which made SAT-solvers ubiquitous optimization tools. SAT-solvers are today the standard tools for solving constraint satisfaction problems. A wealth of problems from VLSI design and verification, motion and planning, cryptanalysis, and various other areas are encoded as SAT formulas and solved with the help of competitive SAT-solvers. Besides the empirical analysis of SAT-solvers, their theoretical properties are poorly understood. In fact, there is an excellent opportunity for students new to research to explore a variety of mathematical questions ranging from "simple" to "tough". This project will end up being an exciting blend of combinatorics, computer science, and theoretical statistics.
The goal is to provide theoretical analysis for SAT-solvers that deal with Random-k-SAT instances. After we fix the number of clauses m and the number of variables n, then these input instances are sampled uniformly at random. Interestingly, it has been observed that, for 3-SAT, when the ratio of m/n is 4.267 then the sampled instances are computationally very hard to satisfy. We want to devise new SAT-solving algorithms and analyze their performance for this type of "hard instances".

Elucidating tumor evolutionary patterns using high-depth molecular data.

Project #: DC2018-17
Mentor: Hossein Khiabanian, Rutgers Cancer Institute of New Jersey

Abstract: Understanding complexity, dynamics, and stochastic patterns in genomic data - concepts native to physics and mathematics - is critical for elucidating how disease states originate and evolve. The main focus of this project is to use statistical, information theoretic approaches to dissect the cellular and molecular heterogeneity that enables cancer cells to evolve and resist current treatments. We use high-throughput DNA sequencing methods, and integrate genetic and clinical data to identify prognostic markers and actionable drivers of tumor evolution. Our principal hypothesis is that drug resistance emerges from changing tumor evolutionary patterns. To test this, we will mine the large cohorts of cancer patients being treated under various precision medicine programs, and aim to identify markers of Darwinian selection using longitudinal data. Our ultimate goal is to develop novel statistical platforms for fast translation of genomic data into clinical practice.

A brain-controlled robotic system

Project #: DC2018-18
Mentor: Konstantinos Michmizos, Computer Science

Abstract: Can you imagine controlling a robot with your thoughts? We can. There are 5 areas of knowledge required for a successful presence in our lab - computer science, neuroscience, robotics, mathematics and machine learning. In doing this project, you are expected to get a glimpse in at least three of these areas. You will learn how to record brain activity using one of the most advanced EEG systems in the world, the 128-electrode Biosemi ActiveOne. You will also learn how to train a long short-term memory neural network (LSTM) model to control a robotic hand we have developed in the lab. Overall, your work will help us develop a brain-controlled robotic system that is naturally extensible to a variety of applications including prosthetics and rehabilitation robots.

Sphere Packings and Number Theory

Project #: DC2018-19
Mentor: Alex Kontorovich, Mathematics

Abstract: Participants will study automorphism groups of quadratic forms and Vinberg's theory of maximal reflection subgroups of such. These will be applied to study both the theoretical and computational aspects of crystallographic sphere packings, and their number theoretic properties. A goal of the summer is to generate a freely available database surveying all known examples in the literature, together with many new such packings. No prior knowledge of these topics is assumed.
References: A. Kontorovich and K. Nakamura, "Geometry and Arithmetic of Crystallographic Sphere Packings", https://arxiv.org/abs/1712.00147

Return to REU Home Page
DIMACS Home Page
DIMACS Contact Information
Page last modified on January 19, 2018.