2019 DIMACS REU Proposed Project Descriptions

Please revisit this page frequently, since additional projects may be posted through January. Your application, including your preference for the offered projects, can be updated by using the link received after your registration.

*Note that you must be a U.S. Citizen or Permanent Resident to be eligible for these projects.

Automating Swarming Strategies: Foundations and Algorithms

Project #: DC2019-01
Mentor: Jingjin Yu, Computer Science

Abstract: A large number of autonomous agents (e.g., humans, robots, or a mixture of these) can carry out swarming strategies that are impossible for a single agent or a few agents to realize. For example, we may want to deploy autonomous robots to guard one or multiple regions for various reasons including preventing other malicious agents from entering these regions (e.g., intrusion detection), or the spreading of adverse events (e.g., fire, disease, and so on). For an effective deployment, the robots must be assigned to locations that "best" cover the target regions based on some objectives. Swarming scenarios, which include but are not limited to the ones described above, raise many intriguing (and challenging) theoretical and algorithmic questions. The REU project will carry out fundamental studies to characterize the structures induced by these problems and design optimal algorithms for solving them. Ideally, the REU student would be familiar with standard algorithms (e.g., perhaps the first chapters of CLRS) and is comfortable with mathematical reasoning (i.e., reading/writing proofs). For some sample work from my group, please see https://arc.cs.rutgers.edu/pub.html. Please feel free to contact me for questions at jingjin.yu@cs.rutgers.edu (in which case, please attach a current transcript).

Forecasting Vehicle Fleet Mix and Life Cycle Cost for Tunnel Ventilation Systems

Project #: DC2019-02
Mentor: Weihong 'Grace' Guo, Industrial and Systems Engineering

Abstract: Owners of tunnel ventilation systems are often faced with forecasting life cycle costs for their assets, which include but are not limited to operating and replacement costs over long time-horizons such as 30 years or 50 years. Within such a time horizon, vehicle emission technology changes with the continuing growth of zero- or low-emission and hybrid vehicles and the declining growth of combustion vehicles fleet in the mix that use tunnels. As the vehicle emission technology advances rapidly, the vehicle fleet is constantly being renewed. Realistic and practical forecasting of life cycle cost of tunnel ventilation systems must include models that incorporate the ever-changing fleet mix of zero emission, hybrid and combustion vehicles. This study aims to provide tunnel ventilation systems owners and the Port Authority with a vehicle fleet mix forecast model that will (1) enable realistic and practical long-term forecasts of the life cycle costs of tunnel ventilation systems, and (2) assist in optimizing the performance of tunnel ventilation systems during normal operation and emergency incidents. The vehicle fleet mix forecast model will be developed by integrating traditional time series forecasting methods with vehicle emission forecasting. Multiple forecast models will be developed for different scenarios such as evolving transition, internal combustion engine ban, and renewables push. The life cycle cost, including operating and replacement costs, will be analyzed considering the ever-changing vehicle fleet mix.

Requirements: time series analysis, forecasting, R

Characterizing the Quality of 3D-Printed Parts using Spatiotemporal Data Analytics

Project #: DC2019-03
Mentor: Weihong 'Grace' Guo, Industrial and Systems Engineering

Abstract: Additive manufacturing, or three-dimensional (3D) printing, is a promising technology that enables the direct fabrication of complex shapes and eliminates the waste associated with traditional manufacturing. A major issue that adversely affects its performance, and hence wider adoption, is that material solidification in the printing process yields geometric shape deviation. This study focuses on the analysis of 3D surface measurements of 3D-printed parts. Dimension, contour, and surface roughness are measured and represented in image data. We are developing methods for extracting spatiotemporal patterns from the measurement images and then correlating the patterns with part quality. The student will have access to the 3D printers in Rutgers ISE and Makerspace, as well as the Keyence 3D measurement system in Dr. Guo's lab. Familiarity with R is a must.

Requirements: statistics, data analytics, R

The Minimum Circuit Size Problem: A possible NP-Intermediate Problem

Project #: DC2019-04
Mentor: Eric Allender, Computer Science

Abstract: The Minimum Circuit Size Problem (MCSP) is a computational problem in NP that is widely believed to be hard to compute, although many researchers suspect that it is NOT NP-complete. However, there is currently only weak evidence to support this suspicion. This REU project will endeavor to find stronger evidence, of the form "If MCSP is NP-complete, then something weird happens". Another possible direction involves clarifying the relative complexity of MCSP and other candidate NP-Intermediate problems. The PI has recently proved some new lower bounds on the complexity of MCSP and related problems, and it should be possible to build on this recent progress. The PI has previously supervised research with REU students on this topic which has led to publications, and there is more work to be done, which should be accessible to an REU student.

- Eric Allender, Rahul Ilango, Neekon Vafa: The Non-Hardness of Approximating Circuit Size, ECCC Report TR18-173, 2018.
- Eric Allender, Joshua Grochow, Dieter van Melkebeek, Cristopher Moore, and Andrew Morgan: Minimum Circuit Size, Graph Isomorphism and Related Problems, SIAM Journal on Computing, Vol. 47(4), 2018, 1339-1372.
- Eric Allender and Shuichi Hirahara: New Insights on the (Non-)Hardness of Circuit Minimization and Related Problems, MFCS '17, pp. 54:1--54:14.
- Eric Allender, Dhiraj Holden, and Valentine Kabanets: The Minimum Oracle Circuit Size Problem. Computational Complexity 26 (2017) 469-496.
- Eric Allender and Bireswar Das: Zero Knowledge and Circuit Minimization. Information and Computation, 256 (2017) 2-8.
- Cody D. Murray and Ryan Williams: On the (Non) NP-Hardness of Computing Circuit Complexity, Theory of Computing, 13 (2017) 1-22.
- Michael Rudow: Discrete Logarithm and Minimum Circuit Size, Information Processing Letters 128 (2017) 1-4.

The depth-irreducibility hypothesis

Project #: DC2019-05
Mentor: Periklis Papakonstantinou, Management Science and Information Systems

Abstract: Circuit complexity theory has gained the reputation of one of the most challenging parts of the theory of computing. Perhaps one reason for this negative publicity is because researchers aimed in attacking the big questions right from the beginning (also note, that 30-50 years ago it was not clear how difficult these questions were). The proposed project reduces one such original question into a potentially simpler and tangible goal. The project aims to understand the following phenomenon: given a circuit of size s(n)=polynomial(n) and depth d(n) when we compress the depth to something more-than-constant smaller, i.e. o(d(n)) (e.g. reduce to depth d'(n)=d(n)/logn) then in order to do the same computation as the original circuit the size must blow up to not only more than polynomial but also enough more than that. In other words, this means that if we want to turn a parallel algorithm into a super-parallel one, we will have to use a very big number of parallel processors. This is a fundamental and quite intriguing question. However, in this project, we will only attack this in a restricted model of computation. This means instead of general circuits we will consider a (hopefully) much simpler goal where the circuits cannot compute in an arbitrary fashion.


Theoretical properties of SAT-solvers

Project #: DC2019-06
Mentor: Periklis Papakonstantinou, Management Science and Information Systems

Abstract: Within the last 20 years, we progressed from solving SAT on at most 200 variables to instances involving millions of variables and clauses, which made SAT-solvers ubiquitous optimization tools. SAT-solvers are today the standard tools for solving constraint satisfaction problems. A wealth of problems from VLSI design and verification, motion and planning, cryptanalysis, and various other areas are encoded as SAT formulas and solved with the help of competitive SAT-solvers. Besides the empirical analysis of SAT-solvers, their theoretical properties are poorly understood. In fact, this is an excellent opportunity for students new to research to explore a variety of mathematical questions ranging from "simple" to "tough". This project will end up being an exciting blend of combinatorics, computer science, and theoretical statistics. The goal is to provide theoretical analysis for SAT-solvers that deal with Random-2-SAT instances. After we fix the number of clauses m and the number of variables n, then these input instances are sampled uniformly at random. Interestingly, it has been observed that, for 2-SAT, when the ratio of m/n is 1 then the sampled instances are computationally harder to satisfy (although 2-SAT is a computationally easy problem, we will use the knowledge from this project to build up machinery for attacking the more difficult question about 3-SAT). We want to devise new SAT-solving algorithms and analyze their performance for this type of "hard instances".

Genome folding and function: from DNA base pairs to nucleosome arrays and chromosomes

Project #: DC2019-07
Mentor: Wilma Olson, Chemistry and Chemical Biology

Abstract: Despite the astonishing amount of information now known about the sequential makeup and spatial organization of many genomes, there is much to learn about how the vast numbers of DNA base pairs in these systems contribute to their three-dimensional architectures and workings. The chemical structure of DNA contains information that dictates the observed spatial arrangements of base pairs within the double helix and the susceptibilities of different base sequences to interactions with proteins and other molecules. Our group has been developing software to analyze the spatial and energetic codes within known DNA structures and using these data in computational studies of larger nucleic acid systems (1-3). By representing the DNA bases as sets of geometric objects, we can capture the natural deformability of DNA as well as the structural details of ligand-bound interactions (4). There are now sufficient data to examine the influence of sequence context, i.e., the effects of immediate neighbors, on the apparent motions of the bases and to develop simple functions that capture this information. The observed behavior can be incorporated in computational treatments to improve our understanding of macromolecular systems in which DNA sequence and protein binding play important roles. For example, DNA elements that regulate gene expression often lie far apart along genomic sequences but come close together during genetic processing. The intervening residues form loops, mediated by proteins, which bind at the sequentially distant sites. We have developed methods that take account of both DNA and protein in simulations of loop formation and successfully captured experimental observations (2,3). Our recent studies of long-range communication between regulatory elements on short nucleosome-decorated DNA (5,6) provide a basis for linking the 'local' arrangements of nucleosomes in these systems to higher-order macromolecular structures, such as the looped architectures detected within chromosomes (7). We are developing new algorithms to characterize the spatial arrangements of nucleosomes and incorporating the collective information in nucleosome-level depictions of chromosomal DNA (8,9). The build-up of structural components in our treatment of DNA and chromatin offers new insight into the stepwise processes that underlie genome organization and processing.

1. Olson, W.K., A.A. Gorin, X.-J. Lu, L.M. Hock and V.B. Zhurkin. 1998. DNA sequence-dependent deformability deduced from protein-DNA crystal complexes. Proc. Natl. Acad. Sci., USA 95:11163-11168.
2. Czapla, L., D. Swigon and W.K. Olson. 2006. Sequence-dependent effects in the cyclization of short DNA. J. Chem. Theor. Comp. 2:685-695.
3. Wei, J., L. Czapla, M.A. Grosner, D. Swigon and W.K. Olson. 2014. DNA topology confers sequence specificity to nonspecific architectural proteins. Proc. Natl. Acad. Sci. USA 111:16742-16747.
4. Tolstorukov, M.Y., A.V. Colasanti, D. McCandlish, W.K. Olson and V.B. Zhurkin. 2007. A novel roll-and-slide mechanism of DNA folding in chromatin: implications for nucleosome positioning. J. Mol. Biol. 371:725-738
5. Kulaeva, O.I., G. Zheng, Y.S. Polikanov, A.V. Colasanti, N. Clauvelin, S. Mukhopadhyay, A.M. Sengupta, V.M. Studitsky and W.K. Olson. 2012. Internucleosomal interactions mediated by histone tails allow distant communication in chromatin. J. Biol. Chem. 287:20248-20257.
6. Nizovtseva, E.V., N. Clauvelin, S. Todolli, Y.S. Polikanov, O.I. Kulaeva, S. Wengrzynek, W.K. Olson and V.M. Studitsky. 2017. Nucleosome-free DNA regions differentially affect distant communication in chromatin. Nucleic Acids Res 45:3059-3067.
7. Rao, S.S.P., M.H. Huntley, N.C. Durand, E.K. Stamenova, I.D. Bochkov, J.T. Robinson, A.L. Sanborn, I. Machol, A.D. Omer, E.S. Lander and E.L. Aiden. 2014. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159:1665-1680.
8. Clauvelin, N., P. Lo, O.I. Kulaeva, E.V. Nizovtseva, J. Dias, J. Zola, M. Parashar, V. Studitsky and W.K. Olson. 2015. Nucleosome positioning and composition modulate in silico chromatin flexibility. J. Phys.: Condens. Matter 27:064112.
9. Todolli, S., P.J. Perez, N. Clauvelin and W.K. Olson. 2017. Contributions of sequence to the higher-order structures of DNA. Biophys J. 112:416-426.

Genomic data-guided mathematical modeling of cancer

Project #: DC2019-08
Mentor: Subhajyoti De, Rutgers Cancer Institute of New Jersey

Abstract: Cancer is the second leading cause of death in adults, claims the lives of more than half a million Americans every year. Tumor starts from a single rogue cell in the body, which becomes defective, and the clones of that initial cell start to grow (or die out), and acquire new mutations depending on certain parameters stochastically; and by the time of detection the tumor has ~109 cells. During treatment, the patients initially responds and the tumor shrinks, but remaining cells inevitably develop mutations that give them drug resistance, and the tumor starts to grow again until the patient dies. It is becoming apparent that computational algorithms and mathematical frameworks can identify subtle patterns in patient profiles, which can help with early detection and personalized treatment, improving their outcome. For instance, we are developing mathematical frameworks to model tumor development and emergence of resistance against cancer drugs, using Galton Watson branching process - an area of stochastic processes, which also has applications in diverse areas (e.g. option pricing, predicting viral outbreak, news propagation on social networks). We are combining mathematical models with clinical data from cancer patients to ask some fundamental questions: (i) how long does it take a tumor to grow, and can we detect cancer early? (ii) can we identify emergence of resistance very early and accordingly change the treatment strategy? The project will aim to develop mathematical frameworks to model tumor growth using branching process.

Proficiency in computer programming (Java, C++, or python), familiarity with probability theory, concepts of stochastic processes, and optimization techniques are required. Knowledge in Biology is not required.

Data-driven precision medicine approaches to cancer

Project #: DC2019-09
Mentor: Subhajyoti De, Rutgers Cancer Institute of New Jersey

Abstract: One in four is expected to develop cancer in their lifetime. No twopatients are alike, and we need better strategies to detect cancer early,find right treatment for the patients, and monitor their outcome using apersonalized medicine approach. Datascience approaches using state-of-the-art computational algorithms andstatistical frameworks can identify subtle patterns sifting through largeamount of information in patient profiles, which can help with earlydetection and personalized treatment, improvingtheir outcome. Within the scope of the Precision Medicine Initiative atthe Rutgers Cancer Institute, for each patient we have troves of data ongenetic mutations, and clinical and pathological reports. We aredeveloping and implementing data science techniquesto ask some fundamental questions: (i) can we distinguish aggressivetumor and treat patients accordingly? (ii) can we identify emergence ofresistance very early and accordingly change the treatment strategy? Theproject will aim to develop data scienceframeworks to perform integrative analysis of cancer genomics andclinical data.

Proficiency in computer programming (Java, C++, orpython), statistical analysis software (R or SAS), familiarity withconcepts such as linear models, model selection, supportvector machines are preferred. Knowledge in Biology is not required.

Statistical inference of mutational patterns in precision-oncology data

Project #: DC2019-10
Mentor: Hossein Khiabanian, Rutgers Cancer Institute of New Jersey

Abstract: Understanding complexity, dynamics, and stochastic patterns in genomic data - concepts native to physics and mathematics - is critical for elucidating how disease states originate and evolve. The main focus of this project is to use statistical, information theoretic approaches to dissect the cellular and molecular heterogeneity that enables cancer cells to evolve and resist current treatments. We use high-throughput DNA sequencing methods, and integrate genetic and clinical data to identify prognostic markers and actionable drivers of tumor evolution. Our principal hypothesis is that drug resistance emerges from changing tumor evolutionary patterns. To test this, we will mine the large cohorts of cancer patients being treated under various precision medicine programs, and aim to identify markers of Darwinian selection using longitudinal data. Our ultimate goal is to develop novel statistical platforms for fast translation of genomic data into clinical practice.

Proficiency in programming and statistical analysis are essential; knowledge of genomics and biology is not required but highly valued.

A brain-controlled robotic system

Project #: DC2019-11
Mentor: Konstantinos Michmizos, Computer Science

Abstract: Can you imagine controlling a robot with your thoughts? We can. There are 5 areas of knowledge required for a successful presence in our lab - computer science, neuroscience, robotics, mathematics and machine learning. In doing this project, you are expected to get a glimpse in at least three of these areas. You will learn how to record brain activity using one of the most advanced EEG systems in the world, the 128-electrode Biosemi ActiveOne. You will also learn how to train a long short-term memory neural network (LSTM) model to control a robotic hand we have developed in the lab. Overall, your work will help us develop a brain-controlled robotic system that is naturally extensible to a variety of applications including prosthetics and rehabilitation robots.

Projects in Pure Math

Project #: DC2019-12
Mentor: Faculty of the Math Department, Mathematics

Abstract: The Math REU Coordinator is Prof. Anders Buch. Several projects will be run in 2019 by members of the pure math faculty. Past topics have included Schubert Calculus (Buch), sphere packings (Kontorovich), Lie algebras (Lepowsky), mathematical Yang-Mills theory (Woodward), etc. Applicants interested in a project in pure mathematics should list this project in their application and indicate their specific Math interests; further details on the projects will be worked out in the Spring.

Statistical Inference

Project #: DC2019-13
Mentor: John Kolassa, Statistics and Biostatistics

Abstract: The statistical literature has recently seen controversy over the appropriate reporting of frequentist statistical inference in the biomedical literature. In particular, various authors have argued that confidence intervals are preferred to p values for reporting the strength of statistical evidence regarding the effectiveness of a treatment. We propose to investigate this controversy by examining a sample of recent FDA new drug applications featuring multiple pivotal trials, and comparing the stability of confidence intervals to the stability of p-values.

Statistical or psychological theories for user security

Project #: DC2019-14
Mentor: Janne Lindqvist, Electrical and Computer Engineering

Abstract: The Rutgers Human-Computer Interaction and Security Engineering Lab is recruiting a summer intern through the DIMACS REU program. We will be recruiting a student to contribute to our research on the application of statistical theory or psychological theory to the study of user security. For example, we will study the cognitive mechanisms responsible for password usability and the psychological principles driving security decisions. Our project will engage the summer intern in every step of the research process including research design, statistical analysis and provide the intern with a unique exposure to the application of psychology and statistics in human-computer interaction and usable security. Projects may also include Bayesian data analysis.

Data Stories

Project #: DC2019-15
Mentor: James Abello, Computer Science

Abstract: One of the central goals of this project is to imagine ways to free data from thesomewhat necessary but "boring" sidebar graphs so that data can interact and expressitself with confidence to human interpreters, i.e. providing artifacts that facilitate thecreation of "data driven stories".
What makes a story truly data-driven is when the data is the central fuel that drivesthe narrative. Data can help "narrate" many types of stories. In this project we willinitiate the development of a framework to help identify and generate different typesof "data stories" similar in spirit to Christopher Booker's seven basic story plots [1,2]. One important project phase will focus on the analysis of a variety of "datastories", in order to identify suitable "story modes", and corresponding "data typesand characteristics".
The "story modes" include:
a. Narrations of change over time,
b. Starting big (small) and drilling down (zooming out),
c. Contrasting Highlights,
d. Exploration of questions resulting when two divergent lines of data intersectand one overtakes the other,
e. Dissection of factors that contribute to form the big picture, and
f. Outliers profiling.
The "data types" may include numbers, characters, strings, sets, lists, trees, posets,lattices, graphs, hypergraphs, and simplicial complexes. Depending on the storycontext, each of these "data types" may possess its own intrinsic characteristics likevalue, visual representability, variety, volume, velocity, volatility, veracity, andprovenance. Data types susceptibility to change and their ability to "interact" withother data types may be determinant factors of their "story roles".

Note: This project is part of a larger initiative to build data stories initially funded by the NationalScience Foundation.

Qualifications: Python, Java Programming, JavaScript Libraries for Visualization likeD3.js [3], Classes on Discrete Math, Graph Theory, Algorithms [4], Statistics, andProbability. Some knowledge of low dimensional topology may be helpful but is notnecessary. Excellent command of the English language.

[1] http://mediashift.org/2015/06/exploring-the-7-different-types-of-data-stories/
[2] Christopher Booker, The Seven Basic Plots: Why we tell stories, Continuum, 2004
[3] Effective Visualization of Multi-Dimensional Data - A Hands-on Approach, DipanjanSarkar
[4] Algorithms, S. Dasgupta, C. Papadimitriou, U. Vazirani, McGraw Hill, 2008

Protecting Vulnerable Communities from Targeted Violence and Mass Casualty Attacks

Project #: DC2019-16
Mentor: Ava Majlesi, Eagleton Institute of Politics & Sassi Rajput, School of Public Affairs and Administration

Abstract: The overarching goal of the Miller Center for Community Protection and Resilience (Miller Center) and its partners is to implement programs and projects that protect vulnerable populations by identifying and disseminating best practices, offering police-community workshops, consulting with and assisting vulnerable populations on security and civil liberties issues, and engaging in research relevant to the protection of vulnerable populations.
The need for the establishment of a Center for Community Protection and Resilience ("CPR") dedicated to the protection of vulnerable populations is a direct outgrowth of Rutgers' work over the past several years on the Faith-Based Communities Security Program. That program, which was launched in May 2014 in the wake of a lethal terrorist attack on the Jewish Museum in Brussels, was founded in recognition of a rising tide of anti-Semitism in Europe and America and of intolerance generally. The reality that almost a billion people now live in countries where they were not born, coupled with the continued struggles of historic minority populations, mean that we live as never before in a world of vulnerable populations. Events since the inception of the program - terrorist attacks in public settings such as stadiums, cafes, subway stations and airports, desecration of religiously affiliated buildings, schools and homes, and mass killings in churches, mosques, temples, and synagogues - have, if anything, served to underscore both the vulnerabilities of certain populations and the growing levels of violence - verging, in some cases, on outright genocide -- directed at them.
The Center is in the process of developing an online guide of best practices to better protect vulnerable communities in the United States from mass casualty attacks and targeted violence. Student researchers will assist in updating this guide by employing quantitative analysis to assess the levels of preparedness and resilience demonstrated by houses of worship and vulnerable communities. Students will analyze social media and other components to further understand the relationships and networks between vulnerable communities, law enforcement, the public and the private sector in protecting communities and preventing incidents.

Provably Efficient Graph Algorithms

Project #: DC2019-17
Mentor: Aaron Bernstein, Computer Science

Abstract: This project in theoretical computer science will focus on designing more efficient algorithms for graph problems. The focus will be on what are known as dynamic graph algorithms, which maintain information in a graph that is changing over time. For example, consider the classic problem of computing strongly connected components (SCC) in a graph. This problem is not hard to solve in O(|E|) time if the graph is fixed. One of the directions we would look at in this project is maintaining SCC when edges are being added and removed from the graph. The simplest approach would be to simply recompute the SCC from scratch every time the graph changes - this would require O(|E|) time every time an edge changes. But intuitively, the change of a single edge should only have a minor effect on the SCCs of the graph, so the algorithm should be able to handle an edge change more efficiently than recomputing from scratch. In addition to SCCs, we will also consider several other graph problems in the dynamic setting.
Although dynamic graph algorithms are motivated by the evolving nature of many real-world graphs, this REU project will be purely theoretical. There will be no implementation component. The project will focus on proving new structural properties of graphs, and then using those to design new algorithms with provably efficient bounds. This project is suitable for students that have some familiarity with graph algorithms (BFS, DFS, connected components, topological sort, shortest paths, minimum spanning tree), but most importantly for students that are comfortable reading/writing complex proofs. If you would like a better sense of the material, the paper below is an example of a paper that we might read at the beginning of the project, and then seek to improve upon.


Urns & Balls in Communications

Project #: DC2019-18
Mentor: Emina Soljanin, Electrical and Computer Engineering

Abstract: We have several research problems concerning covert, anonymous, and robust communications that can be represented as urns and balls models. Urns and balls models refer to basic probabilistic experiments in which balls are thrown randomly into urns, and we are interested in various patterns of urn occupancy (e.g., the number of empty urns). These models are central in many disciplines such as combinatorics statistics, analysis of algorithms, and statistical physics. Some modern network communications scenarios give rise to problems that are related to the classical urns and balls questions. Some new models and problems emerge as well because information packets can be processed (e.g., by using finite field arithmetic) in a way their physical counterparts, urns and balls, cannot.

Mathematical Problems in Relativistic Physics: Classical and Quantum

Project #: DC2019-19
Mentor: Shadi Tahvildar-Zadeh, Mathematics

Abstract: Our goal is to discover and analyze the mathematical structures that lie at the foundation of physical phenomena. Specific project assigned will be tailor-made to the student's interests and abilities.

Random walks with 'geodesic bias'

Project #: DC2019-20
Mentor: Bhargav Narayanan, Mathematics

Abstract: A random walker starts at some vertex on a graph and walks randomly until she gets to some target vertex T. Heuristically, one might hope that by "exciting" some vertices, the walker gets to T faster; by exciting a vertex, we mean introducing some bias in the random walk at that vertex in favor of moving closer towards to T (along a geodesic, or shortest path, to T).
Unfortunately, there are examples where exciting vertices can actually slow the random walker down! However, all these examples only have "mild" slow-downs, i.e., the random walker is slower by at most some polynomial in the number of vertices. Can such excitations cause exponential slow-downs? Or super-polynomial slow-downs? These and other questions remain unsolved.

Geometry and Combinatorics of Matroids

Project #: DC2019-21
Mentor: Nicola Tarasca, Mathematics

Abstract: The theory of matroids provides a powerful convergence of ideas from linear algebra and graph theory. Starting from simple ideas on configurations of vectors and combinatorics of graphs, one gradually adds more abstract constructions resulting in interesting structures. During the learning process, students are introduced to several concepts from abstract algebra, such as Chow rings and Hodge theory.
Recently, a series of long-standing conjectures on invariants of matroids have been proved using ideas from algebraic geometry. While most of the recent literature focuses on classical matroids, little attention has been given to the study of Coxeter matroids. Coxeter matroids were introduced by Gelfand and Serganova and provide a natural generalization. The idea is to replace the root system of the special linear group governing the theory of classical matroids with more general root systems. It is interesting to ask to what extend recent results on invariants of classical matroids generalize to Coxeter matroids. Computing explicit examples requires only a light background in mathematics and will serve as a gateway towards classical and modern theories. The objective will be to perform explicit computations of invariants and collect results for a large class of Coxeter matroids.

Investigating Sea Level Rise and Variability at Tide-Gauge Stations using Supervised Machine Learning

Project #: DC2019-22
Mentor: Robert Kopp, Department of Earth and Planetary Sciences & Daniel Gilford, Department of Earth and Planetary Sciences

Abstract: Sea level rise is expected to impact millions of people in coastal communities in the coming centuries. Tide-gauge stations (such as Boston: https://tidesandcurrents.noaa.gov/stationhome.html?id=8443970) have made historical observations of sea level change which are sparsely distributed in space, but often have long (sometimes longer than a century) records and good temporal coverage. An understanding of the variability and long-term trends of these sites provides critical context in which projections of future sea level rise and coastal flood risks can be considered. Sea level variability at each tide gauge may be decomposed into seasonal, long-term, and interannual components using a supervised machine learning technique known as Gaussian Process (GP) modeling. GP modeling is a non-parametric form of statistical modeling which maps inputs (e.g. time) to target outputs (e.g. sea levels).
In this REU project, the student will use GP modeling to fit a temporal regression over historical observations from tide gauge stations, decomposing each site's variability signal into its individual temporal properties. An example of this technique as applied to atmospheric carbon dioxide concentrations using Python's scikit-learn package can be found at: https://scikit-learn.org/stable/auto_examples/gaussian_process/plot_gpr_co2.html. These decomposed signals will be used as baseline for predictions of future sea level rise and flood risks (e.g. changes in flood return periods).

Requirements: Proficiency in programming (preferably Python, but MATLAB also would work) and statistical analysis. Familiarity with earth sciences or sea level science is a plus, but not required.

Drone and Metal Detection at Stadiums

Project #: DC2019-23
Mentor: Christie Nelson, Masters of Business and Science in Analytics and CCICADA

Abstract: When utilizing metal detectors at a large venue such as a sports stadium, there are the competing objectives of accuracy of the patron screening and the speed of throughput. This research, carried out in collaboration with large sports venues, analyzes the patron screening method of walk-through metal detectors ("walk-throughs"). Part of this project may involve on-site observations at stadium events.
Experimental design is the focus of the project, helping large venues better understand the performance of walk-throughs in real outdoor settings. Because of the number of experimental factors to be considered (type of item, location and height of object, orientation, speed of object passing through the machine, walk-through security setting, etc.), designing experiments require a sophisticated design tool called combinatorial experimental design. Experimental designs can focus on various questions. In addition, the project may also include experimental design for a drone detection technology, and field testing for efficacy. Open to students in: statistics, math, computer science, criminal justice and related areas, or those with experimental design experience.

Cybersecurity Labor Analysis for Law Enforcement

Project #: DC2019-24
Mentor: Christie Nelson, Masters of Business and Science in Analytics and CCICADA

Abstract: The field of Cybersecurity is rapidly changing, with many certification options and career paths emerging. This project will be in collaboration with the Federal Law Enforcement Training Center, and will focus on analyzing data relating to cybersecurity jobs, certifications, and career paths. The students will also get experience with visualization and data analysis. Open to students in: data science, statistics, math, cybersecurity, criminal justice, and related areas

Return to REU Home Page
DIMACS Home Page
DIMACS Contact Information
Page last modified on January 10, 2019.