Proposed projects for the 2008 DIMACS and DIMACS/DIMATIA REU Programs

Project #: DDD2008-01

Mentor: Wilma Olson, olson at, Chemistry

Project title: Protein-induced DNA Looping

Many genetic processes are mediated by proteins that bind at separate, often widely spaced, sites on the double helix, tethering the intervening DNA into a loop [1, 2]. Examples of these processes include gene expression and its control, DNA replication, genetic rearrangements, multi-site cutting by restriction enzymes, and DNA packaging. A DNA loop stabilized by such long-range protein-protein contacts constitutes a discrete topological unit. As long as the ends of the DNA stay in place and the duplex remains unbroken, the linking number, Lk, or number of times the two strands of the double helix wrap around one another, is conserved. This constraint in Lk underlies the well known supercoiling of DNA: the stress induced by positioning the ends of the polymer in locations other than the natural (relaxed) state perturbs the overall coiling of the chain axis and/or the twisting of successive base-pairs in the intervening parts of the chain [3]. As a first step in understanding the effects of specific proteins and drugs on DNA looping, we propose to study the imposed bending and twisting of neighboring base pairs [4] in known complexes of proteins and drugs with double helical DNA stored in the Nucleic Acid Database [5]. By subjecting a DNA segment of the same chain length as that found in a given complex to the observed orientation, displacement, and superhelical stress and setting the elastic moduli to sufficiently high values, we can use existing programs, e.g., [6], to simulate the presence of a rigidly bound molecule at arbitrary positions on circular DNA molecules or to model specific systems in which DNA looping plays an important role, e.g., the lac repressor-operator assembly in EscherichiaA0coli [7]. One student could devote a summer project to the collection of geometric information. Other students could apply this information to simulations of DNA loops and folds in subsequent years. Prerequisites: Students should have an interest (but not necessarily formal training) in molecular biology, familiarity with linear algebra in order to understand the parameters used to describe DNA structure, and knowledge of basic chemistry and physics to understand the nature of the DNA molecules and the elastic treatment of the double helix.

[1] Halford, S. E., Gowers, D. M., and Sessions, R. B., ``Two are Better than One,'' Nature Struct. Biol., 7, (2000), 705-707.

[2] Schleif, R., ``DNA Looping,'' Annu. Rev. Biochem., 61, (1992), 199-223.

[3] Bauer, W. R. ``Structure and Reactions of Closed Duplex DNA,'' Annu. Rev. Biophys. Bioeng., 7, (1978), 287-313.

[4] Olson, W. K., Gorin, A. A., Lu, X.-J., Hock, L. M., and Zhurkin, V. B., ``DNA Sequence-dependent Deformability Deduced from Protein-DNA Crystal Complexes,'' Proc. Natl. Acad. Sci. USA, 95, (1998), 11163-11168.

[5] Berman, H. M., Olson, W. K., Beveridge, D. L., Westbrook, J., Gelbin, A., Demeny, T., Hsieh, S.-H., Srinivasan, A. R., and Schneider, B., ``The Nucleic Acid Database: A Comprehensive Relational Database of Three-dimensional Structures of Nucleic Acids,'' Biophys. J., 63, (1992), 751-759.

[6] Westcott, T. P., Tobias, I., and Olson, W. K., ``Modeling Self-contact Forces in the Elastic Theory of DNA Supercoiling,'' J. Chem. Phys., 107, (1997), 3967-3980.

[7] Muller-Hill, B., The Lac Operon, Walter de Gruyter, Berlin, 1996.

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2008-02

Mentors: Ming Xie, mxie at, Department of Statistics and E.A. Elsayed, elsayed at, Department of Electrical Engineering

Project title: Optimization of Sequencing and Threshold Levels of Detection Systems

The candidate will provide assistance to an on-going project that considers a problem of container inspection at the port-of-entry. Containers are inspected at inspection stations, and data generated from different analytical methods, x-rays detectors, gamma-rays detectors and others sensors used for the detection of chemical, biological, radiological, nuclear, and explosive (CBRNE) agents are often relied upon to make critical decisions with regard to the nature of the containers and the appropriate response mechanism. The process of designing an efficient and accurate detection system must be deliberate and well thought-out. The threshold levels of sensors at the inspection stations might result in accepting undesired containers or subjecting "good" containers to unnecessary additional inspections. In this project we investigate approaches for determining the optimum arrangements of sensors and their corresponding threshold levels while considering potential measurement errors and cost and other constraints. Efficient approaches for investigating inspection systems with a large number of sensors will be investigated. The theoretical contributions of this research will be transferred via computational algorithms for practical applications.
* You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DDD2008-03

Mentor: Nina Fefferman, , Tufts University, DIMACS

Project title: Duration of infectivity and disease in dynamic networks

This project will focus on examining the effect of relative time scale in shifting network structure and disease transmission patterns. Starting from empirical, computational experimentation, we will try to build a theoretical understanding of how relative durations of social and disease processes interact to shape epidemics. Very basic knowledge of graph theory and/or any programming language would be useful, but aren't strictly necessary. (Ideally, this project will be done in parallel with project DDD2008-04.)
*You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DDD2008-04

Mentor: Nina Fefferman, , Tufts University, DIMACS

Project title: Long- vs. Short-term friendships and the spread of disease

This project will focus on examining the effect of varying the percentages of long- vs. short-term social contacts on patterns of disease spread in a population over time. Starting from empirical, computational experimentation, we will try to build a theoretical understanding of how varying the percentage of each duration of friendship among social contacts over time will affect disease dynamics. Very basic knowledge of graph theory and/or any programming language would be useful, but aren't strictly necessary. (Ideally, this project will be done in parallel with project DDD2008-03.)
*You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DDD2008-05

Mentor: Nina Fefferman, , Tufts University, DIMACS

Project title: Entropy and DNA

DNA is encoded information. Entropy is a mathematical measure from information theory that quantifies the amount of information in a given signal. Some papers from the 90's looked at the relative entropy of coding vs. non-coding regions of the genome. In this project, we would explore a few related alternative hypotheses. This project will rely heavily on computational analysis of data, rather than mathematical theory. For this project, applicants should have at least a basic knowledge of computer programming.
* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2008-06

Mentor: William Pottenger, billp at , DIMACS

Project title:

Traditional machine learning approaches make the assumption that instances are independent and identically distributed (IID). We term models constructed under the IID assumption first-order because in general they only leverage relationships between attributes within instances (e.g., co-occurrence relationships). Thus classification of a single instance (of previously unseen data) is possible because no additional context is needed to infer class membership. Such a context-free approach, however, does not exploit valuable information about relationships between instances in the dataset.

In our research we are developing a novel Bayesian framework for text classification named Higher-order Naive Bayes. Unlike approaches that assume instances are IID, Higher-order Naïve Bayes leverages implicit co-occurrence relationships between attributes in different instances. We term these implicit co-occurrence relationships higher-order paths. Attributes ( e.g., words in documents in text collections) are richly connected by such higher-order paths, and the generative model built by Higher-order Naïve Bayes exploits this rich connectivity pattern.

In order to demonstrate the utility of this approach we vary the sparsity of the input ( i.e., labeled training data) in our experiments. Sparsity is defined in two ways: first, in terms of the word dictionary size, which is ordered by increasing document frequency and second in terms of the size of the input training set. Our results on several textual datasets show that when training data is sparse ( e.g., small training set and/or words with poor parameter estimates), Higher-order Naïve Bayes significantly reduces the generalization error by leveraging higher-order paths. Additionally we argue that higher-order path statistics have potential for use in assessing the quality of the training set; i.e., whether the training set is representative of the concept to be learned. Our ongoing research builds on these results.
* You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DDD2008-07

Mentor: Daniel Cranston, , DIMACS

Project title: Efficient Algorithms for Graph Coloring

Graph coloring problems model partitioning a set of objects into as few subsets as possible subject to various constraints. These problems have direct applications to many problems in scheduling, timetabling, and sequencing. In general, coloring problems are computationally intractable. So in practice, we look for approximation algorithms and for classes of graphs on which these coloring problems can be solved efficiently. This project focuses on finding new classes of graphs which we can color efficiently and optimally or near-optimally. Prequisites: a course in graph theory.
* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2008-08

Mentor: Aaron Jaggard, , DIMACS

Project title: Pattern Avoidance and Parallels Between Families of Permutations

The broad questions motivating this project involve parallels between different classes of permutations: Given two different classes of permutations---e.g., all permutations and the set of involutions (permutations whose square is the identity permutation)---which combinatorial questions have (approximately) the same answer when asked of both classes? (For example, the probability that a permutation in the symmetric group Sn contains a specified subsequence of k distinct integers is exactly 1/k!, while the probability that an involution in Sn contains the subsequence approaches 1/k! as n approaches infinity [3].)

In this project, we'll focus on questions related to pattern avoidance by permutations; a permutation p:{1,2,...,n}--->{1,2,...,3} (which we write as its list of values p(1)p(2)...p(n)) avoids the pattern 231 if there is no triple (i,j,k) of indices such that i<j<k and p(k)<p(i)<p(j) (i.e., the order relationships between p(i), p(j), and p(k) are exactly the same as those between 2, 3, and 1); avoidance of other patterns is defined analogously. Depending on the interests of the student, the questions that we study may range from concrete (as in [1], answering questions about specific patterns) to more abstract (as in [2], answering questions about families of patterns or, even more generally, about parallels between different families of patterns). We'll use Mathematica or Maple to help suggest/test conjectures, but extensive programming experience is not required.

1. O. Guibert, E. Pergola, and R. Pinzani, "Vexillary involutions are enumerated by Motzkin numbers," Ann. Comb., 5 (2):153-174, 2001 (link).

2. A.D. Jaggard, "Prefix exchanging and pattern avoidance by involutions," Elec. J. Comb., 9 (2): Research paper #16, 2003 (link).

3. B.D. McKay, J. Morse, and H.S. Wilf, "The Distributions of the Entries of Young Tableaux," J. Comb. Th. A, 97 (1):117-128, 2002 (link)..

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2008-09

Mentor: Paola Vera-Licona,, DIMACS/BioMaPs/Mathematics Department

Project title: On the Reverse-engineering of Biological Systems

Mathematical and computational methods have provided powerful tools for the study and understanding of biological systems [1]; some of the methods used in my research range from combinatorics and computational algebra to evolutionary computation tools [2]. Typically, the modeling approach employed is that of reverse-engineering, which utilizes data to infer the structure of a given a system; the type of networks of interest include: gene response networks [3] and brain response networks [4], reconstructed from microarray and fMRI data, respectively.

In this project, we will be introducing some of the techniques afore mentioned and how they can be applied to construct the structure and dynamics of biological systems of interest. Theoretical or applied aspects on these methods will be studied according to the student’s background and interest.

[1] B.O. Palsson. 2006. Systems Biology: Properties of Reconstructed Networks. Cambridge University Press, New York.

[2] P. Vera-Licona. 2007. Algorithms for modeling and simulation of biological systems; applications to gene regulatory networks. Electronic Dissertation. University Libraries, Virginia Polytechnic Institute and State University.

[3] R. Albert and H. G. Othmer. 2003. The topology of the regulatory interactions predicts the expression pattern of the segment polarity genes in Drosophila melanogaster. J Theor Biol, vol. 223, pp. 1-18.

[4] P. Vera-Licona, B. Komisaruk, WC. Liu, M. Stillman, R. Laubenbacher. A Reverse Engineering Method of Brain Response Networks. Journal of Neuroimaging. Preprint.

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2008-10

Mentor: Endre Boros, , Rutcor and Paul Kantor, , School of Communication, Information and Library Studies

Project title: Interactive Tools for Resource Allocation in Threat Detection

We are working on the problem of allocating scarce human and technical resources to provide better screening within the United States, particularly for nuclear threats. We work with models representing realistic (but not actual, which is classified) information on sensors and their effectiveness. We seek one or two students who will work on the problem of interfacing complex linear and nonlinear optimization programs to the unsophisticated user and visualizing the effect of policy choices on the overall cost-effectiveness of complex screening policies.
* You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DDD2008-11

Mentor: Debbie Yuster, , DIMACS, Mathematics Department

Project title: Minimal Tropical Bases for Matroids

A matroid is a combinatorial object consisting of a set of elements, along with information about which subsets of those elements form “dependent sets”. We can think of this as generalizing a set of vectors in a vector space and information about which of the vectors form linearly dependent sets [1]. Tropical geometry is a growing area of math that turns complicated geometric objects into piecewise-linear objects that are much easier to understand [2]. Certain “tropical linear spaces” can be related to matroids, and questions about these tropical objects turn into purely combinatorial questions about dependent sets of matroids [3]. We will investigate different classes of matroids and try to characterize their “minimal tropical bases”. Computer programming experience may be helpful but is not necessary.


[1] Sandra Kingan’s Matroid Website:

[2] D. Speyer and B. Sturmfels, “Tropical mathematics”

[3] J. Yu and D. Yuster. “Representing tropical linear spaces by circuits,” to appear in Proceedings of the 19th International Conference on Formal Power Series and Algebraic Combinatorics, 2007.

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2008-12

Mentor: Debbie Yuster, , DIMACS, Mathematics Department

Project title: Computing Shift Equivalence

Two square, integer matrices A and B (not necessarily the same size) are shift equivalent if there exist integer matrices R and S, and a whole number m, such that AR=RB, SA=BS, Am=RS, and Bm=SR. It is an open problem to decide whether or not two given matrices are shift equivalent. Even shift equivalence of 2x2 matrices is not very well understood. I am trying to understand shift equivalence for a project involving building a database of dynamical systems to be used for biologists. Our summer research project would consist of trying to classify and understand different shift equivalence classes of matrices, and possibly shift equivalence classes of group homomorphisms (which are defined analogously). Prerequisite: a strong background in linear algebra. Abstract algebra would be helpful but is not necessary.


[1] D. Lind and B. Marcus, An Introduction to Symbolic Dynamics and Coding [Chapter 7], Cambridge University Press.

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2008-13

Mentor: Wanpracha (Art) Chaovalitwongse, wchaoval at, Department of Industrial Engineering

Project title: Brain Disorder

Epilepsy is the second most common brain disorder after stroke and currently afflicts at least 2 million Americans. There are 4 stages in the seizure process: normal, pre-seizure, seizure onset, and post seizure, all of which can be captured by an EEG. There has been recent research indicating promising signs of seizure predictability. The objective of this study is to determine whether the correlations/synchronizations among different brain areas during normal and pre-seizure periods are significantly different. We will investigate and apply advances in signal processing, statistical analysis, and optimization.

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2008-14

Mentor: Wanpracha (Art) Chaovalitwongse, wchaoval at, Department of Industrial Engineering

Project title: Cooperative Sensors in Battlespace

Wide Area Search Munitions (WASMs) are unmanned aerial vehicles with an array of onboard sensors, a warhead or other kill mechanism, and autonomous flight capabilities. WASMs have played many important roles in the modern battle field including reconnaissance, search, battle damage assessment, or communications relay. The ATR (automatic target recognition) system in WASM is used to identify potential targets and broadcast this information to other WASMs. However, target identification is subject to errors. For example, two WASMs might detect the same target, but associate with that target two separate target tracks. Conversely, two WASMs might detect separate targets, but assign the same target ID to both targets. We call this problem as the object misidentification (OMI) problem. To solve the OMI problem for a set of points in 4-dimensional space (time, longitude, latitude and altitude), we will investigate data mining techniques like clustering to identify the individual route of each target (detected point). The overall goal is to reconstruct the battle space using statistical, optimization, and data mining approaches.

* You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DDD2008-15

Mentor: Danfeng Yao, , Computer Science Department

Project title: Privacy and Visualization of Local Sharing

We are working on the problem of flexible and assured information sharing in crisis situations. We have developed quantitative models for authorizing access based on contextual and digital credential evidence. The project will examine the privacy aspects of the model by analyzing potential attacks and prevention. We will also develop a simple visualization tool for sharing location information. Some knowledge of computer security is useful, but not required.

* You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DDD2008-16

Mentor: Danfeng Yao, , Computer Science Department

Project title: Trust Inference in Web 2.0 Environments

Web 2.0 refers to web-based services that facilitate collaboration and sharing among users, such as Wikipedia and Sensorpedia. The aim of this project is to quantitatively determine the trustworthiness of data integrated from multiple sources. The specific task is to develop a trust model and evaluate algorithms to compute trust scores of integrated data based on the trustworthiness of data sources.

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2008-17

Mentor: Neeraj Kayal, , DIMACS

Project title: Decomposing a Group Into a Direct Product of Two Groups

In this project, we will investigate an algorithmic problem arising out of groups. The problem is to devise an efficient algorithm to split a group into the direct product of two of its subgroups, if such a decomposition exists. See the wikipedia entry for the definition and basic properties of a group and the notion of direct product of two groups. Besides being an interesting problem by itself, a solution to this problem might play a significant role in any eventual solution to the group isomorphism problem. The group isomorphism problem involves deciding if two given finite groups are isomorphic or not. One possible line of approach is to observe that if a group decomposes then the subgroup of commuting elements corresponding to any group element also decomposes. The challenge then is to stitch together the information obtained from decomposing these kind of proper subgroups in order to obtain a decomposition of the original group.

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2008-18

Mentor: Neeraj Kayal, , DIMACS

Project title: Recognizing Polynomials Which Have a Root Modulo Every Prime

Given a polynomial f(x) with integer coefficients, can we efficiently determine if it has the following property: the polynomial f(x) has a zero modulo p for every prime p. For example, the polynomial
 f(x) = x2 - 1.   It has a root modulo every prime p because the integer 1 is a root of this polynomial. The polynomial   f(x) = x2 + 1   does not have this property. A somewhat more nontrivial example of a polynomial with this property is the polynomial: f(x) = (x3 - 2)(x2 + x + 1). The project will involve investigating such polynomials and identifying their properties and trying to come up with a characterization that is hopefully strong enough that we can turn this characterization into an efficient algorithm. Besides being an interesting problem in itself because of its elementary description, if shown to be in polynomial time, it would be one of the first computational problems which involve two alternating quantifiers of the form (for all x, there exists y so that ...). It is known that a polynomial f(x) has this property if and only if the roots of the polynomial have a certain kind of symmetry and this problem can be boiled down to a group theoretic problem. More specifically, f(x) has this property if and only if every automorphism in the Galois group of the polynomial f(x) fixes at least one of the roots of the polynomial. It is hoped that groups which have this property must be fairly small and if this conjecture is true, then we would get an efficient algorithm. Thus, if we take this approach, then the project would involve trying to determine how large can a group be if all its members fix at least one root.

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2008-19

Mentor: Vinod Ganapathy, , Computer Science Department

Project title: Leveraging Transactional Memory for Computer Security

As multi-core chips become ubiquitous, programs of the future will have to be multi-threaded to harness the parallel processing power afforded by these chips. Researchers and practitioners are therefore seeking programming models and tools to make multi-threaded programs easier to write, verify and debug. One such promising model that is the subject of much current research is Transactional Memory (TM). TM systems aim to ease multi-threaded programming by providing system support (at the hardware and software level) for transactions.

This project will seek to leverage TM systems to improve the security of both multi-threaded and single-threaded applications. TM systems provides several key features such as precise access to runtime information, isolation, and rollback, that make them an attractive target to implement security services. The goal of this project will be to design and implement a security service, such as a malware detector, or an information flow tracker, within a TM system.

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2008-20

Mentor: James Abello, , DIMACS

Project title: DIMACS Graph Mining

In this project we will build a variety of multi-graphs that can be extracted from this data set. We will then use our current set of graph analysis tools to parse, navigate, visualize and synthesize the findings. One central challenge is to devise methods that are privacy preserving.

Requirements: Some Knowledge of SQL, fundamental graph algorithms and C/C++ programming is a plus. However, there are several analysis tasks that can be performed using our plug in graph visualization system and that require only minimal amount of programming.

[AvHK06] J. Abello, F. van Ham, and N. Krishnan, "Ask-GraphView-: A Large Scale Graph Visualization System", IEEE Transactions in Visualization and Computer Graphics, 12(5): 669-676 (2006).

* You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DDD2008-21

Mentor: James Abello, , DIMACS

Project title: Name That Cluster

Given a user query, search engines generally return a very sizeable collection of possible answers. Clustering has been proposed as a tool to partition the possible answer set into more manageable subsets of related results. There is no current agreement on the preferred mode of presentation of these clusters. Currently, most search engines display the set of results on almost a pure textual form. However, relatively recently we have witnessed some timid attempts to use some graphical representations. This study is a first step to elucidate when and why text appears to outperform graphics for certain fundamental clustering related tasks. To this end we designed three interfaces to display flat clusters of user queries. The interfaces are enhanced with mechanisms by which users provide feedback about the relevance of a cluster for a pre-specified input query. Subsequently, users are asked to provide a name for a given cluster that best describes the cluster contents. In this project, we will analyze the results obtained from a web user study that is planned to run from Feb to May 2008.

Requirements:Basic Statistics, reading clustering related papers and preparing summary of findings.

[ASGT07] J. Abello, J, Schulz, H, Gaudin, B, and Tominski, C (2007). Name That Cluster - Text vs. Graphics, IEEE InfoVis Conference, Sacramento, November 2007.

* You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DDD2008-22

Mentor: James Abello, , DIMACS

Project title: Point Configurations, The Weak Bruhat Order of Sn and The Majority Rule

Maximal Chains in the Weak Bruhat Order of the Symmetric Group give rise to a special class of graphs called Persistent Graphs. These graphs are directly related to visibility graphs associated with point configurations on the plane and they encode transitivity of the majority rule. This project will study the combinatorial properties of this new class of graphs and their geometric implications. Requirements: A good discrete math course, basics of graph theory , linear algebra and a strong will to face interesting combinatorial geometry questions.

[AEK95] J. Abello, O. Egecioglu and K. Kumar, "Visibility Graphs of Staircase Polygons and the Weak Bruhat Order", Discrete and Computational Geometry, Volume 14, 1995.

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2008-23

Mentor: Graham Cormode,, AT&T Labs - Research

Project title: Solving Problems on Uncertain Data

There is growing interest in handling uncertain data, which arises from sensor networks, record linkage, and confidences attached to the results of machine learning. This prompts the need to design and analyze algorithms for solving various problems on uncertain data, such as clustering. This project is to design and implement algorithms for solving new problems on uncertain data with provable guarantees.

* You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DDD2008-24

Mentor: Alantha Newman,, DIMACS

Project title: Finding Isomorphic Substructures, and Its Applications to Analyzing Interaction Graphs

Graph or subgraph isomorphism problems are that of determining if two graphs have the same structure or substructures. We can think of the following graph optimization problem: Given two graphs G and H, we would like to find the largest subgraph that is isomorphic to some subgraph of both G and H. A recent technique that has been applied to graph optimization problems is semidefinite programming. This project involves (a) developing semidefinite programming based methods and code for finding isomorphic subgraphs, and (b) exploring applications in specific "interaction graphs". For example, if we let the nodes of the graph represent people or objects and we let the edges represent the relationships between pairs of people or between a person and an object, then the problem of finding isomorphic substructures could be used to detect similar behavior patterns.

* You must be a enrolled at a U.S. university to be eligible for this project.

Project #: DDD2008-25

Mentor: Gyan Bhanot, , BioMaPS Institute and BME, Rutgers University

Project title: Finding the Sequence of "mtDNAEve"

Mitochondria (mtDNA) are structures found in the cytoplasm of eukaryotic cells responsible for vital cellular functions such as oxidative phosphorylation, release of Cytochrome C to initiate apoptosis etc. The current understanding of their origin derives from a proposal by Lynn Margulis in the 1970s, who argued for an extracellular origin due to an endosymbiosis event between bacteria and early prokaryotic cells. In complex organisms such as humans, mtDNA are maternally inherited without recombination and have a high rate of mutation (10X the mutation rate of nuclear DNA). These facts make a population analysis of their sequence an ideal method of tracing the maternal ancestry of geographically isolated populations to understand the origins and migratory history of humanity. Such analysis using clustering and phylogenetic techniques have shown that modern humans emerged from Africa in two or more migrations approximately 50-70 kYBP. The tree describing these events has its root in a sequence that represents “mitochondrial Eve”, who can be thought of as the ancestral mother of all of modern humans. Mapping the tree to world geography suggests that “mtDNA Eve” lived in Africa approximately 200 kYBP.

The project we propose is to use data on approximately 2000 mtDNA sequences from worldwide sources (available at ) to infer the mtDNA sequence of “mitochondrial Eve”. This would involve first constructing trees using methods such as parsimony, maximum likelihood, UPGMA, Neighbor Joining and Clustering and then inferring a “most likely sequence” for “mtDNA Eve” using these trees . Permutation tests on haplogroup (cluster) labels would be used to calculate a weight for each possible base assignment for each mtDNA locus. This would be based on an analysis of relative accuracy of each tree based on a probability measure using consensus accuracy of its internal branches across possible trees, given the observed clustering of sequences into robust haplogroups. The basis for this analysis is a recent paper submitted for publication where the mentor is a senior author.

The undergrad selected for this project must be able to program in C, C++, Java or Matlab, must learn to use the tree building software Phylip and must have a basic understanding of the underlying biology (e.g. as described in the books: “Adam’s Curse” and “The Seven Daughters of Eve” by Brian Sykes and “Acquiring Genomes” by Lynn Margulis and Dorion Sagan).

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2008-26

Mentor: Vijay Ramachandran,, Colgate University

Project title: Modeling and Simulation for Better Internet Routing

Interdomain routing is the task of establishing connectivity among the autonomous subnetworks of the Internet, and is currently handled by the Border Gateway Protocol (BGP). BGP is unique among routing protocols because a subnetwork can independently configure its routers based on complex policies. Certain combinations of these local policies can cause global instability. Therefore, understanding how interdomain-routing policies are used is essential for a properly functioning Internet.

Recent work, taking a theoretical approach, has provided guidelines on how to mitigate some of these problems. However, there are potential obstacles to adopting these guidelines. First, their efficiency and effectiveness in real-world networks have not been studied. Second, the ability to detect circumventions of these guidelines (for self-interest or malice) is unknown, as is the effect of partial deployment or compliance. Third, the theoretical assumptions may not always correspond with the details of policy implemention in today's Internet.

In this project, we will study these issues through modeling, simulation, collection and analysis of real-world routing-policy data, and other methods as appropriate. The specific focus of and techniques used in the project will reflect the interests and background of the student. Students should be comfortable with programming and learning new software tools, although no specific language background is required.

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2008-27

Mentor: Regina Liu, , Statistics Department

Project title: Extreme Value Theory and Applications in Simultaneous Monitoring of Multiple Risk Indicators

Risk assessments in reality often encounter extreme settings with few or no occurrences of the event of interest. Inferences about risk indicators in such settings face the problem of insufficient or no data. The extreme value theory is particularly well suited for addressing this type of problems. Among other usefulness, we plan to apply the multivariate extreme value theory to establish thresholds for signaling different risk levels in the context of simultaneous monitoring of multiple risk indicators. The proposed threshold system is to be justified in terms of multivariate extreme quantiles. As an illustrative example, the proposed approach is to be applied to the monitoring of airline performance measures collected by the FAA over a period of 5 years. We hope to use this threshold system to assign different risk levels to the observed airline performance measures. In particular, we expect to divide the sample space into regions with increasing levels of risk. Moreover, in the univariate case, such a thresholding technique should be useful for determining a suitable cut-off point on a runway for holding short of landing aircrafts. This cut-off point needs to be chosen to meet a certain required level of safety when allowing simultaneous operations on two intersecting runways, which can help ease the congestion of air traffic.

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Project #: DDD2008-28

Mentor: Alex Borgida, , Computer Science Department

Project title: Source-to-Source Translation Using Ontologies

Real world SQL database management systems (as well as programming language implementations) have a surprising (and bewildering) number of variants which make it very hard to port software written for one vendor to another, especially if one wants to do this "intelligently". The proposed project is to consider the use of formal ontologies to express the properties of data definition and data manipulation languages, and use this towards automatic source-to-source translation of database applications. It is expected that the research will involve not just ontological modeling but also studying extensions to Description Logics and their reasoners (such as OWL) to better represent the semantics of programming language constructs.

Requirements: familiarity with description logics, conceptual modeling, DBMS.

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

Please revisit this page frequently, additional projects will be posted through January.
Index Index of REU program
DIMACS Home Page
Contacting the Center
Document last modified on February 4, 2008.