DIMACS/DIMATIA/Mathematics REU Final Student Presentation Abstracts

A Discrete Representation of DNA Base Pair Steps
Yi Lin, Emory University, Atlanta, GA
Mentors: Drs. Wilma Olson and Irwin Tobias, Department of Chemistry

Some of the results obtained from the traditional approaches, which regard DNA sequences as smooth curves, were inconsistent with experimental data. New approaches have been proposed over the past years in an attempt to use a discrete model to represent DNA base pair steps more accurately. In our project, we were most concerned with developing a method that can calculate the twist number for any closed discrete DNA base pair step sequences. We explored and derived two new methods for calculating the twist number. The two methods were tested successfully for a few shapes of which the twist numbers were known. We have prelimarily concluded that the two methods might be two valid ways of calculating twist numbers for any closed DNA sequences and, furthermore, can possibly provide a better understanding into the folding of DNA from its initial to final stage.

On Generating All Shortest Paths and Finding Minimal Cutsets.
Elizabeth Hayden, Coe College, Cedar Rapids, IA
Daniel MacDonald, Seton Hall University, South Orange, NJ
Mentor: Dr. Endre Boros, RUTCOR

In the 1960s, an important problem in graph theory arose in how to generate the shortest path between two given vertices. This problem was examined by several prominent mathematicians, including Richard Bellman, Lestor Ford Jr., and Edsger Dijkstra. A similar problem involves finding all short paths between two vertices (that is, paths with length less than some given threshold). Second, once given these short paths, the problem of finding all ways to minimally cut such short paths arises. We will specifically consider the Hierarchical Communications Web form of a planar graph. We will use the concept of independence systems to create an algorithm for finding all maximal sets of edges in the graph such that no short path exists. Within this algorithm, we introduce a Boolean function to find minimal cut-sets based on computing the product of given short edges. In the end, we are aiming for acceptable computational complexity in these algorithms.

Feature Selection and Error Tolerance for the Logical Analysis of Data
Craig Bowles, Cornell University, Ithaca, NY
Kathryn Davidson, University of Pennsylvania, Philadelphia, PA
Mentor: Dr. Endre Boros, RUTCOR

We use a new data mining process to determine critical attributes in a classification problem. Our method allows for maximal error tolerance in the original data while producing simple boolean formulas. We implement the method and test it on the Wisconsin Breast Cancer Database. We correctly identify over 97% of positive examples with a three-variable boolean formula. Our process utilizes an existing dualization algorithm and the method of binarizing data with missing bits.

Solving the Maximum Independent Set Problem for S_1,2,2-free Planar Graphs
Sarah Bleiler, Seton Hall University, South Orange, NJ
Mentor: Dr. Vadim Lozin, RUTCOR

In general, the maximum independent set problem is NP-hard to solve in the class of planar graphs. However, it is possible to solve this problem in polynomial time when we forbid certain induced subgraphs. We use the method of graph augmentation to try to find a solution to the MIS problem for planar graphs, no connected component of which contains an induced S_1,2,2 subgraph. Thus far, we have characterized all of the augmenting graphs in this class. We are currently working on finding an algorithm which finds all of these augmenting graphs in polynomial time.

3-Coloring Claw-free Graphs
Samuel Nelson, Bucknell University, Lewisburg, PA
Mentor: Dr. Vadim Lozin, RUTCOR

We studied the 3-colorability problem for different classes of claw-free graphs, particularly to show whether the problem on a subclass was NP-complete or solvable in polynomial time. We also studied the 3-edge-colorability problem for cubic graphs with minimum girth k, and found the problem on these graphs to be NP-complete.

Complexity of planarity testing
Marek Krcal, Charles University, Prague, Czech Republic
Mentor: Dr. Eric Allender, Department of Computer Science

Brief introduction to (reminding of) complexity class L and graph planarity. About the old proof of complexity of planarity testing. My new (hopefully) simpler way: Using existing intuitive proof for graphs of maximal degree 3, together with my L algorithm for degree reduction (to max. degree 3) preserving planarity. Some illustration from proof.

Connections Among Knot and Tangle Invariants
Matthew Meola, Rutgers University, New Brunswick, NJ
Joseph Walsh, Rutgers University, New Brunswick, NJ
Mentor: Dr. Chris Woodward, Department of Mathematics

We introduce invariants with the Jones Polynomial. We then talk about chain complexes and discuss the Khovanov Invariant. We will speak of a related invariant that comes from Paul Seidel's geometric interpretation of the Khovanov Invariant and discuss our work towards using Hotchschild Homology as a way of relating the Khovanov invariant to other knot and tangle invariants.

Flight Scheduling
Jessica McCoy, North Carolina State University, Raleigh, NC
Mentor: Dr. Wanpracha Chaovalitwongse, Department of Industrial Engineering

Hundreds of flights traverse the Pacific Ocean daily, connecting Asia, Australia/New Zealand, and the United States/Hawaii. Each of these flights submits a request for desired path, altitude, and departure time to the current scheduling system, which is responsible for resolving all conflicts in flight parameters and for generating a schedule. Both the current scheduling system and different algorithms were developed and compared using XPress-MP. Additional constraints and alternate models mimic more realistic conflict resolution while minimizing changes in requested parameters; future work will focus on improving capacity.

Tensor Decomposition of Microarray Data
Jennifer Staigar, Rutgers University, New Brunswick, NJ
Mentor: Dr. Stanley Dunn, Department of Biomedical Engineering

DNA microarrays are a technology used to quantify gene expression. There are many mathematical techniques used to analyze microarray data, with Singular Value Decomposition (SVD) and Principle Components Analysis (PCA) the most common. In our research, we consider modeling the expression data as a tensor, with the raw red and green channels as the third factor (with genes and experiments being the others). In this preliminary study, we consider two methods for decomposing the tensor to elucidate the three factors that give rise to the observed microarray data. In a preliminary study of the yeast cell cycle, the results suggest that tensor decomposition can cluster similar genes by cycle with further studies necessitated.

Eulerian Graph Representation for siRNA Sequence Structure
Karen Lostritto, Brown University, Providence, RI
Mentor: Dr. Stanley Dunn, Department of Biomedical Engineering

RNA interference (RNAi) is a process by which gene expression is suppressed. Since RNAi is a powerful tool for determining gene function and preventing disease, much work has been done to understand the important role that short interfering RNA (siRNA) plays in RNAi. Characteristic sequence, thermodynamic, and structural properties of functional siRNAs have been identified by researchers. We investigated the Eulerian Graph Model of siRNA, introduced by Pancoska et al, in hopes that it would provide valuable insight into the level of functionality of a potential siRNA sequence. Using this graph structure, we have identified siRNA properties, captured in the graphs, which discriminate between functional and nonfunctional siRNAs.

Testing Models of Virus Capsid Structure for Emerging and Re-Emerging Viruses
Aziza Jefferson, Rutgers University, New Brunswick, NJ
Mentor: Dr. Stanley Dunn, Department of Biomedical Engineering

In this research we examined past theories for predicting the structure of virus capsids, and determined if the theories still held true for emerging or re-emerging viruses. Two theories correspond to capsid structure, one from Caspar and Klug^1 concerning the triangulation of the capsid and the second an approach by Twarock^2 using Tiling theory. Using the Ebola virus, we compared the structure of the capsid to see if these theories held. For Ebola, both theories did not hold. With this conclusion, it points to the idea that a new theory needs to be developed in order to predict the structure of emerging or re-emerging virus capsids.

1.Caspar, D.L.D., and A Klug. "Physical Principles in the Construction of Regular Viruses." _Cold Spring Harbor Symposia on Quantitative Biology_ 27 (1962): 1-24

2. Twarock, R. "Mathematical models for tubular structures in the family of Papovaviridae." _Bulletin of Mathematical Biology_ (2004): 1-15

Experimental Evaluation of Hop-optimal Networks in the Weak Sensor Model
Andrew Rodriguez, The University of Texas at San Antonio, San Antonio, TX
Mentors: Dr. Martin Farach-Colton, Mr. Rohan Fernandes, and Mr. Miguel Mosteiro, Department of Computer Science

A sensor network is a computer network of many spatially distributed devices using sensor nodes to monitor conditions, such as temperature, pressure, motion, or environmental pollutants at different locations. These devices are usually small and inexpensive so that they can be produced and deployed in large quantities. As a consequence of their size and price, their abilities and resources in terms of energy, memory, computational speed and bandwidth are quite limited. Due to these limited capabilities of the sensor nodes, even network initialization is a nontrivial task. Given the Weak Sensor Model (WSM) that specifies the restricted working conditions of such nodes, I seek to experimentally evaluate an optimal distributed algorithm for sensor network formation developed by Dr. Farach-Colton, et al. at Rutgers University. The model that describes the topology of such a resulting network is a Constant-degree Hop-optimal Spanning Subgraph (CHSG). The CHSG, hence the network, has asymptotically optimal hop-stre tch. In this work, properties of the network obtained by this algorithm, such as connectivity and hop-stretch, are evaluated through simulation.

Entity Resolution for Authors of Biological Sciences Papers
Jordanna Chord, Gonzaga University, Spokane, WA
Melissa Mitchell, University of Detroit - Mercy, Detroit, MI
Mentor: Dr. Paul Kantor, SCILS

When several persons have the same name, we would like to be able to tell them apart, by characteristics of their writings. The intelligence community, which supports our work, has defined a "challenge problem" which approximates the real problem that they face. They will present many items and ask us to separate them into those authored by "different persons".

To prepare for this challenge problem, we have selected ten "author names" for which there is one prolific contributor, and approximately an equal number of papers by other (different) persons. We drew associated abstracts address information and keywords from the online database Pubmed.

We applied Bayesian Binary Regression to identify author using a combination of seven document attributes. We have achieved strong performance (defined as a Receiver Operating Characteristic with an area under the curve of 80% or better, in several ways. This can be done using only keywords, or only addresses, or with a representation that included all attributes: abstract words, address words, address fields, co-authors, keywords, and title.

Results clearly identified keywords and addresses as working well alone in identifying documents. One expects the use of all variables to be better. But there is little difference. This is interpreted as meaning that either set of variables can accomplish the classification, but that they do not complement each other. Future research will examine how the regression approach balances selection among them when all variables are included.

Thanks to: Alex Genkin, Dmitriy Fradkin and Andrei Anghelescu for crucial support with coding and the use of the BBR software.

Correlations on trees and forests.
Josef Cibulka, Charles University, Prague, Czech Republic
Jan Hladky, Charles University, Prague, Czech Republic
Mentor: Dr. Jeff Kahn, Department of Mathematics

We were interested in problems concerning correlations between two edges or between two paths in a uniformly random forest - see problems 1 and 2 from our project webpage for more details. We proved a similar correlation for two edges in trees instead of forests, which had already been proved, but our proof is simpler and it uses techniques that might be useful for the conjectures concerning forests. We also tried to find a counterexample for those conjectures and wrote a program that tested all the graphs on at most 9 vertices.

Ionization of a two-state system
Jonathan Bergknoff, Cornell University, Ithaca, NY
Mentor: Dr. Joel Lebowitz, Department of Mathematics

We investigate a quantum system of a particle in an attractive double delta function potential. The potential is perturbed by a periodic forcing. We are interested in determining probability of ionization, which corresponds to the particle not being in a bound state. A general mathematical approach will be presented which solves the problem in principle, although a numerical approximation is necessary for any closed form final result.

Error Bounds in Wave Equations
Charles Siegel, Rutgers University, New Brunswick, NJ
Mentor: Dr. Avy Soffer, Department of Mathematics

The classical wave equation is used regularly in modern physics, often for situations where exact solutions are impractical or impossible. In some of these situations, the solution on all space is approximated by a finite box, and boundary conditions are set such that the waves will be dissipated as they escape. Unfortunately, these boundary conditions are rather hard to work with, because they require various transformations at which to arrive, and the results are difficult to inverse transform. So, we are working on an attempt to find approximations for the boundary conditions which give reasonable error bounds. To do so, we are using methods from Complex Analysis, Continued Fractions and Numerical Analysis.