
Project title: Proteininduced DNA Looping
Many genetic processes are mediated by proteins that bind at separate, often widely spaced, sites on the double helix, tethering the intervening DNA into a loop [1, 2]. Examples of these processes include gene expression and its control, DNA replication, genetic rearrangements, multisite cutting by restriction enzymes, and DNA packaging. A DNA loop stabilized by such longrange proteinprotein contacts constitutes a discrete topological unit. As long as the ends of the DNA stay in place and the duplex remains unbroken, the linking number, Lk, or number of times the two strands of the double helix wrap around one another, is conserved. This constraint in Lk underlies the well known supercoiling of DNA: the stress induced by positioning the ends of the polymer in locations other than the natural (relaxed) state perturbs the overall coiling of the chain axis and/or the twisting of successive basepairs in the intervening parts of the chain [3]. As a first step in understanding the effects of specific proteins and drugs on DNA looping, we propose to study the imposed bending and twisting of neighboring base pairs [4] in known complexes of proteins and drugs with double helical DNA stored in the Nucleic Acid Database [5]. By subjecting a DNA segment of the same chain length as that found in a given complex to the observed orientation, displacement, and superhelical stress and setting the elastic moduli to sufficiently high values, we can use existing programs, e.g., [6], to simulate the presence of a rigidly bound molecule at arbitrary positions on circular DNA molecules or to model specific systems in which DNA looping plays an important role, e.g., the lac repressoroperator assembly in EscherichiaA0coli [7]. One student could devote a summer project to the collection of geometric information. Other students could apply this information to simulations of DNA loops and folds in subsequent years. Prerequisites: Students should have an interest (but not necessarily formal training) in molecular biology, familiarity with linear algebra in order to understand the parameters used to describe DNA structure, and knowledge of basic chemistry and physics to understand the nature of the DNA molecules and the elastic treatment of the double helix.
[1] Halford, S. E., Gowers, D. M., and Sessions, R. B., ``Two are Better than One,'' Nature Struct. Biol., 7, (2000), 705707.
[2] Schleif, R., ``DNA Looping,'' Annu. Rev. Biochem., 61, (1992), 199223.
[3] Bauer, W. R. ``Structure and Reactions of Closed Duplex DNA,'' Annu. Rev. Biophys. Bioeng., 7, (1978), 287313.
[4] Olson, W. K., Gorin, A. A., Lu, X.J., Hock, L. M., and Zhurkin, V. B., ``DNA Sequencedependent Deformability Deduced from ProteinDNA Crystal Complexes,'' Proc. Natl. Acad. Sci. USA, 95, (1998), 1116311168.
[5] Berman, H. M., Olson, W. K., Beveridge, D. L., Westbrook, J., Gelbin, A., Demeny, T., Hsieh, S.H., Srinivasan, A. R., and Schneider, B., ``The Nucleic Acid Database: A Comprehensive Relational Database of Threedimensional Structures of Nucleic Acids,'' Biophys. J., 63, (1992), 751759.
[6] Westcott, T. P., Tobias, I., and Olson, W. K., ``Modeling Selfcontact Forces in the Elastic Theory of DNA Supercoiling,'' J. Chem. Phys., 107, (1997), 39673980.
[7] MullerHill, B., The Lac Operon, Walter de Gruyter, Berlin, 1996.
Project title: Projections Based Learning
A simple generalization of decisiontree based learning methods involves, iteratively, the projection of the trainign set into the space of a small subset of the attributes, derivation of a best classifier in that small space, and the introduction of the obtained signal as a new attribute for the original problem. Under some specific conditions this can be shown to converge to a good classifier. The project would involve extensions of this idea, characterizations of wider classes when the method can be guaranteed to converge, and implementations of such learners.
Project title: Streaming Join Computations
Many applications (sensor networks, IP traffic) generate massive "streams" of values. One problem of interest is how to compute databasestyle 'joins' between pairs of streams. Given limited buffer sizes, computing the full join is not possible, and instead the goal is to output a subset of the full join that is as large as possible. Several heuristics have been proposed for this problem, this project is to come up with new methods where a provable "competitive guarantee" can be given about the quality of the method compared to the "optimal offline" solution.
Project title: New Wavelet Representations
Wavelets comprise a class of ways of representing signals (often images or audio) in terms of a set of basis functions. Their value comes from the fact that if only a select number of coefficients are retained, then a high quality approximation of the original signal can be recovered (and the quality of this approximation can be quantified). This project is to investigate and perhaps implement wavelet transformations based on methods which are resilient against changes to the signal in terms of cuts or insertions (that is, the wavelet representation will be similar after the alteration to the original representation). I have a suggested way to build the wavelets, but this procedure needs refining, and we can prove properties of the wavelet basis to show that it will be a good choice.
Project title: Decision trees for fast data streams
Traditionally people build decision trees for stored data. But in emerging applications, data arrives in a stream, and one has to constantly update the decision trees to reflect the ongoing trend of data. How to build novel algorithms? How to test emerging algorithms on realistic data? This project will be about novel algorithm design as well as implementing and testing algorithms.
Project title: Threshold properties of geometric random graph
The random graph G(n,p) where we have n vertices and with probability p we put an edge between any two vertices has been studied for over 50 years. In contrast, geometric random graphs G(n,r,l) where we put n vertices in a circle of radius l and two vertices that are at most distance r from each other are connected, has been less studied. But these graphs are interesting to researchers in ad hoc and sensor network areas. so, there is new interest in them. This project is to study properties of networking interest in G(n,r,l). It involves both mathematical study of the threshold phenomena in such random graphs, as much as simple programming effort to simulate and understand them.
Project title: Liar games with coins
Say we have n coins of which is one is defective, ie., it is either heavier or lighter than the other n1 good coins. But we can't tell which is defective by just looking at them. How to find the defective coin using a linear test or a pan balance? This is a classical puzzle, but the twist we want to study is what happens if the pan balanace (or the linear test) returns incorrect answers once in a while, that is, the tests lie? This project is the study of such problems using discrete math tools. It will involve studying liar games, finding effective strategies to do the weighings, and analyzing such strategies mathematically.
Project title: Mutations and treatment of HIV
Standard models describing HIV dynamics within host neglect the frequent and rapid mutations of the virus. But mutations have important implications for treatment: typically a particular drug will be effective against a certain strain, not against the others. This explains why current therapies consist of cocktails of different drugs.
In this project the student will be asked to incorporate this feature in the standard HIV models. In a second step he or she will be asked to analyze these models from different perspectives. To give one specific example, suppose that 3 drugs are used in a particular patient, each targeting a different strain. Now a collection of k>1 new drugs becomes available, each having a different target. However, the patient can only take a limited total number of drugs, say 4. So, from the collection of k drugs, only 1 can be chosen. The problem is then which drug to choose. Different criteria can be used in the decision process. For example, one may look for the drug that minimizes the viral load at some steady state.
Interests in and familiarity with ordinary differential equations and linear algebra is useful for this project.
Project title: Applications in game theory
It is known that, given two mxn matrices A and B, there exist m+n potentials p_{1},...,p_{m}; q_{1},...,q_{n}. Such that
This theorem has important applications in game theory. But how to find such potentials? No polynomial algorithm is known yet, though one should exist, since the problem is in the intersection of NP and coNP.
Project title: Difference graphs
Given n finite sets S_{1},...,S_{n} , let us define a graph G with n vertices v_{1},...,v_{n} in which 2 vertices v_{i} and v_{j} are connected by an edge IFF both differences S_{i} \ S_{j} and S_{j} \ S_{i} have at least k elements each. EVERY graph can be realized in such a way if k is arbitrarily large. But is it still true if k is bounded by a constant? Even for k = 2 no counterexample is known. If k = 1 then only cooccurrence graphs can be realized.
Project title: Applications in geometry and theory of positional games
All edges of a complete graph G are colored by n colors, that is labeled by positive integers [n] = {1,..., n}, where n ≥3. If G contains a triangle whose 3 edges are colored by 3 pairwise distict colors then there exist C_{1}, ... , C_{n} ⊆ V(G) such that no edge from C_{i} is labeled by i and C_{1} ∩ ... ∩ C_{n} = ∅.
This problem has applications in geometry and theory of positional games.
Project title: Directed graph and subsets
Given a (strongly connected) directed graph D = (V,A), and a subset V' ⊆ V of its vertices, list all MINIMAL subsets of arcs A' ⊆ A such that for every two vertices v,v' ∈ V' there is a directed path from v to v' whose all arcs are in A'. This problem is tractable in case V' = V but in general it is still open.
Project title: Runtime of a Program
The runtime of a program is sometimes dominated not by the number of instructions it performs but by memory effects  cache misses, the number page swaps, etc. Some algorithmic research has focused on minimizing some particular memory effect  say, minimizing the number of page faults. Can we minimize *all* memory effects simultaneously? Is this too much to ask for? We consider both theoretical and experimental aspects of this question.
Project title: The Amount of Information About Topic X
Classic capturerecapture estimates an unknown population by marking randomly collected specimens and releasing them. This suggests that if we had uncorrelated ways of retrieving certain kinds of information from the Web, we could estimate the amount of (unretrieved) information in a similar way. Unfortunately, various methods for searching in the Web are not uncorrelated. The challenge problem is: can we describe the correlation among systems in a way that makes it possible to carry out some kind of generalized capturerecapture analysis? Is the characterization reasonable? How much data (how many systems) would be needed? This is an open problem.
Project title: Planar Rectangular Packings
There are several interesting open questions related to packings of rectangles on the plane (sides parallel with coordinate axes). For instance, let us call a packing irredundant, if the smallest rectangle containg two of the given rectangles intersects a third one, for any two of the given rectangles, and no rectangle can be removed without violating this property. We do not (yet) have a complete characterization of irredundant rectangular packings. The project would aim at analyzing the structure of such configurations, and constructing new examples.
Project title: To Be Announced
Project title: Arrangements
Arrangements of hyperplanes have interested mathematicians for centuries. Although much is known about the topological, algebraic, and combinatorial properties of arrangements, there still remain many, embarrassingly simple, open questions. For example, how many turns can be made by a monotone path (a piecewise linear function that follows lines in the arrangements, and can turn at intersection points with other lines) in an arrangement of $n$ lines? Also, because arrangements are fundamental to the subject of Computational Geometry, algorithmic questions are also important. For example: (i) given a twodimensional arrangement of n red lines and n blue lines, define f(x,y) to be the sum of the number of red lines above (x,y) and the number of blue lines below (x,y). Is it possible to find a global minimum of f in less than quadratic time? (ii) how quickly can one construct the convex hull of the kth level in an arrangement of $n$ given lines?
The student will be introduced to computational geometry and taught some of the elementary properties of arrangements. Then a collection of simple questions, like the ones above, will be presented.