Mentor: Eric Allender, Computer Science Department
There are many clever algorithms to check whether a graph is planar. Hopcroft and Tarjan [1] gave a linear-time algorithm almost 30 years ago, but efficient parallel algorithms proved much harder to find. Ja'Ja' and Simon [2] gave NC algorithms, and a log-time PRAM algorithm with essentially optimal processor and time bounds was given by Ramachandran and Reif [3]. Still, the exact complexity class of planarity remained unknown until Allender and Mahajan [4] showed that the Ramachandran-Reif algorithm can be reduced to the reachability problem in undirected graphs, placing planarity in the class SL. The Ramachandran-Reif algorithm is complicated, largely because much effort is spent minimizing the number of processors. In this project we will study their algorithm and see whether a simpler algorithm can be found that still shows planarity is in SL. Prerequisites: A course in graph theory.
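As a toy warm-up (a sketch assuming nothing beyond a corollary of Euler's formula; it is in no way the algorithms cited above), one can at least certify non-planarity of dense graphs by edge counting:

```python
# Every simple planar graph with v >= 3 vertices has at most 3v - 6 edges
# (a corollary of Euler's formula), so exceeding that bound already
# certifies non-planarity. Names here are illustrative only.

def exceeds_planar_edge_bound(num_vertices, edges):
    """Return True if the edge count alone proves the graph non-planar."""
    return num_vertices >= 3 and len(edges) > 3 * num_vertices - 6

# K5, the complete graph on 5 vertices, has 10 > 3*5 - 6 = 9 edges.
k5_edges = [(i, j) for i in range(5) for j in range(i + 1, 5)]
print(exceeds_planar_edge_bound(5, k5_edges))  # True: K5 is non-planar
```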
[1] Hopcroft, J., and Tarjan, R., ``Efficient Planarity Testing,'' Journal of the ACM, 21, (1974), 549-568.
[2] Ja'Ja', J., and Simon, J., ``Parallel Algorithms in Graph Theory: Planarity Testing,'' SIAM Journal on Computing, 11, (1982), 314-328.
[3] Ramachandran, V., and Reif, J., ``Planarity Testing in Parallel,'' Journal of Computer and System Sciences, 49, (1994), 517-561.
[4] Allender, E., and Mahajan, M., ``The Complexity of Planarity Testing,'' Proc. 17th Symposium on Theoretical Aspects of Computer Science (STACS 2000), Lecture Notes in Computer Science 1770, 2000, 87-98.
Mentor: Endre Boros, RUTCOR
Nonlinear binary optimization is one of the most general forms of many combinatorial optimization problems (see, e.g., [1]). Applications are widespread and include VLSI design, statistical mechanics, portfolio selection, various production and scheduling problems, game-theoretic problems, reliability problems, etc. In general it is a very hard problem to solve. Efficiently finding an approximate solution is therefore very important, and this research direction has become very active in the past 10 years. The best methods, however, are based on formulations in much higher dimensions than the original problem, posing great computational difficulties when these methods are applied. A promising approach to finding approximations without increasing the dimension of the problem, introduced in [2], is based on finding tight convex majorants of the objective function. This leads to the question proposed here: given a multilinear polynomial f(x), find a convex function g(x) such that f(x) ≤ g(x) for all x in the unit cube [0,1]^n and the maximum difference g(x)-f(x) over the binary vectors x in {0,1}^n is as small as possible. For instance, if f(u,v)=uv in n=2 dimensions, then we know that a best convex majorant of f is g(u,v)=((u+v)/2)^2. It is not known, however, what the tightest convex majorants are, even for 3-variable multilinear polynomials, e.g., f(u,v,w)=2uv-5vw. Best linear majorants were considered in the literature, leading to good bounds [3,4] and to several interesting open questions about polyhedral combinatorics. Possible formulations for finding a tightest convex majorant include linear programming based on row generation, or semidefinite programming, both of which are interesting and somewhat challenging already in small (n=3,4) dimensions. Prerequisite: An introductory course in operations research or linear programming would help but is not required; the necessary background concerning LP, semidefinite models, etc., can be learned to the extent needed. We will probably only consider problems involving 3-4 variables.
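A minimal numerical check of the two-variable example above; f and g are the functions from the text, and the grid resolution is arbitrary:

```python
# Check that g(u,v) = ((u+v)/2)^2 majorizes f(u,v) = u*v on the unit square
# (indeed g - f = ((u-v)/2)^2 >= 0) and compute the largest gap over the
# binary vectors {0,1}^2, the quantity the project asks to minimize.
import itertools

f = lambda u, v: u * v
g = lambda u, v: ((u + v) / 2) ** 2

grid = [i / 20 for i in range(21)]
assert all(g(u, v) >= f(u, v) for u, v in itertools.product(grid, repeat=2))

gap = max(g(u, v) - f(u, v) for u, v in itertools.product((0, 1), repeat=2))
print(gap)  # 0.25, attained at (0,1) and (1,0)
```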
[1] Padberg, M., ``The Boolean Quadric Polytope: Some Characteristics, Facets and Relatives,'' Mathematical Programming, 45, (1989), 139-172.
[2] Badics, T., ``Approximation of Some Nonlinear Binary Optimization Problems,'' Ph.D. Thesis, RUTCOR, Rutgers University, 1996.
[3] Boros, E., Lari, I., and Simeone, B., ``Block Linear Majorants in Quadratic 0-1 Optimization,'' RUTCOR Research Report, RRR 18-2000, Rutgers University, March, 2000.
[4] Hammer, P.L., Hansen, P., and Simeone, B., ``Roof Duality, Complementation and Persistency in Quadratic 0-1 Optimization,'' Mathematical Programming, 28, (1984), 121-155.
Mentor: Peter Hammer, RUTCOR
This project is devoted to a frequently encountered problem of data analysis, in which a set of ``observations'' is given, with each observation represented as a vector of binary attribute values. The observations in the data set are of two types, and the type of each observation (e.g., positive or negative) is known. Typical data analysis problems related to such data sets include classification (i.e., identification of the type of a new observation not included in the data set), determination of characteristic properties of observations of the same type, analysis of the role of various attributes, etc. Logical analysis of data (LAD) [1, 2, 3, 4, 5] is a methodology addressing the above kinds of problems. The mathematical foundation of LAD is in discrete mathematics, with a special emphasis on the theory of Boolean functions. Patterns are the key building blocks in LAD [4], as well as in many other rule induction algorithms (such as C4.5 rules, CN2, AQ17-HCI, RISE, RIPPER, and SLIPPER; see, for example, [6, 7]). Since a typical data set has an exceedingly large number of patterns, all these algorithms are limited to the consideration of small subsets of patterns. In most algorithms, the choice of such a subset of patterns is not explicitly analyzed, in spite of the fact that empirical studies and practical applications have shown some patterns to be more ``suitable'' than others for use in data analysis. The goal of this project is to model various such suitability criteria and to analyze the relevance of these notions of suitability for LAD algorithms. Prerequisites: Some knowledge of graph theory, linear programming, or other elements of discrete mathematics is useful, but none is absolutely necessary.
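A minimal sketch of the pattern notion, assuming the standard LAD definition (a positive pattern is a conjunction of literals covering at least one positive observation and no negative one); the names and toy data are illustrative only:

```python
# A term is a partial assignment {attribute index: required value}; it
# covers an observation if the observation agrees on every fixed attribute.

def covers(term, observation):
    return all(observation[i] == v for i, v in term.items())

def is_positive_pattern(term, positives, negatives):
    return (any(covers(term, p) for p in positives)
            and not any(covers(term, n) for n in negatives))

positives = [(1, 0, 1), (1, 1, 1)]
negatives = [(0, 0, 1), (0, 1, 0)]
print(is_positive_pattern({0: 1}, positives, negatives))  # True: "x1 = 1"
print(is_positive_pattern({2: 1}, positives, negatives))  # False: covers a negative
```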
[1] Boros, E., Hammer, P.L., Ibaraki, T., and Kogan, A., ``Logical Analysis of Numerical Data,'' Math. Programming, 79, (1997), 163-190.
[2] Boros, E., Hammer, P.L., Ibaraki, T., Kogan, A., Mayoraz, E., and Muchnik, I., ``An Implementation of Logical Analysis of Data,'' IEEE Trans. on Knowledge and Data Engineering, 12, (2000), 292-306.
[3] Boros, E., Ibaraki, T., and Makino, K., ``Logical Analysis of Binary Data with Missing Bits,'' Artificial Intelligence, 107, (1999), 219-263.
[4] Crama, Y., Hammer, P.L., and Ibaraki, T., ``Cause-effect Relationships and Partially Defined Boolean Functions,'' Annals of Oper. Res., 16, (1988), 299-326.
[5] Ekin, O., Hammer, P.L., and Kogan, A., ``Convexity and Logical Analysis of Data,'' Theoretical Computer Science, 244, (2000), 95-116.
[6] Cohen, W.W., and Singer, Y., ``A Simple, Fast, and Effective Rule Learner,'' in Proc. of the Sixteenth National Conference on Artificial Intelligence, AAAI Press, Menlo Park, CA, 1999, 335-342.
[7] Domingos, P., ``Unifying Instance-based and Rule-based Induction,'' Machine Learning, 24, (1996), 141-168.
Mentor: David Madigan, Department of Statistics
Text categorization is the automated assignment of documents to predefined content-based categories. Applications of text categorization include controlled vocabulary indexing (in both traditional bibliographic databases and newer Web-based directories), content filtering of email and Web access, subsetting information feeds, and data mining on records with both textual and nontextual fields. This project will involve developing and evaluating novel Bayesian statistical methods for text categorization and applying them to very large collections of documents. Prerequisites for this work are basic proficiency in computer programming as well as an undergraduate course in probability and statistics.
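As a baseline sketch only (the project concerns novel Bayesian methods, not this one), here is the simplest Bayesian text classifier, multinomial naive Bayes with Laplace smoothing, on hypothetical toy data:

```python
import math
from collections import Counter, defaultdict

def train(docs):  # docs: list of (category, list-of-tokens) pairs
    cat_counts = Counter(c for c, _ in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for c, toks in docs:
        word_counts[c].update(toks)
        vocab.update(toks)
    return cat_counts, word_counts, vocab

def classify(tokens, cat_counts, word_counts, vocab):
    # Pick the category maximizing log P(category) + sum log P(token|category),
    # with add-one (Laplace) smoothing for unseen tokens.
    def log_post(c):
        total = sum(word_counts[c].values())
        prior = math.log(cat_counts[c] / sum(cat_counts.values()))
        return prior + sum(
            math.log((word_counts[c][t] + 1) / (total + len(vocab)))
            for t in tokens)
    return max(cat_counts, key=log_post)

docs = [("sports", "goal match team".split()),
        ("finance", "stock market price".split())]
model = train(docs)
print(classify("team goal".split(), *model))  # sports
```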
Mentor: Wilma Olson, Department of Chemistry
Many genetic processes are mediated by proteins that bind at separate, often widely spaced, sites on the double helix, tethering the intervening DNA into a loop [1, 2]. Examples of these processes include gene expression and its control, DNA replication, genetic rearrangements, multi-site cutting by restriction enzymes, and DNA packaging. A DNA loop stabilized by such long-range protein-protein contacts constitutes a discrete topological unit. As long as the ends of the DNA stay in place and the duplex remains unbroken, the linking number, Lk, or number of times the two strands of the double helix wrap around one another, is conserved. This constraint on Lk underlies the well-known supercoiling of DNA: the stress induced by positioning the ends of the polymer in locations other than the natural (relaxed) state perturbs the overall coiling of the chain axis and/or the twisting of successive base pairs in the intervening parts of the chain [3]. As a first step in understanding the effects of specific proteins and drugs on DNA looping, we propose to study the imposed bending and twisting of neighboring base pairs [4] in known complexes of proteins and drugs with double helical DNA stored in the Nucleic Acid Database [5]. By subjecting a DNA segment of the same chain length as that found in a given complex to the observed orientation, displacement, and superhelical stress, and setting the elastic moduli to sufficiently high values, we can use existing programs, e.g., [6], to simulate the presence of a rigidly bound molecule at arbitrary positions on circular DNA molecules or to model specific systems in which DNA looping plays an important role, e.g., the lac repressor-operator assembly in Escherichia coli [7]. One student could devote a summer project to the collection of geometric information. Other students could apply this information to simulations of DNA loops and folds in subsequent years. Prerequisites: Students should have an interest in (but not necessarily formal training in) molecular biology, familiarity with linear algebra in order to understand the parameters used to describe DNA structure, and knowledge of basic chemistry and physics to understand the nature of the DNA molecules and the elastic treatment of the double helix.
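For background, the twist/writhe trade-off described above is usually summarized by White's formula, a standard identity not stated explicitly in the project description:

```latex
% White's formula: the conserved linking number of closed-circular DNA
% decomposes into twist (rotation of successive base pairs about the
% duplex axis) and writhe (coiling of the axis itself in space):
\[
  Lk = Tw + Wr
\]
% With Lk fixed, any change in twist must be compensated by an opposite
% change in writhe, which is the mechanism of supercoiling sketched above.
```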
[1] Halford, S. E., Gowers, D. M., and Sessions, R. B., ``Two are Better than One,'' Nature Struct. Biol., 7, (2000), 705-707.
[2] Schleif, R., ``DNA Looping,'' Annu. Rev. Biochem., 61, (1992), 199-223.
[3] Bauer, W. R. ``Structure and Reactions of Closed Duplex DNA,'' Annu. Rev. Biophys. Bioeng., 7, (1978), 287-313.
[4] Olson, W. K., Gorin, A. A., Lu, X.-J., Hock, L. M., and Zhurkin, V. B., ``DNA Sequence-dependent Deformability Deduced from Protein-DNA Crystal Complexes,'' Proc. Natl. Acad. Sci. USA, 95, (1998), 11163-11168.
[5] Berman, H. M., Olson, W. K., Beveridge, D. L., Westbrook, J., Gelbin, A., Demeny, T., Hsieh, S.-H., Srinivasan, A. R., and Schneider, B., ``The Nucleic Acid Database: A Comprehensive Relational Database of Three-dimensional Structures of Nucleic Acids,'' Biophys. J., 63, (1992), 751-759.
[6] Westcott, T. P., Tobias, I., and Olson, W. K., ``Modeling Self-contact Forces in the Elastic Theory of DNA Supercoiling,'' J. Chem. Phys., 107, (1997), 3967-3980.
[7] Muller-Hill, B., The Lac Operon, Walter de Gruyter, Berlin, 1996.
Mentors: Kobbi Nissim and Ofer Melnik, DIMACS
Many problems lie on the frontier between machine learning and algorithmics. One example is the issue of dimensionality: as the number of input variables or model parameters increases, problems become computationally harder and more opaque. The intrinsic challenges thus revolve around finding ways to address high dimensionality. Approaches to date have involved approximation, dimensionality reduction, re-representation, complexity analysis, generalization/model assumptions, and exploiting regularity. This research project will involve some programming, design, and analysis. A background in computer science, programming, or math/stats, together with an interest in machine learning, is desired.
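To make one item on that list concrete, here is a generic sketch of dimensionality reduction by Gaussian random projection (in the spirit of the Johnson-Lindenstrauss lemma); it illustrates the approach, is not a method prescribed by the project, and all sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, high_dim, low_dim = 100, 1000, 50

# Project 1000-dimensional points to 50 dimensions with a random Gaussian
# matrix; pairwise Euclidean distances are approximately preserved.
X = rng.normal(size=(n_points, high_dim))
R = rng.normal(size=(high_dim, low_dim)) / np.sqrt(low_dim)
Y = X @ R

# Compare one pairwise distance before and after projection.
d_hi = np.linalg.norm(X[0] - X[1])
d_lo = np.linalg.norm(Y[0] - Y[1])
print(f"original {d_hi:.1f}, projected {d_lo:.1f}")
```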
Dr. Gurvich has three problems to work on; students selected to work with Dr. Gurvich may work on one or more of them.
max (a_ij + p_i - q_j) = min (b_ij - p_i + q_j) = v for all i = 1,...,m; j = 1,...,n.
This theorem has important applications in game theory. But how can such potentials be found? No polynomial algorithm is known yet, though the problem lies in the intersection of NP and co-NP.
Mentor: Michael Rothkopf, RUTCOR
Most models of auctions ignore bid preparation costs. Most, if not all, of those that do consider them treat them as a fixed amount. I am interested
in a more nuanced model of bid preparation costs in which bidders may
invest more or less effort in bid preparation and in which high effort
implies better value estimation. While many approaches are possible, I
envision the following class of models: Before an auction of a given type
(standard sealed bid, sealed second-price, progressive), each of n a
priori symmetric bidders has to decide how much to invest in gathering
information about her value for what is to be sold. The value is, in the
general case, composed of a private value component and a value component
common to all bidders, and the information gathering investment can be
allocated between these components.
After each bidder makes her investment and gets the information, the
bidders then participate in the auction. Several issues are of interest.
How much is spent on evaluation? If there are both private and common value components, how is effort allocated between seeking information on the
common value and on the private value? How inefficient is the
allocation of the item compared to the case when information is free? How
do these answers depend upon the number of bidders, the cost of
information, and the type of auction?
I envision this analysis being done, at least initially, on a simple,
stylized model in which random variables have independent uniform
distributions. The student working on this should have had a course in
mathematical statistics or probability theory. Some background in
microeconomics or game theory would be a plus.
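A toy Monte Carlo sketch of the stylized setting, omitting the information-acquisition stage that is the heart of the project; it assumes i.i.d. uniform private values and the textbook first-price equilibrium bid b(v) = (n-1)v/n:

```python
import random

def simulate_revenue(n_bidders, trials=100_000):
    """Average seller revenue in a first-price sealed-bid auction."""
    total = 0.0
    for _ in range(trials):
        values = [random.random() for _ in range(n_bidders)]
        bids = [(n_bidders - 1) / n_bidders * v for v in values]
        total += max(bids)  # highest bid wins and is paid
    return total / trials

# Expected revenue should approach (n-1)/(n+1); e.g. 1/3 for n = 2.
print(simulate_revenue(2))
```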
Mentor: Rebecca Wright, DIMACS
The exact problem will be chosen from general topic areas that include
design and analysis of cryptographic protocols, distributed
cryptography, privacy, secure multiparty computation, and
fault-tolerant distributed computing. Possible goals include
designing protocols, systems, and services that perform their
specified computational or communication functions even if some of the
participants or underlying components behave maliciously, and that
balance individual needs such as privacy with collective needs such as
network survivability and public safety.
Depending on the interests and expertise of the student, this project
can be more focused on foundational mathematical results, on
implementation and experimentation, or on written analysis of software
usage and its security implications.
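As one concrete building block from this area, here is a minimal sketch of additive secret sharing over a prime field; it is illustrative only, and real protocols add much more (authentication, robustness against malicious parties, etc.):

```python
import secrets

P = 2**61 - 1  # a Mersenne prime, used as the field modulus

def share(secret, n_parties):
    """Split a secret into n additive shares that sum to it mod P."""
    shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

s = 123456789
sh = share(s, 5)
assert reconstruct(sh) == s
# Any 4 of the 5 shares are jointly uniform and reveal nothing about s.
```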
NOTE: Dr. Wright is currently at AT&T Labs. She will be a Visiting
Research Associate at DIMACS during the term of the REU
program. Background information on her interests can be found at her
website: www.kiskeya.net/rwright
Project #: DIMACS2002-06: Algorithms for machine learning.
Project #: DIMACS2002-07: Mentor: Leonid Khachiyan, Department of Computer Science.
Project #: DIMACS2002-08: Mentor: Vladimir Gurvich, RUTCOR.
Project #: DIMACS2002-09: Modeling Bid Preparation Costs in Auctions.
Project #: DIMACS2002-10: Distributed cryptography and computer security.
Project #: DIMACS2002-11: Computational experimentation with augmented Lagrangian methods for nonlinear optimization. Mentor: Jonathan Eckstein, RUTCOR.
Document last modified on February 21, 2002.