**Please revisit this page frequently, additional projects will be
posted through January.**

Pairwise similarity between many proteins can be naturally represented as a graph in which nodes correspond to proteins, and edges are introduced whenever two proteins meet some predefined similarity criteria. Such similarity graphs provide domain experts with many valuable insights about, for example, forming of protein complexes, transfer of protein function, etc. The goal of this project is to develop an interactive web-based system to enable exploration of annotated similarity graphs with millions of nodes and edges. The system will be based on a client-server architecture, where client will use Ajax/Javascript to visualize data and to interact with a user, while server will be responsible for graph management and queries processing. A summer REU intern will be involved in implementing the client side, and will help to design required data exchange protocols.

Dissections of polyhedra have been studied by geometers for thousands of years, are much loved by puzzle creators and solvers, and have recently found interesting applications in coding theory. Given two polyhedra A and B in n-dimensional Euclidean space, can we cut A into pieces A1, A2,…,Am and B into pieces B1,B2,…,Bm, where m is a finite number, such that Ai is congruent to Bi, i=1,2,…,m. In this summer project we will explore computational and visualization tools to help us construct good dissections, and will explore other applications to the information sciences.

In our work we are developing a framework termed Higher Order Learning that builds models by leveraging associations in the form of latent relationships in an entity-relation graph. A general multi-attribute entity-relation graph has nodes that are people, places and things, etc. Higher Order Learning violates the underlying assumption made in traditional machine learning algorithms that records are independent and identically distributed(Taskar et al, 2002). Although this assumption simplifies the underlying mathematics of statistical models and of the corresponding parameter estimation procedures, it actually does not hold for many real world applications (Getoor & Diehl, 2005). Machine learning methods based on Higher Order Learning leverage relations between objects and features in an entity-relation graph and in doing so, operate on a much richer data representation compared to the traditional feature vector form.

In this research, Higher Order Learning techniques have been investigated and applied to a number of interesting datasets including Border Gateway Protocol (cybersecurity), E-Commerce (web mining), and Nuclear Detection (homeland security). The approach has been shown to outperform the popular Support Vector Machine algorithm on benchmark textual data sets as well as real-world streaming data Ganiz, Lytkin & Pottenger). REU summer interns will have an unparalleled opportunity to explore Higher Order Learning algorithms and applications in the context of law enforcement and other domains.

Emergencies invariably require crisis managers to field numerous calls for service such as 911, radio reports by police and fire teams, medical reporting, infrastructure support teams, etc. Technologies for Entity Resolution can be applied to handle the confusion by identifying geospatial, temporal, and content similarities in the messages, enabling first responders to resolve and identify the location and scope of multiple events. Entity Resolution also plays an important role in counter-terrorism, for example, in determining when two terrorist incidents are the same. This project will involve the research and development of Entity Resolution techniques to group incidents from the Internet Crime Complaint Center database (www.ic3.gov). REU summer interns will have an unparalleled opportunity to explore Entity Resolution algorithms and applications in the context of law enforcement and other domains.

Crowd simulations are a powerful tool for predicting behavior, designing buildings, and preparing for emergencies. While much is known about the individual behavior of people in a crowd, less is known about how people move when they are in groups. Such groups include families, coworkers, and friends—each of which have their own dynamics. This project takes seeks to create a data-driven model of social pedestrian behavior by using actual anonymous footage of pedestrians. Successful completion of this project will involve creative solutions to image processing, obstacle avoidance, and social behavior problems.

As the number of active shooter events continues to rise, many agencies are anxious to help protect citizens in a variety of setting. One way to prepare for active shooter events is through the creation of accurate, agent-based models of crowd behavior in a disaster scenario. Such models could be used to simulate various law enforcement tactics, building designs, and optimal crowd behaviors. Unfortunately, a highly accurate model does not exist. In this project we seek to build a high-fidelity model based on actual data from previous active shooter events. Using this model we will create real-time simulations of panic behavior to test various theories and proposals about active shooter prevention and mitigation.

Enumerative combinatorics is an area of combinatorics that deals with the number of ways certain patterns can be formed. Most students are familiar with two simple examples of enumerative combinatorics: counting combinations and counting permutations. Algebraic combinatorics employs methods of abstract algebra, such as group theory and representation theory, within combinatorial contexts, as well as applying combinatorial techniques to algebra. Dr. Nakamura is a leading researcher with the Command Control Interoperability Center for Advanced Data Analysis (CCICADA) who research interests include enumerative and algebraic combinatorics, both from a theoretical and applied perspective. Dr. Nakamura will work with a qualified REU student to identify a new problem in enumerative/algebraic combinatorics that will be of interest to both the student and the mentor. This project will appeal to those students interested and applying algorithmic and computational techniques in a research setting.

This project will develop methods to gauge public opinion on topics of current interest using Twitter messages and sentiment analysis algorithms [1]. Tweets matching selected topics will be collected and analyzed [2], and the end result will be visualized on a map using geo-tags contained in the Tweets. Topics could be related to politics (“Obama”, “Romney”), public policy (“deficit reduction”, “alternative energy”), or scientific exploration (“Mars rover”). Such tools help understand the mood in different parts of the US, and could also help understand shifts in sentiment over time.

Turney P, “Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews,” Proceedings of the Association for Computational Linguistics (2002), pp. 417–424.

Gansner E, Hu Y, and North S, “Visualizing Streaming Text Data with Dynamic Maps, to appear in Proceedings of the 20th International Symposium on Graph Drawing” (best paper award), (2012), TwitterScope website.

ASK-GraphView is a node-link-based graph visualization system that allows clustering and interactive navigation of large graphs [1]. The system constructs a hierarchy on an arbitrary, weighted, undirected input graph using a series of increasingly sophisticated clustering algorithms. Building on this earlier work, this project will address the problem of visualizing graphs that are being collected in a streaming fashion. A key challenge is to identify graph sub-structures and statistics on them that can be used as the basis to depict evolution of the graph through time [2]. Another goal is to devise a visualization algorithm that produces a layout that is stable over time, thus preserving a user’s mental map.

Abello J, Ham FV, Krishnan N, “AskGraphView: A Large Graph Visualization System”, IEEE Transactions in Visualization and Computer Graphics, 12(5) (2006), 669-676.

Hu Y, Kobourov S, and Veeramoni S, “Embedding, Clustering and Coloring for Dynamic Maps, Proceedings of IEEE Pacific Visualization Symposium, (2012).

This project will investigate algorithms for detecting anomalies in Internet traffic. Machine learning algorithms, in particular, implicit collaborative filtering [1], will be used to model normal Internet traffic records. Using this model, each machine is represented as a point in high dimension. Low-dimensional embedding algorithms [2] are then used to project points to 2 or 3D for visualization. Anomalies can be detected by visual inspection, as well as identified using clustering algorithms.

Hu YF, Koren Y, Volinsky C, “Collaborative Filtering for Implicit Feedback Datasets,” IEEE International Conference on Data Mining (2008).

Borg I and Groenen P, “Modern Multidimensional Scaling: theory and applications,” New York: Springer-Verlag (1997).

Many genetic processes are controlled by proteins that bind at separate, often widely spaced, sites on DNA and hold the intervening double helix in a loop. Our group has initiated a series of computer simulations of the formation of the DNA loops implicated in the Escherichia coli lac operon, the classic textbook example of genetic action at a distance [1]. We incorporate effects of naturally occurring species found in the cell, such as the non-specific architectural protein HU [2] and the Lac repressor protein, on the looping free energies. By representing the DNA as a sequence of base pairs, we capture the structural details of protein-bound sites and the natural deformability of free DNA [3-5]. This approach allows for departures from conventional treatments of DNA, such as ideal elastic models, which may be divorced from reality, and leads to results that can be directly linked to genetic processing at the molecular level. We relate the computed looping propensities to DNA looping propensitieis derived from gene repression and single-molecule studies [6,7]. We are also collecting the simulated structures in a database of three-dimensional loop models that can be used as starting points in the determination of structures consistent with NMR and other local spectroscopic measurements and subsequently refined with these high-resolution data. The approach is general and can be applied to any known protein-mediated DNA looping systems as well as to the design of novel DNA constructs, such as the three-dimensional origami formed by looping DNA between carefully spaced four-way junctions [8].

Müller-Hill B (1996) The lac Operon. Walter de Gruyter, Berlin.

Swinger KK, Lemberg KM, Zhang Y & Rice PA (2003) Flexible DNA bending in HU-DNA cocrystal structures. EMBO J 22, 3749-3760.

Olson WK, Gorin AA, Lu X-J, Hock LM & Zhurkin VB (1998) DNA sequence-dependent deformability deduced from protein-DNA crystal complexes. Proc Natl Acad Sci USA 95, 11163-11168.

Swigon D, Coleman, BD & Olson WK (2006) Modeling the Lac repressor-operator assembly: the influence of DNA looping on Lac repressor conformation. Proc Natl Acad Sci USA 103, 9879-9884.

Czapla L, Swigon D & Olson WK (2008) Effects of the nucleoid protein HU on the structure, flexibility, and ring-closure properties of DNA deduced from Monte-Carlo simulations. J Mol Biol 382, 353-370.

Becker NA, Kahn JD & Maher 3rd LJ (2005) Bacterial repression loops require enhanced DNA flexibility. J Mol Biol 349, 716-730.

Han L, Garcia HG, Blumberg S, Towles KB, Beausang JF, et al. (2009) Concentration and length dependence of DNA looping in transcriptional regulation. PLoS ONE 4, e5621.

Han D, Pal S, Nangreave J, Deng Z, Liu Y & Yan H (2011) DNA origami with complex curvatures in three-dimensional space. Science 332, 342-346.

Dr. Bahman Kalantari is a leading researcher with the Department of Computer Science who was instrumental in developing the field of polynomiography. Polynomiography is a mathematically-inspired computer medium based on algorithmic visualizations of one of the most basic and fundamental tasks in science and mathematics: solving a polynomial equation. Polynomiography serves as a powerful medium for creativity, discovery, and learning, with numerous applications in education, math, science, art and design. Dr. Kalantari will work with a qualified REU student to identify a new problem from polynomiography that will be of interest to both the student and the mentor. In particular, some specific problems will be described from computational geometry, numerical analysis, discrete math, education, algorithmic mathematical art, fine art, and games.

http://www.cs.rutgers.edu/~kalantar/

The Triangle Algorithm is a newly developed algorithm for testing if the convex hull of a finite set of points contains a distinguished point. It has potential applications in computational geometry, linear programming, convex programming and combinatorial optimization. It also gives rise to new iterative methods for solving a linear system of equations. The goal of this research is to explore some of these possibilities both from the computational points of view as well as theoretical ones. Related articles can be found online: http://www.cs.rutgers.edu/~kalantar/

A Characterization Theorem and An Algorithm for A Convex Hull Problem, B. Kalantari http://arxiv.org/pdf/1204.1873.pdf

Solving Linear System of Equations Via A Convex Hull Algorithm, B. Kalantari http://arxiv.org/abs/1210.785

Finding a Lost Treasure in Convex Hull of Points From Known Distances, B. Kalantari http://2012.cccg.ca/papers/paper20.pdf

Dr. Fiorini is the associate director and research professor with the Center for Discrete Mathematics and Theoretical Computer Science who works in the areas of graph theory, combinatorics, and extremal graph theory. Dr. Fiorini will work with a qualified REU student to identify a new problem on graph theory related to an area of graph theory called competition graphs. The notion of competition graphs were introduced by the mathematician Joel Cohen in 1968 when he asked: “Can we assign to each species an ecological niche so that competition between two species corresponds to overlap of their ecological niches?” The research problem in competition graphs will be of interest to both the student and the mentor. This project will appeal to those students who are interested in graph theory and combinatorics.

We are developing novel computational techniques for inferring parameters of probabilistic models for sequential data, such as time-series or biological sequence data. The specific summer project is to show, either theoretically through formal proofs or empirically by writing simple computer programs, whether certain functions of the data are sufficient to recover the parameters of the model. Our specific biologic application is to segment the human genome into distinct domains based on epigenetic marks. This project is most suitable for student who is proficient in linear algebra and multivariable calculus and has basic programming ability (e.g. in Matlab). An interest in statistics / machine learning or biology would be very welcome but not necessary to do the project.

1) Section 4 in Identifiability and unmixing of latent parse trees. Daniel Hsu, Sham M. Kakade, and Percy Liang. NIPS 2012.

2) cs.stanford.edu/~pliang/papers/identifiability-nips2012-poster.pdf

Dr. Hsu holds the Clavius Distinguished Professor of Science Chair and is the director of the Laboratory of Informatics and Data Mining at Fordham University. He is currently a visiting research professor with the Center for Discrete Mathematics and Theoretical Computer Science and has interests in the areas of combinatorics, algorithms, and optimization. An information fusion paradigm, combinatorial fusion analysis (CFA) he and colleagues developed, has applications in several domains including information retrieval, target tracking, cyber security, virtual screening, DNA sequencing informatics, bioinformatics, cognitive neuroscience, and portfolio management. Dr. Hsu will work with a qualified REU student to identify a new problem in an area related to his research interests. The research problem will be of interest to both the student and the mentor.

Dr. Vanderbei is a distinguished researcher and professor with the Department of Operations Research and Financial Engineering at Princeton University. His current research interests include topics in applied mathematics, astrophysics, computer science, and statistics. He is currently working on a problem in applied and computational mathematics investigating mass ratios that lead to stable ring systems versus unstable ones. He would be happy to advise a summer REU student who is interested in researching new stable periodic solutions to the n-body problem. Dr. Vanderbei will work with a qualified REU student to identify a new problem in an area related to his research interests. The research problem will be of interest to both the student and the mentor.

This project will consist of a timely research thrust on either security and privacy or social comuting on mobile phones. The student will have access to one of the latest Android mobile phones and the actual project will be developed together depending on interests.

** * You must be a U.S. Citizen or Permanent Resident to be eligible for this project. **

Social media has many emerging outlets of communication that did not exist even ten years ago, including Twitter, Facebook, micro-blogs, etc. Social media and text messaging can be utilized for quick communication when time is of the essence, particularly during disasters. During a disaster type situation, could the way that social media and text messaging communication is currently done be improved? In particular, on the public side, is there a way to improve the wording of messages to aid in the automation of classification? Should a supervised or unsupervised approach be taken? A supervised approach uses a labeled training set to build a model, while an unsupervised approach does not. A project of joint interest to the mentor and to the student will be identified. Some background in programming is encouraged.

The Yang-Mills equation is an equation for a "connection" which comes up in physics; a special case is Maxwell's equations for electromagnetic fields. The Yang-Mills "heat flow" is a method for constructing solutions to the Yang-Mills equations. This project will focus on variations on the problem of the Yang-Mills heat flow, such as on spaces with cylindrical ends, with fields etc. Interested students should have at least some analysis and an interest in physics.