Proposed projects for the 2013 DIMACS and DIMACS/DIMATIA REU Programs

Please revisit this page frequently, additional projects will be posted through January.


Project #: DC2013-01

Mentor: Alexei Vazquez, vazquez at ias dot edu; CINJ

Project Title: Optimization of Personalized Therapy for Cancer Treatment

Years of experience in cancer therapy have taught us that using combinations of drugs for personalized therapy leads to better response rates [1,2]. However, designing personalized therapies is challenging because of the many drugs for cancer treatment and the many tumor properties that can inform treatment [2]. We formulate personalized combinatorial therapy as a constrained optimization problem, where markers (tumor properties) and treatment decision rules are assigned to drugs to achieve a high treatment response rate while using drug combinations of minimal size. The goal of this project is to design heuristic algorithms to solve this optimization problem.

Requirements: Background in Statistics and SQL is desired. Participants are expected to develop scripts to statistically analyze a variety of data sets.

References:
1. T. Harris, "Gene and Drug Matrix for Personalized Cancer Therapy," Nat. Rev. Drug Discov., 9(660)(2010).

2. A. Vazquez, "Optimal Drug Combinations and Minimal Hitting Sets," BMC Syst Biol., 3(81)(2009).

* You must be a enrolled at a U.S. university to be eligible for this project.


Project #: DC2013-02

Mentor: Janne Lindqvist, janne at winlab dot rutgers dot edu; WINLAB

Project Title: Security and Privacy or Social Computing on Mobile Phones

This project will consist of a timely research thrust on either security & privacy or social computing on mobile phones. The student will have access to one of the latest Android mobile phones and the actual project will be developed together depending on interests.

Requirements: Programming experience, preferably Java, Android programming is a definite plus, experience with Eclipse is a plus, but not required.

* You must be a enrolled at a U.S. university to be eligible for this project.


Project #: DC2013-03

Mentor: Robert DeMarco, rvdemarco at gmail dot com; CCICADA

Project Title: Problems in (Random) Graph Theory

Dr. DeMarco is a leading researcher with the Command Control and Interoperability Center for Advanced Data Analysis, a DHS center of excellence associated with DIMACS. His research is based in the areas of random graph theory and extremal combinatorics. Extremal combinatorics is an area of mathematics that studies how large or small a collection of finite objects (graphs, vectors, sets, etc.) can be, if it is to satisfy given restrictions. Dr. DeMarco will work with a qualified student to identify a new problem related to his research that will be of interest to both the student and the mentor. This project will appeal to those students who are interested in graph theory and combinatorics.

* You must be a enrolled at a U.S. university to be eligible for this project.


Project #: DC2013-04

Mentor: Kevin Chen, kcchen at Biology dot rutgers dot edu; Biology

Project Title: Combinatorial algorithms for genome assembly

Genome assembly is the problem of reconstructing a full linear genome sequence (up to 3 billion bases long) from short (~50-100 bases), overlapping sequences called “reads” sampled from the genome. This is a classic problem in bioinformatics whose importance has escalated with the emergence of next generation sequencing technologies that have dramatically reduced the cost of DNA sequencing. In most genome assembly algorithms, the fundamental mathematical object is the de Bruijn graph, which represents all the sequence overlaps between the reads. The genome sequence is then computed as an Eulerian path through the de Bruijn graph [1]. The project will be to efficiently implement a basic de Bruijn graph-based assembler along with subroutines for sequencing error correction and graph simplification.

References:
1. Pevzner PA, Tang H, and Waterman MS, “An Eulerian Path Approach to DNA Fragment Assembly,” Proc Natl Acad Sci USA 98(17) (2001), 9748-9753.


* You must be a enrolled at a U.S. university to be eligible for this project.



Project #: DC2013-05


Mentor:
James Abello
, abello at dimacs dot rutgers dot edu; DIMACS


Project Title:
Measuring the Mood of the Nation via Twitter


This project will develop methods to gauge public opinion on topics of current interest using Twitter messages and sentiment analysis algorithms [1]. Tweets matching selected topics will be collected and analyzed [2], and the end result will be visualized on a map using geo-tags contained in the Tweets. Topics could be related to politics (“Obama”, “Romney”), public policy (“deficit reduction”, “alternative energy”), or scientific exploration (“Mars rover”). Such tools help understand the mood in different parts of the US, and could also help understand shifts in sentiment over time.

References:

1. Turney P, “Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews,” Proceedings of the Association for Computational Linguistics (2002), pp. 417–424.


2. Gansner E, Hu Y, and North S, “Visualizing Streaming Text Data with Dynamic Maps, to appear in Proceedings of the 20th International Symposium on Graph Drawing” (best paper award), (2012), TwitterScope website.

* You must be a enrolled at a U.S. university to be eligible for this project.


Project #: DC2013-06


Mentor:
James Abello, abello at dimacs dot rutgers dot edu; DIMACS


Project Title:
Visualization of Time Varying Graphs


ASK-GraphView is a node-link-based graph visualization system that allows clustering and interactive navigation of large graphs [1]. The system constructs a hierarchy on an arbitrary, weighted, undirected input graph using a series of increasingly sophisticated clustering algorithms. Building on this earlier work, t
his project will address the problem of visualizing graphs that are being collected in a streaming fashion. A key challenge is to identify graph sub-structures and statistics on them that can be used as the basis to depict evolution of the graph through time [2]. Another goal is to devise a visualization algorithm that produces a layout that is stable over time, thus preserving a user’s mental map.


References:

1. Abello J, Ham FV, Krishnan N, “AskGraphView: A Large Graph Visualization System”, IEEE Transactions in Visualization and Computer Graphics, 12(5) (2006), 669-676.


2. Hu Y, Kobourov S, and Veeramoni S, “Embedding, Clustering and Coloring for Dynamic Maps, Proceedings of IEEE Pacific Visualization Symposium, (2012).


* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

 


Project #: DC2013-07


Mentor:
James Abello, abello at dimacs dot rutgers dot edu; DIMACS


Project Title:
Detecting anomalies in Internet traffic


This project will investigate algorithms for detecting anomalies in Internet traffic. Machine learning algorithms, in particular, implicit collaborative filtering [1], will be used to model normal Internet traffic records. Using this model, each machine is represented as a point in high dimension. Low-dimensional embedding algorithms [2] are then used to project points to 2 or 3D for visualization. Anomalies can be detected by visual inspection, as well as identified using clustering algorithms.


References:

1. Hu YF, Koren Y, Volinsky C, “Collaborative Filtering for Implicit Feedback Datasets,” IEEE International Conference on Data Mining (2008).


2. Borg I and Groenen P, “Modern Multidimensional Scaling: theory and applications,” New York: Springer-Verlag (1997).


* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

 


 Project #: DC2013-08


Mentor: Alexander Schliep
, schliep at cs dot rutgers dot edu; CS & BioMaPs Institute


Project Title:
Algorithm Animation for Bioinformatics


Gato (http://gato.sf.net), an open source software system written in Python, animates graph algorithms to show their dynamic behavior [HS09, SH02]. A wide range of bioinformatics algorithms – such as aligning two sequences to assess their similarity, assembling complete genomes from fragments produced by sequencing, answering questions about the origin of species by constructing phylogenetic trees – can either be phrased as graph algorithms or can operate on graphs. Developing animations for selected algorithms will introduce participants to algorithmic aspects of bioinformatics and provide deep insights into the workings of selected algorithms.


References:

1. Hochstättler, W and Schliep, A, “CATBox: An Interactive Course in Combinatorial Optimization”, Springer Verlag 2009.


2. Schliep A and Hochstättler W, “Developing Gato and CATBox with Python: Teaching Graph Algorithms through Visualization and Experimentation,” Multimedia Tools for Communicating Mathematics, Springer-Verlag (2002), 291-310.


* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

 


Project #: DC2013-09


Mentor:
Wilma Olson, Wilma dot olson at rutgers dot edu; Chemistry & Chemical Biology

Project Title: Effects of architectural and regulatory proteins on the spatial organization and expression of bacterial genes

Many genetic processes are controlled by proteins that bind at separate, often widely spaced, sites on DNA and hold the intervening double helix in a loop. Our group has initiated a series of computer simulations of the formation of the DNA loops implicated in the Escherichia coli lac operon, the classic textbook example of genetic action at a distance [1]. We incorporate effects of naturally occurring species found in the cell, such as the non-specific architectural protein HU  [2] and the Lac repressor protein, on the looping free energies. By representing the DNA as a sequence of base pairs, we capture the structural details of protein-bound sites and the natural deformability of free DNA [3-5]. This approach allows for departures from conventional treatments of DNA, such as ideal elastic models, which may be divorced from reality, and leads to results that can be directly linked to genetic processing at the molecular level. We relate the computed looping propensities to DNA looping propensitieis derived from gene repression and single-molecule studies [6,7]. We are also collecting the simulated structures in a database of three-dimensional loop models that can be used as starting points in the determination of structures consistent with NMR and other local spectroscopic measurements and subsequently refined with these high-resolution data. The approach is general and can be applied to any known protein-mediated DNA looping systems as well as to the design of novel DNA constructs, such as the three-dimensional origami formed by looping DNA between carefully spaced four-way junctions [8].

References:

1. Müller-Hill B (1996) The lac Operon. Walter de Gruyter, Berlin.


2. Swinger KK, Lemberg KM, Zhang Y & Rice PA (2003) Flexible DNA bending in HU-DNA cocrystal structures. EMBO J 22, 3749-3760.


3. Olson WK, Gorin AA, Lu X-J, Hock LM & Zhurkin VB (1998) DNA sequence-dependent deformability deduced from protein-DNA crystal complexes.
Proc Natl Acad Sci USA
95, 11163-11168.


4. Swigon D, Coleman, BD & Olson WK (2006) Modeling the Lac repressor-operator assembly: the influence of DNA looping on Lac repressor conformation.
Proc Natl Acad Sci USA 103, 9879-9884.


5. Czapla L, Swigon D & Olson WK (2008) Effects of the nucleoid protein HU on the structure, flexibility, and ring-closure properties of DNA deduced from Monte-Carlo simulations. J Mol Biol 382, 353-370.


6. Becker NA, Kahn JD & Maher 3rd LJ (2005) Bacterial repression loops require enhanced DNA flexibility. J Mol Biol 349, 716-730.


7. Han L, Garcia HG, Blumberg S, Towles KB, Beausang JF, et al. (2009) Concentration and length dependence of DNA looping in transcriptional regulation.
PLoS ONE
4, e5621.


8. Han D, Pal S, Nangreave J, Deng Z, Liu Y & Yan H (2011) DNA origami with complex curvatures in three-dimensional space. Science 332, 342-346.

 

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

 


 Project #: DC2013-10


Mentor: William Pottenger
, billp at dimacs dot rutgers dot edu; DIMACS


Project Title:
Higher Order Learning


In our work we are developing a framework termed Higher Order Learning that builds models by leveraging associations in the form of latent relationships in an entity-relation graph. A general multi-attribute entity-relation graph has nodes that are people, places and things, etc. Higher Order Learning violates the underlying assumption made in traditional machine learning algorithms that records are independent and identically distributed [1]. Although this assumption simplifies the underlying mathematics of statistical models and of the corresponding parameter estimation procedures, it actually does not hold for many real world applications [2]. Machine learning methods based on Higher Order Learning leverage relations between objects and features in an entity-relation graph and in doing so, operate on a much richer data representation compared to the traditional feature vector form.

 

In this research, Higher Order Learning techniques have been investigated and applied to a number of interesting datasets including Border Gateway Protocol (cybersecurity), E-Commerce (web mining), and Nuclear Detection (homeland security). The approach has been shown to outperform the popular Support Vector Machine algorithm on benchmark textual data sets as well as real-world streaming data [3]. REU summer interns will have an unparalleled opportunity to explore Higher Order Learning algorithms and applications in the context of law enforcement and other domains.

 

References:

1. Getoor L, Friedman N, Koller D, and Taskar B, “Learning Probabilistic Models of Link Structure,” Journal of Machine Learning Rsearch (JMLR) (2002).


2. Getoor L and Diehl CP, “Link Mining: A Survey,” SIGKDD Explorations 7(2) (2005).


3. Ganiz, M C, Lytkin, N I and Pottenger, W M, “
Leveraging Higher Order Dependencies Between Features for Text Classification,” Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), Bled, Slovenia, September(2009) .

 

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

 


 Project #: DC2013-11


Mentor: William Pottenger
, billp at dimacs dot rutgers dot edu; DIMACS


Project Title:
Entity Resolution


Emergencies invariably require crisis managers to field numerous calls for service such as 911, radio reports by police and fire teams, medical reporting, infrastructure support teams, etc. Technologies for Entity Resolution can be applied to handle the confusion by identifying geospatial, temporal, and content similarities in the messages, enabling first responders to resolve and identify the location and scope of multiple events. Entity Resolution also plays an important role in counter-terrorism, for example, in determining when two terrorist incidents are the same. This project will involve the research and development of Entity Resolution techniques to group incidents from the Internet Crime Complaint Center database (www.ic3.gov). REU summer interns will have an unparalleled opportunity to explore Entity Resolution algorithms and applications in the context of law enforcement and other domains.


* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

 


Project #: DC2013-12


Mentor: Bahman Kalantari
, kalantar at cs dot rutgers dot edu; Computer Science


Project Title:
Polynomiography


Dr. Bahman Kalantari is a leading researcher with the Department of Computer Science who was instrumental in developing the field of polynomiography. Polynomiography is a mathematically-inspired computer medium based on algorithmic visualizations of one of the most basic and fundamental tasks in science and mathematics: solving a polynomial equation. Polynomiography serves as a powerful medium for creativity, discovery, and learning, with numerous applications in education, math, science, art and design. Dr. Kalantari will work with a qualified REU student to identify a new problem from polynomiography that will be of interest to both the student and the mentor. In particular, some specific problems will be described from computational geometry, numerical analysis, discrete math, education, algorithmic mathematical art, fine art, and games.


http://www.cs.rutgers.edu/~kalantar/


* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.

 


Project #: DC2013-13


Mentor: Bahman Kalantari
, kalantar at cs dot rutgers dot edu; Computer Science


Project Title:
The Triangle Algorithm and Applications in Linear Programming, Combinatorial Optimization, and Convex Programming


The Triangle Algorithm is a newly developed algorithm for testing if the convex hull of a finite set of points contains a distinguished point.  It has potential applications in computational geometry, linear programming, convex programming and combinatorial optimization.  It also gives rise to new iterative methods for solving a linear system of equations.  The goal of this research is to explore some of these possibilities both from the computational points of view as well as theoretical ones.  Related articles can be found online: 
http://www.cs.rutgers.edu/~kalantar/

 

References:

1. A Characterization Theorem and An Algorithm for A Convex Hull Problem, B. Kalantari http://arxiv.org/pdf/1204.1873.pdf


2. Solving Linear System of Equations Via A Convex Hull Algorithm, B. Kalantari http://arxiv.org/abs/1210.7858,  and


3. Finding a Lost Treasure in Convex Hull of Points From Known Distances, B. Kalantari http://2012.cccg.ca/papers/paper20.pdf

 

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.



Project #: DC2013-14


Mentor: Eugene Fiorini, fiorini at dimacs dot rutgers dot edu; DIMACS


Project Title:
Problems in (Competition) Graph Theory


Dr. Fiorini is the associate director and research professor with the Center for Discrete Mathematics and Theoretical Computer Science who works in the areas of graph theory, combinatorics, and extremal graph theory. Dr. Fiorini will work with a qualified REU student to identify a new problem on graph theory related to an area of graph theory called competition graphs. The notion of competition graphs were introduced by the mathematician Joel Cohen in 1968 when he asked: “
Can we assign to each species an ecological niche so that competition between two species corresponds to overlap of their ecological niches?” The research problem in competition graphs will be of interest to both the student and the mentor. This project will appeal to those students who are interested in graph theory and combinatorics.


* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.



Project #: DC2013-15


Mentor: Robert Vanderbei, rvdb at Princeton dot edu; Operations Research and Financial Engineering, Princeton University


Project Title: Climate Change Analysis


Dr. Robert Vanderbei is a leading researcher in operations research and financial engineering. Dr. Vanderbei is also an associated faculty member with the Astrophysics, Computer Science, Mathematics, and Mechanical and Aerospace Engineering departments at Princeton Universtiy. He would be happy to advise a summer REU student who is interested in refining the climate change analyses described in a "weather module" that is being developed for the Mathematics of Planet Earth 2013 project. In particular, it would be interesting to make a simplified version of the world climate change map in which each location is analyzed not by solving a daily model with sinusoidal seasonal fluctuations but rather a simpler model that uses annual averages for the input to a linear regression model.

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.



Project #: DC2013-16


Mentor: Kevin Chen, kcchen at Biology dot rutgers dot edu; Biology


Project Title: Climate Change Analysis

Identifying differentially expressed genes between samples is a biologically important problem. Recent next-generation sequencing technologies such as RNA-seq have been greatly utilized for differential expression analysis. From RNA-seq, overlapping gene transcript fragments (~50-100 nucleotides) are generated. By assembling the fragments and counting the number of the fragments for a certain gene, the gene transcript abundance can be estimated. Cufflinks, an algorithm that computes transcript abundance of gene isoforms from RNA-seq data, implements a proof of a combinatorial result called Dilworth's theorem [1]. The goal of this project will be to learn the mathematical model behind Cufflinks and apply it to non-coding RNAs, as opposed to protein coding genes.

References:

1. Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ & Pachter L, "Transcript assembly and quantification by RNA-seq reveals unannotated transcripts and isoforms switching during cell differentiation," Nature Biotechnology 28(5) (2010), 511-515.

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.



Project #: DC2013-17


Mentor: Kevin Chen, kcchen at Biology dot rutgers dot edu; Biology


Project Title: Statistical methods for microRNA motif finding

MicroRNAs (miRNAs) are a special class of small non-coding RNAs that play an important role in gene regulation.  They are distinguished from other small RNAs in that they are processed from stem-loop structures to form  ~22nt long, single-stranded RNAs.  MiRNAs can silence genes by either degrading messenger RNAs, or simply keeping them from being translated into protein. Computational methods for discovering miRNA motifs and predicting their target genes is of clear biological importance. This project will center around applying new statistical methods to the problem and comparing them to established methods based on linear regression [1].  A background in statistics, especially linear models, and programming ability is preferred.

References:

1. H. J. Bussemaker, H. Li, and E. D. Siggia (2001) Regulatory element detection using correlation with expression Nature Genetics 27: 167-171.

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.



Project #: DC2013-18


Mentor: Lazaros Gallos, k lazaros dot gallos at Rutgers dot edu; Ecology, Evolution, and Natural Resources


Project Title: The complex networks of Tolkien, Rowling, and other great story-tellers

Many successful books and book series describe long stories that include a multitude of characters. The evolution of a story creates a timeline of events where book characters interact with each other. This timeline can then be translated into a dynamically evolving complex network. The analysis of complex networks is a rapidly expanding field and is ideally suited to extract information from complex interactions. In this case, the comparative analysis of the character affiliation networks in different books can be used to detect possible patterns that are either characteristic of a genre, a time-period for the genre, etc. Similarities and differences in story-building of different authors within and across genres are of particular importance and may hint to different writing styles.

The goal of the project is to introduce one or two students to the fundamental concepts of complex network theory and guide the students through the whole process of acquiring the data, apply the learned concepts to the data analysis, and decide on the important findings that should be highlighted in a research paper. This network-based project should demonstrate the power of applying complex network theory to a variety of disciplines.

The list of books can include, among hundreds of others, the Lord of the Rings, the Harry Potter series, the Gunslinger series, War and Peace, Les Miserables, the Iliad, etc. Simple data-mining techniques should be used to extract the interactions from the electronic text of the books and reading these books for the project should not be necessary.

Requirements: A good knowledge of programming (preferably in C/C++) is preferred and some experience in a Linux-based environment may also be helpful.

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.



Project #: DC2013-19

Mentor: Yana Bromberg, yanab at rci dot Rutgers dot edu, Biochemistry and Microbiology

Project Title: Metagenome analysis
Our genomes define us as unique representatives of a very unique species. Many organisms exist, however, that lack this sense of self-identity. In fact, they are often unable to live without their environmental-niche "neighbors" . Taken together, the micro-organism population of a given niche is termed the microbiome. The DNA sequence of all microbiome members is the metagenome. Interpreting the molecular functions encoded in a given metagenome can help in biofuels, biopesticides, and antibiotic production. In this project we will try to identify the specific molecular functions and biological processes that take place within a given microbiome by looking at its metagenomic data. We will also try to define the minimal set of functions necessary for a given microbiome's existence in the specific environmental niche it occupies.

* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.


Project #: DC2013-20

Mentor: Nina Feferman, lazaros dot gallos at Rutgers dot edu, Ecology, Evolution, and Natural Resources

Assistant Graduate Mentor: Orin Robinson

Project Title: Game Theory of Mate Selection
This project will use both agent based simulation and game theory to determine threshold effects in evolutionary fitness for individuals hoping to attract the best possible mate while hoping to avoid attracting competitors for those mates. Some previous experience in programming will be helpful, but not necessary.


* You must be a U.S. Citizen or Permanent Resident to be eligible for this project.



REU Home Page
DIMACS Contact Information
Page last modified on January 3, 2013.