Decomposition of Multidimensional Microarray Datasets

Student: Courtney Ward
School: California State University, Chico
E-mail: court9scorpio@yahoo.com
Research Area: Biomedical Engineering
Project Name: Decomposition of Multidimensional Microarray Datasets
Mentor: Stanley Dunn, Rutgers University

About Me

Hi! I am a Mathematics major at California State University, Chico, which is an 11-hour drive from my hometown of Holtville, California. I will receive my B.S. after the Fall semester. My REU this summer at Rutgers University has helped me decide that I would like to apply to graduate school and study Environmental Engineering.


Project Description

The accomplishments of modern biology, highlighted by the completion of the Human Genome sequence, are remarkable and have the potential to lead to unprecedented advances in biotechnology and health care. The molecular "parts list" of cells and tissues is growing at an accelerating pace, and a major issue limiting the application of these data is the ability to integrate information about the parts into an understanding of the whole. Meeting this need calls for a set of activities that we term Molecular Systems Bioengineering (MSB). MSB is an engineering approach to understanding and controlling biological processes. It encompasses high-throughput data acquisition techniques, such as microarrays, applied to genomes, proteomes, and other molecular catalogs, and it uses both data-driven and principles-driven modeling to explain biological phenotype in terms of molecular catalogs and environmental conditions. The goal of this research is to investigate methods for analyzing DNA microarray data based on tensor (multidimensional) decomposition.

DNA microarrays are a technology used to measure gene expression. Many mathematical techniques are used to analyze microarray data, the most common being Singular Value Decomposition (SVD) and Principal Component Analysis (PCA). In our research, we model the expression data as a three-way tensor whose factors are gene classes, experimental conditions, and the raw red and green channels. We use the Tucker-n decomposition to decompose this tensor and elucidate the three factors that give rise to the observed microarray data.

In a preliminary study of the yeast cell cycle, the results suggest that tensor decomposition can group genes with similar cell-cycle behavior, although further study is needed. In our second experiment, we considered the mean mRNA decay half-life by gene functional class across different strains of Escherichia coli. These promising results show that the Tucker-n decomposition can accurately cluster genes by function. Finally, we will apply the Tucker-n decomposition to a global analysis of mRNA abundance in E. coli at single-gene resolution, using two-color fluorescent DNA microarrays, to verify that tensor decomposition yields the same gene clusters as traditional preprocessing methods while requiring less computation and fewer assumptions about the data.
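The page does not say how the Tucker-n decomposition was computed. As a rough illustration only, the sketch below computes a Tucker decomposition with the truncated higher-order SVD (HOSVD), one standard method, using NumPy. The input is random synthetic data standing in for a genes × conditions × channels array; the function names (`unfold`, `mode_dot`, `tucker_hosvd`, `reconstruct`), the chosen ranks, and the argmax clustering rule are hypothetical choices for this example, not the project's code or results.

```python
import numpy as np


def unfold(tensor, mode):
    # Mode-n unfolding: move axis `mode` to the front, flatten the remaining axes.
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_dot(tensor, matrix, mode):
    # n-mode product: multiply `tensor` by `matrix` (shape J x I_mode) along axis `mode`.
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def tucker_hosvd(tensor, ranks):
    # Truncated HOSVD: each factor matrix holds the leading left singular
    # vectors of the corresponding unfolding (one way to get a Tucker model).
    factors = []
    for mode, rank in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(U[:, :rank])
    # Core tensor: project the data onto every factor subspace.
    core = tensor
    for mode, U in enumerate(factors):
        core = mode_dot(core, U.T, mode)
    return core, factors


def reconstruct(core, factors):
    # Rebuild the (approximate) tensor: core x_1 U_1 x_2 U_2 x_3 U_3.
    out = core
    for mode, U in enumerate(factors):
        out = mode_dot(out, U, mode)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in data: 30 genes x 8 conditions x 2 channels (red, green).
    X = rng.random((30, 8, 2))
    core, (genes, conditions, channels) = tucker_hosvd(X, ranks=(4, 3, 2))
    print("core shape:", core.shape)
    # Illustrative heuristic only: assign each gene to the gene-factor
    # component with the largest absolute loading.
    clusters = np.argmax(np.abs(genes), axis=1)
    print("gene clusters:", clusters)
    rel_err = np.linalg.norm(X - reconstruct(core, [genes, conditions, channels])) / np.linalg.norm(X)
    print("relative reconstruction error:", rel_err)
```

HOSVD is non-iterative and easy to follow; an alternating least-squares refinement (often called HOOI) typically gives a closer fit at the same ranks, and either could serve as a starting point for comparing tensor-based gene clusters with those from SVD or PCA on the unfolded data.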