DIMACS
DIMACS REU 2018

General Information

Student: Erin Dahl
Office: CoRE 434
School: Pacific University
E-mail: erind@pacificu.edu
Project: Data-driven precision medicine approaches to cancer

Project Description

One in four people is expected to develop cancer in their lifetime. No two patients are alike, and we need better strategies to detect cancer early, find the right treatment for each patient, and monitor outcomes using a personalized medicine approach. Data science approaches using state-of-the-art computational algorithms and statistical frameworks can sift through large amounts of information in patient profiles to identify subtle patterns, which can help with early detection and personalized treatment and thereby improve outcomes. Within the scope of the Precision Medicine Initiative at the Rutgers Cancer Institute, we have troves of data for each patient on genetic mutations, as well as clinical and pathological reports. We are developing and implementing data science techniques to ask some fundamental questions: (i) can we distinguish aggressive tumors and treat patients accordingly? (ii) can we identify the emergence of resistance very early and change the treatment strategy accordingly? The project aims to develop data science frameworks to perform integrative analysis of cancer genomics and clinical data.


Weekly Log

Week 1:

This week, I am exploring single cell genomics data. I have been working through R tutorials for Seurat, an R toolkit for single cell genomics, and using datasets to produce figures that help explain and identify heterogeneity among the cells.

Week 2:

After reading some papers on single cell genomics analysis relating to tumor composition, I learned how differences in gene expression can strongly influence the effectiveness of treatments. In one case, a mathematical model was developed and used to study phenotypic switching. The results indicate that the optimal treatment for minimizing drug resistance is a combination of an epigenetic and an anti-cancer treatment given at the same time. I am currently writing an R script for a small dataset of pre- and post-treatment gene expression to identify how gene expression has changed.
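As a rough illustration of what comparing pre- and post-treatment expression can look like, here is a minimal sketch that computes a log2 fold change per gene. It is written in Python rather than R, it is not the script described above, and the gene names, expression values, and pseudocount are made-up assumptions.

```python
import math


def log2_fold_change(pre, post, pseudocount=1.0):
    """Return log2((post + pseudocount) / (pre + pseudocount)) for shared genes.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no detected expression.
    """
    return {
        gene: math.log2((post[gene] + pseudocount) / (pre[gene] + pseudocount))
        for gene in pre
        if gene in post
    }


# Hypothetical expression values, for illustration only.
pre_treatment = {"GENE_A": 3.0, "GENE_B": 15.0, "GENE_C": 7.0}
post_treatment = {"GENE_A": 15.0, "GENE_B": 3.0, "GENE_C": 7.0}

changes = log2_fold_change(pre_treatment, post_treatment)

# List genes from most increased to most decreased after treatment.
for gene, lfc in sorted(changes.items(), key=lambda item: item[1], reverse=True):
    print(f"{gene}: {lfc:+.2f}")
```

Positive values mark genes whose expression went up after treatment, negative values mark genes that went down, and values near zero mark genes that did not change.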

Week 3:

I have finished my script for the small data analysis. For my project, I am developing an additional R script for a very large dataset of pre- and post-treatment gene expression. This will allow me to identify how gene expression and cell populations have changed and to look at specific genes that are indicators of drug resistance. I am also exploring tutorials for the CONICS and HoneyBadger programs in R, which look at copy number variants on the chromosomes. A copy number variant is a type of structural variation that occurs when a DNA segment is present in a different number of copies than in a reference genome. This can mean that a gene is missing or that extra copies of it are present, which can influence gene expression and be associated with specific phenotypes.
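The definition of a copy number variant above can be made concrete with a small sketch. This is only an illustration in Python, not how CONICS or HoneyBadger work; the segment names, the copy counts, and the reference value of two copies are assumptions chosen for the example.

```python
def classify_copy_number(copies, reference_copies=2):
    """Label a DNA segment by comparing its copy count to the reference."""
    if copies < reference_copies:
        return "deletion"
    if copies > reference_copies:
        return "amplification"
    return "neutral"


# Hypothetical copy counts per segment, for illustration only.
segments = {"segment_1": 2, "segment_2": 0, "segment_3": 5, "segment_4": 1}

calls = {name: classify_copy_number(count) for name, count in segments.items()}

for name, call in calls.items():
    print(f"{name}: {segments[name]} copies -> {call}")
```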

Week 4:

To start the week off, we went to Bell Laboratories and attended presentations on some of the research and technology currently being developed there. We also went inside the anechoic chamber, which was at one point the world's quietest room. As for my research, I encountered some issues with the HoneyBadger and CONICS programs and worked to contact the developers. In the meantime, I was able to run a Seurat single cell genomics analysis on some tumor dormancy data and generate figures that compare tumor cells to other cells, in order to determine the population of cells that are likely dormant tumor cells. The next step is to look at the copy numbers of this population and compare them to known normal values.

Week 5:

This week I continued to reach out to developers and also tried to run the HoneyBadger copy number analysis on the dormant tumor data. I ran into some issues because the reference data I was using was incompatible with my tumor cell data, and I had to find reference data that could work with the program. I am now using reference TPM data from GTEx, which provides RNA sequencing datasets, and have run my analysis with log-transformed TPM averages. I generated some figures showing amplifications and deletions of DNA on chromosomes, and found them to be inconclusive because they showed no variation between different chromosomes, where some variation is expected. This means I need to reassess the data and make sure my script is processing everything correctly.
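The idea of comparing tumor cells against log-transformed TPM averages from a reference can be sketched as follows. This is a simplified Python illustration, not HoneyBadger's actual method and not the R script used here; the log2(TPM + 1) transform, the window size, the gene names, and all values are assumptions.

```python
import math
from statistics import mean


def log_tpm(tpm):
    """Log-transform a TPM value; the +1 keeps zero expression defined."""
    return math.log2(tpm + 1.0)


def reference_profile(reference_samples):
    """Average the log-transformed TPM of each gene across reference samples."""
    genes = reference_samples[0].keys()
    return {g: mean(log_tpm(s[g]) for s in reference_samples) for g in genes}


def sliding_mean(values, window=3):
    """Smooth values along genomic order with a centered moving average."""
    half = window // 2
    return [
        mean(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


# Hypothetical genes listed in their order along one chromosome.
gene_order = ["g1", "g2", "g3", "g4", "g5"]

# Hypothetical TPM values: two reference samples and one tumor cell.
reference_samples = [
    {"g1": 3.0, "g2": 3.0, "g3": 3.0, "g4": 3.0, "g5": 3.0},
    {"g1": 3.0, "g2": 3.0, "g3": 3.0, "g4": 3.0, "g5": 3.0},
]
tumor_cell = {"g1": 3.0, "g2": 3.0, "g3": 3.0, "g4": 15.0, "g5": 15.0}

reference = reference_profile(reference_samples)

# Expression of the tumor cell relative to the reference, in genomic order.
relative = [log_tpm(tumor_cell[g]) - reference[g] for g in gene_order]
smoothed = sliding_mean(relative, window=3)

for gene, value in zip(gene_order, smoothed):
    print(f"{gene}: {value:+.2f}")
```

In a sketch like this, a run of consistently positive smoothed values along a chromosome would hint at an amplification and a run of negative values at a deletion, while a flat profile everywhere gives no evidence of either.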

Week 6:

While working to determine how to filter and organize the data, I also compared CNVs (copy number variations) between the two clusters that I identified in my t-SNE plot of the disseminated (dormant) liver cells. Both cell clusters produced highly similar CNV heatmaps, which again brought me back to the problem of needing a better filtering and data processing method for the gene expression data. I am working through the functions line by line to better understand how I can improve the results of each step.

Week 7:

As this REU is coming to an end, I prepared my final PowerPoint presentation this week. I aimed to communicate to my audience the relevance of my project, the type of data I am using, and the results of my analyses. I have attached my final presentation at the bottom of this page. For the remaining time, I plan to rerun the CNV analysis with different filtering and other parameters, to see whether the results are consistent with the original results, indicating no significant differences in CNVs, or whether there are in fact noticeable CNVs.

Week 8:

This week I worked on my written research report. After trying different filtering and rerunning the CNV analysis, I consistently saw results similar to those from before. I was not able to identify distinguishable CNVs as I had hoped, and more research is necessary to understand how to identify the composition of dormant and tumor cells.

Week 9:

The past nine weeks have flown by so quickly! I am finishing the last few details on my research paper, and am getting ready to fly back to the West Coast! I have had such an amazing experience and made some wonderful connections with my mentor, DIMACS staff, and other REU students. I am so thankful for this opportunity and great program!



Presentations


Additional Information

Thank you to my mentor, Dr. Subhajyoti De. Thank you to the NSF for support through grant CCF-1852215.