|Emails:||timothy.hedspeth (at) rutgers.edu and hedspetht (at) emmanuel.edu|
|Home Institution:||Emmanuel College, Boston MA|
|Project:||Deep genomic analysis of tumor specimens|
Cancer is a disease that touches everyone in one way or another. It is also biologically complex: a cancerous cell population is very diverse and contains many different types of cells. This diversity is referred to as heterogeneity. Recent biological advances have led to the ability to sequence single cells, which allows us to better understand how tumor cell populations are organized.
The goal of this summer research project is to explore the heterogeneity of cancer cell populations. Samples of cells were taken from patients with cancer, and single cell sequencing was used to generate data about those cells. My work is to explore the data generated by this sequencing. We hope to find significant relationships between mutations and their expression that help us better understand cancer.
The program started this week, and I am so excited to start in on this project. I started the week off by reading a paper about advances in the field. The paper is "A general approach for detecting expressed mutations in AML cells using single cell RNA-sequencing" and was written by Petti, A. et al. The researchers who wrote the paper were able to find significant relationships between mutated cells and their expression. It was a fascinating read and is linked below, along with papers I read earlier about advances in single cell sequencing!
On top of getting to read that amazing paper, I also got my first data set! I have been trying to work with it in RStudio, but it's been going a little slowly. I am doing my best to work through this issue and may explore a faster program, but that may be hard given the size of the data set. Regardless of this struggle, I am going to find a solution next week so I can start exploring the data.
This week I was able to present the motivations of my project to the other students in the REU. It was very cool to see everyone's projects and plans for the summer. For my own work, I spent a lot of time in RStudio looking at my first data set and seeing what questions arose from it. I'm excited to see where these questions take me next week.
This week was hectic. I was able to dive deeper into the data and start exploring the relationship between depth of sequencing and base pair position. I am having some issues getting my data to cooperate in RStudio, but the internet has been very helpful in correcting my mistakes. Though it's frustrating to get errors when I run my code, I am getting practical experience in learning to overcome buggy code. I wish I could be doing the data science bootcamp, but I am already so swamped with my analysis and the errors in R.
This week I started looking at a problem in probability: how to decide whether a coin is fair, given the number of trials. This applies to heterozygous mutation data. I did a lot of work on this problem, so I am hoping it comes out okay. Other than that, I am still working on the data for TP53 and looking at exon regions, their expression, and their coverage. Some interesting comparisons appear when we look at exon vs noncoding regions.
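One way to frame the fair-coin question is an exact two-sided binomial test. The Python sketch below is only an illustration, not the R code used in the project; the function name `binom_two_sided_p` and the example counts are hypothetical. It shows why the number of trials matters: the same observed fraction of heads can be unremarkable with a few flips and very strong evidence with many.

```python
from math import comb


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test p-value for k successes in n trials.

    Sums the probabilities of every outcome that is no more likely than
    the observed one under the null hypothesis (success probability p).
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so ties are not lost to floating-point error.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, total)


# Same observed fraction (75% heads), very different evidence.
p_small = binom_two_sided_p(3, 4)     # 4 flips: consistent with a fair coin
p_large = binom_two_sided_p(75, 100)  # 100 flips: strong evidence of bias
print(p_small, p_large)
```

For heterozygous mutation data, a "flip" could be read as one sequencing read and "heads" as a read carrying the mutant allele, assuming reads behave like independent trials; that is a simplification.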
I was able to learn more about probability theory this week, specifically the maximum likelihood estimator and binomial confidence intervals, from a great resource. This allowed me to answer the question I got last week. I will provide a link below to the resource where I learned about these confidence intervals. Next, I will apply this knowledge to my data set and find an efficient way to apply the formula to all of the observations.
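As a rough illustration of these ideas, the sketch below computes the maximum likelihood estimate of a binomial proportion (k successes out of n trials gives k/n) together with a 95% Wilson score interval. The Wilson interval is only one common choice of binomial confidence interval and is an assumption here; the entry does not say which formula was used. The function name and the `(mutant_reads, depth)` pairs are hypothetical, and this is Python rather than the R code from the project.

```python
from math import sqrt


def mle_and_wilson(k: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (MLE, lower, upper) for a binomial proportion.

    The MLE is k / n. The bounds are the Wilson score interval;
    z = 1.96 corresponds to roughly 95% confidence.
    """
    p_hat = k / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return p_hat, max(0.0, center - half), min(1.0, center + half)


# Hypothetical (mutant_reads, depth) pairs, one per observation.
observations = [(5, 10), (50, 100), (0, 10)]
for k, n in observations:
    p_hat, lower, upper = mle_and_wilson(k, n)
    print(f"k={k:3d} n={n:3d} mle={p_hat:.3f} interval=({lower:.3f}, {upper:.3f})")
```

Applying the function row by row, as in the loop, is the simplest way to cover every observation; the interval narrows as depth grows, which ties back to the fair-coin question.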
I am excited that my initial analysis of TP53 has wrapped up. I am going to get all of the data soon, and hopefully I can quickly apply my code to many different genes. I can't wait to see what these data sets have in store for me. My mentor and I are also exploring how to predict whether a cell will have at least one mutation present. It is so cool to explore building a probability model.
I was able to get all of the data, and I have generalized my code from the analysis I have been conducting on TP53. This has allowed me to analyze many different genes in the data set. It has been really cool to see data on depth of reads for more genes. I am super excited to compare the data from these genes to see if any patterns arise.
This week I was able to run my code on four other genes of interest. Patterns did arise: the data for the other genes followed the same general form as TP53. It was so cool to see how all of these different genes compare. I'm starting to write my report and will finalize it next week.
This week was crazy busy. I spent most of the week writing and editing my report and getting my presentation ready. It was really cool to present my work and see everything that my peers have been working on. I am beyond impressed by everyone. The projects are amazing, and the immense amount of work everyone has done really shows!