Ruchik S. Yajnik
New Jersey Institute of Technology
Heterogeneity of Indels and Their Relation to Cancer Sub-types
I am looking at the occurrence of indels (insertions and deletions in a genome) in cancer sequencing data of very low coverage (~1x). The goal is to map the reads to a reference genome using a read mapper, perform several manipulations on the resulting file to convert it to a tab-delimited format, and then call the regions of interest. We will then apply mixture modelling to our dataset to make predictions about the heterogeneity of the indels and try to relate that to cancer subtypes such as triple-negative breast cancer (TNBC), which is of interest to my group.
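As a rough illustration of the "tab-delimited file, then call regions of interest" step, here is a minimal Python sketch. It assumes the mapper output is in SAM format (tab-delimited, with the reference name, 1-based position, and CIGAR string in columns 3, 4, and 6) and reads indel events directly from the CIGAR string. The function names `indels_from_cigar` and `indels_from_sam` are hypothetical, and this is not necessarily the procedure my group will end up using.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def indels_from_cigar(pos, cigar):
    """Return (kind, ref_pos, length) for each insertion/deletion in a CIGAR.

    pos is the 1-based leftmost mapping position of the read.
    For a deletion, ref_pos is the first deleted reference base.
    For an insertion, ref_pos is the reference base that follows the
    inserted sequence (a convention chosen for this sketch).
    """
    events = []
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "D":
            events.append(("DEL", ref, n))
            ref += n
        elif op == "I":
            events.append(("INS", ref, n))  # insertions consume no reference
        elif op in "MN=X":
            ref += n  # these operations advance along the reference
        # S, H and P do not consume reference bases
    return events


def indels_from_sam(lines):
    """Yield (chrom, kind, ref_pos, length) from tab-delimited SAM lines."""
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, cigar = fields[2], int(fields[3]), fields[5]
        for kind, ref_pos, length in indels_from_cigar(pos, cigar):
            yield chrom, kind, ref_pos, length
```

With ~1x coverage, most positions are covered by at most one read, so a single read's CIGAR is often the only evidence for an indel; that is part of why a statistical model is needed downstream.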
- Week 1:
- Week one was spent on a literature review and on gaining a basic understanding of the Linux environment and how the command line works.
- Week 2:
- A meeting was arranged to re-evaluate our goals for the project, and it was decided that I should look at TNBC datasets and start running a read mapper on them.
- Week 3:
- At this point, I have found that the dataset I am working with does not contain the data needed for this project and is also incomplete. I have started looking at data repositories such as NCBI and EBI, and am also doing Google searches for TNBC datasets.
- Week 4:
- I have identified a dataset from a paper that I had read and have found the corresponding file on the NCBI website. I have submitted a dataset request proposal to my advisor, who will in turn contact the people in charge of releasing the download links. In the field of bioinformatics, one of the main challenges is simply dealing with the data and making sure that the dataset you work with is not corrupted and is actually what you need. Even though the NCBI data is publicly available, an individual MUST submit a proposal justifying the request and explaining what they intend to do with the data.
- Week 5 & 6:
- I spent these two weeks playing around with a sample dataset to simulate the further steps of our research and am now going to apply these steps to the real data.
- Week 7:
- We have received the OK from NCBI to download the data and will now do a paired-end mapping of the cancer dataset and the healthy tissue dataset. From there, I will convert the resulting output files into the ".vcf" format so that we can view the information they contain and isolate parts of each file. Eventually, the data will be plotted using a clustering method.
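To give an idea of what "isolating parts" of a ".vcf" file could look like, here is a small Python sketch that keeps only the indel records. It relies on the standard VCF column order (CHROM, POS, ID, REF, ALT, ...) and treats a record as an indel when any alternate allele differs in length from the reference allele. The helper names `is_indel` and `indel_records` are hypothetical, and symbolic alleles such as `<DEL>` are simply skipped in this sketch.

```python
def is_indel(ref, alt):
    """True if any ALT allele differs in length from REF.

    alt is the raw ALT column, possibly comma-separated.
    Symbolic ("<...>"), missing (".") and spanning-deletion ("*")
    alleles are ignored in this sketch.
    """
    for allele in alt.split(","):
        if allele.startswith("<") or allele in (".", "*"):
            continue
        if len(allele) != len(ref):
            return True
    return False


def indel_records(lines):
    """Yield (chrom, pos, ref, alt) for indel records in VCF text lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        if is_indel(ref, alt):
            yield chrom, pos, ref, alt
```

Running the same filter on the cancer and healthy-tissue files would give two lists of indel positions that can then be compared or passed on to the clustering step.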