Student: | Ruth Velasquez |
---|---|
Mentor: | Subhajayoti De |
School: | Florida International University |
E-mail: | ruthvcs070 (at) gmail (dot) com |
Project: | Computational Models of Cancer |
About me: | My hobbies include visiting new places, collecting magnets, sending postcards, biking, listening to music, and hanging out with friends. I am a rising sophomore at my university and part of the executive board of the biggest student-led organization on my campus. This coming fall, my fellow board members and I will be hosting a hackathon, and I am excited to be a part of it. My interests lie in bioinformatics and data science. |
"We will develop computational models to capture principles of the population dynamics of tumor cells during development and the emergence of resistance against cancer drugs, using concepts based on the Galton-Watson branching process and reaction-diffusion systems. We will combine computational models with clinical data from cancer patients to investigate whether we can predict the timeline of tumor development and thereby (i) detect cancer at an early stage or (ii) identify the emergence of drug resistance very early during treatment. This project will develop fundamental evolutionary concepts, implement relevant computational models, and apply them in clinical settings to help improve cancer care."
This week I arrived on Tuesday through Newark airport and met with my mentor on Wednesday after the orientation given by Lazaros Gallos. My mentor and I discussed the timeline of the project: I will spend the first two weeks reading research papers and a book about the biology of the cell by Bruce Alberts. I also visited my mentor's office at the Cancer Institute of New Jersey by taking the Rutgers bus to College Ave. I really like how walkable the area is.
On Friday, the day I went to the CINJ (Rutgers Cancer Institute of New Jersey), I was also able to sit down with my mentor's research group: an undergraduate like me (though in their final year), along with his postdoctoral researchers and graduate and PhD students. One of my mentor's students presented their findings on cfDNA (cell-free DNA) using Gaussian models, different types of modeling tests, and many graphs that showed linear relationships between different parts of the DNA.
On Sunday I took a trip to a bagel shop so that I could keep reading the research papers I was given. I plan to add a bagel review section to my page soon.
This week I prepared my talk for Tuesday's introductory presentations and did some last-minute reading that helped me better understand what I was going to discuss. On Tuesday and Wednesday I finished my last reading and started looking for articles about the mathematical models we would use in the project and how to implement them in Python.
On Thursday I met with my mentor, Dr. De, and told him I was ready to work with the data. He sent me three files of fragment data, which I learned to unzip and convert to another file type. I opened the files with pandas and installed dependencies such as NumPy, Matplotlib, and SciPy. The first step is to make a histogram of the fragment lengths with a few key statistics and a Gaussian curve fit. The histogram shows visually how fragment length, measured in base pairs, is distributed.
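A minimal sketch of this first step, assuming the converted file holds one fragment length (in base pairs) per row in a column named `length`. That column name and the 5 bp bin width are hypothetical, and synthetic numbers stand in for the real file.

```python
import io

import numpy as np
import pandas as pd

# Synthetic stand-in for the converted data file (hypothetical column "length", in bp).
rng = np.random.default_rng(0)
fake_csv = "length\n" + "\n".join(str(v) for v in rng.normal(170, 20, 5000).round().astype(int))
df = pd.read_csv(io.StringIO(fake_csv))

lengths = df["length"].to_numpy()

# 5 bp-wide bins covering every observed length; counts[i] is the number of
# fragments in [edges[i], edges[i + 1]) (the last bin also includes its right edge).
edges = np.arange(lengths.min(), lengths.max() + 6, 5)
counts, edges = np.histogram(lengths, bins=edges)

print(f"{len(lengths)} fragments in {len(counts)} bins")
```

With a real file, `pd.read_csv` can also open a `.gz` file or a single-file `.zip` archive directly, since it infers compression from the file extension; `plt.hist(lengths, bins=edges)` draws the same histogram in Matplotlib.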
This week I created graphs to visualize the length distribution of the free-floating DNA fragments. These histograms also mark the first quartile, third quartile, median, and mean. When I shared my findings with Dr. De at the end of the week, he said we would also need to write an analysis of these graphs, so I went over my old notes to understand them better.
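A sketch of computing those four summary statistics with NumPy; the lengths below are synthetic, not values from the real files.

```python
import numpy as np

# Synthetic fragment lengths (bp) standing in for the real data.
rng = np.random.default_rng(1)
lengths = rng.normal(170, 20, 10_000)

# First quartile, median, and third quartile in a single call.
q1, median, q3 = np.percentile(lengths, [25, 50, 75])
mean = lengths.mean()

print(f"Q1={q1:.1f}  median={median:.1f}  Q3={q3:.1f}  mean={mean:.1f}")
```

Each value can then be marked on a Matplotlib histogram with `ax.axvline(value, linestyle="--")`.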
Creating the graphs was not very hard. In addition to adding a few statistics, I also plotted the probability density function (PDF) on top of the histogram to show how the Gaussian curve fits the data distribution.
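A sketch of the single-Gaussian overlay, assuming SciPy's `scipy.stats.norm.fit` (a maximum-likelihood fit) was an acceptable way to get the curve; the post doesn't say how it was computed, and the data here are synthetic.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
lengths = rng.normal(170, 20, 10_000)  # synthetic stand-in for fragment lengths (bp)

# Maximum-likelihood normal fit: the sample mean and the ddof=0 standard deviation.
mu, sigma = norm.fit(lengths)

# density=True scales the histogram to total area 1, the same scale as the PDF.
counts, edges = np.histogram(lengths, bins=60, density=True)
x = np.linspace(edges[0], edges[-1], 400)
pdf = norm.pdf(x, loc=mu, scale=sigma)
```

To draw the curve over a histogram of raw counts instead, multiply `pdf` by the number of fragments times the bin width.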
This week I researched what Fourier transform plots show and how to add them to my project. I began by trying to understand what the transform would do to the data I was given. A common analogy is a smoothie: the Fourier transform takes the finished 'smoothie' (the signal) and recovers the 'recipe' used to make it (which frequencies it contains, and how much of each).
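A small NumPy sketch of the smoothie analogy: mix two sine waves of known frequency and amplitude, then let `np.fft.rfft` recover them. The signal is invented for illustration.

```python
import numpy as np

# The "smoothie": 1 second sampled at 1000 Hz, mixing a 5 Hz wave (amplitude 2)
# and a 40 Hz wave (amplitude 0.5).
fs = 1000
t = np.arange(fs) / fs
signal = 2.0 * np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)

# Real-input FFT: one complex coefficient per non-negative frequency.
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

# Scale magnitudes so a pure sine of amplitude A shows up with height A.
amplitudes = 2 * np.abs(spectrum) / len(signal)

# The "recipe": the two strongest frequencies and their amplitudes.
top = np.argsort(amplitudes)[-2:][::-1]
for i in top:
    print(f"{freqs[i]:.0f} Hz, amplitude {amplitudes[i]:.2f}")
# prints:
# 5 Hz, amplitude 2.00
# 40 Hz, amplitude 0.50
```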
I made a few iterations of the graph using different methods, such as computing it manually from the equation and switching to the functions in SciPy or NumPy. The graphs were different for each method, so by the end of the week I was trying to find out why. I also went to another bagel shop and should actually start adding my reviews to this page!
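The post doesn't say which differences appeared. As a hedged check, the direct DFT formula and the NumPy/SciPy FFT functions should agree numerically when they use the same convention, so differing plots more likely come from plotting choices such as normalization, two-sided versus one-sided spectra, or the frequency axis. The sketch compares them on random data.

```python
import numpy as np
from scipy import fft as sp_fft

rng = np.random.default_rng(3)
x = rng.normal(size=256)
N = len(x)

# Direct evaluation of the definition X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N).
n = np.arange(N)
k = n.reshape(-1, 1)
manual = np.exp(-2j * np.pi * k * n / N) @ x

numpy_fft = np.fft.fft(x)
scipy_fft = sp_fft.fft(x)
print(np.allclose(manual, numpy_fft), np.allclose(numpy_fft, scipy_fft))

# Things that change the plot without changing the math:
ortho = np.fft.fft(x, norm="ortho")  # divides every coefficient by sqrt(N)
one_sided = np.fft.rfft(x)           # only the N // 2 + 1 non-negative frequencies
print(np.allclose(ortho, numpy_fft / np.sqrt(N)), np.allclose(one_sided, numpy_fft[: N // 2 + 1]))
```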
This week I improved the graphs by using a Gaussian Mixture Model (GMM) from the scikit-learn library. A GMM fits a true mixture of Gaussian curves, whereas the single probability density function I plotted before does not fit the data points directly: it uses them only indirectly, through the overall mean and standard deviation, which gives inaccurate results here. Because the fragment-length distribution is multimodal, a Gaussian mixture model can describe and account for its multiple peaks. Gaussian mixture models are discussed further in this article. The first parameter described is the number of components (n_components), which sets how many Gaussian distributions the data will be separated into. In our case, the number of components should equal the number of peaks in the histogram, which can be seen from how the data is distributed across the bins. The second parameter is the covariance type (covariance_type), which defines the structure of the covariance matrix; we did not use it in our histogram model.
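A minimal sketch of fitting a two-component mixture with scikit-learn's `GaussianMixture`, using made-up bimodal data; the peak positions of 150 bp and 300 bp are invented for illustration, not taken from the project's files.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic bimodal "fragment lengths": a large peak near 150 bp, a smaller one near 300 bp.
rng = np.random.default_rng(4)
lengths = np.concatenate([rng.normal(150, 15, 7000), rng.normal(300, 25, 3000)])

# scikit-learn expects shape (n_samples, n_features).
X = lengths.reshape(-1, 1)

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)

# Sort the components by mean so the printed order is stable.
order = np.argsort(gmm.means_.ravel())
means = gmm.means_.ravel()[order]
stds = np.sqrt(gmm.covariances_.ravel()[order])
weights = gmm.weights_[order]
for m, s, w in zip(means, stds, weights):
    print(f"mean {m:.1f} bp, sd {s:.1f} bp, weight {w:.2f}")

# Mixture density on a grid (score_samples returns the log-density).
grid = np.linspace(50, 450, 500).reshape(-1, 1)
density = np.exp(gmm.score_samples(grid))
```

`density` can be drawn over a `density=True` histogram, where it follows both peaks instead of averaging them into one wide curve.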
On Friday I met with Dr. De and we discussed the next steps in my research. I also asked to be told about any speaker events at the research institute.
After my meeting last Friday, Dr. De advised me to use the Bayesian Information Criterion (BIC), along with the Akaike Information Criterion (AIC), to find the best number of components for the Gaussian mixture model. I did more research on these model selection criteria to better understand them and their scores.
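A sketch of choosing `n_components` with `GaussianMixture.bic` and `GaussianMixture.aic`, again on synthetic data; the candidate range of 1 to 5 components is an arbitrary assumption.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)
lengths = np.concatenate([rng.normal(150, 15, 7000), rng.normal(300, 25, 3000)])
X = lengths.reshape(-1, 1)

# Fit one model per candidate number of components and record both criteria.
# Lower is better for both; BIC penalizes each extra parameter by ln(n) and AIC by 2,
# so BIC is the stricter of the two once n > e**2 (about 7.4 points).
scores = {}
for k in range(1, 6):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    scores[k] = (gmm.bic(X), gmm.aic(X))

best_bic = min(scores, key=lambda k: scores[k][0])
best_aic = min(scores, key=lambda k: scores[k][1])
print("best k by BIC:", best_bic, " best k by AIC:", best_aic)
```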
This week we also met students from another REU program from New York and attended a talk and a few presentations from their program. I also started looking into SciPy functions that could help me separate the graph into individual peaks, since noise in the dataset makes them hard to tell apart.
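One SciPy option for this is `scipy.signal.find_peaks`. The sketch below assumes peaks are searched on a lightly smoothed histogram; the synthetic data, the 5-bin moving average, and the 20% prominence threshold are illustrative choices, not values from the project.

```python
import numpy as np
from scipy.signal import find_peaks

rng = np.random.default_rng(6)
lengths = np.concatenate([rng.normal(150, 15, 14_000), rng.normal(300, 25, 6_000)])

# Histogram with 5 bp bins from 50 to 450 bp.
counts, edges = np.histogram(lengths, bins=np.arange(50, 455, 5))
centers = (edges[:-1] + edges[1:]) / 2

# Light smoothing (5-bin moving average) so bin-to-bin noise is not read as peaks.
smooth = np.convolve(counts, np.ones(5) / 5, mode="same")

# Keep only peaks that stand out by at least 20% of the tallest smoothed bin.
peaks, props = find_peaks(smooth, prominence=0.2 * smooth.max())
print("peak centers (bp):", centers[peaks])
```

The `prominence` filter is what suppresses noise: a small wiggle on the side of a large peak has low prominence even if it is a local maximum.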
On Monday we went to Nokia Bell Labs, where we attended two talks; I was especially interested in the one on computer vision. For the rest of the week I kept trying to compute the BIC but ran into more roadblocks due to memory limits. I also did more research on ways to find peaks in the data, and on Friday I went to the Cancer Institute of New Jersey.
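The post doesn't say where the memory went. If the problem was fitting the mixture on every fragment, one common workaround (not necessarily what was used here) is to fit and score on a random subsample; the subsample size of 50,000 below is arbitrary.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)
# Stand-in for a very large set of fragment lengths (synthetic).
lengths = np.concatenate([rng.normal(150, 15, 700_000), rng.normal(300, 25, 300_000)])

# Fit and score on a random subsample instead of every fragment.
sample = rng.choice(lengths, size=50_000, replace=False).reshape(-1, 1)

bic = {}
for k in range(1, 5):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(sample)
    bic[k] = gmm.bic(sample)
print("best k by BIC on the subsample:", min(bic, key=bic.get))
```

BIC values depend on the sample size, so only compare scores computed on the same subsample.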
There I was able to see another presentation by a member of Dr. De's research group about rare urinary cancers and modeling them with different software tools.
This week I summarized my work in a presentation. Dr. De emailed me a few tips to keep in mind while presenting. I had around five slides: I first explained the background for my presentation, then talked about my project and the techniques I used for analysis.
This week I also listened to everyone else's presentations. They were all very interesting, and I thought it was cool how much was done at Rutgers during our eight weeks. I started writing my paper by collecting all my notes and using the template Lazaros sent by email. I also tried to rerun my Fourier script but ran into multiple memory errors.
This is the last week! I kept writing my paper and spent a lot of time reading papers on similar topics, paying attention to how they were structured. I also worked a little more on the Fourier transform and was finally able to run it, in less time, by fixing the code in earlier blocks.
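The post doesn't say what caused the memory errors. One possible cause, stated here only as an assumption, is building the full DFT matrix when evaluating the equation by hand, as in the manual version from earlier weeks. The arithmetic below shows why that breaks for long inputs while the FFT does not; `dft_matrix_bytes` is a hypothetical helper.

```python
import numpy as np


def dft_matrix_bytes(n: int) -> int:
    """Memory for an n x n complex128 DFT matrix (16 bytes per entry)."""
    return 16 * n * n


# The matrix form of the manual DFT grows with n squared...
print(f"{dft_matrix_bytes(100_000) / 1e9:.0f} GB")  # prints "160 GB"

# ...while the FFT works directly on the length-n signal.
rng = np.random.default_rng(8)
x = rng.normal(size=100_000)
spectrum = np.fft.rfft(x)  # n // 2 + 1 complex values, about 0.8 MB here
```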
This week I also wrote the evaluation and summary of the REU. Cleaning up the apartment was a little strenuous, but we left it spotless!
This work was carried out while the author Ruth Velasquez was a participant in the 2024 DIMACS REU program at Rutgers University, supported by NSF grant CNS-2150186