DIMACS
DIMACS REU 2024

General Information


Student: Ruth Velasquez
Mentor: Subhajyoti De
School: Florida International University
E-mail: ruthvcs070 (at) gmail (dot) com
Project: Computational Models of Cancer
About me: My hobbies include visiting new places, collecting magnets, sending postcards, biking, listening to music, and hanging out with friends. I am a rising sophomore at my university and serve on the executive board of the biggest student-led organization on my campus. This coming fall, my fellow board members and I will be hosting a hackathon, and I am excited to be a part of it. My interests lie in bioinformatics and data science.

Project Description

"We will develop a computational model to capture principles of population dynamics of tumor cells during development and emergence of resistance against cancer drugs, using concepts based on the Galton-Watson branching process and reaction-diffusion systems. We will combine computational models with clinical data from cancer patients to investigate whether we can predict the timeline of tumor development and thereby (i) detect cancer at an early stage or (ii) identify emergence of drug resistance very early during treatment. This project will develop fundamental evolutionary concepts, implement relevant computational models, and apply them in clinical settings to help improve cancer care."


Weekly Log

Week 1: May 31 - June 2

This week I arrived on Tuesday from the Newark airport and met with my mentor on Wednesday after the orientation given by Lazaros Gallos. My mentor and I discussed the timeline of the project. I will spend the first two weeks reading research papers and a book on the biology of the cell by Bruce Alberts. I also went to my mentor's office building at the Cancer Institute of New Jersey by taking the Rutgers bus to College Ave. I really like how walkable the area is.

On Friday, the day I went to the CINJ (Rutgers Cancer Institute of New Jersey), I was also able to sit down with my mentor's research group, which consisted of an undergraduate student like me, albeit in their last year, along with his postdoctoral, graduate, and PhD students. One of my mentor's students presented their findings on cfDNA using Gaussian models, different types of modeling tests, and many graphs that showed linearity between different parts of the DNA.

On Sunday I also took a trip to a bagel shop so that I could continue reading the research papers given to me. I plan to add a bagel review part to my page soon.


Week 2: June 5 - June 9

This week I created my slides for Tuesday's introductory presentation and did some last-minute reading that helped me better understand what I was going to discuss. On Tuesday and Wednesday I finished my last reading and started looking for articles about the mathematical models that we would use for the project and how to implement them in Python.

On Thursday I met with my mentor, Dr. De, and told him that I was ready to work with the data. He sent me three fragment data files, which I learned to unzip and convert to another file type. I opened the file using pandas and installed dependencies such as NumPy, Matplotlib, and SciPy. The first step is to make a histogram with a few key points and a Gaussian curve fit. This histogram is built from the lengths of the fragments and will visually show how fragment length, in base pairs, is distributed.
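A minimal sketch of the loading step, under stated assumptions: the real file format is not described here, so this example pretends the converted file is a tab-separated table with hypothetical columns chrom, start, and end, and uses a tiny made-up sample in place of the real data. It reads the table with pandas, computes each fragment's length, and counts the lengths into histogram bins.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for one converted fragment file: tab-separated,
# one row per fragment, with ASSUMED columns chrom / start / end.
SAMPLE_TSV = """chrom\tstart\tend
chr1\t10000\t10167
chr1\t20500\t20662
chr2\t5000\t5171
chr2\t7300\t7630
chr3\t900\t1045
"""


def load_fragment_lengths(source):
    """Read a fragment table and return each fragment's length in base pairs."""
    frame = pd.read_csv(source, sep="\t")
    # Assumes half-open coordinates, so the length is simply end - start.
    return (frame["end"] - frame["start"]).to_numpy()


lengths = load_fragment_lengths(io.StringIO(SAMPLE_TSV))

# Count the lengths into 50 bp wide bins between 100 and 400 bp.
counts, bin_edges = np.histogram(lengths, bins=np.arange(100, 401, 50))

print(lengths)
print(counts)
```

With a real file, the io.StringIO object would be replaced by the file path. pandas can usually read a gzip-compressed table directly, since read_csv infers the compression from a .gz extension.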


Week 3: June 12 - June 16

This week I created the graphs to visualize the length distribution of the free-floating DNA fragments. These histogram plots also mark the first and third quartiles, the median, and the mean. When I shared my findings with Dr. De at the end of the week, he said that we would also need to write an analysis of these graphs, so I went over my old notes to understand the graphs better.

Creating the graphs was not very hard. In addition to a few statistics, I also added the probability density function to the graph to show the Gaussian curve fitted to the data distribution.
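The statistics and the fitted curve described above can be sketched as follows. This is an illustration only: the lengths are synthetic (a single made-up peak near 167 bp), and scipy.stats.norm.fit is one possible way to obtain the Gaussian parameters, not necessarily the one used in the project.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
# Synthetic fragment lengths (assumption): one peak near 167 bp.
lengths = rng.normal(loc=167, scale=12, size=5000)

# Summary statistics that can be drawn as vertical lines on the histogram.
q1, median, q3 = np.percentile(lengths, [25, 50, 75])
mean = lengths.mean()

# Maximum-likelihood Gaussian fit: returns the mean and standard deviation.
mu, sigma = norm.fit(lengths)

# Evaluate the fitted probability density function on a grid of lengths.
grid = np.linspace(lengths.min(), lengths.max(), 200)
pdf = norm.pdf(grid, loc=mu, scale=sigma)

# density=True rescales the histogram so its total area is 1, which puts
# it on the same vertical scale as the probability density function.
density, edges = np.histogram(lengths, bins=40, density=True)

print(f"Q1={q1:.1f} median={median:.1f} Q3={q3:.1f} mean={mean:.1f}")
print(f"fitted mu={mu:.1f} sigma={sigma:.1f}")
```

To draw the figure with Matplotlib, the same values can be passed to plt.hist(lengths, bins=40, density=True), plt.plot(grid, pdf), and plt.axvline for each statistic.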


Week 4: June 19 - June 23

This week I researched what Fourier transform graphs are and how to add them to my project. I began by trying to understand what the transformation would do to the data given to me. The Fourier transform takes a 'smoothie' and finds the 'recipe' used to create it: it breaks a signal down into the individual frequencies that make it up.

I did a few iterations of the graph using different methods, such as computing it manually from the equation and switching to functions in SciPy or NumPy. The graphs were different for each one, so by the end of the week I was trying to find an explanation for why this occurs. I also went to another bagel shop and should actually start working on adding my reviews to this page!
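As a point of comparison, here is a small sketch showing that a hand-written discrete Fourier transform and NumPy's version agree when both use the same formula. The signal is synthetic (an assumption), and manual_dft is a hypothetical helper written for this example. This is not a diagnosis of the differences described above; common reasons two Fourier graphs look different include a different scaling factor such as 1/N, the opposite sign in the exponent, plotting the real part instead of the magnitude, or showing only half of the spectrum.

```python
import numpy as np


def manual_dft(signal):
    """Direct O(N^2) transform: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    x = np.asarray(signal, dtype=complex)
    n = np.arange(x.size)
    k = n.reshape(-1, 1)
    # Each row of the matrix holds the complex exponentials for one frequency k.
    return np.exp(-2j * np.pi * k * n / x.size) @ x


# Synthetic signal (assumption): a cosine that repeats 5 times in 64 samples.
t = np.arange(64)
signal = np.cos(2 * np.pi * 5 * t / 64)

by_hand = manual_dft(signal)
by_numpy = np.fft.fft(signal)

# Same convention, so the two results match up to rounding error.
print(np.allclose(by_hand, by_numpy))

# The magnitude spectrum peaks at the frequency present in the signal.
peak_bin = int(np.argmax(np.abs(by_numpy[:32])))
print(peak_bin)
```

Only the first half of the spectrum is searched for the peak because, for real-valued input, the second half mirrors the first.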


Week 5: June 26 - June 30

This week I improved the graphs by using a Gaussian Mixture Model from the scikit-learn library. It fits a true mixture of Gaussian curves to the data, whereas the probability density function I plotted before does not use the data points directly to create the Gaussian curve, only indirectly, which gave inaccurate results. Because the graphs are multimodal, the Gaussian mixture model lets us accurately describe and account for multiple peaks in the distribution of base pair length. A Gaussian mixture model is further discussed in this article. The first parameter described is the number of components, which sets how many distributions the data will be separated into. In our case, the number of components should equal the number of peaks in the histogram, which can be seen from how the data are distributed across the bins. The second parameter is the covariance type, which defines the structure of the covariance matrix, but it is not used in our histogram model.
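A minimal sketch of fitting a two-component mixture with scikit-learn's GaussianMixture. The data are synthetic (two made-up peaks near 167 bp and 330 bp, chosen only for illustration), and the choice of two components and covariance_type="full" is an assumption for the example, not the project's actual setting.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic bimodal fragment lengths (assumption): a large peak and a small one.
lengths = np.concatenate([
    rng.normal(167, 10, size=3000),
    rng.normal(330, 15, size=1000),
])
# scikit-learn expects a 2-D array of shape (n_samples, n_features).
X = lengths.reshape(-1, 1)

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(X)

# Sort the components by mean so the output order is predictable.
order = np.argsort(gmm.means_.ravel())
means = gmm.means_.ravel()[order]
weights = gmm.weights_[order]
stds = np.sqrt(gmm.covariances_.reshape(-1))[order]

# score_samples returns the log density, so exponentiate to get the
# mixture curve that can be drawn over a density-scaled histogram.
grid = np.linspace(100, 400, 301).reshape(-1, 1)
mixture_pdf = np.exp(gmm.score_samples(grid))

for m, s, w in zip(means, stds, weights):
    print(f"mean={m:.1f} bp  std={s:.1f} bp  weight={w:.2f}")
```

Each component has its own mean, spread, and weight, and the weights sum to 1, so the fitted curve can follow several peaks at once.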

On Friday I had a meeting with my mentor, Dr. De, and we discussed the next steps in my research. I also asked to be made aware of any speaker events occurring at the research institute.


Week 6: July 1 - July 5

After my meeting last Friday, Dr. De advised me to use the Bayesian Information Criterion (BIC), along with the Akaike Information Criterion (AIC), to find the best number of components for the Gaussian mixture model. I did more research on these model selection criteria to understand them and their scores better.
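One way to apply these criteria, sketched on synthetic data (an assumption, as is the candidate range of 1 to 5 components): fit one mixture per candidate number of components, record the scores from scikit-learn's bic and aic methods, and keep the number with the lowest score. For both criteria, lower is better.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Synthetic bimodal fragment lengths (assumption), shaped (n_samples, 1).
X = np.concatenate([
    rng.normal(167, 10, size=1500),
    rng.normal(330, 15, size=500),
]).reshape(-1, 1)

candidates = list(range(1, 6))
bic_scores, aic_scores = [], []
for k in candidates:
    model = GaussianMixture(n_components=k, random_state=0).fit(X)
    # Both scores reward a good fit and penalize extra parameters;
    # BIC penalizes extra components more heavily than AIC.
    bic_scores.append(model.bic(X))
    aic_scores.append(model.aic(X))

best_by_bic = candidates[int(np.argmin(bic_scores))]
best_by_aic = candidates[int(np.argmin(aic_scores))]

for k, b, a in zip(candidates, bic_scores, aic_scores):
    print(f"k={k}  BIC={b:.1f}  AIC={a:.1f}")
print("lowest BIC at k =", best_by_bic, "| lowest AIC at k =", best_by_aic)
```

Plotting both score lists against the number of components makes it easy to see where the scores stop improving.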

This week we also met the students from another REU in New York and attended a talk and a few presentations from their program. I also started to look into SciPy functions that will help me split the graph into separate peaks, because of the noise found in the dataset.
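The log does not say which SciPy functions were considered, so the following is only one possible approach: smooth the histogram counts to suppress noise, then call scipy.signal.find_peaks with a prominence threshold. The data are synthetic, and the bin width, smoothing width, and threshold are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal import find_peaks

rng = np.random.default_rng(2)
# Synthetic noisy bimodal fragment lengths (assumption).
lengths = np.concatenate([
    rng.normal(167, 10, size=6000),
    rng.normal(330, 15, size=2000),
])

# Histogram with 2 bp wide bins; the bin centers serve as the x-axis.
counts, edges = np.histogram(lengths, bins=np.arange(100, 401, 2))
centers = (edges[:-1] + edges[1:]) / 2

# Smooth the counts so bin-to-bin noise does not register as peaks.
smoothed = gaussian_filter1d(counts.astype(float), sigma=3)

# Keep only peaks that stand out by at least 10% of the tallest bin.
peak_idx, props = find_peaks(smoothed, prominence=0.1 * smoothed.max())
peak_positions = centers[peak_idx]

print(peak_positions)
```

The number of peaks found this way could also serve as a starting guess for the number of components in the mixture model.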


Week 7: July 8 - July 12

On Monday we went to Nokia Bell Labs, where we attended two talks. I was especially interested in the one on computer vision. For the rest of the week I kept trying to compute the BIC but ran into more roadblocks due to memory limits. I did more research on ways to find peaks in the data, and on Friday I went to the Cancer Institute of New Jersey.

On Friday I was able to see another presentation by a member of Dr. De's research group about rare urinary cancers and modeling them with different software.


Week 8: July 15 - July 19

Week 9: July 22 - July 26

References & Links

Relevant Papers & References:
  1. Hallmarks of Cancer: The Next Generation
  2. Signatures Beyond Oncogenic Mutations in Cell-Free DNA Sequencing for Non-Invasive, Early Detection of Cancer
Websites:
Presentations:

Acknowledgements

This work was carried out while the author, Ruth Velasquez, was a participant in the 2024 DIMACS REU program at Rutgers University, supported by NSF grant CNS-2150186.