DIMACS
DIMACS REU 2023

General Information

Student: Colton Fitzjarrald
School: University of Missouri St. Louis
E-mail: cwfdfd@umsl.edu
Project:
Genome folding and function: from DNA base pairs to nucleosome arrays and chromosomes

Project Description

My work consists of investigating the growth patterns of the E. coli ribosome. Recent structural data are available for this ribosome at several stages of its development. Using techniques from mathematical biology and knot theory, we aim to understand how the topology of the E. coli ribosome changes from birth to its fully matured state.


Weekly Log

Week 1:
Conducted an in-depth literature review. The bulk of it covered biology and chemistry, specifically the mechanisms of nucleic acids, RNA, ribosomes, etc., along with the more basic aspects of molecular and chemical biology. I also took a brief look at various techniques used in bioinformatics and mathematical biology. Another large portion of the week was spent preparing the initial presentation of my research intentions to peers and faculty.
Week 2:
Familiarized myself with the Protein Data Bank (PDB) and current methods of data visualization in chemical biology. I attempted to find an external C++ library for reading PDB files, but the ones I found had very poor documentation and were difficult to work with. At the end of the week I decided to write my own program to read the .cif and .pdb files.
Week 3:
The majority of work this week consisted of developing a program to read and extract information from the .cif and .pdb files in the PDB. What I intended to be a simple program quickly grew into a relatively complex library of functions.
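To illustrate the kind of extraction involved, here is a minimal sketch of reading atom records from a .pdb file. It is written in Python for brevity and is not the C++ library described in this log. The names `Atom` and `parse_pdb_atoms` are hypothetical. The column positions follow the fixed-width ATOM/HETATM record layout of the legacy PDB format. The .cif (mmCIF) format is token-based rather than fixed-width and is not covered here.

```python
from typing import Iterable, List, NamedTuple


class Atom(NamedTuple):
    """One ATOM/HETATM record (hypothetical container for this sketch)."""
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_pdb_atoms(lines: Iterable[str]) -> List[Atom]:
    """Extract atoms from the lines of a legacy-format .pdb file.

    Fields are sliced by fixed column position (0-based slices of the
    1-based columns in the PDB format description).
    """
    atoms = []
    for line in lines:
        # Only coordinate records are of interest; skip everything else.
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        atoms.append(Atom(
            serial=int(line[6:11]),        # columns 7-11
            name=line[12:16].strip(),      # columns 13-16
            res_name=line[17:20].strip(),  # columns 18-20
            chain_id=line[21],             # column 22
            res_seq=int(line[22:26]),      # columns 23-26
            x=float(line[30:38]),          # columns 31-38
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
        ))
    return atoms
```

Because the function accepts any iterable of lines, an open file object can be passed to it directly.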
I also surveyed external C++ libraries for data visualization. The available libraries were insufficient for my needs and, again, very poorly documented. At the end of the week I searched for external software that would be most efficient for visualizing my data and most easily integrated with my current design. The primary candidates were Python and MATLAB.
Week 4:
Python's Seaborn library was eventually chosen as the best visualization tool to integrate with the C++ library. The two were integrated, allowing the final visualizations of the data to begin. A C++ program was also developed and added to the existing library to read .json files containing base pair information.
The following visualizations were created: scatter plots of base pair data, outlining which residues within the ribosomal structure are connected via base pairs; nucleic acid contact maps, which are binary distance matrices of nucleic acid residues; and heatmaps of the first 526 nucleic acid residues, plotting their relative distances against their residue number in the given stage of the ribosome.
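As a rough illustration of what a contact map is, the sketch below builds a binary distance matrix from a list of residue coordinates. It is a Python sketch under stated assumptions, not the code used for the plots above. The function name `contact_map` is hypothetical, and the 8.0 cutoff (in the same units as the coordinates) is an assumed value chosen only for the example. The choice of one representative coordinate per residue is also left open.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def contact_map(coords: Sequence[Point], cutoff: float = 8.0) -> List[List[int]]:
    """Return a symmetric 0/1 matrix: entry (i, j) is 1 when residues i
    and j lie within `cutoff` of each other, and 0 otherwise.

    `cutoff` is an assumed threshold for illustration only.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Euclidean distance between the two representative points.
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = 1
                cmap[j][i] = 1  # the map is symmetric
    return cmap
```

A matrix of this shape can be passed to `seaborn.heatmap` for display. Dropping the threshold and storing the distances themselves gives the kind of data a distance heatmap is drawn from.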
At this point I am seriously considering publishing the libraries on GitHub, where they could assist other researchers who find the current C++ libraries in this area of research lacking.
Week 5:
This week consisted wholly of ongoing plot adjustments to refine the desired details. There was also much deliberation over which tools and methods to consider for future modeling. It was decided that, once this portion of the project is complete, work will begin on analyzing the ribosome structure using the writhing number, a concept from knot theory often applied in topological chemistry.
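For reference, the writhing number of a space curve can be written as the Gauss double integral Wr = (1/4π) ∫∫ (dr1 × dr2) · (r1 − r2) / |r1 − r2|^3. The sketch below approximates that integral for a chain of 3D points by treating each segment as a single sample at its midpoint. This is a crude numerical approximation chosen for simplicity, not an exact polygonal formula and not a method taken from this project. The function name `writhe` and the `closed` flag are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def writhe(points: Sequence[Point], closed: bool = False) -> float:
    """Approximate the writhe of a polyline with a midpoint-rule
    discretization of the Gauss double integral.

    Set `closed=True` to join the last point back to the first.
    """
    n = len(points)
    n_segments = n if closed else n - 1

    # Precompute each segment's direction vector and midpoint.
    segments: List[Tuple[Point, Point]] = []
    for i in range(n_segments):
        a = points[i]
        b = points[(i + 1) % n]
        tangent = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
        midpoint = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2, (a[2] + b[2]) / 2)
        segments.append((tangent, midpoint))

    total = 0.0
    for i in range(len(segments)):
        ti, mi = segments[i]
        for j in range(i + 1, len(segments)):
            tj, mj = segments[j]
            d = (mi[0] - mj[0], mi[1] - mj[1], mi[2] - mj[2])
            dist = math.sqrt(d[0] ** 2 + d[1] ** 2 + d[2] ** 2)
            if dist == 0.0:
                continue  # coincident midpoints: skip the singular term
            # Cross product ti x tj.
            cx = ti[1] * tj[2] - ti[2] * tj[1]
            cy = ti[2] * tj[0] - ti[0] * tj[2]
            cz = ti[0] * tj[1] - ti[1] * tj[0]
            total += (cx * d[0] + cy * d[1] + cz * d[2]) / dist ** 3

    # The integrand is symmetric in (i, j), so each unordered pair
    # counts twice in the full double sum.
    return 2.0 * total / (4.0 * math.pi)
```

Two sanity checks follow from the definition: a planar curve has zero writhe, and reflecting a curve in a plane flips the sign of its writhe.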
Week 6:
This week begins with a heavy literature review of topological methods applied to molecular biology.

Presentations


Additional Information