Ryan Ding, DIMACS REU 2024

General Information

Student:	Ryan Ding
Office:	CoRE 444
School:	University of California San Diego
E-mail:	dingryan2 (at) gmail (dot) com
Project:	Genome folding and function: from DNA base pairs to nucleosome arrays and chromosomes
Mentor:	Wilma Olson

Project Description

How B-form DNA folds in response to its environment and its sequence is a topic of interest in bioinformatics. Existing software such as emDNA models the resting states that various sequences converge toward, using elastic energy minimization and sequence-dependent data. My objectives for this summer are to investigate the conformations that loop-like structures adopt when analyzed in a sequence-dependent context, particularly with tetrameric dependency, and to integrate these findings into emDNA, both to identify protein binding sites for structures of interest and to improve the capabilities of the software itself.

Weekly Log

Week 1 (5/28 - 5/31):

Since my home institution is on a quarter system and I am still in the midst of in-person finals, I am working remotely and meeting with my mentor, Dr. Wilma Olson, over Zoom. This week served as a briefing on potential project options and gave me time to review key resources provided by my mentor. Next week, I will go over my interpretations of the literature with Dr. Olson to confirm or correct my understanding and to ask questions for further investigation.

Week 2 (6/3 - 6/7):

This week, I prepared my preliminary project presentation for the program and met with Dr. Olson again to review some of the material in the paper. After that discussion, and after revising my presentation to correct some misunderstandings, I started reading through the emDNA literature, testing some of its software components, and attempting to replicate the results given in the paper.

Week 3 (6/10 - 6/14):

I have finally finished all my finals and will be heading to Rutgers on Friday! This week was lighter in terms of work, as Dr. Olson understood that I would be taking finals. However, I still tested emDNA throughout the week and ran into several problems: my gcc compiler was not configured properly, some settings needed to be changed (such as switching all SSH links to HTTPS links), and some executables did not install properly. I will try to finish this process by next week and make sure that I can replicate the protocol in the supplementary information of the emDNA paper, so that I can begin work on my personal objectives.

Week 4 (6/17 - 6/21):

This week I finally finished replicating the results in the JMolBio 2022 paper and figured out that some of the provided Bash scripts were simply meant for Linux rather than Windows. To work around this, I developed a Python script, since Python offers better cross-platform compatibility, so that users of both systems will have an easier time replicating the paper's results than I did. Additionally, Dr. Olson sent me some novel force constant and step parameter data for both dimers and tetramers, and I will be integrating those into emDNA over the week in a way that follows emDNA's existing formatting.
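
As a rough illustration of why a Python driver ports more easily than a Bash script, the sketch below runs a list of commands in order using only the standard library. It is not the script I wrote for emDNA: the function name `run_steps` and the placeholder command are hypothetical, and the real emDNA executables and their arguments would have to be substituted in.

```python
import subprocess
import sys
from pathlib import Path


def run_steps(steps, workdir="."):
    """Run each command in order and return the captured stdout of each.

    steps   -- list of argument lists, one per command (hypothetical format)
    workdir -- directory to run in; pathlib takes care of path separators
    """
    workdir = Path(workdir).resolve()
    outputs = []
    for args in steps:
        # Passing a list (not a shell string) avoids shell-specific quoting rules.
        result = subprocess.run(
            [str(a) for a in args],
            cwd=workdir,
            capture_output=True,
            text=True,
            check=True,  # stop at the first failing step, like `set -e` in Bash
        )
        outputs.append(result.stdout)
    return outputs


if __name__ == "__main__":
    # Placeholder step: the current Python interpreter stands in for a real tool.
    print(run_steps([[sys.executable, "-c", "print('step finished')"]]))
```

Because each command is an argument list rather than a line of shell syntax, the same file should behave the same way on Linux and Windows, provided the executables themselves exist on both.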

Week 5 (6/24 - 6/28):

This week was rather slow, as it involved consolidating the step parameters and force constants given to me by Dr. Olson with the tetrameric model by Zoe Wefers, which was still sitting in a branch of the emDNA repository. Additionally, I was given more detail on the math behind identifying loop-like structures, as well as an example of one in DNA gyrase. I worked toward preparing the new dimeric and tetrameric models referenced in Young 2022 so that I can apply them to the gyrase structure provided, while also reading more of the literature provided by Dr. Olson (linked in the references below).

Week 6 (7/1 - 7/5):

This week, I looked into a 601 base-pair structure used to bind the DNA gyrase protein introduced last week, with the further clarification that the protein serves to regulate DNA topology. The researchers who wrote about the protein never specified its binding site on the DNA minicircle, so I was given the task of finding where it might bind along the DNA sequence. This task involves varying the linking number (a value representing the topology of the DNA) and the stretch of sequence that the protein binds in order to find a low-energy conformation for the complex. I think this task may be time-intensive: simulating a bound protein means iteratively forcing a specific stretch of the DNA to adopt certain step parameters while the rest of the sequence is adjusted to accommodate the new "controlled" step parameters. The step parameters should change only slightly at each iteration, but each iteration takes about 3-5 minutes to execute, and I have 100 iterations to go through. I will try to find a way to speed up this process in the code by reading the literature and consulting with my mentor, Dr. Olson, in greater detail.

Week 7 (7/8 - 7/12):

I was able to figure out a quicker way to simulate the protein binding process, referred to as binding ramp minimization/linear ramping, with assistance from Dr. Olson. The way that I would go about trying to identify the binding site for the protein and DNA would be as follows:

Create a minicircle with a linking number (Lk) between 56 and 60 using a provided script, modifying the helical repeat to obtain a specific value. The helical repeat must be as precise as possible, since the linking number must be an integer.
Perform the ramping procedure using the emDNA ProBind script, with the step parameters of the first 60 base pairs adjusted from their current rigid-body state to the orientation they should have when the protein is bound. This essentially makes the sequence "wrap around" the protein structure, and we track how the other base-pair step parameters change while the controlled 60 base-pair stretch is incrementally adjusted to its desired state.
Rotate the sequence so that the 60 base-pair stretch covered by the protein sweeps over all possible positions on the DNA for the given linking number, finding the resting-state energy of the protein-DNA complex at each position. This lets us find the most suitable placement of the protein on the sequence for that linking number. The process is then repeated for the entire range of linking numbers to try to home in on the part of the sequence that would be best for binding.
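
To make the first two steps more concrete, here is a minimal sketch of the arithmetic behind them. It assumes a planar circle with zero writhe, so that the linking number equals the total twist, and it only generates the ramp schedule: the minimization of the remaining steps is left to emDNA. The function names, the 601 bp length, and the step-parameter values in the usage example are illustrative assumptions, not emDNA's interface or my actual inputs.

```python
def helical_repeat_for_lk(n_bp, lk):
    """Helical repeat (bp per turn) that gives `lk` full turns over n_bp steps.

    Assumes a planar circle (zero writhe), so Lk equals the total twist.
    """
    if n_bp <= 0 or lk <= 0:
        raise ValueError("n_bp and lk must be positive")
    return n_bp / lk


def twist_per_step(n_bp, lk):
    """Uniform twist angle in degrees per base-pair step for the same circle."""
    return 360.0 * lk / n_bp


def linear_ramp(start, target, n_increments):
    """Return the intermediate step-parameter sets of a linear ramp.

    start, target -- lists of per-step tuples, e.g.
                     (tilt, roll, twist, shift, slide, rise)
    The last returned set equals `target` exactly.
    """
    if len(start) != len(target):
        raise ValueError("start and target must cover the same steps")
    frames = []
    for k in range(1, n_increments + 1):
        t = k / n_increments  # fraction of the way from start to target
        frames.append([
            tuple((1.0 - t) * s + t * e for s, e in zip(s_step, e_step))
            for s_step, e_step in zip(start, target)
        ])
    return frames


if __name__ == "__main__":
    # Illustrative numbers only: a 601 bp circle with Lk = 58.
    print(helical_repeat_for_lk(601, 58), twist_per_step(601, 58))
    # One controlled step ramped in 4 increments (values are made up).
    for frame in linear_ramp([(0.0, 0.0, 34.0, 0.0, 0.0, 3.4)],
                             [(0.0, 10.0, 36.0, 0.0, 0.0, 3.4)], 4):
        print(frame)
```

Small, equal increments keep each intermediate state close to the previous one, which is the point of ramping rather than imposing the bound geometry all at once.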

Each of these experiments requires some setup in advance, after which I just have to let the program run for several hours. The setup mainly consists of generating all rotations of a sequence (e.g. if I have a string ATCG, with a protein binding to the first two base pairs, I would also want to perform the experiment on the sequences TCGA, CGAT, and GATC) and ensuring that the program is working correctly. There are a few one-off errors I need to investigate, and I cannot yet run the scripts sequentially because their file parsing mechanisms are inconsistent, which I will also consolidate. This is my task for the remainder of the week, and I hope to build on it with an actual emDNA protein binding site predictor in the near future.
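
Generating the rotations of a circular sequence can be sketched in a few lines. The helper name `rotations` is hypothetical and is not part of emDNA.

```python
def rotations(sequence):
    """Return every rotation of a circular sequence, starting with the original."""
    return [sequence[i:] + sequence[:i] for i in range(len(sequence))]


if __name__ == "__main__":
    print(rotations("ATCG"))  # ['ATCG', 'TCGA', 'CGAT', 'GATC']
```

Since the protein always binds to the first base pairs of the input, rotating the sequence has the same effect as moving the protein around the circle.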

Week 8 (7/15 - 7/19):

This week, I obtained energy values for the protein-DNA complex bound at 15-bp increments, as advised by Dr. Olson (though she referenced some papers that reported noticeable differences in structure for bindings at every 5 bp, which is something I want to look at later). Once I had these results, I paused some of the work to refine the final presentation that I needed to prepare for this Thursday; the challenge was to balance the appropriate biological background with the computing background that my peers in the DIMACS program have. I will upload this presentation to this website after I have filtered out any results that I want to keep out and added back some of the slides that I had to omit, so that future students who read my page can understand the concepts a bit better.
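
The bookkeeping for such a scan can be sketched as follows, assuming a circular sequence, a 60-bp protein footprint, and start positions spaced by a fixed increment. The function names are hypothetical, and the energies would have to come from the actual emDNA runs; none are shown here.

```python
def binding_windows(n_bp, window, increment):
    """Map each start position to the circular indices the protein would cover."""
    return {
        start: [(start + k) % n_bp for k in range(window)]
        for start in range(0, n_bp, increment)
    }


def lowest_energy_site(energies):
    """Return (start, energy) for the smallest energy in a {start: energy} dict."""
    start = min(energies, key=energies.get)
    return start, energies[start]


if __name__ == "__main__":
    windows = binding_windows(n_bp=601, window=60, increment=15)
    print(len(windows), "candidate start positions")
```

Changing `increment` from 15 to 5 would give the finer scan mentioned above, at roughly three times the number of runs.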

Week 9 (7/22 - 7/26):

The majority of my week was spent writing the report and meeting with Dr. Olson to go over my presentation and parts of my final report to prepare for submission (the presentation could not be recorded because of the circumstances and, for some teams, issues with sharing results prior to publication). On top of this, I learned how to superimpose two protein-DNA complex structures to see how the minimization changes the final conformation of a complex. I found this helpful because, from my initial PyMOL examination of the final conformations of two different optimized molecules, I had assumed that the minimization proceeded similarly for both, since the structures took roughly the same shape with a few more bends or curves on some sequences. However, upon superimposition, I found that two complexes with similar shapes were minimized such that they seemed to end up in different planes, which makes me want to explore the energy values that I calculated last week in more depth. Going forward, I also want to explore different approaches to visualizing sequence data that Dr. Olson had shown me, including generating heatmaps of various external coordinates along a sequence for different binding sites. I think that this REU gave me great insight into the work needed to perform significant research, and I hope to explore more opportunities in the future!

Presentations

First Presentation
Final Presentation - Will be uploaded soon!

References

Revisiting DNA Sequence-Dependent Deformability in High-Resolution Structures: Effects of Flanking Base Pairs on Dinucleotide Morphology and Global Chain Configuration, Young et al. - PubMed
Synergy between Protein Positioning and DNA Elasticity: Energy Minimization of Protein-Decorated DNA Minicircles, Clauvelin & Olson - ACS
Structural basis of DNA crossover capture by Escherichia coli DNA gyrase, Vayssières et al. - Science
Surprising Twists in Nucleosomal DNA with Implication for Higher-order Folding, Todolli et al. - Journal of Molecular Biology
Characterization of the Geometry and Topology of DNA Pictured As a Discrete Collection of Atoms, Clauvelin et al. - ACS
Two perspectives on the twist of DNA, Britton et al. - The Journal of Chemical Physics
Analyzing and Building Nucleic Acid Structures with 3DNA, Colasanti et al. - JoVE

Tools

RCSB Protein Data Bank - Website used to obtain PDB formatted structures
3DNA - Web tool used to obtain rigid body parameters of structures alongside in-depth analysis
PyMOL - Software to visualize structures needed for binding site identification purposes and other insights