DIMACS
DIMACS REU 2024

General Information

Student: Ryan Ding
Office: CoRE 444
School: University of California San Diego
E-mail: dingryan2 (at) gmail (dot) com
Project: Genome folding and function: from DNA base pairs to nucleosome arrays and chromosomes
Mentor: Wilma Olson

Project Description

How B-form DNA folds in response to its environmental conditions and its base sequence is a topic of interest in bioinformatics. Existing software such as emDNA seeks to model the resting states that various sequences converge towards, using models that account for elastic energy minimization and sequence-dependent data. My objectives for this summer are to investigate the conformations that loop-like structures adopt when analyzed in a sequence-dependent context, particularly one of tetrameric dependency, and to integrate these findings into emDNA, both to identify protein binding sites for structures of interest and to improve the capabilities of the software itself.


Weekly Log

Week 1 (5/28 - 5/31):
Given that my home institution is on a quarter system and I am still in the midst of in-person finals, I am working remotely and meeting with my mentor, Dr. Wilma Olson, over Zoom. This week served as a briefing on potential project options and gave me time to review key resources provided by my mentor. Next week, I will go over my interpretation of the literature with Dr. Olson to confirm or correct my understanding and to ask questions for further investigation.
Week 2 (6/3 - 6/7):
This week, I prepared my preliminary project presentation for the program and met with Dr. Olson again to review some of the material in the paper. After that discussion, and after revising my presentation to correct any misunderstandings, I started reading through the emDNA literature, testing some of its software components, and attempting to replicate the results given in the paper.
Week 3 (6/10 - 6/14):
I have finally finished all my finals and will be heading to Rutgers on Friday! This week was lighter in terms of work, as Dr. Olson understood that I would be taking finals. I still tested emDNA throughout the week and ran into problems such as a misconfigured gcc compiler, settings I needed to change (for example, switching all SSH links to HTTPS links), and executables that did not install properly. I will try to finish this process by next week, making sure that I can replicate the protocol in the supplementary information of the emDNA paper, so that I can begin work on my personal objectives.
Week 4 (6/17 - 6/21):
This week I finally finished my attempt to replicate the results in the JMolBio 2022 paper and figured out that some of the provided Bash scripts were simply meant for a Linux OS rather than Windows. To work around this, I developed a Python script, since Python offers better cross-platform compatibility, so that users of both systems will have an easier time replicating the paper's results than I did. Additionally, Dr. Olson sent me some novel force constant and step parameter data for both dimers and tetramers, and I will be integrating those into emDNA over the week in a manner that follows the formatting already used inside emDNA.
Week 5 (6/24 - 6/28):
This week was rather slow, as it involved consolidating the step parameters and force constants given to me by Dr. Olson with the tetrameric model, which is still sitting in a branch by Zoe Wefers on the emDNA repository. Additionally, I was given more detail on the math behind identifying loop-like structures, as well as an example of one in DNA gyrase. I worked towards preparing the new dimeric and tetrameric models referenced in Young 2022 so that I can then apply them to the gyrase structure provided, while also reading more of the literature provided to me by Dr. Olson (linked in the references below).
Week 6 (7/1 - 7/5):
This week, I looked into a 601 base-pair structure used to bind the DNA gyrase protein introduced last week, with the further clarification that the protein serves to regulate the topology of a structure. The researchers who wrote about the protein never specified its binding site on the DNA minicircle, so I was given the task of finding where it might bind along the DNA sequence. This task involves varying the linking number (a value representative of the topology of the DNA) and the sequences of the DNA that the protein would bind to in order to find a low-energy conformation that suits the complex. I think this task might be a bit time-intensive: simulating a bound protein means iteratively forcing a specific stretch of the DNA to adhere to certain step parameters while the rest of the sequence adjusts to the new "controlled" set of step parameters. Because each iteration should make only a small change to the step parameters, many iterations are needed. Each one takes about 3-5 minutes to execute, and I have 100 iterations to go through, which comes to roughly 5 to 8 hours in total. I will try to find a way to speed up this process in the code by reading the literature and consulting with my mentor, Dr. Olson, in greater detail on the subject.
Week 7 (7/8 - 7/12):
With assistance from Dr. Olson, I was able to figure out a quicker way to simulate the protein binding process, referred to as binding ramp minimization or linear ramping. My plan for identifying the binding site of the protein on the DNA is as follows:
  1. Create a minicircle with a linking number (Lk) between 56 and 60 using a provided script, modifying the helical repeat to obtain a specific value. The helical repeat must be set as precisely as possible, since the linking number must be an integer.
  2. Perform the ramping procedure using the emDNA ProBind script, with the step parameters of the first 60 base pairs adjusted from their current rigid-body state to the orientation they should have when the protein is bound to the DNA. This essentially adjusts the sequence to "wrap around" the protein structure, and we track how the other base-pair step parameters change while the controlled 60 base-pair segment is incrementally adjusted to its desired state.
  3. Rotate the sequence so that the 60 base-pair window that the protein would cover takes every possible position on the DNA structure for the given linking number, finding the resting-state energy of the protein-DNA complex at each position. This lets us find the most suitable placement of the protein on the sequence for that linking number. The process is then repeated across the entire range of linking numbers to try to home in on the one part of the sequence that would be best for binding.
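As a rough illustration of the arithmetic behind steps 1 and 2, here is a minimal Python sketch. It is not the emDNA implementation and does not call any emDNA script. The function names, the 601 base-pair circle size, and the example step parameter values are all assumptions chosen for illustration. The first function assumes a closed circle with no writhe, so that the linking number equals the number of base-pair steps divided by the helical repeat. The second builds a schedule of evenly spaced intermediate step parameter sets between a starting state and a target state.

```python
from typing import List, Sequence


def helical_repeat_for_lk(n_bp: int, lk: int) -> float:
    """Helical repeat (bp per turn) that gives `lk` full turns over `n_bp` steps.

    Assumption: closed circle with n_bp base-pair steps and zero writhe,
    so the linking number equals the total twist in turns.
    """
    if n_bp <= 0 or lk <= 0:
        raise ValueError("n_bp and lk must be positive")
    return n_bp / lk


def ramp_schedule(start: Sequence[float],
                  target: Sequence[float],
                  n_iter: int) -> List[List[float]]:
    """Linearly interpolate from `start` towards `target` in `n_iter` increments.

    Returns n_iter parameter sets; the last one equals `target`.
    """
    if len(start) != len(target):
        raise ValueError("start and target must have the same length")
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")
    schedule = []
    for k in range(1, n_iter + 1):
        frac = k / n_iter
        schedule.append([(1.0 - frac) * s + frac * t
                         for s, t in zip(start, target)])
    return schedule


# Step 1: helical repeat needed for each integer linking number.
N_BP = 601  # assumed circle size, for illustration only
for lk in range(56, 61):
    print(lk, round(helical_repeat_for_lk(N_BP, lk), 4))

# Step 2: ramp one step (tilt, roll, twist, shift, slide, rise) in 100 increments.
# Both parameter sets below are made-up values, not measured data.
free_step = [0.0, 0.0, 36.0, 0.0, 0.0, 3.4]
bound_step = [0.0, 10.0, 32.0, 0.0, 0.0, 3.4]
ramp = ramp_schedule(free_step, bound_step, 100)
print(ramp[0])
print(ramp[-1])
```

In an actual run, each intermediate parameter set would be imposed on the controlled 60 base-pair segment and the rest of the minicircle would be re-minimized before moving to the next increment.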
Each of these experiments requires some preparatory setup before I can simply let the program run for several hours. The setup mainly consists of generating all rotations of a sequence string (e.g. if I have a string ATCG, with a protein binding to the first two base pairs, I would also want to perform the experiment on the sequences TCGA, CGAT, and GATC) and ensuring that the program works correctly. There are a few one-off errors I need to investigate, and I cannot yet run the scripts sequentially because their file parsing mechanisms are inconsistent, which I will also consolidate. This is my task for the remainder of the week, and I hope to build on it with an actual emDNA protein binding site predictor in the near future.
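The rotation step can be sketched in a few lines of Python. This is a minimal illustration with a hypothetical function name, not code from emDNA. It treats the sequence as circular and returns every cyclic shift, so that a binding window fixed at the start of the string ends up covering each position of the original sequence in turn.

```python
from typing import List


def rotations(seq: str, include_original: bool = False) -> List[str]:
    """Return the cyclic rotations of a circular sequence.

    Rotation k moves the first k letters to the end of the string.
    By default the unrotated sequence (k = 0) is left out.
    """
    first = 0 if include_original else 1
    return [seq[k:] + seq[:k] for k in range(first, len(seq))]


print(rotations("ATCG"))  # ['TCGA', 'CGAT', 'GATC']
```

For a minicircle of N base pairs this gives N - 1 rotated sequences in addition to the original, one experiment per starting position of the binding window.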
Week 8 (7/15 - 7/19):
Weekly Log Entry TBD
Week 9 (7/22 - 7/26):
Weekly Log Entry TBD

Presentations


References

  1. Revisiting DNA Sequence-Dependent Deformability in High-Resolution Structures: Effects of Flanking Base Pairs on Dinucleotide Morphology and Global Chain Configuration, Young et al. - PubMed
  2. Synergy between Protein Positioning and DNA Elasticity: Energy Minimization of Protein-Decorated DNA Minicircles, Clauvelin & Olson - ACS
  3. Structural basis of DNA crossover capture by Escherichia coli DNA gyrase, Vayssières et al. - Science
  4. Surprising Twists in Nucleosomal DNA with Implication for Higher-order Folding, Todolli et al. - Journal of Molecular Biology
  5. Characterization of the Geometry and Topology of DNA Pictured As a Discrete Collection of Atoms, Clauvelin et al. - ACS
  6. Two perspectives on the twist of DNA, Britton et al. - The Journal of Chemical Physics

Tools

  1. RCSB Protein Data Bank - Website used to obtain PDB formatted structures
  2. 3DNA - Web tool used to obtain rigid body parameters of structures alongside in-depth analysis
  3. PyMOL - Software to visualize structures needed for binding site identification purposes and other insights