Sophie's REU Page

Sophie Dove

Email: sdove AT umass DOT edu

Office: CoRE Building 440

Home Institution: College of Natural Sciences & Manning College of Information and Computer Sciences, University of Massachusetts Amherst

Mentor:

Wilma Olson

About Me

I have a LinkedIn. It's just my name.

Project Description

Deoxyribonucleic acid (DNA) is an essential biological molecule that carries the genetic information of organisms. It can take on various structures, one of which is circular DNA. DNA minicircles play an important role in gene expression, and they're used in gene therapy, vaccine development, and research areas related to pathology and biology. Circular DNA can supercoil, a natural process in which the double helix crosses over itself, and supercoiling matters during transcription and in DNA's interactions with enzymes. DNA gyrase belongs to a class of enzymes called topoisomerases, which ease transcription by cutting DNA strands and rejoining them. DNA gyrase is mainly found in prokaryotes, where it relieves the torsional stress of positively supercoiled DNA and allows RNA polymerase to copy DNA into RNA during transcription. This project focuses on the effects of DNA gyrase on the topology of DNA.

Weekly Log

Week 1 (05/26-05/31)

I spent this week (and some time before it) reading literature provided to me by Dr. Olson and downloading emDNA (a tumultuous process), the software I'll be using throughout this project. I began to understand DNA gyrase's function, as well as protein binding and its effects on DNA structure. The placement of proteins depends largely on mechanical stress, but proteins also have the ability to manipulate that stress (Clauvelin & Olson). Other papers describe supercoiling and its chirality, which can involve torsional stress. This information makes sense and will probably be an essential part of my research. Other than reading, I also met with Dr. Olson. She condensed the readings for me and showed me some of the 3D structures created by emDNA (very interesting). On Thursday, I started working on my presentation and ran tests in the command line. Today (Friday), I finished the slides and started practicing for my presentation.

Week 2 (06/01-06/06)

I spent the beginning of this week revising my slides and presentation notes, after meeting with Dr. Olson to receive feedback on the notes and get photos I needed for the presentation. I gave my presentation on Tuesday and it went okay (I thought it was too quick). Afterward, I started going through the examples given in the paper discussing emDNA and its utilities (Young et al). During this, I corresponded with Ismaeel Abdulghani, who worked on a capstone project with Dr. Olson. He explained how to use the commands and where to use them (e.g. ./emDNA-topology in the Release folder under emDNA-install). When I was looking in the emDNA-install directory, I noticed that I hadn't properly copied the repository, which led to some complications, like the protein binding simulation not progressing because the Release folder didn't exist. On Thursday, I met with Dr. Olson, and she gave me some feedback on the presentation. She also introduced me to a couple of resources that will be instrumental in the next steps of my research: the Protein Data Bank and 3DNA. She also gave me advice on what to do next, which includes finishing all of the examples, trying to recreate the DNA structure found by the Olson Lab's collaborators, and analyzing the rigid body parameters. At this point (Friday), I've gotten through almost all of the examples from the J Mol Biol case study, but I'm having a hard time with the last one, as my input step parameter file doesn't have the same sequence length as my string sequence file. I might have to start from the beginning because I believe that I messed up at a certain point.
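
The length mismatch described above is the kind of problem that can be caught before launching a run. The following is a minimal sketch of such a pre-flight check, not part of emDNA. It assumes the sequence file holds a plain string of bases and the step parameter file holds one whitespace-separated row of six numbers per base-pair step; both layouts and all function names here are assumptions made for illustration. A linear fragment of N base pairs has N - 1 steps and a closed circle has N, so which count a given tool expects should be checked against its documentation.

```python
def count_bases(sequence_text):
    """Count A/C/G/T characters, ignoring whitespace and line breaks."""
    return sum(1 for ch in sequence_text.upper() if ch in "ACGT")


def count_steps(step_text):
    """Count rows that parse as six numbers (assumed: one row per step)."""
    rows = 0
    for line in step_text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip headers, blank lines, or malformed rows
        try:
            [float(field) for field in fields]
        except ValueError:
            continue  # six fields, but not all numeric
        rows += 1
    return rows


def expected_steps(n_bases, closed):
    """N steps for a closed circle, N - 1 for a linear fragment."""
    return n_bases if closed else n_bases - 1


def lengths_match(sequence_text, step_text, closed=False):
    """True when the step count agrees with the sequence length."""
    n_bases = count_bases(sequence_text)
    return count_steps(step_text) == expected_steps(n_bases, closed)
```

In practice the two strings would come from reading the sequence file and the step parameter file; a False result means one of the two needs to be regenerated before optimizing.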

Outside of work/research this week, I attended board game nights, went to H-Mart with my roommates, and started a new TV show. I'm enjoying the work-life balance of research. 7-8 hours of research/work is a lot, but at least I don't have schoolwork.

Week 3 (06/08-06/13)

After corresponding with Ismaeel and running example "cs03.sh" (JMolBiol 2022) multiple times, I finally finished the examples Monday night/Tuesday morning. I think organizational code had to be added so that the sequence lengths matched. Additionally, Ismaeel decreased the number of linear ramping steps. I also downloaded various resources (given by Dr. Olson), like x3DNA and 3DNA-DSSR (these aren't the same thing). These will allow me to easily download .pdb (Protein Data Bank) files and obtain the rigid body parameters necessary for simulating the binding of Gyrase to DNA. Dr. Olson also gave me an Excel spreadsheet to get the starting state of a DNA minicircle. On Tuesday afternoon, I met with Dr. Olson, and she explained some of the resources and showed me that the x3DNA web server doesn't get all of the step parameters. Hence, DSSR. She also gave me a paper describing equations for the Tilt and Roll (featured on the Excel sheet) and encouraged me to note any issues I had with emDNA. After that, I imported some .pdb files to x3dna-DSSR and looked at the output files. More specifically, I looked at the local base-pair parameters to see if I could find any similarities in the three structures (I looked at Shift, Slide, Rise, Tilt, Roll, and Twist). Dr. Olson also advised me to look at Buckle, Shear, Stretch, and Propeller in each structure's output file. These parameters can indicate distortion, which is also important in understanding DNA structure when bound to Gyrase. Dr. Olson mentioned the step parameters, which I forgot to analyze, so I'll be doing that today. I might also brainstorm ways to find the binding site(s) of Gyrase.

Week 4 (06/16-06/19)

On Monday, I met with Dr. Olson to discuss what I'd done with the step parameters, but found out that I hadn't manually edited the text files where the base pairs were listed. For some reason I thought the --list-pair option did this for me automatically (yikes), but I discovered from Dr. Olson's reference code that duplex-outfile generates a file that can be manually edited and fed back in to be analyzed (yay DSSR). I spent Monday afternoon and most of Tuesday fixing the files that had missing base pairs. I organized all of them into a Google Sheet and created a couple of histograms for each parameter where each structure was color coded. The scatterplots are awful to look at, so to analyze the step parameters, I decided to just compare them line by line. I noticed that where the structures had similar parameter values, the base pairs involved were significant ones. For example, most of these base pairs were close to the parts of each DNA structure bound by either the "gate" or the "pinwheel" (shapes of the protein). I'm also beginning to understand the anatomy of the .par files and the purpose of the Excel sheet Dr. Olson sent me (this is so sad). I think I'll attempt to simulate protein binding with each structure, but I still don't know where it binds. On Thursday, I met with Dr. Olson and she advised me to graph the step parameters that correspond to the base pairs that are near the "gate" and/or "pinwheel." She also said that I could start taking the step parameters for the "gate" or "pinwheel" and see what they do to a DNA minicircle. I managed to get the "pinwheel" graphs done for 8QDX and 9GBV, but I don't see many similarities. The graph for 9GBV is a lot less noisy compared to 8QDX. Maybe I can find a smaller range (for the base-pair residues) so that when I superimpose them, they look more similar.

Outside of research, there was a pasta cooking competition. We didn't win, but there was a lot of good food.

Week 5 (06/23-06/27)

I met with Dr. Olson and showed my revised graphs, but they needed more fixing because the lines still weren't superimposed on each other. I spent Monday afternoon finding ranges that allowed each structure's step parameters to be superimposed on each other and creating a .par file for a 225-bp minicircle. With the gate, it was easy to obtain the step parameters, as I could just download them from the x3DNA web server. On Tuesday, I simulated protein binding. I started with 50 steps and with the base-bound domain near the beginning of the sequence. I also attempted 100 steps, a different binding site, and multiple binding sites. Using a different binding site (not at the beginning of the sequence) just resulted in a minicircle. I ensured that the base-bound domains were "covered" by the number of base pairs in the step parameter file, but I still had trouble. The other "conditions" resulted in two somewhat different structures: both ovals where one side was distorted (one was more round than the other). I decided to move on to a 250-bp minicircle and try a smaller number of steps when binding to one site, but there weren't many differences. On Thursday, after corresponding with Ismaeel, I decided to redo binding at two different sites and at sites other than the beginning of the sequence for the 225-bp and 250-bp minicircles. Even at 50 steps, this took almost an hour and a half. After Thursday, I was able to redo most of the tests and got some more distorted shapes, especially when using three binding sites (the fix for this was freezing the correct base pairs and specifying a range that corresponded to each binding site). This didn't take as long. I met with Dr. Olson to discuss what else can be done to understand where the "pinwheel" begins and ends. She introduced me to "Snap," a feature of x3DNA. My next task is to look through the text files and use parts of each structure's sequence to find the start and end of the "pinwheel."
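
Finding ranges that let two step parameter profiles superimpose can also be done numerically. The sketch below is one hypothetical way to do it and not necessarily how the ranges above were chosen: it slides a fixed-length window along each profile and keeps the pair of windows with the lowest root-mean-square difference. The function names, the window length, and the sample values are all made up for illustration.

```python
import math


def rmsd(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def best_offset(reference, other, window):
    """Return (ref_start, other_start, score) for the best-matching windows.

    Every window of the given length in `reference` is compared with every
    window of the same length in `other`; the first pair with the lowest
    RMSD wins.
    """
    best = None
    for i in range(len(reference) - window + 1):
        ref_window = reference[i:i + window]
        for j in range(len(other) - window + 1):
            score = rmsd(ref_window, other[j:j + window])
            if best is None or score < best[2]:
                best = (i, j, score)
    return best


# Hypothetical values of one step parameter along two structures.
profile_a = [0, 1, 5, 10, 4, 0]
profile_b = [2, 2, 5, 10, 4]
start_a, start_b, score = best_offset(profile_a, profile_b, window=3)
```

The two start indices give the ranges to plot on top of each other. Repeating the search per parameter (Roll, Twist, and so on) and checking that the offsets agree is a reasonable sanity test, since a real structural match should line up in every parameter at once.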

Outside of research, I have been trying to get an introduction to stochastic processes and Brownian motion. Right now, I'm just doing review and learning about Bayes estimators. Today, I counted the ducks on my desk and found out I've collected 137 of them...

Week 6 (06/30-07/03)

I spent the beginning of the ("short") week finishing up simulations with the 225-bp minicircle with different linking numbers. I wasn't able to capture any supercoiled structures, but some of the ones I froze could be useful for finding the pathway used by the protein to deform DNA. I also created a new .par file containing the "gate" from 8QQS and "pinwheel" from 8QDX. I took the optimized structure's files and ran more simulations with the new .par file, but still didn't find any supercoiled structures. On Tuesday, I tried starting with a 250-bp minicircle and just running the simulation with the new .par file and finally got a supercoiled structure! There are definitely points to clear up with Dr. Olson though, as I'm not sure whether this is another valid way to run the simulations. On Wednesday, I met with Dr. Olson to discuss data analysis and organization. Alongside running more simulations, one of the next things to do is to create coordinate-axis matrices and tabulate information from the output files. This could give us an idea as to where the "gate" and "pinwheel" are on the "frozen" structures. I hope to at least get the organization done this week, but on Thursday, I froze a structure at a particular binding site that might help in understanding the pathway to deformation. I have to work on Independence Day, so that'll be fun.

As for progress with learning about stochastic processes, I decided to skip Poisson processes and renewal theory. I'm now learning the basics of Markov Chains. I've found it annoying to "memorize" the conditions for two states communicating and a state being transient or recurrent.

Week 7 (07/07-07/11)

At the start of the week, I wasn't feeling so great, so I didn't get too much done. I managed to organize some data and clean up plots. I added the sequence on the horizontal axes of my step parameter graphs. I also started thinking about what to put in my final presentation. I redid experiments, as Dr. Olson pointed out that my step parameter file was wrong and I'd frozen the wrong residue range after optimizing with the gate. After running the experiments, I obtained a couple of structures that are both "more" supercoiled. It was pointed out to me again that I was doing this process wrong, so I ran more simulations and ensured that my input file and optimized file had the same step parameters. Dr. Olson gave me a couple of step parameter files, including one that had the 8QDX pinwheel, 8QQS gate, and 9GBV pinwheel. She instructed me to use a much smaller minicircle (165-bp).

Week 8 (07/14-07/18)

I did some experiments with the 165-bp minicircle and only noticed subtle differences between the four structures I generated (each of them had a different linking number). I also worked on my final presentation. I gave my final presentation on Wednesday and worked on a poster that I might present at the Jane Street event. At the end of the week, I ran a couple of experiments and had no luck preserving my step parameters. I'll attempt to use the same binding site for both optimizations.
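
For context on what a "different linking number" means for circles of these sizes: a relaxed circle of N base pairs has a target linking number Lk0 = N / h, where h is the helical repeat. The sketch below assumes the textbook value h = 10.5 bp per turn for B-DNA, which is not a number taken from emDNA or from the simulations above, and the function names are made up for illustration. Because a closed circle can only have an integer linking number, the difference Lk - Lk0 is generally nonzero and measures how over- or underwound the circle is.

```python
HELICAL_REPEAT_BP = 10.5  # assumed textbook helical repeat of B-DNA


def relaxed_linking_number(n_bp, helical_repeat=HELICAL_REPEAT_BP):
    """Lk0 = N / h: the (generally non-integer) relaxed linking number."""
    return n_bp / helical_repeat


def linking_difference(lk, n_bp, helical_repeat=HELICAL_REPEAT_BP):
    """Delta Lk = Lk - Lk0; negative values mean an underwound circle."""
    return lk - relaxed_linking_number(n_bp, helical_repeat)


def superhelical_density(lk, n_bp, helical_repeat=HELICAL_REPEAT_BP):
    """sigma = (Lk - Lk0) / Lk0, a size-independent measure of supercoiling."""
    lk0 = relaxed_linking_number(n_bp, helical_repeat)
    return (lk - lk0) / lk0


# Minicircle sizes mentioned in this log.
for n_bp in (165, 225, 250):
    lk0 = relaxed_linking_number(n_bp)
    nearest = round(lk0)
    print(n_bp, round(lk0, 2), nearest, round(linking_difference(nearest, n_bp), 2))
```

On a circle this small, changing the linking number by a single unit is a large relative change, which is one reason minicircles are sensitive test systems for supercoiling.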

I went to a couple of restaurants this week. They were both pretty good. I also bought fish-shaped ice cream. Yay.

Week 9 (07/21-07/25)

I went to the Jane Street event. It was kind of nice. I spent the weekend working on my final paper. I have the Introduction and Methods done and have added most of my figures/tables to LaTeX. I tried running a couple of simulations in emDNA and wasn't able to preserve the step parameters. I'm meeting with Dr. Robert Young and Dr. Olson later in the week.

I met with Dr. Young and Dr. Olson. Dr. Young said that one way the simulations could be done is to optimize one half circle with the gate and the other with the pinwheel and then fuse them using a base-pair list file instead of a step parameter file. This is in terms of reference frames, so the first line of the file has to be copied to the end and the frozen steps have to align with the base-bound domains for each optimization. Dr. Young also mentioned that a simpler way to go about this is to do one optimization with a step parameter file that contains the 8QDX pinwheel, followed by the 8QQS gate. I carried out a couple of simulations and got similar structures. Maybe after the REU, I'll try to fuse two half circles together. I finished my final paper. This was a great experience and I've definitely learned a lot about what I can do in the field of applied mathematics.

Presentations

First Presentation

Final Presentation

Tools

References

Clauvelin N, Olson WK, Tobias I. Characterization of the geometry and topology of DNA pictured as a discrete collection of atoms. J Chem Theory Comput. 2012 Mar 13;8(3):1092-1107. doi: 10.1021/ct200657e. PMID: 24791158; PMCID: PMC4004093.

Li, Shuxiang, et al. “Web 3DNA 2.0 for the Analysis, Visualization, and Modeling of 3D Nucleic Acid Structures.” Oxford Academic, 22 May 2019, https://academic.oup.com/nar/article/47/W1/W26/5494724. Accessed 14 July 2025.

E. Michalczyk, Z. Pakosz-Stępień, J.D. Liston, O. Gittins, M. Pabis, J.G. Heddle, and D. Ghilarov, Structural basis of chiral wrap and T-segment capture by Escherichia coli DNA gyrase, Proc. Natl. Acad. Sci. U.S.A. 121 (49) e2407398121, https://doi.org/10.1073/pnas.2407398121 (2024).

Vayssières M, Marechal N, Yun L, Lopez Duran B, Murugasamy NK, Fogg JM, Zechiedrich L, Nadal M, Lamour V. Structural basis of DNA crossover capture by Escherichia coli DNA gyrase. Science. 2024 Apr 12;384(6692):227-232. doi: 10.1126/science.adl5899. Epub 2024 Apr 11. PMID: 38603484; PMCID: PMC11108255.

Young RT, Clauvelin N, Olson WK. emDNA - A Tool for Modeling Protein-decorated DNA Loops and Minicircles at the Base-pair Step Level. J Mol Biol. 2022 Jun 15;434(11):167558. doi: 10.1016/j.jmb.2022.167558. Epub 2022 Mar 24. PMID: 35341743; PMCID: PMC9177622.

Acknowledgements

I'm grateful to my mentor, Dr. Wilma Olson, and to Ismaeel Abdulghani for his guidance. I also would like to thank Lazaros Gallos and Larry Frolov for their support throughout the summer. This project was funded by the NSF through grant CCF-2447342.