Sophie Dove

Email: sdove AT umass DOT edu

Office: CoRE Building 440

Home Institution: College of Natural Sciences & Commonwealth Honors College, University of Massachusetts Amherst

Mentor:

Wilma Olson

About Me

I'm a second-year student at UMass Amherst pursuing degrees in mathematics and computer science. I'm interested in probability theory, genetics, infectious disease, and machine learning. I have a LinkedIn. It's just my name.

Project Description

Deoxyribonucleic acid (DNA) is an essential biological molecule that contains the genetic information of organisms. It can take on various structures, one of which is circular DNA. DNA minicircles play an important role in gene expression and, as a result, are used in gene therapy, vaccine development, and research areas related to pathology and biology. Circular DNA can supercoil, a natural process in which the structure crosses over itself, and supercoiling becomes essential during transcription and in the DNA's interactions with enzymes. DNA gyrase is an enzyme belonging to a class called topoisomerases, which ease transcription by cutting DNA strands and rejoining them. DNA gyrase is mainly found in prokaryotes, where it relieves the torsional stress of positively supercoiled DNA and allows RNA polymerase to copy DNA into RNA during transcription. This project focuses on the effects of DNA gyrase on the topology of DNA.


Weekly Log

Week 1 (05/26-05/31)

I spent this week (and some time before it) reading literature provided by Dr. Olson and downloading emDNA (a tumultuous process), the software I'll be using throughout this project. I began to understand DNA gyrase's function, as well as protein binding and its effects on DNA structure. The placement of proteins depends largely on mechanical stress, but proteins can also manipulate that stress (Clauvelin & Olson). Other papers describe supercoiling and its chirality, which can involve torsional stress. This information makes sense and will probably be an essential part of my research. Besides reading, I also met with Dr. Olson. She condensed the readings for me and showed me some of the 3D structures created by emDNA (very interesting). On Thursday, I started working on my presentation and ran tests in the command line. Today (Friday), I finished the slides and started practicing for my presentation.

Week 2 (06/01-06/06)

I spent the beginning of this week revising my slides and presentation notes, after meeting with Dr. Olson to receive feedback on the notes and get photos I needed for the presentation. I gave my presentation on Tuesday and it went okay (I thought it was too quick). Afterward, I started going through the examples given in the paper discussing emDNA and its utilities (Young et al.). During this, I corresponded with Ismaeel Abdulghani, who worked on a capstone project with Dr. Olson. He explained how to use the commands and where to use them (e.g. ./emDNA-topology in the Release folder under emDNA-install). When I was looking in the emDNA-install directory, I noticed that I hadn't properly copied the repository, which led to some complications, like the protein binding simulation not progressing because the Release folder didn't exist. On Thursday, I met with Dr. Olson, who gave me some feedback on the presentation. She also introduced me to a couple of resources that will be instrumental in the next steps of my research: the Protein Data Bank and 3DNA. She also gave me advice on what to do next, which includes finishing all of the examples, trying to recreate the DNA structure found by the Olson Lab's collaborators, and analyzing the rigid body parameters. At this point (Friday), I've gotten through almost all of the examples from the JMol Biol Case Study, but I'm having a hard time with the last one, as my input step parameter file doesn't have the same sequence length as my string sequence file. I might have to start from the beginning because I believe I made a mistake at some point.

Outside of work/research this week, I attended board game nights, went to H-Mart with my roommates, and started a new TV show. I'm enjoying the work-life balance of research. 7-8 hours of research/work is a lot, but at least I don't have schoolwork.

Week 3 (06/08-06/13)

After corresponding with Ismaeel and running example "cs03.sh" (JMolBiol 2022) multiple times, I finally finished the examples Monday night/Tuesday morning. I think organizational code had to be added so that the sequence lengths matched. Additionally, Ismaeel decreased the number of linear ramping steps. I also downloaded various resources (given by Dr. Olson), like x3DNA and 3DNA-DSSR (these aren't the same thing). These will allow me to easily download .pdb (Protein Data Bank) files and obtain the rigid body parameters necessary for simulating the binding of Gyrase to DNA. Dr. Olson also gave me an Excel spreadsheet to get the starting state of a DNA minicircle. On Tuesday afternoon, I met with Dr. Olson, who explained some of the resources and showed me that the x3DNA web server doesn't get all of the step parameters. Hence, DSSR. She also gave me a paper describing equations for the Tilt and Roll (featured on the Excel sheet) and encouraged me to note any issues I had with emDNA. After that, I imported some .pdb files to x3DNA-DSSR and looked at the output files. More specifically, I looked at the local base-pair parameters to see if I could find any similarities in the three structures (I looked at Shift, Slide, Rise, Tilt, Roll, and Twist). Dr. Olson also advised me to look at Buckle, Shear, Stretch, and Propeller in each structure's output file. These parameters can indicate distortion, which is also important in understanding DNA structure when bound to Gyrase. Dr. Olson mentioned the step parameters, which I forgot to analyze, so I'll be doing that today. I might also brainstorm ways to find the binding site(s) of Gyrase.
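To make the idea of a minicircle "starting state" concrete, here is a minimal sketch of how Tilt, Roll, and Twist could be assigned to each step of an idealized planar circle. It is not the spreadsheet's or the paper's actual formula. It assumes a common first-order approximation: the twist is spread uniformly over the steps, and a constant bend of 360/N degrees per step is split into Roll and Tilt according to the accumulated twist. The function name, the 3.4 Å rise, the linking number of 21, and the phase and sign conventions are all illustrative assumptions.

```python
import math

def circle_step_parameters(n_bp, linking_number, rise=3.4):
    """Return one (tilt, roll, twist, rise) tuple per step for an idealized
    planar circle. Angles are in degrees; rise is in angstroms (assumed)."""
    twist = 360.0 * linking_number / n_bp  # uniform twist per step
    bend = 360.0 / n_bp                    # total bend per step needed to close the circle
    steps = []
    for i in range(n_bp):
        # Accumulated twist sets how the bend is shared between roll and tilt.
        # The zero of the phase and the sign of tilt are assumptions.
        phase = math.radians(twist * i)
        tilt = bend * math.sin(phase)
        roll = bend * math.cos(phase)
        steps.append((tilt, roll, twist, rise))
    return steps

# Hypothetical example: a 225 bp circle with 21 helical turns.
steps = circle_step_parameters(225, 21)
for tilt, roll, twist, rise in steps[:3]:
    print(f"{tilt:7.3f} {roll:7.3f} {twist:7.3f} {rise:5.2f}")
```

With this construction, the Tilt and Roll of each step always combine to the same bend magnitude, and the per-step twists add up to a whole number of turns, which is what lets the chain close on itself in this approximation.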

Week 4 (06/16-06/19)

On Monday, I met with Dr. Olson to discuss what I'd done with the step parameters, but found out that I hadn't manually edited the text files where the base pairs were listed. I thought the --list-pair option did this automatically for me for some reason (yikes), but discovered from Dr. Olson's reference code that duplex-outfile generates a file that can be manually edited and fed back in to be analyzed (yay DSSR). I spent Monday afternoon and most of Tuesday fixing the files that had missing base pairs. I organized all of them into a Google Sheet and created a couple of histograms for each parameter, with each structure color coded. The scatterplots are awful to look at, so to analyze the step parameters, I decided to just compare them line by line. I noticed that the steps where the structures had similar parameter values corresponded to base pairs of significance. For example, most of those base pairs were close to either the "gate"-bound or "pinwheel"-bound parts of each DNA structure ("gate" and "pinwheel" are shapes of the protein). I'm also beginning to understand the anatomy of the .par files and the purpose of the Excel sheet Dr. Olson sent me (this is so sad). I think I'll attempt to simulate protein binding with each structure, but I still don't know where it binds. On Thursday, I met with Dr. Olson and she advised me to graph the step parameters that correspond to the base pairs near the "gate" and/or "pinwheel." She also said that I could start taking the step parameters for the "gate" or "pinwheel" and see what they do to a DNA minicircle. I managed to get the "pinwheel" graphs done for 8QDX and 9GBV, but I don't see many similarities. The graph for 9GBV is a lot less noisy than the one for 8QDX. Maybe I can find a smaller range (for the base-pair residues) so that when I superimpose them, they look more similar.

Outside of research, there was a pasta cooking competition. We didn't win, but there was a lot of good food.

Week 5 (06/23-06/27)

I met with Dr. Olson and showed my revised graphs, but they needed more fixing because the lines still weren't superimposed on each other. I spent Monday afternoon finding ranges that allowed each structure's step parameters to be superimposed on each other and creating a .par file for a 225 bp minicircle. With the gate, it was easy to obtain the step parameters, as I could just download them from the x3DNA web server. On Tuesday, I simulated protein binding. I started with 50 steps and with the base-bound domain near the beginning of the sequence. I also attempted 100 steps, a different binding site, and multiple binding sites. Using a different binding site (not at the beginning of the sequence) just resulted in a minicircle. I ensured that the base-bound domains were "covered" by the number of base pairs in the step parameter file, but I still had trouble. The other "conditions" resulted in two somewhat different structures, both ovals where one side was distorted (one was more round than the other). I decided to move on to a 250 bp minicircle and try a smaller number of steps when binding to one site, but there weren't many differences. On Thursday, after corresponding with Ismaeel, I decided to redo binding at two different sites, and at sites other than the beginning of the sequence, for the 225 bp and 250 bp minicircles. Even at 50 steps, this took almost an hour and a half. After Thursday, I was able to redo most of the tests and got some more distorted shapes, especially when using three binding sites (the fix for this was freezing the correct base pairs and specifying a range that corresponded to each binding site). This didn't take as long. I met with Dr. Olson to discuss what else can be done to understand where the "pinwheel" begins and ends. She introduced me to "Snap," a feature of x3DNA. My next task is to look through the text files and use parts of each structure's sequence to find the start and end of the "pinwheel."
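The search for ranges that superimpose can also be framed as a small computation. The sketch below is an illustration, not the procedure used above: it takes a short run of one step parameter from one structure and slides it along the same parameter from another structure, keeping the starting index with the lowest root-mean-square difference. The function names and the Roll values are made up for the example, and real values would come from the DSSR or x3DNA output files.

```python
import math

def rmsd(a, b):
    """Root-mean-square difference between two equal-length lists of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def best_window_start(query, series):
    """Return (start, score) such that series[start:start + len(query)]
    is the window closest to `query` by RMSD."""
    m = len(query)
    if m == 0 or m > len(series):
        raise ValueError("query must be non-empty and no longer than series")
    starts = range(len(series) - m + 1)
    best = min(starts, key=lambda s: rmsd(query, series[s:s + m]))
    return best, rmsd(query, series[best:best + m])

# Made-up Roll values (degrees) for two hypothetical structures.
roll_a = [2.0, 8.5, -3.1, 12.0]
roll_b = [0.5, 1.0, 2.2, 8.0, -2.9, 11.5, 4.0]

start, score = best_window_start(roll_a, roll_b)
print(f"best match starts at step index {start} (RMSD {score:.2f} degrees)")
```

Matching on a single parameter keeps the example short. Summing the squared differences over several parameters (for example Tilt, Roll, and Twist together) would be a natural extension, though angles and distances would need to be weighted sensibly before being combined.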

Outside of research, I have been trying to get an introduction to stochastic processes and Brownian motion. Right now, I'm just doing review and learning about Bayes estimators. Today, I counted the ducks on my desk and found out I've collected 137 of them...

Presentations

Tools

References

Acknowledgements

I'm grateful to my mentor, Dr. Wilma Olson, and to Ismaeel Abdulghani for his guidance. I would also like to thank Lazaros Gallos and Larry Frolov for their support throughout the summer. This project was funded by the NSF through grant CCF-2447342.