Name: | Zoe Wefers |
---|---|
Email: | zoe.wefers@mail.mcgill.ca |
Office: | Virtual |
Home Institution: | McGill University |
Project: | Genome folding and function: from DNA base pairs to nucleosome arrays and chromosome |
Dr. Olson's lab has created a program that can be used to model DNA folding and see the effects of end conditions and protein binding on DNA folding. Currently the code minimizes the elastic energy of the DNA with respect to dimeric base-pair steps. My project will be to adapt the code to do the optimization with respect to tetrameric base-pair steps.
It was a great first week! I met with my mentor Dr. Olson and Robert, one of the graduate students in the lab. They recommended lots of papers and gave me an overview of the biology topics relevant to my project. I also downloaded some of the software I will need for the project, such as PyMOL and 3DNA-DSSR, and I played around with the PDB database. On Friday I got access to the emDNA software that I will be working on. It took me a couple of tries, but I got it up and running. I am excited to dive into the code next week!
It was a very productive week! I spent Tuesday running some bash script tests on the emDNA program to make sure it was working properly. I found some inconsistencies when comparing my output to the expected output. The important values like twist, writhe and final/initial elastic energy all matched up, but the number of iterations of the algorithm was consistently lower in my output. After discussing it with Robert, I think it might have to do with float rounding. Wednesday and Thursday were mostly spent diving into the code and trying to get a solid understanding of how all the modules come together. I have never worked with such a large program or in C++, so it was a bit daunting at first. But by the end of Thursday I felt like I had a good understanding of which classes are relevant to my project. Friday morning I mapped out some possible approaches to add the tetrameric functionality to the program. I really tried to think of ways that change as few of the modules as possible and that avoid requiring a separate toggle in the command line. After presenting my ideas and progress to the lab on Friday, I picked an approach. I look forward to starting to code next week!
This was my first week editing the program. It was a little rocky at the beginning because I have never coded in C++ before, but I think by the end of the week I got much more familiar with the language. For the first three days I focused on adding the classes and functions I would need to support tetrameric functionality. I added a Tetramer class and expanded the step parameter and force constant databases so that they could hold more information. On Thursday Dr. Olson provided me with the tetrameric data, so I spent Thursday and Friday writing a small Python script to compute the force constant matrices and format the data so that I could just copy and paste it into the program. I spent some time on Saturday pasting all of the data into the program and running tests to make sure it was working. It did! Dr. Olson clarified this week that we need to be able both to access the tetrameric model internally from the program and to input an external file of tetrameric data. I believe I have the internal component done, so next week I will focus on modifying the force field packager modules so that the user can also import external files.
For the first half of the week I worked on getting the force field packager working so that users can input external tetrameric files. After some testing, the program seemed to be running smoothly. I did not expect to finish editing the program so quickly, so I am quite happy with the progress I have made so far. I spent the second half of the week trying to get 3DNA running on my computer so that I can visualize the conformations that the new emDNA program generates. This ended up giving me lots of trouble, so for now I will just use the online version. The next step is to build some DNA minicircles, run them through emDNA in both the tetrameric and dimeric models, and see how they compare!
I spent this week writing and fixing up Python scripts that generate step parameters for DNA minicircles. I now have 168 poly-A minicircles of different lengths and sequences that I am going to optimize in both the dimeric and tetrameric models, and then I will compare their shapes and final elastic energies. I still haven't got 3DNA up and running and cannot troubleshoot because the 3DNA forum website is down. So for now I am using the web tool, which is a bit tedious but gets the job done. I am hoping that next week I will finish processing all the minicircles and can start making graphs and diagrams to visualize the results! Additionally, at lab meeting, Dr. Olson provided me with a paper about how tetrameric intrinsic parameters differ from the dimeric intrinsic parameters in B-DNA. So I will take a look at that and see if the parameters I calculated follow the same patterns.
This week I ran the optimization on the poly-A minicircles of lengths 80-102 base pairs with a different base pair at the 40th index and with a linking number of 8. I did the optimization with the Cohen2017 tetrameric model and with the Olson1998 dimeric model. Then I made graphs to visualize the energy change and writhe for each minicircle. The minicircles that I optimized with the Cohen2017 model showed different periodic behavior in writhe compared to the ones optimized with the Olson1998 model. This is because the AAAA tetramer in the Cohen2017 model has an intrinsic twist of about 36.5 degrees, while the AA dimer in the Olson1998 model has an intrinsic twist of about 35.1 degrees. Seeing these differences was exciting because it is an indication that the tetrameric model does indeed have an effect on the optimization. The fact that the differences are explainable by the intrinsic step parameters of the models is also a good sign that the tetrameric adaptation of the software is functioning correctly. At lab meeting Dr. Olson suggested I rerun the experiment with minicircles of an odd linking number, which should show similar periodic behavior between energy and length but shifted by about 10 base pairs.
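As a rough back-of-the-envelope check on the reasoning above, the two intrinsic twist values can be turned into helical repeats and into the chain lengths at which a circle of a given linking number would be torsionally relaxed. This is only a sketch under simplifying assumptions that are not part of emDNA: every step is treated as having the same intrinsic twist (the single substituted base pair is ignored), the circle is assumed planar, and the function names are made up for illustration.

```python
# Intrinsic twist values (degrees per base-pair step) quoted in the entry above.
TWIST_TETRAMER_AAAA = 36.5  # AAAA in the Cohen2017 tetrameric model
TWIST_DIMER_AA = 35.1       # AA in the Olson1998 dimeric model


def helical_repeat(twist_deg):
    """Base pairs per full helical turn, assuming a uniform twist (hypothetical helper)."""
    return 360.0 / twist_deg


def relaxed_length(linking_number, twist_deg):
    """Chain length (bp) whose total intrinsic twist equals `linking_number` turns.

    Assumes a planar circle with uniform twist, so this is an estimate only.
    """
    return linking_number * helical_repeat(twist_deg)


for name, twist in [("tetramer AAAA", TWIST_TETRAMER_AAAA), ("dimer AA", TWIST_DIMER_AA)]:
    print(
        f"{name}: repeat = {helical_repeat(twist):.2f} bp, "
        f"relaxed length for Lk 8 = {relaxed_length(8, twist):.1f} bp, "
        f"for Lk 9 = {relaxed_length(9, twist):.1f} bp"
    )
```

Under these assumptions the higher twist gives a shorter helical repeat, and raising the linking number by one moves the relaxed length by exactly one helical repeat, which is in line with the shift of about 10 base pairs expected for the odd linking number.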
After meeting with Rob early this week I put together a list of goals to finish in the next 2 weeks before moving on to putting together my final report and presentation. The first thing I did was add a check in the force field packager to make sure that the external file with the parameters doesn't contain any duplicate tetramers. At this point in the project, I started thinking more about the user experience of the program that I edited, because not all of the biologists and chemists who use the program have coding experience. While it is impossible to use the tetrameric functionality without providing 400-line files with the data for each tetramer, keeping track of 400 lines is quite difficult, so I have been thinking of possible additional programs or Python scripts I could provide to the user to help them generate these external files. The next thing I did was use recent data to compute the new parameters for a dimer model, called Cohen2017, and I added this model to the internal list of sequence-dependent models in the program. Then I redid the poly-A minicircle experiment with minicircles of lengths 80-102 base pairs and linking numbers 8 and 9. I optimized these minicircles using the Olson1998 model, the Cohen2017 dimeric model and the Cohen2017 tetrameric model. When graphing the final energies and writhes of the optimized minicircles I observed the expected periodic behavior in both the circles with linking number 8 and those with linking number 9. On Friday I met with Nicolas Clauvelin, the author of the emDNA program. I presented an overview of the changes I made to the code and received his input. Sometime in the next 2 weeks, we will work together to merge the tetrameric branch that I have been working on with the main branch on GitHub.
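The duplicate-tetramer check described in this entry lives in the C++ force field packager, but the same idea could also be offered to users as a small standalone Python helper for validating a file before handing it to the program. The sketch below is not the emDNA implementation. The file layout is an assumption made purely for illustration (one tetramer per line, with the tetramer name as the first whitespace-separated token and `#` starting a comment), and the function name and sample values are hypothetical.

```python
from collections import Counter


def find_duplicate_tetramers(lines):
    """Return the sorted tetramer names that occur on more than one data line.

    Assumed (hypothetical) format: the first whitespace-separated token of each
    non-empty, non-comment line is the tetramer name, e.g. "AAAA".
    """
    counts = Counter()
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        name = stripped.split()[0].upper()
        counts[name] += 1
    return sorted(name for name, n in counts.items() if n > 1)


# Made-up sample content; the numbers are placeholders, not real parameters.
example_lines = [
    "# tetramer  placeholder_value",
    "AAAA 0.0",
    "AAAT 0.0",
    "aaaa 0.0",
]
print(find_duplicate_tetramers(example_lines))  # → ['AAAA']
```

Names are upper-cased before counting so that `aaaa` and `AAAA` are treated as the same tetramer; whether the real file format is case-sensitive is not something this sketch knows.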
This week I started wondering if the different sequence-dependent models were causing not only a shift in periodic behavior but also a decrease in period length. So I decided to also optimize DNA minicircles of lengths 103 to 150 base pairs. The optimization took more than 24 hours, but the plots I got out of it looked great! And my suspicion was confirmed: the tetramer model had a slightly shorter period than the dimer models. Towards the end of this week I started cleaning up my plots and planning out my final report.
It was a great final week! I finished up my final report and gave my final presentation on Thursday. I cannot thank the DIMACS program enough for the knowledge, skills, and connections I gained from this REU experience!
This work was carried out while the author, Zoe Wefers, was a participant in the DIMACS REU program at Rutgers University, supported by NSF grant CCF-1852215.