DIMACS
DIMACS REU 2018

General Information

Student: Tim Stavetski
School: University of Notre Dame
E-mail: tstavets@nd.edu
Project: Nucleosome Structure and Packaging

Project Description

A single strand of DNA is around 2 meters long, so in order to fit in the nucleus of a cell, each strand must be compacted or folded. Nucleosomes are the first level of DNA compaction: around 150 base pairs (bp) of DNA wrap about 1.8 times around 8 histone proteins. The wrapped DNA together with the histone proteins is referred to as a nucleosome. The nucleosomes on a strand of DNA appear as beads on a string, with around 30-90 bp separating neighboring nucleosomes. We are interested in how these nucleosomes interact and pack together for further DNA compaction. The Protein Data Bank (PDB) is our primary source of data. It provides information on over 140,000 biological macromolecular structures; we will focus on a subset of these structures, the nucleosomes, which comprise around 150 of the entries in the PDB. These 150 nucleosomes differ in the specific proteins the DNA wraps around, and some include additional proteins. The primary piece of information we will look at for these nucleosomes is how they pack together when they are forced into a crystalline form. There is reason to believe that the way nucleosomes pack together in crystals is similar to the way they pack together in the nucleus of the cell. By applying mathematical tools such as linear algebra and group theory, we hope to understand the different ways nucleosomes pack, in order to improve DNA models, especially those of nucleosome-decorated DNA.


Weekly Log

Week 1 (May 30 - June 1, 2018)
The first week served mostly as an introduction. I met the people in the lab and learned a bit about what they are all working on. I read some introductory articles on nucleosome structure and packing in crystals, as well as parts of “Crystal Structure Analysis for Chemists and Biologists” by Glusker and “Chemical Applications of Group Theory” by Cotton. I was also introduced to the Protein Data Bank and learned how it is organized and how to use it.
Week 2 (June 4 - June 8, 2018)
On Monday every student researcher gave a short presentation on their summer research project. My presentation can be found below, if I figure out how to upload it to a server. In terms of my actual research, this week was mostly spent reading and learning, but towards the end I started to make more concrete progress. I read the paper "A systematic analysis of nucleosome core particle and nucleosome-nucleosome stacking structure" by Nikolay Korolev, Alexander P. Lyubartsev, and Lars Nordenskiöld, and studied the coordinate system proposed in the paper. This coordinate system is designed to simplify measurements of the relative position of two nucleosomes packed together. I then started trying to implement it in Mathematica, using the data in the PDB. I managed to figure out how to import PDB files into Mathematica and how to extract the coordinate data from a PDB file.

Next week I hope to start applying everything I've learned in Mathematica: generating lattices and manipulating PDB files.
Week 3 (June 11 - June 15, 2018)
This week I was able to accomplish two things in Mathematica. The first was to take the rules in a PDB file that describe how a specific structure's crystal is generated and use them to build the crystal structure in Mathematica. A picture of a lattice built out of cylinders is below. The lattice is built by starting with the 'asymmetric unit', the unit that is used to generate the rest of the lattice; in the picture below, for example, the asymmetric unit is a single cylinder. Three transformations are applied to this unit, each transformation in this case being a reflection followed by a translation. This generates three more copies of the unit, and the three transformations are applied again. After several iterations of this process, the form of the crystal structure becomes apparent.
The goal for next week is to import a PDB file, keep just the phosphorus atoms (to ease the computational load), apply the transformations, and export new PDB files. Then we will be able to compare these files to files generated in simulations and see if they are similar.
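The generation step described above can be sketched in Python as follows. This is only an illustration, not the Mathematica code used for the picture: each transformation is assumed to be a 3x3 matrix plus a translation vector, and the function names and the example operation are hypothetical, with made-up numbers rather than values from a PDB entry.

```python
def apply_op(rotation, translation, point):
    """Return rotation * point + translation for one 3D point."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def generate_copies(points, operations):
    """Apply every (rotation, translation) pair to the starting points."""
    return [[apply_op(r, t, p) for p in points] for r, t in operations]


# Hypothetical operation: flip the sign of x and y, then shift by 0.5
# along z. Illustrative only, not read from any PDB file.
example_op = (((-1, 0, 0), (0, -1, 0), (0, 0, 1)), (0.0, 0.0, 0.5))

asymmetric_unit = [(1.0, 2.0, 3.0)]
copies = generate_copies(asymmetric_unit, [example_op])
```

Feeding the generated copies back into `generate_copies` repeats the process, which is how the lattice grows over several iterations.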
[Image: lattice built out of cylinders]
Week 4 (June 18 - June 22, 2018)
Before discussing research progress, I want to talk about the seminar and field trip we had this week.

On Tuesday Charles Fefferman gave us a talk on the N-body problem. This talk was extremely interesting because of who Charles Fefferman is and how he presented the information. Charles Fefferman was a child prodigy: he graduated from the University of Maryland with a degree in physics and math at the age of 17 and earned his PhD in mathematics three years later at Princeton. He won the Fields Medal for his work in analysis when he was 29 and continues to be an eminent mathematician, working as a professor at Princeton. So to be in a seminar with such an impressively successful mathematician was quite exciting. His actual talk was on the difficulties of solving the N-body problem, which is probably a very interesting problem on its own, but I got the impression that he could've picked almost any sufficiently difficult problem to talk about without sacrificing any interest. He started by explaining the general problem and the specific constraints under which we would discuss it. Then he went through the frame of an argument, focusing solely on the intuitions and ideas behind each step rather than on the technical details. That being said, he still spoke in the language of mathematics, and that is what really elevated the talk for me. The argument drew heavily on ideas that I have studied extensively in my math courses at Notre Dame, and without understanding those concepts, his argument would have been opaque. This gave me a taste of what a broad and deep understanding of mathematics makes possible: understanding ideas and applying them to arguments that lead to new insights and solutions. I always hear my professors, and read mathematicians, describing the beauty of mathematics, but this was my first experience of seeing what makes the subject beautiful. The ideas were not contrived, not surface level, not entirely abstract, but powerful and elegant.
To add to this experience, he gave the talk without notes, PowerPoint, or any other presentation tool; all he used were his words and the occasional drawing on the whiteboard. It was a fantastic talk.

The next day, we went on a field trip to the IBM Watson Research Center in Yorktown Heights, New York. The reason this was a really cool experience is fairly straightforward. We got to see Watson, the computer that won Jeopardy! in 2011, which was very surreal. We also got to see IBM's quantum computers; I think it's crazy that these actually exist, so seeing them in person was truly remarkable.

Anyway, enough about field trips and seminars. This past week I figured out how to import PDB files into Mathematica and extract the coordinate data for the atoms. For ease of computation, I filtered out all atoms except phosphorus, reducing the number of atoms in the file from over 15,000 to around 250. Then I was able to transform the data according to the rules and export a separate PDB file for each transformation required to produce the crystal lattice.
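The phosphorus filter can be sketched in Python as follows. This is not the Mathematica code I used; it assumes the fixed-column layout of PDB ATOM records (atom name in columns 13-16, coordinates in columns 31-54, element symbol in columns 77-78). The names `ATOM_FORMAT` and `phosphorus_coords` are hypothetical, and the two sample records are made up.

```python
# Fixed-width template for an ATOM record (assumed column layout).
ATOM_FORMAT = (
    "ATOM  {:>5d} {:<4s} {:>3s} {:1s}{:>4d}    "
    "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}          {:>2s}"
)


def phosphorus_coords(pdb_lines):
    """Return (x, y, z) tuples for the phosphorus atoms in ATOM records."""
    coords = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        element = line[76:78].strip().upper()
        atom_name = line[12:16].strip()
        # Fall back to the atom name if the element column is blank.
        if element == "P" or (not element and atom_name == "P"):
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    return coords


# Made-up records: one phosphorus atom and one oxygen atom.
sample = [
    ATOM_FORMAT.format(1, " P", "DA", "I", 1, 10.0, 20.0, 30.0, 1.0, 50.0, "P"),
    ATOM_FORMAT.format(2, " OP1", "DA", "I", 1, 11.0, 21.0, 31.0, 1.0, 50.0, "O"),
]
kept = phosphorus_coords(sample)
```

For a real file, the lines would come from something like `open(path).read().splitlines()`; the same format template can be reused to write the transformed coordinates back out.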

Next week I hope to do the same thing for molecular structures with non-orthogonal unit cells. I don't anticipate this being too difficult.
Week 5 (June 25 - June 29, 2018)
It turned out to be difficult, but for mildly frustrating reasons: a lot of details need to line up for everything to make sense. A PDB file lists the coordinates of every atom. These coordinates are written in an orthonormal basis, but as far as I could tell this basis isn't related to the structure at all. The unit cell of a molecular structure is defined by three vectors, and when the unit cell is orthogonal these vectors line up with the x, y, and z axes. What I had trouble figuring out was how these three vectors are oriented relative to the orthonormal basis vectors when the unit cell is not orthogonal. Furthermore, the transformations in the PDB file are defined in terms of the x, y, and z coordinates of a generic atom. If the unit cell is not orthogonal, then there are two sets of coordinates for any specific atom: the ones in the orthonormal basis and the ones in the unit cell basis. I was unsure which basis the transformations were defined in. Finally, for every non-orthogonal molecular structure I found, no transformations were given. These three issues stymied me for much of the week, and I finally resolved them on Friday. It turns out that the first unit cell basis vector lies along the x-axis, and the second lies in the xy-plane, at a given angle from the x-axis. The third is then measured from the xy-plane and points toward the positive z-axis.

Next week I hope to write this in Python and push forward with the ideas in general.
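The orientation convention described above can be written out as a short Python sketch. It assumes the cell is given by edge lengths a, b, c and angles alpha (between the second and third vectors), beta (between the first and third), and gamma (between the first and second), in degrees. The explicit formula for the third vector is the standard one for this convention rather than something spelled out in this log, and the function names are hypothetical.

```python
import math


def cell_vectors(a, b, c, alpha, beta, gamma):
    """Cartesian unit cell basis vectors; angles are in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    va = (a, 0.0, 0.0)            # first vector along the x-axis
    vb = (b * cg, b * sg, 0.0)    # second vector in the xy-plane
    cx = c * cb
    cy = c * (ca - cb * cg) / sg
    cz = math.sqrt(c * c - cx * cx - cy * cy)  # positive z component
    return va, vb, (cx, cy, cz)


def fractional_to_cartesian(frac, vectors):
    """Convert unit cell (fractional) coordinates to orthonormal ones."""
    return tuple(
        sum(frac[k] * vectors[k][i] for k in range(3)) for i in range(3)
    )


# Made-up cell parameters, for illustration only.
va, vb, vc = cell_vectors(10.0, 12.0, 15.0, 80.0, 95.0, 110.0)
corner = fractional_to_cartesian((1.0, 1.0, 1.0), (va, vb, vc))
```

When all three angles are 90 degrees, the vectors reduce (up to rounding) to multiples of the x, y, and z axes, which matches the orthogonal case.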
Week 6 (July 2 - July 6, 2018)
This week I made a lot of progress in Python. I was not too familiar with the Python programming language, so I first spent some time learning how to write and use it. Then I started writing a script that takes an xyz file and a PDB file as command-line arguments and outputs corresponding files for the crystal lattice. It took a bit of work to get it running correctly, and a couple of syntax mistakes kept tripping me up for a while, but eventually I had the program doing what I wanted. Because of my earlier work in Mathematica, I was able to code the various cases fairly quickly. I also started thinking about my final presentation, which we have to present next week (next week, however, is NOT the final week).

Next week I hope to improve on the script and prepare my presentation.
Week 7 (July 9 - July 13, 2018)
This was an exciting week. I took a sizeable step forward in part of my goal, while also getting my final presentation ready. The step forward was an improvement in visualization. PyMOL is software that can be used to visualize molecular structures, such as nucleosomes. Since it supports the PDB file format, I am able to import the files I create into PyMOL, creating a visualization of the crystal lattice. The issue is that a lot of nucleosomes make up the lattice, so it can be hard to look at and interpret. To help with this, I created a partial lattice by translating the unit cell along each unit cell basis vector. This partial lattice is much easier to engage with visually, as it has three distinct parts that come together to create a whole. This development had the added benefit of improving my final presentation.
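The partial-lattice idea can be sketched in Python as follows. This is a simplified illustration rather than my actual script: it assumes the unit cell contents are a list of (x, y, z) tuples and the three basis vectors are already known, and the function names and numbers are hypothetical.

```python
def translate(points, vector, n=1):
    """Shift every point by n times the given vector."""
    return [tuple(p[i] + n * vector[i] for i in range(3)) for p in points]


def partial_lattices(points, basis_vectors, copies=3):
    """For each basis vector, build a row of translated copies of points."""
    return [
        [translate(points, v, n) for n in range(copies)]
        for v in basis_vectors
    ]


# Made-up unit cell contents and an orthogonal cell, for illustration.
cell_points = [(1.0, 1.0, 1.0)]
basis = ((10.0, 0.0, 0.0), (0.0, 20.0, 0.0), (0.0, 0.0, 30.0))
rows = partial_lattices(cell_points, basis, copies=2)
```

Each entry of `rows` corresponds to one of the three partial lattices, and each copy in it could be written out as its own PDB file for viewing.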

My final presentation was successful. I had difficulty balancing my explanations of background information (which is always tricky because the background is chemistry/biology that I am not intimately familiar with) against explanations of what I've been doing. The obvious issue is that if I don't explain the background, it is harder to communicate what I have been doing and why I've been doing it. Were I more familiar with the background chemistry/biology, I would've had an easier time knowing which ideas to talk about and which ones to leave out. My mentor, Prof. Olson, helped me figure out this balance and gave me advice on how the presentation should be structured. I decided to focus on the visual aspects of what I have been doing, and as a result my presentation was shorter than the majority of the others, but it was still effective. It should be linked below.

Next week I hope to pretty much finish the Python script and begin to think about the final paper.

To conclude this week, I have included some of the visuals I prepared using my Python script and PyMOL. One shows the full lattice, and three show partial nucleosome lattices (along the unit cell vectors).
[Images: full nucleosome lattice and three partial lattices along the unit cell vectors]
Week 8 (July 16 - July 20, 2018)
In progress.

Presentations


Additional Information