General Information • Project Description • Project Log • Links
Gato (http://gato.sf.net), an open source software system written in Python, animates graph algorithms to show their dynamic behavior. A wide range of bioinformatics algorithms - such as aligning two sequences to assess their similarity, assembling complete genomes from fragments produced by sequencing, answering questions about the origin of species by constructing phylogenetic trees - can either be phrased as graph algorithms or can operate on graphs. This project involves creating animations for such bioinformatics algorithms. Week 1: Arrived at Rutgers after a chaotic trip from Ohio. To become familiar with the software, I implemented a topological sort algorithm. Week 2: I started working on an animation for the construction of perfect binary phylogenetic trees. This one is trickier to animate, since it requires operations on a matrix as well as a graph. Week 3: I finished work on the perfect binary phylogenetic tree algorithm. Might put up some screenshots soon. Week 4: I tinkered around with the animation for the perfect binary phylogenetic tree algorithm. Now it highlights edges and the algorithm looks much simpler. Week 5: I started working on an animation for the ultrametric tree problem, which is a specific type of distance-matrix based phylogeny creation algorithm. Unfortunately, it turns out that many of the descriptions of the algorithm were a bit vague about the details necessary to implement them, which was quite frustrating. Week 6: I found a clear algorithm for the ultrametric tree problem and implemented it. I also took some time to make the animation "pretty". Edges shouldn't cross, left children should be to the left of their parents and lower on the canvas, etc. Fortunately the properties of ultrametric trees gave me clues about the number of leaf nodes, height of the tree, etc. so I could make a pretty drawing. Week 7: I looked into implementing an algorithm for figuring out genome rearrangements based on breakpoint graphs. Basically you've got something like a tumor cell DNA sequence and a normal cell DNA sequence, and the tumor one has been chopped up into bits and rearranged. We can model this behavior with breakpoint graph operations. A breakpoint graph operation is one where you cut two edges out of the graph and then reconnect the 4 vertices that were affected in a different way. There are some theorems about the minimum number of breakpoint operations required to make the normal graph look like the tumor graph. We're looking into how this works, and whether we can code an algorithm where the input is a genome rearrangement and the output is a possible list of operations that explain how the normal genome turned into that rearrangement. Week 8: We gave up on animating the genome rearrangement algorithm. Although the idea is simple enough, the algorithms we found were really long and complicated. (The main paper on this problem involves things called superhurdles and fortresses. It's scary.) What I ended up coding was a little program where you could select two edges and perform a breakpoint operation on them. It reminds me of a group game called "The Giant Knot" where you all cross arms in a circle and grab someone else's hand at random. Then the whole group is knotted up, so you have to step over people's hands and under arms to "untie the knot." At the end, you're only holding hands with the two people next to you in the circle. Interesting problem, but since this is the last week of the program, I focused on starting to write my very first academic paper with my mentor's help. We hope to be done with it by the end of August, so I'll be finishing this project long-distance from Rochester, NY. Hooray for a productive summer! |