DIMACS
DIMACS REU 2015

General Information

Student: Daniel Mawhirter
Office: CORE 450
School: Colorado School of Mines
E-mail: dmawhirt[at]mines[dot]edu
Project: Enhancement of OEIS

Project Description

The goal is to allow users of OEIS to explore the collection of sequences visually. We are creating visual tools to display a conceptual map of the data which is easily navigable, has semantic meaning, and remains visually appealing when viewing large amounts of data. These tools should be useful beyond OEIS once complete.


Weekly Log

Week 1:
I set up a Python script to cache all 258,545 HTML pages from the OEIS site, running it on 4 computers simultaneously to speed up the process and get it to finish overnight. This raised a problem, however: having that many files in one directory slows the OS to a crawl when trying to use the files inside. I wrote a second Python script to reorganize the files into folders of 1000 files each, and Windows handles the 260 folders in the top-level directory much better. The next step was to parse useful data out of the HTML, beginning with cross-references. I wrote a Python script for this as well, pulling the cross-reference links out of each OEIS page. I also handled table size and list of contributors.
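A minimal sketch of the reorganization step, not the original script: it assumes the cached pages are named after their A-numbers (for example A000045.html) and that each bucket folder is named after the A-number divided by 1000. The names bucket_name and reorganize are hypothetical.

```python
import re
import shutil
from pathlib import Path

BUCKET_SIZE = 1000                      # files per folder, as in the log
A_NUMBER = re.compile(r"A(\d{6,})")     # assumed file naming: A000045.html


def bucket_name(filename, bucket_size=BUCKET_SIZE):
    """Return the folder name for a cached page such as 'A000045.html'."""
    match = A_NUMBER.search(filename)
    if match is None:
        raise ValueError(f"no A-number found in {filename!r}")
    return f"{int(match.group(1)) // bucket_size:03d}"


def reorganize(src_dir, dst_dir, bucket_size=BUCKET_SIZE):
    """Move every file in src_dir into dst_dir/<bucket>/ and count the moves."""
    src, dst = Path(src_dir), Path(dst_dir)
    moved = 0
    # Snapshot the listing first so moving files does not disturb iteration.
    for path in list(src.iterdir()):
        if not path.is_file():
            continue
        target = dst / bucket_name(path.name, bucket_size)
        target.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target / path.name))
        moved += 1
    return moved
```

Deriving the folder from the A-number means a page can later be located directly from its identifier, without searching the folders.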
Week 2:
I refined the parsing of crossrefs to avoid picking up adjacent pages and, optionally, sequence in context. This, along with Hadley's work with references, allowed us to finish up data collection this week. I spent most of the week, however, building the view of a graph in which vertices are words from the page. This relied heavily on GraphStream, and was done in Java and a small amount of CSS. I began work on the hierarchy-based dynamic visualization, also in GraphStream.
Week 3:
This week was primarily focused on producing the dynamic layout in GraphStream, such that the view at any given time is an antichain and the edges induced by its leaves. This requires a file format and lots of work on a tree structure which displays beside, and offers control over, the graph view. Also, we finally have programmatic access to the MySQL database, which is centrally hosted by DIMACS. It requires an SSH pipe to execute commands over the network, which took lots of troubleshooting to discover, but it now works. Populating it is in progress.
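The antichain view can be illustrated with a small sketch. This is one reading of the idea, not the GraphStream code: it assumes the hierarchy is a tree stored as a dict from node to children, that the visible nodes form an antichain (no visible node is an ancestor of another), and that two visible nodes are joined when any leaf below one is linked to any leaf below the other. All function names are hypothetical.

```python
def leaves_under(tree, node):
    """Return the set of leaves below node (a leaf yields just itself)."""
    children = tree.get(node, [])
    if not children:
        return {node}
    found = set()
    for child in children:
        found |= leaves_under(tree, child)
    return found


def is_antichain(tree, nodes):
    """True if no node in nodes is a descendant of another node in nodes."""
    nodes = set(nodes)
    for node in nodes:
        stack = list(tree.get(node, []))
        while stack:
            current = stack.pop()
            if current in nodes:
                return False
            stack.extend(tree.get(current, []))
    return True


def induced_edges(tree, view, leaf_edges):
    """Lift edges between leaves to edges between the visible nodes."""
    owner = {}
    for node in view:
        for leaf in leaves_under(tree, node):
            owner[leaf] = node
    edges = set()
    for a, b in leaf_edges:
        u, v = owner.get(a), owner.get(b)
        # Skip leaves outside the view and edges collapsed into one node.
        if u is not None and v is not None and u != v:
            edges.add(frozenset((u, v)))
    return edges


def expand(tree, view, node):
    """Replace a visible node by its children; the result is still an antichain."""
    children = tree.get(node, [])
    if node not in view or not children:
        return set(view)
    return (set(view) - {node}) | set(children)
```

Expanding or collapsing a node in the tree control then only swaps a node for its children (or back), and the edge set is recomputed for the new view.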
Week 4:
Populating the database ate Monday-Wednesday. Thursday I began work on the REST service and JS frontend for migrating the project to a web application. Friday I worked on the web app more and worked on generalizing the hierarchy viewer to support selecting views from all our work to date.
Week 5:
The web application handles nodes in JavaScript and queries a Java web service to modify the edge set as needed. It is a performance-intensive application, and since I have no prior web experience, it's a lot of work. Busy week.
Week 6:
Users can now enter sequences and the browser will display a path view between them that is descriptive of the relationship between the chosen sequences. This is handled by another Java web service and a modified view to accommodate the different visual requirements. The view can be zoomed, panned, clicked, and nodes in view can be moved to stick in particular locations. Node and font sizes can be increased and decreased by the user at will. All of these features have also been propagated back to the hierarchy viewer.
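The log does not say how the path between two sequences is computed. As an illustration only, the sketch below assumes the cross-references form an undirected graph stored as an adjacency dict and uses breadth-first search to find a shortest path; the actual web service is written in Java and may work differently. The name shortest_path is hypothetical.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return a shortest list of nodes from start to goal, or None if unreachable."""
    if start == goal:
        return [start]
    parent = {start: None}          # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in parent:
                continue
            parent[neighbor] = node
            if neighbor == goal:
                # Walk the parent links back to start, then reverse.
                path = [goal]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None
```

The nodes on the returned path, together with the edges between consecutive nodes, would be what a path view draws.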
Week 7:
I made lots of smaller changes affecting the user experience, mostly preparing nice views for the presentation, and then gave the final presentation itself.
Week 8:
We began writing a short paper for submission to an IEEE workshop first, and intend to submit a larger paper to a full conference in a few months. There are still a few hundred things to accomplish to perfect the online viewer, and those are in progress as well.
Week 9:
I wrote so much code this week: improved handling of peeling, performance improvements, and modularity and generalization improvements. Besides that, it's just writing a paper.

Presentations


Additional Information