DIMACS
DIMACS REU 2015

General Information

Mountain View
Student: Hadley Black
Office: CORE 450
School: University of California, Santa Cruz
E-mail: hadleyblack92[at]gmail[dot]com
Project: Exploration of OEIS

Project Description

The On-line Encyclopedia of Integer Sequences (OEIS) was founded by Neil Sloane in 1964. It began as a modestly sized handbook comprised of interesting integer sequences from different areas of mathematics. Today it is a massive online database containing almost 260,000 sequences. Our project is to produce graph representations of OEIS. From here our aim is to study the structure of the graph and produce a tool to be used by an OEIS user to assist in navitation of the data. One of our goals is to implement a heirarchical clustering algorithm in order to get a macro-level view of the graph. This should allow us to algorithmically classify groupings of sequences by keywords which describe the area of mathematics they come from and hopefully offer insight into the nature of their connection.


Week 1:
I began this week familiarizing myself with GraphStream, a java library for visualizing dynamic graphs. I then wrote several Python scripts for mining various data from the OEIS html files. After collecting some data we were able to visualize some nice graphs induced by co-reference and coloured by keyword. On Friday we gave an initial presentation of our project to the group. We also met with a PhD. student to formalize the ER diagram for the database she will be helping us implement to store our collected OEIS data.
Week 2:
My goal for this week was to finish collecting data to populate our database. One major challenge was writing a script to parse the references section for sequences on OEIS. For each reference, the idea was to come up with a way to find substrings which comprised the title, author, and year of publication. This was difficult since reference formatting is inconsistent on OEIS, but after some work I was able to find a solution which was ~95% accurate. For now we have enough data collected for an initial implementation of our database. Our next step is to begin an implementation of a hierarchical clustering algorithm for the graphs we are producing.
Week 3:
This week I was able to produce an internal memory implementation of a clustering algorithm described by Akhremtsev et. al [1]. So far I have only implemented the contraction phase of the algorithm. Some of this week was also dedicated to finishing our data collection from OEIS. This is now complete for the time being and one of my team members is working towards populating our database. I was also able to produce a semi-external graph peeling algorithm. Unfortunately its run time is O(n^2) which is too slow for our input size, but I hope to improve on this in the future.
Week 4:
I've made some progress extending my clustering algorithm to work semi-externally. This means the algorithm can execute while the edge set of the graph is stored in a file while the vertex set and cluster information is in internal memory. I also was able to get my clustering to produce a file containing the full hierarchy tree in a recursive format which can be read by a graph hierarchy visualization tool built by one of my team members. At the end of this week I was able to implement a fast graph peeling algorithm described in [3] such that peeled subgraphs are treated as expandable clusters rooted at their anchor points at the lowest level in the hierarchy.
Week 5:
This week I implemented the refinement phase and second contraction phase of the algorithm described in [1].
Week 6:
I added some features to my clustering algorithm this week. One feature is that after a clustering is computed the algorithm identifies clusters with size above some defined threshold and runs label propagation again on the graphs induced by these clusters. Another feature I added is vertex splitting as described in [2]. If for some vertex v, deg(v) > t, where t is a defined parameter, then v is split into \lceiling deg(v) / t \rceiling vertices connected in a cycle and with the incident edges of v split among them. There is an additional "central" vertex connecting the new vertices forming a cycle. Thus high degree vertices are replaced by wheel graphs. This helps to produce a more balanced clustering.
Week 7:
This week was mainly dedicated to preperation for our final presentation. In addition to this I finalized some details of my clustering algorithm. Our next steps are to begin drafting a paper as well as finalizing the integration of the components of our OEIS navigator. On Monday 7/20 I leave for Prague to attend the Midsummer Combinatorial Workshop held at Charles University.

References

Presentations


Team Members

Additional Information