DIMACS
DIMACS REU 2019

General Information

me
Student: Austin Allen
Office: CoRE 444 (Morning), EE 208 (Afternoon)
School: Carnegie Mellon University
E-mail: apallen at andrew dot cmu dot edu
Project: Coding for the Service Capacity Problem

Project Description

Description coming soon...


Weekly Log

Week 1:

Tuesday (05/28/19): Today served as move-in day for all participants. We were then treated to a pizza party and some social interaction.

Wednesday (05/29/19): Today was the program orientation! We were introduced to everyone in the program, and what we should expect from the program. I later met with Dr. Soljanin's grad students Amir, Mehmet, and Pei to discuss potential projects I could work on over this summer. I was first introduced to the Service Capacity Problem by Mehmet and Amir. I became really interested in the project because it was a problem that I saw presented by Dr. Soljanin at JMM 2019 in Baltimore, which can be found here. I Also like the problem because it involves my interest in combinatorics. I was then introduced to the Covert Communication problem by Pei. I will be meeting with them in the Electrical Engineering building on Tuesdays and Thursdays from 2-5 PM. The rest of the day I continued to read up on both of the problems in order to get ready for my discussion with Dr. Soljanin.

Thursday (05/30/19): I continued reading up on both problems as I was still uncertain which of the two I liked more. I decided that I liked the Service Capacity Problem more, but waited until my initial meeting with Dr. Soljanin. I then met with Dr. Soljanin and we discussed more about the problem I should work on. We decided that it was best that I work on a variant of the Service Capacity Problem. Currently there may be uncertain connections to fundamental problems in Coding Theory that solve this variant, but there are some approaches that I could focus in on. More to come on my progress. We then had a website workshop taught by Parker which was rather useful since I had no prior experience with creating and maintaining a website.

Friday (05/31/19): I continued to work my way through the papers, which will certainly take me most of the weekend to work on. In the early afternoon we were treated to a talk by Neil Sloane on the Online Encyclopedia of Integer Sequences. Everyone really enjoyed the talk, and Neil was very engaging with the audience. I began working on my presentation for Monday.

Saturday (06/01/19): I finished work on my presentation and sent my slides to Dr. Soljanin for review. We discussed a little more what she would like to see me work on, and I plan on working on these tomorrow.

Sunday (06/02/19): I worked on a few elementary Coding Theory problems related to parity-check matrices and the minimum distance of a code. These were just a few exercises to warm me up to thinking about coding problems.

Week 2:

Monday (06/03/19): Today each of us gave a brief presentation on our topic of research. Because of the nature of our projects, it's rather difficult for the audience to grasp the overall meaning of the project in just 4 minutes. Overall presentations went well, and I became interested in many of the problems being worked on by my peers. In the afternoon I began looking into batch and private information retrieval codes as they could be important pieces of the puzzle to solve the converse service capacity problem. I prepared a few notes on the topic so that I could present to Amir, Mehmet, and Dr. Soljanin at tomorrow's lab.

Tuesday (06/04/19): I continued looking at different properties of batch and PIR codes to better aid the research process. Upon analyzing various properties of atch codes, I noticed that the [3,2,2]-simplex code, [7,3,4]-simplex code, and [15,4,8]-simplex code are all batch codes. This lead me to conjecture that perhaps simplex codes are indeed a sub-class of batch codes. Although it seems rather obvious, there is no proof that can be found in the literature. I presented all of my findings to Amir, Mehmet, and Dr. Soljanin. This was beneficial because they really tested my knowledge of what I have learned at discovered. Once the meeting ended, Dr. Soljanin wanted me to solve a simple service capacity problem, read more on PIR, and wanted me to prove that simplex codes are batch codes. There are two suggested proof techniques for showing the simplex codes are batch codes: using graph techniques and using convex lattices. More to come on this problem.

Wednesday (06/05/19): I split my time between reading up on PIR and working on the simplex code question. It turns out that the simplex code question is rather non-trivial, but in the case for a multiset of size t where only one distinct systematic symbol is requested, both the graph and lattice question is easy to solve. I discussed some of these observations with Amir and Mehmet in the afternoon. I then looked into the part of my problem that discusses "efficiency." In addition to supplying some coding scheme for a distributed storage system given the region of file requests, we must potentially apply the "most effcient" coding scheme. There happens to be a paper written about the very topic of optimal batch codes, which can be found here. Since the current suggestion to solve the converse service capacity problem is to look at batch codes, it would be great if we could not only show that batch codes could be applied, but that they are also optimal. I began reading a little more into this aspect, and I believe I can show that simplex codes are optimal. More on this to come.

Thursday (06/06/19): I worked more on the proof at hand and looking at PIR. I came up with a false proof that simplex codes are batch codes. The graph construction I utilized was not entirely correct, and this became more clear discussing the proof with Dr. Soljanin in the lab. At lab I was treated to a discussion on k-anonymity and then to Mehemet's thesis proposal on distributed storage. I discussed the notion of optimal batch codes with Dr. Soljanin and she would like more me to focus on this aspect as well.

Friday (06/07/19): Today I took time to read some of the softer literature on how codes are implemented in storage systems. I also discussed more on what I should expect from my project as well as possible directions to take with Amir and Mehmet. Next week should give me a much clearer picture on my progress and what direction I should be able to take with my project.

Week 3:

Monday (06/10/19): I continued working on the proof that simplex codes are batch codes. I had been juggling three different methods: graph construction, optimality, and polytope. I continued working with the graph construction, but it just wasn't going to work. Rather than having a complete bipartite graph as we had earlier, we had some r-regular graph that was semi-bipartite, i.e. one of the bipartitions actually had self-loops. I finally put this down and took some time to look back at the idea of optimal batch codes. Although there is nothing that is explicitly stated about simplex codes, some of the lemmas in the paper dealt with the duals of codes, so I thought maybe looking at what the authors discussed about Hamming codes might give me some inspiration. I soon attempted to re-create their proof that Hamming codes are optimal batch codes for simplex codes. This also turned out to be somewhat of a dead-end, but ultimately I have this paper in the back of my mind because if we show that simplex codes are linear batch codes, then it should be trivial to show that simplex codes are "optimal" since simplex codes are (2,t-1) linear codes with locality 2 and availability t-1. More information on locally repairable codes can be found here. Since another method that was suggested was a polytope method, I attempted reading up on convex polytopes and lattice polytopes but became a bit discouraged. This week should be fun however as it turns out that the problem is becoming more clear each passing day.

Tuesday (06/11/19): I began thinking of some algorithmic approach to the graph construction method. This problem is interesting because of how obvious it seems, however there is no proof that has been published in the literature to date. The problem with the algorithmic proof was proving a valid termination condition, which made the approach somewhat intractable/not rigorous. In the afternoon we were treated to a seminar talk by Maria Chudnovsky of Princeton University on induced subgraphs and coloring. Many students seemed to enjoy the talk, and it has been my favorite to date. I then met with Dr. Soljanin, Amir, and Mehmet after the seminar. We decided to abandon that particular graph construction approach and began focusing on the convex lattice polytope method. Part of my research this summer is to find equivalences between common problems in distributed stoarage/coding theory and the service capacity problem. One problem I began looking at was coding for network switches, i.e. looking at switch codes which are a subset of primitive multiset batch codes. I just barely scratched the surface of this paper and this paper. The rest of the day was dedicated to reading up on the basics of linear programming, integer programming, and convex optimization since any of these methods could potentially be applied to the convex lattice polytope method of proving that simplex codes are batch codes.

Wednesday (06/12/19): Today was dedicated to playing around with the polytope method. I wasn't too familiar with the definitions of convex polytopes and convex lattice polytopes, so I began by reading parts of a textbook by Grunbaum on the subject. The service capacity problem can be realized both in the continuous sense and the discrete sense. My focus is on the discrete sense, which means that not only do I want to show that simplex codes form a convex lattice polytope, but that the integer solutions can be "acheived." Recall that in general we want to show that any code we can apply to the servers are actually just batch codes. This means that some codes can't be represented by batch codes, at least through the polytope method. For example, since two of the extreme points for a [4,2] Reed-Solomon code are not integers, you can't convert the code into a convex lattice polytope. I was also able to confirm that some simple simplex codes do indeed form a convex polytope. In the afternoon we had our first Bridge Workshop on the Hadamard Maximum Determinant problem by Daniel Scheinerman.

Thursday (06/13/19): I began looking at a proof that simplex codes are batch codes via the polytope method. There are a few intermediate steps that one can take to show that simplex codes form a convex lattice polytope, some of which I have completed but won't yet divulge. I simply worked on these aspects for most of the day. In the afternoon I presented the service capacity problem to one of Dr. Soljanin's students.

Friday (06/14/19) Dr. Soljanin recommended that I look at a method from combinatorial optimization that could be an alternative to the polytope method, a method I call the matching method. One of the graph constructions from the first switch code paper can potentially lead us to more information on whether simplex codes are batch codes. This can potentially be derived from the proof of achievability of the simplex service rate region from this paper. The idea is to translate a fractional matching into a integer matching. This is what I will dedicate most of my time to next week.

Week 4:

Monday (06/17/19): Most of the day was spent visiting Nokia Bell Labs in Murray Hill, New Jersey. Everyone seemed to enjoy the trip, but most agreed that it potentially hindered productivity the rest of the day. When we arrived back to campus, I spent some time looking at the optimization statement of a fractional matching and covering. It turns out that there is a rather interesting connection between fractional matching and the achievability aspect of service capacity. Since covering is the dual of the matching problem, there may also be deeper connections with the covering problem and service capacity as well.

Tuesday (06/18/19): In the morning we had our second Bridge Workshop on the Three Permutations problem by Cole Franks. Most of the students seemed to enjoy the talk, and most of us got more involved in the collaboration questions as compared to the previous talk. In the afternoon I spent some more time reading up on the fractional problems mentioned earlier. I then explained fractional matching and covering to Amir, Mehmet, and Dr. Soljanin. We decided that it would be best to look at fractional matching and covering for some of the smaller examples we have worked with such as the [4,2] Reed-Solomon code and the simple mixed coding and repetition scheme. Amir and I discussed some of the constraints for these problems, and it was decided that I would first attempt to solve the linear program first using any method I wished to use, and then use programming.

Wednesday (06/19/19): I began working on the linear programming problem, and the answer seemed fairly obvious, which it was, however I wanted to focus on another aspect of the problem. Rather than finding the max fractional matching or min fractional covering of the entire graph, I wanted to look at certain regions of service capacity to determine of the max values on the boundary can be achieved. In essence this is one of the most central problems with dealing with service capacity. For this I used PuLP, a linear programming package implemented in Python. I made some interesting observations for the max matching problem for the [4,2] Reed-Solomon coded scheme, and I will continue to work with the fractional matching an covering for this scheme and the mixed coding and repetition scheme.

Thursday (06/20/19): I continued working with PuLP to deal with the fractional matching and covering problems. I also confirmed an interesting observation for the matching problem of the mixed coding and repetition scheme. Most of the day I dealt with trying to make the data more visual, but the package that I am working with is making it slightly diffcult. I might have to look into other LP packages or languages that are better suited for this kind of work. I also started to look at the covering problem, but ran into some issues because some constraints for this problem aren't well defined. This is something that I will continue to look at, since the dual problem is just as important to service capacity as the achievability aspect.

Friday (06/21/19): In the morning we were treated to a seminar by Sid Dalal of Columbia University on deep analytics. In the afternoon I met up with Amir and Mehmet to discuss some of my progress. Amir and I talked more about the covering problem, and we both agreed that we are very close to conjecturing an important result for this problem. I discussed more of the convex lattice polytope method with Mehmet, and I explained my progress with the problem. Dr. Soljanin emailed me in the late afternoon to explore the fractional matching for hypergraphs as well since we can expand our results to linear codes with locality > 2. This is something I will continue to work on next week. We are very close to some very important results!

Week 5:

Monday (06/24/19): Today I spent most of my time attempting to make the linear program for fractional matching of the [4,2] Reed-Solomon code and the mixed coded and repetition more visual. I was finally able to come some of the intitial dificulties I was having since I was using a linear programming package that I wasn't too familiar with. Because it's always best to continue looking at different methods and tools that can possibly be used to solve a problem, Dr. Soljanin recommended looking at a tool called a matching polytope. Not that this would necessarily help, but since we had been on the theme of polytope before and because there was a polytope modeling a situation we were dealing with, exploring this tool has its merits. This is something I will be reading up on and will be working on throughout this week.

Tuesday (06/25/19): In the morning I spent some time reading up on matching polytopes. It turns out that there is such an object called an integral matching polytope and a fractional matching polytope, which perfectly models the situation we are currently in. It's possible that the "continuous version" of the service capacity problem, i.e. the version that has already been considered by Dr. Soljanin and her colleagues, can be seen as a fractional matching polytope. Something that's of importance to us is the fact that similar to the fractional matching and integral matching number, whenever we are working with a bipartite graph the integral matching polytope is equivalent to the fractional matching polytope. This means that for codes that admit a bipartite structure, this equivalence may tell us something about the code's load-balancing properties, i.e. whether or not this is a batch code. In the afternoon we were treated to a seminar by Jie Gao of Stony Brook University on network algorithms. One of the topics in the presentation was on k-anonymity, which was interesting because this is a topic I have discussed with Dr. Soljanin, Amir, Mehmet, and a Rutgers student named Kendric in a previous lab meeting. Not that it's even remotely connected to what I have been doing here at DIMACS, but it's just something of interest to note. After the seminar I met Dr. Soljanin, Amir, and Mehmet in the lab and discussed a little more about the program I came up with for the fractional matching linear program. We also discussed more on the prospects of the matching polytope method, and I decided I would look into it a little more.

Wednesday (06/26/19): In the early morning I thought about these matching polytopes and tried to connect them with the idea of convex lattice polytope that we had been considering earlier. It seemed like I had developed a fairly logical idea that it's possible to prove that simplex codes form a convex lattice polytope, but there were some slight issues that I found out later in the day. After thinking about this problem we had our third bridge workshop on Ramsey Theory by Keith Frankston. Most of the group was fairly familiar and already enjoyed Ramsey Theory, so this was a nice little diversion from our research. In the afternoon I met with Dr. Soljanin and Amir for a two-hour problem session on the topic of these matching polytopes. My line of thinking from earlier didn't quite work, but we were able to figure out some interesting properties of the matching polytopes. The first obvious connection (that wasn't quite obvious to us at first) is the fact that the polytope that a simplex code forms is different from the matching polytope of the graph associated with some simplex code. The main idea that we were able to derive was that the matching polytope actually encodes more information than the simplex polytope, which may or may not be beneficial. It's certainly beneficial for the fact that the two polytopes are not equal. This means that the matching polytope actually has the potential to tell us something about a code's batch properties, whereas the "classic" polytope for the simplex code doesn't tell us much about batch properties. It will be interesting to see if anything comes from this matching polytope method, but it will take some time to explore such as complex topic.

Thursday (06/27/19): I started working on my final presentation, which is in two weeks, for two different reasons. Of course it's always nice to get a head start on a task, but it served as an exercise to reflect on the research I have done to this point. All of the students here at the DIMACS REU work extremely hard, to the point where we almost become robotic in performing tasks. We know exactly what we need to do and we are all extremely motivated to work towards that goal. The problem is that we tend to focus on our task rather than really understading what we are working on. Perhaps it would be best for us to all "stop and smell the roses" from time to time. There was a problem that I was also working on that was related to matchings in a bipartite graph and connections to the number of possible multisets one can create. This is an intimate result that is desperately needed to prove that Simplex codes are batch codes. More on this to come.

Friday (06/28/19): I continued working on my presentation and the problem about matching and multisets mentioned previously. Something that I looked into as well was expander properties of graphs. I had previously only hear about expander graphs, and my mentor from the last REU I was at had a Ph.D. student working with expander codes, so I decided to look into expander properties of our graph construction. There is definitely some really interesting literature in this area of mathematics, but unfortunately I probably won't have too much time to explore it, however I still look forward to reading up on it.

Week 6:

Monday (07/01/19): I contiued working on my presentation as well as looking into the matching polytope method.

Tuesday (07/02/19): I continued working on my presentation as well as looking into the matching polytope method. To break from some of the monotony of working on the same part of the problem, I decided to look into other coding schemes that coukd possibly be used for distributed storage systems. We have already considered Reed-Solomon codes, and these seem to be a little difficult to work with at the moment since they do not form a bipartite graph. A coding scheme that could potentially be used is Reed-Muller codes. Additionally, remebering the paper that I read on optimal batch codes, it turns out that Reed-Muller codes have some sort of batch properties. Since I was only partly familiar with Reed-Muller codes, I decided to read up on them. It's fairly obvious that we would like to be working with first-order Reed-Muller codes. These codes are defined via a recursive process. The important aspects about codes that we are interested in analyzing is their generator matrix. Unfortunately Reed-Muller codes don't have a nice systematic generator matrix, so it's a little difficult to represent them using our graph construction. Well, we can construct them via a graph but it doesn't really tell us much. More work will have to be done on this front. We aso had a seminar in scientific writing by Lazaros Gallos.

Wednesday (07/03/19): I continued working on my presentation as well as looking at the matching polytope method. I looked back at Reed-Solomon codes because I became a little cynical. I remebered that Reed-Solmon codes don't have a nice systematic generator matrix, yet whenever we talk about the service capacity of a Reed-Solomon code, we have several systematic nodes which means there is some systematic generator matrix representation of a Reed-Solomon code. I became a little frustrated attempting to look for the systematic generator matrix for a Reed-Solomon code. My main motivation for looking at the systematic generator matrix for a Reed-Solomon code is the fact that, although they are different from Reed-Muller codes, in many ways the generator matrices are pretty similar. So I thought the systematic generator matrix for a Reed-Solomon code could give me a little more insight into the systematic generator matrix for a Reed-Muller code.

Thursday (07/04/2019): Fourth of July holiday.

Friday (07/05/19): I continued working on my presentation and somewhat abandoned the matching polytope method. It's not necessarily that I don't have faith in the method, but I really needed to take a break from it as I was really only working on the same approach over and over again. On the Reed-Muller front, I became desperate and just started searching for some systematic generator matrix representation of Reed-Muller codes. I really didn't find too much and became a little hopeless. I also worked on the topic of matchings in multisets mentioned earlier for some time. Next week are the student presentations, so it will be interesting to see what everyone has been up to the past six weeks.

Week 7:

Monday (07/08/19): I continued to work on my presentation as well as look into Reed-Muller. I realized that it might not be necessary to have a systematic generator matrix, and just considered the graph of Reed-Muller given its non-systematic generator matrix. It's nice that there are actually some connections between Reed-Muller and Simplex code given its graph, but there is a big issue when only using one systematic vertex. Something that hasn't been mentioned is that although the first k columns of the generator matrix for a Reed-Muller code don't form the identity matrix, the first column has a 1 in the first position and zeros everywhere else. This is equivalent to a singular systematic vertex in the graph construction. The problem here is that for increasing size of the multiset, it's not possible to recover that systematic symbol more than once. So this was really just a big flop, and maybe looking at the systematic representation is something we need. We also had a graduate panel moderated by Parker Hund, in conjuction with two other REU's.

Tuesday (07/09/19): I continued working on my presentation as well as the work in Reed-Muller codes. Nothing really additional at this point.

Wednesday (07/10/19): In the morning we had our fourth and final bridge workshop on Fraisse classes by Samuel Braunfeld. None of us wer really familiar with infinite combinatorics or model theory, but because the topic was pretty different from what we are used to, we all still really enjoyed the workshop. Personally this was my favorite workshop! I was finally at the point of refining my presentation and beginning to practice. I will be presenting on Friday, and I am really excited for both presenting and listening to everyone else's research!

Thursday (07/11/19): In the morning we were treated to presentation from half of the REU group. Everyone was really impressive and professional. I thoroughly enjoyed the presentations! The rest of the day was spent refining and practicing my presentation for tomorrow. In the afternoon I presented to Amir and Mehmet, and they both had some great constructive criticism that I was able to use to make a great last final revision to my presentation.

Friday (07/12/19): Today I presented the research I have been working on over the past six weeks, and it went well! Everyone else on the second day also had really interesting progress, and I am glad that we could all take the time to listen to what everyone's been up to. I am excited for next week! It will be the czech students' last week as well as the americans (me) that will be going to Prague for two weeks. I want to see how much I can squeeze out of this last week here at DIMACS, but nothing is do or die since I will continue working on this problem in Prague and will also be working on it once I head back to Pittsburgh.

Week 8:

Monday (07/15/19):

Tuesday (07/16/19):

Wednesday (07/17/19):

Thursday (07/18/19):

Friday (07/19/19):

Week 9:

Monday (07/22/19):

Tuesday (07/23/19):

Wednesday (07/24/19):

Thursday (07/25/19):

Friday (07/26/19):

Week 10:

Monday (07/29/19):

Tuesday (07/30/19):

Wednesday (07/31/19):

Thursday (08/01/19):

Friday (08/02/19):


Presentations


Coding in Distributed Storage

Traditionally the field of Coding Theory was invented to correct errors in transmission of messages across a noisy channel. The typical model is some binary symmetric channel, where one transmits single bits 0,1 where there is some probability p associated with transmitting the bit incorrectly and some probability 1-p associated with transmitting the correct bit. The goal of my research doesn't necessarily pertain to the error correction aspect of Coding Theory. Recently coding has been applied to distributed storage systems and network switches, which seems a little unconventional. If you wish to learn more about the purpose of codes in distributed storage and how these codes are applied, I highly recommend this gentle introduction for beginners, and this article from Technion for those that are a little more advanced.


Additional Information


Acknowledgements

This work was carried out during the 2019 DIMACS REU program at Rutgers University, supported by NSF grant CCF-1852215. Additionally I would like to thank my mentors Dr. Emina Soljanin, Amir Behrouzi-Far, Mehmet Atkas, and Pei Peng for their unwavering support and dedication. Lastly I would like to thank Dr. Lazaros Gallos, Parker Hund, and the rest of the DIMACS REU crew for making the program enjoyable for everyone.

Visitor Hit Counter