General Information
Student: Dalton Burke
Office: CoRE 442
School: University of Colorado Denver
E-mail: dmtburke@gmail.com
Project: Codes for Storage with Queues for Access
Project Description
With the immense number of big data applications cropping up everywhere, parallelizing computation is a must. However, even with many computers, competition for resources can hinder the performance of any one of them. We don't know in advance which machine will be the bottleneck for the entire job, and that can change as time goes on. This summer, we will use ideas from combinatorial designs and MDS codes to inform job scheduling algorithms that alleviate this problem.
Weekly Log
- Week 1:
- Met with Dr. Soljanin and learned about the problem. Spoke with Amir about MDS coding for making data transmission robust, and how the same ideas can make data storage and parallel computing robust. We went over methods of job scheduling: join the shortest queue (JSQ), selecting d queues at random and joining the shortest of those (power-of-d), and MDS coding (our primary focus); a small sketch of the first two follows below. We discussed the checkout-line analogy for these policies. Worked on the presentation and readings through the weekend. Readings: BIBDs, matroids, the Fano plane.
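To make the two classical dispatching rules concrete, here is a minimal sketch (not our simulator; queues are just Python lists and all names are illustrative):

```python
import random

def join_shortest_queue(queues, job):
    """JSQ: send the job to the queue holding the fewest jobs."""
    min(queues, key=len).append(job)

def power_of_d(queues, job, d=2):
    """Power-of-d: sample d queues uniformly at random, join the shortest of those."""
    min(random.sample(queues, d), key=len).append(job)

# Tiny usage example: dispatch 12 jobs to 4 queues with each rule.
jsq_queues = [[] for _ in range(4)]
pod_queues = [[] for _ in range(4)]
for job_id in range(12):
    join_shortest_queue(jsq_queues, job_id)
    power_of_d(pod_queues, job_id, d=2)
print([len(q) for q in jsq_queues], [len(q) for q in pod_queues])
```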
- Week 2:
- Presented on Monday; it went well. Broke ground on a Python program to simulate jobs entering a system randomly. The simulation is finished and matches Amir's simulation. Did some work to make a usable API, and implemented some parallelization so that many simulations can run simultaneously (a toy sketch of the idea is below). I wonder if I can tell the multiprocessing.Pool to fork-rejoin my jobs 🤔. I'm going to start on the documentation, and I'll probably upload that here later.
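The actual simulator isn't posted yet, but here is a toy sketch of the two ingredients mentioned above, assuming exponential interarrival and service times for a single queue and using multiprocessing.Pool only to run independent replications in parallel:

```python
import random
from multiprocessing import Pool

def run_one_simulation(seed, arrival_rate=1.0, service_rate=1.2, n_jobs=10_000):
    """Toy single-queue simulation: exponential interarrival and service times;
    returns the mean time a job spends in the system (waiting plus service)."""
    rng = random.Random(seed)
    clock, server_free_at, total_time = 0.0, 0.0, 0.0
    for _ in range(n_jobs):
        clock += rng.expovariate(arrival_rate)      # next arrival time
        start = max(clock, server_free_at)          # wait if the server is busy
        server_free_at = start + rng.expovariate(service_rate)
        total_time += server_free_at - clock        # sojourn time of this job
    return total_time / n_jobs

if __name__ == "__main__":
    # Run several independent replications in parallel, one per worker process.
    with Pool() as pool:
        results = pool.map(run_one_simulation, range(8))
    print(sum(results) / len(results))
```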
- Week 3:
- Elise and I ran simulations all week, testing tons of ideas. Through a simplification we realized that there are only 720 unique orderings of the blocks of the (7,3,1) design, so we ran all of them. We also checked out a (10,4,2) design, which has waaaay more orderings, so we randomly sampled 5000 of them. The ordering used didn't seem to matter: some orderings perform better than others on a given run, but the ranking isn't consistent across runs. I think this is just randomness and they all perform essentially the same, but I'm running more tests to be sure.
We also checked how the chosen k (the number of pieces needed to call a job done) affects the outcome. In summary, the closer k is to the middle of the range the better, and k = n is really bad. This was done for many policies: fork-rejoin seems to be the best, round robin and the combinatorial designs performed about the same for middle values of k, and random is the worst (duh). Tons of plots were generated to show these results; a sketch of the design-based dispatching is below. We met with our mentors, and now I'm working on a few more additions to the simulation (changing the function that determines how long a job takes once a server starts working on it). Then we'll look into a different problem involving vertex covers, to efficiently handle demand for pieces of data.
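For illustration, here is a rough sketch of what dispatching by a combinatorial design looks like, using one standard labelling of the (7,3,1) Fano plane blocks and a fixed cyclic ordering (the real simulation tracks arrivals, service times, and queue dynamics, which this sketch omits):

```python
from itertools import cycle

# One standard labelling of the (7,3,1) design (Fano plane): 7 blocks of 3 queue indices.
# Every pair of queues appears together in exactly one block.
FANO_BLOCKS = [
    (0, 1, 2), (0, 3, 4), (0, 5, 6),
    (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5),
]

def dispatch_by_design(jobs, n_queues=7):
    """Send the 3 tasks of each job to the queues of one design block,
    cycling through the blocks in a fixed order."""
    queues = [[] for _ in range(n_queues)]
    for job, block in zip(jobs, cycle(FANO_BLOCKS)):
        for q in block:
            queues[q].append(job)
    return queues

# 21 jobs over 3 full cycles: each queue ends up with 9 tasks.
print([len(q) for q in dispatch_by_design(range(21))])
```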
- Week 4:
- Previously, once a job piece reached a server, the time the server took to finish that task (the service time) was exponentially distributed, i.e. rexp(mu). When we split each job into k pieces (and then encoded to n), we changed the service time's distribution to rexp(mu*k): since each piece is 1/k-th the size, the rate should be scaled by k. This week we changed that. Our service time is now the sum of a job-specific part (denoted capital Delta, dependent on k) and a machine-specific part (denoted tau, again exponentially distributed but not dependent on k); a sketch of both models is below. We mostly tested constant Deltas and found that we needed a balance factor, alpha: if Delta was too small, the random exponential overwhelmed our choice of Delta, and if Delta was too large, the whole system was overrun. We will try random Deltas in the near future.
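A small sketch of the two service-time models, assuming the balance factor alpha simply weights the job-specific and machine-specific parts (exactly how alpha enters in our simulator may differ, and the default Delta(k) = 1/k is only a placeholder):

```python
import random

rng = random.Random(0)

def service_time_old(k, mu=1.0):
    """Earlier model: a job split into k pieces; each piece is exponential
    with the rate scaled by k (each piece is 1/k-th of the work)."""
    return rng.expovariate(k * mu)

def service_time_new(k, mu=1.0, alpha=0.5, delta=lambda k: 1.0 / k):
    """Week-4 model: a job-specific part Delta(k) plus a machine-specific
    exponential part tau that does not depend on k. Weighting the two parts
    by alpha and (1 - alpha) is an assumption made for this sketch."""
    tau = rng.expovariate(mu)
    return alpha * delta(k) + (1 - alpha) * tau

print(service_time_old(4), service_time_new(4))
```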
- Week 5:
- We switched focus a little this week to the 'vertex covering' problem. We tried a few linear algebra ideas, but the inequalities involved were harder to work with than plain linear algebra could handle. As it turns out, linear programming was a good fit for the problem (a toy instance is sketched below), so I ran a few things on my computer and everything seemed to match. Then we went to speak with Amir, aaaand... these are exactly the things that Dr. Soljanin had already covered in a paper to get the results I was trying to match. It didn't feel great that my work had already been done, but at least it was good enough that someone had published it. Amir said we should wait to meet with Dr. Soljanin again on Monday before proceeding. I had some other thoughts about the problem today (Sunday), so I went to the office and worked a few things out. I feel like what I accomplished was substantial enough to make for a good presentation tomorrow; we'll see how it goes.
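To give a flavor of the linear programming approach (this is a toy instance of my own, not the formulation from Dr. Soljanin's paper), consider three servers storing [a, b, a+b], each with unit service capacity; file a is recoverable from server 0 alone or from the pair {1, 2}, file b from server 1 alone or from {0, 2}, and we maximize the total demand served:

```python
from scipy.optimize import linprog

# Decision variables: demand rates routed to each recovery set,
# x = [a_from_0, a_from_{1,2}, b_from_1, b_from_{0,2}].
c = [-1, -1, -1, -1]    # maximize total served demand -> minimize its negative
A_ub = [
    [1, 0, 0, 1],       # server 0: serves a alone and helps b via {0,2}
    [0, 1, 1, 0],       # server 1: serves b alone and helps a via {1,2}
    [0, 1, 0, 1],       # server 2: helps both coded recovery sets
]
b_ub = [1, 1, 1]        # each server's service capacity

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 4)
print("max total demand:", -res.fun, "routing:", res.x)
```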
- Week 6:
- The presentation went well; what I came up with led to a very general approach to the kind of problem we are working on. This approach yielded a scheme that tries to maximize the demand we can get out of a clique-like structure for coded data. I showed that with this scheme the system can handle certain amounts of demand. While it is certainly nice to know that the system CAN handle this much demand, it would be better to know that this is the MAXIMUM it can handle. What I have is an "achievability" proof: the system can serve a certain demand. The stronger (and often more difficult) result is the "true service capacity," showing that this demand is the most the system can handle; the distinction is written out below. The linear programming ideas from last week actually yielded the true service capacity (which matched my approach), though only for a specific case. Our final presentation is next week, and I'm hoping to squeeze a few more things out of the research before we show everyone what we've been up to.
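Schematically (this is just my phrasing of the distinction, not a statement from any paper): let S be the set of demand vectors the system can actually serve and A the set shown achievable by the scheme.

```latex
% A: demand vectors shown achievable by the scheme; S: true service capacity region.
\[
  \mathcal{A} \subseteq \mathcal{S} \quad\text{(achievability)}, \qquad
  \mathcal{S} \subseteq \mathcal{A} \;\Longrightarrow\; \mathcal{S} = \mathcal{A}
  \quad\text{(true service capacity)}.
\]
```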
- Week 7:
- Wow. What a week. We were doing a few things with the simulation to have more to talk about in this week's presentation, playing around with a constant delta (a minimum job time). A few simulations showed some strange behavior, and I was racking my brain trying to see why something so weird was happening. Going through the simulation's debug logs was maddening because everything appeared to be working exactly as intended. Over the next few days I had a realization about why behavior like this could be expected from a system like this, so we spoke with our mentors about it and they agreed it was a fair (though very unintuitive) explanation. We also spoke with the mentors about a delta scheme that gives a small job or a large job with some probability, which we refer to as "mice and elephants" (a sketch is below). Unfortunately, we didn't have time to get that into the presentation. Speaking of which, the presentation went great. Sadly, we had to cut quite a lot of the development and build-up to the problem since we were so constrained by time. Thankfully, the Neekon quote made people laugh, which is always nice. Now all that's left is to wrap things up and write the paper.
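For reference, a minimal sketch of the "mice and elephants" delta (the probability and the two job sizes are placeholders, not the values we used):

```python
import random

rng = random.Random(0)

def mice_and_elephants_delta(p_elephant=0.1, mouse=0.5, elephant=10.0):
    """Job-specific minimum time: with small probability the job is a large
    one (elephant), otherwise a small one (mouse)."""
    return elephant if rng.random() < p_elephant else mouse

print([mice_and_elephants_delta() for _ in range(10)])
```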
- Week 8:
- Wrapping things up has been a little more challenging than we expected. We have some nice plots for the mice and elephants scheme, but beyond the plots we don't have much in the way of evidence or rigor. Instead of just one paper, we have decided to write one on each of the problems we considered. This choice has left us with quite a bit of work to ensure that each paper is long enough and of high enough quality to stand on its own, and unfortunately it has cut into the time we have to finish the few things we have left. We hoped to be able to publish these results, but I don't think they'll be ready by the time we have to leave; hopefully we can continue to work on them after going home. If it works out, I'd like to finish the mice and elephants work and give a more complete scheme for the special (but more important) case of more systematic nodes than coded nodes. I think this will be my last update, so I would like to say thanks to everyone: the NSF, DIMACS, Lazaros, and my mentors, Dr. Soljanin and Amir.
A visualization of a distribution scheme based on the Fano plane; the columns are queues and each cell is a task.
Presentations
Additional Information