Dynamic Matrix Approximations for Data-Streaming Applications

Adam Kirsch
Mentors: Graham Cormode, S. Muthu Muthukrishnan

Project Summary

Many information retrieval techniques rely on analyzing the spectral structure of a large matrix. Unfortunately, the matrix of interest is often so large that it cannot be stored in memory, so progressive algorithms that make few passes over the data become desirable. In fact, it is conceivable that the given matrix is so large that it cannot even be stored on disk, and that it is presented to us in the form of increments and decrements to the coefficients from multiple sources. This is the data streaming scenario. We are looking for small-space data structures that provide good approximations for the spectral structure of a matrix in this setting. So far, our techniques are based on results from the study of random matrices. If you like, you can take a look at some slides.

References

D. Achlioptas and F. McSherry, Fast Computation of Low Rank Matrix Approximations, Proceedings of the 33rd Annual Symposium on Theory of Computing, 2001.

G. Cormode and S. Muthukrishnan, Improved Data Stream Summaries: The Count-Min Sketch and its Applications, DIMACS Tech. Report 2003-20, 2003.

R. A. Horn and C. A. Johnson, Matrix Analysis, Cambridge University Press, 1985.

P. Drineas and R. Kannan, Fast Monte-Carlo Algorithms for Approximate Matrix Multiplication, Proceedings of the 42nd Annual Symposium on Foundations of Computer Science, 2001.

P. Drineas and R. Kannan, Pass-Efficient Algorithms for Approximating Large Matrices, Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms, 2003.

A. Frieze, R. Kannan, and S. Vempala, Fast Monte-Carlo Algorithms for Finding Low-Rank Approximations, Proceedings of the 39th Annual Symposium on Foundations of Computer Science, 1998.

Z. Furedi and J. Komlos, The Eigenvalues of a Random Symmetric Matrix, Combinatorica 1(1981), no. 3, 233-241.

M. Krivelevich and V. H. Vu, On The Concentration of Eigenvalues of Random Symmetric Matrices, Tech. Report 60, Microsoft Research, 2000.

S. Muthukrishnan, Data Streams: Algorithms and Applications, http://athos.rutgers.edu/~muthu/stream-1-1.ps, or search for "adorisms" using Google.

Y. Saad, Numerical Methods for Large Eigenvalue Problems, Manchester University Press, 1992.