Low-rank matrix approximation and its close relative, principal component analysis, are ubiquitous in statistics and machine learning. To address computational concerns as data size increases, recent years have seen a flurry of work on fast approximation algorithms for these problems. I'll present a line of research that introduces a new approach to low-rank approximation with strong provable guarantees for any data matrix. Our methods iteratively select a subsample of representative rows or columns from a matrix, which are then used to compute approximate principal components.
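To illustrate the flavor of this approach, here is a minimal sketch of column-sampling-based approximate PCA. It samples columns with probability proportional to their squared norms, a deliberately simple stand-in for the adaptive sampling scheme of the actual work, and all names (`sampled_pca`, the parameters `k` and `s`) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def sampled_pca(A, k, s):
    """Approximate the top-k left singular vectors of A from s sampled columns.

    Illustrative sketch only: columns are drawn with probability proportional
    to their squared norms, not the iterative sampling used in the papers.
    """
    col_norms = np.sum(A**2, axis=0)
    probs = col_norms / col_norms.sum()
    idx = rng.choice(A.shape[1], size=s, replace=True, p=probs)
    # Rescale sampled columns so the sketch is unbiased for A @ A.T
    C = A[:, idx] / np.sqrt(s * probs[idx])
    U, _, _ = np.linalg.svd(C, full_matrices=False)
    return U[:, :k]  # approximate principal components

A = rng.standard_normal((100, 2000))
U_approx = sampled_pca(A, k=5, s=200)
```

The key point is that the expensive SVD runs on the small sampled matrix `C` (here 100 by 200) rather than on all of `A`, which is what makes sampling-based methods attractive for large or sparse data.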
Our methods match the runtimes guaranteed by the fastest known "random projection" methods for low-rank approximation. However, they can be much faster for sparse or structured data, and they apply to problems where matrices are not represented explicitly. In particular, combined with the Nyström method, our sampling scheme gives the first algorithms for provably approximating kernel matrices in time *linear* in the number of data points. This yields the fastest known methods for kernel ridge regression, kernel principal component analysis, kernel k-means clustering, and kernel canonical correlation analysis.
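A rough sketch of the Nyström idea: approximate an n-by-n kernel matrix K by evaluating the kernel only against s sampled landmark points, so K is never formed explicitly. The sketch below uses uniform landmark sampling and an RBF kernel for simplicity (the papers' scheme selects landmarks adaptively); the function name `nystrom_factor` and parameters `s`, `gamma` are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def nystrom_factor(X, s, gamma=1.0):
    """Return F with K ≈ F @ F.T for the RBF kernel matrix K of X.

    Sketch under simplifying assumptions: landmarks are sampled uniformly
    (not by the adaptive scheme of the papers). Only n*s kernel entries are
    evaluated, so the cost is linear in n for fixed s.
    """
    n = X.shape[0]
    idx = rng.choice(n, size=s, replace=False)
    L = X[idx]

    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    K_ns = rbf(X, L)   # n x s block
    K_ss = rbf(L, L)   # s x s block
    # F = K_ns @ K_ss^{-1/2}, via an eigendecomposition of the small block
    w, V = np.linalg.eigh(K_ss)
    w = np.maximum(w, 1e-12)  # guard against tiny negative eigenvalues
    return K_ns @ (V / np.sqrt(w))

X = rng.standard_normal((500, 10))
F = nystrom_factor(X, s=50, gamma=0.1)  # K ≈ F @ F.T, F is 500 x 50
```

The rank-s factor `F` can then be plugged into kernel ridge regression, kernel PCA, or kernel k-means in place of the full kernel matrix, which is the source of the linear-time claims above.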
This talk is based on joint work with Michael B. Cohen and Cameron Musco that appears in the preprints http://arxiv.org/abs/1511.07263 and http://arxiv.org/abs/1605.07583.