Ryan O'Donnell: Lots of fun things... (and Quantum PCA)

Friday, October 9, 2015 - 1:30pm to 4:30pm
Pizza at 1:15p
Harvard, Maxwell-Dworkin 119
Ryan O'Donnell

Abstract: Uncompressed Title: Lots of fun things, like longest increasing subsequences in random strings, property testing of probability distributions, the Robinson-Schensted-Knuth algorithm, some interesting symmetric polynomials, and basic representation theory of S_n. (And quantum PCA.)

Suppose we want to learn something about an unknown probability distribution p = (p_1, p_2, ..., p_d) on [d] = {1, 2, ..., d}. Assume we get to see n independent draws from p, forming a string w in [d]^n. If we want to learn p to accuracy eps, having n ~ d/eps^2 is necessary and sufficient. (This is one of the most ancient results in statistics: the algorithm is to just output the 'empirical histogram' of w.) If we only want to distinguish whether p is the uniform distribution or eps-far from it, then having n ~ sqrt(d)/eps^2 is necessary and sufficient. (This was only determined in 2008!)
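The classical learning algorithm mentioned above is just the empirical histogram. As an illustrative sketch (the function names are ours, not from the talk), the estimator and the n ~ d/eps^2 sample budget look like this:

```python
import math
import random
from collections import Counter

def empirical_histogram(w, d):
    """Estimate p = (p_1, ..., p_d) from samples w drawn from [d] = {1, ..., d}.

    Simply outputs the fraction of draws equal to each symbol; with
    n ~ d/eps^2 samples this is eps-accurate (e.g., in total variation
    distance) with high probability.
    """
    counts = Counter(w)
    n = len(w)
    return [counts[i] / n for i in range(1, d + 1)]

def sample_budget(d, eps):
    """Illustrative n ~ d/eps^2 sample count (constant factor omitted)."""
    return math.ceil(d / eps**2)

# Example: draw n samples from a known p and compare the estimate to p.
d, eps = 10, 0.1
p = [1 / d] * d  # uniform, for illustration
n = sample_budget(d, eps)
w = random.choices(range(1, d + 1), weights=p, k=n)
p_hat = empirical_histogram(w, d)
```

The distinguishing problem (uniform vs. eps-far) is different: there one can get away with n ~ sqrt(d)/eps^2 samples, but the algorithm is a tester (e.g., based on collision statistics), not this estimator.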

We investigate the problem of learning an unknown d-dimensional *quantum state*. This is a strict generalization in which, roughly speaking, p is a probability distribution over d unknown orthonormal *vectors*. By virtue of some basic results in representation theory (which we will explain), a large portion of the job reduces to the following: Suppose we want to learn about an unknown probability distribution p, but instead of getting to see the random string w we only get to see the data (LIS_1(w), LIS_2(w), ..., LIS_d(w)), where LIS_k(w) denotes the total length of the longest k disjoint increasing (=nondecreasing) subsequences in w. Now how large does n need to be to understand various properties of p? (For example, is it true that p_max ~ E[LIS_1(w)]?)
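The statistics (LIS_1(w), ..., LIS_d(w)) can be computed via the Robinson-Schensted-Knuth algorithm mentioned in the title: by Greene's theorem, LIS_k(w) equals the total length of the first k rows of the RSK insertion tableau of w. A minimal sketch (our own implementation of standard RSK row insertion for words, so "increasing" means nondecreasing, matching the abstract):

```python
import bisect

def rsk_shape(w):
    """Shape (row lengths) of the RSK insertion tableau of the word w.

    Row insertion for words with repeats: a new letter x bumps the leftmost
    entry STRICTLY greater than x, so rows stay weakly increasing.  The
    first row's length is the longest nondecreasing subsequence of w.
    """
    rows = []
    for x in w:
        for row in rows:
            i = bisect.bisect_right(row, x)  # leftmost entry > x
            if i == len(row):
                row.append(x)   # x fits at the end of this row
                x = None
                break
            row[i], x = x, row[i]  # bump: x displaces row[i], which recurses
        if x is not None:
            rows.append([x])    # bumped all the way out: start a new row
    return [len(r) for r in rows]

def lis_k(w, k):
    """LIS_k(w): max total length of k disjoint nondecreasing subsequences.

    By Greene's theorem this equals the sum of the first k rows of the
    RSK shape of w.
    """
    return sum(rsk_shape(w)[:k])
```

For example, for w = (4, 1, 3, 2) the shape is (2, 1, 1): the longest nondecreasing subsequence has length 2 (e.g., 1, 3), and two disjoint ones cover at most 3 letters.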

This is joint work with John Wright of Carnegie Mellon. A one-hour overview of this material will be presented at the Charles River Lectures on Probability on Oct. 2, but attending that event is not a prerequisite for this talk.