Instance-by-instance optimal identity testing

Tuesday, November 12, 2013 - 4:15pm to 5:15pm
3:45pm in 32-G449 (Patil/Kiva)
Paul Valiant, Brown University

The amount of data being routinely processed by computers is exploding, and this puts a new demand on the algorithms community: do our algorithms make efficient use of their data? Many familiar problems can be reexamined in this light to yield significant and practical improvements. A small asymptotic algorithmic improvement, from needing terabytes of data to merely gigabytes, effectively cut costs by a factor of a thousand in a data-driven world.

In this talk we consider a fundamental hypothesis testing problem: given data from a probability distribution, was it drawn from a known distribution P, or from a distribution far from P? Our analysis reveals several distinct features that previous algorithms failed to take advantage of, and culminates by showing that these are essentially all the possible features an algorithm could take advantage of. Namely, we introduce what we call an "instance-by-instance optimal" algorithm and show that our algorithm, when testing against an input distribution P, performs within constant factors of the best possible performance of the best possible algorithm customized for P.

Our analyses introduces a new way of using linear programming duality to automatically analyze a broad class of inequalities that generalizes Cauchy-Schwarz, Holder's inequality, and the monotonicity of Lp norms.

Joint work with Gregory Valiant