MSR/MIT Theory Day -- Friday, June 20, 2014!

Mon, 06/16/2014
MSR/MIT Theory Day - Friday, June 20th

Location: Barton Room (first floor)
Microsoft Research, One Memorial Drive

10:30-11:15: Nikhil Bansal, "Minimizing Flow Time on Unrelated Machines"
11:15-12:00: Aaron Roth, "Correctness Protection via Differential Privacy"
12:00-1:00: Lunch Break
1:00-1:45: Ankur Moitra, "New Algorithms for Dictionary Learning"
1:45-2:30: David Woodruff, "Turnstile Streaming Algorithms Might as Well Be Linear Sketches"
2:30-3:00: Coffee Break
3:00-3:45: James Lee, "Anti-concentration of Smoothed Functions and Talagrand's Convolution Conjecture"

----------------------------------------------------------------------------------

Minimizing Flow Time on Unrelated Machines

Nikhil Bansal, Eindhoven University of Technology

The flow time of a job is defined as the amount of time the job spends in the
system, i.e., its completion time minus its arrival time, and is a widely
studied measure of quality in scheduling. I will describe the first
poly-logarithmic approximation algorithms for the problems of minimizing (i)
the total flow time and (ii) the maximum flow time on unrelated machines. The
main idea is to consider a new LP relaxation for these problems that has far
fewer constraints, and to apply iterated rounding.
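
To make the two objectives concrete, here is a minimal Python sketch (an
illustration of the flow-time definition on a hypothetical schedule, not of the
talk's algorithm) computing the total and maximum flow time:

    # Illustrative only: the two flow-time objectives for a fixed schedule.
    # A job's flow time is its completion time minus its arrival time.

    jobs = [
        # (arrival_time, completion_time) for each job in some schedule
        (0, 4),
        (1, 3),
        (2, 9),
    ]

    flow_times = [completion - arrival for arrival, completion in jobs]

    total_flow_time = sum(flow_times)  # objective (i): total flow time
    max_flow_time = max(flow_times)    # objective (ii): maximum flow time

    print(total_flow_time, max_flow_time)  # -> 13 7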

Joint work with Janardhan Kulkarni.

----------------------------------------------------------------------------------

Correctness Protection via Differential Privacy

Aaron Roth, UPenn

False discovery is a growing problem in scientific research. Despite
sophisticated statistical techniques for controlling the false discovery rate
and related measures designed to protect against spurious discoveries, there is
significant evidence that many published scientific papers contain incorrect
conclusions.

In this talk we consider the role that adaptivity plays in this problem. A
fundamental disconnect between the theorems that control the false discovery
rate and the practice of science is that the theorems assume a fixed collection
of hypotheses to be tested, selected non-adaptively before the data is
gathered, whereas science is by definition an adaptive process, in which data
is shared and re-used, and hypotheses are generated after seeing the results of
previous tests.

We note that false discovery cannot be prevented when a substantial number of
adaptive queries are made to the data and the data is used naively, i.e., when
queries are answered exactly with their empirical estimates on the given finite
data set. However, we show that, remarkably, there is a different way to
evaluate statistical queries on a data set that allows even an adaptive analyst
to make exponentially many queries to the data set, while guaranteeing that,
with high probability, all of the conclusions he draws generalize to the
underlying distribution. Counter-intuitively, this technique involves actively
perturbing the answers given to the data analyst, using techniques developed
for privacy preservation -- but in our application, the perturbations are added
entirely to increase the utility of the data.
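
To illustrate the perturbation idea, here is a minimal Python sketch that
answers a single statistical query (the empirical mean of a bounded predicate)
with Laplace noise, a standard differential-privacy primitive. The function
name, interface, and noise scale are illustrative assumptions; the paper's
actual mechanism and its accuracy analysis are more involved:

    import math
    import random

    def noisy_statistical_query(data, predicate, epsilon):
        """Answer the statistical query E[predicate(x)] on a finite data set,
        perturbing the empirical estimate with Laplace noise of scale
        1/(epsilon * n) -- the standard mechanism for a sensitivity-1/n query.
        (Illustrative sketch; not the paper's exact mechanism.)"""
        n = len(data)
        empirical = sum(predicate(x) for x in data) / n
        # Sample Laplace(0, b) noise by inverse-CDF: U ~ Uniform(-1/2, 1/2).
        b = 1.0 / (epsilon * n)
        u = random.random() - 0.5
        noise = -b * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
        return empirical + noise

    # The analyst only ever sees perturbed answers, even to adaptive queries.
    data = [random.gauss(0.0, 1.0) for _ in range(10000)]
    print(noisy_statistical_query(data, lambda x: x > 0, epsilon=0.5))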

Joint work with Cynthia Dwork, Vitaly Feldman, Moritz Hardt, and Omer Reingold.

----------------------------------------------------------------------------------

New Algorithms for Dictionary Learning

Ankur Moitra, MIT

Sparse representations play an essential role in many fields including
statistics, signal processing and machine learning. But can we efficiently
learn a basis that enables a sparse representation, if one exists? This problem
has many names, and is often called dictionary learning or sparse coding. Here
we study a stochastic model for this problem where there is an unknown $n
\times m$ dictionary $A$ and our goal is to learn $A$ from random examples of
the form $Y = AX$ where $X$ is sparse and is drawn from an appropriate
distribution. Recently, Spielman, Wang and Wright gave an algorithm that works
when $A$ has full column rank.

Here we give the first algorithms for the overcomplete setting ($m > n$); see
also the independent, concurrent work of Agarwal et al. Our algorithms work for
$\mu$-incoherent dictionaries, which have long been a central object of study
in sparse recovery. We require $X$ to be $k$-sparse for $k$ up to approximately
$\sqrt{n}/(C\mu \log n)$. Notably, this bound is within a logarithmic factor of
the information-theoretic threshold for sparse recovery on incoherent
dictionaries. Our work is based on new connections between dictionary learning
and overlapping clustering in random graphs, and also on new insights into when
and why alternating minimization converges.
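
To make the generative model concrete, here is a minimal numpy sketch that
draws random examples $Y = AX$ with $k$-sparse $X$ and runs one
alternating-minimization round (decode, then re-fit). The parameter values,
warm start, and crude correlation-based decoding step are illustrative
assumptions, not the paper's algorithm:

    import numpy as np

    rng = np.random.default_rng(0)
    n, m, k, samples = 64, 128, 5, 2000   # overcomplete: m > n

    # Unknown dictionary A with unit-norm columns; random columns are
    # incoherent with high probability.
    A = rng.standard_normal((n, m))
    A /= np.linalg.norm(A, axis=0)

    # Each sample's coefficient vector is k-sparse with random +-1 entries.
    X = np.zeros((m, samples))
    for j in range(samples):
        support = rng.choice(m, size=k, replace=False)
        X[support, j] = rng.choice([-1.0, 1.0], size=k)

    Y = A @ X   # the random examples we observe

    # One alternating-minimization round from an estimate A_hat:
    # (1) decode each sample by keeping its k largest correlations,
    # (2) re-fit the dictionary by least squares given those codes.
    A_hat = A + 0.1 * rng.standard_normal((n, m))   # warm start near A
    A_hat /= np.linalg.norm(A_hat, axis=0)

    C = A_hat.T @ Y                            # correlations, shape (m, samples)
    thresh = np.sort(np.abs(C), axis=0)[-k]    # k-th largest per sample
    X_hat = np.where(np.abs(C) >= thresh, C, 0.0)   # crude sparse decoding
    A_hat = Y @ np.linalg.pinv(X_hat)          # least-squares dictionary re-fit
    A_hat /= np.linalg.norm(A_hat, axis=0)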

Joint work with Sanjeev Arora, Rong Ge and Tengyu Ma.

----------------------------------------------------------------------------------

Turnstile Streaming Algorithms Might as Well Be Linear Sketches

David Woodruff, IBM Almaden

In the turnstile model of data streams, an underlying vector x in
{-m, -m+1, ..., m}^n is presented as a long sequence of arbitrary positive and
negative integer updates to its coordinates. A randomized algorithm seeks to
approximate
a function f(x) with constant probability while only making a single pass over
this sequence of updates and using a small amount of space. All known
algorithms in this model are linear sketches: they sample a matrix A from a
distribution on integer matrices in the preprocessing phase, and maintain the
linear sketch Ax while processing the stream. At the end of the stream, they
output an arbitrary function of Ax. One cannot help but ask: are linear
sketches universal?
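
To fix the picture, here is a minimal numpy sketch of the maintenance step:
sample an integer matrix A up front, then fold each turnstile update (i, delta)
into the vector Ax, without ever storing x. The random {-1, 0, 1} matrix below
is only a placeholder; an actual algorithm samples A from the distribution
suited to its target function f:

    import numpy as np

    rng = np.random.default_rng(1)
    n, rows = 1000, 20                        # x lives in Z^n; sketch has `rows` rows

    A = rng.integers(-1, 2, size=(rows, n))   # preprocessing: sample integer matrix A
    sketch = np.zeros(rows, dtype=np.int64)   # maintains Ax; x itself is never stored

    def process_update(sketch, i, delta):
        """Turnstile update x[i] += delta: Ax changes by delta times column i of A."""
        sketch += delta * A[:, i]             # in-place update of the maintained Ax

    # A stream of arbitrary positive and negative integer updates to x.
    for i, delta in [(3, 5), (7, -2), (3, -5), (42, 1)]:
        process_update(sketch, i, delta)

    # At the end of the stream, output some function of `sketch`, i.e., of Ax.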

In this work we answer this question by showing that any 1-pass constant
probability streaming algorithm for approximating an arbitrary function f of x
in the turnstile model can also be implemented by sampling a matrix A from the
uniform distribution on O(n log m) integer matrices, with entries of magnitude
poly(n), and maintaining the linear sketch Ax. Furthermore, the logarithm of
the number of possible states of Ax, as x ranges over {-m, -m+1, ..., m}^n,
plus the amount of randomness needed to store A, is at most a logarithmic
factor larger than the space required by the space-optimal algorithm. Our
result shows that to prove space lower bounds for 1-pass streaming algorithms,
it suffices to prove lower bounds in the simultaneous model of communication
complexity, rather than the stronger 1-way model. Moreover, the fact that we
can assume we have a linear sketch with polynomially-bounded entries further
simplifies existing lower bounds; e.g., for frequency moments we present a
simpler proof of the $\tilde{\Omega}(n^{1-2/k})$ bit complexity lower bound
without using communication complexity.

Joint work with Yi Li and Huy L. Nguyen.

----------------------------------------------------------------------------------

Anti-concentration of Smoothed Functions and Talagrand's Convolution Conjecture

James Lee, University of Washington

I will present a proof of the Gaussian case of Talagrand's convolution
conjecture, which asserts that the noised version of an arbitrary non-negative
function cannot be concentrated on any fixed scale. In other words, Markov's
inequality is not tight for such functions. This follows from a more general
logarithmic anti-concentration phenomenon for non-negative functions that are
not too log-concave. The proof proceeds by analyzing a stochastic evolution
that carries the Gaussian measure to our target measure. I will also discuss
versions of this process on the discrete cube and the relationship to Fourier
analysis and information theory.
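
In symbols (a paraphrase in standard notation, with conventions that are my
assumption rather than the paper's exact statement): writing $\gamma_n$ for the
standard Gaussian measure on $\mathbb{R}^n$ and $P_t$ for the
Ornstein-Uhlenbeck noise operator, the assertion is that noising defeats the
Markov bound:

    % Ornstein--Uhlenbeck (Gaussian noise) operator applied to f:
    \[
      P_t f(x) \;=\; \mathbb{E}_{Z \sim N(0, I_n)}
        \Big[ f\big( e^{-t} x + \sqrt{1 - e^{-2t}}\, Z \big) \Big].
    \]
    % For f >= 0 with \int f \, d\gamma_n = 1, Markov's inequality gives
    % \gamma_n(P_t f > c) <= 1/c for every c; the conjecture (here, its
    % Gaussian case) asserts that this is never tight once t > 0 is fixed:
    \[
      \gamma_n\big( \{ P_t f > c \} \big) \;=\; o(1/c)
      \qquad \text{as } c \to \infty.
    \]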

Joint work with Ronen Eldan.