TOC Seminar Fall 2010 | MIT CSAIL Theory of Computation

Sept 7

Dana Moshkovitz

MIT

Hardness of Approximately Solving Linear Equations Over Reals

We consider the problem of approximately solving a system of homogeneous linear equations over reals, where each equation contains at most three variables.

Since the all-zero assignment always satisfies all the equations exactly, we restrict the assignments to be ``non-trivial". Here is an informal statement of our result: it is NP-hard to distinguish whether there is a non-trivial assignment that satisfies 1-\delta fraction of the equations or every non-trivial assignment fails to satisfy a constant fraction of the equations with a ``margin" of \Omega(\sqrt{\delta}).

Unlike the well-studied case of linear equations over finite fields, for equations over reals, the best approximation algorithm known (SDP-based) is the same no matter whether the number of variables per equation is two or three.

Our result is motivated by the following potential approach to proving The Unique Games Conjecture:

1. Prove the NP-hardness of solving approximate linear equations over reals, for the case of three variables per equation (we prove this result).

2. Prove the NP-hardness of the problem for the case of two variables per equation, possibly via a reduction from the three variable case.

3. Prove the Unique Games Conjecture.

An interesting feature of our result is that it shows NP-hardness result that matches the performance of a non-trivial SDP-algorithm. Indeed, the Unique Games Conjecture predicts that an SDP-based algorithm is optimal for a huge class of problems (e.g. all constraint satisfaction problems by Raghavendra's result).

(This is joint work with Subhash Khot)

Sept 14

Sebastien Roch

UCLA

Phylogeny Estimation Beyond the Classical Model of Sequence Evolution

Much progress has been made in the design and analysis of efficient algorithms for reconstructing the tree of life from present-day molecular sequences. Most rigorous results however rely on biological assumptions which have been questioned in the phylogenetic literature. In this talk, I will discuss recent results on the probabilistic analysis of more realistic settings, including mutation-rate variations and insertion-deletion events.

Sept 21

Wojciech Szpankowski

Purdue University

TRIES

Digital trees have myriad applications, including the ubiquitous family of Lempel-Ziv data compression schemes, distributed hashing, and computational biology. Tries, introduced by de la Briandais in 1959, are digital trees that store keys, usually represented as strings, in external nodes (leaves). In the first part of this talk we present precise analysis of the trie's profile which is defined as the number of external nodes with the same distance to the root. It appears that the profile of tries exhibits several fascinating phenomena: Near the root of a trie, the profile is exponentially small (with the number of strings stored), then it decays in a logarithmic rate until it abruptly starts growing, first logarithmically, then oscillating polynomially, and finally tending polynomially to zero again. We next describe the average profile of another digital tree, namely the digital search tree in which keys are stored directly in internal nodes. As one possible application, we shall analyze a suffix trie arising in an error resilient Lempel-Ziv'77 scheme. These studies fall under the so called analytic information theory that applies complex-analytic techniques to problems of information theory. Following Hadamard's precept, we tackle these problems by such methods as generating functions, Mellin transform, Fourier series, saddle point method, analytic poissonization and depoissonization, and singularity analysis. If time allows, in the second part of the talk we move to the ``Beyond Shannon'' territory and describe the newly established NSF Science and Technology Center on Science of Information. Its mission is to develop rigorous principles guiding the extraction, manipulation, and exchange of information, integrating elements of space, time, structure, semantics and context. This is a multi-disciplinary and multi-institution effort that involves Purdue, Berkeley, Bryn Mawr, Howard, MIT, Princeton, Stanford, UCSD, and UIUC

Sept 28

Jonathan Kelner

MIT

Electrical Flows, Laplacian Systems, and Faster Approximation of Maximum Flow in Undirected Graphs

The maximum flow problem and its dual, the minimum s-t cut problem, are two of the most fundamental and extensively studied problems in Operations Research and Optimization. They have many direct applications and are also often used as subroutines in other algorithms.

In this talk, I'll describe a fundamentally new technique for approximating the maximum flow in capacitated, undirected graphs. I'll then use this technique to develop the asymptotically fastest-known algorithm for solving this problem. For graphs with n vertices and m edges, the algorithm computes epsilon-approximate maximum flows in time \tilde{O}(m^{4/3})*poly(1/epsilon) and computes epsilon-approximate minimum s-t cuts in time \tilde{O}(m+n^{4/3})*poly(1/epsilon).

We compute these flows by treating our graph as a network of resistors and solving a sequence of electrical flow problems with varying resistances on the edges. Each of these may be reduced to the solution of a system of linear equations in a Laplacian matrix, which can be solved in nearly-linear time.

This is joint work with Paul Christiano, Aleksander Madry, Daniel Spielman, and Shanghua Teng.

Oct 5

Allan Sly

Microsoft Research

Correspondence of Computational and Statistical Physics Thresholds

I'll discuss the problem of approximately counting independent sets in sparse graphs and its relationship to the hardcore model from statistical physics. The hardcore model is the probability distribution over independent sets I of a graph weighted proportionally to \lambda^{|I|} with parameter 0<\lambda<\infty. We prove that at the "uniqueness threshold" of the hardcore model on the d-regular tree, approximating the partition function becomes computationally hard on graphs of maximum degree d.

Specifically, we show that unless NP=RP there is no polynomial time approximation scheme for the partition function (the sum of such weighted independent sets) on graphs of maximum degree d for \lambda_c(d) < \lambda < \lambda_c(d) + \epsilon where \lambda_c(d) is the uniqueness threshold on the d-regular tree. Weitz produced an FPTAS for approximating the partition function when 0<\lambda <\lambda_c(d) so this result demonstrates that the computational threshold exactly coincides with the statistical physics phase transition thus confirming the main conjecture of Mossel, Weitz and Wormald. We further analyze the special case of \lambda=1 and d=6 and show there is no polynomial time approximation scheme for approximately counting independent sets on graphs of maximum degree d= 6, which is optimal, improving the previous bound of d= 25.

Oct 12

Nikhil Bansal

IBM TJ Watson

Constructive Algorithms for Discrepancy Minimization

The problem of finding the minimum discrepancy coloring is the following: Given a collection of sets S1,...,Sm, color the elements red and blue such that each set is colored as evenly as possible. While several techniques have been developed to show the existence of good colorings, obtaining such colorings algorithmically has been a long standing question.

In this talk, we will describe the first algorithmic results for the problem that essentially match the known existential guarantees. Among other results, we show how to efficiently construct an O(n^{1/2}) discrepancy coloring when m = O(n). This matches the celebrated "six standard deviations suffice" result of Spencer up to constant factors.

Oct 19

Boaz Patt-Shamir

Tel Aviv University

Algorithmic Recommender Systems

Recommender systems help users identify objects they may find interesting, where objects may be books to read, films to watch, web pages to browse, and even other users to contact. Formally, the input to the system is the known preferences of the users, as deduced somehow from their past choices. While many interesting ideas have been developed to analyze characteristics of users (or objects) based on past choices, this approach suffers from a foundational theoretical flaw: feedback is ignored. Put simply, this setting is such that choices determine recommendations, but recommendations are supposed to influence choices.

In a recent line of work this gap was bridged by a simple algorithmic model that assumes that the system may ask the user's opinion on any object, and not only make recommendations about supposedly `nice' objects. Typically, algorithms in this model ask users for their opinions on controversial objects, and in return, the output consists of almost complete reconstruction of user preferences. In this talk we discuss this model and survey some basic and recent results. Surprisingly, it turns out that there are algorithms that can reconstruct user preferences (with high probability), using only a little (polylog factor) more questions than the minimum possible.

Oct 26

No Colloquium due to FOCS

Nov 2

Michael Kearns

UPenn

Two Vignettes in Computational Finance

I will discuss two results in quantitative finance in which algorithmic arguments play a central role.

In the first part, I will introduce the problem of order dispersion in dark pools, a relatively new kind of equities exchange in which traders seek to "invisibly" trade large volumes at market prices. I will present a provably efficient algorithm for near-optimal order placement in dark pools. This algorithm is based on methods from reinforcement learning and the Kaplan-Meier estimator from survival analysis.

In the second part, I will compare two different models for the arriving limit order prices in the standard continuous double auction mechanism of electronic trading. In one model, traders are seen as having "fundamental" views on price, while in the other they form prices only relative to those of other traders. I will quantify a sense in which the former model has highly stable dynamics and the latter has highly unstable dynamics, and discuss some implications for the ongoing debate over high frequency trading.

The talk will be self-contained, and the theoretical results illustrated by experiments on trading data.

Joint work with Kuzman Ganchev, Yuriy Nevmyvaka, and Jennifer Wortman Vaughan (dark pools); and with Eyal Even-Dar, Sham Kakade, and Yishay Mansour (dynamic instability).

Nov 9

Keren Censor-Hillel

MIT

Fast Information Spreading in Graphs with Large Weak Conductance

Gathering data from nodes in a network is at the heart of many distributed applications, most notably, while performing a global task. We consider information spreading among n nodes of a network, where each node v has a message m(v) which must be received by all other nodes. The time required for information spreading has been previously upper-bounded with an inverse relationship to the conductance of the underlying communication graph. This implies high running times for graphs with small conductance.

In this talk I will describe an information spreading algorithm which overcomes communication bottlenecks and thus achieves fast information spreading for a wide class of graphs, despite their small conductance. As a key tool in our study we use the recently defined concept of weak conductance, a generalization of classic graph conductance which measures how well-connected the components of a graph are. Our hybrid algorithm, which alternates between random and deterministic communication phases, exploits the connectivity within components by first applying partial information spreading, after which messages are sent across bottlenecks, thus spreading further throughout the network. This yields substantial improvements over the best known running times of algorithms for information spreading on any graph that has a large weak conductance, from polynomial to polylogarithmic number of rounds.

(Joint work with Hadas Shachnai, to appear in SODA 2011)

Nov 16

Amir Shpilka

Technion and

Microsoft Research

Recent Results on Polynomial Identity Testing

Polynomial Identity Testing (PIT for short) is the problem of efficiently and deterministically deciding whether a given (explicitly or via black-box access) arithmetic circuit computes the zero polynomial. This is one of the most fundamental problems in computational complexity and it is strongly related to the problem of proving lower bounds for arithmetic circuits. In this talk I will survey several results on the PIT problem. Specifically I will discuss the relation between derandomization of PIT and circuit lower bounds, explain the importance of derandomizing PIT in the bounded depth model and give an overview of the ideas behind several recent detrerministic PIT algorithms, designed for restricted circuit classes. At the end of the talk, time permitting, I will present what I find to be the most accessible important open questions in the field.

Nov 23

Ricky Rosen

Tel Aviv University

A Strong Parallel Repetition Theorem for Projection Games on Expanders

The parallel repetition theorem states that for any Two Prover Game with value at most 1-\eps (for \eps<1/2), the value of the game repeated n times in parallel is at most (1-\eps^3)^{\Omega(n/s)}, where s is the length of the answers of the two provers \cite{R,Hol}. For Projection Games, the bound on the value of the game repeated n times in parallel was improved to (1-\eps^2)^{\Omega(n)} \cite{Rao} and this bound was shown to be tight \cite{R08}.

In this paper we study the case where the underlying distribution, according to which the questions for the two provers are generated, is uniform over the edges of a (bipartite) expander graph.

We show that if \lambda is the (normalized) spectral gap of the underlying graph, the value of the repeated game is at most (1-\eps^2)^{\Omega(c(\lambda) \cdot n/ s)}, where c(\lambda) = \poly(\lambda); and if in addition the game is a projection game, we obtain a bound of (1-\eps)^{\Omega(c(\lambda) \cdot n)}, where c(\lambda) = \poly(\lambda), that is, a strong parallel repetition theorem (when \lambda is constant).

This gives a strong parallel repetition theorem for a large class of two prover games.

This is a joint work with Ran Raz.

Nov 30

Ryan Williams

IBM Almaden

Non-Uniform ACC Circuit Lower Bounds

The class ACC consists of circuit families with constant depth over unbounded fan-in AND, OR, NOT, and MODm gates, where m > 1 is an arbitrary constant. We prove:

- NTIME[2^n] does not have non-uniform ACC circuits of polynomial size. The size lower bound can be slightly strengthened to quasi-polynomials and other less natural functions.

- E^{NP}, the class of languages recognized in 2^{O(n)} time with an NP oracle, doesn’t have non-uniform ACC circuits of 2^{n^{o(1)}} size. The lower bound gives an exponential size-depth tradeoff: for every d there is an e > 0 such that E^{NP} doesn’t have depth-d ACC circuits of size 2^{n^e}.

These results finally make progress on some notorious problems in circuit complexity. (It was not known whether EXP^{NP} had depth-3 polynomial size circuits made out of only MOD6 gates.)

Dec 7

Rafail Ostrovsky

UCLA

Linear-Size Universal Hashing

In this talk, we resolve a 30-year-old line of work on the asymptotic computational complexity of universal hashing, an intensively studied topic since the introduction of universal hashing by Carter and Wegman (STOC 1977). Universal hash functions are fundamental algorithmic constructs, and their use is widespread throughout computer science. It was widely believed that pairwise-independent hashing, because of its pseudorandom properties, is inherently computationally costly. Our work shows that this is not the case by giving an explicit construction of linear-size circuits for universal hash functions, refuting a conjecture of Mansour, Nisan, and Tiwari (STOC 1990). We were led to this result through our study of methods to reduce the computational complexity of cryptographic primitives and protocols. Current constructions of cryptographic primitives typically involve a large multiplicative computational overhead that grows with the desired level of security. We explore the possibility of implementing cryptographic primitives (such as encryption, authentication, signatures, or secure two-party computation) while incurring only a constant computational overhead compared to insecure implementations of the same tasks. The talk will be self-contained and accessible to the general audience.

This is joint work with Yuval Ishai, Eyal Kushilevitz, and Amit Sahai.