Tengyu Ma: Analyzing Non-convex Optimization: Matrix Completion and Linear Residual Networks

Wednesday, November 30, 2016 - 4:00pm to 5:00pm
Tengyu Ma

Non-convex optimization is the main algorithmic technique behind
many state-of-art results in machine learning. It has been conjectured that
the landscape of many training objectives has the nice geometric property
that “all local minima are global minima,” and thus admits efficient
optimization algorithms.

In the first part of the talk, I will show that the optimization landscape
of matrix completion — a famous problem in ML with wide applications in
recommender system and collaborative filtering —  does have this property.
This implies that (stochastic) gradient descent from arbitrary
initialization can solve matrix completion in polynomial time.

Next, we will discuss linear residual networks, as a simplified model
towards the first-cut understanding of residual networks. We will give a
strikingly simple proof that arbitrarily deep linear residual networks have
no spurious critical points (=points with vanishing gradient that are not
global minima). In contrast, the landscape of standard linear neural
networks does have spurious critical points. This demonstrates that
re-parameterization using the identity shortcut connection can make the
optimization easier.

Based on joint works https://arxiv.org/abs/1605.07272 with Rong Ge and
Jason D. Lee, and https://arxiv.org/abs/1611.04231 with Moritz Hardt.https://toc.csail.mit.edu/node/1057#overlay-context=node/1054&overlay=node/1057/edit