Yoram Singer: Memory-Efficient Adaptive Optimization for Humungous-Scale Learning

Tuesday, April 23, 2019 - 4:00pm to 5:00pm
Refreshments: 
Light Refreshments at 3:45pm
Location: 
Patil/Kiva G449
Speaker: 
Yoram Singer, Princeton University
Abstract:
Adaptive gradient-based optimizers such as AdaGrad and Adam are among the methods of choice in modern machine learning. These methods maintain second-order statistics of each model parameter, thus doubling the memory footprint of the optimizer. In behemoth-size applications, this memory overhead restricts the size of the model being used as well as the number of examples in a mini-batch. We describe a novel, simple, and flexible adaptive optimization method with sublinear memory cost that retains the benefits of per-parameter adaptivity while allowing for larger models and mini-batches. We give convergence guarantees for our method and demonstrate its effectiveness in training some of the largest deep models used at Google.
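To make the memory argument concrete, here is a minimal, illustrative Python sketch (not taken from the talk): it contrasts standard AdaGrad, which keeps one accumulated second-order statistic per parameter entry and therefore doubles the optimizer's memory footprint, with a sublinear-memory variant that shares statistics across the rows and columns of a matrix-shaped parameter. The function names, the row/column sharing scheme, and the NumPy setup are assumptions for illustration only, in the spirit of the kind of memory-efficient adaptive method the abstract describes.

```python
import numpy as np


def adagrad_step(param, grad, accum, lr=0.1, eps=1e-8):
    """Standard AdaGrad: `accum` has the same shape as `param`,
    so the optimizer state doubles the memory footprint."""
    accum = accum + grad ** 2
    param = param - lr * grad / (np.sqrt(accum) + eps)
    return param, accum


def row_col_adaptive_step(param, grad, row_acc, col_acc, lr=0.1, eps=1e-8):
    """Illustrative sublinear-memory variant for a (rows x cols) parameter:
    only O(rows + cols) statistics are kept. Each entry's accumulator is
    approximated by the minimum of its row and column statistics, which
    upper-bounds the true per-entry sum of squared gradients."""
    est = np.minimum(row_acc[:, None], col_acc[None, :]) + grad ** 2
    param = param - lr * grad / (np.sqrt(est) + eps)
    # Fold the covered entries back into the compact row/column statistics.
    row_acc = est.max(axis=1)
    col_acc = est.max(axis=0)
    return param, row_acc, col_acc


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, n = 1000, 2000
    w_full = rng.normal(size=(m, n))
    w_sub = w_full.copy()
    full_acc = np.zeros((m, n))                   # AdaGrad state: m * n floats
    row_acc, col_acc = np.zeros(m), np.zeros(n)   # compact state: m + n floats
    for _ in range(5):
        g = rng.normal(size=(m, n))               # stand-in for a minibatch gradient
        w_full, full_acc = adagrad_step(w_full, g, full_acc)
        w_sub, row_acc, col_acc = row_col_adaptive_step(w_sub, g, row_acc, col_acc)
    print("AdaGrad optimizer state:", full_acc.size, "floats")
    print("Row/column optimizer state:", row_acc.size + col_acc.size, "floats")
```

In this sketch the row and column statistics only grow and always dominate the entries they cover, so taking their minimum never underestimates the accumulated squared gradient; the resulting step sizes are therefore no larger than AdaGrad's, while the optimizer state shrinks from m*n to m+n values.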
Bio:
Yoram Singer is the head of the Principles Of Effective Machine-learning (POEM) research group at Google Brain and a professor of Computer Science at Princeton University. He was a member of the technical staff at AT&T Research from 1995 to 1999 and an associate professor at the Hebrew University from 1999 to 2007. He is a fellow of AAAI, and his research on machine learning algorithms has received several awards.