Yoram Singer: Memory-Efficient Adaptive Optimization for Humungous-Scale Learning

Tuesday, April 23, 2019 - 4:00pm to 5:00pm
Light Refreshments at 3:45pm
Patil/Kiva G449
Yoram Singer, Princeton University
Adaptive gradient-based optimizers such as AdaGrad and Adam are among the methods of choice in modern machine learning. These methods maintain second-order statistics of each model parameter, thus doubling the memory footprint of the optimizer. In behemoth-size applications, this memory overhead restricts the size of the model being used as well as the number of examples in a mini-batch. We describe a novel, simple, and flexible adaptive optimization method with sublinear memory cost that retains the benefits of per-parameter adaptivity while allowing for larger models and mini-batches. We give convergence guarantees for our method and demonstrate its effectiveness in training some of the largest deep models used at Google.
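To make the memory argument concrete, here is a minimal sketch in plain Python. The first function is a standard AdaGrad step, which keeps one accumulator per parameter. The second is an illustrative assumption, not necessarily the method presented in the talk: for an m x n weight matrix it keeps one accumulator per row and one per column (m + n numbers instead of m * n) and takes the smaller of the two as each entry's statistic. The names `adagrad_step` and `cover_step`, the learning rate, and the `eps` constant are all hypothetical.

```python
import math

def adagrad_step(w, g, acc, lr=0.1, eps=1e-8):
    """Standard AdaGrad on a flat parameter list.

    acc holds one running sum of squared gradients per parameter,
    so the optimizer state is as large as the model itself.
    """
    for i in range(len(w)):
        acc[i] += g[i] * g[i]
        w[i] -= lr * g[i] / (math.sqrt(acc[i]) + eps)

def cover_step(W, G, row_acc, col_acc, lr=0.1, eps=1e-8):
    """Illustrative sublinear-memory step for an m x n matrix (an assumption).

    Keeps m row accumulators and n column accumulators. Each grows by the
    largest squared gradient in its row or column, so min(row, col) is an
    upper bound on the per-entry sum of squared gradients AdaGrad would keep.
    """
    m, n = len(W), len(W[0])
    for i in range(m):
        row_acc[i] += max(G[i][j] ** 2 for j in range(n))
    for j in range(n):
        col_acc[j] += max(G[i][j] ** 2 for i in range(m))
    for i in range(m):
        for j in range(n):
            nu = min(row_acc[i], col_acc[j])  # per-entry adaptive statistic
            W[i][j] -= lr * G[i][j] / (math.sqrt(nu) + eps)

# Usage: a 3 x 4 matrix needs 12 AdaGrad accumulators but only 3 + 4 here.
m, n = 3, 4
W = [[1.0] * n for _ in range(m)]
G = [[0.1 * (i + 1) * (j + 1) for j in range(n)] for i in range(m)]
row_acc, col_acc = [0.0] * m, [0.0] * n
cover_step(W, G, row_acc, col_acc)

w_flat, acc_flat = [1.0], [0.0]
adagrad_step(w_flat, [0.5], acc_flat)
```

The saving grows with the matrix: a 1000 x 1000 layer would need 2,000 accumulators under this sketch rather than 1,000,000, at the cost of a coarser per-entry statistic.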
Yoram Singer is the head of the Principles Of Effective Machine-learning (POEM) research group at Google Brain and a professor of Computer Science at Princeton. He was a member of the technical staff at AT&T Research from 1995 through 1999 and an associate professor at the Hebrew University from 1999 through 2007. He is a fellow of AAAI. His research on machine learning algorithms has received several awards.