Foundations For Learning in the Age of Big Data

Tuesday, April 8, 2014 - 4:15pm to 5:15pm
Refreshments: 
3:45pm in 32-G449
Location: 
32-G449
Speaker: 
Maria Florina Balcan, Georgia Tech
Biography: 
Maria Florina Balcan is an assistant professor in the School of Computer Science at Georgia Institute of Technology. Her main research interests are machine learning, computational aspects in economics and game theory, and algorithms. Her honors include the CMU SCS Distinguished Dissertation Award, an NSF CAREER Award, a Microsoft Faculty Research Fellowship, a Sloan Research Fellowship, and several paper awards at COLT. She is currently a board member of the International Machine Learning Society and Program Committee chair for COLT 2014.

Abstract: With the variety of applications of machine learning across science, engineering, and computing in the age of Big Data, re-examining the underlying foundations of the field has become imperative. In this talk, I will describe new models and algorithms for important emerging paradigms, specifically, interactive learning and distributed learning.

For active learning, where the algorithm itself can ask for labels of carefully chosen examples from a large pool of unannotated data with the goal of minimizing human labeling effort, I will present results giving computationally efficient, optimal label complexity algorithms. I will also discuss learning with more general forms of interaction, as well as unexpected implications of these results for classic supervised learning paradigms.

For distributed learning, I will discuss a model that for the first time addresses the core question of what are the fundamental communication requirements for achieving accurate learning.  Broadly, we consider a framework where massive amounts of data is distributed among several locations, and our goal is to learn a low-error predictor with respect to the overall distribution of data using as little communication, and as few rounds of interaction, as possible.  We provide general upper and lower bounds on the amount of communication needed to learn a given class, as well as broadly-applicable techniques for achieving communication-efficient learning.