Sebastien Bubeck: Kernel-based methods for bandit convex optimization

Friday, October 21, 2016 - 1:30pm to 4:00pm
Refreshments: 
Pizza at 1:15pm
Location: 
TBA
Speaker: 
Sebastien Bubeck
Abstract:  In bandit optimization one wants to optimize some unknown function using an approximate function value oracle. In the stochastic approximation literature one usually assumes that the oracle reports the correct value up to some zero-mean noise term, and that the noise is independent across queries. The bandit framework goes well beyond this independence assumption: for each query there is a different function (potentially chosen adversarially given all the past choices), and the goal is to optimize the average function. In this three-hour lecture I will present the first polynomial-time method with poly(dimension) sqrt(#queries)-regret for bandit convex optimization. This new algorithm is based on three ideas which will be described in detail: (i) kernel methods, (ii) a generalization of Bernoulli convolutions (this is a self-similar process that has been studied since the 1930s, most notably by Erdős), and (iii) a new annealing schedule for exponential weights (with increasing learning rate).
Joint work with Ronen Eldan and Yin Tat Lee.
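To make ingredient (iii) concrete, below is a minimal illustrative sketch of the classical exponential weights algorithm over a finite action set, run with an increasing learning rate schedule. This is not the speaker's algorithm (which operates on continuous convex losses via kernels); the schedule `eta(t) = 0.1 * sqrt(t)` and the loss data are purely hypothetical, chosen only to show the mechanics of an increasing rate.

```python
import numpy as np

def exponential_weights(losses, eta_schedule):
    """Run exponential weights on a (T, K) array of per-round losses.

    eta_schedule: function t -> learning rate (here increasing in t,
    echoing the annealing idea mentioned in the abstract).
    Returns the algorithm's expected cumulative loss and the
    cumulative loss of each fixed action.
    """
    T, K = losses.shape
    cum_loss = np.zeros(K)   # cumulative loss of each action
    alg_loss = 0.0           # expected cumulative loss of the algorithm
    for t in range(T):
        eta = eta_schedule(t + 1)
        # Weights proportional to exp(-eta * cumulative loss);
        # subtract the min for numerical stability.
        w = np.exp(-eta * (cum_loss - cum_loss.min()))
        p = w / w.sum()
        alg_loss += p @ losses[t]
        cum_loss += losses[t]
    return alg_loss, cum_loss

# Hypothetical data: 5 actions, action 2 is best on average.
rng = np.random.default_rng(0)
T, K = 500, 5
losses = rng.uniform(size=(T, K))
losses[:, 2] -= 0.3
losses = np.clip(losses, 0.0, 1.0)

alg_loss, cum = exponential_weights(losses, lambda t: 0.1 * np.sqrt(t))
regret = alg_loss - cum.min()  # regret vs. the best fixed action
```

With the gap between the best action and the rest this large, the weights concentrate quickly and the regret stays small relative to the horizon T.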