Differential privacy is a rigorous definition of privacy in statistical databases that is the focus of a growing body of work in areas ranging across computer science, including algorithms, learning, game theory, and programming languages, among others.
In this talk, I will discuss how different notions of stability from learning theory can be used to design differentially private algorithms. We focus on designing algorithms for statistical model selection: Given a data set and a discrete collection of models, each of which is a family of probability distributions, the goal is to determine the model that best ``fits'' the data. This is a basic problem in statistics and machine learning.
Given a nonprivate model selection procedure f, we design corresponding differentially private algorithms that output the correct value f(x) when f satisfies any of several notions of stability on the data set x.
We give two classes of results: generic ones, that apply to any function with a discrete output set; and specific algorithms for the problem of sparse linear regression. In many cases, our algorithms match the optimal nonprivate sample complexity. Along the way, we develop the first robustness analysis of the popular LASSO estimator.
Based on joint works with Daniel Kifer and Abhradeep Guha Thakurta.