Morteza Zadimoghaddam: Randomized Composable Core-sets for Distributed Submodular and Diversity Maximization

Friday, September 11, 2015 - 4:00pm to 5:00pm
Morteza Zadimoghaddam

An effective technique for solving optimization problems over massive data sets is to partition the data into smaller pieces, solve the problem on each piece and compute a representative solution from it, and finally obtain a solution inside the union of the representative solutions for all pieces. This technique can be captured via the concept of composable core-sets, and has been recently applied to solve diversity maximization problems as well as several clustering problems. However, for coverage and submodular maximization problems, impossibility bounds are known for this technique. In this paper, we focus on efficient construction of a randomized variant of composable core-sets where the above idea is applied on a random clustering of the data. We employ this technique for the diversity, coverage, monotone and non-monotone submodular problems. Our results significantly improve upon the hardness results for non-randomized core-sets, and imply improved results for submodular maximization in a distributed and streaming settings.

In summary, we show that a simple greedy algorithm results in a 1/3-approximate randomized composable core-set for submodular maximization under a cardinality constraint. Our result also extends to non-monotone submodular functions, and leads to the first 2-round MapReduce-based constant factor approximation algorithm with O(n) total communication complexity for either monotone or non-monotone functions. Finally, using an improved analysis technique and a new algorithm PseudoGreedy, we present an improved 0.545-approximation algorithm for monotone submodular maximization, which is in turn the first MapReduce-based algorithm beating factor 1/2 in a constant number of rounds. We also provide improved approximate core-sets for diversity maximization.