Abstract: “I will discuss the following fundamental communication problem: data is distributed among servers, and the servers want to compute the intersection of their data sets, e.g., the common records in a relational database. They want to do this with as little communication and as few messages (rounds) as possible. Computing the intersection is at least as hard as the set disjointness problem, which asks whether the intersection is empty.
Formally, in the two-server setting, the players hold subsets S, T of the universe [n]. In many realistic scenarios, the sizes of S and T are significantly smaller than n, so we impose the constraint that |S|, |T| ≤ k. We give a smooth communication/round tradeoff which shows that with O(log* k) rounds, O(k) bits of communication suffice. This improves by an order of magnitude upon the trivial protocol, in which one player sends its entire set using O(k log(n/k)) bits. This is in contrast to other basic problems such as computing the union or symmetric difference, for which Ω(k log(n/k)) bits of communication are required for any number of rounds. For two players, known lower bounds for the easier problem of set disjointness imply that our algorithms are optimal up to constant factors in both communication and number of rounds. We extend our protocols to m-player protocols, obtaining an optimal O(mk) bits of communication with a similarly small number of rounds.
To appear in PODC 2014, joint work with Joshua Brody, Amit Chakrabarti, Ranganath Kondapally and David Woodruff.”
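To make the two-server setting concrete, here is a minimal Python sketch, assuming the players share a public random seed. It computes the exact cost of the trivial protocol (naming a k-element subset of [n] takes ceil(log2 C(n, k)) bits) and simulates a simple one-message hashing baseline. This baseline is a standard textbook idea, not the protocol from the talk: it costs O(k log k) bits rather than O(k), so it only beats the trivial protocol when n is much larger than k^3. The names `trivial_cost_bits`, `shared_hash`, and `hashed_intersection`, the use of SHA-256 as a stand-in for shared randomness, and the choice of k^3 hash buckets are all illustrative assumptions.

```python
import hashlib
import math


def trivial_cost_bits(n: int, k: int) -> int:
    """Bits for Alice to send an arbitrary k-element subset of [n] to Bob.

    This is ceil(log2(C(n, k))), which is Theta(k log(n/k)) for k <= n/2.
    """
    return (math.comb(n, k) - 1).bit_length()


def shared_hash(x: int, m: int, seed: int) -> int:
    """Hypothetical shared hash [n] -> [m]; SHA-256 stands in for public randomness."""
    digest = hashlib.sha256(f"{seed}:{x}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % m


def hashed_intersection(S: set[int], T: set[int], k: int, seed: int = 0):
    """One-message hashing baseline (NOT the PODC 2014 protocol).

    Alice sends {h(x) : x in S}; Bob keeps every y in T whose hash appears.
    Every element of S & T is kept. A y in T - S is wrongly kept only if it
    collides with some element of S; for a truly random h into m = k^3
    buckets, a union bound puts this probability at most |S||T|/m <= 1/k.
    """
    m = max(k, 2) ** 3
    message = {shared_hash(x, m, seed) for x in S}  # Alice -> Bob
    bits_sent = len(message) * (m - 1).bit_length()  # O(k log k) bits
    candidate = {y for y in T if shared_hash(y, m, seed) in message}
    return candidate, bits_sent


if __name__ == "__main__":
    n, k = 2**64, 8
    S = {3, 17, 2**40, 99, 5, 123456789}
    T = {17, 99, 42, 7, 2**40}
    inter, bits = hashed_intersection(S, T, k)
    print(inter >= S & T)  # True: a true common element is never dropped
    print(bits, trivial_cost_bits(n, k))
```

With n = 2^64 and k = 8, the hashed message here is at most 54 bits, while the trivial protocol needs 497 bits. The output can contain false positives but never misses a common element; a second round in which Bob sends back short fingerprints would be one way to filter them, at additional cost.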