Abstract:
We survey recent advances in the approximate nearest neighbor search
problem in high-dimensional Euclidean/Hamming spaces, which go beyond
the classic Locality Sensitive Hashing technique for the problem. The
culmination of these advances is a new optimal hashing algorithm that
achieves the full trade-off between space vs query time. For example,
we obtain the first algorithm with near-linear space and sub-linear
query time for any approximation factor greater than 1, which is
perhaps the most important regime in practice.
Our algorithm also unifies, simplifies, and improves upon the previous
data structures for the problem, combining elements of data-dependent
hashing and Locality Sensitive Filtering.
Finally, we discuss matching lower bounds for hashing algorithms, as
well as for 1- and 2- cell probe algorithms. In particular, the 2-
cell probe lower bound exploits a connection to locally-decodable
codes, and yields the first space lower bound that is not polynomially
smaller than the 1-probe bound (for any static data structure).
Joint work with Thijs Laarhoven (IBM Research Zurich), Ilya
Razenshteyn (MIT), and Erik Waingarten (Columbia). Preprint: