We present a new data structure for the c-approximate near neighbor problem (ANN) in the Euclidean space. For n points in Rd, our algorithm achieves Oc(dnρ) query time and Oc(n1+ρ+nd) space, where ρ≤7/(8c2)+O(1/c3)+oc(1). This is the first improvement over the result by Andoni and Indyk (FOCS 2006) and the first data structure that bypasses a locality-sensitive hashing lower bound proved by O'Donnell, Wu and Zhou (ICS 2011). By a standard reduction we obtain a data structure for the Hamming space and ℓ1 norm with ρ≤7/(8c)+O(1/c3/2)+oc(1), which is the first improvement over the result of Indyk and Motwani (STOC 1998).
Our data structure is able to bypass the locality-sensitive hashing barrier by using data-dependent hash functions (previous work used functions that are oblivious to the data).
Joint work with Alexandr Andoni, Piotr Indyk and Nguyễn Lê Huy
A short summary of the paper can be found here.