Approximating Large Frequency Moments with Pick-and-Drop Sampling

Wednesday, May 22, 2013 - 4:00pm to 5:00pm
Vladimir Braverman
Johns Hopkins University

Given data stream $D = \{p_1,p_2,...,p_m\}$ of size $m$ of numbers from
$\{1,..., n\}$, the frequency of $i$ is defined as $f_i = |\{j: p_j =
i\}|$. The $k$-th \emph{frequency moment} of $D$ is defined as $F_k =
\sum_{i=1}^n f_i^k$. We consider the problem of approximating frequency
moments in insertion-only streams for $k\ge 3$. For any constant $c$ we
show an $O(n^{1-2/k}\log(n)\log^{(c)}(n))$ upper bound on the space
complexity of the problem. Here $\log^{(c)}(n)$ is the iterative $\log$
function. To simplify the presentation, we make the following assumptions:
$n$ and $m$ are polynomially far; approximation error $\epsilon$ and
parameter $k$ are constants. We observe a natural bijection between
streams and special matrices. Our main technical contribution is a
non-uniform sampling method on matrices. We call our method a
\emph{pick-and-drop sampling}; it samples a heavy element (i.e., element
$i$ with frequency $\Omega(F_k)$) with probability $\Omega(1/n^{1-2/k})$
and gives approximation $\tilde{f_i} \ge (1-\epsilon)f_i$. In addition,
the estimations never exceed the real values, that is $ \tilde{f_j} \le
f_j$ for all $j$. As a result, we reduce the space complexity of finding a
heavy element to $O(n^{1-2/k}\log(n))$ bits. We apply our method of
recursive sketches and resolve the problem with
$O(n^{1-2/k}\log(n)\log^{(c)}(n))$ bits.

Relevant URL(S):
For more information please contact: Eric Price, ,