Estimating the Longest Increasing and Longest Common Subsequences

Tuesday, November 15, 2022 - 4:15pm to 5:15pm

Refreshments:

Milk & Cookies served 4--4:15 pm

Location:

32-G449 (Patil/Kiva)

Speaker:

Alexandr Andoni

Seminar group:

Theory of Computation (TOC) Seminars

Abstract. Longest Increasing Subsequence (LIS) is a fundamental statistic of a sequence, for which the decades-old Patience Sorting algorithm computes it on a sequence of length n in O(n log n) time. But how well can one estimate the (length of the) LIS in *sublinear* time, say, when it is >sqrt{n} ?

We present an algorithm that, given a sequence with LIS > n/k, approximates the LIS up to a factor of k^c in k*n^c time for any arbitrarily small c>0. Our run-time essentially matches the trivial sample complexity lower bound of ~k. All the past work had polynomial factors in both approximation and extra time slowdown.

An important corollary is the first algorithm to estimate the Longest *Common* Subsequence in O(n) time up to a sub-polynomial, n^o(1), factor.

We highlight two of the new ideas that may be of independent interest. First, we introduce a new type of sublinear query model. In the Genuine-LIS
problem, each sequence element may be either genuine or corrupted, and the goal is to estimate the LIS amongst the genuine elements only. The
new aspect is that we have unrestricted access to the sequence, but it costs to check whether a fixed element is genuine or corrupted --- hence we want to minimize the number of such tests. The second idea is a generic sampling scheme that allows efficient composition of recursive sampling algorithms.

Joint work with Negev Shekel Nosatzki, Sandip Sinha, and Cliff Stein.