Spectral Clustering in Birthday Paradox Time

Wednesday, April 22, 2026 - 4:00pm to 5:00pm
Location: 
32-G575
Speaker: 
Weronika Wrzos-Kaminska (EPFL)
Biography: 
https://weronikawk.github.io/
Given a vertex in a (k,φ,ϵ)-clusterable graph, i.e., a graph whose vertex set can be partitioned into a disjoint union of φ-expanders of size ≈n/k with outer conductance bounded by ϵ, can one quickly tell which cluster it belongs to?
 
This question goes back to the expansion testing problem of Goldreich and Ron'11. In the case k=2, a sample of ≈n^{1/2 +O(ϵ/φ^2)} logarithmic length walks from a given vertex suffice to determine its cluster, by the birthday paradox. The study of the general case k>2 was initiated by Czumaj, Peng and Sohler [STOC'15], and the works of Chiplunkar et al. [FOCS'18], Gluch et al. [SODA'21] showed that ≈poly(k)⋅n^{1/2+O(ϵ/φ^2)} random walk samples suffice for general k. This matches the k=2 result up to polynomial factors in k, but creates a conceptual inconsistency: if the birthday paradox is the guiding phenomenon, then the query complexity should decrease, as opposed to increase, with the number of clusters k!
 
We design a novel representation of vertices in a (k,φ,ϵ)-clusterable graph using a mixture of logarithmic length walks. This representation uses the optimal ≈(n/k)^{1/2+O(ϵ/φ^2)} walks per vertex, and allows for a fast nearest neighbor search. This gives a clustering oracle with query time ≈(n/k)^{1/2+O(ϵ/φ^2)} and space complexity k⋅(n/k)^{1/2+O(ϵ/φ^2)}, matching the birthday paradox bound.
 
Based on joint work with Michael Kapralov and Ekaterina Kochetkova.