Recent advances in neural network interpretability suggest that superposition, the ability of a network to represent many more features than it has neurons, is a key mechanism underlying how neural networks compute. This work explores the theoretical foundations of computing in superposition, focusing on explicit, provably correct algorithms and their efficiency.
We introduce a lower bound technique that provides the first general lower bounds for neural network size. Using this technique, we show that for a broad class of problems, including permutations and pairwise logical operations, a neural network requires at least Omega(sqrt{m' log m'}) neurons and Omega(m' log m') parameters, where m' is the number of output features being computed. This establishes a limit on how much superposition a neural network can exhibit. Conversely, we prove a nearly matching upper bound: a set of m' fundamental logical operations, such as pairwise ANDs, can be computed in parallel using only O(sqrt{m'} log m') neurons and O(m' log^2 m') parameters.
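To fix the pairwise-AND task concretely, here is a minimal dense baseline in Python, not the superposed construction from this work: one ReLU neuron per output, i.e. m' neurons and roughly 3m' parameters, which is exactly the cost that the O(sqrt{m'} log m')-neuron construction improves upon. The function name and toy dimensions below are illustrative only.

# Dense baseline (one ReLU neuron per output), for comparison only.
import numpy as np

def dense_pairwise_and(x, pairs):
    """Compute AND(x[i], x[j]) for each (i, j) in `pairs`, one ReLU neuron per output.

    x     : 0/1 vector of input features
    pairs : list of (i, j) index pairs, one per output feature (m' pairs in total)
    """
    outputs = []
    for i, j in pairs:
        # AND(a, b) = ReLU(a + b - 1) for a, b in {0, 1}
        outputs.append(max(x[i] + x[j] - 1.0, 0.0))
    return np.array(outputs)

# Example: 4 input features, m' = 3 pairwise ANDs.
x = np.array([1, 0, 1, 1])
pairs = [(0, 2), (1, 3), (2, 3)]
print(dense_pairwise_and(x, pairs))  # [1. 0. 1.]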
These results have several implications. They demonstrate an exponential gap between computing in superposition (the focus of this work) and merely representing the features in superposition, which requires only O(log m') neurons by the Johnson-Lindenstrauss Lemma. Moreover, our lower bound implies that any method used in practice to compress neural networks must produce a model with at least Omega(m' log m') parameters, regardless of the size of the initial dense network. We hope that our results open a path for applying complexity-theoretic techniques to future research on neural network interpretability.
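To make the representational side of this gap concrete, the following toy sketch (an assumed JL-style setup, not taken from this work) superposes a few active features out of m' into d << m' dimensions using random unit vectors and recovers them by dot-product readout; all names and dimensions are illustrative.

# Toy illustration of representation in superposition (assumed setup):
# m' sparse boolean features are embedded in d << m' dimensions with random
# unit vectors; dot products approximately recover which features are active.
import numpy as np

rng = np.random.default_rng(0)
m_prime, d, k = 10_000, 256, 3            # many features, few dimensions, k active

# Random feature directions, normalized; nearly orthogonal in high dimension.
E = rng.standard_normal((m_prime, d))
E /= np.linalg.norm(E, axis=1, keepdims=True)

active = rng.choice(m_prime, size=k, replace=False)
h = E[active].sum(axis=0)                  # superposed representation of the k features

scores = E @ h                             # noisy readout score for every feature
recovered = np.argsort(scores)[-k:]        # the k highest-scoring features
print(sorted(active) == sorted(recovered.tolist()))  # expected: True

Note that this only stores and reads back features; performing further computation on superposed representations, which is what the bounds above address, is substantially harder.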
Joint work with Nir Shavit.