Peter van Emde Boas: History of the van Emde Boas Trees

Wednesday, March 18, 2015 - 3:30pm to 4:30pm
Peter van Emde Boas
University of Amsterdam

The stratified tree, also called van Emde Boas tree, is a data structure implementing the full repertoire of instructions manipulating a single subset A of a finite ordered Universe U = [0 ... u-1]. Instructions include member, insert, delete, min, max, predecessor and successor, as well as composite ones like extract-min. The processing time per instruction is O(log log u). Hence it improves upon the traditional comparison based tree structures for dense subsets A; if A is sparse, meaning that the size n = |A| = O(log u), the improvement vanishes.
Examples exist where this improvement helps to speed-up algorithmic solutions of real problems; such applications can be found for example in graph algorithms, computational geometry and forwarding of packets on the internet. The structure was invented during my three months postdoc residence at Cornell University in the fall of 1974.
Given the fact that MIT is the site of origin of the presentation of the vEB trees in the textbook of Cormen et al. it seems superfluous to explain today how the structure works. Instead I want to evoke the cultural climate in Algorithmics of the mid 1970-ies against which the structure was developed and given the form used in the early publications: the implementation of the vEB trees on a pointer machine.
There are several strategies for understanding how the O(log log u) improvement can be obtained. In contemporary literature, a direct recursive approach is used where the universe is divided into a cluster of sqrt(u) galaxies each of size sqrt(u); the set manipulation instructions decompose accordingly in a instruction at the cluster and galaxy level, but one of these two instructions is always of a special trivial type. The processing complexity thus satisfies a recurrence T(u) = T(sqrt(u)) + O(1). Consequently T(u) = O(log log u).
This simple recursive approach is deeply hidden in the presentation of the trees in the original papers, which better is described in terms of a binary-search-on-levels strategy. So how did I arrive at this more complex solution?
The reason is that the recursive approach requires address calculations on the arguments which use multiplicative arithmetical instructions. In 1974 these instructions were not allowed in the Random Access Machine model (RAM) which was the standard model in the developing research area of design and analysis of algorithms. Today the rules of the game have been changed and nobody will object against the use of such multiplicative instructions.
So my early implementations of the stratified trees were based on a different approach. In this approach the address calculations are not required, and the structure can be implemented using pointers. The downside of this approach is that it leads to rather complex algorithms, which are still hard to present correctly even today. Another bad consequence was the super linear space consumption of the data structure, which was only eliminated three years later.
A curious fact is that the version presented in the Cormen et al. textbook is again different from the recursive version I have presented in my later publications. (By the way, the earliest written presentation of this implementation is given in a classroom note of Don Knuth.) The difference is that in the textbook version the minimal element is not stored in recursive sub-trees, whereas in the original papers the insertion of the minimal element is delayed until a second element arrives in this substructure.