Ukkonen’s Algorithm is an online method for constructing suffix trees in linear time. It relies on three core ideas:

  • Suffix Links
  • Four Optimisation Tricks

The algorithm incrementally builds an implicit suffix tree for each prefix of str[1...n], and then extends all suffixes with $ to produce the regular suffix tree.

The suffix links are introduced to speed up traversal, and the optimisation tricks to eliminate redundant computations. Without them, the algorithm can degrade to — worse than the naive approach.

Incremental Construction Of Implicit Suffix Trees

Given a string str[1...n], Ukkonen’s Algorithm proceeds over phases:

  • In phase , the algorithm constructs the implicit suffix tree for the prefix str[1...i] ()
  • In phase , it constructs incrementally from via suffix extension

A suffix extensions involves extending each suffix str[j...i] () from the implicit suffix tree, to include the new character str[i+1]. Specifically, it:

  • Locates the end of the path corresponding to the suffix str[j...i] in , then
  • Extends this path by appending str[i+1], modifying the tree structure based on a set of extension rules.

After phases, is converted to the regular suffix tree by appending the terminal symbol $ and running one final phase (). In practice, it’s simpler to append $ to str[1...n] at the start and run all phases directly.

Suffix Extension Rules

Suffix extension rules determine how each suffix str[j...i] is extended to incorporate the next character str[i+1].

Rule 1 — Leaf Extension

At suffix extension in phase , if the substring α = str[j...i] has not appeared prior to , then, as suffixes with common prefixes share the same path, its path must end at a leaf.

Thus, to accommodate the new character str[i+1], a simple leaf extension is employed

Rule 1 — Leaf Extension

At suffix extension in phase , if the path str[j...i] in ends at a leaf, then extend that leaf by adjusting the label of the edge to account for the added character str[i+1].

Rule 2 Regular — Internal Split

At suffix extension in phase , if the substring α has appeared prior to , and the prior a’s were followed by the same character x, then contains a root-to-leaf path representing α followed by x (and any subsequent characters)

Thus, to accommodate the new character str[i+1], the path for α must split.

Rule 2 Regular — Internal Split

At suffix extension in phase , if:

  • The path str[j...i] in doesn’t end at a leaf and,
  • The next character in the path is some x != str[i+1]

Then split the edge after str[j...i] by inserting a new internal node u followed by a leaf numbered , and assigning the character str[i+1] as the edge label between u and j

Rule 2 Alternate — Internal Append

At suffix extension in phase , if the substring α has appeared prior to , and the prior a’s were followed by different characters x and y, then contains a root-to-leaf path representing α ending at an internal node, which branches out to both x and y (followed by any subsequent characters)

Thus, to accommodate the new character str[i+1], another branch is appended to the internal node after α

Rule 2 Alternate — Internal Append

At suffix extension in phase , if:

  • The path str[j...i] in doesn’t end at a leaf and,
  • The next characters in the path are some x, y != str[i+1] and,
  • An internal node already exists at the end of str[j...i]

Then create a new leaf numbered branching from the internal node u, and assign the character str[i+1] as the edge label between u and j

Rule 3 — Do Nothing

At suffix extension in phase , if the substring α has appeared prior to , and the prior a’s were followed by same characters z = str[i+1], then already contains a root-to-leaf path representing α followed by str[i+1] = z.

In this case, no changes are required since the path str[j...i+1] is already present.

Rule 3 — Do Nothing

At suffix extension in phase , if:

  • The path str[j...i] in doesn’t end at a leaf and,
  • The next characters in the path is str[i+1],

Then str[i+1] is already in the tree, and no further action is needed

Naive Implicit Suffix Tree Construction

The following presents a base implementation of Ukkonen’s algorithm, without any optimisation.

def implicit_suffix_tree(str):
	n = len(str)
	Construct I₁
	for i from [1...n-1]:
	    # Begin PHASE i + 1
	    for j from [1...i+1]
		    # Begin EXTENSION j
		    Follow the path str[j...i] from the root in the current state of the implicit suffix tree
		    Extend this path by appending str[i+1] using the suffix extension rules
	    # str[j...i+1] is now in the tree
		# end of phase i+1 (I_{i+1} computed)

In each phase, the algorithm makes up to extensions, and finding the end of each path str[j...i] takes up to time. Summing all character traversals across phases gives a worst case time complexity of .

Although this implementation is inefficient, its structure mirrors Ukkonen’s algorithm, which is derived by refining this approach to achieve linear time.