Overview Of Ukkonen's

Ukkonen’s Algorithm is an online method for constructing suffix trees in linear time. It relies on three core ideas:

Implicit Suffix Trees
Suffix Links
Four Optimisation Tricks

The algorithm incrementally builds an implicit suffix tree for each prefix of str[1...n], and then extends all suffixes with $ to produce the regular suffix tree.

Suffix links speed up traversal, and the optimisation tricks eliminate redundant computation. Without them, the algorithm can degrade to $O (n^{3})$ , which is worse than the naive $O (n^{2})$ approach of inserting each suffix in turn.

Incremental Construction Of Implicit Suffix Trees

Given a string str[1...n], Ukkonen’s Algorithm proceeds over $n$ phases:

In phase $i$ , the algorithm constructs the implicit suffix tree for the prefix str[1...i] ( $I_{i}$ )
In phase $i + 1$ , it constructs $I_{i + 1}$ incrementally from $I_{i}$ via suffix extension

A suffix extension extends each suffix str[j...i] ( $1 \leq j \leq i + 1$ ) in the implicit suffix tree to include the new character str[i+1]; for $j = i + 1$ the suffix is the empty string. Specifically, it:

Locates the end of the path corresponding to the suffix str[j...i] in $I_{i}$ , then
Extends this path by appending str[i+1], modifying the tree structure based on a set of extension rules.

After $n$ phases, $I_{n}$ is converted to the regular suffix tree by appending the terminal symbol $ and running one final phase ( $n + 1$ ). In practice, it’s simpler to append $ to str[1...n] at the start and run all $n + 1$ phases directly.
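To make the phase and extension indices concrete, here is a minimal sketch that enumerates every extension in order. It assumes the terminal $ has already been appended, and the function name `extensions` is illustrative rather than taken from the source. It builds no tree and only shows which path str[j...i] and which character str[i+1] each extension works with.

```python
def extensions(text):
    """Yield (phase, j, path, new_char) for every suffix extension.

    Follows the 1-indexed notation used above: in phase i + 1, extension j
    takes the path str[j...i] and appends str[i+1].
    """
    n = len(text)
    for phase in range(1, n + 1):        # phase = i + 1
        i = phase - 1
        new_char = text[phase - 1]       # str[i+1] in 1-indexed terms
        for j in range(1, phase + 1):    # 1 <= j <= i + 1
            path = text[j - 1:i]         # str[j...i], empty when j = i + 1
            yield phase, j, path, new_char


for phase, j, path, new_char in extensions("ab$"):
    print(f"phase {phase}, extension {j}: {path!r} + {new_char!r}")
```

A string of length $m$ produces $m (m + 1) / 2$ extensions in total, which is why the cost of each individual extension matters so much.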

Suffix Extension Rules

Suffix extension rules determine how each suffix str[j...i] is extended to incorporate the next character str[i+1].

Rule 1 — Leaf Extension

At suffix extension $j$ in phase $i + 1$ , if the substring α = str[j...i] has not appeared prior to $j$ , then no other suffix shares its path beyond α (suffixes with common prefixes share the same path), so its path must end at a leaf.

Thus, to accommodate the new character str[i+1], a simple leaf extension is employed.

Rule 1 — Leaf Extension

At suffix extension $j$ in phase $i + 1$ , if the path str[j...i] in $I_{i}$ ends at a leaf, then extend that leaf by adjusting the label of the edge to account for the added character str[i+1].
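As a minimal sketch of rule 1, assume the space-efficient representation in which an edge stores a pair of 1-indexed positions into str instead of the label itself. The `Edge` class and the names `label` and `extend_leaf` are illustrative, not from the source. Under that assumption a leaf extension is a single increment:

```python
from dataclasses import dataclass


@dataclass
class Edge:
    start: int  # 1-indexed position of the first character of the label
    end: int    # 1-indexed position of the last character (inclusive)


def label(text, edge):
    """Return the substring str[start...end] that the edge represents."""
    return text[edge.start - 1:edge.end]


def extend_leaf(edge):
    """Rule 1: grow the leaf edge by one character, str[i+1]."""
    edge.end += 1


text = "abaaba$"
leaf_edge = Edge(start=1, end=2)   # leaf 1 in I_2, labelled "ab"
extend_leaf(leaf_edge)             # phase 3, extension 1
print(label(text, leaf_edge))      # aba
```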

Rule 2 Regular — Internal Split

At suffix extension $j$ in phase $i + 1$ , if the substring α = str[j...i] has appeared prior to $j$ , and every prior occurrence of α was followed by the same character x != str[i+1], then $I_{i}$ contains a root-to-leaf path that spells α followed by x (and any subsequent characters), so the path for α ends in the middle of an edge.

Thus, to accommodate the new character str[i+1], the edge must be split after α.

Rule 2 Regular — Internal Split

At suffix extension $j$ in phase $i + 1$ , if:

The path str[j...i] in $I_{i}$ doesn’t end at a leaf and,

The next character in the path is some x != str[i+1]

Then split the edge after str[j...i] by inserting a new internal node u followed by a leaf numbered $j$ , and assigning the character str[i+1] as the edge label between u and j
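The split can be sketched as follows. This is an illustration under assumptions, not the source's implementation: edges are stored as `(start, end, child)` tuples of 1-indexed inclusive positions, keyed by their first character, and `Node`, `split_edge` and the `matched` parameter (how many characters of the edge the path str[j...i] covers) are hypothetical names.

```python
class Node:
    def __init__(self, leaf=None):
        self.children = {}  # first character of edge label -> (start, end, child)
        self.leaf = leaf    # suffix number j for a leaf, None otherwise


def split_edge(text, parent, first_char, matched, j, i_plus_1):
    """Rule 2 regular: split an edge after `matched` characters and hang leaf j.

    Positions are 1-indexed and inclusive, as in str[start...end].
    """
    start, end, child = parent.children[first_char]
    u = Node()                                          # new internal node
    rest = start + matched                              # first position below u
    u.children[text[rest - 1]] = (rest, end, child)     # old continuation
    u.children[text[i_plus_1 - 1]] = (i_plus_1, i_plus_1, Node(leaf=j))
    parent.children[first_char] = (start, rest - 1, u)  # shortened upper edge
    return u


# Phase 4, extension 3 of "abaaba$": the path "a" ends inside the edge "abaa".
text = "abaaba$"
root = Node()
root.children["a"] = (1, 4, Node(leaf=1))
u = split_edge(text, root, "a", matched=1, j=3, i_plus_1=4)
print(sorted(u.children))   # ['a', 'b']
```

After the call, the root edge is shortened to str[1...1] = a, and u carries the old continuation str[2...4] = baa to leaf 1 alongside the new edge str[4...4] = a to leaf 3.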

Rule 2 Alternate — Internal Append

At suffix extension $j$ in phase $i + 1$ , if the substring α = str[j...i] has appeared prior to $j$ , and the prior occurrences of α were followed by at least two different characters x and y (neither equal to str[i+1]), then in $I_{i}$ the path representing α ends at an internal node, which branches out to both x and y (followed by any subsequent characters).

Thus, to accommodate the new character str[i+1], another branch is appended to the internal node after α.

Rule 2 Alternate — Internal Append

At suffix extension $j$ in phase $i + 1$ , if:

The path str[j...i] in $I_{i}$ doesn’t end at a leaf and,

The next characters in the path are some x, y != str[i+1] and,

An internal node already exists at the end of str[j...i]

Then create a new leaf numbered $j$ branching from the internal node u, and assign the character str[i+1] as the edge label between u and j
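A sketch of the append, under the same kind of assumptions as any toy representation: edges are `(start, end, child)` tuples of 1-indexed inclusive positions keyed by first character, and `Node` and `append_leaf` are illustrative names, not from the source. The root is treated like any other internal node here.

```python
class Node:
    def __init__(self, leaf=None):
        self.children = {}  # first character of edge label -> (start, end, child)
        self.leaf = leaf    # suffix number j for a leaf, None otherwise


def append_leaf(text, u, j, i_plus_1):
    """Rule 2 alternate: hang a new leaf j below the existing node u.

    The new edge is labelled with the single character str[i+1], stored as
    the 1-indexed inclusive pair (i+1, i+1).
    """
    new_char = text[i_plus_1 - 1]
    if new_char in u.children:
        raise ValueError("str[i+1] already follows the path: rule 3 applies")
    u.children[new_char] = (i_plus_1, i_plus_1, Node(leaf=j))


# Phase 2, extension 2 of "abaaba$": the empty path ends at the root,
# which so far only has the edge "ab" leading to leaf 1.
text = "abaaba$"
root = Node()
root.children["a"] = (1, 2, Node(leaf=1))
append_leaf(text, root, j=2, i_plus_1=2)
print(sorted(root.children))   # ['a', 'b']
```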

Rule 3 — Do Nothing

At suffix extension $j$ in phase $i + 1$ , if the substring α = str[j...i] has appeared prior to $j$ , and a prior occurrence of α was followed by the character z = str[i+1], then $I_{i}$ already contains a path representing α followed by str[i+1] = z.

In this case, no changes are required since the path str[j...i+1] is already present.

Rule 3 — Do Nothing

At suffix extension $j$ in phase $i + 1$ , if:

The path str[j...i] in $I_{i}$ doesn’t end at a leaf and,

The next character in the path is str[i+1],

Then str[j...i+1] is already in the tree, and no further action is needed.
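The four rules can be summarised as one decision, sketched below. The function `choose_rule` and its parameters are hypothetical names introduced for illustration; it assumes the caller has already located the end of the path str[j...i] and knows whether it ends at a leaf, whether it ends at a node (the root counts), and which characters continue the path in the tree.

```python
def choose_rule(ends_at_leaf, ends_at_node, next_chars, new_char):
    """Pick the extension rule from how the path str[j...i] ends.

    ends_at_leaf: True if the path ends at a leaf.
    ends_at_node: True if the path ends at the root or an internal node.
    next_chars:   characters that continue the path in the tree.
    new_char:     the character str[i+1] being appended.
    """
    if ends_at_leaf:
        return "1"
    if new_char in next_chars:
        return "3"
    return "2 alternate" if ends_at_node else "2 regular"


print(choose_rule(False, False, {"b"}, "a"))       # 2 regular
print(choose_rule(False, True, {"a"}, "b"))        # 2 alternate
print(choose_rule(False, True, {"a", "b"}, "a"))   # 3
```

The explicit `ends_at_node` flag matters because the root may have only one outgoing edge, so the number of continuations alone does not distinguish the two variants of rule 2.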

Naive Implicit Suffix Tree Construction

The following presents a base implementation of Ukkonen’s algorithm, without any optimisation.

def implicit_suffix_tree(str):
	n = len(str)
	Construct I₁
	for i from [1...n-1]:
		# Begin PHASE i + 1
		for j from [1...i+1]:
			# Begin EXTENSION j
			Follow the path str[j...i] from the root in the current state of the implicit suffix tree
			Extend this path by appending str[i+1] using the suffix extension rules
			# str[j...i+1] is now in the tree
		# End of PHASE i + 1 (I_{i+1} computed)

In each phase, the algorithm makes up to $i + 1$ extensions, and finding the end of each path str[j...i] takes up to $O (i - j + 1)$ time. Summing all character traversals across $n + 1$ phases gives a worst case time complexity of $O (n^{3})$ .
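As a rough check of the cubic bound, assume the walk in extension $j$ of phase $i + 1$ costs exactly $i - j + 1$ character steps, and that $ has been appended so that there are $n + 1$ phases ( $i = 0, \dots, n$ ). Summing over all extensions gives:

$$\sum_{i=0}^{n} \sum_{j=1}^{i+1} (i - j + 1) = \sum_{i=0}^{n} \frac{i (i + 1)}{2} = \frac{n (n + 1) (n + 2)}{6} = O (n^{3})$$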

Although this implementation is inefficient, its structure mirrors Ukkonen’s algorithm, which is derived by refining this approach to achieve linear time.

Example Of Constructing An Implicit Suffix Tree

Given a string str[1...7] = abaaba$, constructing an implicit suffix tree (using space-efficient representation) goes as follows:

Phase $i + 1 = 0$ | Prefix ε

Initially at phase $0$ , the implicit suffix tree $I_{0}$ contains no edges or suffixes, only the root node.

Phase $i + 1 = 1$ | Prefix a

At phase $1$ , each suffix str[j...i] is extended by character str[i+1] = str[1] = a.

The first extension $j = 1$ , extends the path str[1...0] = "" by a. This can be considered a “base case”, as $I_{0}$ only contains the root, so str[1] is never present in the tree.

As the path ends at an internal node (the root), rule 2 alternate applies: a new leaf labelled $j = 1$ is added directly to the root, with the edge label being str[i+1] = a

Phase $i + 1 = 2$ | Prefix ab

At phase $2$ , each suffix str[j...i] is extended by str[2] = b.

Extension $j = 1$ extends the path str[1...1] = a by b, forming str[j...i+1] = str[1...2] = ab. As the path ends at a leaf, rule 1 applies: the leaf is extended by adjusting the edge endpoint to $i + 1 = 2$ .

Extension $j = 2$ extends the path str[2...1] = "" by b. As the empty path ends at the root and no edge from the root starts with b, rule 2 alternate applies: a new leaf labelled $j = 2$ is added to the root.

Phase $i + 1 = 3$ | Prefix aba

At phase $3$ , each suffix str[j...i] is extended by str[3] = a.

Extension $j = 1$ extends the path str[1...2] = ab by a, forming aba. As the path ends at a leaf, rule 1 applies.

Extension $j = 2$ extends the path str[2...2] = b by a, forming ba. As this path ends at a leaf, rule 1 applies.

Extension $j = 3$ extends the path str[3...2] = "" by a, forming a. As the path str[j...i+1] = str[3...3] = a already exists in the tree, rule 3 applies: no changes are needed.

Phase $i + 1 = 4$ | Prefix abaa

At phase $4$ , each suffix str[j...i] is extended by str[4] = a.

Extension $j = 1$ extends the path str[1...3] = aba by a, forming abaa. As the path ends at a leaf, rule 1 applies.

Extension $j = 2$ extends the existing path str[2...3] = ba by a, forming baa. As this path ends at a leaf, rule 1 applies.

Extension $j = 3$ extends the path str[3...3] = a by a, forming aa. The path a exists implicitly (it ends in the middle of an edge, neither at a leaf nor at an internal node) and the next character on that edge is b, so rule 2 regular applies: an internal node is created after a, with two edges below it, one for the existing continuation baa and a new leaf edge for a.

Extension $j = 4$ extends the path str[4...3] = "" by a, forming a. As the path str[4...4] = a already exists, rule 3 applies.

Phase $i + 1 = 5$ | Prefix abaab

At phase $5$ , each suffix str[j...i] is extended by str[5] = b.

Extension $j = 1$ extends the path str[1...4] = abaa by b, forming abaab. As the path ends at a leaf, rule 1 applies.

Extension $j = 2$ extends the path str[2...4] = baa by b, forming baab. As this path ends at a leaf, rule 1 applies.

Extension $j = 3$ extends the path str[3...4] = aa by b, forming aab. As this path ends at a leaf, rule 1 applies.

Extension $j = 4$ extends the path str[4...4] = a by b, forming ab. As the path ab already exists, rule 3 applies.

Extension $j = 5$ extends the path str[5...4] = "" by b, forming b. As the path b already exists, rule 3 applies.

Phase $i + 1 = 6$ | Prefix abaaba

At phase $6$ , each suffix str[j...i] is extended by str[6] = a.

Extension $j = 1$ extends the path str[1...5] = abaab by a, forming abaaba. As the path ends at a leaf, rule 1 applies.

Extension $j = 2$ extends the path str[2...5] = baab by a, forming baaba. As this path ends at a leaf, rule 1 applies.

Extension $j = 3$ extends the path str[3...5] = aab by a, forming aaba. As this path ends at a leaf, rule 1 applies.

Extension $j = 4$ extends the path str[4...5] = ab by a, forming aba. As the path aba already exists, rule 3 applies.

Extension $j = 5$ extends the path str[5...5] = b by a, forming ba. As the path ba already exists, rule 3 applies.

Extension $j = 6$ extends the path str[6...5] = "" by a, forming a. As the path a already exists, rule 3 applies.

Phase $i + 1 = 7$ | Prefix abaaba$

At phase $7$ , each suffix str[j...i] is extended by str[7] = $.

Extension $j = 1$ extends the path str[1...6] = abaaba by $, forming abaaba$. As the path ends at a leaf, rule 1 applies.

Extension $j = 2$ extends the path str[2...6] = baaba by $, forming baaba$. As this path ends at a leaf, rule 1 applies.

Extension $j = 3$ extends the path str[3...6] = aaba by $, forming aaba$. As this path ends at a leaf, rule 1 applies.

Extension $j = 4$ extends the path str[4...6] = aba by $, forming aba$. As the path exists implicitly, rule 2 regular applies.

Extension $j = 5$ extends the path str[5...6] = ba by $, forming ba$. As the path exists implicitly, rule 2 regular applies.

Extension $j = 6$ extends the path str[6...6] = a by $, forming a$. As the path ends at an internal node, rule 2 alternate applies.

Extension $j = 7$ extends the path str[7...6] = "" by $, forming $. As the path ends at an internal node (the root), rule 2 alternate applies.

The additional phase $i + 1 = 7$ is no different to any other phase, and constructs the explicit suffix tree for str[1...7] = abaaba$.
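The walkthrough above can be reproduced with a small runnable sketch of the naive construction. This is one possible implementation under stated assumptions, not the source's code: the names `Node`, `Edge`, `naive_suffix_tree` and `leaf_paths` are illustrative, edge labels are stored as 0-indexed half-open Python slices `text[start:end]`, and the returned `log` of applied rules is added purely for inspection. Phase, extension and leaf numbers are reported from 1, as in the text.

```python
class Node:
    def __init__(self, leaf=None):
        self.children = {}  # first character of edge label -> Edge
        self.leaf = leaf    # 1-indexed suffix number for a leaf, else None


class Edge:
    def __init__(self, start, end, child):
        # The label is text[start:end] (0-indexed, end exclusive).
        self.start, self.end, self.child = start, end, child


def naive_suffix_tree(text):
    """Run every phase and extension naively; return (root, log).

    log holds one (phase, extension, rule) triple per extension.
    """
    root, log = Node(), []
    for i, c in enumerate(text):          # phase i + 1 appends c
        for j in range(i + 1):            # extension j + 1 extends text[j:i]
            # Follow text[j:i] from the root one character at a time. The
            # path is already in the tree, so characters are not re-compared.
            node, into, edge, used = root, None, None, 0
            for ch in text[j:i]:
                if edge is None:
                    edge, used = node.children[ch], 0
                used += 1
                if used == edge.end - edge.start:   # reached the edge's child
                    node, into, edge, used = edge.child, edge, None, 0
            if edge is None and node.leaf is not None:
                into.end += 1                       # rule 1: lengthen leaf edge
                rule = "1"
            elif edge is None and c in node.children:
                rule = "3"                          # c already follows the path
            elif edge is None:
                node.children[c] = Edge(i, i + 1, Node(leaf=j + 1))
                rule = "2 alternate"                # new leaf under a node
            elif text[edge.start + used] == c:
                rule = "3"                          # c is next on this edge
            else:
                cut = edge.start + used             # split the edge at cut
                mid = Node()
                mid.children[text[cut]] = Edge(cut, edge.end, edge.child)
                mid.children[c] = Edge(i, i + 1, Node(leaf=j + 1))
                edge.end, edge.child = cut, mid
                rule = "2 regular"
            log.append((i + 1, j + 1, rule))
    return root, log


def leaf_paths(node, text, prefix=""):
    """Map each leaf number to the string spelled from the root to that leaf."""
    if node.leaf is not None:
        return {node.leaf: prefix}
    paths = {}
    for edge in node.children.values():
        label = text[edge.start:edge.end]
        paths.update(leaf_paths(edge.child, text, prefix + label))
    return paths


root, log = naive_suffix_tree("abaaba$")
for phase, j, rule in log:
    if phase == 4:
        print(f"phase {phase}, extension {j}: rule {rule}")
print(leaf_paths(root, "abaaba$"))
```

Because leaf edges store explicit end positions, rule 1 has to touch every leaf in every phase, which is one of the redundancies the optimised algorithm removes.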
