To ensure Ukkonens Algorithm maintains linear-time complexity, the general incremental procedure is optimised through four implementation tricks.
Space-Efficient Representation Of Edge Labels
Explicitly storing the substring labels of each edge in a suffix tree requires space.
Such a representation prevents the development of a linear-time construction method, as it would be impossible to even output the resulting tree in time.
A more efficient approach involves implicitly representing edge labels, as follows:
Define a tuple to denote the substring
str[j...i], where .
This approach reduces the space complexity to , assuming a fixed-size alphabet.
Example Of A Space Efficient Suffix Tree
The space efficient suffix tree for
str[1...6] = refer$utilises tuples to store indices instead of explicit substrings.
Skip Counting
When extending a suffix tree, a path labelled by the remainder substring must already exist below the active node. Therefore, it is unnecessary to traverse this path character by character.
The skip-counting process optimises this redundancy by following these steps:
- At each internal node, check the first character of each outgoing edge against the current position in the remainder, to determine which edge continues the path.
- Use the number of characters along each edge to determine how far the traversal progresses through the remainder substring.
- Skip to the next internal node along the path, rather than traversing one character at a time.
- Repeat this skip-count process, possibly passing through multiple internal nodes, until the extension point is reached.
By applying this optimisation, the time required for traversal depends on the number of nodes along the path, rather than the sum of the edge lengths.
When combined with suffix links, skip-counting reduces the worst-case time complexity of the extension process to
While skip-counting down the remainder substring, the active node should be continually updated to the most recent (or deepest) internal node visited.
This ensures that after traversal, the active node is positioned immediately above the extension point, as required.
Example Of Skip Counting
Imagine an extension is made below a node
u, and the remainder substring isstr[k...i] = zabcdefghy. We first traverse the suffix link fromutov, and then traverse down the remainder using skip-counting as follows:
- From node
v, determine how many characters are on the edge beginning withz, continuing until the next internal node is reached.
- In this case, there are characters (
za)- As the length of the remainder is and , traversing this edge won’t reach the end of the remainder.
- So skip to the node receiving that edge
- The next character from the remainder to look for is at position (
b), thus determine the amount of characters on the edge beginning withb, until the next internal node is reached
- Again, there are are characters (
bc).- After traversing this edge, characters of the remainder have been traversed.
- Since , skip to the node receiving that edge and continue traversing down the tree.
- The next character to look for is at position (
d),
- There are characters (
def) on the edge starting withd.- Since , it is safe to skip to the node receiving that edge.
- The next character to look for is at position (
g)
- There are characters (
ghyx) on the edge starting withg- Since , traversing the entire edge would go beyond the end of the remainder substring
- Therefore, only traverse partway down this edge, skipping over characters to reach the extension point.
- The appropriate extension can then be performed.
Remember to continually update the active node
The Show Stopper Rule
Lemma 3: Rule 3 extensions are followed by rule 3 extensions
If in any phase the extension is performed via rule 3, then every extension after () for the remainder of the phase, will be performed using rule 3.
- This is a consequence of Lemma 2
- If a rule 3 occurs at extension , then the path
str[j...i]must be continued bystr[i+1]- Via Lemma 2, any character
xthat continues the pathstr[j...i], must also continue the pathstr[j+1...i]- Thus, extension must also be performed using rule 3, and similarly, extension uses rule 3, and so on until the end of the phase
The showstopper rule states that if extension in phase is performed using rule 3, the phase can be immediately terminated and extension for phase can begin.
As rule 3 extension don’t alter the state of the suffix tree, it would remain unchanged until the first extension of the next phase (which is always rule 1 after the base case, phase ).
Therefore, the traversal of all extension points for in phase can be skipped, and the process can move directly to the first extension of phase .
Example Of The Showstopper Rule
Consider phase in the implicit suffix tree construction for
str[1...9] = abacabade.Extension in this phase is the first rule 3 extension, where
str[5...7] = abais being inserted. This path already exists in the tree asstr[5...7]is a proper prefix of the longer suffixstr[1...7], which was added in the earlier extension.
Similarly, in extension ,
str[6...7] = baalready exists as a proper prefix of the longer suffixstr[2...7].
Finally, in extension ,
str[7] = aalso exists as a proper prefix of the longer suffixstr[3...7].
As observed, after the first rule 3 extension at , all subsequent extensions were also performed using rule 3.
This means that after extension of phase , the process could have immediately moved to extension of phase
Rapid Leaf Extension
Lemma 4: Rule 1 extensions remain rule 1 extensions in the next phase
If extension of phase is performed using rule 1, then the same extension in the next phase will also be performed using rule 1.
- A rule 1 extension at phase implies the path
str[j...i−1]ends at a leaf , and is extended tostr[j...i].- These extension are the only ones to alter existing leaf path.
- Extensions after in phase traverse shorter paths than
str[j...i], so they never reach leaf .- Extensions before in phase traverse longer paths than
str[j...i], and also cannot reach leaf .Therefore, at extension in phase , the path
str[j...i]still ends at leaf , and is therefore extended using rule 1
Lemma 5: Rule 2 extensions are rule 1 extensions in the next phase
If extension of phase is performed using rule 2, then the same extension in the next phase will be performed using rule 1.
- A rule 2 extension at phase creates a new leaf numbered whose path from the root is
str[j...i]- By the same reasoning used in proof of Lemma 4, the path
str[j...i]is unaltered until extension of phaseTherefore, at phase , extension traverses the path
str[j...i]to reach leaf and extends it tostr[j...i+1]using rule 1
The regular suffix tree for str[1...n] contains leaves, one for each suffix.
The only extensions that create a leaf are rule 2 extensions, and the only extensions that update the path to a leaf are rule 1 extensions.
Intuitively, Lemma 5 shows that a suffix str[j...n] is added to the tree via a rule 2 extension in some phase .
In combination with Lemma 4, the prefix str[j...i] of the full suffix is then extended by one character in each subsequent phase via rule 1 extensions, until phase where the full suffix exists implicitly in the tree.
In summary:
- Rule 2 extensions add new suffixes to the tree.
- Rule 1 extensions grow existing suffixes in the tree.
Once a suffix is added to the tree, it is never added again and so if extension in phase is performed using rule 2, then any extension for subsequent phases will be performed using rule 1.
Optimising This Redundancy
The goal is to track how many extensions have already been added using rule 2 before phase . To do this, define:
last_jas the last extension that created a leaf via rule 2.
Such that, in each phase, all extensions will be rule 1 extensions.
Once a leaf is added, the only change to the edge involves extending it by one character in each phase, using space-optimal representation, this involves incrementing the end index.
As all leaf edges share the same end index in each phase, simply have a global pointer that is incremented in each phase. That is define:
globalEndas a shared end index for all leaves
By setting globalEnd = i at the start of phase , all necessary rule 1 extensions are performed implicitly without traversal. Therefore, the first last_j extensions in phase are handled in time, and explicit extensions start from
Rapid Leaf Extensions
Given
str[1...n], in Ukkonen’s algorithm, all rule 1 extensions are handled implicitly as follows:
- Initialise
last_j = 0andglobalEnd = 0.- At the start of each phase , increment
globalEnd.- Begin explicit extensions from .
- If an explicit extension uses rule 2:
- Create a new leaf numbered and attach it appropriately.
- Set the edge’s
endIndexas a pointer toglobalEnd.- Increment
last_j.
This mechanism allows all early rule 1 extensions to be performed in constant time.
Additional Steps To Rule 3s
Lemma 6: Rule 3 extensions are rule 2 or rule 3 extensions in the next phase
If extension of phase is performed using rule 3, then the same extension in the next phase will be performed using either rule 2 or rule 3.
- A rule 3 extension occurs when the substring
str[j...i]already exists in the tree and the path ends inside an edge (i.e. not at a leaf or internal node).- As rule 3 makes no structural changes, the tree remains unchanged in the next phase.
- In phase , extension attempts to extend
str[j...i]byi+1. This path may:
- Already exist → Another rule 3 extension
- Not exist → Since it doesn’t end at a leaf, it must be a rule 2 extension.
- Additionally, the first extension of any phase (after the base case) always ends at a leaf and is therefore always a rule 1, never rule 3.
Intuitively, Lemma 6 shows that although a rule 3 extension doesn’t alter the tree, there is still “work waiting to happen”.
If extension in phase is a rule 3 extension:
- As no internal node is created,
last_jis not incremented - By the show stopper rule, the next phase immediately begins
Therefore, the next explicit (involves traversal) extension occurs at extension in phase (all extensions up to and including last_j, are rule 1’s that don’t require traversal due to globalEnd)
An issue with this is that:
- Extension in phase traverses the path
str[j...i-1], storingstr[k...i-1]as the remainder - Whereas, extension of phase traverses the path
str[j...i] - However, the remainder is one character short — it lack
str[i].
Additionally, the remainder below the active node u is only traversed by skip counting, and as rule 3 doesn’t create an internal node, no suffix link is followed.
Rule 3 Extensions In Ukkonen's
If extension of phase is performed using rule 3, the following steps are taken:
- Append
str[i]to the remainder, extending it fromstr[k...i−1]tostr[k...i].- If a previous extension created an internal node
v, set its outgoing suffix link fromvto the current active nodeu- Terminate phase using the showstopper rule.
Summarising Each Optimisation
The first two optimisations target cost per extension:
- The space-efficient representation reduces the cost of updating and storing leaf edges, enabling rapid leaf extension by allowing all leaves to share a common end index.
- Skip counting reduces traversal time by limiting traversals to the number of nodes visited rather than the total number of characters.
The next two optimisations target the total number of extensions
- The show stopper rule terminates phases early when further extensions are guaranteed to be redundant.
- Rapid leaf extension eliminates the need to explicitly handle many rule 1 extensions,
Together, these strategies transform Ukkonen’s algorithm from to by both reducing the cost per extension and the total number of extensions.




