Speeding Up Traversals With Suffix Links

In phase $i + 1$, the naive method traverses $O(i^{2})$ characters, since each extension $j$ follows the path str[j...i] from the root. The next extension $j + 1$ follows str[j+1...i], which overlaps entirely with the previous path except for str[j].

This redundancy can be avoided with shortcuts that exploit the structure shared between successive extensions.
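As a quick sanity check on the quadratic bound, the following sketch counts the characters walked in a single phase when every extension starts from the root. The function name `naive_phase_cost` is illustrative only and does not come from the algorithm's description.

```python
def naive_phase_cost(i: int) -> int:
    """Characters traversed in phase i + 1 when every extension starts at the root.

    Extension j (1 <= j <= i) walks the path str[j...i], which has i - j + 1
    characters; the final extension j = i + 1 walks the empty path.
    """
    return sum(i - j + 1 for j in range(1, i + 2))


for i in (1, 8, 100):
    # 1 + 2 + ... + i has the closed form i(i + 1) / 2, which is O(i^2).
    assert naive_phase_cost(i) == i * (i + 1) // 2

print(naive_phase_cost(8))  # phase 9 of abacabade
```

For the phase that extends abacabad by e, this gives 36 character comparisons before any extension rule is applied.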

Suffix Links

A suffix link is a pointer from one internal node u to another internal node v in an implicit suffix tree, where:

The path to u is labelled by str[j...k−1]
The path to v is labelled by str[j+1...k−1]

Decompose the path str[j...k−1] into x = str[j] and β = str[j+1...k−1], so that str[j...k−1] = xβ. If an internal node’s path is labelled xβ, its suffix link points to the node whose path is labelled β.

These links serve as shortcuts that speed up traversal by removing the need to re-traverse shared substrings.

Corner Cases To Consider

If the path from r to u yields a single-character substring, the path from r to v yields the empty substring (i.e. v = r).

If u = r, then the substring traversed is empty and so v = r as well. The root therefore always links to itself.

Example Of A Suffix Link

In the implicit suffix tree for str[1...8] = abacabad, examining each internal node reveals the suffix links:

The path from r to u represents the substring “a” (x = a, β = ε) and has a suffix link pointing back to the root r, since its β is the empty string.

The path from r to w represents the substring “ba” (x = b, β = a) and has a suffix link pointing back to u, which corresponds to the path β = a.

The path from r to v represents the substring “aba” (x = a, β = ba) and has a suffix link pointing back to w, which corresponds to the path β = ba.
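The links in this example can be reproduced by brute force. The sketch below identifies each node with its path label and relies on one assumption that is not stated above: a non-root internal node exists exactly where a substring is followed by at least two distinct characters. The helper names are hypothetical.

```python
def internal_node_labels(text: str) -> set[str]:
    """Path labels of the internal nodes of the implicit suffix tree of `text`.

    Assumption: a substring labels a non-root internal node exactly when it is
    followed by two or more distinct characters somewhere in `text`.
    """
    followers: dict[str, set[str]] = {}
    for start in range(len(text)):
        for end in range(start + 1, len(text)):
            # text[start:end] is followed by the character text[end].
            followers.setdefault(text[start:end], set()).add(text[end])
    labels = {label for label, chars in followers.items() if len(chars) >= 2}
    labels.add("")  # the root has the empty label
    return labels


def suffix_links(text: str) -> dict[str, str]:
    """Map each internal node label x + beta to the label beta it links to."""
    # Slicing the empty label gives the empty label, so the root links to itself.
    return {label: label[1:] for label in internal_node_labels(text)}


links = suffix_links("abacabad")
for source in sorted(links, key=len):
    print(repr(source), "->", repr(links[source]))
```

For abacabad the labels found are the empty string, a, ba and aba, which correspond to r, u, w and v, and the links run aba → ba → a → root.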

Active Node & Remainder

In any given extension:

The active node is the internal node at the end of the path str[j...k-1] under whose direct edge the suffix extension rules are applied.
The remainder is the substring str[k...i] remaining below the active node, and before the extended character str[i+1].

These variables are used to keep track of the extension point—the point in the tree where the next extension rule is applied (i.e. the end of the path labelled str[j...i]).

If the active node is the root, the remainder is updated by removing the first character of the current substring

The remainder is the portion of the string that remains to be extended.

When the active node is the root, its suffix link points to itself.

A suffix link normally points to a node whose path has one fewer character (the first character is dropped), but the root’s path is already empty.

The first character is therefore removed from the remainder instead.

E.g. if the active node is r and the remainder is str[k...i] = cabad, following the suffix link updates the remainder to str[k+1...i] = abad, discarding c.
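A minimal sketch of this update, assuming nodes are represented by their path labels (with the root as the empty string). The function `follow_suffix_link` is a hypothetical helper; it returns the node to continue from and the string still to be traversed below it, not the final active node of the next extension.

```python
def follow_suffix_link(active_label: str, remainder: str) -> tuple[str, str]:
    """Move from the extension point of str[j...i] towards that of str[j+1...i]."""
    if active_label == "":
        # The root links to itself, so the leading character is dropped
        # from the remainder instead of from the node's path.
        return "", remainder[1:]
    # Otherwise the link target's label is the active label without its
    # first character, and the remainder is left unchanged.
    return active_label[1:], remainder


print(follow_suffix_link("", "cabad"))     # ('', 'abad')
print(follow_suffix_link("aba", "cabad"))  # ('ba', 'cabad')
```

In both cases the concatenation of the two returned strings is the previous path with its first character removed.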

Using Suffix Links

Starting at extension $j = 1$ of phase $i + 1$, there isn’t enough information to use suffix links, so the algorithm must first naively locate the initial extension point by traversing the tree from the root.

First Extension Of A Phase

The First Extension of a Phase is Always Rule 1

At the start of any phase $i + 1$ , the first extension ( $j = 1$ ) always follows Rule 1.

This is because the longest existing path in the tree at this point is str[1...i], which ends at a leaf and has not yet been extended.

Therefore, the next character str[i+1] is simply appended to this leaf.

This behaviour follows from a more general structural property of suffix trees—see Lemma 2.

This provides a guaranteed, well-structured starting point for the phase. Since the path ends at a leaf:

The active node is the parent of this leaf, and
The remainder is the substring remaining below the parent (excluding str[i+1]).

After this first extension, the tree has sufficient structure, and the algorithm has enough information to support suffix link traversal in subsequent extensions.

Extension $j = 1$ of phase $i + 1 = 9$

The extension j = 1 in phase i + 1 = 9 of the implicit suffix tree construction for str[1...9] = abacabade requires traversing str[1...8] = abacabad, which leads to leaf $1$ where a rule 1 extension is performed to add str[9] = e.

Here the active node would be v and the remainder is str[4...8] = cabad

The active node and remainder for extension $j = 1$ are then used for the next extension $j = 2$

Subsequent Extensions

Extension $j = 2$ of phase $i + 1 = 9$

To find the end of str[2...i] and extend it by str[i + 1]:

From the active node v, follow its suffix link to w

From w, traverse down along the remainder substring str[k...i] = str[4...8] = cabad

Apply the pertinent suffix extension rule

The path to v is the substring str[1...3] = aba, and the remainder substring is str[4...8] = cabad, which is the remaining portion of the path to the $j = 1$ extension point (the end of the path labelled str[1...8])

By following the suffix link to w, the algorithm avoids re-traversing the path str[2...3] = ba, allowing it to directly traverse down the remainder substring str[4...8] to reach the $j = 2$ extension point (the end of the path labelled str[2...8])

After the $j = 2$ extension, the new active node is w, while the remainder remains as str[4...8] = cabad

Extension $j = 3$ of phase $i + 1 = 9$

To find the end of str[3...i] and extend it by str[i + 1]:

From the active node w, follow its suffix link to u

From u, traverse down along the remainder substring str[4...8] = cabad

Apply the pertinent suffix extension rule

Once again, note that if the active node is the root r, then the suffix link points to itself. In that case the first character of the remainder substring must be removed to move the extension point from the end of the path str[j...i] to the end of the path str[j+1...i].

General Extension Procedure Using Suffix Links

For any extension $j$ of phase $i + 1$ , where $1 \leq j \leq i + 1$ :

Base Case: At extension $j = 1$, suffix links are not used. Instead:
- Naively locate the initial extension point by traversing the tree from the root.
- Determine the active node u and the remainder str[k...i].
- Apply a rule 1 extension (always the case for the first extension).

For extensions $j > 1$, given the active node u and the remainder str[k...i] (which may be empty) from the previous extension $j - 1$:
- Follow the suffix link from u to the receiving node v, and traverse from v to the end of the remainder.
- Note that if u = r, then v = r, and the first character must be removed from the remainder before traversing.

Once the remainder is fully traversed, the extension point is reached, and the appropriate suffix extension rule is applied.

After each extension, the active node and remainder are updated, and the algorithm moves to the next extension $j + 1$.

This procedure enables efficient traversal between extension points by removing the need to traverse the entire path str[j...i] for each extension.
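The procedure can be traced on the running example with a small simulation. This is only a sketch under stated assumptions: nodes are identified by their path labels, internal nodes are found by brute force (a label followed by two or more distinct characters), and re-splitting the path stands in for walking down the tree from the link target. All function names are hypothetical.

```python
def is_internal(text: str, label: str) -> bool:
    """True for the root, or if `label` is followed by >= 2 distinct characters."""
    if label == "":
        return True
    following = {
        text[k + len(label)]
        for k in range(len(text) - len(label))
        if text.startswith(label, k)
    }
    return len(following) >= 2


def split_at_active_node(text: str, path: str) -> tuple[str, str]:
    """Split `path` into (deepest internal node label on it, remainder below it)."""
    cut = len(path)
    while not is_internal(text, path[:cut]):
        cut -= 1  # stops at cut == 0 at the latest, since the root is internal
    return path[:cut], path[cut:]


def trace_phase(text: str) -> list[tuple[str, str]]:
    """(active node label, remainder) for extensions j = 1..i+1 of phase i + 1.

    `text` is str[1...i]; the appended character does not affect the trace.
    """
    # Base case j = 1: locate the extension point naively from the root.
    active, remainder = split_at_active_node(text, text)
    trace = [(active, remainder)]
    for j in range(2, len(text) + 2):
        if active == "":
            # The root links to itself, so drop the first remainder character.
            remainder = remainder[1:]
        else:
            # Follow the suffix link: the target's label loses its first character.
            active = active[1:]
        path = active + remainder
        assert path == text[j - 1:]  # the extension point is the end of str[j...i]
        # Traversing the remainder may pass deeper internal nodes on the way down.
        active, remainder = split_at_active_node(text, path)
        trace.append((active, remainder))
    return trace


for j, (active, remainder) in enumerate(trace_phase("abacabad"), start=1):
    print(j, repr(active), repr(remainder))
```

For phase 9 the first four extensions keep the remainder cabad while the active node moves through aba, ba, a and the root (v, w, u and r). Extension 5 then starts from the root, drops c, and ends below aba with remainder d.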

Every Internal Node Has An Outgoing Suffix Link

The general extension procedure using suffix links assumes that every internal node in the implicit suffix tree has an outgoing suffix link. To prove this, there are some important lemmas to consider:

Lemma 1: Rule 2 extensions are never followed by rule 1 extensions

If extension $j$ of phase $i + 1$ is performed using rule 2, then extension $j + 1$ will not be performed using rule 1

If rule 2 applies at extension $j$, the path str[j...i] doesn’t end at a leaf and is followed by at least one other character in the tree.

Every character that follows str[j...i] also follows str[j+1...i], so in the next extension $j + 1$ the path str[j+1...i] cannot end at a leaf either.

Rule 1 only applies if the path ends at a leaf, so it cannot apply here.

Therefore, a rule 2 extension is always followed by either rule 2 or rule 3 in the next extension of the same phase, never rule 1.

Rule 2 Regular:

Rule 2 Alternate:

This lemma is actually a corollary of Lemma 2
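Lemma 1 can be checked empirically with a brute-force rule classifier. The sketch below does not build a tree. It assumes the rule for extension $j$ can be read off from substring occurrences in str[1...i]: rule 3 if the path followed by str[i+1] already occurs, rule 1 if the non-empty path occurs only once (so it ends at a leaf), and rule 2 otherwise. The function names are hypothetical.

```python
def extension_rule(text: str, i: int, j: int) -> int:
    """Rule (1, 2 or 3) used by extension j of phase i + 1, with 1-based i and j."""
    prefix = text[:i]      # str[1...i], the string represented by I_i
    path = text[j - 1:i]   # str[j...i]; empty when j = i + 1
    new_char = text[i]     # str[i+1]
    if path + new_char in prefix:
        return 3           # the path already continues with str[i+1]
    occurrences = sum(prefix.startswith(path, k) for k in range(i - len(path) + 1))
    if path and occurrences == 1:
        return 1           # the path occurs only as a suffix, so it ends at a leaf
    return 2               # a new leaf is added (rule 2 regular or alternate)


def rules_in_phase(text: str, i: int) -> list[int]:
    """Rules applied by extensions j = 1..i+1 of phase i + 1."""
    return [extension_rule(text, i, j) for j in range(1, i + 2)]


def lemma_1_holds(text: str) -> bool:
    """Check that no rule 2 extension is immediately followed by rule 1."""
    for i in range(1, len(text)):
        rules = rules_in_phase(text, i)
        if any(a == 2 and b == 1 for a, b in zip(rules, rules[1:])):
            return False
    return True


print(rules_in_phase("abacabade", 3))  # phase 4 appends c to aba
print(lemma_1_holds("abacabade"))
```

In phase 4 the rules are 1, 1, 2, 2: once rule 2 has been used, rule 1 does not reappear in that phase.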

Lemma 2: There's more structure below the shorter path

The subtree below some path str[j...i] in the implicit suffix tree $I_{i}$ is a subtree of the subtree below the path str[j+1...i].

“Subtree below some path str[j...i]” → Follow the path str[j...i] in the tree, the subtree referenced is the subtree formed by any subsequent characters after that path.

If str[j...i] is added to the tree at extension $j$ of phase $i$ ,

Then str[j+1...i] is added in the next extension $j + 1$ .

If in a later phase, str[j...i]x is inserted (where $x$ is non-empty),

Then in the next extension str[j+1...i]x will also be inserted.

Thus, by the end of any later phase: Every extension under str[j...i] also exists under str[j+1...i].

However, the converse does not hold: not every occurrence of str[j+1...i] is preceded by str[j], so the subtree below str[j+1...i] may include additional branches that the subtree below str[j...i] does not.

The subtree below str[1...3] = aba has two branches: one to d, another to cabad.

It is identical to the subtree below str[2...3] = ba because every extension of aba is followed by an extension of ba.

In contrast, the subtree below str[3] = a also contains every branch found below ba, but has extra structure.

That’s because a appears in other places (e.g., str[1], str[5]) where it’s not part of ba, and those branches (like one beginning with b) are present only under a, not ba.

This additional structure is what allows rule 3 to follow rule 2 — even if str[j...i] isn’t followed by the next character, str[j+1...i] might be, due to these additional branches
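One way to see Lemma 2 on the example is to compare the sets of strings that follow each path. The sketch below treats the subtree below a path as the set of non-empty continuations of that path in the string, which is a simplifying assumption, and `continuations` is a hypothetical helper name.

```python
def continuations(text: str, path: str) -> set[str]:
    """Non-empty strings that follow an occurrence of `path` in `text`."""
    return {
        text[k + len(path):]
        for k in range(len(text) - len(path))  # skips an occurrence at the very end
        if text.startswith(path, k)
    }


text = "abacabad"
below_aba = continuations(text, "aba")
below_ba = continuations(text, "ba")
below_a = continuations(text, "a")

print(sorted(below_aba))
print(sorted(below_ba))
print(sorted(below_a))

# Lemma 2: everything below x + beta also appears below beta.
assert below_aba <= below_ba <= below_a
```

Here aba and ba have the same continuations (cabad and d), while a additionally has continuations beginning with b.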

Theorem 1: Every internal node in the implicit suffix tree $I_{i}$ has an outgoing suffix link.

Let $P(k)$ be the statement that every internal node in the implicit suffix tree $I_{k}$ has an outgoing suffix link. We will prove this statement by mathematical induction on $k$.

Base Case:

Let $k = 1$ and consider $I_{1}$ , which contains a single internal node, the root. By definition the root has an outgoing suffix link that points to itself and so $P (1)$ is true.

Inductive Hypothesis

Assume that $P(i)$ is true, where $i \geq 1$. That is, assume that every internal node in $I_{i}$ has an outgoing suffix link.

Inductive Step

In phase i + 1 the implicit suffix tree $I_{i + 1}$ is constructed from $I_{i}$ .

By the inductive hypothesis every internal node in $I_{i}$ has an outgoing suffix link at the beginning of the phase and no suffix links are ever removed from the tree.

Thus, all that remains is to prove that every new internal node created in phase $i + 1$ has an outgoing suffix link by the conclusion of the phase.

A new internal node v is only ever added to the tree, at some extension $j$, by a rule 2 regular extension that splits an existing edge. For this to occur, the path labelled str[j...i] must be continued by a character x such that x != str[i+1].

![[Speeding Up Traversals With Suffix Links.png|An example of a Rule 2 extension adding a new internal node v to the implicit suffix tree]]

By Lemmas 1 and 2 we know that extension $j + 1$ can only be performed using rule 2 or rule 3, so there are only two cases to consider:

Case 1

The path labelled str[j+1...i] continues only via x != str[i+1] and so extension j + 1 will also be performed via rule 2 regular and a new internal node w will be created.

The path to this new internal node is str[j+1...i] and so by definition the outgoing suffix link from v points to w

![[suffix_link_existence_2.png|If a new internal node v is created in extension j and a new internal node w is created in extension j + 1, then the outgoing suffix link from v points to w]]

Case 2

The path labelled str[j+1...i] ends in an existing internal node u, with one branch extending below via x != str[i+1] and other branches via other characters, i.e. the subtree under str[j+1...i] is larger than the subtree under str[j...i].

If none of these characters are str[i + 1], then the j + 1 extension is performed using rule 2 alternate. Otherwise, if one of the characters is str[i + 1], then the extension is performed using rule 3.

In both cases, the outgoing suffix link from the node created in extension $j$ points to u by definition.

![[suffix_link_existence_3.png|If a new internal node w is created in extension $j$ and the path str[j+1...i] ends in an existing node u directly below which either a rule 2 alternate, or rule 3 is performed, then the outgoing suffix link from w points to u by definition]]

Thus, whenever a new internal node is created, its suffix link is resolved in the very next extension of the phase. This, however, relies on the fact that a new internal node is never created in the last extension of a phase.

Note that the last extension in any phase $i + 1$ traverses the empty path str[i+1...i] = "", so the extension is made directly from the root.

The suffix str[i+1] is either added via a rule 2 alternate extension, or via a rule 3 extension, and so no new internal node is created.

With this we have shown that by the end of phase $i + 1$ , every internal node in $I_{i + 1}$ will have an outgoing suffix link and so $P (i + 1)$ is true.

Therefore, by the principle of mathematical induction $P (k)$ is true for all $k \geq 1$ .
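The statement of Theorem 1 can also be checked by brute force on small inputs. The sketch below is not part of the proof. It assumes that internal nodes correspond to substrings followed by at least two distinct characters, and verifies for every prefix str[1...k] that removing the first character of an internal node's label yields another internal node's label. The function names are hypothetical.

```python
def internal_node_labels(text: str) -> set[str]:
    """Labels of internal nodes: the root plus every branching substring."""
    followers: dict[str, set[str]] = {}
    for start in range(len(text)):
        for end in range(start + 1, len(text)):
            followers.setdefault(text[start:end], set()).add(text[end])
    return {""} | {label for label, chars in followers.items() if len(chars) >= 2}


def all_links_defined(text: str) -> bool:
    """Check P(k) for k = 1..len(text): every link target is an internal node."""
    for k in range(1, len(text) + 1):
        labels = internal_node_labels(text[:k])
        # The suffix link of x + beta must land on a node labelled beta.
        if any(label[1:] not in labels for label in labels):
            return False
    return True


for sample in ("abacabad", "mississippi", "abcabxabcd"):
    print(sample, all_links_defined(sample))
```

The check passes for these samples because any two characters that follow xβ also follow β, which is the observation behind both cases of the inductive step.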

Theorem 1 proves that the general extension process is sound as the active node is never a newly created internal node and so its outgoing suffix link is always defined.

However, even when using suffix links, the method still requires $O(n^{3})$ time, as it naively traverses the remainder substring character by character in every iteration.

There are four optimisation tricks to ensure the algorithm performs more efficiently.
