Suffix-Based Tree Data Structures

The substring matching problems asks to pre-process a text[1...n], so that any pattern pat[1...m] can be searched in $O (m)$ time. Unlike exact pattern matching, where the pattern is known upfront, substring matching requires building a structure from the text to support efficient multiple-pattern queries.

Suffix-Based Tree Data Structures can be constructed from the text to support pattern matching in $O (m)$ time.

In string processing, a unique terminal symbol $, chosen as the lexicographically smallest character, is often appended to mark the end of the string and ensure all suffixes are distinct.

Suffix Tries

A Suffix Trie is a rooted tree where each edge is labelled with a character, and each root-to-leaf path represents a suffix of the input string.

It stores all suffixes of a given string in a compact, prefix-sharing structure.

Example Of A Suffix Trie

str[1...6] = refer$ stores a root-to-leaf path for each suffix of str: refer$, efer$, fer$, er$, r$, and $.

The suffix trie for

Suffix Trees

A suffix tree for a string str[1...n] is a path compressed suffix trie, with the following properties:

It has exactly $n$ leaves numbered $1$ through $n$ , representing the suffix start index
every internal node, except for the root, has at least two children
Each edge is labelled with a non-empty substring of str
Every node has at most one edge beginning with any given character of the alphabet.
The string label of the path from root to leaf $i$ is the suffix str[i...n] for $1 \leq i \leq n$

Example Of A Suffix Tree

MULTI COLUMN

str[1...6] = refer$ is a path compressed suffix trie. Note that it satisfies the above properties:

The suffix tree for

$6$ leaves numbered $1$ through $6$

Every internal node has at least two children

Each edge is labelled with a substring of str

At every node no two edges start with the same character,

Any path from the root to a leaf node will yield a suffix of str.

Naive Construction Of Suffix Trees (Pseudocode)

def naive_suffix_tree(str[1...n]):
    initialise root node
    for i from [1...n]:
        suffix = str[i...n]
        current = root
        j = 0
        while j < length(suffix):
            if there is an edge from current starting with suffix[j]:
                follow the edge and match characters
                if mismatch occurs mid edge:
                    split the edge, insert a new node and create a new leaf
                    break
                else:
                    advance to next node
            else:
                create a new edge labeled suffix[j...end] from current
                break

The naive suffix tree construction has a time complexity of $O (n^{2})$ . It generates $n$ suffixes, and for each suffix str[i...n], it may require traversing the tree from the root, potentially examining up to $n - i + 1$ characters. The total comparisons across all suffixes thus sum to $O (n^{2})$ .

Ukkonen’s algorithm offers a more efficient, linear-time approach to building suffix trees.

Implicit Suffix Trees

An implicit suffix tree for a str[1...n] is the suffix tree constructed for str, without appending a terminal character ($).

The terminal symbol ensures all suffixes are distinct, resulting in $n$ leaves (one for each suffix).
By omitting $, only unique paths are represented as suffixes with matching prefixes share branches.

Example Of An Implicit Suffix Trie

The implicit suffix tree for str[1...6] = refer$ is simply the suffix tree for str[1...5] = refer (without considering the $).

Additionally, A full suffix tree is obtained by appending $ to each suffix contained within the tree.

Space Efficient Representation

Explicitly storing the substring labels of each edge in a suffix tree requires $O (n^{2})$ space. A more efficient representation is implicitly using a tuple $(j, i)$ to denote the substring str[j...i], where $1 \leq (j \leq i) \leq n$ .

This reduces the space complexity to $O (n)$ , assuming a fixed size alphabet.

Example Of A Space Efficient Suffix Tree

The space efficient suffix tree for str[1...6] = refer$ utilises tuples to store indices instead of substrings

Quartz 4

Explorer

Suffix-Based Tree Data Structures

Suffix Tries

Suffix Trees

Implicit Suffix Trees

Space Efficient Representation

Graph View

Table of Contents

Backlinks