The substring matching problems asks to pre-process a text[1...n], so that any pattern pat[1...m] can be searched in time. Unlike exact pattern matching, where the pattern is known upfront, substring matching requires building a structure from the text to support efficient multiple-pattern queries.
Suffix-Based Tree Data Structures can be constructed from the text to support pattern matching in time.
In string processing, a unique terminal symbol
$, chosen as the lexicographically smallest character, is often appended to mark the end of the string and ensure all suffixes are distinct.
Suffix Tries
A Suffix Trie is a rooted tree where each edge is labelled with a character, and each root-to-leaf path represents a suffix of the input string.
It stores all suffixes of a given string in a compact, prefix-sharing structure.
Example Of A Suffix Trie
str[1...6] = refer$stores a root-to-leaf path for each suffix ofstr:refer$,efer$,fer$,er$,r$, and$.The suffix trie for
Suffix Trees
A suffix tree for a string str[1...n] is a path compressed suffix trie, with the following properties:
- It has exactly leaves numbered through , representing the suffix start index
- every internal node, except for the root, has at least two children
- Each edge is labelled with a non-empty substring of
str - Every node has at most one edge beginning with any given character of the alphabet.
- The string label of the path from root to leaf is the suffix
str[i...n]for
Example Of A Suffix Tree
MULTI COLUMN
str[1...6] = refer$is a path compressed suffix trie. Note that it satisfies the above properties:The suffix tree for
- leaves numbered through
- Every internal node has at least two children
- Each edge is labelled with a substring of
str- At every node no two edges start with the same character,
- Any path from the root to a leaf node will yield a suffix of
str.
def naive_suffix_tree(str[1...n]):
initialise root node
for i from [1...n]:
suffix = str[i...n]
current = root
j = 0
while j < length(suffix):
if there is an edge from current starting with suffix[j]:
follow the edge and match characters
if mismatch occurs mid edge:
split the edge, insert a new node and create a new leaf
break
else:
advance to next node
else:
create a new edge labeled suffix[j...end] from current
breakThe naive suffix tree construction has a time complexity of . It generates suffixes, and for each suffix str[i...n], it may require traversing the tree from the root, potentially examining up to characters. The total comparisons across all suffixes thus sum to .
Ukkonen’s algorithm offers a more efficient, linear-time approach to building suffix trees.
Implicit Suffix Trees
An implicit suffix tree for a str[1...n] is the suffix tree constructed for str, without appending a terminal character ($).
- The terminal symbol ensures all suffixes are distinct, resulting in leaves (one for each suffix).
- By omitting
$, only unique paths are represented as suffixes with matching prefixes share branches.
Example Of An Implicit Suffix Trie
The implicit suffix tree for
str[1...6] = refer$is simply the suffix tree forstr[1...5] = refer(without considering the$).
Additionally, A full suffix tree is obtained by appending $ to each suffix contained within the tree.
Space Efficient Representation
Explicitly storing the substring labels of each edge in a suffix tree requires space. A more efficient representation is implicitly using a tuple to denote the substring str[j...i], where .
This reduces the space complexity to , assuming a fixed size alphabet.
Example Of A Space Efficient Suffix Tree
The space efficient suffix tree for
str[1...6] = refer$utilises tuples to store indices instead of substrings



