The Good Suffix Shift Rule

The Good Suffix Rule shifts the pattern after a mismatch by focusing on the portion of the pattern that matched the text, i.e. the good suffix, aligning it with another occurrence of the same or a similar suffix earlier in the pattern.

It’s especially useful when the Bad Character Rule offers no advantages, such as after a full match or a mismatch near the pattern’s start.

Example of the Good Suffix Rule

Suppose we are matching the pattern pat = acababacaba against some text, via right-to-left scan, and a mismatch occurs at txt[8] = b with pat[8] = c, while the suffix pat[9...11] = aba has matched.

To apply the Good Suffix Rule, we look for another occurrence of the suffix aba in the pattern to the left of the mismatch. In this case, two ranges match:

pat[3...5] and,

pat[5...7]

We select the rightmost occurrence (pat[5...7]) and check if its preceding character pat[4] = b differs from the mismatched character pat[8] = c, which it does. This allows us to shift the pattern by $4$ positions.

If pat[3...5] were the rightmost occurrence, its preceding character pat[2] = c would equal the mismatched character pat[8] = c, so aligning it would reproduce the same mismatch and the occurrence would be rejected.

Additionally, using the Bad Character Rule instead would only allow a shift of $2$ positions.

If no other occurrence of the full good suffix (aba) existed, we would instead look for a suffix of the good suffix (e.g., ba or a) at the start of the pattern to determine the shift.

Formalising The Good Suffix Rule

Consider a general iteration where pat[1...m] is aligned with text[j...j+m-1] and a right-to-left scan is performed. If a mismatch occurs at some position $k$ in pat, then the good suffix is the suffix pat[k+1...m] that matches with text[j+k...j+m-1].

To apply the Good Suffix Rule, check if there exists a position $p < m$ in pat such that:

It is the endpoint of the rightmost occurrence of a substring that exactly matches the good suffix
- pat[p-(m-k)+1...p] == pat[k+1...m]
The preceding character of that substring is different from the mismatched character in pat
- pat[p-(m-k)] != pat[k]

If such a position $p$ exists, the pattern can be shifted right by $m - p$ positions, allowing a new iteration to begin.

This is actually known as the Strong Good Suffix Rule, as the standard Good Suffix Rule does not require the preceding character to be different from the mismatched character in the pattern.
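The two conditions can be checked directly, without any preprocessing, which is handy for validating a faster implementation. The following is a minimal brute-force sketch rather than the table-based method described in the next section. The function name `strong_good_suffix_shift` is hypothetical, `k` is the 1-based mismatch position, and an occurrence that starts at the very beginning of the pattern (so it has no preceding character) is assumed to be acceptable.

```python
from typing import Optional


def strong_good_suffix_shift(pat: str, k: int) -> Optional[int]:
    """Return the shift m - p for a mismatch at 1-based position k, or None.

    Brute-force check of the two conditions; illustration only.
    """
    m = len(pat)
    length = m - k          # length of the good suffix pat[k+1...m]
    good = pat[k:]          # 0-based slice holding pat[k+1...m]
    # Try 1-based endpoints p = m-1, m-2, ... so the rightmost valid p wins.
    for p in range(m - 1, length - 1, -1):
        start = p - length  # 0-based start of the candidate occurrence
        if pat[start:p] != good:
            continue
        # Assumption: no preceding character (start == 0) counts as "different".
        if start == 0 or pat[start - 1] != pat[k - 1]:
            return m - p
    return None


print(strong_good_suffix_shift("acababacaba", 8))
```

For pat = acababacaba and k = 8 this prints 4, in line with the worked example, and for k = 7 it returns None because the only other occurrence of caba is preceded by the mismatched character.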

Implementing The Good Suffix Rule

To efficiently implement the Good Suffix Rule, a pre-processing step, inspired by the computation of Z-values, is used to define Z-suffix values. These values are then used to construct the Good Suffix Table.

Z-Suffix Values

For a given pattern pat[1...m], let $Z_{i}^{suffix}$, for $1 \leq i < m$, be the length of the longest substring ending at position $i$ that matches a suffix of pat. Formally, it is the largest value for which:

$pat [i - Z_{i}^{suffix} + 1 \dots i] = pat [m - Z_{i}^{suffix} + 1 \dots m]$

To compute these values efficiently:

1. Reverse the pattern
2. Compute Z-values for the reversed pattern
3. Reverse the resulting $Z$-values to obtain $Z_{i}^{suffix}$

Since the $Z$-algorithm runs in $O(m)$ time, this pre-processing step is also $O(m)$.

Example of Computing Z-Suffix Values

For pat = acababacaba (length 11), the Z-suffix values are computed as follows:

Reverse the pattern

This gives reverse(pat) = abacababaca

Compute Z-values for the reversed pattern

Doing this gives the Z-values Z = [-, 0, 1, 0, 3, 0, 5, 0, 1, 0, 1]

Reverse the resulting Z-values to obtain Z-suffix values

This gives Z_suffix = reverse(Z) = [1, 0, 1, 0, 5, 0, 3, 0, 1, 0, -]
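The three steps can be sketched in Python as follows. The source does not show its `z_algorithm`, so the version here is an assumption: a common linear-time formulation that sets Z[0] to the full length. The helper name `z_suffix_values` is also hypothetical. Lists are 0-based, and the last entry (shown as - above) is set to 0.

```python
def z_algorithm(s: str) -> list[int]:
    """Z[i] = length of the longest substring starting at i that matches a prefix of s."""
    n = len(s)
    z = [0] * n
    if n == 0:
        return z
    z[0] = n                      # convention assumed here: Z[0] is the full length
    left = right = 0              # current Z-box is s[left:right]
    for i in range(1, n):
        if i < right:
            # Reuse the value already known inside the Z-box.
            z[i] = min(right - i, z[i - left])
        # Extend the match by explicit comparison.
        while i + z[i] < n and s[i + z[i]] == s[z[i]]:
            z[i] += 1
        if i + z[i] > right:
            left, right = i, i + z[i]
    return z


def z_suffix_values(pat: str) -> list[int]:
    """Reverse, compute Z-values, reverse again."""
    z_suffix = z_algorithm(pat[::-1])[::-1]
    if z_suffix:
        z_suffix[-1] = 0          # the whole pattern matching itself is not useful
    return z_suffix


print(z_suffix_values("acababacaba"))
```

For acababacaba this prints [1, 0, 1, 0, 5, 0, 3, 0, 1, 0, 0], which agrees with the values computed by hand.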

The Good Suffix Table

The Good Suffix Table determines the amount to shift the pattern by when a mismatch occurs. It identifies the rightmost position in the pattern where another good suffix occurs, whilst ensuring that the character preceding differs from the mismatched character.

If no exact match for the good suffix is found, the table helps determine the next best shift by considering smaller suffixes of the current match.

Computing The Good Suffix Table

The Z-suffix values are used to construct the Good Suffix Table. For each suffix start position j in the pattern, the table stores good_suffix[j] = p, where p is the rightmost position p < m such that:

The substring pat[p - Z_suffix[p] + 1...p] matches the suffix pat[j...m], and
The preceding character pat[p - Z_suffix[p]] is not equal to pat[j-1].

The second condition holds automatically: Z_suffix[p] is the length of the longest match ending at p, so if the preceding characters were equal, the match could have been extended by one.

Note that the following code uses 0-based indexing.

# Assumes z_algorithm(s) returns the Z-array of s as a list of ints.
m = len(pat)

# Compute Z-suffix values by applying the Z-algorithm to the reversed pattern.
z_suffix = z_algorithm(pat[::-1])[::-1]
z_suffix[-1] = 0  # the whole pattern matching itself is not useful

# Initialise the Good Suffix table. -1 marks "no occurrence", because 0 is a
# valid index in 0-based indexing (the 1-based tables in this section use 0).
good_suffix = [-1] * (m + 1)

# Populate the Good Suffix table using Z-suffix values.
for p in range(m - 1):  # stop before m - 1: z_suffix[-1] provides no valuable information
    # `z_suffix[p]` is the length of the longest substring ending at index `p`
    # that matches a suffix of `pat`.

    # Compute `j`, the starting index of this suffix in the original pattern.
    j = m - z_suffix[p]  # add 1 for 1-based indexing

    # `p` increases, so later writes overwrite earlier ones and `good_suffix[j]`
    # ends up storing the rightmost index where `pat[j:]` ends elsewhere in `pat`.
    good_suffix[j] = p

The Good Suffix Table has length $m + 1$ rather than $m$ because a mismatch can occur at the very last pattern character ($k = m$), before anything has matched. The good suffix is then empty, and the lookup good_suffix[k+1] refers to index m+1.

Positions p with Z_suffix[p] = 0 map to exactly this slot, since j = m - 0 + 1 = m + 1.

At the other extreme, j = 1 corresponds to the suffix pat[1...m], which is the entire pattern itself. It cannot occur again ending at any p < m, so good_suffix[1] is always 0.

A shift must still be defined when no other valid occurrence of the good suffix exists in the pattern, and after a full match. Both situations are handled with the matched prefix values described later.

Example Of Computing The Good Suffix Table

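For pat = acababacaba (m = 11), the Z_suffix values (computed in the previous example) and the initial good_suffix array are:

| j | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| pat | a | c | a | b | a | b | a | c | a | b | a | |
| Z_suffix | 1 | 0 | 1 | 0 | 5 | 0 | 3 | 0 | 1 | 0 | - | - |
| good_suffix | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |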

To populate the table, iterate over each position p from 1 to m-1 and calculate the starting position j of the matching suffix in the pattern, using:

$j = m - Z_{suffix} [p] + 1$

After this, set good_suffix[j] = p, giving the rightmost endpoint of a suffix pat[j...m] elsewhere in the pattern.

When p = 1:

Z_suffix[1] = 1

j = m - Z_suffix[1] + 1 = 11 - 1 + 1 = 11

We set good_suffix[11] = 1.

When p = 2:

Z_suffix[2] = 0

j = m - Z_suffix[2] + 1 = 11 - 0 + 1 = 12

We set good_suffix[12] = 2.

When p = 3:

Z_suffix[3] = 1

j = m - Z_suffix[3] + 1 = 11 - 1 + 1 = 11

We set good_suffix[11] = 3 (overwrites the previous value).

When p = 4:

Z_suffix[4] = 0

j = m - Z_suffix[4] + 1 = 11 - 0 + 1 = 12

We set good_suffix[12] = 4 (overwrites the previous value).

When p = 5:

Z_suffix[5] = 5

j = m - Z_suffix[5] + 1 = 11 - 5 + 1 = 7

We set good_suffix[7] = 5.

When p = 6:

Z_suffix[6] = 0

j = m - Z_suffix[6] + 1 = 11 - 0 + 1 = 12

We set good_suffix[12] = 6 (overwrites the previous value).

When p = 7:

Z_suffix[7] = 3

j = m - Z_suffix[7] + 1 = 11 - 3 + 1 = 9

We set good_suffix[9] = 7.

When p = 8:

Z_suffix[8] = 0

j = m - Z_suffix[8] + 1 = 11 - 0 + 1 = 12

We set good_suffix[12] = 8 (overwrites the previous value).

When p = 9:

Z_suffix[9] = 1

j = m - Z_suffix[9] + 1 = 11 - 1 + 1 = 11

We set good_suffix[11] = 9 (overwrites the previous value).

When p = 10:

Z_suffix[10] = 0

j = m - Z_suffix[10] + 1 = 11 - 0 + 1 = 12

We set good_suffix[12] = 10 (overwrites the previous value).

After completing all the iterations, the final Good Suffix Table (good_suffix) for pat = acababacaba is:

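| j | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| pat | a | c | a | b | a | b | a | c | a | b | a | |
| Z_suffix | 1 | 0 | 1 | 0 | 5 | 0 | 3 | 0 | 1 | 0 | - | - |
| good_suffix | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 7 | 0 | 9 | 10 |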

The resulting Good Suffix Table tells us how far to shift the pattern when a mismatch occurs.
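As a cross-check of the walk-through, the table can be rebuilt with a deliberately naive helper. This is only a sketch: `naive_z_suffix` and `good_suffix_table_1_based` are hypothetical names, the Z-suffix values are computed by direct comparison in quadratic time instead of with the Z-algorithm, and the result is returned for the 1-based positions 1 to m+1.

```python
def naive_z_suffix(pat: str) -> list[int]:
    """z[p] = length of the longest substring ending at 0-based p that matches a suffix."""
    m = len(pat)
    z = [0] * m                       # the last entry stays 0 (shown as '-' in the table)
    for p in range(m - 1):
        length = 0
        # Walk leftwards from p and from the end of the pattern in lockstep.
        while length <= p and pat[p - length] == pat[m - 1 - length]:
            length += 1
        z[p] = length
    return z


def good_suffix_table_1_based(pat: str) -> list[int]:
    """Return good_suffix[1..m+1] using 1-based positions, 0 meaning 'none'."""
    m = len(pat)
    z = naive_z_suffix(pat)
    table = [0] * (m + 2)             # index 0 is unused padding
    for p in range(1, m):             # 1-based p = 1 .. m-1
        j = m - z[p - 1] + 1
        table[j] = p                  # later (larger) p overwrites earlier ones
    return table[1:]


print(good_suffix_table_1_based("acababacaba"))
```

This prints [0, 0, 0, 0, 0, 0, 5, 0, 7, 0, 9, 10], the same good_suffix row as in the final table.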

Using The Good Suffix Rule

To apply the Good Suffix Rule, the following cases must be handled, based on whether a suffix of the pattern has a valid alignment elsewhere.

Case 1a: Good Suffix Found Elsewhere

If a mismatch occurs at position $k$ in the pattern, and good_suffix[k+1] > 0, then the pattern can be shifted by m - good_suffix[k+1] places to align the matched good suffix with its rightmost occurrence elsewhere in the pattern.

As good_suffix[k+1] is an index, in 0-based indexing the shift becomes m - good_suffix[k+1] - 1. Because 0 is then a valid index, a 0-based table also needs a different marker for "not found", such as -1.

Case 1b: Good Suffix Not Found Elsewhere

If a mismatch occurs at position $k$ in the pattern, and good_suffix[k+1] = 0, then no proper alignment exists for the good suffix in the pattern. Instead, a shift based on the matched prefix is required.

Naively shifting by m may skip valid matches, as a suffix of the good suffix might still occur elsewhere in the pattern.

Matched Prefix

The matched prefix value matched_prefix[k+1] is the length of the longest suffix of the good suffix pat[k+1...m] that matches a prefix of the pattern pat[1...m−k].

When good_suffix[k+1] = 0, this allows a safe shift of m − matched_prefix[k+1] positions.

These values can be precomputed in O(m) time using the Z-algorithm.

def matched_prefix_table(pattern: str) -> list[int]:
    m = len(pattern)
    # Step 1: Compute the Z-array for the pattern.
    # Z[i] gives the length of the longest substring starting at i
    # that matches a prefix of pattern.
    Z = z_algorithm(pattern)

    # Step 2: Initialise the matched_prefix array.
    # matched_prefix[i] will store the length of the longest prefix of pattern
    # that matches a suffix of pattern[i:].
    # The extra entry matched_prefix[m] = 0 covers the empty good suffix.
    matched_prefix = [0] * (m + 1)

    # Step 3: Scan the Z-array from right to left.
    # `max_l` tracks the longest suffix seen so far (starting at or after i)
    # that is also a prefix of the pattern.
    max_l = 0
    for i in range(m - 1, 0, -1):
        # If Z[i] == m - i, then the substring from i to the end is a suffix
        # that exactly matches a prefix of the pattern.
        if Z[i] == m - i:
            max_l = Z[i]

        # Whether or not position i starts such a suffix, store the best
        # length seen so far.
        matched_prefix[i] = max_l

    # The whole pattern matches itself. Setting this directly avoids relying
    # on how z_algorithm defines Z[0].
    if m > 0:
        matched_prefix[0] = m

    return matched_prefix

Example of Calculating Matched Prefix Values

Once again consider pat[1...11] = a c a b a b a c a b a:

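| i | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| pat | a | c | a | b | a | b | a | c | a | b | a | |
| Z_suffix | 1 | 0 | 1 | 0 | 5 | 0 | 3 | 0 | 1 | 0 | - | - |
| good_suffix | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 7 | 0 | 9 | 10 |
| matched_prefix | | | | | | | | | | | | |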

The corresponding matched prefix values are calculated as:

For each i = k+1, we compare each suffix of pat[k+1...m] with the prefix of pat of the same length, and store the length of the longest suffix that matches. The Prefix column shows pat[1...m-k], the longest prefix that could match.

| i = k+1 | Suffix: pat[k+1...m] | Prefix: pat[1...m-k] | Matched Prefix | matched_prefix[i] |
| --- | --- | --- | --- | --- |
| 1 | acababacaba | acababacaba | acababacaba | $11$ |
| 2 | cababacaba | acababacab | acaba | $5$ |
| 3 | ababacaba | acababaca | acaba | $5$ |
| 4 | babacaba | acababac | acaba | $5$ |
| 5 | abacaba | acababa | acaba | $5$ |
| 6 | bacaba | acabab | acaba | $5$ |
| 7 | acaba | acaba | acaba | $5$ |
| 8 | caba | acab | a | $1$ |
| 9 | aba | aca | a | $1$ |
| 10 | ba | ac | a | $1$ |
| 11 | a | a | a | $1$ |
| 12 | - | - | - | $0$ |

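This results in our final table:

| i | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| pat | a | c | a | b | a | b | a | c | a | b | a | - |
| Z_suffix | 1 | 0 | 1 | 0 | 5 | 0 | 3 | 0 | 1 | 0 | - | - |
| good_suffix | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 7 | 0 | 9 | 10 |
| matched_prefix | 11 | 5 | 5 | 5 | 5 | 5 | 5 | 1 | 1 | 1 | 1 | 0 |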
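The matched prefix values can also be derived straight from the definition, which gives an independent check of the table. The sketch below is a quadratic brute-force version and not the linear-time Z-based routine; `naive_matched_prefix` is a hypothetical name, and the returned list is 0-based, so entry i corresponds to position i+1 in the table.

```python
def naive_matched_prefix(pat: str) -> list[int]:
    """table[i] = length of the longest suffix of pat[i:] that is also a prefix of pat."""
    m = len(pat)
    table = [0] * (m + 1)             # entry m covers the empty suffix
    for i in range(m + 1):
        # Try the longest candidate first; it may use at most m - i characters.
        for length in range(m - i, 0, -1):
            if pat[:length] == pat[m - length:]:
                table[i] = length
                break
    return table


print(naive_matched_prefix("acababacaba"))
```

This prints [11, 5, 5, 5, 5, 5, 5, 1, 1, 1, 1, 0], matching the matched_prefix row, with 0 for the empty suffix at position 12.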

As matched_prefix[k+1] is a length, the formula remains m - matched_prefix[k+1] in 0-based indexing.
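Cases 1a and 1b combine into a single lookup. The sketch below hard-codes the two tables from the worked example for pat = acababacaba, padded with `None` at index 0 so that the 1-based positions can be used directly. The function name `shift_after_mismatch` and the constant names are hypothetical.

```python
M = 11  # length of pat = acababacaba

# 1-based tables copied from the worked example; index 0 is unused padding.
GOOD_SUFFIX = [None, 0, 0, 0, 0, 0, 0, 5, 0, 7, 0, 9, 10]
MATCHED_PREFIX = [None, 11, 5, 5, 5, 5, 5, 5, 1, 1, 1, 1, 0]


def shift_after_mismatch(k: int) -> int:
    """Shift for a mismatch at 1-based pattern position k (1 <= k <= M)."""
    p = GOOD_SUFFIX[k + 1]
    if p > 0:
        # Case 1a: the good suffix occurs elsewhere, ending at position p.
        return M - p
    # Case 1b: fall back to the longest suffix of the good suffix
    # that is also a prefix of the pattern.
    return M - MATCHED_PREFIX[k + 1]


for k in (8, 7, 11):
    print(k, shift_after_mismatch(k))
```

A mismatch at k = 8 uses Case 1a and shifts by 4. A mismatch at k = 7 falls through to Case 1b and shifts by 11 - 1 = 10. A mismatch on the very first comparison (k = 11) shifts by 1.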

Case 2: Exact Match Found

If an exact occurrence of the pattern is found in the text starting at position $j$, i.e. pat[1...m] fully matches text[j...j+m-1], then the pattern can be shifted by m - matched_prefix[2] positions.

This is because matched_prefix[2] gives the length of the longest prefix of the pattern that also appears as a suffix of pat[2...m], i.e. the longest overlap of the pattern with itself. Shifting by m - matched_prefix[2] realigns this prefix with the matching suffix of the current match, which avoids redundant comparisons and ensures that the next potential match is not missed.

Example of Exact Match

For example, consider the pattern pat = acababacaba.

Here, matched_prefix[2] = 5, which means the longest prefix of pat — namely acaba — also appears as a suffix in the substring pat[2...m] (i.e., cababacaba).

If an exact match of the pattern is found at some position j in the text, we can safely shift the pattern by m - matched_prefix[2] = 11 - 5 = 6 positions.

This shift realigns the overlapping prefix acaba from the start of the pattern with its corresponding suffix in the previous match, ensuring that no potential matches are skipped and redundant comparisons are avoided.
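Putting the three cases together gives a complete search loop. The following is a sketch under stated assumptions, not a definitive implementation: it uses only the Good Suffix Rule (no Bad Character Rule), 0-based indexing with -1 as the "not found" marker, and a `z_algorithm` that is one common formulation, not taken from the source. The names `preprocess` and `search` are hypothetical.

```python
def z_algorithm(s: str) -> list[int]:
    """Assumed Z-algorithm: Z[i] = longest match between s[i:] and a prefix of s."""
    n = len(s)
    z = [0] * n
    if n:
        z[0] = n
    left = right = 0
    for i in range(1, n):
        if i < right:
            z[i] = min(right - i, z[i - left])
        while i + z[i] < n and s[i + z[i]] == s[z[i]]:
            z[i] += 1
        if i + z[i] > right:
            left, right = i, i + z[i]
    return z


def preprocess(pat: str) -> tuple[list[int], list[int]]:
    m = len(pat)
    z_suffix = z_algorithm(pat[::-1])[::-1]
    good_suffix = [-1] * (m + 1)           # -1 means no valid occurrence
    for p in range(m - 1):
        good_suffix[m - z_suffix[p]] = p   # rightmost p wins
    z = z_algorithm(pat)
    matched_prefix = [0] * (m + 1)
    for i in range(m - 1, 0, -1):
        matched_prefix[i] = m - i if z[i] == m - i else matched_prefix[i + 1]
    matched_prefix[0] = m
    return good_suffix, matched_prefix


def search(txt: str, pat: str) -> list[int]:
    """Return all 0-based start positions of pat in txt."""
    m, n = len(pat), len(txt)
    if m == 0 or m > n:
        return []
    good_suffix, matched_prefix = preprocess(pat)
    hits = []
    j = 0
    while j <= n - m:
        k = m - 1
        while k >= 0 and pat[k] == txt[j + k]:   # right-to-left scan
            k -= 1
        if k < 0:                                # Case 2: exact match
            hits.append(j)
            shift = m - matched_prefix[1]
        elif good_suffix[k + 1] >= 0:            # Case 1a
            shift = m - good_suffix[k + 1] - 1
        else:                                    # Case 1b
            shift = m - matched_prefix[k + 1]
        j += shift
    return hits


print(search("acababacababacaba", "acababacaba"))
```

The text in the usage line is the pattern overlapped with itself on acaba, so the call prints [0, 6]: after the match at position 0 the pattern shifts by 11 - 5 = 6 and immediately finds the second occurrence.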
