The Good Suffix Rule shifts the pattern after a mismatch by focusing on the portion of the pattern that matched the text, i.e. the good suffix, aligning it with another occurrence of the same or a similar suffix earlier in the pattern.

It’s especially useful when the Bad Character Rule offers no advantages, such as after a full match or a mismatch near the pattern’s start.

Formalising The Good Suffix Rule

Consider a general iteration where pat[1...m] is aligned with text[j...j+m-1] and a right-to-left scan is performed. If a mismatch occurs at some position in pat, then the good suffix is the suffix pat[k+1...m] that matches with text[j+k...j+m-1].

To apply the Good Suffix Rule, check if there exists a position in pat such that:

  • It is the endpoint of the rightmost occurrence of a substring that exactly matches the good suffix
    • pat[p-(m-k)+1...p] == pat[k+1...m]
  • The preceding character of that substring is different from the mismatched character in pat
    • pat[p-(m-k)] != pat[k]

If such a position exists, the pattern can be shifted right by positions, allowing a new iteration to begin.

This is actually known as the Strong Good Suffix Rule, as the standard Good Suffix Rule does not require the preceding character to be different from the mismatched character in the pattern.

Implementing The Good Suffix Rule

To efficiently implement the Good Suffix Rule, a pre-processing step, inspired by the computation of Z-values, is used to define Z-suffix values. These values are then used to construct the Good Suffix Table.

Z-Suffix Values

For a given pattern pat[1...m], let , for , be the length of the longest substring ending at position that matches its suffix. Formally:

To compute these values efficiently:

  1. Reverse the pattern
  2. Compute Z-values for the reversed pattern
  3. Reverse the resulting -values to obtain

Since the -algorithm runs in time, this pre-processing step is also .

The Good Suffix Table

The Good Suffix Table determines the amount to shift the pattern by when a mismatch occurs. It identifies the rightmost position in the pattern where another good suffix occurs, whilst ensuring that the character preceding differs from the mismatched character.

If no exact match for the good suffix is found, the table helps determine the next best shift by considering smaller suffixes of the current match.

Computing The Good Suffix Table

The Z-suffix values are used to construct the Good Suffix Table. For each starting position of a suffix in the pattern, j, the table stores good_suffix[j] = p, where p is the rightmost position in the pattern such that:

  • The substring pat[p - Z_suffix[p] + 1...p] matches the suffix pat[j...m], and
  • The preceding character pat[p - Z_suffix[p]] is not equal to pat[j-1].
Note, this uses 0-based indexing
# Compute Z-suffix values by applying the Z-algorithm to the reversed pattern.
z_suffix = reverse(z_algorithm(reverse(pattern)))
z_suffix[-1] = 0
 
# Initialise the Good Suffix table with default values (0).
m = len(pat)
good_suffix = [0] * (m+1)
 
# Populate the Good Suffix table using Z-suffix values.
for p in range(m-1): # m - 1 because z_suffix[-1] provides no valuable information
	# `z_suffix[p]` is the length of the longest suffix of `pattern[0:p]` 
	# that matches the suffix of `pattern`.
 
	# Compute `j`, the starting index of this suffix in the original pattern.
	j = m - z_suffix[p] # +1 for 1 base indexing
 
	# `good_suffix[j]` now stores the rightmost position in the pattern 
	# where `pattern[j...m]` appears in `pattern[0:p]`.
	good_suffix[j] = p

Using The Good Suffix Rule

To apply the Good Suffix Rule, the following cases must be handled, based on whether a suffix of the pattern has a valid alignment elsewhere.

Case 1a: Good Suffix Found Elsewhere

If a mismatch occurs at position in the pattern, and good_suffix[k+1] > 0, then the pattern can be shifted by m - good_suffix[k+1] places to align the matched good suffix withs its rightmost occurrence elsewhere in the pattern.

As good_suffix[k+1] is an index, in 0-based indexing, we must use m - good_suffix[k+1] - 1

Case 1b: Good Suffix Not Found Elsewhere

If a mismatch occurs at position in the pattern, and good_suffix[k+1] = 0, then no proper alignment exists for the good suffix in the pattern. Instead, a shift based on the matched prefix is required.

Naively shifting by m may skip valid matches, as a suffix of the good suffix might still occur elsewhere in the pattern.

Matched Prefix

The matched prefix value matched_prefix[k+1] is the length of the longest suffix of the good suffix pat[k+1...m] that matches a prefix of the pattern pat[1...m−k].

When good_suffix[k+1] = 0, this allows a safe shift of m − matched_prefix[k+1] positions.

These values can be precomputed in O(m) time using the Z-algorithm.

def matched_prefix_table(pattern: str):
    m = len(pattern)
    # Step 1: Compute the Z-array for the pattern.
    # Z[i] gives the length of the longest substring starting at i 
    # that matches the prefix of pattern.
    Z = z_algorithm(pattern)
 
    # Step 2: Initialize the matched_prefix array.
    # matched_prefix[i] will eventually store the length of the longest prefix of pattern
    # that matches the suffix starting at position i.
    matched_prefix = [0] * m
 
    # Step 3: Scan Z-array from right to left.
    # We maintain `max_l`, which tracks the maximum length of a matching prefix found so far 
    # that aligns with a suffix starting at or after position i.
    max_l = 0
    for i in range(m - 1, -1, -1):
        # If Z[i] == m - i, then the substring from i to the end is a suffix
        # that exactly matches a prefix of the pattern.
        if Z[i] == m - i:
            max_l = Z[i]
 
        # Regardless of whether the current Z[i] is a full suffix or not, 
        # we store the maximum length we've seen so far.
        # This ensures matched_prefix[i] always has the longest matching prefix
        # that can be found among any suffix starting at or after i.
        matched_prefix[i] = max_l
 
    return matched_prefix

As matched_prefix[k+1] is a length, the formula remains m - matched_prefix[k+1] in 0-based indexing.

Case 2: Exact Match Found

If an exact occurrence of the pattern was found in the text at a position starting from , i.e. pat[1...m] fully matches text[j...j+m-1]. Then the pattern can be shifted by m - matched_prefix[2] positions.

This is because matched_prefix[2] gives the longest prefix of the pattern, that also appears as a suffix in pat[2...m], so we can safely realign this overlapping prefix with the suffix of the current match — avoiding redundant comparisons and ensuring that the next potential match is not missed.