The Bad Character Shift Rule allows the pattern to be shifted by multiple characters upon a mismatch, rather than one character, as in the naive approach.

Understanding The Bad Character Shift Rule

When scanning a pattern[1...m] from right-to-left against a text[1...n] and a mismatch occurs at some position in the pattern, i.e. pat[k] != txt[j+k-1].

Then the bad character is defined as the mismatched character in the text, i.e. bad_character = txt[j+k-1].

The pattern is moved so that the bad_character in the text aligns with its rightmost occurrence (to the left of the point of mismatch) in the pattern. To efficiently determine the shift amount, a Rightmost Occurrence Table is utilised.

Rightmost Occurrence Table

The Rightmost Occurrence Table is a precomputed table for all possible characters in the pattern that stores the rightmost occurrence of x to the left of in the pattern. If no such occurrence exists,

This table can be used to determine how far to shift the pattern when a mismatch occurs.

Note that "the left of " is exclusive, so not including pat[k]

Computing

Let text[1...n] and pat[1...m] be strings from the alphabet .

To efficiently compute , pre-process pat and store, for each character , and index , the rightmost occurrence of x to the left of in the pattern.

To achieve this, the algorithm iterates over the pattern whilst keeping track of the last seen index of each character.

def compute_rightmost_occurrence_table(pattern: str, alphabet: set) -> list[dict]:
    m = len(pattern)
    R = []
    # Tracks the last seen index of each character, defaults to -1. 
    last_seen = {char: -1 for char in alphabet}     
    # Base Case: R_0 is always 0, as there is nothing to the left of index 0
    R.append(last_seen.copy())
    # Iterates from 1 to m
    for k in range(1, m):
        # Get the character at position k-1 (to the left of k)
        char = pattern[k-1]  
        # Update the last seen position of the character
        last_seen[char] = k - 1
        # Copy the updated last_seen mapping into R_k
        R.append(last_seen.copy())
    return R

At the end of computation, the last_seen dictionary is actually the value for the Simple Bad Character Shift Rule

Implementing The Bad Character Shift Rule

Consider a general iteration where pat[1...m] is aligned with text[j...j+m-1] and a right-to-left scan is performed. If a mismatch occurs at some position in pat, then the bad charactertext[j+k-1], must align with its rightmost occurrence in pat (to the left of ).

To determine the shift amount, we use the pre-computed Rightmost Occurrence Table :

When , then the shift amount is . This makes sense because if the character doesn’t appear in the pattern, a match is impossible, so we can safely shift the entire pattern past it.