The Bad Character Shift Rule allows the pattern to be shifted by multiple characters upon a mismatch, rather than one character, as in the naive approach.
Understanding The Bad Character Shift Rule

When scanning a pattern[1...m] from right-to-left against a text[1...n] and a mismatch occurs at some position in the pattern, i.e. pat[k] != txt[j+k-1].
Then the bad character is defined as the mismatched character in the text, i.e.
bad_character = txt[j+k-1].
The pattern is moved so that the bad_character in the text aligns with its rightmost occurrence (to the left of the point of mismatch) in the pattern. To efficiently determine the shift amount, a Rightmost Occurrence Table is utilised.
Rightmost Occurrence Table
The Rightmost Occurrence Table is a precomputed table for all possible characters in the pattern that stores the rightmost occurrence of x to the left of in the pattern. If no such occurrence exists,
This table can be used to determine how far to shift the pattern when a mismatch occurs.
Note that "the left of " is exclusive, so not including
pat[k]
Example of a Rightmost Occurrence table
For the pattern,
pat[1...7] = tbapxab, then the corresponding is computed as so:
- For :
- There are no characters to the left of , thus consider this the base case and default all values to
- For :
- The only character to the left of is
tat position 1.- Thus, , and all other values remain .
- For :
- The characters to the left of are
tat position 1 andbat position 2.- , , and all other values remain .
- For :
- The characters to the left of are
tat 1,bat 2, andaat 3.- , , , and all others remain .
- For :
- The characters to the left of are
tat 1,bat 2,aat 3, andpat 4.- , , , , and others remain .
- For :
- The characters to the left of are
tat 1,bat 2,aat 3,pat 4, andxat 5.- , , , , , and others remain .
- For :
- The characters to the left of are
tat 1,bat 2,aat 3,pat 4,xat 5, andaat 6.- Since
aappears at both 3 and 6, we take the rightmost occurrence before 7, which is 6.- , , , , , and others remain .
This results in a final table:
abptxz0 0 0 0 0 0 0 0 0 1 0 0 0 2 0 1 0 0 3 2 0 1 0 0 3 2 4 1 0 0 3 2 4 1 5 0 6 2 4 1 5 0 Note that all values are since
zdoes not appear in the string. Although no other characters are shown, in practice, this table is precomputed for the entire alphabet and ordered by character value, which is whyzis highlighted.
Computing
Let text[1...n] and pat[1...m] be strings from the alphabet .
To efficiently compute , pre-process pat and store, for each character , and index , the rightmost occurrence of x to the left of in the pattern.
To achieve this, the algorithm iterates over the pattern whilst keeping track of the last seen index of each character.
def compute_rightmost_occurrence_table(pattern: str, alphabet: set) -> list[dict]:
m = len(pattern)
R = []
# Tracks the last seen index of each character, defaults to -1.
last_seen = {char: -1 for char in alphabet}
# Base Case: R_0 is always 0, as there is nothing to the left of index 0
R.append(last_seen.copy())
# Iterates from 1 to m
for k in range(1, m):
# Get the character at position k-1 (to the left of k)
char = pattern[k-1]
# Update the last seen position of the character
last_seen[char] = k - 1
# Copy the updated last_seen mapping into R_k
R.append(last_seen.copy())
return RAt the end of computation, the
last_seendictionary is actually the value for the Simple Bad Character Shift Rule
Time Complexity Analysis
- The
last_seendictionary (as well as ) is initialised in time- The main loop iterates times and includes:
- Dictionary lookup and updates which is amortised and worst case due to hash collisions
- Copying the
last_seendictionary, which is time.- The total complexity is thus in the worst case, however, note that the alphabet is often a constant.
- Additionally, an ASCII dictionary can be used to ensure no hash collisions and lookup and update.
Space Complexity Analysis
- The
last_seendictionary has complexity.- The table is of size , where each entry is a dictionary of size resulting in space.
- The total complexity is thus , however, note that the alphabet is often a constant.
Implementing The Bad Character Shift Rule

Consider a general iteration where pat[1...m] is aligned with text[j...j+m-1] and a right-to-left scan is performed. If a mismatch occurs at some position in pat, then the bad character—text[j+k-1], must align with its rightmost occurrence in pat (to the left of ).
To determine the shift amount, we use the pre-computed Rightmost Occurrence Table :
When , then the shift amount is . This makes sense because if the character doesn’t appear in the pattern, a match is impossible, so we can safely shift the entire pattern past it.
This is actually known as the Extended Bad Character Shift Rule. The regular rule uses , which only considers the rightmost occurrence of in the pattern without respect to
The regular rule benefits over the extended rule in space efficiency, as the space complexity is reduced to due to the only data being stored is the
last_seendictionary ().However, it is less time efficient in the case where (which can’t happen in the extended rule), because there’s no appropriate shift, so a naive 1-character shift must be made.
Example of the Bad Character Shift Rule (Three Cases)
Case 1: Bad Character Appears in the Pattern
- The pattern and text are compared right-to-left until a mismatch is found at
txt[5] = Twithpat[3] = A.
- The bad character is
txt[5] = T.- We look for the rightmost occurrence of
Tin the pattern to the left of index .
- The rightmost occurrence is
pat[1] = T.- The pattern is shifted so that
pat[1] = Taligns withtxt[5] = T.
- The shift amount is calculated as
3 - 1 = 2.Case 2: Multiple Occurrences of Bad Character in the Pattern
- A mismatch occurs at
txt[9] = Bwithpat[5] = X.
- The bad character is
txt[9] = B.Bappears twice in the pattern: atpat[4] = Bandpat[7] = B.
- We choose the rightmost occurrence before index , which is
pat[4] = B.- The pattern is shifted so that
pat[4] = Baligns withtxt[9] = B.
- The shift amount is
5 - 4 = 1.Case 3: Bad Character Not in the Pattern
- A mismatch occurs at
txt[12] = Jwithpat[7] = B.
- The bad character is
txt[12] = J.Jdoes not appear in the pattern.
- Since
Jis absent, the pattern is fully shifted past it.- The shift amount is
7 - 0 = 7.


