Boyer-Moore leverages both the extended bad character rule and the good suffix rule to compute the maximum possible shift at each alignment, significantly improving efficiency over naive matching.
Overview Of The Algorithm
- Pre-process the pattern
pat[1...m]to compute:- The values needed for the extended bad character shift rule, and
- The
good_suffixandmatched_prefixvalues need for the good suffix shift rule
- The main search algorithm proceeds as follows:
- Align
pat[1...m]withtext[1...n], and scan characters right-to-left within the current alignment. - If a mismatch occurs,
- Use the bad character rule to compute a potential shift, denoted as
- Use the good suffix rule to compute a second shift, denoted as
- Shift the pattern to the right by positions
- If a complete match occurs:
- Report the match position
- Then shift the pattern by
m - matched_prefix[2]positions
- Align
The Boyer-Moore algorithm requires Galil’s optimisation to achieve linear-time complexity in the worst case, as without, the time complexity can degrade to .
Worst-Case Example for Boyer-Moore
Let
text[1...10] = aaaaaaaaaaandpat[1...3] = aaa.In this case, every valid alignment between
patandtextresults in a complete match.Since
m - matched_prefix[2] = 1, the pattern shifts by only one position each time. This means Boyer-Moore examines all possible alignments, resulting in an time complexity.Clearly, this is inefficient despite all matches being trivial.
Galil’s Optimisation
Galil’s optimisation is a technique that avoids redundant character comparisons during the search phase of the Boyer-Moore algorithm:
- Each time the pattern shifts, there’s often regions of the pattern that are guaranteed to match the text at the new alignment.
- These matching regions arise naturally from the logic behind the good suffix and matched prefix rules.
- Galil’s insight is to skip comparisons in these known-matching regions in the next iteration, thus avoiding unnecessary work.
Implementing Galil’s Optimisation
To implement this optimisation, maintain two indices:
start— the leftmost index of the region that is know to matchstop— the rightmost index of the matching region
When the pattern is realigned, the right-to-left scan of the pattern:
- Starts from the end (
pat[m]) and proceeds down tostop + 1. - Skips the region from
starttostopentirely. - Resumes from
start - 1down topat[1]if needed.
This segmentation ensures that the known-matching region pat[start ... stop] is never re-compared against the text.
Two Cases To Consider
There are two main scenarios in which Galil’s optimisation is applied:

- Good Suffix Shift
- If a mismatch occurs at some position , and the good suffix rule suggests a shift such that
good_suffix[k+1] = p > 0 - Then it is known that
pat[p-m+k+1...p]matchestxt[j+k...j+m-1] - Therefore, comparisons in this region can be skipped in the next scan, and the variable are set as:
start = p - m + k + 1stop = p
- If a mismatch occurs at some position , and the good suffix rule suggests a shift such that

- Matched Prefix Shift
- If the shift is based on a matched prefix, (when
good_suffix[k+1] = 0), - Then the prefix
pat[1...matched_prefix[k+1]matchestxt[j+m-matched_prefix[k+1]...j+m-1] - Therefore, comparisons in this region can be skipped in the next scan, and the variable are set as:
start = 1stop = matched_prefix[k+1]
- If the shift is based on a matched prefix, (when
Galil’s optimisation ensures that Boyer-Moore does not re-check character pairs that are already known to match. By efficiently skipping over these regions, the algorithm maintains its worst-case time complexity of after an initial pre-processing phase.
Worst-Case Example with Galil's Optimisation
Let
text[1...10] = aaaaaaaaaaandpat[1...3] = aaa.As before, every valid alignment between
patandtextresults in a complete match. Without optimisation, Boyer-Moore shifts bym - matched_prefix[2] = 1each time, re-checking almost every character.With Galil’s optimisation, after the first match is found, the region
pat[1...matched_prefix[2]] = pat[1...2]is known to match the corresponding region in the text. So on the next shift:
Set
start = 1andstop = matched_prefix[2] = 2During the right-to-left scan, we skip comparing
pat[1...2], since it’s guaranteed to matchThe scan only needs to compare
pat[3]withtxt[j + 2], and then immediately confirms another full matchThis process repeats for each alignment: only one comparison (the rightmost character) is needed at each step, instead of 3.
As a result, the algorithm still explores all alignments, but now does so in linear time. The number of character comparisons is reduced from to — achieving the optimal performance Boyer-Moore promises in theory.