The Burrow-Wheelers Transform of a string is a useful pre processing step to parse large strings, and match very short patterns.

The semantics of LF mapping is the same for inversion and pattern matching. The minus 1 in ep.

When doing inversion, you have the burrow wheelers transform, the string, you have the rank array and the nocc pre processed table. Thus pattern matching is proportional to the pattern length.

To find out where the pat appear in the original text you use the suffix array needed to compute the burrow wheelers transform

For go, sp = 2 and ep = 3, SA[2] and SA[3] gives the start position of the occurrences

Matching ogo In The BWT lo$oogg

pat = gogo bwt = lo$oogg

Preprocess the rank matrix just as in inversion to get:

rank = [$ = 1, g = 2, l = 4, o = 5]

Initialise the start point sp and end point ep to 1 and 7 respectively.

Searching backwards in pat, the first letter to search for is o. The start point is updated using the rule

So to search for go, we want to minimise the range covered by the start and end point. So starting at o:

  • sp = rank(o) + nocc(o, BWT[1...0]) = 5 + nocc(o, "") = 5 + 0 = 5
  • ep = rank(o) + nocc(o, BWT[1...7]) - 1 = 5 + 3 - 1 = 7

Next we need to search for g:

  • sp = rank(g) + nocc(g, BWT[1...5-1]) = 2 + 0 = 2
  • ep = rank(g) + nocc(g, BWT[1...7]) - 1 = 2 + 2 - 1 = 3

So cyclically sp and ep points to the occurrence of go

Next we nee to search for o,

  • sp = rank(o) + nocc(o, BWT[1...2-1]) = 5 + 0 = 5
  • ep = rank(o) + nocc(o, BWT[1...3]) - 1 = 5 + 1 - 1 = 5

So cyclically sp and ep point to an occurrence of ogo. More specifically the range bounded by sp and ep are all occurrences

If we search for g now

  • sp = rank(g) + nocc(o, BWT[1...5-1]) = 2 + 0 = 2
  • ep = rank(g) + nocc(o, BWT[1...5]) - 1 = 2 + 0 - 1 = 1

As gogo doesn’t exist, the ep is greater than the sp.