The Burrows-Wheeler Transform (BWT) of a string, is a transformation formed by the characters in the last column of the Sorted Cyclic Permutation Matrix of the string. It is widely used in pattern matching and lossless compression.
Sorted Cyclic Permutation Matrix
A Cyclic Permutation Matrix is a set of all cyclic rotations (permutations) of a string, with each rotation forming a row in the matrix. Sorting the rows lexicographically yields the Sorted Cyclic Permutation Matrix
Sorted Cyclic Permutation Matrix Of
googol$.The set of cyclic permutations of
googol$are:
googol$oogol$gogol$gogol$goool$googl$googo$googolAfter sorting, these rotations form the Sorted Cyclic Permutation Matrix
As usual to mark the end of the string and ensure uniqueness, a unique symbol $, which is lexicographically smallest, is appended to the string. Since each row is a rotation, characters beyond $ are redundant. Thus, the matrix can be simplified by truncating rows after $.
Simplifying The Sorted Cyclic Permutation Matrix Of
googol$.
The resulting matrix represents the sorted suffixes of the original string. Each row is a suffix text[i...n], therefore a suffix array can capture this by recording only the starting indices of these suffixes relative to the original string.
Therefore, a suffix array can be constructed in place of the Sorted Cyclic Permutation Matrix without any loss of information.
The suffix array of
googol$is[7, 4, 1, 6, 3, 5, 2]
Properties Of The Sorted Cyclic Permutation Matrix
The manipulations made above are supported by the structural properties of the matrix :
Property 1: Each column of the matrix is a permutation of the original string
This is because every character appears exactly once in each rotation, and the matrix contains all such rotations.
Property 2: Any consecutive columns of the matrix represent all possible substrings (in lexicographic order) of length (i.e., all -mers)
This follows from the fact that each row is a rotated version of the text, so each -mer appears exactly once across the rows.
A corollary of Property 2 is that, as consists of sorted cyclic permutations, the last column always precedes the first column in the original text.

Extracting The Burrows-Wheeler Transform
The BWT of a string, is the string formed by the letters in the last column of its Sorted Cyclic Permutation Matrix.
From the last column of the sorted matrix, the BWT of
googol$islo$oogg
Since a suffix array is often used instead of constructing the full matrix, the last column must be computed indirectly.
Due to the cyclic nature of the rotations, the characters in the last column are the ones preceding the characters in the first column. Thus, the index of the characters in the first column is calculated as follows:
bwt[i] = text[suffix_array[i] - 1], where
This provides an alternate definition of the BWT as the string formed by the characters immediately preceding the start positions indicated in the suffix array.
The Burrows-Wheeler Transform Of
googol$Using Suffix ArraysThe suffix array of
googol$is[7, 4, 1, 6, 3, 5, 2]The BWT indices of
googol$are[6, 3, 7, 5, 2, 4, 1]. Note that, whensuffix_array[i] - 1 = 0, simply cycle back ton = 7.The BWT of
googol$is thuslo$oogg
Time Complexity Of BWT Construction
The time complexity of generating the burrow wheelers transform of a string depends on the suffix array construction:
- Naive Method:
- Prefix doubling:

