Prefix-Free Coding For Integers

Standard Huffman coding is inefficient for streams of natural numbers when:

The range is large or unbounded,
Frequencies are flat (e.g. each number appears once),
The integers to encode are large.

This motivates the design of universal codes: prefix-free binary encodings optimized for all positive integers, under the assumption that smaller values are more likely.

Minimal Binary Code

The minimal binary code of a number is its standard binary representation without leading zeros, ensuring the most significant bit is always 1.

For example, the minimal binary code of $561_{10} = \dots 0000001000110001_{2}$ is just 1000110001, with no leading zeroes.
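As a small illustration of the definition above, the following Python sketch computes the minimal binary code of a positive integer. The function name `minimal_binary` is a hypothetical helper chosen for this example, not something defined in these notes.

```python
def minimal_binary(n: int) -> str:
    """Return the binary representation of a positive integer without leading zeros."""
    if n < 1:
        raise ValueError("minimal binary is defined here for positive integers only")
    # format(n, "b") never emits leading zeros, so the first bit is always 1.
    return format(n, "b")


print(minimal_binary(561))  # 1000110001
```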

Elias Omega Code For Positive Integers

Elias Omega encoding assigns prefix-free binary codewords to any positive integer $N \in \mathbb{Z}^{+} = \{1, 2, 3, \dots\}$. To perform this method, define:

$\operatorname{binary}(\cdot)$ as the minimal binary codeword of a positive integer, and
$\operatorname{len}(\cdot)$ as the length (number of bits) of a binary string.

The codeword for any positive integer $N$ is then built recursively as a concatenation of minimal binary representations:

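$\underbrace{\operatorname{binary}(L_{k})}_{\text{minimal binary of } L_{k}} \dots \underbrace{\operatorname{binary}(L_{2})}_{\text{minimal binary of } L_{2}} \; \underbrace{\operatorname{binary}(L_{1})}_{\text{minimal binary of } L_{1}} \; \underbrace{\operatorname{binary}(N)}_{\text{minimal binary of } N}$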

Here each $L_{n}$ is one less than the length of the minimal binary representation of the previous component, starting from $L_{0} = N$. In other words:

$L_{n} = \operatorname{len}(\operatorname{binary}(L_{n - 1})) - 1$, with $L_{0} = N$.

The recursion continues until $\operatorname{len}(\operatorname{binary}(L_{k})) = 1$, i.e. until the last length component is only one bit long.
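The recurrence above can be sketched in a few lines of Python. The helper name `length_chain` is an assumption made for this example. It returns the values $L_{1}, L_{2}, \dots, L_{k}$ in the order they are computed, which is the reverse of the order in which they appear in the codeword.

```python
def length_chain(n: int) -> list[int]:
    """Return [L_1, L_2, ..., L_k] for a positive integer n, taking L_0 = n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    chain = []
    current = n
    # int.bit_length() equals len(binary(current)) for positive integers.
    while current.bit_length() > 1:
        current = current.bit_length() - 1  # L_n = len(binary(L_{n-1})) - 1
        chain.append(current)
    return chain


print(length_chain(561))  # [9, 3, 1]
print(length_chain(313))  # [8, 3, 1]
```

For $N = 1$ the chain is empty, because the minimal binary code of 1 is already one bit long.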

???

Let's call the binary code for each $L_{n}$ a length component, and that of $N$ the code component.

Starting from the code component, the number of bits is $\lfloor \log_{2} N \rfloor + 1$, i.e. $O(\log N)$. Each subsequent length component wraps this in another logarithm: $L_{1} = \lfloor \log_{2} N \rfloor + 1 - 1$, so its length component takes $O(\log \log N)$ bits, that of $L_{2}$ takes $O(\log \log \log N)$ bits, and this continues until the iterated logarithm $\dots \log \log \log N$ reaches $1$.

The minus one in $L_{n}$ guarantees that the code component is strictly longer than every length component, and that each length component is strictly longer than the length component preceding it in the codeword.
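To make the shrinking component sizes concrete, here is a minimal Python sketch that lists the bit length of every component, in codeword order (outermost length component first, code component last). The name `component_lengths` is a hypothetical helper for this example.

```python
def component_lengths(n: int) -> list[int]:
    """Bit lengths of all components of the codeword for n, in codeword order."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    lengths = [n.bit_length()]  # the code component has floor(log2 n) + 1 bits
    while lengths[-1] > 1:
        # The next length component stores (previous length - 1) in minimal binary.
        lengths.append((lengths[-1] - 1).bit_length())
    return lengths[::-1]


print(component_lengths(561))       # [1, 2, 4, 10]
print(sum(component_lengths(561)))  # 17 bits in total
```

The returned list is strictly increasing, which is the ordering property that the minus one guarantees.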

Note that the codeword is called a $\log^{*}$ (log-star) code because of its repeated $\log$, and these log-star codes are shown to be universal codes for positive integers.

However, this concatenation is not prefix-free.

Decoding is problematic when $N$ and each $L_{n}$ are encoded directly with their minimal binary codes. The decoder cannot differentiate the length components from the code component of $N$, and because minimal binary codes are not prefix-free, neither is the resulting codeword.

Issues With This Representation

The resulting binary string poses some issues. First of all, there is no way to differentiate the length components from each other or from the code component.

For $N = 561$

Suppose we are given the concatenated minimal binary codes (before any bit flipping) for $N = 561$:
$\underbrace{1}_{L_{3}} \; \underbrace{11}_{L_{2}} \; \underbrace{1001}_{L_{1}} \; \underbrace{1000110001}_{N}$
Decoding would have to interpret this sequence as a chain of length components $L_{k}$ followed by the original value $N$.

When reading the first bit, we do not know whether that 1 is a complete component, or the start of 11, or of 111, etc.

In general, it is impossible to delineate the boundaries of each component.

Additionally, another issue that arises is that these binary strings are not prefix-free because all parts, both the length components and the actual number, are encoded using minimal binary, which lacks self-delimiting structure. Minimal binary codes can be prefixes of one another (e.g., 1 is a prefix of 10), so when these are concatenated without boundaries, there’s no way to unambiguously tell where one component ends and the next begins. As a result, one codeword could be the prefix of another, violating prefix-freeness and making decoding ambiguous.

Differentiating Components & Ensuring Prefix Freeness

Minimal binary codes always have their most significant bit equal to 1. This allows a simple signalling method that makes the codeword prefix-free: flip the most significant bit of every length component to 0, so that length components can be told apart from the code component.

For $N = 561$ , we would thus have

$\underbrace{0}_{L_{3}} \; \underbrace{01}_{L_{2}} \; \underbrace{0001}_{L_{1}} \; \underbrace{1000110001}_{N}$

If the code used only minimal binary representations, it wouldn’t be prefix-free. This is because one number’s code can be the prefix of another’s, for example, the binary code for $1$ is 1, which is a prefix of $2$ , whose code is 10.

Even when we concatenate length components (which are also minimal binary codes) before flipping any bits, prefix-freeness still fails. For instance, the Elias Omega code for $1$ is 1, while that for $2$ is 110; clearly, the codeword for $1$ is a prefix of that for $2$ .

Prefix-freeness is restored by modifying the encoding: flip the most significant bit of each length component from 1 to 0. This guarantees that all length components begin with a 0, while the final code component (the actual number) always begins with a 1. This clean separation makes it impossible for a length component to be confused with a code component, ensuring no codeword is a prefix of another.

This works because minimal binary codes have no leading zeroes, so any component that begins with 0 must be a length component, and any segment that begins with 1 must be the final value. Since the Elias Omega code builds up in increasing bit-length, prefix conflicts are structurally avoided.
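The effect of the bit flip can be checked by brute force on a small range. The sketch below builds both variants (plain concatenation and concatenation with flipped length components) and tests whether any codeword is a prefix of another. The names `concat_code` and `is_prefix_free`, and the range 1 to 64, are assumptions made for this illustration. A finite check like this is evidence, not a proof.

```python
from itertools import permutations


def concat_code(n: int, flip: bool) -> str:
    """Concatenate the length components and binary(n); optionally flip each length MSB to 0."""
    code = format(n, "b")
    length = len(code)
    while length > 1:
        component = format(length - 1, "b")  # minimal binary of the next L
        length = len(component)
        if flip:
            component = "0" + component[1:]  # signal "this is a length component"
        code = component + code
    return code


def is_prefix_free(codes: list[str]) -> bool:
    """True if no codeword is a prefix of a codeword at a different position."""
    return not any(b.startswith(a) for a, b in permutations(codes, 2))


numbers = range(1, 65)
print(is_prefix_free([concat_code(n, flip=False) for n in numbers]))  # False
print(is_prefix_free([concat_code(n, flip=True) for n in numbers]))   # True
```

Without flipping, the codes for 1 and 2 are 1 and 110, which already breaks prefix-freeness. With flipping they become 1 and 010.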

For $N = 313$

The binary representation is $b (N) = 100111001$ (writing $b$ for binary and $ℓ$ for len), thus we have:

$100111001$

The length $ℓ (b (N)) = 9$ , thus $L_{1} = 9 - 1 = 8$ . As $b (8) = 1000$ , flipping the most significant bit results in $0000$ .

$0000100111001$

The length $ℓ (b (L_{1})) = 4$ , thus $L_{2} = 4 - 1 = 3$ . As $b (3) = 11$ , flipping the most significant bit results in $01$ .

$010000100111001$

The length $ℓ (b (L_{2})) = 2$ , thus $L_{3} = 2 - 1 = 1$ . As $b (1) = 1$ , flipping the most significant bit results in $0$ .

$0010000100111001$

The length $ℓ (b (L_{3})) = 1$, so we stop. The string above is the prefix-free codeword.
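The worked example can be reproduced with a short Python sketch that records the codeword after each step. The function name `encode_steps` is hypothetical. It follows the construction described in this section (length components with a flipped most significant bit, prepended one at a time), and the last entry of the returned list is the final codeword.

```python
def encode_steps(n: int) -> list[str]:
    """Return the codeword after each encoding step; the last entry is the final codeword."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = [format(n, "b")]  # start with the code component
    length = len(steps[0])
    while length > 1:
        component = format(length - 1, "b")              # minimal binary of L
        steps.append("0" + component[1:] + steps[-1])    # flip MSB to 0, then prepend
        length = len(component)
    return steps


for step in encode_steps(313):
    print(step)
# 100111001
# 0000100111001
# 010000100111001
# 0010000100111001
```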

Decoding Elias Omega Codes

When decoding a variable-length encoded integer, we don’t know how many length components the codeword has or what $N$ is. But we do know that the outermost length component $L_{k}$, which comes first in the codeword, has length = (1)dec and is encoded as the single bit 0.

Note that the first bit of the codeword can be 1 only when $N = 1_{10}$, in which case there are no length components at all.

1. Input: codeword[1 . . .]
2. Initialize: readlen = (1)dec, component = (empty), pos = 1
3. component = codeword[pos . . . pos + readlen − 1]
4. If the most-significant bit of component is 1, then N = (component)dec. STOP.
5. Else, if the most-significant bit of component is 0, then flip 0 → 1 and reset pos = pos + readlen, readlen = (component)dec + 1.
6. Repeat from step 3 (until N is decoded when step 4 is true).
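The steps above translate fairly directly into Python. This is a minimal sketch, assuming the codeword is given as a string of the characters 0 and 1. The function name `elias_omega_decode` is chosen for this example, and the code uses 0-based indexing where the steps above use 1-based positions.

```python
def elias_omega_decode(codeword: str) -> int:
    """Decode the first integer of a bit string built with the flipped-MSB scheme."""
    pos, readlen = 0, 1
    while True:
        component = codeword[pos:pos + readlen]
        if len(component) < readlen:
            raise ValueError("codeword ended before the code component was found")
        if component[0] == "1":
            # Most significant bit is 1: this is the code component.
            return int(component, 2)
        # Length component: flip the leading 0 back to 1, then move on.
        pos += readlen
        readlen = int("1" + component[1:], 2) + 1


print(elias_omega_decode("0010000100111001"))  # 313
```

Any bits after the code component are ignored, which is what allows codewords to be concatenated in a stream.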

For code_word = 0010000100111001...

Look at the most significant bit. Here it is 0, so we know it is a length component. Flip the bit to get:

$1\,010000100111001 \dots$

The current length component in decimal form is $1$, so the next component is $1 + 1 = 2$ bits long. Read the next $2$ bits and check the most significant bit: a 1 would mean we have found the code component, but here it is 0, so flip and read again:

$1\,11\,0000100111001 \dots$

As $11_{2} = 3_{10}$, the length of the next component is $3 + 1 = 4$. Reading the next $4$ bits, the most significant bit is 0, so flip and read again:

$1\,11\,1000\,100111001 \dots$

As $1000_{2} = 8_{10}$, the length of the next component is $8 + 1 = 9$. Reading the next $9$ bits, the most significant bit is 1, so this is the code component, and these $9$ bits decode to the integer:

$1\,11\,1000\,313_{10} \dots$

Then for the rest $\dots$, repeat the algorithm, looking at the most significant bit first (to see if it is a 0 or a 1).
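Repeating the procedure over the remaining bits can be sketched as a generator that yields one integer per codeword. The name `decode_stream` is hypothetical, and the input is assumed to be a string of 0 and 1 characters containing only complete codewords.

```python
from typing import Iterator


def decode_stream(bits: str) -> Iterator[int]:
    """Yield every integer encoded in a concatenation of codewords."""
    pos = 0
    while pos < len(bits):
        readlen = 1  # every codeword starts with a one-bit component
        while True:
            component = bits[pos:pos + readlen]
            if len(component) < readlen:
                raise ValueError("stream ends in the middle of a codeword")
            pos += readlen
            if component[0] == "1":
                yield int(component, 2)  # code component found
                break
            readlen = int("1" + component[1:], 2) + 1


# Codewords for 313, 1 and 2 written back to back.
print(list(decode_stream("0010000100111001" + "1" + "010")))  # [313, 1, 2]
```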

Elias Omega Code For Non-Negative Integers

The set of positive integers (Z>0) is isomorphic to the set of non-negative integers (Z≥0), meaning that both are countable sets related by a one-to-one correspondence (bijection).

So Elias Omega codewords for non-negative integers use the same method: for any integer z ∈ Z≥0, assign it the Elias Omega codeword for $z + 1$ in the positive range. That is:

Codeword for z = 0 in Z≥0 is same as the codeword for z = 1 in Z>0.
Codeword for z = 1 in Z≥0 is same as the codeword for z = 2 in Z>0.
Codeword for z = 2 in Z≥0 is same as the codeword for z = 3 in Z>0.
Codeword for z = 3 in Z≥0 is same as the codeword for z = 4 in Z>0.
…
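The shift by one is a one-line wrapper around the positive-integer encoder. In the sketch below, `encode_positive` is a compact version of the construction from the earlier sections and `encode_non_negative` is the wrapper. Both names are assumptions made for this example.

```python
def encode_positive(n: int) -> str:
    """Codeword for a positive integer, with length-component MSBs flipped to 0."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    code = format(n, "b")
    length = len(code)
    while length > 1:
        component = format(length - 1, "b")
        code = "0" + component[1:] + code
        length = len(component)
    return code


def encode_non_negative(z: int) -> str:
    """Codeword for z >= 0, defined as the codeword for z + 1."""
    if z < 0:
        raise ValueError("z must be non-negative")
    return encode_positive(z + 1)


for z in range(4):
    print(z, encode_non_negative(z))
# 0 1
# 1 010
# 2 011
# 3 000100
```

A decoder for this variant would decode the positive integer as usual and subtract one.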

The same can be done for any countable set. For all integers (Z), for example, you can fix a total ordering that alternates between non-negative and negative values:

Codeword for z = 0 in Z is same as the codeword for z = 1 in Z>0.
Codeword for z = -1 in Z is same as the codeword for z = 2 in Z>0.
Codeword for z = 1 in Z is same as the codeword for z = 3 in Z>0.
Codeword for z = -2 in Z is same as the codeword for z = 4 in Z>0.
Codeword for z = 2 in Z is same as the codeword for z = 5 in Z>0.
…
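The ordering listed above has a simple closed form: non-negative integers map to odd positive integers and negative integers map to even ones. The sketch below shows the mapping and its inverse. The function names are hypothetical, and the closed form is inferred from the five listed cases rather than stated in the notes.

```python
def int_to_positive(z: int) -> int:
    """Map 0, -1, 1, -2, 2, ... to 1, 2, 3, 4, 5, ..."""
    return 2 * z + 1 if z >= 0 else -2 * z


def positive_to_int(n: int) -> int:
    """Inverse mapping: odd n gives a non-negative integer, even n a negative one."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return (n - 1) // 2 if n % 2 == 1 else -(n // 2)


print([int_to_positive(z) for z in (0, -1, 1, -2, 2)])  # [1, 2, 3, 4, 5]
print([positive_to_int(n) for n in (1, 2, 3, 4, 5)])    # [0, -1, 1, -2, 2]
```

Encoding an arbitrary integer then amounts to applying `int_to_positive` and encoding the result as a positive integer. Decoding applies `positive_to_int` to the decoded value.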
