Introduction To B-Trees

Large databases that can’t fit in memory are often stored on external drives like HDDs. An issue with this is that disk access is painfully slow compared to CPU and RAM.

Since disk reads happen in chunks (pages), the goal is to minimise how often the disk is accessed.

Unlike binary search trees, which can require many small reads, B-trees are multi-way search trees designed to reduce the number of disk accesses by grouping data to fit neatly into pages, making each read more efficient.

B-Trees

A B-tree is a self-balancing, rooted search tree that generalises binary search trees (BSTs) by allowing each internal node to have more than two children.

An arbitrary node in a B-Tree stores $n$ keys/elements in non-decreasing order: $k_{1} < k_{2} < \dots < k_{n}$ , and has $n + 1$ pointers to subtrees: $T_{1}, T_{2}, \dots, T_{n}, T_{n + 1}$ that divide the key space into intervals.

For example, a node with $3$ keys will have $4$ children, each pointing to a range of values that fall between or outside the keys. The keys act as decision boundaries, guiding searches, insertions, and deletions to the correct subtree.

Properties Of B-Trees

Each successive triplet $(k_{i}, T_{i}, T_{i + 1})$ behaves like a node of a binary tree and satisfies the BST property: $T_{i} < k_{i} < T_{i + 1}$

These pointers to $T_{i}$ and $T_{i + 1}$ divide the key space into intervals, values in the leftmost subtree are less than the first key, values in between keys fall into corresponding subtrees, and values in the rightmost subtree are greater than the last key.

All leaves have the same depth from the root, making the tree strictly balanced

A B-Tree maintains its balance by ensuring that all leaf nodes appear at the same depth, and it maintains balance dynamically by growing or shrinking from the root during insertions and deletions.

Internal nodes have a minimum and maximum number of keys based on the tree’s order.

The bounds are determined by a fixed parameter $t \geq 2$ and $2 t$ , known as the minimum and maximum degrees of the B-Tree.

By definition of degree, $t$ and $2 t$ are the minimum and maximum number of children any internal node (except the root) must have. This means each node must store at least $t - 1$ keys and at most $2 t - 1$ keys.

Additionally, the root is exempted from the lower bound, i.e. it may have fewer than $t - 1$ keys. However, it must still conform to the upper bound. This enables the definition of empty B-trees or B-trees with one element (or any $x < t$ ).

This design keeps B-Trees shallow, reducing the number of required node (or disk) accesses for operations like search, insert and delete.

This efficiency makes them effective for managing large volumes of sorted data, especially when that data lives on slow secondary storage like hard drives or SSDs, where minimising disk accesses is critical.

E.g. filesystems, database indexing, and other high-performance, disk-backed systems

Examples Of B-Trees

![[b_tree_example.png|Example B-Tree ( $t = 3$ ) containing consonants of the English Alphabet]]

In this example, with a minimum degree of $t = 3$ , every internal non-root node must have at least 3 children and thus at least 2 keys. The tree is strictly balanced, meaning all leaves appear at the same depth, and for any given key, all elements in the left subtrees are smaller, and all elements in the right subtrees are larger, preserving the search tree property.

The simplest B-Tree occurs when $t = 2$ , where each internal node may have 2, 3, or 4 children and 1, 2, or 3 keys. This case is known as a 2-3-4 tree. Interestingly, every Red-Black Tree can be seen as a binary encoding of a 2-3-4 tree.

Quartz 4

Explorer

Introduction To B-Trees

B-Trees

Properties Of B-Trees

Examples Of B-Trees

Graph View

Table of Contents

Backlinks