Operations On B-Trees

Retrieval (Search)

Searching in a B-Tree follows the same principles as in a Binary Search Tree, but with multi-way branching due to each node having $t \geq 2$ children.

The algorithm recursively descends the tree based on comparisons:

As each node contains multiple keys sorted, a binary search within the node is used to locate the appropriate interval for the target key.
If the key matches one in the node, the search terminates successfully.
If not, the key must lie between two existing keys or fall outside the key range and the algorithm follows the child pointer that covers that range.
If the correct child does not exist (i.e. the current node is a leaf), the search terminates unsuccessfully.

This structure enables fast lookups with minimal disk reads, as each node access potentially narrows the search range significantly.

Example Of Search

Inserting Into A B-Tree

Insertion in a B-Tree involves traversing to the appropriate leaf node and placing the new key, while preserving all B-tree properties, particularly:

No node exceeds its maximum key capacity and,
All leaves remain at the same depth (i.e., the tree stays balanced).

Therefore, the algorithm cannot insert into a full node or create arbitrary new leaves. Instead, it follows a proactive strategy:

Traverse from the root to the appropriate leaf, making comparisons as in a standard search.
Before descending into any child node, check if it is full (i.e., contains $2 t - 1$ keys).
If a node is full, do not step on it, split it before descending:
1. Divide the node into two around its median key.
2. Promote the median key to the parent node, which is guaranteed to have space due to this proactive strategy.
After all necessary splits, descend to the correct non-full leaf and insert the new key in sorted order (using binary search to find the insertion point).

This algorithm never descend into a full node, allowing insertion without violating the B-tree’s degree constraints. Additionally, as the tree’s height increases from the root (when it splits), the leaves are always at the same depth.

Splitting Preserves B-Tree Bounds

A full node has $2 t - 1$ keys. When split, one key is promoted to the parent, and the remaining $2 t - 2$ keys are divided evenly into two nodes with $t - 1$ keys each, satisfying the lower and upper bounds. This will always be a clean split.

When a new root is created during a split, it starts with just one key. This is valid, as the root is exempt from the minimum key constraint.

Deletion

Deletion in a B-Tree is analogous to insertion but must avoid descending into nodes with the minimum number of keys ( $t - 1$ ), to prevent underflow. The root is once again, exempt from this restriction.

Deletion is handled through three main cases:

Case 1: `x` Is At A Leaf Node With $> t - 1$ Keys

If the element to delete, x, is at a leaf node that is not at minimum capacity ( $> t - 1$ ), then simply delete x. This is the only case where deletion occurs, all other cases eventually transform into this one.

Case 2: `x` Is In An Internal Node

If x is in an internal node, then replace x with a suitable in-order replacement:

Case 2a: If the left child of x has $> t - 1$ keys (more than the lower bound):
- Find the predecessor w in the left subtree, replace x with w, and recursively delete w.
Case 2b: If the right child of x has $> t - 1$ keys (more than the lower bound):
- Find the successor y in the right subtree, replace x with y, and recursively delete y.
Case 2c: If both children have exactly $t - 1$ keys:
- Merge x and its two children into a single node, then recursively delete x.

When both children exceed the lower bound, either Case 2a or 2b may be chosen.

Case 3: Descending Into A Subtree With $t - 1$ Keys

If the recursive deletion path reaches a subtree containing a child node with exactly $t - 1$ keys (the lower bound), the algorithm must first ensure the node has sufficient keys to avoid underflow before descending further.

Case 3a: If the subtrees immediate sibling has $> t - 1$ keys (i.e. more than the lower bound):
- Let $T_{k}$ be the deficient child (with $t - 1$ keys), and $T_{k + 1}$ or $T_{k - 1}$ be the richer sibling (with $> t - 1$ keys).
- Perform a rotation among the triple $(T_{k}, parent key, T_{k + 1})$ as follows:
  1. Move the appropriate parent key down into $T_{k}$ to restore capacity.
  2. Promote the leftmost key from $T_{k + 1}$ (or rightmost from $T_{k - 1}$ ) into the parent to preserve key order.
  3. Reassign child pointers accordingly: the leftmost/rightmost subtree associated with the promoted key must be transferred to the rotated key in $T_{k}$ .
Case 3b: If the subtrees immediate sibling has $t - 1$ keys
- Merge the node with its immediate sibling and the separating parent key into a single node.
  - This new merged node will have $2 t - 1$ keys, satisfying all constraints
- If the parent was the root and becomes empty, the merged node becomes the root (the trees height decreases).

A "sibling" refers to the adjacent subtree of a shared parent key.

Complexity Analysis

Let $n$ be the total number of keys in the B-tree, and $t$ the minimum degree.

Height

The height of a B-tree is $O (lo g_{t} n)$ because each node has at least $t$ children (except possibly the root), leading to exponential growth in capacity as the depth increases.

Time Complexity

Searching requires $O (lo g_{t} n)$ node accesses, with each node searched using binary search in $O (lo g t)$ time. The combined complexity is $O (lo g_{t} n + lo g t) = O (lo g n)$ .

Insertion involves traversing from root to leaf over $O (lo g_{t} n)$ levels. At most one split occurs per level, each costing $O (t)$ . Thus, the total insertion complexity is $O (t lo g_{t} n)$ .

Deletion also descends $O (lo g_{t} n)$ levels and may perform merges or rotations costing $O (t)$ per level. This results in an overall complexity of $O (t lo g_{t} n)$ .

Space Complexity

A B-tree stores all $n$ keys distributed across its nodes, each containing between $t - 1$ and $2 t - 1$ keys, resulting in $O (n)$ space overall. The node structure and child pointers add only constant overhead per key, so the total space used scales linearly with the number of stored keys.

Although node splits and merges cost $O (t)$ , when $t$ is fixed, both insertion and deletion run in $O (lo g n)$ time overall.

In practice, $t$ is chosen large (e.g., $t = 100$ ) to keep tree height small and optimise cache and disk performance.

Quartz 4

Explorer

Operations On B-Trees

Retrieval (Search)

Inserting Into A B-Tree

Deletion

Case 1: `x` Is At A Leaf Node With $> t - 1$ Keys

Case 2: `x` Is In An Internal Node

Case 3: Descending Into A Subtree With $t - 1$ Keys

Complexity Analysis

Graph View

Table of Contents

Backlinks

Quartz 4

Explorer

Operations On B-Trees

Retrieval (Search)

Inserting Into A B-Tree

Deletion

Case 1: x Is At A Leaf Node With >t−1 Keys

Case 2: x Is In An Internal Node

Case 3: Descending Into A Subtree With t−1 Keys

Complexity Analysis

Graph View

Table of Contents

Backlinks

Case 1: `x` Is At A Leaf Node With $> t - 1$ Keys

Case 2: `x` Is In An Internal Node

Case 3: Descending Into A Subtree With $t - 1$ Keys