Chapter 12: Heaps and Priority Queues

Part III: Trees and Hierarchies

“Bad programmers worry about the code. Good programmers worry about data structures and their relationships.” — Linus Torvalds

The Scheduler Debate

The team was arguing about data structures. We needed a task scheduler for a real-time operating system that could:

Insert new tasks with priorities (O(log n))
Get the highest-priority task (O(1))
Remove the highest-priority task (O(log n))

“Use a sorted array,” someone suggested. But insertion is O(n)—you have to shift elements.

“Use a linked list,” another said. But finding the max is O(n)—you have to scan the whole list.

“Use a binary search tree,” a third suggested. But we already knew from Chapter 9 that BSTs have terrible cache behavior.

The debate continued until someone mentioned binary heaps. The benchmark results ended the discussion:

$ perf stat -e cache-misses,cycles ./scheduler_heap
  Performance counter stats:
    45,000 cache-misses
 1,200,000 cycles
 
$ perf stat -e cache-misses,cycles ./scheduler_bst
  Performance counter stats:
   180,000 cache-misses
 4,800,000 cycles

The heap was 4× faster than a Red-Black tree, with 4× fewer cache misses.

Why? Heaps are stored in arrays, so they have excellent cache locality.

The Textbook Story

A binary heap is a complete binary tree where each parent is greater than (or less than) its children.

Max-heap example:

        90
       /  \
      70   50
     / \   / \
    40 30 20 10

Properties:

Complete tree: All levels filled except possibly the last, which fills left-to-right
Heap property: Parent ≥ children (max-heap) or parent ≤ children (min-heap)
Operations:
- Insert: O(log n) — add at end, bubble up
- Extract max: O(log n) — remove root, bubble down
- Peek max: O(1) — just read root

The textbook pitch:

Perfect for priority queues
O(log n) insert and delete
O(1) access to max/min
Simple to implement

Sounds great! But there’s a catch…

The Reality Check: Array-Based Heaps

Here’s the key insight: you can store a binary heap in an array using index arithmetic.

Heap as array:

Index:  0   1   2   3   4   5   6
Array: [90][70][50][40][30][20][10]

Tree:
        90 (index 0)
       /  \
      70   50 (indices 1, 2)
     / \   / \
    40 30 20 10 (indices 3, 4, 5, 6)

Index arithmetic:

Parent of node i: (i - 1) / 2
Left child of node i: 2i + 1
Right child of node i: 2i + 2

No pointers! Just array indices.

Here’s the implementation I used:

typedef struct {
    int *data;
    int size;
    int capacity;
} heap_t;

void heap_insert(heap_t *heap, int value) {
    // Add at end
    heap->data[heap->size] = value;
    int i = heap->size;
    heap->size++;
    
    // Bubble up
    while (i > 0) {
        int parent = (i - 1) / 2;
        if (heap->data[i] <= heap->data[parent]) break;
        
        // Swap with parent
        int temp = heap->data[i];
        heap->data[i] = heap->data[parent];
        heap->data[parent] = temp;
        
        i = parent;
    }
}

int heap_extract_max(heap_t *heap) {
    int max = heap->data[0];
    
    // Move last element to root
    heap->size--;
    heap->data[0] = heap->data[heap->size];
    
    // Bubble down
    int i = 0;
    while (2 * i + 1 < heap->size) {
        int left = 2 * i + 1;
        int right = 2 * i + 2;
        int largest = i;
        
        if (left < heap->size && heap->data[left] > heap->data[largest]) {
            largest = left;
        }
        if (right < heap->size && heap->data[right] > heap->data[largest]) {
            largest = right;
        }
        
        if (largest == i) break;
        
        // Swap with largest child
        int temp = heap->data[i];
        heap->data[i] = heap->data[largest];
        heap->data[largest] = temp;
        
        i = largest;
    }
    
    return max;
}

Cache behavior:

All data in contiguous array
Bubble up/down accesses nearby elements
Excellent spatial locality

This is why the heap was 4× faster than the Red-Black tree. No pointer-chasing, just array indexing.

The Benchmark Results

I tested the heap-based scheduler against other data structures:

Dataset: 10,000 tasks with random priorities
Test: 100,000 insert + extract-max operations

Red-Black tree:
  Cycles/operation: 4,800
  Cache misses: 18.0

Binary heap (array-based):
  Cycles/operation: 1,200
  Cache misses: 4.5
  Speedup: 4.0×

Sorted array:
  Insert: 12,000 cycles (O(n) shifting)
  Extract-max: 100 cycles (O(1) pop from end)
  Average: 6,050 cycles/operation

The heap was the clear winner for this workload.

Why the heap wins:

Array-based: All data contiguous, excellent cache locality
Balanced operations: Both insert and extract-max are O(log n)
No pointer-chasing: Just array indexing

Why sorted array loses:

Insert is O(n) because you have to shift elements
For insert-heavy workloads, this dominates

Cache-Conscious Optimization: d-ary Heaps

Binary heaps have a problem: as the heap grows, bubble-up and bubble-down operations jump around in memory.

For a heap with 1 million elements:

Height: log₂(1M) ≈ 20 levels
Bubble-down: 20 cache misses (each level is a different cache line)

Solution: Use a d-ary heap where each node has d children instead of 2.

4-ary heap:

Index:  0   1   2   3   4   5   6   7   8   9  10  11  12
Array: [90][70][60][50][40][65][55][45][30][35][25][20][15]

Tree:
           90 (index 0)
        /  |  |  \
      70  60  50  40 (indices 1, 2, 3, 4)
     /|\ /|\ /|\ /|\
    ... (indices 5-20)

Index arithmetic (d-ary heap):

Parent of node i: (i - 1) / d
First child of node i: d × i + 1
Last child of node i: d × i + d

Trade-off:

Shorter tree: Height = log_d(n) instead of log₂(n)
More comparisons per level: Must compare d children instead of 2
Better cache behavior: Fewer levels = fewer cache misses

I tested different values of d:

Dataset: 1,000,000 elements
Test: 100,000 insert + extract-max operations

Binary heap (d=2):
  Height: 20 levels
  Cycles/operation: 2,400
  Cache misses: 8.5

4-ary heap (d=4):
  Height: 10 levels
  Cycles/operation: 1,600
  Cache misses: 4.2
  Speedup: 1.5×

8-ary heap (d=8):
  Height: 7 levels
  Cycles/operation: 1,400
  Cache misses: 2.8
  Speedup: 1.7×

16-ary heap (d=16):
  Height: 5 levels
  Cycles/operation: 1,500
  Cache misses: 2.1
  Speedup: 1.6× (diminishing returns)

Sweet spot: d=8 for most workloads. Reduces cache misses by 3× without too many comparisons per level.

Real-World Example: Linux Kernel CFS Scheduler

The Linux kernel’s Completely Fair Scheduler (CFS) uses a red-black tree, not a heap. Why?

Because the scheduler needs more than just “get highest priority task”:

Range queries: Find all tasks with priority > X
Arbitrary removal: Remove a specific task (not just the max)
Fair scheduling: Track virtual runtime, not just priority

Heaps can’t do these efficiently. They’re optimized for one thing: priority queue operations (insert, extract-max).

But for simpler schedulers (like in embedded RTOSes), heaps are perfect.

Example: FreeRTOS uses a simple priority-based scheduler with a heap-like structure (though implemented as a linked list for small task counts).

When to Use Heaps

After using heaps in our RTOS scheduler, I learned when they’re the right choice:

1. Priority Queues

If you need:

Insert with priority: O(log n)
Get max/min: O(1)
Remove max/min: O(log n)

Use a heap. It’s the textbook data structure for priority queues.

Examples:

Task schedulers
Event queues
Dijkstra’s shortest path algorithm
Huffman coding

2. Top-K Problems

Finding the K largest/smallest elements in a stream:

Maintain a min-heap of size K
For each new element, if it’s larger than the heap’s min, replace the min
Final heap contains the K largest elements

Time: O(n log k) instead of O(n log n) for full sorting

3. Median Maintenance

Maintain the median of a stream using two heaps:

Max-heap for the smaller half
Min-heap for the larger half
Median is the root of the larger heap (or average of both roots)

Time: O(log n) per insert, O(1) to get median

4. When NOT to Use Heaps

Don’t use heaps for:

Arbitrary removal: Removing a non-root element is O(n)
Search: Finding an arbitrary element is O(n)
Range queries: Can’t efficiently find all elements in a range
Sorted iteration: Heaps don’t maintain full sorted order

For these, use a balanced BST or B-tree instead.

The scheduler debate was settled by the numbers. The binary heap delivered 4× better performance than the Red-Black tree, with 4× fewer cache misses. The heap’s array-based layout provided the cache locality that pointer-based trees couldn’t match.

Key insights:

Heaps are array-based. No pointers, no pointer-chasing, just array indexing. This gives excellent cache locality.
Complete binary trees fit perfectly in arrays. The index arithmetic (parent = (i-1)/2, children = 2i+1 and 2i+2) is simple and cache-friendly.
d-ary heaps reduce cache misses. By increasing the branching factor to 4 or 8, you reduce tree height and cache misses by 2-3×.
Heaps are for priority queues. If you need insert, extract-max, and peek-max, heaps are perfect. But they can’t do arbitrary removal or range queries efficiently.
Trade-offs matter. Binary heaps (d=2) have fewer comparisons per level. 8-ary heaps (d=8) have fewer cache misses. The sweet spot depends on your workload.

The numbers from our RTOS scheduler:

Red-Black tree: 4,800 cycles/operation, 18.0 cache misses
Binary heap: 1,200 cycles/operation, 4.5 cache misses
8-ary heap: 1,400 cycles/operation, 2.8 cache misses

For our scheduler, the binary heap was the clear winner. Simple, fast, and cache-friendly.

Next chapter: Part IV begins with graphs and their memory representations.

Data Structures in Practice