Chapter 21: Stack Analysis and Estimation

Part VI: Embedded Constraints

"Stack overflow is the most insidious bug in embedded systems—it corrupts silently and strikes randomly." — Miro Samek

The Unpredictable Crash

This is a classic embedded nightmare story.

The product had shipped. It was running smoothly. Customers were happy. Then suddenly, one customer reported: "The system randomly reboots. No pattern whatsoever."

The team checked all logs—no error messages. Checked power—stable. Checked temperature—normal range. The problem was like a ghost, impossible to reproduce.

Three weeks later, the issue was narrowed down to a specific usage scenario: when a user rapidly pressed three buttons in sequence, the system had a 30% chance of rebooting.

An engineer finally captured a crash dump and found the program counter (PC) pointing to a completely meaningless address.

The answer: Stack overflow.

The button interrupt handler had nested calls three levels deep into decoding logic, combined with a 512-byte local variable buffer. This combination only occurred under a specific sequence of operations that was never tested during development.

Stack overflow doesn't throw an exception. It doesn't leave an error message. It silently overwrites other memory regions, causing the system to crash at unpredictable times.

This chapter teaches you how to analyze, estimate, and monitor stack usage—before the problem becomes a field failure.

Stack Basics

What is the Stack?

The stack is a contiguous memory region used to store:

Stack contents:
────────────────────────────────────────
1. Function return addresses
2. Local variables
3. Function parameters (some calling conventions)
4. Saved register values (caller/callee saved)
5. Interrupt handler context

Stack Growth Direction

Most architectures have stacks that grow downward:

High addr   ┌───────────────────────┐
            │  Stack start          │ ← Initial SP value
            ├───────────────────────┤
            │  main() frame         │
            ├───────────────────────┤
            │  func_a() frame       │
            ├───────────────────────┤
            │  func_b() frame       │ ← Current SP
            ├───────────────────────┤
            │                       │
            │  Unused stack space   │
            │                       │
            ├───────────────────────┤
            │  Stack guard/bottom   │
Low addr    └───────────────────────┘

Why Stack Size Is Hard to Estimate

Static footprint problem:
─────────────────────────────────────────────────────────
.text, .data, .bss    → Fully determined at compile time
Stack                  → Only know actual usage at runtime

Difficulties:
1. Call depth varies dynamically
2. Recursion depth
3. Function pointer calls
4. Interrupt nesting
5. Conditional local variables (different paths use different sizes)

Static Analysis Tools

GCC's -fstack-usage

GCC can generate stack usage reports for each function:

$ riscv64-unknown-elf-gcc -fstack-usage -Os -c main.c
$ cat main.su
main.c:10:5:main	64	static
main.c:25:6:process_data	256	static
main.c:50:6:handle_interrupt	128	static

Output format:

file:line:column:function_name    stack_usage(bytes)    type

Type descriptions:
- static: Determinable at compile time
- dynamic: Uses VLA or alloca, size unknown
- bounded: Has upper limit but cannot be precisely determined

Pro tip: Find the largest stack users

$ cat *.su | sort -t$'\t' -k2 -nr | head -10
drivers/usb.c:120:6:usb_parse_descriptor	1024	static
app/json.c:45:6:parse_json_object	512	static
app/protocol.c:80:6:decode_frame	384	static
...

Checkstack Script

The Linux kernel provides a checkstack script that analyzes stack usage directly from assembly:

# Generate assembly and analyze
$ riscv64-unknown-elf-objdump -d firmware.elf | ./checkstack.pl riscv

0x80001234 usb_parse_descriptor [firmware.elf]:	1024
0x80002468 parse_json_object [firmware.elf]:	512
0x80003690 decode_frame [firmware.elf]:	384

Limitations

Static analysis has its limits:

Static analysis can handle:
✅ Fixed-size local variables
✅ Direct function call chains
✅ Call depth known at compile time

Static analysis cannot handle:
❌ Recursion depth (unless explicitly bounded)
❌ Function pointer calls
❌ Interrupt nesting levels
❌ VLA (Variable Length Array)
❌ alloca()

Dynamic Measurement Methods

When static analysis is insufficient, we need dynamic measurement.

Stack Painting

This is the classic stack usage measurement method:

#define STACK_PATTERN  0xDEADBEEF

// Fill stack region with pattern at startup
void stack_paint(void) {
    extern uint32_t _stack_start;  // Defined in linker script
    extern uint32_t _stack_end;
    
    uint32_t *p = &_stack_start;
    while (p < &_stack_end) {
        *p++ = STACK_PATTERN;
    }
}

// Measure stack high water mark at any time
size_t stack_get_high_water_mark(void) {
    extern uint32_t _stack_start;
    extern uint32_t _stack_end;
    
    uint32_t *p = &_stack_start;
    while (p < &_stack_end && *p == STACK_PATTERN) {
        p++;
    }
    
    return ((size_t)&_stack_end - (size_t)p);
}

Usage:

int main(void) {
    stack_paint();  // Paint at startup
    
    // ... run application ...
    
    // Check periodically
    size_t used = stack_get_high_water_mark();
    printf("Stack high water mark: %zu bytes\n", used);
}

Important: This method only measures "maximum ever used"—it cannot guarantee future usage won't exceed this value.

Runtime Stack Monitoring

Add stack overflow detection in debug builds:

// Check stack at each function entry
void __attribute__((no_instrument_function))
stack_check(void) {
    extern uint32_t _stack_start;
    register uint32_t sp asm("sp");

    if (sp < (uint32_t)&_stack_start + STACK_GUARD_SIZE) {
        // Stack overflow detected!
        panic("Stack overflow!");
    }
}

// GCC's -finstrument-functions can auto-insert
void __attribute__((no_instrument_function))
__cyg_profile_func_enter(void *this_fn, void *call_site) {
    stack_check();
}

GCC compile option:

# Auto-insert hooks at function entry/exit
$ gcc -finstrument-functions -Os main.c -o firmware.elf

Note: This adds runtime overhead and code size—use only in debug builds.

RTOS Task Stack Sizing

In RTOS environments, each task has its own stack, making the problem more complex.

FreeRTOS Stack Configuration

// FreeRTOS task creation
#define TASK_STACK_SIZE  512  // words (2048 bytes on 32-bit)

xTaskCreate(
    task_function,
    "TaskName",
    TASK_STACK_SIZE,  // ← How to determine this?
    NULL,
    tskIDLE_PRIORITY + 1,
    &task_handle
);

Methods for Estimating Task Stack Size

Task Stack requirement =
    Context save size (RTOS framework)
  + Task function's own stack usage
  + Stack usage of all possible called functions
  + Interrupt handling (if sharing stack)
  + Safety margin (typically 25-50%)

FreeRTOS Context Save Size (ARM Cortex-M example):

Architecture          Context Size (bytes)
──────────────────────────────────────────
ARM Cortex-M0         64
ARM Cortex-M3/M4      64 (no FPU) / 200 (with FPU)
ARM Cortex-M7         64 (no FPU) / 232 (with FPU)
RISC-V RV32           64-128 (depends on ABI)

FreeRTOS Stack Monitoring

FreeRTOS provides built-in stack high water mark functionality:

// Enable stack overflow detection
#define configCHECK_FOR_STACK_OVERFLOW  2

// Get task's stack usage
UBaseType_t stack_remaining = uxTaskGetStackHighWaterMark(task_handle);
printf("Task stack remaining: %u words\n", stack_remaining);

// Stack overflow hook (configCHECK_FOR_STACK_OVERFLOW >= 1)
void vApplicationStackOverflowHook(TaskHandle_t xTask, char *pcTaskName) {
    // Handle stack overflow
    panic("Stack overflow in task: %s\n", pcTaskName);
}

Practical Recommendations

Task Stack Sizing Best Practices:
──────────────────────────────────────────────────────────

1. Initial configuration: Start with generous space (e.g., 2 KB)

2. Measure actual usage:
   - Run system, trigger all possible execution paths
   - Use uxTaskGetStackHighWaterMark() to check

3. Adjust configuration:
   - Actual usage + 25-50% safety margin
   - Example: Measured 800 bytes → Configure 1024-1200 bytes

4. Continuous monitoring:
   - Keep stack checking in production builds
   - Periodically log stack usage statistics

Worst-Case Stack Depth (WCSD) Analysis

In safety-critical systems (automotive, medical, aerospace), you need to calculate Worst-Case Stack Depth (WCSD).

Call Graph Analysis

Build call graph, calculate deepest path:

main()           [64 bytes]
  ├── init()     [32 bytes]
  │     └── hal_init()  [128 bytes]
  └── loop()     [48 bytes]
        ├── read_sensor()  [96 bytes]
        │     └── spi_transfer()  [64 bytes]
        └── process()  [256 bytes]
              └── filter()  [128 bytes]

Deepest path 1: main → init → hal_init
            = 64 + 32 + 128 = 224 bytes

Deepest path 2: main → loop → process → filter
            = 64 + 48 + 256 + 128 = 496 bytes

WCSD = max(224, 496) = 496 bytes

Interrupt Impact

Must consider interrupts when calculating WCSD:

Application stack usage:        496 bytes
+ Interrupt handler:            128 bytes
+ Nested interrupt (if any):    128 bytes
────────────────────────────────────────────
Total WCSD:                     752 bytes

Tool Support

WCSD analysis tools:
──────────────────────────────────────────────────────────
Tool              Platform      Features
Polyspace         Commercial    Formal verification
PC-lint Plus      Commercial    Static analysis
StackAnalyzer     Commercial    Professional WCSD analysis
                  (AbsInt)
GCC -fstack-usage Open source   Basic, requires manual call chain calc

Common Pitfalls

1. Large Local Variables

void process_packet(void) {
    char buffer[4096];  // 4 KB local variable!
    // ...
}

Solution: Use static buffer or dynamic allocation

// Use static (moves to .bss, doesn't use stack)
static char buffer[4096];

// Or use heap (if system allows)
char *buffer = malloc(4096);

2. Recursive Functions

int factorial(int n) {
    if (n <= 1) return 1;
    return n * factorial(n - 1);  // Depth = n
}

// factorial(1000) needs ~64 KB stack!

Solution: Convert to loop

int factorial(int n) {
    int result = 1;
    for (int i = 2; i <= n; i++) {
        result *= i;
    }
    return result;
}

3. Complex Logic in Interrupts

void __attribute__((interrupt)) timer_isr(void) {
    char log_buffer[512];  // Large buffer in ISR
    sprintf(log_buffer, "...");
    process_complex_data();  // Call complex function
}

Solution: ISRs should be as short as possible; defer complex logic to main loop or task

volatile bool timer_flag = false;

void __attribute__((interrupt)) timer_isr(void) {
    timer_flag = true;  // Just set flag
}

void main_loop(void) {
    if (timer_flag) {
        timer_flag = false;
        process_complex_data();  // Handle in normal context
    }
}

4. VLA and alloca

void process(size_t n) {
    int data[n];  // VLA - stack usage depends on runtime value
    // ...
}

void another(size_t size) {
    void *buf = alloca(size);  // Same problem
    // ...
}

Solution: Avoid VLA and alloca; use static buffers or heap

Summary

Stack overflow is the hardest bug to detect in embedded systems—silent corruption, random crashes
Static analysis:
- GCC -fstack-usage: Generates per-function stack usage report
- Limitation: Cannot handle recursion, function pointers, interrupt nesting
Dynamic measurement:
- Stack painting: Fill with pattern, check high water mark later
- Runtime monitoring: Check stack pointer at function entry
RTOS environments:
- Each task has independent stack, estimate separately
- FreeRTOS provides uxTaskGetStackHighWaterMark()
WCSD analysis: Safety-critical systems need worst-case stack depth calculation
Best practices:
- Avoid large local variables
- Avoid recursion
- Keep ISRs short
- Configure safety margin (25-50%)

Performance and Benchmarking