Chapter 21: Stack Analysis and Estimation
Part VI: Embedded Constraints
"Stack overflow is the most insidious bug in embedded systems—it corrupts silently and strikes randomly." — Miro Samek
The Unpredictable Crash
This is a classic embedded nightmare story.
The product had shipped. It was running smoothly. Customers were happy. Then suddenly, one customer reported: "The system randomly reboots. No pattern whatsoever."
The team checked all logs—no error messages. Checked power—stable. Checked temperature—normal range. The problem was like a ghost, impossible to reproduce.
Three weeks later, the issue was narrowed down to a specific usage scenario: when a user rapidly pressed three buttons in sequence, the system had a 30% chance of rebooting.
An engineer finally captured a crash dump and found the program counter (PC) pointing to a completely meaningless address.
The answer: Stack overflow.
The button interrupt handler had nested calls three levels deep into decoding logic, combined with a 512-byte local variable buffer. This combination only occurred under a specific sequence of operations that was never tested during development.
Stack overflow doesn't throw an exception. It doesn't leave an error message. It silently overwrites other memory regions, causing the system to crash at unpredictable times.
This chapter teaches you how to analyze, estimate, and monitor stack usage—before the problem becomes a field failure.
Stack Basics
What is the Stack?
The stack is a contiguous memory region used to store:
Stack contents:
────────────────────────────────────────
1. Function return addresses
2. Local variables
3. Function parameters (some calling conventions)
4. Saved register values (caller/callee saved)
5. Interrupt handler context
Stack Growth Direction
Most architectures have stacks that grow downward:
High addr ┌───────────────────────┐
│ Stack start │ ← Initial SP value
├───────────────────────┤
│ main() frame │
├───────────────────────┤
│ func_a() frame │
├───────────────────────┤
│ func_b() frame │ ← Current SP
├───────────────────────┤
│ │
│ Unused stack space │
│ │
├───────────────────────┤
│ Stack guard/bottom │
Low addr └───────────────────────┘
Why Stack Size Is Hard to Estimate
Static footprint problem:
─────────────────────────────────────────────────────────
.text, .data, .bss → Fully determined at compile time
Stack → Only know actual usage at runtime
Difficulties:
1. Call depth varies dynamically
2. Recursion depth
3. Function pointer calls
4. Interrupt nesting
5. Conditional local variables (different paths use different sizes)
Static Analysis Tools
GCC's -fstack-usage
GCC can generate stack usage reports for each function:
$ riscv64-unknown-elf-gcc -fstack-usage -Os -c main.c
$ cat main.su
main.c:10:5:main 64 static
main.c:25:6:process_data 256 static
main.c:50:6:handle_interrupt 128 static
Output format:
file:line:column:function_name stack_usage(bytes) type
Type descriptions:
- static: Determinable at compile time
- dynamic: Uses VLA or alloca, size unknown
- bounded: Has upper limit but cannot be precisely determined
Pro tip: Find the largest stack users
$ cat *.su | sort -t$'\t' -k2 -nr | head -10
drivers/usb.c:120:6:usb_parse_descriptor 1024 static
app/json.c:45:6:parse_json_object 512 static
app/protocol.c:80:6:decode_frame 384 static
...
Checkstack Script
The Linux kernel provides a checkstack script that analyzes stack usage directly from assembly:
# Generate assembly and analyze
$ riscv64-unknown-elf-objdump -d firmware.elf | ./checkstack.pl riscv
0x80001234 usb_parse_descriptor [firmware.elf]: 1024
0x80002468 parse_json_object [firmware.elf]: 512
0x80003690 decode_frame [firmware.elf]: 384
Limitations
Static analysis has its limits:
Static analysis can handle:
✅ Fixed-size local variables
✅ Direct function call chains
✅ Call depth known at compile time
Static analysis cannot handle:
❌ Recursion depth (unless explicitly bounded)
❌ Function pointer calls
❌ Interrupt nesting levels
❌ VLA (Variable Length Array)
❌ alloca()
Dynamic Measurement Methods
When static analysis is insufficient, we need dynamic measurement.
Stack Painting
This is the classic stack usage measurement method:
#define STACK_PATTERN 0xDEADBEEF
// Fill stack region with pattern at startup
void stack_paint(void) {
extern uint32_t _stack_start; // Defined in linker script
extern uint32_t _stack_end;
uint32_t *p = &_stack_start;
while (p < &_stack_end) {
*p++ = STACK_PATTERN;
}
}
// Measure stack high water mark at any time
size_t stack_get_high_water_mark(void) {
extern uint32_t _stack_start;
extern uint32_t _stack_end;
uint32_t *p = &_stack_start;
while (p < &_stack_end && *p == STACK_PATTERN) {
p++;
}
return ((size_t)&_stack_end - (size_t)p);
}
Usage:
int main(void) {
stack_paint(); // Paint at startup
// ... run application ...
// Check periodically
size_t used = stack_get_high_water_mark();
printf("Stack high water mark: %zu bytes\n", used);
}
Important: This method only measures "maximum ever used"—it cannot guarantee future usage won't exceed this value.
Runtime Stack Monitoring
Add stack overflow detection in debug builds:
// Check stack at each function entry
void __attribute__((no_instrument_function))
stack_check(void) {
extern uint32_t _stack_start;
register uint32_t sp asm("sp");
if (sp < (uint32_t)&_stack_start + STACK_GUARD_SIZE) {
// Stack overflow detected!
panic("Stack overflow!");
}
}
// GCC's -finstrument-functions can auto-insert
void __attribute__((no_instrument_function))
__cyg_profile_func_enter(void *this_fn, void *call_site) {
stack_check();
}
GCC compile option:
# Auto-insert hooks at function entry/exit
$ gcc -finstrument-functions -Os main.c -o firmware.elf
Note: This adds runtime overhead and code size—use only in debug builds.
RTOS Task Stack Sizing
In RTOS environments, each task has its own stack, making the problem more complex.
FreeRTOS Stack Configuration
// FreeRTOS task creation
#define TASK_STACK_SIZE 512 // words (2048 bytes on 32-bit)
xTaskCreate(
task_function,
"TaskName",
TASK_STACK_SIZE, // ← How to determine this?
NULL,
tskIDLE_PRIORITY + 1,
&task_handle
);
Methods for Estimating Task Stack Size
Task Stack requirement =
Context save size (RTOS framework)
+ Task function's own stack usage
+ Stack usage of all possible called functions
+ Interrupt handling (if sharing stack)
+ Safety margin (typically 25-50%)
FreeRTOS Context Save Size (ARM Cortex-M example):
Architecture Context Size (bytes)
──────────────────────────────────────────
ARM Cortex-M0 64
ARM Cortex-M3/M4 64 (no FPU) / 200 (with FPU)
ARM Cortex-M7 64 (no FPU) / 232 (with FPU)
RISC-V RV32 64-128 (depends on ABI)
FreeRTOS Stack Monitoring
FreeRTOS provides built-in stack high water mark functionality:
// Enable stack overflow detection
#define configCHECK_FOR_STACK_OVERFLOW 2
// Get task's stack usage
UBaseType_t stack_remaining = uxTaskGetStackHighWaterMark(task_handle);
printf("Task stack remaining: %u words\n", stack_remaining);
// Stack overflow hook (configCHECK_FOR_STACK_OVERFLOW >= 1)
void vApplicationStackOverflowHook(TaskHandle_t xTask, char *pcTaskName) {
// Handle stack overflow
panic("Stack overflow in task: %s\n", pcTaskName);
}
Practical Recommendations
Task Stack Sizing Best Practices:
──────────────────────────────────────────────────────────
1. Initial configuration: Start with generous space (e.g., 2 KB)
2. Measure actual usage:
- Run system, trigger all possible execution paths
- Use uxTaskGetStackHighWaterMark() to check
3. Adjust configuration:
- Actual usage + 25-50% safety margin
- Example: Measured 800 bytes → Configure 1024-1200 bytes
4. Continuous monitoring:
- Keep stack checking in production builds
- Periodically log stack usage statistics
Worst-Case Stack Depth (WCSD) Analysis
In safety-critical systems (automotive, medical, aerospace), you need to calculate Worst-Case Stack Depth (WCSD).
Call Graph Analysis
Build call graph, calculate deepest path:
main() [64 bytes]
├── init() [32 bytes]
│ └── hal_init() [128 bytes]
└── loop() [48 bytes]
├── read_sensor() [96 bytes]
│ └── spi_transfer() [64 bytes]
└── process() [256 bytes]
└── filter() [128 bytes]
Deepest path 1: main → init → hal_init
= 64 + 32 + 128 = 224 bytes
Deepest path 2: main → loop → process → filter
= 64 + 48 + 256 + 128 = 496 bytes
WCSD = max(224, 496) = 496 bytes
Interrupt Impact
Must consider interrupts when calculating WCSD:
Application stack usage: 496 bytes
+ Interrupt handler: 128 bytes
+ Nested interrupt (if any): 128 bytes
────────────────────────────────────────────
Total WCSD: 752 bytes
Tool Support
WCSD analysis tools:
──────────────────────────────────────────────────────────
Tool Platform Features
Polyspace Commercial Formal verification
PC-lint Plus Commercial Static analysis
StackAnalyzer Commercial Professional WCSD analysis
(AbsInt)
GCC -fstack-usage Open source Basic, requires manual call chain calc
Common Pitfalls
1. Large Local Variables
void process_packet(void) {
char buffer[4096]; // 4 KB local variable!
// ...
}
Solution: Use static buffer or dynamic allocation
// Use static (moves to .bss, doesn't use stack)
static char buffer[4096];
// Or use heap (if system allows)
char *buffer = malloc(4096);
2. Recursive Functions
int factorial(int n) {
if (n <= 1) return 1;
return n * factorial(n - 1); // Depth = n
}
// factorial(1000) needs ~64 KB stack!
Solution: Convert to loop
int factorial(int n) {
int result = 1;
for (int i = 2; i <= n; i++) {
result *= i;
}
return result;
}
3. Complex Logic in Interrupts
void __attribute__((interrupt)) timer_isr(void) {
char log_buffer[512]; // Large buffer in ISR
sprintf(log_buffer, "...");
process_complex_data(); // Call complex function
}
Solution: ISRs should be as short as possible; defer complex logic to main loop or task
volatile bool timer_flag = false;
void __attribute__((interrupt)) timer_isr(void) {
timer_flag = true; // Just set flag
}
void main_loop(void) {
if (timer_flag) {
timer_flag = false;
process_complex_data(); // Handle in normal context
}
}
4. VLA and alloca
void process(size_t n) {
int data[n]; // VLA - stack usage depends on runtime value
// ...
}
void another(size_t size) {
void *buf = alloca(size); // Same problem
// ...
}
Solution: Avoid VLA and alloca; use static buffers or heap
Summary
- Stack overflow is the hardest bug to detect in embedded systems—silent corruption, random crashes
- Static analysis:
- GCC
-fstack-usage: Generates per-function stack usage report - Limitation: Cannot handle recursion, function pointers, interrupt nesting
- GCC
- Dynamic measurement:
- Stack painting: Fill with pattern, check high water mark later
- Runtime monitoring: Check stack pointer at function entry
- RTOS environments:
- Each task has independent stack, estimate separately
- FreeRTOS provides
uxTaskGetStackHighWaterMark()
- WCSD analysis: Safety-critical systems need worst-case stack depth calculation
- Best practices:
- Avoid large local variables
- Avoid recursion
- Keep ISRs short
- Configure safety margin (25-50%)