Chapter 19: Footprint Analysis Fundamentals

Part VI: Embedded Constraints


"In embedded systems, every byte counts—literally." — Jack Ganssle

The Missing 2 KB

Two weeks before mass production, our project war room reeked of espresso and anxiety.

The development team had just merged the final security patch, but the automated build server flashed an angry red warning:

region 'FLASH' overflowed by 2048 bytes

This was a 128 KB flash system, and our binary had grown to 130 KB. Those extra 2 KB stood like an insurmountable wall between us and product shipment.

Senior engineer Zhang immediately shouted: "Quick! Strip out all the debug strings from printf, and disable those unnecessary assert statements!"

Everyone scrambled through the source code, hunting for strings to delete. An hour later, the second build result arrived: only 400 bytes saved.

The team fell silent. Blind "intuition-based optimization" proved utterly powerless against hard memory constraints.

"We need data, not guesses." Junior performance engineer Ming broke the chaos.

Instead of rushing to delete code, he calmly ran size and nm --size-sort. In the detailed linker map file, he discovered the real "space killer" wasn't printf—it was a newly introduced third-party sensor driver.

That driver had inadvertently pulled in the floating-point emulation library, all because of a calibration routine that mistakenly used double for fewer than ten lines of data processing.

Guided by those measurements, the team changed just two lines of code, converting floating-point to fixed-point arithmetic. The binary instantly shrank by about 50 KB.

Optimizing footprint isn't a guessing game of "deleting code"—it's a precise science of measurement.


What is Footprint?

In embedded systems, footprint refers to the memory space a program occupies. Unlike desktop systems, embedded memory is a hard constraint—your firmware must fit into fixed-size flash and RAM.

Static vs Dynamic Footprint

Footprint can be categorized into two types:

┌─────────────────────────────────────────────────────────┐
│  Static Footprint (determined at compile time)          │
├─────────────────────────────────────────────────────────┤
│  .text    │ Machine code, instructions │ Stored in Flash │
│  .rodata  │ Constants, string literals │ Stored in Flash │
│  .data    │ Initialized globals        │ Flash → RAM     │
│  .bss     │ Uninitialized globals      │ RAM (zeroed)    │
└─────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│  Dynamic Footprint (changes at runtime)                  │
├─────────────────────────────────────────────────────────┤
│  Stack    │ Local variables, call frames │ RAM           │
│  Heap     │ Dynamically allocated memory │ RAM           │
└─────────────────────────────────────────────────────────┘
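
As a rough illustration of the two tables above, the sketch below annotates where a typical GCC-style toolchain would place each object. All names here (banner, retry_limit, rx_buffer, sum_bytes, placement_demo) are hypothetical, and the exact placement depends on the compiler, the optimization level, and the linker script.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical objects; the comments describe typical GCC placement. */
const char banner[] = "boot ok";   /* .rodata: constant, stays in flash           */
uint32_t retry_limit = 3;          /* .data: initial value in flash, copy in RAM  */
uint8_t rx_buffer[256];            /* .bss: zeroed at startup, occupies RAM only  */

/* .text: the machine code of this function lives in flash. */
uint32_t sum_bytes(const uint8_t *p, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += p[i];
    }
    return sum;
}

uint32_t placement_demo(void)
{
    uint8_t scratch[16];           /* stack: exists only during this call */
    uint8_t *dyn = malloc(32);     /* heap: allocated at runtime          */
    uint32_t total;

    if (dyn == NULL) {
        return 0;
    }
    for (size_t i = 0; i < sizeof scratch; i++) {
        scratch[i] = 1;
    }
    for (size_t i = 0; i < 32; i++) {
        dyn[i] = 2;
    }
    total = sum_bytes(scratch, sizeof scratch) + sum_bytes(dyn, 32)
          + sum_bytes(rx_buffer, sizeof rx_buffer) + retry_limit;
    free(dyn);
    return total;
}
```

Only the first three objects and the function bodies show up in size or nm output; the stack array and the heap block are invisible to static tools and have to be budgeted separately.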

Flash vs RAM Occupancy

Understanding the mapping between sections and memory is crucial:

Flash usage = .text + .rodata + .data (initial values)
RAM usage   = .data + .bss + stack + heap

Note that .data occupies both Flash (storing initial values) and RAM (runtime storage). This detail trips up many engineers.
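
The two formulas can be written down directly as code. This is a minimal sketch; the struct section_sizes type and both function names are assumptions made for illustration, not part of any toolchain.

```c
#include <stdint.h>

/* Hypothetical container for section sizes, all in bytes. */
struct section_sizes {
    uint32_t text, rodata, data, bss, stack, heap;
};

/* Flash holds code, constants, and the initial values of .data. */
uint32_t flash_usage(const struct section_sizes *s)
{
    return s->text + s->rodata + s->data;
}

/* RAM holds the runtime copy of .data plus .bss, stack, and heap. */
uint32_t ram_usage(const struct section_sizes *s)
{
    return s->data + s->bss + s->stack + s->heap;
}
```

For a made-up image with 40,960 bytes of .text, 4,096 of .rodata, 512 of .data, 2,048 of .bss, and a 1,024-byte stack, flash usage comes to 45,568 bytes and RAM usage to 3,584 bytes. The 512 bytes of .data are counted on both sides.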

Why "Won't Fit in Flash" Is So Common

Typical embedded system memory constraints:

Device Type       Flash      RAM
────────────────────────────────
Low-end MCU       32 KB      4 KB
Mid-range MCU     256 KB     64 KB
High-end MCU      1 MB       256 KB
Application CPU   Unlimited* 512 MB+

* Has file system and virtual memory

As features accumulate, code size easily grows unnoticed. A single "harmless" library reference might bring in tens of KB of hidden dependencies.


The Toolbox

Just as performance analysis needs profilers, footprint analysis requires specialized tools. Here are four essential tools for systems software engineers.

1. The size Command: Quick Overview

size is the most basic tool for quickly grasping a binary's overall structure:

$ riscv64-unknown-elf-size -A -x firmware.elf

section           size         addr
.text            0x4500   0x80000000
.rodata          0x0800   0x80004500
.data            0x0100   0x20000000
.bss             0x0200   0x20000100
.stack           0x1000   0x20000300
Total            0x6000

Key metrics:

  • Flash usage: .text + .rodata + .data = 0x4E00 = 19,968 bytes
  • RAM usage: .data + .bss + .stack = 4,864 bytes

Pro tip: Use -A (System V format) instead of the default Berkeley format for more detailed section breakdown.

2. The nm Command: Symbol-Level Analysis

When a section turns out to be too large, drill down to the symbol level to find the culprit:

$ riscv64-unknown-elf-nm -S --size-sort -r firmware.elf | head -10

80001a20 000005d4 T core_process_loop
20000040 00000400 B network_buffer
800021f4 00000210 t parse_json_string
80002404 000001c8 T uart_send_buffer
...

Output interpretation:

  • Column 1: Symbol address
  • Column 2: Symbol size (bytes)
  • Column 3: Symbol type (T=text, B=bss, D=data; a lowercase letter marks a local symbol, such as a static function)
  • Column 4: Symbol name

Pro tip: Filter by type, e.g., finding only large variables in RAM, including static ones (lowercase b and d):

$ nm -S --size-sort -r firmware.elf | grep -E ' [BbDd] '
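
When the nm output feeds a custom report, the four-column layout described above is easy to parse. The following is a sketch under the assumption that each line has exactly the "address size type name" shape shown earlier; struct nm_symbol, parse_nm_line, and is_ram_symbol are hypothetical helper names.

```c
#include <stdio.h>
#include <string.h>

/* One parsed line of `nm -S` output (hypothetical helper type). */
struct nm_symbol {
    unsigned long long address;
    unsigned long long size;    /* bytes */
    char type;                  /* e.g. 'T', 't', 'B', 'b', 'D', 'd' */
    char name[128];
};

/* Parse "address size type name"; returns 1 on success, 0 otherwise.
 * Lines without a size column (such as undefined symbols) are rejected. */
int parse_nm_line(const char *line, struct nm_symbol *out)
{
    return sscanf(line, "%llx %llx %c %127s",
                  &out->address, &out->size, &out->type, out->name) == 4;
}

/* True for symbols that occupy RAM: .bss (B/b) and .data (D/d). */
int is_ram_symbol(const struct nm_symbol *sym)
{
    return sym->type != '\0' && strchr("BbDd", sym->type) != NULL;
}
```

Applied to the line "80001a20 000005d4 T core_process_loop", the parser yields a size of 0x5d4 = 1,492 bytes and type T, which is_ram_symbol classifies as not residing in RAM.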

3. bloaty: Modern Footprint Analyzer

Bloaty McBloatface is an advanced footprint analysis tool from Google. It displays space distribution hierarchically and supports diff comparison between versions.

# Analyze by compile unit (source file); requires an ELF built with debug info (-g)
$ bloaty firmware.elf -d compileunits

    VM SIZE                FILE SIZE
 --------------          --------------
  62.5%  5.15Ki tasks.c  62.5%  5.15Ki
  21.2%  1.75Ki queue.c  21.2%  1.75Ki
   8.5%    712B list.c    8.5%    712B
   7.8%    650B port.c    7.8%    650B

Version comparison (Diff)—bloaty's most powerful feature:

$ bloaty new_firmware.elf -- old_firmware.elf

     VM SIZE                     FILE SIZE
 --------------               --------------
  +15.2%   +2.1Ki .text       +15.2%   +2.1Ki
   [ = ]       0 .rodata       [ = ]       0
  +8.3%    +128B .bss         +8.3%    +128B
 --------------               --------------
  +12.1%  +2.2Ki TOTAL        +12.1%  +2.2Ki

This diff capability is especially useful in CI/CD—automatically compare footprint changes after each commit.
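
Once a CI job has the old and new sizes, the pass/fail decision itself is simple. The sketch below shows one possible gating rule; the function names and the idea of a per-commit growth allowance (max_growth) are assumptions for illustration, not something bloaty provides.

```c
#include <stdint.h>

/* Bytes by which `used` exceeds `capacity`, or 0 if the image fits. */
uint32_t overflow_bytes(uint32_t used, uint32_t capacity)
{
    return used > capacity ? used - capacity : 0;
}

/* Returns 1 if the build should fail: either the image no longer fits,
 * or it grew by more than `max_growth` bytes since the previous build. */
int footprint_gate_fails(uint32_t old_size, uint32_t new_size,
                         uint32_t capacity, uint32_t max_growth)
{
    if (overflow_bytes(new_size, capacity) > 0) {
        return 1;
    }
    return new_size > old_size && new_size - old_size > max_growth;
}
```

With a 128 KB (131,072-byte) flash and a 133,120-byte image, overflow_bytes reports 2,048 bytes, the same figure the linker printed in the opening story. A growth allowance catches creeping bloat long before the hard limit is reached.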

4. Linker Map File: The Ultimate Truth

The linker map file records how the linker places every input section from every object file into the final binary. It's the ultimate weapon for solving "where did the space go?" mysteries.

Generating a map file:

$ riscv64-unknown-elf-gcc main.o lib.o -Wl,-Map=output.map -o firmware.elf

Map file example:

.text.core_init
                0x0000000080000100       0x48 main.o
.text.uart_send
                0x0000000080000148       0x20 uart.o
 *fill*         0x0000000080000168       0x08
.text.process_data
                0x0000000080000170      0x120 process.o

Key observations:

  • *fill* indicates padding (alignment)—hidden space waste
  • You can trace each symbol back to its source object file
  • You can discover libraries that were accidentally linked in
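
The size of a *fill* entry follows from alignment arithmetic. In the excerpt above, .text.uart_send ends at 0x80000168 and .text.process_data starts at 0x80000170, which is consistent with a 16-byte alignment requirement on the latter; the excerpt does not state the alignment, so treat that as an inference. A minimal sketch of the calculation, with hypothetical helper names:

```c
#include <stdint.h>

/* Round `addr` up to the next multiple of `align` (must be a power of two). */
uint32_t align_up(uint32_t addr, uint32_t align)
{
    return (addr + align - 1u) & ~(align - 1u);
}

/* Padding the linker inserts so that the next section starts aligned. */
uint32_t fill_bytes(uint32_t end_addr, uint32_t align)
{
    return align_up(end_addr, align) - end_addr;
}
```

Here fill_bytes(0x80000168, 16) evaluates to 8, matching the 0x08 shown on the *fill* line. Many small, strictly aligned sections can add up to a noticeable amount of padding.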

Analysis Workflow

Establish a systematic analysis process instead of guessing by intuition:

Step 1: Baseline Measurement
        ↓
    $ size firmware.elf
    Record .text, .data, .bss sizes
        ↓
Step 2: Identify Heavy Hitters
        ↓
    $ nm -S --size-sort -r firmware.elf | head -20
    Find symbols consuming the most space
        ↓
Step 3: Trace Origins
        ↓
    Check linker map file
    Confirm which object files these symbols come from
        ↓
Step 4: Analyze Causes
        ↓
    - Is there an accidentally included library?
    - Are there unnecessary features being compiled in?
    - Are there oversized static buffers?
        ↓
Step 5: Verify Changes
        ↓
    $ bloaty new.elf -- old.elf
    Confirm changes actually reduced footprint

Common "Space Killers"

1. Floating-Point Library
   - Using float/double on MCUs without FPU
   - Even a single printf("%f") pulls in the entire float formatting library

2. Standard Library Functions
   - printf family: 10-20 KB
   - malloc/free: 1-5 KB
   - Consider newlib-nano or custom minimal versions

3. Oversized Static Buffers
   - char log_buffer[4096]; // Does it really need to be this big?

4. Unused Features
   - Referencing a library but only using a small part
   - Not compiling with -ffunction-sections -fdata-sections and linking with --gc-sections to remove dead code
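
For the printf("%f") item in the list above, one common workaround is to keep values in scaled integer units and print them with integer conversions only. The sketch below assumes values stored in thousandths; format_milli is a hypothetical helper, and whether this actually drops the float formatting code depends on the C library in use (newlib-nano, for example, leaves it out unless it is explicitly requested).

```c
#include <stdint.h>
#include <stdio.h>

/* Format a value stored in thousandths (e.g. 12345 -> "12.345") using
 * only integer conversions. Returns what snprintf returns. */
int format_milli(char *buf, size_t len, int32_t milli)
{
    /* Take the magnitude in unsigned arithmetic so INT32_MIN is safe. */
    uint32_t mag = (milli < 0) ? 0u - (uint32_t)milli : (uint32_t)milli;

    return snprintf(buf, len, "%s%lu.%03lu",
                    milli < 0 ? "-" : "",
                    (unsigned long)(mag / 1000u),
                    (unsigned long)(mag % 1000u));
}
```

A 16-byte buffer is enough for any int32_t input. The sign is handled separately so that values between -1 and 0, such as -500, print as "-0.500" rather than losing the minus sign.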

Case Study: Tracing an Accidental Library Reference

Let's return to the opening story and reconstruct Ming's analysis process with tools.

Step 1: Discover the problem

$ size firmware_before.elf
   text    data     bss     dec     hex filename
 133120     256    4096  137472   21900 firmware_before.elf

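Flash usage is 133,376 bytes (text + data), exceeding the 128 KB (131,072-byte) limit. In this Berkeley-format output, the text column already includes read-only data such as .rodata.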

Step 2: Find the heavy hitters

$ nm -S --size-sort -r firmware_before.elf | head -5
80010000 00003a00 T __aeabi_ddiv
8000c600 00002800 T __aeabi_dmul
80009e00 00001c00 T __aeabi_dadd
80008200 00001c00 T __aeabi_dsub
80006600 00001400 T __aeabi_d2iz

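These __aeabi_d* functions are software emulation routines for double-precision floating-point operations! (The __aeabi_ prefix is the ARM EABI naming; other targets use libgcc's generic names such as __divdf3.) The five shown here add up to 44,544 bytes, about 43.5 KB; together with the rest of the emulation library they total about 50 KB.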

Step 3: Trace the origin

Search for these symbols' source in the linker map file:

$ grep -A1 "__aeabi_ddiv" output.map
__aeabi_ddiv
                0x80010000    0x3a00 libgcc.a(dp-bit.o)

It's libgcc's double-precision floating-point emulation.

Step 4: Find the caller

$ grep -rn "double\|float" src/
src/drivers/sensor.c:42:    double calibrated = raw_value * 0.0125;

There it is! A simple calibration operation pulled in 50 KB of floating-point library.

Step 5: Fix it

Convert double operations to fixed-point:

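// Before: double arithmetic pulls in about 50 KB of floating-point emulation
double calibrated = raw_value * 0.0125;

// After: integer arithmetic only (0.0125 = 125/10000), so no floating-point library.
// The result is truncated to a whole number, and raw_value * 125 must fit in 32 bits.
int32_t calibrated = (raw_value * 125) / 10000;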
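
If the fractional part of the calibrated value matters, the same idea extends to a scaled result. The sketch below is one possible variant, not the fix the team shipped: it assumes the raw reading is an unsigned 16-bit count and returns the result in thousandths, and calibrate_milli is a hypothetical name.

```c
#include <stdint.h>

/* Assumption: raw is an unsigned 16-bit sensor count.
 * raw * 0.0125, expressed in thousandths, is raw * 12.5 = raw * 25 / 2. */
uint32_t calibrate_milli(uint16_t raw)
{
    /* 65535 * 25 fits comfortably in 32 bits; the +1 rounds halves up. */
    return ((uint32_t)raw * 25u + 1u) / 2u;
}
```

A raw reading of 80 yields 1000, that is 1.000 in the original units, and a reading of 3 yields 38 (0.0375 rounded to three decimals). Dividing by a power of two also avoids pulling in an integer division helper on cores without a hardware divider.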

Step 6: Verify

$ bloaty firmware_after.elf -- firmware_before.elf

     VM SIZE                     FILE SIZE
 --------------               --------------
 -37.5%  -50.0Ki .text       -37.5%  -50.0Ki
   [ = ]       0 .data         [ = ]       0
   [ = ]       0 .bss          [ = ]       0
 --------------               --------------
 -37.5%  -50.0Ki TOTAL       -37.5%  -50.0Ki

Success—50 KB saved!


Summary

  • Footprint = memory space a program occupies, including code size (Flash) and data size (RAM)
  • Measurement tools:
    • size: Quick overview of section sizes
    • nm --size-sort: Find the largest symbols
    • bloaty: Hierarchical analysis and version comparison
    • Linker map file: Trace symbol origins
  • Analysis workflow: Baseline measurement → Identify heavy hitters → Trace origins → Analyze causes → Verify changes
  • Common pitfalls: Floating-point library, standard library functions, oversized static buffers, unremoved dead code
  • Core principle: Measure, don't guess