
Appendix F. Memory Model Quick Reference

RISC-V Weak Memory Ordering (RVWMO) Quick Reference


💡 Usage Guide: This appendix is your “safety manual” for multi-core synchronization. When you encounter mysterious bugs in lock-free code, check memory ordering here first.


🔄 Producer-Consumer Synchronization Pattern (Copy-Paste Ready)

This is the classic multi-core synchronization pattern; it guarantees that the consumer sees the complete data written by the producer.

Producer (Core 0) - Write Side

# Producer: Write data, then set Flag
# s0 = data address, s1 = Flag address, t0 = data, t1 = Flag value

    sw      t0, 0(s0)       # 1. Write Data
    fence   w, w            # 2. Store-Store Fence: Ensure data written first
    sw      t1, 0(s1)       # 3. Write Flag (Ready = 1)

Explanation: fence w, w ensures the “data write” becomes visible to other cores before the “Flag write”.

Consumer (Core 1) - Read Side

# Consumer: Wait for Flag, then read data
# s0 = data address, s1 = Flag address

wait_flag:
    lw      t1, 0(s1)       # 1. Read Flag
    beqz    t1, wait_flag   #    Wait for Flag to become Ready
    fence   r, r            # 2. Load-Load Fence: Ensure Flag seen before reading Data
    lw      t0, 0(s0)       # 3. Read Data

Explanation: fence r, r ensures the “Flag read” is ordered before the “Data read”.

Complete C Example

// Shared variables
volatile int data = 0;
volatile int flag = 0;

// Producer (Core 0)
void producer(void) {
    data = 42;                          // Write data
    asm volatile ("fence w, w" ::: "memory");  // Store-Store Fence
    flag = 1;                           // Set Flag
}

// Consumer (Core 1)
int consumer(void) {
    while (flag == 0) { }               // Wait for Flag
    asm volatile ("fence r, r" ::: "memory");  // Load-Load Fence
    return data;                        // Read data (guaranteed to be 42)
}
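The inline-asm version above only builds for RISC-V. As a portable alternative, here is a minimal sketch using C11 `<stdatomic.h>` (the function and variable names are hypothetical): a release store and an acquire load take the place of the two fences. On RISC-V, compilers typically lower them to `fence rw, w` + `sw` and `lw` + `fence r, rw`, which are slightly stronger than the hand-written `fence w, w` / `fence r, r`.

```c
#include <stdatomic.h>

static int shared_data;          /* ordinary payload, published through shared_flag */
static atomic_int shared_flag;   /* 0 = not ready, 1 = ready */

/* Producer: write the payload, then publish it with a release store
   (takes the place of "fence w, w" before the flag write). */
void producer_c11(int value) {
    shared_data = value;
    atomic_store_explicit(&shared_flag, 1, memory_order_release);
}

/* Consumer: spin until the flag is set (acquire load, taking the place of
   "fence r, r"), then read the payload. The acquire load synchronizes with
   the release store, so the write to shared_data is visible here. */
int consumer_c11(void) {
    while (atomic_load_explicit(&shared_flag, memory_order_acquire) == 0) {
        /* spin */
    }
    return shared_data;
}
```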

📋 FENCE Usage Quick Reference

| Scenario | FENCE Type | Description |
|---|---|---|
| Publish Data | fence w, w | Ensure data is visible before the Flag |
| Consume Data | fence r, r | Ensure the Flag is read before the data |
| Release Lock | fence rw, w | Ensure critical-section ops complete before Unlock |
| Acquire Lock | fence r, rw | Ensure ops after Lock don't execute early |
| Full Barrier | fence rw, rw | Strongest fence; no memory op can cross it |
| Self-Modify Code | fence.i | After modifying instructions, synchronize instruction fetch with prior stores (executing hart only) |

πŸ” Spinlock Example (Using Atomics)

# acquire_lock: Use amoswap.w.aq to acquire lock
# a0 = lock address, t0 = 1 (locked), t1 = result
acquire_lock:
    li      t0, 1
retry:
    amoswap.w.aq t1, t0, (a0)   # Atomic swap with Acquire
    bnez    t1, retry           # If was 1 (locked), retry
    ret                         # Successfully acquired lock

# release_lock: Use amoswap.w.rl to release lock
# a0 = lock address
release_lock:
    amoswap.w.rl zero, zero, (a0)  # Atomic write 0 with Release
    ret

Explanation:

  • .aq (Acquire): Subsequent ops won’t be moved before Lock
  • .rl (Release): Previous ops won’t be moved after Unlock
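The same lock can be sketched in C11, assuming `<stdatomic.h>`; the type and function names here are hypothetical. `memory_order_acquire` on the exchange plays the role of `.aq`, and the release store plays the role of `.rl`; RISC-V compilers typically emit `amoswap.w.aq` for the exchange.

```c
#include <stdatomic.h>

/* Hypothetical lock type for this sketch: 0 = free, 1 = held. */
struct simple_spinlock {
    atomic_int locked;
};

/* Acquire: swap in 1; if the old value was 0, we own the lock. */
void spin_lock(struct simple_spinlock *l) {
    while (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) != 0) {
        /* Wait with plain loads until the lock looks free, then retry the swap. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed) != 0) {
        }
    }
}

/* Release: store 0 with release ordering (like amoswap.w.rl zero, zero). */
void spin_unlock(struct simple_spinlock *l) {
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/* Non-blocking variant: returns 1 if the lock was taken, 0 if already held. */
int spin_trylock(struct simple_spinlock *l) {
    return atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0;
}
```

The inner relaxed loop is a design choice (test-and-test-and-set): waiting harts spin on ordinary loads instead of repeatedly issuing AMOs, which keeps the lock's cache line from bouncing between cores. The acquire ordering still comes from the exchange that finally succeeds.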

⚠️ Common Pitfalls

Pitfall 1: Thinking volatile Is Enough

Misconception: C’s volatile guarantees Memory Ordering.

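Truth: volatile only constrains the compiler (every volatile access is emitted, in program order relative to other volatile accesses); it does not stop the CPU from reordering those accesses as observed by other cores.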

// ❌ Wrong: Only volatile, may read stale data on multi-core
volatile int data = 0;
volatile int flag = 0;

// Producer
data = 42;
flag = 1;  // CPU may reorder so flag is visible first!

// ✅ Correct: Add fence
data = 42;
asm volatile ("fence w, w" ::: "memory");
flag = 1;

Pitfall 2: Fence in Wrong Position

Symptom: The spinlock looks correct, but a race condition remains.

// ❌ Wrong: fence after unlock
critical_section();
unlock();
asm volatile ("fence rw, w" ::: "memory");  // Too late!

// ✅ Correct: fence before unlock (or use .rl)
critical_section();
asm volatile ("fence rw, w" ::: "memory");
unlock();

Pitfall 3: Forgetting fence.i

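Symptom: JIT-compiled or self-modifying code executes old instructions.

Cause: After instructions are modified, the instruction cache or fetch pipeline may still hold the old content. fence.i synchronizes instruction fetch only on the hart that executes it.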

// ❌ Wrong: No fence.i after code modification
memcpy(code_buffer, new_code, size);
((void (*)(void))code_buffer)();  // May execute old instructions!

// ✅ Correct: fence.i after modification
memcpy(code_buffer, new_code, size);
asm volatile ("fence.i" ::: "memory");
((void (*)(void))code_buffer)();  // Now executes new instructions
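In Linux user space, a bare `fence.i` only covers the hart it runs on; if the thread later migrates to another hart, that hart may still fetch stale instructions. A minimal sketch, assuming GCC or Clang, is to use the compiler builtin `__builtin___clear_cache`; on RISC-V Linux the toolchain typically routes it to the kernel's `riscv_flush_icache` system call (worth verifying for your toolchain). `install_code` is a hypothetical helper, and mapping the buffer executable (e.g. with `mmap`) is outside the sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy freshly generated machine code into buf and make
   it safe to execute. buf is assumed to already be mapped executable. */
void install_code(void *buf, const void *code, size_t size) {
    memcpy(buf, code, size);
    /* Synchronize instruction fetch with the stores above. On targets with
       coherent instruction fetch (such as x86) this typically emits no code. */
    __builtin___clear_cache((char *)buf, (char *)buf + size);
}
```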

This appendix provides a quick reference for RISC-V’s memory model (RVWMO). Understanding memory ordering is essential for writing correct concurrent code on RISC-V.


F.1 Memory Ordering Basics

What Can Be Reordered?

RISC-V Weak Memory Ordering (RVWMO) allows extensive reordering:

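| Reordering | Allowed? | Exception |
|---|---|---|
| Load → Load | ✓ Yes | Same address, or FENCE |
| Load → Store | ✓ Yes | Same address, or FENCE |
| Store → Store | ✓ Yes | Same address, or FENCE |
| Store → Load | ✓ Yes | FENCE (same address: the load still returns the stored value, but may read it from the local store buffer before other harts see the store) |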

Key Point: Almost everything can be reordered unless:

  1. Operations access the same address (overlapping)
  2. Operations are separated by a FENCE instruction
  3. Operations have address, data, or control dependencies (control dependencies only order later stores)
  4. Operations use acquire/release atomics

Preserved Program Order (PPO)

Preserved Program Order is the subset of program order that MUST be respected:

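  1. Overlapping addresses: a load or store to X, then a store to X → always in order (SW to X, then LW from X: the load returns the stored value, but may read it from the local store buffer before other harts see the store)
  2. Explicit fences: Operations separated by FENCE → always in order
  3. Acquire/Release: Atomic operations with .aq or .rl → enforce ordering
  4. Address and data dependencies: a later access whose address or store data is computed from an earlier load (e.g., LW a pointer, then dereference it) → always in order
  5. Control dependencies: a load that feeds a branch, then a store after the branch → in order (later loads are not ordered)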

F.2 FENCE Instruction Reference

FENCE Syntax

fence pred, succ
  • pred (predecessor): Operations before the fence: r (loads), w (stores), or rw; the spec also allows i and o for device input/output
  • succ (successor): Operations after the fence, with the same choices

Common FENCE Variants

| FENCE | Meaning | Use Case |
|---|---|---|
| fence rw, rw | Full fence | Strongest barrier, orders everything |
| fence w, w | Store-store fence | Ensure stores become visible in order |
| fence r, r | Load-load fence | Ensure loads happen in order |
| fence r, rw | Acquire fence | After acquiring a lock |
| fence rw, w | Release fence | Before releasing a lock |
| fence.i | Instruction fence | After code modification (JIT, self-modifying code) |
| fence.tso | TSO fence | Orders everything except store→load (x86-style TSO ordering) |

FENCE Examples

Full Fence (strongest):

sw a0, 0(s0)      # Store 1
fence rw, rw      # Full fence
lw t0, 0(s1)      # Load 1

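Every memory access before the fence is ordered before every memory access after it. Unlike the lighter variants below, it also orders an earlier store before a later load, as in this example.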

Store-Store Fence (publish pattern):

sw a0, 0(s0)      # Write data
fence w, w        # Ensure data written first
sw a1, 0(s1)      # Write flag

Ensures stores become visible in order.

Load-Load Fence (consume pattern):

lw t0, 0(s1)      # Read flag
fence r, r        # Ensure flag read first
lw t1, 0(s0)      # Read data

Ensures loads happen in order.

Acquire Fence (after lock acquisition):

lr.w.aq t0, (a0)  # Acquire lock (with .aq)
# OR
lw t0, 0(a0)      # Read lock
fence r, rw       # Acquire fence
# ... critical section ...

Prevents operations in critical section from moving before lock acquisition.

Release Fence (before lock release):

# ... critical section ...
fence rw, w       # Release fence
sw zero, 0(a0)    # Release lock

Prevents operations in critical section from moving after lock release.


F.3 Atomic Instructions

Load-Reserved / Store-Conditional (LR/SC)

Syntax:

lr.w rd, (rs1)           # Load-reserved word
lr.d rd, (rs1)           # Load-reserved doubleword (RV64)
sc.w rd, rs2, (rs1)      # Store-conditional word; rd = 0 on success, nonzero on failure
sc.d rd, rs2, (rs1)      # Store-conditional doubleword (RV64)

Ordering Annotations:

  • .aq (acquire): No later operations can move before this
  • .rl (release): No earlier operations can move after this
  • .aqrl (both): Acquire and release; the operation is sequentially consistent

Example: Atomic Increment

retry:
    lr.w t0, (a0)         # Load current value
    addi t0, t0, 1        # Increment
    sc.w t1, t0, (a0)     # Try to store
    bnez t1, retry        # Retry if failed (t1 != 0)
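In C11 the same retry structure can be written as a compare-and-swap loop. A minimal sketch (the function name is hypothetical); relaxed ordering matches the unannotated `lr.w`/`sc.w` above. Compilers commonly implement a weak CAS on RISC-V with an LR/SC pair, though that is not guaranteed, and for a plain increment `atomic_fetch_add_explicit` is simpler and usually becomes a single `amoadd.w`.

```c
#include <stdatomic.h>

/* Atomic increment as a CAS retry loop, mirroring lr.w / addi / sc.w / bnez.
   Returns the new value. */
int atomic_increment(atomic_int *p) {
    int old = atomic_load_explicit(p, memory_order_relaxed);
    /* The weak CAS may fail spuriously, like sc.w. On failure it reloads the
       current value into old, so the loop simply retries with fresh data. */
    while (!atomic_compare_exchange_weak_explicit(p, &old, old + 1,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed)) {
    }
    return old + 1;
}
```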

Example: Spinlock Acquire

acquire_lock:
    lr.w.aq t0, (a0)      # Load-reserved with acquire
    bnez t0, acquire_lock # If locked, retry
    li t1, 1
    sc.w t2, t1, (a0)     # Try to acquire (plain SC; the .aq is on the LR)
    bnez t2, acquire_lock # Retry if failed

Atomic Memory Operations (AMO)

Syntax:

amoswap.w rd, rs2, (rs1)   # Atomic swap
amoadd.w rd, rs2, (rs1)    # Atomic add
amoand.w rd, rs2, (rs1)    # Atomic AND
amoor.w rd, rs2, (rs1)     # Atomic OR
amoxor.w rd, rs2, (rs1)    # Atomic XOR
amomax.w rd, rs2, (rs1)    # Atomic max (signed)
amomaxu.w rd, rs2, (rs1)   # Atomic max (unsigned)
amomin.w rd, rs2, (rs1)    # Atomic min (signed)
amominu.w rd, rs2, (rs1)   # Atomic min (unsigned)

Ordering Annotations: Same as LR/SC (.aq, .rl, .aqrl)
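From C, these AMOs are normally reached through the `atomic_fetch_*` functions of `<stdatomic.h>` (`atomic_fetch_add`, `atomic_fetch_sub`, `atomic_fetch_and`, `atomic_fetch_or`, `atomic_fetch_xor`), with the memory order selecting the annotation. A sketch of a reference count, assuming a hypothetical `struct refcounted`: taking a reference can be relaxed, while dropping one uses `memory_order_acq_rel` (comparable to `.aqrl`) so that the hart that frees the object sees every other hart's writes to it.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object header for this sketch. */
struct refcounted {
    atomic_int refs;
};

/* Taking a reference needs no ordering: a relaxed fetch_add,
   typically a single amoadd.w on RISC-V. */
void ref_get(struct refcounted *obj) {
    atomic_fetch_add_explicit(&obj->refs, 1, memory_order_relaxed);
}

/* Dropping a reference: acq_rel orders this hart's earlier writes before the
   decrement and lets the last hart observe all of them. RISC-V has no amosub,
   so this typically becomes amoadd.w.aqrl with a negated operand.
   Returns true for the hart that dropped the last reference. */
bool ref_put(struct refcounted *obj) {
    return atomic_fetch_sub_explicit(&obj->refs, 1, memory_order_acq_rel) == 1;
}
```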

Example: Spinlock with AMOSWAP

acquire_lock:
    li t0, 1
    amoswap.w.aq t1, t0, (a0)  # Swap 1 into lock, get old value
    bnez t1, acquire_lock       # If old value != 0, retry
    ret

release_lock:
    amoswap.w.rl zero, zero, (a0)  # Swap 0 into lock (release)
    ret

F.4 Common Synchronization Patterns

Pattern 1: Message Passing

Problem: Producer writes data, then sets flag. Consumer waits for flag, then reads data.

Solution:

# Producer (Hart 0)
    sw a0, 0(s0)      # Write data
    fence w, w        # Ensure data written before flag
    sw a1, 0(s1)      # Write flag = 1

# Consumer (Hart 1)
loop:
    lw t0, 0(s1)      # Read flag
    beqz t0, loop     # Wait for flag
    fence r, r        # Ensure flag read before data
    lw t1, 0(s0)      # Read data

Why fences are needed:

  • Without fence w, w: Flag might be visible before data
  • Without fence r, r: Data might be read before flag is checked

Pattern 2: Spinlock (LR/SC)

Acquire:

acquire_lock:
    lr.w.aq t0, (a0)      # Load-reserved with acquire
    bnez t0, acquire_lock # If locked, retry
    li t1, 1
    sc.w t2, t1, (a0)     # Try to set lock (plain SC; the .aq is on the LR)
    bnez t2, acquire_lock # Retry if SC failed
    # Lock acquired, critical section follows

Release:

    # Critical section
    amoswap.w.rl zero, zero, (a0)  # Release lock
    # OR
    fence rw, w
    sw zero, 0(a0)

Pattern 3: Spinlock (AMOSWAP)

Acquire:

acquire_lock:
    li t0, 1
    amoswap.w.aq t1, t0, (a0)  # Atomic swap
    bnez t1, acquire_lock       # If old value != 0, retry
    # Lock acquired

Release:

    amoswap.w.rl zero, zero, (a0)  # Release lock

Pattern 4: Dekker’s Algorithm (Mutual Exclusion)

Hart 0:

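    la s0, flag0       # s0 = &flag0
    la s1, flag1       # s1 = &flag1
    li t0, 1
    sw t0, 0(s0)       # flag0 = 1
    fence w, rw        # Ensure flag0 visible before reading flag1
    lw t1, 0(s1)       # Read flag1
    bnez t1, wait      # If flag1 set, back off (wait path not shown)
    fence r, rw        # Acquire: keep critical section after the flag1 check
    # Critical section
    fence rw, w        # Ensure critical section done
    sw zero, 0(s0)     # flag0 = 0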

Hart 1: (symmetric, swap flag0 and flag1)
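The core of this pattern is a store followed by a load of a different address, which only a fence that orders stores before loads keeps in order. A C11 sketch of the entry and exit steps, assuming exactly two threads with IDs 0 and 1 (all names hypothetical). Like the assembly, it backs off instead of waiting, so it guarantees mutual exclusion but not progress (the full Dekker algorithm adds a turn variable). `atomic_thread_fence(memory_order_seq_cst)` typically compiles to `fence rw, rw` on RISC-V.

```c
#include <stdatomic.h>

/* flags[i] != 0 means thread i wants to enter (hypothetical layout). */
static atomic_int flags[2];

/* Try to enter as thread `self` (0 or 1). Returns 1 on success; on 0 the
   caller must back off and retry later. */
int dekker_try_enter(int self) {
    int other = 1 - self;
    atomic_store_explicit(&flags[self], 1, memory_order_relaxed);
    /* Store->load ordering: only a full (seq_cst) fence provides it. */
    atomic_thread_fence(memory_order_seq_cst);
    /* Acquire: the critical section cannot start before this check. */
    if (atomic_load_explicit(&flags[other], memory_order_acquire) != 0) {
        atomic_store_explicit(&flags[self], 0, memory_order_relaxed);  /* back off */
        return 0;
    }
    return 1;
}

/* Leave: release keeps critical-section accesses before the flag clear. */
void dekker_leave(int self) {
    atomic_store_explicit(&flags[self], 0, memory_order_release);
}
```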


Pattern 5: Producer-Consumer Queue

Producer:

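    # Assumed: s0 = &queue[tail], s1 = tail index
    # (single producer; full-queue check and wrap-around not shown)

    # Write data to queue[tail]
    sw a0, 0(s0)

    # Increment tail
    fence w, w         # Ensure data written before tail update
    addi s1, s1, 1
    sw s1, tail_ptr, t2   # Store to a symbol needs a temp register (t2)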

Consumer:

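    # Assumed: s2 = base address of queue (word entries); wrap-around not shown
    # Read tail and head
    lw t0, tail_ptr
    lw t1, head_ptr
    beq t0, t1, empty  # If tail == head, queue empty

    # Read data from queue[head]
    fence r, r         # Ensure tail read before data
    slli t3, t1, 2     # Byte offset = head * 4
    add t3, s2, t3
    lw a0, 0(t3)

    # Increment head
    fence r, w         # Ensure data read before the slot is handed back
    addi t1, t1, 1
    sw t1, head_ptr, t2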
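A C11 sketch of a single-producer/single-consumer ring buffer built on the same idea (the struct layout, capacity, and function names are hypothetical). The release store of `tail` corresponds to `fence w, w` before the tail update, and the acquire load of `tail` to `fence r, r` before reading the slot; the `head` side uses the same pair so the producer never overwrites a slot the consumer is still reading. It is only correct with exactly one producer thread and one consumer thread.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define QCAP 16u  /* capacity; a power of two so indices wrap with a mask */

/* head and tail count up forever; the slot is index & (QCAP - 1). */
struct spsc_queue {
    int slots[QCAP];
    atomic_size_t head;   /* next slot to read; written only by the consumer */
    atomic_size_t tail;   /* next slot to write; written only by the producer */
};

bool spsc_push(struct spsc_queue *q, int value) {
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    /* Acquire: the consumer's read of a slot happens before we reuse it. */
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == QCAP)
        return false;                        /* full */
    q->slots[tail & (QCAP - 1)] = value;     /* write data */
    /* Release: the data write is visible before the new tail. */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}

bool spsc_pop(struct spsc_queue *q, int *out) {
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    /* Acquire: the tail read is ordered before the data read. */
    size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head == tail)
        return false;                        /* empty */
    *out = q->slots[head & (QCAP - 1)];      /* read data */
    /* Release: the data read completes before the slot is handed back. */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}
```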

Pattern 6: Barrier (N threads)

Barrier Wait:

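barrier_wait:
    # a0 = &count, a1 = &generation (hypothetical layout), N = number of harts
    lw t4, 0(a1)               # t4 = current generation, read before arriving

    # Increment counter atomically (.rl keeps earlier work before the arrival)
    li t0, 1
    amoadd.w.aqrl t1, t0, (a0) # counter++, get old value
    addi t1, t1, 1             # t1 = new counter value

    # Check if all threads arrived
    li t2, N                   # N = number of threads
    bne t1, t2, spin           # If not all arrived, spin

    # Last arriver: reset counter, then open the barrier for everyone
    sw zero, 0(a0)
    addi t4, t4, 1
    fence rw, w                # Release: reset and prior work precede the new generation
    sw t4, 0(a1)
    ret

spin:
    lw t3, 0(a1)
    beq t3, t4, spin           # Wait until the generation changes
    fence r, rw                # Acquire: later accesses stay after the barrier
    ret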
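A C11 sketch of a reusable barrier (struct and function names are hypothetical). Rather than waiting for the count to equal N, which a waiter can miss once the count is reset for the next round, waiters wait for a generation number to change; this is a common variant of the sense-reversing barrier. `memory_order_acq_rel` on the increment corresponds to an `.aqrl` AMO.

```c
#include <stdatomic.h>

/* Centralized barrier for a fixed number of threads (hypothetical layout). */
struct gen_barrier {
    atomic_int count;        /* arrivals in the current generation */
    atomic_int generation;   /* incremented each time the barrier opens */
    int nthreads;            /* N */
};

void gen_barrier_wait(struct gen_barrier *b) {
    /* Read the generation before arriving; it cannot change until we arrive. */
    int gen = atomic_load_explicit(&b->generation, memory_order_acquire);
    /* acq_rel: publishes our earlier writes and, for the last arriver,
       acquires everyone else's. */
    int arrived = atomic_fetch_add_explicit(&b->count, 1, memory_order_acq_rel) + 1;
    if (arrived == b->nthreads) {
        atomic_store_explicit(&b->count, 0, memory_order_relaxed);  /* reset */
        /* Release: the reset and all prior writes precede opening the barrier. */
        atomic_store_explicit(&b->generation, gen + 1, memory_order_release);
    } else {
        /* Acquire: later accesses stay after the barrier opens. */
        while (atomic_load_explicit(&b->generation, memory_order_acquire) == gen) {
        }
    }
}
```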

F.5 Memory Model Comparison

RVWMO vs Other Models

| Model | Strength | Reordering Allowed | Fence Overhead |
|---|---|---|---|
| Sequential Consistency | Strongest | None | N/A (no reordering) |
| x86 TSO | Strong | Store→Load only | Low (implicit) |
| ARM | Weak | Extensive | Medium |
| RISC-V RVWMO | Weak | Extensive | Medium |
| RISC-V RVTSO | Strong | Store→Load only | Low |

Ordering Guarantees

| Operation Pair | RISC-V RVWMO | x86 TSO | ARM | SC |
|---|---|---|---|---|
| Load → Load | ✗ | ✓ | ✗ | ✓ |
| Load → Store | ✗ | ✓ | ✗ | ✓ |
| Store → Store | ✗ | ✓ | ✗ | ✓ |
| Store → Load | ✗ | ✗ | ✗ | ✓ |

✓ = Ordered by default   ✗ = Can be reordered (needs a fence)


F.6 FENCE Equivalents Across Architectures

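| RISC-V | x86 | ARM | Purpose |
|---|---|---|---|
| fence rw, rw | MFENCE | DMB SY | Full barrier |
| fence w, w | (implicit; SFENCE only matters for non-temporal stores) | DMB ST | Store barrier |
| fence r, r | (implicit) | DMB LD | Load barrier |
| fence r, rw | (implicit) | DMB LD | Acquire |
| fence rw, w | (implicit) | DMB SY (or STLR) | Release |
| fence.i | (implicit) | ISB (after DC/IC cache maintenance) | Instruction sync |
| fence.tso | (implicit) | - | TSO ordering |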

F.7 Acquire/Release Semantics

Acquire Semantics

Meaning: No memory operations after the acquire can move before it.

Use Case: After acquiring a lock, before accessing protected data.

Implementation:

# Option 1: Atomic with .aq
lr.w.aq t0, (a0)

# Option 2: Load + fence
lw t0, 0(a0)
fence r, rw

Release Semantics

Meaning: No memory operations before the release can move after it.

Use Case: After accessing protected data, before releasing a lock.

Implementation:

# Option 1: Atomic with .rl
amoswap.w.rl zero, zero, (a0)

# Option 2: Fence + store
fence rw, w
sw zero, 0(a0)
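C11 also has standalone fences that mirror Option 2 in each case. A minimal sketch with hypothetical names: `atomic_thread_fence(memory_order_release)` before a relaxed store corresponds to `fence rw, w` + `sw`, and `atomic_thread_fence(memory_order_acquire)` after a relaxed load corresponds to `lw` + `fence r, rw`; RISC-V compilers typically emit those same fences.

```c
#include <stdatomic.h>

static int payload;         /* ordinary shared data */
static atomic_int ready;    /* flag accessed with relaxed atomics */

/* Release via a standalone fence, then a relaxed store of the flag. */
void publish(int value) {
    payload = value;
    atomic_thread_fence(memory_order_release);   /* ~ fence rw, w */
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Relaxed load of the flag, then acquire via a standalone fence.
   Returns 1 and fills *out if the payload is ready, 0 otherwise. */
int try_consume(int *out) {
    if (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
        return 0;
    atomic_thread_fence(memory_order_acquire);   /* ~ fence r, rw */
    *out = payload;
    return 1;
}
```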

Acquire-Release Pair

Complete Lock Example:

# Acquire
acquire:
    lr.w.aq t0, (a0)
    bnez t0, acquire
    li t1, 1
    sc.w t2, t1, (a0)     # Plain SC; the .aq is on the LR
    bnez t2, acquire

# Critical section
    # ... protected operations ...

# Release
    amoswap.w.rl zero, zero, (a0)

F.8 Common Pitfalls

Pitfall 1: Missing Fences

Wrong:

# Producer
sw a0, 0(s0)      # Write data
sw a1, 0(s1)      # Write flag

# Consumer
lw t0, 0(s1)      # Read flag
lw t1, 0(s0)      # Read data (might be stale!)

Correct:

# Producer
sw a0, 0(s0)
fence w, w        # Add fence!
sw a1, 0(s1)

# Consumer
lw t0, 0(s1)
fence r, r        # Add fence!
lw t1, 0(s0)

Pitfall 2: Wrong Fence Type

Wrong (using load-load fence for release):

# Critical section
fence r, r        # Wrong! Doesn't order stores
sw zero, 0(a0)    # Release lock

Correct:

# Critical section
fence rw, w       # Correct! Orders all earlier loads and stores before later stores
sw zero, 0(a0)

Pitfall 3: Forgetting .aq/.rl on Atomics

Wrong:

lr.w t0, (a0)     # Missing .aq!
# Critical section
amoswap.w zero, zero, (a0)  # Missing .rl!

Correct:

lr.w.aq t0, (a0)
# Critical section
amoswap.w.rl zero, zero, (a0)

Pitfall 4: Data Race

Wrong (no synchronization):

# Hart 0
sw a0, 0(s0)      # Write shared variable

# Hart 1
lw t0, 0(s0)      # Read shared variable (DATA RACE!)

Correct (use lock or atomic):

# Hart 0
# ... acquire lock ...
sw a0, 0(s0)
# ... release lock ...

# Hart 1
# ... acquire lock ...
lw t0, 0(s0)
# ... release lock ...

F.9 Quick Decision Tree

Do I need a fence?

Are multiple harts accessing shared memory?
├─ No → No fence needed
└─ Yes → Continue

Are the accesses synchronized (locks, or atomics with .aq/.rl)?
├─ Yes → Ordering provided by the lock/atomic
└─ No → Continue

Does all communication go through a single address (no other data depends on it)?
├─ Yes → No fence needed (per-address coherence)
└─ No → FENCE REQUIRED!

What type of fence?
├─ Publishing data? → fence w, w
├─ Consuming data? → fence r, r
├─ Acquiring lock? → fence r, rw (or .aq)
├─ Releasing lock? → fence rw, w (or .rl)
└─ Not sure? → fence rw, rw (full fence)

F.10 References

  • RISC-V Memory Model Specification: Chapter 14 of RISC-V ISA Manual
  • RVWMO Formal Specification: https://github.com/riscv/riscv-isa-manual
  • Memory Model Tools: herd7, rmem (for verification)
  • Linux Kernel Memory Barriers: Documentation/memory-barriers.txt