Appendix F. Memory Model Quick Reference
RISC-V Weak Memory Ordering (RVWMO) Quick Reference
Usage Guide: This appendix is your "safety manual" for multi-core synchronization. When you hit a mysterious bug in lock-free code, check the memory ordering here first.
Producer-Consumer Synchronization Pattern (Copy-Paste Ready)
This is the classic multi-core synchronization pattern: it guarantees that the Consumer sees the complete data written by the Producer.
Producer (Core 0) - Write Side
# Producer: Write data, then set Flag
# s0 = data address, s1 = Flag address, t0 = data, t1 = Flag value
sw t0, 0(s0) # 1. Write Data
fence w, w # 2. Store-Store Fence: Ensure data written first
sw t1, 0(s1) # 3. Write Flag (Ready = 1)
Explanation: fence w,w ensures the "data write" is visible to other cores before the "Flag write".
Consumer (Core 1) - Read Side
# Consumer: Wait for Flag, then read data
# s0 = data address, s1 = Flag address
wait_flag:
lw t1, 0(s1) # 1. Read Flag
beqz t1, wait_flag # Wait for Flag to become Ready
fence r, r # 2. Load-Load Fence: Ensure Flag seen before reading Data
lw t0, 0(s0) # 3. Read Data
Explanation: fence r,r ensures the "Flag read" is performed before the "Data read".
Complete C Example
// Shared variables
volatile int data = 0;
volatile int flag = 0;
// Producer (Core 0)
void producer(void) {
data = 42; // Write data
asm volatile ("fence w, w" ::: "memory"); // Store-Store Fence
flag = 1; // Set Flag
}
// Consumer (Core 1)
int consumer(void) {
while (flag == 0) { } // Wait for Flag
asm volatile ("fence r, r" ::: "memory"); // Load-Load Fence
return data; // Read data (guaranteed to be 42)
}
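The same pattern can also be written with C11 atomics, which lets the compiler pick the fences instead of hand-written inline assembly. The following is a minimal sketch, not the author's implementation: the names publish, try_consume, payload, and ready are hypothetical, and the consumer is written as a non-blocking poll so it can be called in a loop.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state for this sketch. */
static int payload;        /* plain data, protected by the flag     */
static atomic_int ready;   /* 0 = nothing published, 1 = data valid */

/* Producer side: write the data, then publish it with a release store. */
void publish(int value) {
    payload = value;
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Consumer side: returns true and fills *out once the data is published. */
bool try_consume(int *out) {
    if (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        return false;      /* not published yet, caller may retry */
    *out = payload;        /* ordered after the acquire load */
    return true;
}
```

A consumer would typically spin with `while (!try_consume(&v)) { }`. On RISC-V, compilers usually implement the release store and acquire load with fences at least as strong as fence w,w and fence r,r, but the exact instruction sequence depends on the toolchain.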
FENCE Usage Quick Reference
| Scenario | FENCE Type | Description |
|---|---|---|
| Publish Data | fence w, w | Ensure data visible before Flag |
| Consume Data | fence r, r | Ensure Flag read before data |
| Release Lock | fence rw, w | Ensure Critical Section ops complete before Unlock |
| Acquire Lock | fence r, rw | Ensure ops after Lock don't execute early |
| Full Barrier | fence rw, rw | Strongest Fence, no ops can cross |
| Self-Modify Code | fence.i | After modifying instructions, synchronize instruction fetch (this hart only) |
Spinlock Example (Using Atomics)
# acquire_lock: Use amoswap.w.aq to acquire lock
# a0 = lock address, t0 = 1 (locked), t1 = result
acquire_lock:
li t0, 1
retry:
amoswap.w.aq t1, t0, (a0) # Atomic swap with Acquire
bnez t1, retry # If was 1 (locked), retry
ret # Successfully acquired lock
# release_lock: Use amoswap.w.rl to release lock
# a0 = lock address
release_lock:
amoswap.w.rl zero, zero, (a0) # Atomic write 0 with Release
ret
Explanation:
- .aq (Acquire): Subsequent ops won't be moved before Lock
- .rl (Release): Previous ops won't be moved after Unlock
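For C code, the same acquire/release pairing can be sketched with C11 atomics. This is an illustrative sketch under assumptions: spinlock_t and the three function names are hypothetical, and the lock word uses 0 for free and 1 for held, as in the assembly above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for this sketch: 0 = free, 1 = held. */
typedef struct { atomic_int locked; } spinlock_t;

/* One acquire attempt: swap 1 in and succeed if the old value was 0. */
bool spin_trylock(spinlock_t *l) {
    return atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0;
}

/* Spin until the swap observes a free lock. */
void spin_lock(spinlock_t *l) {
    while (!spin_trylock(l)) { }
}

/* Release: earlier accesses may not move after this store. */
void spin_unlock(spinlock_t *l) {
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

The acquire exchange plays the role of amoswap.w.aq and the release store plays the role of the unlocking write; which instructions the compiler actually emits is toolchain-dependent.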
Common Pitfalls
Pitfall 1: Thinking volatile Is Enough
Misconception: C's volatile guarantees Memory Ordering.
Truth: volatile only constrains the compiler (it may not drop or reorder the volatile accesses); it doesn't guarantee CPU-level Memory Ordering.
// Wrong: Only volatile, may read stale data on multi-core
volatile int data = 0;
volatile int flag = 0;
// Producer
data = 42;
flag = 1; // CPU may reorder so flag is visible first!
// Correct: Add fence
data = 42;
asm volatile ("fence w, w" ::: "memory");
flag = 1;
Pitfall 2: Fence in Wrong Position
Symptom: Spinlock looks correct, but still has Race Condition.
// Wrong: fence after unlock
critical_section();
unlock();
asm volatile ("fence rw, w" ::: "memory"); // Too late!
// Correct: fence before unlock (or use .rl)
critical_section();
asm volatile ("fence rw, w" ::: "memory");
unlock();
Pitfall 3: Forgetting fence.i
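Symptom: JIT or self-modifying code executes old instructions.
Cause: After instructions are modified, the instruction fetch path (for example the I-Cache) may still hold the old content. fence.i synchronizes only the hart that executes it; every other hart that will run the new code must execute its own fence.i.
// Wrong: No fence.i after code modification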
memcpy(code_buffer, new_code, size);
((void (*)(void))code_buffer)(); // May execute old instructions!
// Correct: fence.i after modification
memcpy(code_buffer, new_code, size);
asm volatile ("fence.i" ::: "memory");
((void (*)(void))code_buffer)(); // Now executes new instructions
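In portable C, the same step is often expressed with the GCC/Clang builtin __builtin___clear_cache rather than a literal fence.i. The sketch below assumes a hypothetical helper name, install_code, and assumes the destination buffer is already mapped as executable. What the builtin expands to on RISC-V (a fence.i, or an OS call that also covers other harts) depends on the toolchain and operating system.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy freshly generated instructions into buf and
 * make them safe to execute. Assumes buf points at executable memory. */
void *install_code(void *buf, const void *new_code, size_t size) {
    memcpy(buf, new_code, size);
    /* GCC/Clang builtin: synchronizes instruction fetch with the stores
     * above for the byte range [buf, buf + size). */
    __builtin___clear_cache((char *)buf, (char *)buf + size);
    return buf;
}
```

The caller would then cast the returned pointer to a function pointer and call it, as in the example above.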
This appendix provides a quick reference for RISC-V's memory model (RVWMO). Understanding memory ordering is essential for writing correct concurrent code on RISC-V.
F.1 Memory Ordering Basics
What Can Be Reordered?
RISC-V Weak Memory Ordering (RVWMO) allows extensive reordering:
| Reordering | Allowed? | Exception |
|---|---|---|
| Load → Load | Yes | Same address, or FENCE |
| Load → Store | Yes | Same address, or FENCE |
| Store → Store | Yes | Same address, or FENCE |
| Store → Load | Yes | Same address, or FENCE |
Key Point: Almost everything can be reordered unless:
- Operations access the same address (overlapping)
- Operations are separated by a FENCE instruction
- Operations have data/control dependencies
- Operations use acquire/release atomics
Preserved Program Order (PPO)
Preserved Program Order is the subset of program order that MUST be respected:
- Overlapping addresses: SW to X, then LW from X → always in order
- Explicit fences: Operations separated by FENCE → always in order
- Acquire/Release: Atomic operations with .aq or .rl → enforce ordering
- Dependencies: Data dependencies (e.g., LW then use result) → always in order
- Control dependencies: Branch then dependent operation → only stores after the branch are ordered; later loads may still be performed early
F.2 FENCE Instruction Reference
FENCE Syntax
fence pred, succ
- pred (predecessor): Operations before fence (r, w, or rw)
- succ (successor): Operations after fence (r, w, or rw)
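To make the pred and succ sets concrete, the sketch below builds the 32-bit encoding of a plain fence from two bit sets. The helper name encode_fence and the FENCE_* constants are hypothetical; the bit layout (pred in bits 27:24, succ in bits 23:20, opcode 0x0F) is taken from the base ISA encoding, which also defines i and o bits for device input and output in addition to r and w.

```c
#include <stdint.h>

/* Bits of the 4-bit pred/succ sets, in the order used by the encoding. */
enum { FENCE_W = 1, FENCE_R = 2, FENCE_O = 4, FENCE_I = 8 };

/* Build the encoding of "fence pred, succ" (fm = 0, rs1 = rd = x0). */
uint32_t encode_fence(unsigned pred, unsigned succ) {
    return ((uint32_t)(pred & 0xF) << 24) |   /* predecessor set      */
           ((uint32_t)(succ & 0xF) << 20) |   /* successor set        */
           0x0000000Fu;                       /* MISC-MEM, funct3 = 0 */
}
```

For example, `encode_fence(FENCE_R | FENCE_W, FENCE_R | FENCE_W)` yields 0x0330000F, the encoding of fence rw, rw.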
Common FENCE Variants
| FENCE | Meaning | Use Case |
|---|---|---|
| fence rw, rw | Full fence | Strongest barrier, orders everything |
| fence w, w | Store-store fence | Ensure stores visible in order |
| fence r, r | Load-load fence | Ensure loads happen in order |
| fence r, rw | Acquire fence | After acquiring lock |
| fence rw, w | Release fence | Before releasing lock |
| fence.i | Instruction fence | After code modification (JIT, self-modifying code) |
| fence.tso | TSO fence | x86-compatible ordering (everything except Store → Load) |
FENCE Examples
Full Fence (strongest):
sw a0, 0(s0) # Store 1
fence rw, rw # Full fence
lw t0, 0(s1) # Load 1
All memory operations before the fence are ordered before all memory operations after the fence.
Store-Store Fence (publish pattern):
sw a0, 0(s0) # Write data
fence w, w # Ensure data written first
sw a1, 0(s1) # Write flag
Ensures stores become visible in order.
Load-Load Fence (consume pattern):
lw t0, 0(s1) # Read flag
fence r, r # Ensure flag read first
lw t1, 0(s0) # Read data
Ensures loads happen in order.
Acquire Fence (after lock acquisition):
lr.w.aq t0, (a0) # Acquire lock (with .aq)
# OR
lw t0, 0(a0) # Read lock
fence r, rw # Acquire fence
# ... critical section ...
Prevents operations in critical section from moving before lock acquisition.
Release Fence (before lock release):
# ... critical section ...
fence rw, w # Release fence
sw zero, 0(a0) # Release lock
Prevents operations in critical section from moving after lock release.
F.3 Atomic Instructions
Load-Reserved / Store-Conditional (LR/SC)
Syntax:
lr.w rd, (rs1) # Load-reserved word
lr.d rd, (rs1) # Load-reserved doubleword
sc.w rd, rs2, (rs1) # Store-conditional word
sc.d rd, rs2, (rs1) # Store-conditional doubleword
Ordering Annotations:
- .aq (acquire): No later operations can move before this
- .rl (release): No earlier operations can move after this
- .aqrl (both): Full ordering
Example: Atomic Increment
retry:
lr.w t0, (a0) # Load current value
addi t0, t0, 1 # Increment
sc.w t1, t0, (a0) # Try to store
bnez t1, retry # Retry if failed (t1 != 0)
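In C, the closest analogue of an LR/SC retry loop is a weak compare-and-exchange loop. The sketch below assumes a hypothetical function name, increment, and uses relaxed ordering because the assembly above carries no .aq/.rl annotations either. On cores without a native compare-and-swap, compilers commonly lower it to an LR/SC loop, though that is toolchain-dependent.

```c
#include <stdatomic.h>

/* Atomically increment *counter; returns the value before the increment. */
int increment(atomic_int *counter) {
    int old = atomic_load_explicit(counter, memory_order_relaxed);
    /* On failure, old is refreshed with the current value and the loop
     * retries, much like branching back to the LR after a failed SC. */
    while (!atomic_compare_exchange_weak_explicit(counter, &old, old + 1,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed)) {
    }
    return old;
}
```

The weak form may fail spuriously, which matches the behavior of SC: a failure only means "try again", not that another hart necessarily changed the value.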
Example: Spinlock Acquire
acquire_lock:
lr.w.aq t0, (a0) # Load-reserved with acquire
bnez t0, acquire_lock # If locked, retry
li t1, 1
sc.w t2, t1, (a0) # Try to acquire (.aq on the SC alone adds no ordering)
bnez t2, acquire_lock # Retry if failed
Atomic Memory Operations (AMO)
Syntax:
amoswap.w rd, rs2, (rs1) # Atomic swap
amoadd.w rd, rs2, (rs1) # Atomic add
amoand.w rd, rs2, (rs1) # Atomic AND
amoor.w rd, rs2, (rs1) # Atomic OR
amoxor.w rd, rs2, (rs1) # Atomic XOR
amomax.w rd, rs2, (rs1) # Atomic max (signed)
amomaxu.w rd, rs2, (rs1) # Atomic max (unsigned)
amomin.w rd, rs2, (rs1) # Atomic min (signed)
amominu.w rd, rs2, (rs1) # Atomic min (unsigned)
Ordering Annotations: Same as LR/SC (.aq, .rl, .aqrl)
Example: Spinlock with AMOSWAP
acquire_lock:
li t0, 1
amoswap.w.aq t1, t0, (a0) # Swap 1 into lock, get old value
bnez t1, acquire_lock # If old value != 0, retry
release_lock:
amoswap.w.rl zero, zero, (a0) # Swap 0 into lock (release)
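The AMO instructions correspond closely to the C11 fetch-and-modify operations. The following sketch shows two of them; claim_bit and add_and_get are hypothetical names, and the mapping to amoor.w and amoadd.w is what compilers typically generate when the A extension is available, not something the C standard guarantees.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Atomically set bit `bit` in *mask (the kind of update amoor.w performs).
 * Returns true if this call was the one that set the bit. */
bool claim_bit(atomic_uint *mask, unsigned bit) {
    unsigned flag = 1u << bit;
    unsigned old = atomic_fetch_or_explicit(mask, flag, memory_order_acq_rel);
    return (old & flag) == 0;
}

/* Atomically add n (the kind of update amoadd.w performs) and return
 * the new total. */
int add_and_get(atomic_int *total, int n) {
    return atomic_fetch_add_explicit(total, n, memory_order_relaxed) + n;
}
```

As with the assembly, each operation returns the old value, and the memory_order argument takes the place of the .aq/.rl suffixes.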
F.4 Common Synchronization Patterns
Pattern 1: Message Passing
Problem: Producer writes data, then sets flag. Consumer waits for flag, then reads data.
Solution:
# Producer (Hart 0)
sw a0, 0(s0) # Write data
fence w, w # Ensure data written before flag
sw a1, 0(s1) # Write flag = 1
# Consumer (Hart 1)
loop:
lw t0, 0(s1) # Read flag
beqz t0, loop # Wait for flag
fence r, r # Ensure flag read before data
lw t1, 0(s0) # Read data
Why fences are needed:
- Without fence w, w: Flag might be visible before data
- Without fence r, r: Data might be read before flag is checked
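The explicit-fence form of message passing can be sketched in C11 with atomic_thread_fence and relaxed flag accesses. Everything named here (message_t, mailbox, mailbox_full, send_message, receive_message) is hypothetical. Note that C11 release and acquire fences are somewhat stronger than fence w, w and fence r, r; RISC-V compilers usually emit fence rw, w and fence r, rw for them.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical multi-word payload for this sketch. */
typedef struct { int id; int length; } message_t;

static message_t mailbox;        /* plain data                 */
static atomic_int mailbox_full;  /* 0 = empty, 1 = message set */

/* Producer: write all fields, fence, then raise the flag. */
void send_message(message_t m) {
    mailbox = m;
    atomic_thread_fence(memory_order_release);   /* data before flag */
    atomic_store_explicit(&mailbox_full, 1, memory_order_relaxed);
}

/* Consumer: check the flag, fence, then read the fields. */
bool receive_message(message_t *out) {
    if (atomic_load_explicit(&mailbox_full, memory_order_relaxed) == 0)
        return false;
    atomic_thread_fence(memory_order_acquire);   /* flag before data */
    *out = mailbox;
    return true;
}
```

The fences sit in the same positions as in the assembly: after the data write on the producer side, and between the flag read and the data read on the consumer side.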
Pattern 2: Spinlock (LR/SC)
Acquire:
acquire_lock:
lr.w.aq t0, (a0) # Load-reserved with acquire
bnez t0, acquire_lock # If locked, retry
li t1, 1
sc.w t2, t1, (a0) # Try to set lock (.aq on the SC alone adds no ordering)
bnez t2, acquire_lock # Retry if SC failed
# Lock acquired, critical section follows
Release:
# Critical section
amoswap.w.rl zero, zero, (a0) # Release lock
# OR
fence rw, w
sw zero, 0(a0)
Pattern 3: Spinlock (AMOSWAP)
Acquire:
acquire_lock:
li t0, 1
amoswap.w.aq t1, t0, (a0) # Atomic swap
bnez t1, acquire_lock # If old value != 0, retry
# Lock acquired
Release:
amoswap.w.rl zero, zero, (a0) # Release lock
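Pattern 4: Dekker's Algorithm (Mutual Exclusion)
Hart 0:
# Entry handshake only; full Dekker adds a turn variable to break ties
li t0, 1
sw t0, flag0, t2 # flag0 = 1 (t2 = scratch register for the address)
fence w, rw # Ensure flag0 visible before reading flag1
lw t1, flag1 # Read flag1
bnez t1, wait # If flag1 set, wait
fence r, rw # Keep the critical section after the flag1 check
# Critical section
fence rw, w # Ensure critical section done
sw zero, flag0, t2 # flag0 = 0
Hart 1: (symmetric, swap flag0 and flag1)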
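The entry handshake can be sketched in C11 as follows. The names want, try_enter, and leave are hypothetical, and the sketch deliberately covers only the flag handshake: if both harts attempt entry at once, both may back off, which is the tie that the turn variable in the full algorithm resolves.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* want[i] is 1 while hart i wants (or holds) the critical section. */
static atomic_int want[2];

/* Try to enter as hart `me` (0 or 1); false means the other hart is
 * in the way and this hart has withdrawn its request. */
bool try_enter(int me) {
    atomic_store_explicit(&want[me], 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* store before load */
    if (atomic_load_explicit(&want[1 - me], memory_order_relaxed) != 0) {
        atomic_store_explicit(&want[me], 0, memory_order_relaxed);
        return false;
    }
    atomic_thread_fence(memory_order_acquire);   /* section after check */
    return true;
}

/* Leave the critical section: earlier accesses stay before the store. */
void leave(int me) {
    atomic_store_explicit(&want[me], 0, memory_order_release);
}
```

The sequentially consistent fence is the important part: Store → Load is the one ordering that neither a release nor an acquire fence provides.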
Pattern 5: Producer-Consumer Queue
Producer:
# Write data to queue[tail]
sw a0, 0(s0)
# Increment tail
fence w, w # Ensure data written before tail update
addi s1, s1, 1
sw s1, tail_ptr, t2 # Publish new tail (t2 = scratch register for the address)
Consumer:
# Read tail
lw t0, tail_ptr
lw t1, head_ptr
beq t0, t1, empty # If tail == head, queue empty
# Read data from queue[head]
fence r, r # Ensure tail read before data
lw a0, 0(s2)
# Increment head
fence r, w # Ensure data read before the slot is handed back
addi s2, s2, 1
sw s2, head_ptr, t2
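A single-producer, single-consumer ring buffer along these lines might look as follows in C11. This is a sketch under assumptions: QUEUE_SLOTS, slots, queue_push, and queue_pop are hypothetical names, exactly one thread pushes and one pops, and the capacity is a power of two so the unsigned indices can wrap safely.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define QUEUE_SLOTS 4u   /* hypothetical capacity; must be a power of two */

static int slots[QUEUE_SLOTS];
static atomic_uint head;   /* next slot to read; written by the consumer  */
static atomic_uint tail;   /* next slot to write; written by the producer */

bool queue_push(int value) {
    unsigned t = atomic_load_explicit(&tail, memory_order_relaxed);
    unsigned h = atomic_load_explicit(&head, memory_order_acquire);
    if (t - h == QUEUE_SLOTS)
        return false;                                   /* queue full */
    slots[t % QUEUE_SLOTS] = value;                     /* write data */
    atomic_store_explicit(&tail, t + 1, memory_order_release);  /* publish */
    return true;
}

bool queue_pop(int *out) {
    unsigned h = atomic_load_explicit(&head, memory_order_relaxed);
    unsigned t = atomic_load_explicit(&tail, memory_order_acquire);
    if (t == h)
        return false;                                   /* queue empty */
    *out = slots[h % QUEUE_SLOTS];                      /* read data */
    atomic_store_explicit(&head, h + 1, memory_order_release);  /* free slot */
    return true;
}
```

The release store to tail corresponds to the fence before the tail update, and the acquire load of tail corresponds to the fence before the data read. The release store to head keeps the data read ahead of the moment the slot becomes reusable.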
Pattern 6: Barrier (N threads)
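Barrier Wait:
# a0 = barrier address: 0(a0) = arrival counter, 4(a0) = generation
barrier_wait:
lw t3, 4(a0) # Snapshot the current generation
# Increment counter atomically
li t0, 1
amoadd.w.aqrl t1, t0, (a0) # counter++, get old value
addi t1, t1, 1 # t1 = new counter value
# Check if all threads arrived
li t2, N # N = number of threads
bne t1, t2, spin # If not all arrived, spin
# Last arrival: reset counter, then start the next generation
sw zero, 0(a0)
addi t3, t3, 1
fence rw, w # Order the reset before the generation update
sw t3, 4(a0) # Releases the waiting threads
ret
spin:
lw t4, 4(a0)
beq t4, t3, spin # Wait until the generation changes
fence r, rw # Acquire: later accesses stay after the barrier
ret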
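One way to sketch an N-thread barrier in C11 is with a generation counter, so that waiters do not depend on observing the arrival count at exactly N before it is reset. The type and function names below (barrier_t, barrier_arrive, barrier_passed, barrier_wait) are hypothetical, and arrival is split from waiting only to make the steps visible.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    unsigned    parties;      /* number of participants, fixed at setup */
    atomic_uint arrived;      /* arrivals in the current round          */
    atomic_uint generation;   /* bumped once per completed round        */
} barrier_t;

/* Register arrival; returns the round the caller has to wait out. */
unsigned barrier_arrive(barrier_t *b) {
    unsigned gen = atomic_load_explicit(&b->generation, memory_order_relaxed);
    unsigned n = atomic_fetch_add_explicit(&b->arrived, 1,
                                           memory_order_acq_rel) + 1;
    if (n == b->parties) {
        /* Last arrival: reset the count, then open the next round. */
        atomic_store_explicit(&b->arrived, 0, memory_order_relaxed);
        atomic_store_explicit(&b->generation, gen + 1, memory_order_release);
    }
    return gen;
}

/* True once round `gen` has completed. */
bool barrier_passed(barrier_t *b, unsigned gen) {
    return atomic_load_explicit(&b->generation, memory_order_acquire) != gen;
}

void barrier_wait(barrier_t *b) {
    unsigned gen = barrier_arrive(b);
    while (!barrier_passed(b, gen)) { }
}
```

Because the counter is reset before the generation is published, a thread that races ahead into the next round always starts from a count of zero.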
F.5 Memory Model Comparison
RVWMO vs Other Models
| Model | Strength | Reordering Allowed | Fence Overhead |
|---|---|---|---|
| Sequential Consistency | Strongest | None | N/A (no reordering) |
| x86 TSO | Strong | Store → Load only | Low (implicit) |
| ARM | Weak | Extensive | Medium |
| RISC-V RVWMO | Weak | Extensive | Medium |
| RISC-V RVTSO | Strong | Store → Load only | Low |
Ordering Guarantees
| Operation Pair | RISC-V RVWMO | x86 TSO | ARM | SC |
|---|---|---|---|---|
| Load → Load | ✗ | ✓ | ✗ | ✓ |
| Load → Store | ✗ | ✓ | ✗ | ✓ |
| Store → Store | ✗ | ✓ | ✗ | ✓ |
| Store → Load | ✗ | ✗ | ✗ | ✓ |
✓ = Ordered by default, ✗ = Can be reordered (need fence)
F.6 FENCE Equivalents Across Architectures
| RISC-V | x86 | ARM | Purpose |
|---|---|---|---|
| fence rw, rw | MFENCE | DMB SY | Full barrier |
| fence w, w | SFENCE | DMB ST | Store barrier |
| fence r, r | LFENCE | DMB LD | Load barrier |
| fence r, rw | (implicit) | DMB LD | Acquire |
| fence rw, w | (implicit) | DMB SY | Release (DMB ST orders only stores against stores) |
| fence.i | (implicit) | ISB | Instruction sync |
| fence.tso | (implicit) | - | TSO ordering |
F.7 Acquire/Release Semantics
Acquire Semantics
Meaning: No memory operations after the acquire can move before it.
Use Case: After acquiring a lock, before accessing protected data.
Implementation:
# Option 1: Atomic with .aq
lr.w.aq t0, (a0)
# Option 2: Load + fence
lw t0, 0(a0)
fence r, rw
Release Semantics
Meaning: No memory operations before the release can move after it.
Use Case: After accessing protected data, before releasing a lock.
Implementation:
# Option 1: Atomic with .rl
amoswap.w.rl zero, zero, (a0)
# Option 2: Fence + store
fence rw, w
sw zero, 0(a0)
Acquire-Release Pair
Complete Lock Example:
# Acquire
acquire:
lr.w.aq t0, (a0)
bnez t0, acquire
li t1, 1
sc.w t2, t1, (a0)
bnez t2, acquire
# Critical section
# ... protected operations ...
# Release
amoswap.w.rl zero, zero, (a0)
F.8 Common Pitfalls
Pitfall 1: Missing Fences
Wrong:
# Producer
sw a0, 0(s0) # Write data
sw a1, 0(s1) # Write flag
# Consumer
lw t0, 0(s1) # Read flag
lw t1, 0(s0) # Read data (might be stale!)
Correct:
# Producer
sw a0, 0(s0)
fence w, w # Add fence!
sw a1, 0(s1)
# Consumer
lw t0, 0(s1)
fence r, r # Add fence!
lw t1, 0(s0)
Pitfall 2: Wrong Fence Type
Wrong (using load-load fence for release):
# Critical section
fence r, r # Wrong! Doesn't order stores
sw zero, 0(a0) # Release lock
Correct:
# Critical section
fence rw, w # Correct! Orders all ops before stores
sw zero, 0(a0)
Pitfall 3: Forgetting .aq/.rl on Atomics
Wrong:
lr.w t0, (a0) # Missing .aq!
# Critical section
amoswap.w zero, zero, (a0) # Missing .rl!
Correct:
lr.w.aq t0, (a0)
# Critical section
amoswap.w.rl zero, zero, (a0)
Pitfall 4: Data Race
Wrong (no synchronization):
# Hart 0
sw a0, 0(s0) # Write shared variable
# Hart 1
lw t0, 0(s0) # Read shared variable (DATA RACE!)
Correct (use lock or atomic):
# Hart 0
# ... acquire lock ...
sw a0, 0(s0)
# ... release lock ...
# Hart 1
# ... acquire lock ...
lw t0, 0(s0)
# ... release lock ...
F.9 Quick Decision Tree
Do I need a fence?
Are multiple harts accessing shared memory?
├─ No → No fence needed
└─ Yes → Continue
Are the accesses synchronized (locks, atomics)?
├─ Yes → Ordering comes from the lock or from .aq/.rl on the atomics
└─ No → Continue
Are the accesses to the same address?
├─ Yes → No fence needed (hardware preserves order)
└─ No → FENCE REQUIRED!
What type of fence?
├─ Publishing data? → fence w, w
├─ Consuming data? → fence r, r
├─ Acquiring lock? → fence r, rw (or .aq)
├─ Releasing lock? → fence rw, w (or .rl)
└─ Not sure? → fence rw, rw (full fence)
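The decision tree can also be written down as a small lookup, which is handy as a checklist in code review notes or tooling. This is only a transcription of the tree above; the names fence_role_t, fence_needed, and fence_for are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

typedef enum { PUBLISH, CONSUME, ACQUIRE_LOCK, RELEASE_LOCK, UNSURE } fence_role_t;

/* First half of the tree: is an explicit fence required at all? */
bool fence_needed(bool multi_hart, bool synchronized, bool same_address) {
    if (!multi_hart)  return false;  /* single hart: no fence needed      */
    if (synchronized) return false;  /* lock/atomic already orders access */
    if (same_address) return false;  /* hardware preserves this order     */
    return true;
}

/* Second half of the tree: which fence fits the role? */
const char *fence_for(fence_role_t role) {
    switch (role) {
    case PUBLISH:      return "fence w, w";
    case CONSUME:      return "fence r, r";
    case ACQUIRE_LOCK: return "fence r, rw";
    case RELEASE_LOCK: return "fence rw, w";
    default:           return "fence rw, rw";  /* not sure: full fence */
    }
}
```

For instance, `fence_needed(true, false, false)` is true, and `fence_for(RELEASE_LOCK)` returns "fence rw, w".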
F.10 References
- RISC-V Memory Model Specification: Chapter 14 of RISC-V ISA Manual
- RVWMO Formal Specification: https://github.com/riscv/riscv-isa-manual
- Memory Model Tools: herd7, rmem (for verification)
- Linux Kernel Memory Barriers: Documentation/memory-barriers.txt