Chapter 13. SoC Integration

Part VIII — System Design, Platform Spec & SoC Integration

🎯 Learning Objectives

After reading this chapter, you will be able to:

Understand PMP’s Role: Grasp how Physical Memory Protection limits access at the hardware level
Distinguish TOR vs NAPOT: Understand the configuration differences and use cases for each Address Matching Mode
Configure PMP Entries: Set up read-only regions and intercept illegal writes
Understand PLIC Architecture: Grasp how the Platform-Level Interrupt Controller operates
Integrate SoC Components: Understand how CPU, Memory, and Peripherals connect via Interconnect

💡 Scenario: The Museum’s Red Barrier Poles

Scene: Junior stares at a “Store Access Fault” exception code on the screen, looking confused.

Junior: “Architect, this is so strange. I already turned off the MMU (virtual memory) and I’m using physical addresses directly to write to this variable. Why is the CPU still blocking me? Is the board broken?”

Architect: “The board is fine. You just hit PMP (Physical Memory Protection)’s ‘red barrier poles.’

Imagine memory is a museum:

Mechanism	Analogy	Function
MMU (Page Table)	Tour map	Tells you where exhibits are (VA → PA)
PMP	Red barrier poles + bulletproof glass	Hardware security, limits who can touch what

Even if you bypass the tour guide (turn off MMU) and rush straight to the Mona Lisa, the hardware security (PMP Checker) will still stop you, because your ID (Privilege Mode) says you’re just an ordinary visitor (S-mode/U-mode), and this area is only accessible to the museum director (M-mode).“

Junior: “So how do I set up these barrier poles? Do I need start and end addresses?”

Architect: “There are two common ways to set up the barriers:

TOR (Top of Range): Like stretching a rope. You need two poles (two PMP Entries), and the area between them is the controlled region. Good for arbitrary-sized regions.
NAPOT (Naturally Aligned Power of Two): Like placing a fixed-size dome (4KB, 2MB…) over exhibits. You only need to set the center point and dome size—more resource-efficient (uses only one Entry).

Today let’s try using NAPOT to cover a 4KB region and make it ‘read-only,’ and see what happens to your program.“

A RISC-V processor core doesn’t operate in isolation. To build a complete system-on-chip (SoC), the core must integrate with memory controllers, interrupt controllers, I/O devices, and system interconnects. This integration determines how software accesses hardware, how devices communicate, and how the system maintains security and performance.

RISC-V provides a modular approach to SoC design. Unlike monolithic architectures that prescribe specific peripheral implementations, RISC-V defines standard interfaces while allowing flexibility in implementation. The Physical Memory Protection (PMP) unit controls memory access in machine mode. The Platform-Level Interrupt Controller (PLIC) routes interrupts from devices to cores. Memory-mapped I/O (MMIO) provides a uniform mechanism for device access. System interconnects like TileLink and AXI connect components together.

This chapter explores how RISC-V cores integrate into complete SoCs. We’ll examine the essential components—PMP, IOMMU, PLIC, MMIO, memory maps, interconnects, and DMA—and see how they work together to create functional systems. Understanding SoC integration is crucial for system designers, firmware developers, and anyone working with RISC-V hardware platforms.

13.1 Physical Memory Protection (PMP)

The Need for Memory Isolation

In systems without virtual memory, how do we prevent untrusted code from accessing sensitive memory regions? A bare-metal application might need to protect its firmware from buggy drivers. An embedded RTOS might need to isolate tasks from each other. Machine-mode firmware must protect itself from supervisor-mode operating systems.

Physical Memory Protection (PMP) provides hardware-enforced memory access control using physical addresses. Unlike virtual memory’s page tables (which operate in S-mode), PMP operates in M-mode and applies to all lower privilege levels. This makes PMP essential for systems without MMUs and useful for protecting M-mode resources even in systems with MMUs.

PMP Architecture

PMP uses a set of configuration registers to define protected memory regions:

pmpcfg0-pmpcfg15: Configuration registers (RV32 has 4, RV64 has 16)

pmpaddr0-pmpaddr63: Address registers (up to 64 regions)

Each PMP entry consists of:

An address register (pmpaddr) defining the region
A configuration byte (in pmpcfg) specifying permissions and matching mode

PMP Configuration Format

pmpcfg format (8 bits per entry):
  7     6:5   4:3   2     1     0
  L     0 0   A     X     W     R

L: Lock bit (prevents further modification)
A: Address matching mode (OFF, TOR, NA4, NAPOT)
X: Execute permission
W: Write permission
R: Read permission

Address Matching Modes

PMP supports four address matching modes:

OFF (A=0): Region is disabled, no protection applied

TOR (A=1): Top-of-Range. Region is [pmpaddr[i-1], pmpaddr[i])

NA4 (A=2): Naturally Aligned 4-byte region

NAPOT (A=3): Naturally Aligned Power-Of-Two region

The most commonly used modes are TOR and NAPOT:

// Example: Protect 64KB region at 0x80000000 using NAPOT
// NAPOT encoding: address = base | (size/2 - 1)
// For 64KB: 0x80000000 | 0x7FFF = 0x80007FFF
pmpaddr0 = 0x80007FFF >> 2;  // Right shift by 2 (PMP addresses are >> 2)
pmpcfg0 = 0x1F;  // L=0, A=3 (NAPOT), X=1, W=1, R=1

// Example: Protect range [0x80000000, 0x80010000) using TOR
pmpaddr0 = 0x80000000 >> 2;  // Start address
pmpaddr1 = 0x80010000 >> 2;  // End address
pmpcfg0 = 0x09;  // Entry 0: A=1 (TOR), X=0, W=0, R=1

PMP Priority and Matching

When an access occurs, PMP checks entries from lowest to highest index. The first matching entry determines the access permissions. If no entry matches, the access is denied (for M-mode, this behavior is implementation-defined).

Access check algorithm:
1. For i = 0 to N-1:
   - If address matches pmpaddr[i] region:
     - Check permissions in pmpcfg[i]
     - If allowed: grant access
     - If denied: raise access fault
2. If no match: deny access (or allow for M-mode)

Lock Bit

The lock bit (L) prevents further modification of a PMP entry until the next reset. This is crucial for protecting M-mode firmware from being disabled by compromised S-mode code:

// Lock the firmware region
pmpaddr0 = 0x80000000 >> 2;
pmpaddr1 = 0x80010000 >> 2;
pmpcfg0 = 0x89;  // L=1, A=1 (TOR), X=0, W=0, R=1

// Any subsequent write to pmpcfg0 or pmpaddr0/1 is ignored
pmpcfg0 = 0x00;  // This write has no effect!

PMP Use Cases

Firmware Protection: M-mode firmware protects itself from S-mode OS
Device Memory Protection: Prevent unauthorized access to MMIO regions
Task Isolation: Embedded RTOS isolates tasks without MMU
Secure Boot: Protect boot ROM and secure storage

13.2 IOMMU for RISC-V

The DMA Problem

Direct Memory Access (DMA) allows devices to access memory without CPU intervention, improving performance for I/O-intensive workloads. But DMA creates a security problem: devices use physical addresses and bypass the CPU’s virtual memory protection. A malicious or buggy device could read sensitive data or corrupt kernel memory.

An Input-Output Memory Management Unit (IOMMU) solves this by providing address translation and access control for devices. Just as the MMU translates virtual addresses for the CPU, the IOMMU translates device addresses for peripherals. This enables:

Device isolation: Each device sees only its own memory
Virtualization: Virtual machines can safely pass through devices
Large address spaces: 32-bit devices can access >4GB memory

RISC-V IOMMU Architecture

The RISC-V IOMMU specification defines a standard interface for device address translation. The IOMMU sits between devices and memory, intercepting device memory requests and translating them through device page tables.

Device → IOMMU → Memory
         ↓
    Device Context
    Device Page Tables

Key components:

Device Context: Per-device configuration (page table pointer, permissions)

Device Directory Table (DDT): Maps device IDs to device contexts

I/O Page Tables: Similar to CPU page tables, but for device addresses

Command Queue: Software sends commands to IOMMU (invalidate TLB, etc.)

Fault Queue: IOMMU reports translation faults to software

Device Address Translation

When a device issues a memory request:

IOMMU extracts device ID from the request
Looks up device context in DDT
Walks device page tables to translate address
Checks permissions (read/write/execute)
Forwards translated request to memory or reports fault

// Simplified IOMMU translation
struct device_context {
    uint64_t page_table_root;  // Root page table address
    uint32_t permissions;       // Device permissions
    uint32_t address_width;     // Device address width
};

// Device issues read from device address 0x1000
device_addr = 0x1000;
device_id = 0x42;

// IOMMU lookup
ctx = ddt[device_id];
physical_addr = walk_page_table(ctx.page_table_root, device_addr);
if (physical_addr && (ctx.permissions & READ)) {
    forward_to_memory(physical_addr);
} else {
    report_fault(device_id, device_addr);
}

IOMMU Page Table Formats

RISC-V IOMMU supports multiple page table formats:

Sv39/Sv48/Sv57: Same format as CPU page tables (for simplicity)
MSI Page Tables: Special format for Message Signaled Interrupts

Using the same format as CPU page tables simplifies software—the OS can reuse existing page table code for device mappings.

IOMMU vs ARM SMMU

ARM’s System Memory Management Unit (SMMU) provides similar functionality:

Feature	RISC-V IOMMU	ARM SMMU
Page Table Format	Sv39/Sv48/Sv57	LPAE (Long descriptor)
Device Identification	Device ID (configurable)	Stream ID
Command Interface	Command queue	CMDQ (Command Queue)
Fault Reporting	Fault queue	Event queue
Virtualization	Two-stage translation	Stage 1 + Stage 2
Complexity	Simpler, modular	More complex, feature-rich

RISC-V IOMMU emphasizes simplicity and reuse of existing CPU MMU concepts, while ARM SMMU has evolved through multiple generations with extensive features.

13.3 Platform-Level Interrupt Integration

Interrupt Routing in SoCs

A typical RISC-V SoC has dozens of interrupt sources: UARTs, timers, GPIOs, network controllers, storage devices. Each device needs to signal the CPU when it requires attention. The Platform-Level Interrupt Controller (PLIC) manages this complexity by:

Collecting interrupts from all devices
Routing interrupts to appropriate cores
Managing interrupt priorities
Providing claim/complete mechanism

PLIC Architecture

Interrupt Sources (1-1023)
    ↓
PLIC Gateway (per source)
    ↓
PLIC Core (priority arbitration)
    ↓
Interrupt Targets (cores × contexts)

Key concepts:

Interrupt Source: A device that can generate interrupts (numbered 1-1023, source 0 is reserved)

Interrupt Gateway: Converts device interrupt signal to PLIC internal format

Interrupt Target: A CPU context (M-mode or S-mode on each core)

Priority: Each source has a priority (0 = never interrupt, 1-7 = increasing priority)

Threshold: Each target has a threshold (only interrupts with priority > threshold are delivered)

PLIC Memory Map

The PLIC is accessed through memory-mapped registers:

Base Address: 0x0C000000 (typical)

Interrupt Priorities:
  0x0C000000 + 4*source_id: Priority for source (0-7)

Interrupt Pending:
  0x0C001000: Pending bits (1024 bits, read-only)

Interrupt Enable:
  0x0C002000 + 0x80*context: Enable bits for context

Priority Threshold:
  0x0C200000 + 0x1000*context: Threshold for context

Claim/Complete:
  0x0C200004 + 0x1000*context: Claim and complete register

Interrupt Handling Flow

Device asserts interrupt: Device raises interrupt line
PLIC gateway captures: Gateway sets pending bit
PLIC arbitration: PLIC selects highest priority pending interrupt for each target
CPU notification: PLIC asserts external interrupt to CPU
Software claim: Interrupt handler reads claim register (returns source ID, clears pending)
Software handling: Handler services the device
Software complete: Handler writes source ID to complete register

// PLIC interrupt handler
void plic_handler(void) {
    uint32_t source = plic_claim();  // Read claim register

    if (source == UART0_IRQ) {
        uart0_interrupt_handler();
    } else if (source == TIMER_IRQ) {
        timer_interrupt_handler();
    }
    // ... handle other sources

    plic_complete(source);  // Write to complete register
}

uint32_t plic_claim(void) {
    volatile uint32_t *claim = (uint32_t*)(PLIC_BASE + 0x200004);
    return *claim;  // Reading claim atomically claims the interrupt
}

void plic_complete(uint32_t source) {
    volatile uint32_t *complete = (uint32_t*)(PLIC_BASE + 0x200004);
    *complete = source;  // Writing complete releases the interrupt
}

Multi-Core Interrupt Routing

In a multi-core system, each core has separate M-mode and S-mode contexts. The PLIC can route interrupts to specific cores:

// Configure UART interrupt to route to core 0 S-mode
#define PLIC_ENABLE_BASE 0x0C002000
#define UART0_IRQ 10
#define CORE0_S_MODE_CONTEXT 1

// Enable UART interrupt for core 0 S-mode
uint32_t *enable = (uint32_t*)(PLIC_ENABLE_BASE + 0x80 * CORE0_S_MODE_CONTEXT);
enable[UART0_IRQ / 32] |= (1 << (UART0_IRQ % 32));

// Set priority
uint32_t *priority = (uint32_t*)(PLIC_BASE + 4 * UART0_IRQ);
*priority = 5;  // Priority 5

// Set threshold
uint32_t *threshold = (uint32_t*)(PLIC_BASE + 0x200000 + 0x1000 * CORE0_S_MODE_CONTEXT);
*threshold = 0;  // Accept all priorities > 0

13.4 Memory-Mapped I/O (MMIO)

Unified Address Space

RISC-V uses memory-mapped I/O: devices appear as memory locations. Reading from a device register uses the same load instruction as reading from RAM. Writing to a device register uses the same store instruction. This unifies the programming model—no special I/O instructions needed.

# Read UART status register
li   a0, 0x10000000      # UART base address
lw   a1, 0(a0)           # Read status register

# Write to UART data register
li   a2, 'A'             # Character to send
sw   a2, 4(a0)           # Write to data register

MMIO Address Regions

A typical RISC-V SoC memory map divides address space into regions:

0x00000000 - 0x0FFFFFFF: Debug/Boot ROM
0x10000000 - 0x1FFFFFFF: Peripherals (UART, SPI, GPIO, etc.)
0x20000000 - 0x2FFFFFFF: PLIC
0x30000000 - 0x3FFFFFFF: Reserved
0x40000000 - 0x7FFFFFFF: More peripherals
0x80000000 - 0xFFFFFFFF: DRAM

Each peripheral gets a block of addresses for its registers:

UART0: 0x10000000 - 0x10000FFF
  0x10000000: Status register
  0x10000004: Data register
  0x10000008: Control register
  0x1000000C: Baud rate register

MMIO Ordering Requirements

MMIO accesses have ordering requirements that differ from normal memory:

Device registers may have side effects: Reading a status register might clear an interrupt flag
Write order matters: Writing control registers in wrong order can cause device malfunction
Read/write dependencies: A write must complete before a subsequent read sees the result

RISC-V provides fence instructions to enforce ordering:

# Ensure MMIO write completes before continuing
li   a0, 0x10000000
li   a1, 0x1
sw   a1, 0(a0)           # Write to control register
fence iorw, iorw         # Ensure write completes
lw   a2, 4(a0)           # Read status register

The FENCE instruction takes two operands specifying predecessor and successor operations:

i: Device input (MMIO read)
o: Device output (MMIO write)
r: Memory read
w: Memory write

Common patterns:

fence iorw, iorw: Full fence (all operations)
fence ow, ow: Ensure MMIO writes complete in order
fence ir, ir: Ensure MMIO reads complete in order

Uncached vs Cached MMIO

MMIO regions must be marked as uncached in page tables. Caching device registers would cause:

Stale data: Cache might return old value instead of current device state
Lost writes: Write to cached location might not reach device
Side effect loss: Reading cached value doesn’t trigger device side effects

Page table entries for MMIO use special attributes:

// Mark MMIO region as uncached and unbuffered
pte = (physical_addr >> 12) << 10;  // PPN
pte |= PTE_V | PTE_R | PTE_W;       // Valid, readable, writable
pte |= PTE_A | PTE_D;                // Accessed, dirty
// Do NOT set PTE_C (cacheable) for MMIO

13.5 SoC Memory Map

Typical RISC-V SoC Layout

A complete RISC-V SoC memory map includes ROM, RAM, peripherals, and reserved regions. Here’s a typical layout for a 32-bit SoC:

Address Range          Size      Description
0x00000000-0x00000FFF  4 KB      Debug ROM
0x00001000-0x00000FFF  60 KB     Reserved
0x00010000-0x0001FFFF  64 KB     Boot ROM (mask ROM)
0x00020000-0x00FFFFFF  ~16 MB    Reserved
0x01000000-0x01FFFFFF  16 MB     CLINT (Core-Local Interruptor)
0x02000000-0x0BFFFFFF  160 MB    Reserved
0x0C000000-0x0FFFFFFF  64 MB     PLIC
0x10000000-0x1000FFFF  64 KB     UART0
0x10010000-0x1001FFFF  64 KB     SPI0
0x10020000-0x1002FFFF  64 KB     GPIO
0x10030000-0x1FFFFFFF  ~256 MB   Other peripherals
0x20000000-0x3FFFFFFF  512 MB    Reserved
0x40000000-0x7FFFFFFF  1 GB      External devices
0x80000000-0xFFFFFFFF  2 GB      DRAM

Address Decode and Routing

The SoC interconnect decodes addresses and routes requests to appropriate components:

CPU issues load/store
    ↓
Address decode
    ↓
    ├─ 0x00000000-0x0FFFFFFF → Boot ROM / CLINT / PLIC
    ├─ 0x10000000-0x1FFFFFFF → Peripheral bus
    ├─ 0x20000000-0x7FFFFFFF → Reserved / External
    └─ 0x80000000-0xFFFFFFFF → DRAM controller

Memory Map Examples

Different RISC-V platforms use different memory maps:

SiFive FU540 (HiFive Unleashed):

0x00001000: Boot ROM
0x02000000: CLINT
0x0C000000: PLIC
0x10000000: UART0
0x10010000: QSPI0
0x10040000: GPIO
0x80000000: DDR (8 GB)

Kendryte K210:

0x00000000: SRAM (6 MB)
0x40000000: Peripherals
0x50000000: AI accelerator
0x80000000: Flash (16 MB)

QEMU virt machine:

0x00001000: Boot ROM
0x02000000: CLINT
0x0C000000: PLIC
0x10000000: UART0
0x10001000: VirtIO devices
0x80000000: DRAM (configurable)

13.6 System Interconnects

The Need for Interconnects

A modern SoC has multiple masters (CPU cores, DMA controllers, GPUs) and multiple slaves (memory, peripherals, accelerators). An interconnect fabric connects these components, handling:

Address routing: Directing requests to correct destination
Arbitration: Managing concurrent accesses
Data width conversion: Connecting 32-bit devices to 64-bit buses
Clock domain crossing: Bridging different clock frequencies

AXI (Advanced eXtensible Interface)

ARM’s AMBA AXI is widely used in RISC-V SoCs due to its maturity and IP availability. AXI4 provides:

Separate read/write channels: Independent read and write transactions
Burst transfers: Efficient multi-beat transfers
Out-of-order completion: Transactions can complete in any order
Quality of Service (QoS): Priority-based arbitration

AXI signals:

Write Address Channel: AWADDR, AWLEN, AWSIZE, AWVALID, AWREADY
Write Data Channel:    WDATA, WSTRB, WLAST, WVALID, WREADY
Write Response:        BRESP, BVALID, BREADY
Read Address Channel:  ARADDR, ARLEN, ARSIZE, ARVALID, ARREADY
Read Data Channel:     RDATA, RRESP, RLAST, RVALID, RREADY

AHB (Advanced High-performance Bus)

AHB is simpler than AXI, suitable for lower-performance peripherals:

Single channel: Address and data share the same channel
Pipelined: Two-stage pipeline (address, data)
Simpler protocol: Easier to implement
Lower performance: No out-of-order, limited bursts

TileLink

TileLink is a RISC-V-native interconnect developed at UC Berkeley:

Designed for RISC-V: Matches RISC-V memory model
Scalable: From simple embedded to complex multi-core
Cache coherence: Built-in support for coherent caches
Three conformance levels:
- TL-UL: Uncached Lightweight (simple peripherals)
- TL-UH: Uncached Heavyweight (DMA, accelerators)
- TL-C: Cached (coherent caches)

TileLink advantages for RISC-V:

Native support for RISC-V atomics (LR/SC, AMO)
Efficient cache coherence protocol
Open specification (no licensing)

Interconnect Comparison

Feature	AXI4	AHB	TileLink
Channels	5 independent	1 shared	3 (A, D, optional C/E)
Burst Support	Yes (up to 256 beats)	Yes (limited)	Yes
Out-of-Order	Yes	No	Yes
Cache Coherence	No (needs ACE)	No	Yes (TL-C)
Complexity	High	Low	Medium
Performance	High	Medium	High
RISC-V Atomics	Requires extensions	Requires extensions	Native support
Licensing	ARM (free for use)	ARM (free for use)	Open (BSD)
Ecosystem	Mature, extensive IP	Mature, simple IP	Growing, RISC-V focused

Choosing an Interconnect

AXI: Best for high-performance SoCs, extensive IP ecosystem, industry standard
AHB: Best for simple embedded systems, low-cost peripherals
TileLink: Best for RISC-V-native designs, cache coherence, open ecosystem

13.7 DMA and Coherency

DMA Controller Integration

A DMA controller transfers data between memory and peripherals without CPU intervention. This frees the CPU for other tasks while large data transfers proceed in the background.

Typical DMA use cases:

Disk I/O: Transfer data between storage and memory
Network I/O: Move packets between NIC and memory
Audio/Video: Stream data to/from media devices
Memory-to-memory: Fast memory copy operations

DMA controller architecture:

CPU configures DMA
    ↓
DMA reads source (memory or device)
    ↓
DMA writes destination (device or memory)
    ↓
DMA signals completion (interrupt)

Cache Coherency Considerations

DMA creates coherency problems when the CPU has caches:

Problem 1: Stale cache data

1. CPU writes data to memory (data in cache, not yet in RAM)
2. DMA reads from memory (gets old data, not cached data)
3. DMA sends wrong data to device

Problem 2: Stale memory data

1. DMA writes data to memory
2. CPU reads data (gets old cached data, not new DMA data)
3. CPU processes wrong data

Solutions:

Software cache management (simple, lower performance):

// Before DMA read (device → memory)
dma_start(device, buffer, size);
dma_wait_complete();
cache_invalidate(buffer, size);  // Discard cached data
// Now CPU can read fresh data

// Before DMA write (memory → device)
cache_flush(buffer, size);       // Write cached data to memory
dma_start(buffer, device, size);
dma_wait_complete();

Hardware cache coherence (complex, higher performance):

DMA controller participates in cache coherence protocol
DMA snoops CPU caches or uses coherent interconnect
Requires coherent interconnect (ACE for AXI, TL-C for TileLink)

DMA and Virtual Memory

DMA controllers typically use physical addresses, but software uses virtual addresses. This creates challenges:

Problem: Virtual address buffer might span non-contiguous physical pages

Virtual:  [0x1000-0x2FFF] (8 KB contiguous)
Physical: [0x80000000-0x80000FFF] + [0x85000000-0x85000FFF] (non-contiguous!)

Solutions:

Scatter-Gather DMA: DMA controller accepts list of physical address/length pairs

struct sg_entry {
    uint64_t addr;   // Physical address
    uint32_t len;    // Length in bytes
};

struct sg_entry sg_list[] = {
    {0x80000000, 4096},
    {0x85000000, 4096},
};
dma_start_sg(device, sg_list, 2);

IOMMU: Translate device addresses to physical addresses (see Section 13.2)
Physically contiguous buffers: Allocate DMA buffers from reserved physical memory

🛠️ Hands-on Lab: Lab 13.1 — Memory Firewall (PMP Shield)

This lab demonstrates PMP’s core functionality: setting protection rules in M-mode, then switching to S-mode to attempt a violation.

Lab Objectives

Configure PMP Entry 0 as Read-Only (R=1, W=0) to protect target variable
Configure PMP Entry 1 as Allow-All (R=1, W=1, X=1) to let other code run normally
Switch to S-mode and attempt a write to trigger Store Access Fault

NAPOT Encoding Principle

NAPOT encoding can be abstract for beginners. The key formula is:

pmpaddr = (base_addr >> 2) | ((size >> 3) - 1)

Example: Protect a 4KB region starting at 0x80200000
- Base = 0x80200000, Size = 4KB (0x1000)
- 0x80200000 >> 2 = 0x20080000
- (0x1000 >> 3) - 1 = 0x1FF
- pmpaddr = 0x20080000 | 0x1FF = 0x200801FF

Encoding Rules:

pmpaddr low bits	Corresponding region size
`...aaaaa0`	8 bytes
`...aaaa01`	16 bytes
`...aaa011`	32 bytes
`...a01111`	128 bytes
`...0111111111`	4KB (what we use)

Code (pmp_lab.S)

.section .text
.global _start

_start:
    # ---------------------------------------------------
    # 1. Set up Trap Handler (catch Access Fault later)
    # ---------------------------------------------------
    la t0, trap_handler
    csrw mtvec, t0

    # ---------------------------------------------------
    # 2. Configure PMP (in M-mode)
    # ---------------------------------------------------

    # [Target] Protect a 4KB region at 0x80200000
    # Using NAPOT mode
    li t0, 0x200801FF
    csrw pmpaddr0, t0

    # Entry 0: Enable + NAPOT + Read Only (R=1, W=0, X=0)
    # PMP_R(1) | PMP_A_NAPOT(0x18) = 0x19

    # Entry 1: Open other memory (Allow All)
    # pmpaddr1 set to all 1s (max address), mode set to TOR
    # PMP_R(1) | PMP_W(1) | PMP_X(1) | PMP_A_TOR(0x08) = 0x0F

    # pmpcfg0 = (pmp1cfg << 8) | pmp0cfg = 0x0F19
    li t0, -1
    csrw pmpaddr1, t0

    li t0, 0x0F19
    csrw pmpcfg0, t0        # Firewall activated!

    # ---------------------------------------------------
    # 3. Drop to S-mode
    # ---------------------------------------------------

    # Set mstatus.MPP = 01 (Supervisor)
    li t0, (3 << 11)
    csrc mstatus, t0        # Clear MPP
    li t0, (1 << 11)
    csrs mstatus, t0        # Set MPP to 01 (S-mode)

    la t0, s_mode_entry
    csrw mepc, t0
    mret                    # Jump! Identity becomes Supervisor

s_mode_entry:
    # ---------------------------------------------------
    # 4. Trigger Attack (S-mode Attempt)
    # ---------------------------------------------------
    li a0, 0x80200000       # Address protected by PMP0
    li t1, 0xDEADBEEF

    # Attempt write! Should trigger Exception 7 (Store Access Fault)
    sw t1, 0(a0)

    # If we survive, experiment failed
    li a0, 0
    j stop

stop:
    j stop

trap_handler:
    # Read mcause to check exception type
    csrr t0, mcause

    # Exception 7 = Store Access Fault
    li t1, 7
    bne t0, t1, unexpected

    # SUCCESS: PMP blocked the illegal write!
    li a0, 1                # Return success code
    j stop

unexpected:
    li a0, -1               # Unexpected exception
    j stop

Compile and Run

# Assemble
riscv64-unknown-elf-as -march=rv64g -o pmp_lab.o pmp_lab.S

# Link (ensure _start is entry point)
riscv64-unknown-elf-ld -T link.ld -o pmp_lab.elf pmp_lab.o

# Run on QEMU
qemu-system-riscv64 -machine virt -nographic -bios pmp_lab.elf

Expected Behavior

M-mode: PMP entries configured, firewall activated
mret: Privilege drops to S-mode
sw instruction: Triggers Store Access Fault (Exception 7)
Trap handler: Confirms PMP did its job

danieRTOS Reference: A real RTOS would use PMP to isolate kernel data from user tasks, preventing task corruption.

⚠️ Common Pitfalls

Pitfall 1: PMP Priority Order Error

Error Scenario: Put “deny rule” in pmp15, put “allow rule” in pmp0.

Consequence: pmp0 matches first, allowing all access. The deny rule never takes effect.

// ❌ Wrong: Order reversed
pmp0: Allow All (RWX)      // Matches first, permits everything
pmp1: Deny 0x80200000      // Never gets checked

// ✅ Correct: Write specific Deny first, generic Allow last
pmp0: Read-Only 0x80200000 // Check sensitive region first
pmp1: Allow All            // Allow other regions

💡 Memory aid: Like firewall rules, write exceptions first, default last.

Pitfall 2: Forgetting the Default Deny Rule

Error Scenario: Only set one PMP Entry to protect the key area, forgot to open other memory.

Consequence: Code region not matched by any PMP Entry, S/U-mode can’t even fetch the next instruction.

# ❌ Wrong: Only one Entry
csrw pmpaddr0, t0       # Protect key
csrw pmpcfg0, 0x19      # Read-Only
mret                    # Jump to S-mode then immediately Crash!

# ✅ Correct: Add Allow All Entry
csrw pmpaddr0, t0       # Protect key
csrw pmpaddr1, t1       # Max address
csrw pmpcfg0, 0x0F19    # pmp0=RO, pmp1=RWX

Pitfall 3: Lock Bit Irreversibility

Error Scenario: Setting Lock bit (L=1) during development.

Consequence: PMP Entry locked. Only hardware Reset can unlock. Cannot modify rules during debug.

// ❌ Dangerous: Setting Lock during development
pmpcfg0 = 0x99;  // L=1, A=NAPOT, R=1

// ✅ Recommended: Only set Lock in Production
#ifdef PRODUCTION
    pmpcfg0 = 0x99;  // Locked
#else
    pmpcfg0 = 0x19;  // Dev mode, not locked
#endif

💡 Reminder: Lock bit is meant to prevent malicious modification of M-mode Firmware. Don’t use it during development.

Summary

SoC integration connects RISC-V cores with the rest of the system. This chapter covered seven essential components that make a complete RISC-V system-on-chip.

Physical Memory Protection (PMP) provides hardware-enforced memory access control using physical addresses. PMP operates in M-mode and protects memory regions from untrusted code. With up to 64 configurable regions, four address matching modes (OFF, TOR, NA4, NAPOT), and lockable entries, PMP enables firmware protection, device memory isolation, and task separation in systems without MMUs.

IOMMU extends memory protection to devices by translating device addresses and enforcing access control. The RISC-V IOMMU uses the same page table format as the CPU MMU, simplifying software implementation. This enables device isolation, safe device passthrough for virtualization, and protection against malicious or buggy devices.

Platform-Level Interrupt Controller (PLIC) manages interrupt routing in multi-core systems. The PLIC collects interrupts from up to 1023 sources, arbitrates by priority, and routes them to appropriate CPU contexts. The claim/complete mechanism ensures atomic interrupt handling, while per-context enable masks and thresholds provide flexible interrupt management.

Memory-Mapped I/O (MMIO) provides a uniform mechanism for device access using standard load and store instructions. MMIO regions must be marked uncached in page tables, and fence instructions ensure proper ordering of device accesses. This unified address space simplifies the programming model compared to architectures with separate I/O instructions.

SoC memory maps organize address space into regions for ROM, RAM, peripherals, and reserved areas. Different RISC-V platforms use different layouts, but all follow the principle of address decode and routing through the interconnect fabric. Understanding the memory map is essential for firmware development and device driver programming.

System interconnects connect multiple masters and slaves in the SoC. AXI provides high performance with extensive IP ecosystem, AHB offers simplicity for embedded systems, and TileLink provides RISC-V-native features including cache coherence. The choice depends on performance requirements, IP availability, and coherence needs.

DMA and coherency enable efficient data transfers but require careful management of cache coherence. Software can use cache flush and invalidate operations, or hardware can provide coherent DMA through snooping or coherent interconnects. IOMMU or scatter-gather DMA solves the virtual-to-physical address translation problem for DMA transfers.

Together, these components form the foundation of RISC-V SoC design, enabling everything from simple microcontrollers to complex multi-core application processors.

Keyboard shortcuts

See RiscV Run: Fundamentals