Week 8: PMP and W^X Enforcement

In CSA-101 Chapter 12 (§12.11), you noted that Virtus OS v1 had no W^X policy: a program could write to any address and execute anything it wrote. This week you close that gap.


Reading

Required. Petzold, CODE, Ch 16 ("An Assemblage of Memory") final sections. Petzold traces memory protection from the unprotected segment registers of the 8086 to the 386's protected mode. The 386's descriptor tables attach privilege and type bits to each segment: ring 0 segments could only be accessed from ring 0 code; execute-only segments could be fetched but not read as data. PMP is the RISC-V answer to the same problem at physical-address granularity.

Required. Waterman and Asanovic, RISC-V ISA Manual Volume II: Privileged Architecture, section 3.7 (Physical Memory Protection). Read the entire section (it is short). Pay attention to: the priority order (entry 0 is highest priority), the TOR/NA4/NAPOT addressing modes, the L bit (locked -- blocks M-mode too), and the default-deny behavior in U-mode when no entry matches.


Lecture: Physical Memory Protection

Two enforcement layers

After Module 7, your CPU has two layers of address-space enforcement:

  1. MMU (Sv32): Virtual-to-physical translation. The OS controls which physical pages appear in each process's virtual address space. An access to an unmapped page faults before it reaches the bus.

  2. PMP: Physical address enforcement. Even after the MMU translates an address, PMP checks whether the physical address is allowed with the required permission (R/W/X) for the current privilege level. A store to a code page (physical address in the text segment) fails PMP before it reaches DRAM.

These layers compose: a physical address that the MMU maps correctly can still be blocked by PMP. The OS uses both. The MMU provides isolation between processes (each process's virtual space is distinct). PMP provides invariants at the physical level that hold even if the MMU is misconfigured.
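A toy model may make the ordering concrete. The sketch below is hypothetical host-side C, not code from the course repository: a four-entry array stands in for an Sv32 page table (real Sv32 uses a two-level walk), and a single fixed read-execute region stands in for PMP. All names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12u   /* 4 KiB pages, as in Sv32 */
#define NUM_VPAGES 4u    /* toy address space: 4 virtual pages */

/* Hypothetical toy page table: VPN -> PPN plus a valid bit. */
typedef struct { bool valid; uint32_t ppn; } toy_pte;

static const toy_pte page_table[NUM_VPAGES] = {
    { true,  0x10 },   /* VPN 0 -> physical page 0x10 (code) */
    { true,  0x20 },   /* VPN 1 -> physical page 0x20 (data) */
    { false, 0    },   /* VPN 2 unmapped */
    { true,  0x10 },   /* VPN 3 -> misconfigured alias of the code page */
};

/* Layer 1: translation. Returns false (page fault) for unmapped pages. */
static bool mmu_translate(uint32_t va, uint32_t *pa) {
    uint32_t vpn = va >> PAGE_SHIFT;
    if (vpn >= NUM_VPAGES || !page_table[vpn].valid)
        return false;
    *pa = (page_table[vpn].ppn << PAGE_SHIFT) | (va & ((1u << PAGE_SHIFT) - 1));
    return true;
}

/* Layer 2: hypothetical PMP stand-in. Physical page 0x10 is read-execute,
   so stores into [code_lo, code_hi) are denied whatever the mapping says. */
static bool pmp_allows_store(uint32_t pa) {
    const uint32_t code_lo = 0x10u << PAGE_SHIFT;
    const uint32_t code_hi = 0x11u << PAGE_SHIFT;
    return !(pa >= code_lo && pa < code_hi);
}

typedef enum { STORE_OK, PAGE_FAULT, ACCESS_FAULT } store_result;

/* A store must pass both layers, in order: translate first, then PMP. */
static store_result store_check(uint32_t va) {
    uint32_t pa;
    if (!mmu_translate(va, &pa))
        return PAGE_FAULT;
    if (!pmp_allows_store(pa))
        return ACCESS_FAULT;
    return STORE_OK;
}
```

Calling `store_check(0x3ABC)` returns `ACCESS_FAULT`: virtual page 3 is mapped, but it aliases the code page, so the physical-level invariant catches a mapping the MMU allowed. `store_check(0x2000)` returns `PAGE_FAULT` because translation fails before PMP is consulted.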

PMP register layout

CSA-201 implements 8 PMP entries. Each entry has two CSRs:

pmpaddr0-7 (CSR addresses 0x3B0-0x3B7): 32-bit registers holding bits 33:2 of a physical address (the byte address shifted right by 2); the interpretation depends on the addressing mode.

pmpcfg0, pmpcfg1 (CSR addresses 0x3A0, 0x3A1): 8-bit configuration per entry, packed four per CSR:

pmpcfg0 = [cfg3 | cfg2 | cfg1 | cfg0]   (each 8 bits)
pmpcfg1 = [cfg7 | cfg6 | cfg5 | cfg4]

Each 8-bit cfg field:

bit 7: L  (locked: enforced in M-mode; once set, only hart reset can clear)
bit 6: 0  (reserved)
bit 5: 0  (reserved)
bit 4: A1 (addressing mode, high bit)
bit 3: A0 (addressing mode, low bit)
bit 2: X  (execute permission)
bit 1: W  (write permission)
bit 0: R  (read permission)
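The bit layout above can be expressed as a few constants plus a packing helper. This is a host-side C sketch; the macro and function names (`PMP_R`, `pmp_cfg_byte`, `pmpcfg_pack`, `pmpcfg_get`) are hypothetical, not taken from pmp.v or any course header.

```c
#include <stdint.h>

/* Hypothetical names for the fields of one 8-bit pmpNcfg byte. */
#define PMP_R       0x01u   /* bit 0: read    */
#define PMP_W       0x02u   /* bit 1: write   */
#define PMP_X       0x04u   /* bit 2: execute */
#define PMP_A_SHIFT 3u      /* bits 4:3: addressing mode */
#define PMP_L       0x80u   /* bit 7: lock    */

enum pmp_mode { PMP_OFF = 0, PMP_TOR = 1, PMP_NA4 = 2, PMP_NAPOT = 3 };

/* Build one cfg byte; reserved bits 6:5 stay zero. */
static uint8_t pmp_cfg_byte(enum pmp_mode a, uint8_t perms, int locked) {
    uint8_t cfg = (uint8_t)(perms & (PMP_R | PMP_W | PMP_X));
    cfg |= (uint8_t)((unsigned)a << PMP_A_SHIFT);
    if (locked)
        cfg |= PMP_L;
    return cfg;
}

/* Pack four cfg bytes into one RV32 pmpcfg CSR value:
   cfg0 in bits 7:0, cfg1 in 15:8, cfg2 in 23:16, cfg3 in 31:24. */
static uint32_t pmpcfg_pack(uint8_t c0, uint8_t c1, uint8_t c2, uint8_t c3) {
    return (uint32_t)c0 | ((uint32_t)c1 << 8) |
           ((uint32_t)c2 << 16) | ((uint32_t)c3 << 24);
}

/* Extract entry i's byte from pmpcfg0 (i = 0..3) or pmpcfg1 (i = 4..7). */
static uint8_t pmpcfg_get(uint32_t csr, unsigned i) {
    return (uint8_t)(csr >> (8u * (i % 4u)));
}
```

For example, a read-execute TOR entry encodes as 0x0D and a read-write TOR entry as 0x0B; placed in entries 0 and 1, they pack to pmpcfg0 = 0x00000B0D.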

Addressing modes

OFF (A=00). Entry disabled; never matches.

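TOR (A=01). Top of range. Entry i matches if pmpaddr[i-1] << 2 <= addr < pmpaddr[i] << 2. Entry 0 uses 0 as the lower bound. If pmpaddr[i-1] >= pmpaddr[i], the entry matches nothing. TOR suits contiguous regions of any 4-byte-aligned size and placement, at the cost of sharing the previous entry's pmpaddr as its lower bound.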

NA4 (A=10). Naturally-aligned 4-byte region. Entry matches if the physical address falls within the 4 bytes starting at pmpaddr << 2. Rarely used; covers exactly one word.
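The TOR and NA4 rules translate directly into a range check. The sketch below uses hypothetical names and is not the pmp.v source: it decodes an entry into a half-open byte range [lo, hi) and tests a single byte address against it. It uses 64-bit arithmetic because on RV32 pmpaddr holds bits 33:2 of a 34-bit physical address, so `pmpaddr << 2` can exceed 32 bits.

```c
#include <stdbool.h>
#include <stdint.h>

enum pmp_mode { PMP_OFF = 0, PMP_TOR = 1, PMP_NA4 = 2, PMP_NAPOT = 3 };

/* Hypothetical helper: decode an OFF, TOR, or NA4 entry into [lo, hi).
   prev_addr is pmpaddr[i-1] (pass 0 for entry 0); addr is pmpaddr[i].
   Returns false if the entry can never match. NAPOT is not modeled here. */
static bool pmp_range(enum pmp_mode a, uint32_t prev_addr, uint32_t addr,
                      uint64_t *lo, uint64_t *hi) {
    switch (a) {
    case PMP_TOR:
        *lo = (uint64_t)prev_addr << 2;
        *hi = (uint64_t)addr << 2;
        return *lo < *hi;          /* lower >= upper: matches nothing */
    case PMP_NA4:
        *lo = (uint64_t)addr << 2;
        *hi = *lo + 4;             /* exactly one 4-byte word */
        return true;
    default:                       /* OFF never matches */
        return false;
    }
}

/* Does a single-byte access at physical address phys hit this entry? */
static bool pmp_matches(enum pmp_mode a, uint32_t prev_addr, uint32_t addr,
                        uint64_t phys) {
    uint64_t lo, hi;
    return pmp_range(a, prev_addr, addr, &lo, &hi) && phys >= lo && phys < hi;
}
```

Note the TOR upper bound is exclusive: with pmpaddr0 = 0x1000 >> 2, address 0x0FFC matches entry 0 but 0x1000 does not.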

NAPOT (A=11). Naturally-aligned power-of-2 region, 8 bytes or larger. The size is encoded by trailing 1 bits in pmpaddr: T trailing 1s means a 2^(T+3) byte range. Examples:

  • 0x...0000 (no trailing 1s): 8-byte region.
  • 0x...0001 (1 trailing 1): 16-byte region.
  • 0x...0003 (2 trailing 1s): 32-byte region.
  • 0x0FFFFFFF (28 trailing 1s): 2 GiB region (covers 0x00000000-0x7FFFFFFF).
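The trailing-ones rule can be checked mechanically. The helpers below are a hypothetical sketch: `napot_encode` builds pmpaddr from a base and a power-of-2 size (at least 8 bytes, base aligned to size, inside the 34-bit RV32 PMP space), and `napot_decode` recovers the range by counting trailing 1s. Sizes are 64-bit so that regions of 4 GiB and larger do not overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: encode a NAPOT region.
   Returns false if size is not a power of two >= 8, base is not
   aligned to size, or the region leaves the 34-bit physical space. */
static bool napot_encode(uint64_t base, uint64_t size, uint32_t *pmpaddr) {
    if (size < 8 || (size & (size - 1)) != 0)
        return false;
    if ((base & (size - 1)) != 0)
        return false;
    if (base + size > (1ull << 34))
        return false;
    /* base >> 2 supplies the high bits; (size / 8 - 1) supplies
       T = log2(size) - 3 trailing 1s. Alignment guarantees bit T is 0. */
    *pmpaddr = (uint32_t)((base >> 2) | ((size >> 3) - 1));
    return true;
}

/* Hypothetical helper: count trailing 1s T, so size = 2^(T+3). */
static void napot_decode(uint32_t pmpaddr, uint64_t *base, uint64_t *size) {
    unsigned t = 0;
    while (t < 32 && ((pmpaddr >> t) & 1u))
        t++;
    *size = 1ull << (t + 3);
    uint64_t mask = (1ull << t) - 1;            /* the trailing 1s */
    *base = ((uint64_t)pmpaddr & ~mask) << 2;   /* back to a byte address */
}
```

Encoding base 0 with size 0x80000000 yields 0x0FFFFFFF. Decoding 0x1FFFFFFF, which has one more trailing 1, yields a 2^32-byte (4 GiB) region starting at 0.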

The W^X policy

W^X (write XOR execute): a page (or, under PMP, a physical region) may be writable or executable, but not both. This policy blocks the classic attack in which shellcode is written into a data buffer and control is then transferred to it through a corrupted return address.

PMP implementation of W^X: configure two regions for user-mode software, one for code and one for data.

Code segment (read + execute, not writable):

pmpcfg0 entry 0: R=1, W=0, X=1, A=TOR, L=0
pmpaddr0 = <code_end_physical_addr> >> 2

(entry 0's lower bound is implicitly zero, so this covers 0x00000000 up to, but not including, code_end)

Data segment (read + write, not executable):

pmpcfg0 entry 1: R=1, W=1, X=0, A=TOR, L=0
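pmpaddr1 = <data_end_physical_addr> >> 2

(the lower bound is pmpaddr0 << 2 = code_end, so this covers code_end up to, but not including, data_end)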

With these two entries, a user-mode store to the code segment raises a store/AMO access fault (mcause=7), and a user-mode fetch from the data segment raises an instruction access fault (mcause=1).
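Plugging concrete numbers into the two-entry layout makes the TOR chaining visible. The addresses below (code ending at 0x00010000, data ending at 0x00020000) are hypothetical, chosen only for illustration; substitute the values from your own linker map. The sketch computes the three CSR values an M-mode setup routine would then write with `csrw` before dropping to U-mode.

```c
#include <stdint.h>

/* Hypothetical physical layout: [0, CODE_END) is code,
   [CODE_END, DATA_END) is data. */
#define CODE_END 0x00010000u
#define DATA_END 0x00020000u

/* cfg bytes with A=TOR (01) in bits 4:3 and L=0. */
#define CFG_TOR_RX 0x0Du   /* A=01, X=1, W=0, R=1 */
#define CFG_TOR_RW 0x0Bu   /* A=01, X=0, W=1, R=1 */

typedef struct {
    uint32_t pmpaddr0;   /* top of code; entry 0's bottom is implicitly 0 */
    uint32_t pmpaddr1;   /* top of data; entry 1's bottom is pmpaddr0 << 2 */
    uint32_t pmpcfg0;    /* cfg0 in bits 7:0, cfg1 in bits 15:8 */
} wx_config;

/* Hypothetical helper: compute the W^X CSR values for a given layout. */
static wx_config wx_compute(uint32_t code_end, uint32_t data_end) {
    wx_config c;
    c.pmpaddr0 = code_end >> 2;
    c.pmpaddr1 = data_end >> 2;
    c.pmpcfg0  = CFG_TOR_RX | (CFG_TOR_RW << 8);
    return c;
}
```

With the example layout, pmpaddr0 = 0x4000, pmpaddr1 = 0x8000, and pmpcfg0 = 0x00000B0D. The single value pmpaddr0 serves both as entry 0's top and entry 1's bottom, which is why the two regions abut with no gap.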

Priority cascade

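The reference implementation (pmp.v) evaluates entries from 0 to 7 in priority order. The first matching entry decides; entries after the first match are not consulted. For a U-mode access, the matching entry's R/W/X bits apply. Per the ISA manual, a matching entry with L=0 does not restrict M-mode; its R/W/X bits bind M-mode only when L=1. If no entry matches: M-mode is permitted (default allow); U-mode is denied (default deny).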
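The first-match rule and the privilege-dependent defaults fit in one loop. This is a hypothetical software model, not a transcription of pmp.v: entries are pre-decoded into byte ranges [lo, hi), only single-byte accesses are checked, and the M-mode rule follows the ISA manual (an unlocked matching entry does not restrict M-mode). The mcause values are the standard ones for instruction, load, and store/AMO access faults.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { ACC_FETCH, ACC_LOAD, ACC_STORE } access_t;
typedef enum { PRIV_U = 0, PRIV_M = 3 } priv_t;

/* Hypothetical pre-decoded entry; enabled=false models A=OFF. */
typedef struct {
    bool enabled;
    uint64_t lo, hi;     /* half-open byte range */
    bool r, w, x, l;
} pmp_entry;

#define PMP_ENTRIES 8

/* Returns true if the access is permitted. */
static bool pmp_check(const pmp_entry e[PMP_ENTRIES], uint64_t addr,
                      access_t acc, priv_t priv) {
    for (int i = 0; i < PMP_ENTRIES; i++) {      /* entry 0 has priority */
        if (!e[i].enabled || addr < e[i].lo || addr >= e[i].hi)
            continue;
        /* First match decides; later entries are never consulted. */
        if (priv == PRIV_M && !e[i].l)
            return true;                         /* unlocked: M-mode exempt */
        switch (acc) {
        case ACC_FETCH: return e[i].x;
        case ACC_LOAD:  return e[i].r;
        case ACC_STORE: return e[i].w;
        }
    }
    return priv == PRIV_M;                       /* no match: M allow, U deny */
}

/* mcause of the access fault raised when pmp_check returns false. */
static unsigned pmp_fault_mcause(access_t acc) {
    switch (acc) {
    case ACC_FETCH: return 1;   /* instruction access fault */
    case ACC_LOAD:  return 5;   /* load access fault */
    default:        return 7;   /* store/AMO access fault */
    }
}
```

Loaded with the W^X pair (entry 0 = [0, code_end) RX, entry 1 = [code_end, data_end) RW), a U-mode store into code and a U-mode fetch from data are both refused, while the same store from M-mode succeeds until entry 0 is locked.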

Three PMP unit instances exist in cpu.v: u_pmp_fetch (checks instruction fetches), u_pmp_load (checks loads), u_pmp_store (checks stores). All three are purely combinational and are evaluated in parallel.

The L bit and the locked footgun

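When L=1, the PMP entry is enforced even in M-mode, and its configuration and address cannot be modified until the hart is reset. Security-critical systems use L=1 to impose restrictions that even later M-mode software cannot lift; an unlocked entry already constrains every less-privileged mode, including a hypervisor. For CSA-201, do not set L=1 during development: software cannot clear a misconfigured locked entry, and only a hart reset removes it.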
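The "cannot be modified until reset" half of the L bit can be modeled as write-ignore on the CSR side. The sketch below is hypothetical (names such as `pmp_state` and `csr_write_pmpaddr` are illustrative) and takes a per-entry view; a real pmpcfg0 write updates four fields at once, each subject to its own lock. It follows the ISA manual's rules: writes to a locked entry's cfg byte and pmpaddr are ignored, and if entry i is locked in TOR mode, writes to pmpaddr[i-1] are ignored too, because that register is entry i's lower bound.

```c
#include <stdbool.h>
#include <stdint.h>

#define PMP_ENTRIES 8
#define CFG_L       0x80u
#define CFG_A_MASK  0x18u   /* bits 4:3 */
#define CFG_A_TOR   0x08u   /* A = 01 */

/* Hypothetical register-file model. */
typedef struct {
    uint8_t  cfg[PMP_ENTRIES];
    uint32_t addr[PMP_ENTRIES];
} pmp_state;

static bool entry_locked(const pmp_state *s, unsigned i) {
    return (s->cfg[i] & CFG_L) != 0;
}

/* Write entry i's cfg byte; ignored once that entry is locked. */
static void csr_write_cfg(pmp_state *s, unsigned i, uint8_t value) {
    if (i < PMP_ENTRIES && !entry_locked(s, i))
        s->cfg[i] = value;
}

/* Write pmpaddr[i]; ignored if entry i is locked, or if entry i+1 is
   locked in TOR mode (pmpaddr[i] is then its lower bound). */
static void csr_write_pmpaddr(pmp_state *s, unsigned i, uint32_t value) {
    if (i >= PMP_ENTRIES || entry_locked(s, i))
        return;
    if (i + 1 < PMP_ENTRIES && entry_locked(s, i + 1) &&
        (s->cfg[i + 1] & CFG_A_MASK) == CFG_A_TOR)
        return;
    s->addr[i] = value;
}

/* Only a hart reset clears the lock (modeled here as zeroing everything). */
static void hart_reset(pmp_state *s) {
    *s = (pmp_state){0};
}
```

Once entry 0 holds 0x8D (L=1, TOR, R and X), any later write to its cfg byte or pmpaddr0 is silently dropped, which is exactly why a mistaken lock during bring-up can only be undone by a reset.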

Architecture Comparison Sidebar: Privilege rings and memory protection

Architecture | Rings / modes | W^X mechanism | Notes
--- | --- | --- | ---
Linux on x86_64 (current) | Ring 0 (kernel) / Ring 3 (user) | NX bit in PTE + x86_64 SMEP/SMAP | NX (no-execute) bit per PTE; SMEP prevents the kernel from executing user pages
Linux on AArch64 | EL0 (user) / EL1 (kernel) | PXN/UXN bits in page table descriptors | PXN = privileged execute never; UXN = user execute never
Windows on x86_64 | Ring 0 (kernel) / Ring 3 (user) | DEP (Data Execution Prevention) via NX bit | Same x86_64 NX bit; exposed as "DEP" to end users
RISC-V with PMP | M/S/U | PMP R/W/X per region | Per-region, not per-page; PMP sits below the MMU; both can be active
CSA-101 Virtus OS v1 | M only (flat) | None (§12.11 omission) | Closed this week

The key distinction: PMP is at the physical address level (below the MMU), while the x86_64 NX bit and AArch64 PXN/UXN bits are at the virtual address level (in the page table). For a fully-featured system, both are used: the MMU controls per-process virtual memory layout; PMP provides physical invariants that hold across all virtual mappings.


Lab exercises

See labs/lab-8-pmp-wx.md for the full specification.

Lab 8.1: PMP-defended stack-smash. You will configure PMP W^X enforcement, then demonstrate that the Ch 12 §12.11 exploit (a write to a code page followed by a branch to the written address) is now trapped before either step completes.

The lab has three parts: (A) instantiate pmp.v from the reference implementation and configure two TOR entries (code=RX, data=RW); (B) write a test program that attempts a write to the code segment and verify mcause=7 fires before the write reaches DRAM; (C) write a test program that attempts to branch to a data-segment address and verify mcause=1 fires before the fetch.


Independent practice

  1. NAPOT encoding: compute the pmpaddr value for a NAPOT region covering physical addresses 0x20000000 to 0x3FFFFFFF (512 MiB). Show the calculation: size in bytes → number of trailing 1s → pmpaddr encoding.

  2. An OS configures PMP entry 0 as: TOR, R=1, W=0, X=0, L=0, pmpaddr0 = 0x3FFFFFFF. What physical addresses does this entry cover? What access types are permitted in U-mode? In M-mode (since L=0)?

  3. A kernel misconfigures its PMP entries: the code segment and the stack are both covered by the same NAPOT entry with R=1, W=1, X=1. Why is W^X violated? Write the corrected PMP configuration.

  4. Toolchain Diary entry: SignalTap II Logic Analyzer. Record how to add a signal to a SignalTap instance in Quartus, how to trigger on a condition (e.g., "fire when pmp_fault is asserted"), and how to read the captured waveform.


Reflection prompts

  1. The L bit makes a PMP entry locked: M-mode code cannot modify it until reset. What threat model justifies locking a PMP entry? Name a real-world product where this feature is used (hint: secure boot on embedded devices).

  2. PMP entries are evaluated in priority order (entry 0 first). What happens if entry 0 covers the entire physical address space as RWX and entry 1 covers the stack as R-only? Does entry 1 have any effect? Why does this matter for how you structure the entry table?

  3. The reference implementation instantiates three PMP units (fetch, load, store) separately. Why not use a single shared PMP unit? Under what microarchitectural condition would a single shared unit be sufficient?


What's next

Modules 9-11 are OS software modules. The hardware is in place. Module 9 adds stack canaries and control-flow integrity (CFI) -- compiler-emitted and OS-supported defenses that build on the PMP foundation from this week. Module 10 adds tracing garbage collection to Virtus OS v2's Memory service. Module 11 adds a preemptive scheduler.