PMP prevents writes to code pages. A canary detects an overwrite of the saved return address before the corrupted value is loaded into PC. CFI stops the ret from going anywhere the compiler didn't intend. Together they form a defense in depth.
Reading
Required. Bryant and O'Hallaron, CSAPP, Chapter 3, section 3.10 ("Combining Control and Data in Machine-Level Programs"), subsections 3.10.3 (memory referencing bugs) and 3.10.4 (thwarting buffer overflow attacks). This is the canonical introduction to stack canaries, address randomization, and non-executable stacks. Read 3.10.3-3.10.4 fully before the lab.
Required. Petzold, CODE, Ch 22 ("The Operating System"). Return to the section on what the OS protects and at what level. Petzold establishes that the OS's core contract is to prevent one process from interfering with another; the compiler-side defenses this week extend that contract to prevent a process from interfering with itself (specifically, with its own control flow).
Recommended. The RISC-V Zicfilp and Zicfiss extension specs (available at github.com/riscv/riscv-zicfi). Skim the overview sections; you do not need to implement these, but understanding the hardware model informs the software CFI you will implement.
Lecture: Stack Canaries and CFI
The stack-smash attack
A stack-smash begins with a buffer overflow. The attacker provides more data than a fixed-size stack buffer can hold. The excess data overwrites adjacent stack memory, including (crucially) the saved return address. When the function returns, the CPU loads the attacker's chosen address into PC and begins executing attacker-controlled code.
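The overwrite can be modeled in a few lines of Python. This is a toy sketch, not the Virtus OS frame layout: the 16-byte buffer, the 4-byte saved return address placed directly above it, and the names make_frame, unchecked_copy, and saved_ra are all assumptions made for illustration.

```python
import struct

BUF_SIZE = 16            # hypothetical fixed-size local buffer
RA_OFFSET = BUF_SIZE     # saved return address sits directly above the buffer


def make_frame(return_address):
    """Build a toy frame: [ buffer (16 bytes) | saved ra (4 bytes) ]."""
    frame = bytearray(BUF_SIZE + 4)
    struct.pack_into("<I", frame, RA_OFFSET, return_address)
    return frame


def unchecked_copy(frame, data):
    """Copy data to the start of the frame with no bounds check on the buffer."""
    frame[0:len(data)] = data


def saved_ra(frame):
    """Read back the saved return address from the frame."""
    return struct.unpack_from("<I", frame, RA_OFFSET)[0]


frame = make_frame(0x80001234)
# 16 filler bytes fill the buffer; the next 4 bytes land on the saved ra.
payload = b"A" * BUF_SIZE + struct.pack("<I", 0xDEADBEEF)
unchecked_copy(frame, payload)
print(hex(saved_ra(frame)))  # 0xdeadbeef
```

The copy routine never looks at BUF_SIZE, so the four bytes past the buffer replace the return address that the function would later load into PC.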
Virtus OS v1 left this exploit open in Ch 12 §12.11: Math.multiply uses a local buffer for its iterative shift-add, and a caller that passes a specially crafted pair of arguments could overflow it. Module 8's PMP W^X prevents shellcode written to the data stack from executing. But an attacker can also redirect control to existing code (ROP, Return-Oriented Programming) without writing any new instructions.
Stack canaries
A stack canary is a secret value placed on the stack between the local variables and the saved return address at function entry. At function exit, before the ret, the compiler emits code to verify that the canary is unchanged. If it has changed, something has written past the locals and the return address may have been clobbered, so the process is terminated before the ret executes.
Compiler-emitted canary prologue (in every function with local buffers):
# function entry -- canary prologue
lw t0, canary_location # load global canary value (set at OS boot)
sw t0, canary_offset(fp) # store between locals and saved ra
Compiler-emitted canary epilogue (before the ret in every function that has a canary prologue):
# function exit -- canary check
lw t0, canary_offset(fp) # load canary from stack
lw t1, canary_location # load expected value
bne t0, t1, canary_failure # if mismatch: fault
# ... restore registers, ret ...
canary_failure:
li a7, 93 # SYS_EXIT
li a0, -1 # exit code -1
ecall
The canary value is set at OS boot from a random seed (or a fixed value in the CSA-201 prototype). A contiguous overflow from a local buffer cannot reach the return address without first overwriting the canary, because the canary sits between the locals and the saved ra on the stack.
Canary placement diagram:
+--------------------+ <- lower addresses (stack grows toward this end)
| ... locals ... |
|--------------------|
| canary value | <- canary_offset(fp)
|--------------------|
| saved ra | <- return address (target of smash)
|--------------------|
| saved fp |
+--------------------+ <- old sp (caller's frame)
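The prologue and epilogue above can be simulated on a byte array. This is a minimal sketch under stated assumptions: the 16-byte buffer size and the names prologue, epilogue, and GLOBAL_CANARY are hypothetical, and GLOBAL_CANARY stands in for the word at canary_location. Forcing one canary byte to zero is borrowed from some real implementations (it stops string-copy overflows at the canary), and it also guarantees the filler bytes below can never match the canary by chance.

```python
import secrets
import struct

BUF_SIZE = 16                      # hypothetical local buffer size
CANARY_OFFSET = BUF_SIZE           # canary sits directly above the locals
RA_OFFSET = CANARY_OFFSET + 4      # saved ra sits directly above the canary

# Stands in for the value at canary_location; low byte forced to zero.
GLOBAL_CANARY = secrets.randbits(32) & 0xFFFFFF00


def prologue(return_address):
    """Build the frame and store the canary between locals and saved ra."""
    frame = bytearray(RA_OFFSET + 4)
    struct.pack_into("<I", frame, CANARY_OFFSET, GLOBAL_CANARY)
    struct.pack_into("<I", frame, RA_OFFSET, return_address)
    return frame


def epilogue(frame):
    """Return the saved ra, or raise if the canary changed (models canary_failure)."""
    (canary,) = struct.unpack_from("<I", frame, CANARY_OFFSET)
    if canary != GLOBAL_CANARY:
        raise RuntimeError("canary mismatch: process terminated")
    return struct.unpack_from("<I", frame, RA_OFFSET)[0]


frame = prologue(0x80001234)
# A contiguous overflow must cross the canary slot to reach the saved ra.
overflow = b"A" * (BUF_SIZE + 4) + struct.pack("<I", 0xDEADBEEF)
frame[0:len(overflow)] = overflow
try:
    epilogue(frame)
except RuntimeError as err:
    print(err)  # canary mismatch: process terminated
```

The forged return address is present in the frame, but epilogue raises before it is ever returned, which mirrors the bne to canary_failure running ahead of the ret.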
How canaries interact with PMP
With both defenses active, an attacker who overflows the local buffer overwrites the canary on the way to the return address. The canary check fires at function exit, before the ret executes, and the process is killed; the modified return address is never loaded into PC. PMP W^X provides the backup: even if the canary check is bypassed (for example, by a write that skips over the canary, or by rewriting the canary with a leaked value), shellcode placed on the stack cannot execute because the data stack region is marked non-executable.
Return-Oriented Programming (ROP)
PMP W^X blocks shellcode on the stack. A more sophisticated attacker does not write new code: they chain together "gadgets" -- short sequences of existing instructions ending with a ret -- to perform arbitrary computation. The return address is overwritten to point to a gadget; that gadget's ret goes to the next gadget; and so on.
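The chaining mechanism can be sketched as a small interpreter. Everything here is hypothetical: the gadget addresses, the gadget bodies, and the run_chain helper are illustrative only and do not correspond to code in Virtus OS. Each modeled gadget is assumed to load its operand and the next return address from the attacker-controlled stack before executing ret.

```python
def run_chain(gadgets, stack):
    """Follow a chain of return addresses, as repeated ret instructions would.

    gadgets maps an address to a function(regs, stack) modeling the gadget body.
    stack is the attacker-controlled list of words above the smashed frame.
    """
    regs = {}
    stack = list(stack)
    visited = []
    while stack:
        target = stack.pop(0)          # ret: the next stack word becomes PC
        if target not in gadgets:
            break                      # jumped outside the modeled code
        visited.append(target)
        gadgets[target](regs, stack)
    return regs, visited


def load_a0(regs, stack):
    """Models a gadget that loads a0 from the stack, then returns."""
    regs["a0"] = stack.pop(0)


def load_a7(regs, stack):
    """Models a gadget that loads a7 from the stack, then returns."""
    regs["a7"] = stack.pop(0)


def do_ecall(regs, stack):
    """Models a gadget containing ecall; records (syscall number, argument)."""
    regs["ecall"] = (regs["a7"], regs["a0"])
    stack.clear()                      # the chain ends once the syscall is made


GADGETS = {0x1000: load_a0, 0x2000: load_a7, 0x3000: do_ecall}
# Words the attacker writes: gadget address, operand, gadget address, operand, ...
chain = [0x1000, 0, 0x2000, 93, 0x3000]
regs, visited = run_chain(GADGETS, chain)
print(regs["ecall"])  # (93, 0)
```

No new instructions are written anywhere: the attacker supplies only data (addresses and operands), and existing code does the work. Here the chain ends in SYS_EXIT (93) with exit code 0.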
Canaries alone do not prevent ROP. A contiguous overflow must still write over the canary, but if the canary value is leaked (via a read vulnerability), the attacker can write the correct value back into the canary slot and overwrite the return address without triggering the check.
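A minimal simulation of the leak-then-overflow sequence, assuming a hypothetical frame of a 16-byte buffer, a 4-byte canary, and a 4-byte saved ra. The helper names (new_frame, leak_canary, canary_check_passes) are illustrative and not part of the lab code.

```python
import secrets
import struct

BUF_SIZE = 16                      # hypothetical local buffer size
CANARY_OFFSET = BUF_SIZE
RA_OFFSET = CANARY_OFFSET + 4
GLOBAL_CANARY = secrets.randbits(32)   # stands in for the word at canary_location


def new_frame(return_address):
    """Frame layout: [ buffer | canary | saved ra ]."""
    frame = bytearray(RA_OFFSET + 4)
    struct.pack_into("<II", frame, CANARY_OFFSET, GLOBAL_CANARY, return_address)
    return frame


def leak_canary(frame):
    """Models a memory-read vulnerability that discloses the canary slot."""
    return struct.unpack_from("<I", frame, CANARY_OFFSET)[0]


def canary_check_passes(frame):
    """Models the epilogue comparison against the global canary."""
    return struct.unpack_from("<I", frame, CANARY_OFFSET)[0] == GLOBAL_CANARY


frame = new_frame(0x80001234)
leaked = leak_canary(frame)
# The payload writes the leaked canary back in place, then the forged ra.
payload = b"A" * BUF_SIZE + struct.pack("<II", leaked, 0x80004000)
frame[0:len(payload)] = payload

print(canary_check_passes(frame))                          # True
print(hex(struct.unpack_from("<I", frame, RA_OFFSET)[0]))  # 0x80004000
```

The check passes because the canary slot holds the value it expects, even though the return address next to it has been replaced.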
Control-Flow Integrity (CFI)
CFI restricts which addresses a ret can legally jump to. The canonical implementation uses a shadow stack: a separate hardware-protected stack that mirrors only return addresses. At function call: the shadow stack receives a copy of the return address. At return: the CPU compares the return address in the main stack against the top of the shadow stack; if they differ, it faults.
Software CFI shadow stack for CSA-201. The shadow stack lives in a PMP-protected region (R only, not W, not X -- the OS writes to it only via M-mode shadow-stack maintenance code). The compiler emits:
# function call -- shadow push
csrr t0, shadowsp # CSR holding shadow stack pointer
addi t0, t0, -4 # reserve one 4-byte slot
csrw shadowsp, t0
sw ra, 0(t0) # write ra to shadow stack (performed via OS syscall or in M-mode)
# function return -- shadow check (assumes ra has been reloaded from the main stack)
csrr t0, shadowsp
lw t1, 0(t0) # read expected return address
lw t2, canary_offset(fp) # also check canary
lw t3, canary_location # load expected canary value
bne t1, ra, cfi_fault # if shadow mismatch: fault
bne t2, t3, cfi_fault # if canary mismatch: fault
addi t0, t0, 4 # pop the shadow entry
csrw shadowsp, t0
The shadow stack physical address is visible only to M-mode; user code cannot forge writes to it. An attacker who overwrites the main-stack return address cannot simultaneously overwrite the shadow stack (it is in a separate PMP-protected region).
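The push and check logic can be modeled independently of the PMP setup. In this sketch the ShadowStack class and CfiFault exception are hypothetical names; the private list stands in for the PMP-protected region that the modeled attacker cannot reach.

```python
class CfiFault(Exception):
    """Raised where the assembly would branch to cfi_fault."""


class ShadowStack:
    """Toy shadow stack: holds return addresses only."""

    def __init__(self):
        self._entries = []   # stands in for the PMP-protected region

    def push(self, ra):
        """Function call: record a copy of the return address."""
        self._entries.append(ra)

    def check_and_pop(self, ra_from_main_stack):
        """Function return: compare first, pop only if the addresses match."""
        expected = self._entries[-1]
        if expected != ra_from_main_stack:
            raise CfiFault(f"expected {expected:#x}, got {ra_from_main_stack:#x}")
        self._entries.pop()
        return expected


shadow = ShadowStack()
main_stack = []

# Call: the return address goes to both stacks.
main_stack.append(0x80001234)
shadow.push(0x80001234)

# Attack: the main-stack copy is overwritten; the shadow copy is out of reach.
main_stack[-1] = 0x80004000

try:
    shadow.check_and_pop(main_stack.pop())
except CfiFault as fault:
    print("CFI fault:", fault)  # CFI fault: expected 0x80001234, got 0x80004000
```

The comparison does not depend on any secret, so leaking the canary (or the return address itself) does not help the attacker pass it.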
Zicfilp/Zicfiss. The RISC-V Zicfilp extension adds a hardware landing-pad instruction (lpad) that must appear at the target of every indirect branch. Zicfiss adds a hardware shadow stack (the sspush and sspopchk instructions, with the shadow stack pointer held in the ssp CSR). These are forward-looking extensions; CSA-201 implements software CFI using PMP-protected memory, which follows the same security model at a higher performance cost.
Architecture Comparison Sidebar: CFI across architectures
| Architecture | W^X mechanism | Stack canary | Shadow stack |
|---|---|---|---|
| x86_64 (Intel CET) | NX bit in PTE | gcc -fstack-protector | SHSTK (hardware shadow stack; Intel CET 2020) |
| AArch64 (ARMv8.3 PAC) | PXN/UXN in PTE | gcc -fstack-protector | Pointer authentication (PA signs return addresses) |
| RISC-V (Zicfilp + Zicfiss) | PMP W^X (this module) | gcc -fstack-protector | Zicfiss shadow stack (hardware; extension pending ratification) |
| CSA-201 Virtus OS v2 | PMP W^X (Module 8) | Compiler-emitted canary | PMP-protected software shadow stack |
Intel CET's hardware shadow stack (Shadow Stack Pointer register, RSTORSSP/SAVEPREVSSP/SETSSBSY instructions) is the x86_64 equivalent of what you are building this week in software. It first shipped in Intel Tiger Lake (2020) and has been enabled by glibc on Linux since 2024.
AArch64's Pointer Authentication (ARMv8.3) takes a different approach: it signs the return address with a cryptographic MAC before storing it and verifies the signature at return. A forged return address cannot carry a valid signature, because the signature cannot be computed without the key. Unlike a shadow stack, this needs no separate protected memory region, and unlike a canary, reading the stored value does not let the attacker sign a different address.
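The sign-then-verify idea can be sketched with a standard-library MAC. This is an analogy only: HMAC-SHA256, the 8-byte tag, and the names KEY, sign, and authenticate are assumptions for illustration. Real pointer authentication uses a hardware cipher and packs a much shorter tag into unused upper bits of the pointer itself. The stack pointer is mixed into the MAC as context, so that a signature is tied to the frame it was created in.

```python
import hashlib
import hmac
import secrets

# Stands in for the key held by hardware; user code never sees it.
KEY = secrets.token_bytes(16)


def sign(return_address, sp):
    """Return (return_address, tag), where tag is a MAC over address and sp."""
    msg = return_address.to_bytes(8, "little") + sp.to_bytes(8, "little")
    tag = hmac.new(KEY, msg, hashlib.sha256).digest()[:8]
    return return_address, tag


def authenticate(return_address, tag, sp):
    """Return the address if the tag verifies; otherwise raise."""
    _, expected = sign(return_address, sp)
    if not hmac.compare_digest(tag, expected):
        raise ValueError("pointer authentication failed")
    return return_address


ra, tag = sign(0x400123, 0x7FFFF000)            # function entry
print(hex(authenticate(ra, tag, 0x7FFFF000)))   # 0x400123

try:
    authenticate(0x400999, tag, 0x7FFFF000)     # attacker swapped the address
except ValueError as err:
    print(err)  # pointer authentication failed
```

An attacker who reads both ra and tag from the stack still cannot produce a tag for a different address without KEY.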
Lab exercises
See labs/lab-9-stack-canaries-cfi.md for the full specification.
Lab 9.1: Stack canary detects return-address overwrite; CFI shadow stack catches ROP. You will add canary-emitting prologue/epilogue code to your compiler and implement a software CFI shadow stack.
Part A: Add canary prologue/epilogue to all functions in your CSA-201 compiler. Verify that a test program with a simulated buffer overflow (manual stack write that corrupts the canary position) triggers the canary failure handler before the corrupted ret executes.
Part B: Set up a PMP-protected shadow stack in a physical memory region that U-mode cannot write. Implement shadow-push at function call and shadow-pop-and-compare at function return. Verify that a forged return address (one that does not match the top of the shadow stack, even when the canary is left intact) triggers a CFI fault.
Independent practice
- The canary check uses a global canary value loaded from canary_location. What happens if an attacker reads the canary value via a memory-read vulnerability before exploiting the overflow? How does gcc's -fstack-protector-strong mitigate this? (Look up the __stack_chk_guard variable and its initialization.)
- ROP chains require gadgets that end with a ret. Write a 3-gadget ROP chain in RISC-V assembly that, starting from an overwritten return address, calls SYS_WRITE via ecall. What registers must each gadget set up?
- Toolchain Diary entry: addr2line. Record how to use riscv32-unknown-elf-addr2line to map a faulting PC (from mcause + mepc) to a source line number.
- A shadow stack prevents return-address forgery but does not prevent call-site spoofing: an attacker who controls a function pointer can redirect an indirect call to an unintended target. What mechanism in Zicfilp addresses this? (Landing-pad instructions.)
Reflection prompts
- Canaries are placed between locals and the saved return address. If a function has no local variables (it is a leaf function that only saves ra and uses callee-saved registers), should the compiler still emit a canary? What is the overhead, and what threat does it protect against?
- The shadow stack in CSA-201 is write-protected via PMP with M-mode maintenance. A production OS would use a dedicated supervisor-managed shadow stack CSR (Zicfiss). What are the two main advantages of hardware shadow stack support (Zicfiss) over the PMP-based software implementation?
- Lab 9.1 implements canaries and a shadow stack. An attacker has the following capabilities: (a) can read any U-mode memory address; (b) can write to one contiguous buffer. Describe the attack that bypasses canaries only, and explain why the shadow stack stops it.
What's next
Module 10 adds tracing garbage collection to the Memory service. The allocator in Virtus OS v1 was manual-only (Ch 12 §12.5.4 noted the omission explicitly). The tracing GC adds automatic reclamation: after a GC cycle, unreachable objects are freed. Module 10 measures the cycle cost of GC relative to the manual allocator. Module 11 then adds the preemptive scheduler that the GC's stop-the-world phase depends on.