
Lab 9.1: Stack Canaries and CFI Shadow Stack


Total points: 25
Estimated time: 3.5 hours
Prerequisites: Labs 7-8 complete; PMP W^X policy active; compiler pipeline from Labs 3-5


Overview

This lab adds compiler-emitted stack canaries and a PMP-protected software CFI shadow stack to Virtus OS v2. You will verify that both defenses are active on DE10-Nano by demonstrating a simulated exploit that is blocked at each layer.


Part A: Compiler-emitted stack canaries (10 pts)

A1: Canary global and initialization (2 pts)

Add a global variable uint32_t __stack_chk_guard to the Virtus OS v2 kernel. Initialize it at boot with a value that is:

  • Non-zero (zero canaries are trivially bypassable)
  • Not a valid instruction encoding (to prevent partial overwrites that happen to create a valid-looking canary)
  • For this lab: use the fixed value 0xDEADC0DE (not random; the lab verifies detection, not randomness)

For production: replace the fixed value with one drawn from a hardware entropy source or an LFSR seeded at boot.
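As an illustration of the production note, the sketch below stretches a boot-time seed into a non-zero canary with a 32-bit Galois LFSR. The names lfsr32_step and make_canary and the tap mask are assumptions for this example, not part of the Virtus OS v2 sources, and an LFSR alone is not cryptographically strong.

```c
#include <stdint.h>

/* One step of a 32-bit Galois LFSR with tap mask 0x80200003
 * (taps 32, 22, 2, 1). The state must be non-zero: zero is a fixed point. */
static uint32_t lfsr32_step(uint32_t state)
{
    uint32_t lsb = state & 1u;
    state >>= 1;
    if (lsb)
        state ^= 0x80200003u;
    return state;
}

/* Hypothetical helper: derive a canary from a boot-time seed.
 * A zero seed is replaced so the LFSR cannot lock up at zero. */
static uint32_t make_canary(uint32_t seed)
{
    uint32_t s = seed ? seed : 0xDEADC0DEu;
    for (int i = 0; i < 32; i++)
        s = lfsr32_step(s);
    return s;  /* stays non-zero: a non-zero state never steps to zero */
}
```

At boot the kernel would assign the result to __stack_chk_guard before any canary-protected function runs.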

A2: Canary prologue/epilogue in the compiler (6 pts)

Modify your CSA-201 compiler (from Labs 3-5) to emit canary code in every function that has a local buffer (any function with stack-allocated arrays or variables of size > 8 bytes).

Prologue (after function entry, before locals are used):

la      t0, __stack_chk_guard
lw      t0, 0(t0)               # load canary value
sw      t0, CANARY_OFFSET(fp)   # store below saved ra

Epilogue (before function return, after locals are done):

la      t0, __stack_chk_guard
lw      t0, 0(t0)               # load expected canary
lw      t1, CANARY_OFFSET(fp)   # load canary from stack
bne     t0, t1, __stack_chk_fail  # mismatch: branch to fail handler (does not return)

Where CANARY_OFFSET is the word just below the saved ra in the stack frame:

frame layout (grows downward):
[saved fp] [saved ra] [canary] [locals...]

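CANARY_OFFSET is 4 bytes below the saved ra slot. Taking fp_offset as the distance from fp down to the saved fp slot (saved fp at -fp_offset(fp)), saved ra sits at -(fp_offset + 4)(fp) and CANARY_OFFSET = -(fp_offset + 8).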
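The offset arithmetic can be checked with a few helper functions. This is a sketch that assumes RV32 (4-byte slots) and reads fp_offset as the distance from fp down to the saved fp slot; the function names are hypothetical.

```c
#include <stdint.h>

/* RV32: every saved slot is one 4-byte word. */
enum { SLOT_BYTES = 4 };

/* All results are byte offsets relative to fp (negative = below fp). */
static int32_t saved_fp_offset(int32_t fp_offset)
{
    return -fp_offset;
}

static int32_t saved_ra_offset(int32_t fp_offset)
{
    return -(fp_offset + SLOT_BYTES);
}

/* The canary sits one word below saved ra. */
static int32_t canary_offset(int32_t fp_offset)
{
    return -(fp_offset + 2 * SLOT_BYTES);
}
```

With fp_offset = 4, for example, the canary lands at -12(fp), one word below the saved ra at -8(fp).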

Add __stack_chk_fail as a kernel function that logs "stack smash detected" to the OLED and then calls SYS_EXIT(-1). Log first, because SYS_EXIT does not return.

A3: Verify canary detection (2 pts)

Write a test program with a char buf[16] local buffer. Manually corrupt the canary word (write a different value to CANARY_OFFSET(fp) using inline assembly) before the function returns. Verify that __stack_chk_fail is called and the process is killed.
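Before running on hardware, the expected behavior can be modeled on a host machine. The sketch below is an assumption-laden model of the A2 frame (struct frame, frame_enter, and frame_canary_smashed are hypothetical names); it mirrors what the emitted prologue and epilogue do but is not the compiler output itself.

```c
#include <stdint.h>
#include <string.h>

#define STACK_CHK_GUARD 0xDEADC0DEu   /* fixed lab canary from A1 */

/* Host-side model of the frame, lowest address first. */
struct frame {
    char     buf[16];    /* local buffer */
    uint32_t canary;     /* word just below saved ra */
    uint32_t saved_ra;
    uint32_t saved_fp;
};

/* Models the prologue: save ra/fp and store the canary. */
static void frame_enter(struct frame *f, uint32_t ra, uint32_t fp)
{
    memset(f->buf, 0, sizeof f->buf);
    f->canary   = STACK_CHK_GUARD;
    f->saved_ra = ra;
    f->saved_fp = fp;
}

/* Models the epilogue check: returns 1 when the real code
 * would branch to __stack_chk_fail, 0 otherwise. */
static int frame_canary_smashed(const struct frame *f)
{
    return f->canary != STACK_CHK_GUARD;
}
```

Overwriting f.canary with any other value after frame_enter corresponds to the inline-assembly corruption in the test program.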


Part B: CFI shadow stack (10 pts)

B1: PMP-protect a shadow stack region (3 pts)

Reserve a physical memory region for the shadow stack: 4 KiB starting at physical address 0x90000000 (or a suitable address in your DE10-Nano DDR3 layout). Configure a PMP entry to make this region accessible only to M-mode:

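# PMP entry 2: shadow stack region (U-mode denied; M-mode bypasses unlocked entries)
li      t0, (0x90000000 >> 2)     # TOR lower bound comes from pmpaddr1 (assumes entry 1 is unused, A=OFF)
csrw    pmpaddr1, t0
li      t0, (0x90001000 >> 2)     # TOR upper bound = 0x90001000 (exclusive)
csrw    pmpaddr2, t0
# pmp2cfg is byte 2 of pmpcfg0 (bits 23:16): A=TOR (0b01 in bits 20:19), R=W=X=0, L=0
li      t1, (0xFF << 16)
csrrc   zero, pmpcfg0, t1         # clear any stale pmp2cfg bits
li      t0, (0b01 << 19)          # R=W=X=0 denies U-mode; L=0 leaves M-mode unrestricted
csrrs   zero, pmpcfg0, t0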

Verify: a U-mode load from 0x90000000 causes mcause=5 (load-access fault). M-mode access succeeds.
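The PMP register values are easy to get wrong by one byte lane, so it can help to compute them in C first. The sketch below uses the pmpNcfg bit layout from the RISC-V privileged specification (R=bit 0, W=bit 1, X=bit 2, A=bits 4:3, L=bit 7) and the RV32 packing of four entries per pmpcfg register; the helper names are hypothetical.

```c
#include <stdint.h>

/* Bit positions inside one 8-bit pmpNcfg field. */
#define PMP_R      (1u << 0)
#define PMP_W      (1u << 1)
#define PMP_X      (1u << 2)
#define PMP_A_TOR  (1u << 3)   /* A field = 0b01 */
#define PMP_L      (1u << 7)

/* pmpaddr registers hold the physical address shifted right by 2. */
static uint32_t pmp_addr_encode(uint32_t phys_addr)
{
    return phys_addr >> 2;
}

/* Position an 8-bit cfg field for entry 0..3 inside pmpcfg0 (RV32). */
static uint32_t pmp_cfg0_field(unsigned entry, uint8_t cfg)
{
    return (uint32_t)cfg << (8u * entry);
}
```

For entry 2, pmp_cfg0_field(2, PMP_A_TOR) sets only bit 19, which selects TOR with R=W=X=0, so U-mode is denied while M-mode is unaffected because L is clear.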

B2: Shadow stack push and pop (5 pts)

Add a shadow stack pointer CSR stub by using mscratch as a dual-purpose register: it holds the kernel stack pointer during trap entry and the shadow stack pointer at all other times. At function-call time, the shadow push runs in M-mode via a dedicated syscall.

Simpler approach for the lab: implement the shadow stack as a kernel-managed array. The user program calls SYS_SHADOW_PUSH (a7=200) with the return address in a0; the kernel writes it to the shadow stack. At function return, the user program calls SYS_SHADOW_POP (a7=201); the kernel pops the top of the shadow stack and verifies it matches the current ra. If mismatch: kill the process with "CFI violation detected."
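A minimal sketch of the kernel-managed array follows. It assumes a 4 KiB region of 4-byte entries (1024 slots) and shows only the handler bodies; syscall dispatch, process kill, and the names shadow_push and shadow_pop_check are hypothetical.

```c
#include <stdint.h>

#define SHADOW_DEPTH 1024u   /* 4 KiB region / 4-byte return addresses */

struct shadow_stack {
    uint32_t slots[SHADOW_DEPTH];
    uint32_t top;            /* number of entries currently in use */
};

/* Body of the SYS_SHADOW_PUSH handler: 0 on success, -1 if full. */
static int shadow_push(struct shadow_stack *s, uint32_t ra)
{
    if (s->top >= SHADOW_DEPTH)
        return -1;
    s->slots[s->top++] = ra;
    return 0;
}

/* Body of the SYS_SHADOW_POP handler: 0 if the popped value matches ra,
 * -1 on mismatch or underflow. On a pop, *expected receives the stored
 * address so the caller can report it. */
static int shadow_pop_check(struct shadow_stack *s, uint32_t ra,
                            uint32_t *expected)
{
    if (s->top == 0)
        return -1;
    uint32_t want = s->slots[--s->top];
    if (expected)
        *expected = want;
    return (want == ra) ? 0 : -1;
}
```

A return of -1 from shadow_pop_check is the point where the kernel would kill the process with "CFI violation detected."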

Modify your compiler to emit shadow-push at every function call and shadow-pop at every function return.

B3: Demonstrate shadow stack catches ROP (2 pts)

Write a test program that simulates a ROP gadget redirect:

  1. The program calls a function normally; the shadow stack records the correct return address.
  2. Before the function returns, it overwrites ra with a different address (the address of a gadget elsewhere in the code).
  3. The function attempts to "return" to the gadget address.
  4. The shadow-pop syscall in the epilogue fires before the ret executes and finds ra != shadow_top. The process is killed.

Demonstrate on DE10-Nano with the OLED showing "CFI violation: expected 0xXXXX got 0xYYYY".
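The OLED message can be prototyped on a host before wiring it into the kernel. This sketch assumes full 32-bit addresses printed as eight hex digits and relies on snprintf from a hosted C library, which the kernel may not provide; cfi_format_violation is a hypothetical name.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format the violation message into out[0..n-1].
 * Returns the snprintf result (length the full message needs). */
static int cfi_format_violation(char *out, size_t n,
                                uint32_t expected, uint32_t got)
{
    return snprintf(out, n,
                    "CFI violation: expected 0x%08" PRIX32
                    " got 0x%08" PRIX32,
                    expected, got);
}
```

Here expected is the address popped from the shadow stack and got is the ra value the function tried to return to.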


Part C: Integration test (5 pts)

C1: Both defenses active simultaneously (3 pts)

Write a test that exercises both defenses in sequence:

  1. Overflow a buffer to corrupt the canary AND the return address.
  2. The canary check fires first (before the corrupted ret executes).
  3. If the canary is somehow bypassed (the overflow happens to write the correct canary value), the shadow stack catches the forged return address.

Use a fixed canary (0xDEADC0DE) so you can construct a test overflow that writes both the canary value and a different return address.
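One way to build the test overflow is to assemble the payload bytes explicitly. The sketch below assumes the A2 layout with a 16-byte buffer directly below the canary and no padding between slots; build_payload and the filler byte are hypothetical choices for illustration.

```c
#include <stdint.h>
#include <string.h>

#define STACK_CHK_GUARD 0xDEADC0DEu   /* fixed lab canary */
#define BUF_SIZE        16u           /* size of the local buffer under test */

/* Fill out[] with BUF_SIZE filler bytes, then the canary word, then a
 * forged return address, in ascending address order.
 * out must hold at least BUF_SIZE + 8 bytes. Returns the payload length. */
static size_t build_payload(uint8_t *out, uint32_t canary, uint32_t forged_ra)
{
    memset(out, 'A', BUF_SIZE);
    memcpy(out + BUF_SIZE, &canary, sizeof canary);             /* canary slot */
    memcpy(out + BUF_SIZE + 4, &forged_ra, sizeof forged_ra);   /* saved ra slot */
    return BUF_SIZE + 8;
}
```

Passing STACK_CHK_GUARD as the canary exercises step 3 (canary preserved, shadow stack catches the forged ra); passing any other value exercises step 2.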

C2: Measurement (2 pts)

Measure the overhead of both defenses on a hot loop that calls a small function 10,000 times:

  1. Without canary or shadow stack: baseline cycle count.
  2. With canary prologue/epilogue only: cycle count.
  3. With canary + shadow stack: cycle count.

Calculate the per-call overhead in cycles for each defense.
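The per-call figures follow from differences between the three cycle counts divided by the call count. A minimal sketch, assuming the counts are captured as 64-bit totals for the whole 10,000-call loop (the function name is hypothetical):

```c
#include <stdint.h>

#define CALLS 10000.0   /* iterations of the hot loop */

/* Per-call overhead in cycles between two whole-loop cycle counts.
 * Converting to double first keeps the result signed if measurement
 * noise makes the "with defense" run slightly faster. */
static double per_call_overhead(uint64_t with_defense, uint64_t without)
{
    return ((double)with_defense - (double)without) / CALLS;
}
```

The canary cost is per_call_overhead(case 2, case 1) and the shadow stack cost is per_call_overhead(case 3, case 2).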


Grading

Part   Criteria                                                           Points
A1     __stack_chk_guard initialized with non-zero, non-trivial value     2
A2     Canary prologue/epilogue emitted for functions with local buffers  6
A3     Canary corruption triggers __stack_chk_fail                        2
B1     Shadow stack region PMP-protected; U-mode access faults            3
B2     Shadow push/pop via M-mode syscall; compiler emits calls           5
B3     ROP redirect caught by shadow stack; OLED shows CFI violation      2
C1     Combined test: canary catches first; shadow stack is backup        3
C2     Per-call overhead measured for both defenses                       2
Total                                                                     25