Total points: 25
Estimated time: 3.5 hours
Prerequisites: Labs 7-8 complete; PMP W^X policy active; compiler pipeline from Labs 3-5
Overview
This lab adds compiler-emitted stack canaries and a PMP-protected software CFI shadow stack to Virtus OS v2. You will verify that both defenses are active on DE10-Nano by demonstrating a simulated exploit that is blocked at each layer.
Part A: Compiler-emitted stack canaries (10 pts)
A1: Canary global and initialization (2 pts)
Add a global variable uint32_t __stack_chk_guard to the Virtus OS v2 kernel. Initialize it at boot with a value that is:
- Non-zero (zero canaries are trivially bypassable)
- Not a valid instruction encoding (to prevent partial overwrites that happen to create a valid-looking canary)
- For this lab: use the fixed value 0xDEADC0DE (not random; the lab verifies detection, not randomness)
For production: replace with a value from a hardware entropy source or an LFSR seeded at boot.
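The A1 requirement can be sketched in C as follows. This is a minimal sketch, not the reference kernel code: the macro LAB_CANARY_VALUE and the boot hook stack_guard_init are hypothetical names, and where the hook is called from depends on your own boot sequence.

```c
#include <stdint.h>

/* Fixed lab canary from A1; a production build would not hard-code this.
 * LAB_CANARY_VALUE is a hypothetical name chosen for this sketch. */
#define LAB_CANARY_VALUE 0xDEADC0DEu

/* The guard word that compiler-emitted prologues and epilogues load. */
uint32_t __stack_chk_guard;

/* Hypothetical boot hook: call once, before any canary-protected
 * function runs, so the guard is never observed as zero. */
void stack_guard_init(void)
{
    __stack_chk_guard = LAB_CANARY_VALUE;
}
```

Calling stack_guard_init() early in kernel startup is enough for this lab; swapping in an entropy source later only changes the right-hand side of the assignment.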
A2: Canary prologue/epilogue in the compiler (6 pts)
Modify your CSA-201 compiler (from Labs 3-5) to emit canary code in every function that has a local buffer (any function with stack-allocated arrays or variables of size > 8 bytes).
Prologue (after function entry, before locals are used):
la t0, __stack_chk_guard
lw t0, 0(t0) # load canary value
sw t0, CANARY_OFFSET(fp) # store below saved ra
Epilogue (before function return, after locals are done):
la t0, __stack_chk_guard
lw t0, 0(t0) # load expected canary
lw t1, CANARY_OFFSET(fp) # load canary from stack
bne t0, t1, __stack_chk_fail # if mismatch: branch to fail handler (does not return)
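Where CANARY_OFFSET addresses the word just below the saved ra in the stack frame:
frame layout (highest address first; the stack grows downward):
[saved fp] [saved ra] [canary] [locals...]
Because buffers are written toward higher addresses, an overflow from the locals must cross the canary before it reaches saved ra.
The canary slot is 4 bytes below saved ra and 8 bytes below saved fp, so CANARY_OFFSET = -(fp_offset + 8), where fp_offset is the distance in bytes from fp down to the saved-fp slot in your frame layout.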
Add __stack_chk_fail as a kernel function that logs "stack smash detected" to the OLED and then terminates the process with SYS_EXIT(-1). Log first: SYS_EXIT does not return.
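One way the compiler change in A2 might look is sketched below. The function names (emit_canary_prologue, emit_canary_epilogue, needs_canary) are hypothetical and the helpers write into a character buffer; your CSA-201 compiler will have its own emission interface. The needs_canary policy reads the A2 rule as "any stack array, or any local larger than 8 bytes", which is an assumption about the intended rule.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical emitter helpers: write the canary sequences as assembly
 * text. canary_offset is the fp-relative byte offset of the canary slot
 * (a negative number). Both return what snprintf returns. */
int emit_canary_prologue(char *out, size_t cap, int canary_offset)
{
    return snprintf(out, cap,
        "    la   t0, __stack_chk_guard\n"
        "    lw   t0, 0(t0)\n"
        "    sw   t0, %d(fp)\n",
        canary_offset);
}

int emit_canary_epilogue(char *out, size_t cap, int canary_offset)
{
    return snprintf(out, cap,
        "    la   t0, __stack_chk_guard\n"
        "    lw   t0, 0(t0)\n"
        "    lw   t1, %d(fp)\n"
        "    bne  t0, t1, __stack_chk_fail\n",
        canary_offset);
}

/* Assumed reading of the A2 policy: instrument a function if it has any
 * stack-allocated array, or any local larger than 8 bytes. */
int needs_canary(int has_local_array, size_t largest_local_bytes)
{
    return has_local_array || largest_local_bytes > 8;
}
```

A code generator would call needs_canary once per function and, when it returns non-zero, reserve one extra word in the frame and emit the two sequences around the function body.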
A3: Verify canary detection (2 pts)
Write a test program with a char buf[16] local buffer. Manually corrupt the canary word (write a different value to CANARY_OFFSET(fp) using inline assembly) before the function returns. Verify that __stack_chk_fail is called and the process is killed.
Part B: CFI shadow stack (10 pts)
B1: PMP-protect a shadow stack region (3 pts)
Reserve a physical memory region for the shadow stack: 4 KiB starting at physical address 0x90000000 (or a suitable address in your DE10-Nano DDR3 layout). Configure a PMP entry to make this region accessible only to M-mode:
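# PMP entry 2: shadow stack region (M-mode only, U-mode denied)
# TOR matches pmpaddr1 <= addr < pmpaddr2, so pmpaddr1 must hold the lower bound (0x90000000 >> 2).
li t0, (0x90001000 >> 2) # TOR upper bound = 0x90001000 (pmpaddr holds address bits [33:2])
csrw pmpaddr2, t0
# pmp2cfg is byte 2 of pmpcfg0 (bits 23:16): A=TOR, R=0, W=0, X=0, L=0.
# R=W=X=0 denies U-mode; L=0 leaves M-mode unrestricted and the entry reconfigurable.
li t0, (0xFF << 16)
csrrc zero, pmpcfg0, t0 # clear any stale pmp2cfg bits
li t0, (0b01 << 19) # A field is bits 4:3 of the cfg byte
csrrs zero, pmpcfg0, t0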
Verify: a U-mode load from 0x90000000 causes mcause=5 (load-access fault). M-mode access succeeds.
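The PMP encodings are easy to get wrong by one byte or one shift, so it can help to compute them on a host first. The sketch below follows the field layout in the RISC-V privileged specification (R, W, X in bits 0 to 2 of each cfg byte, A in bits 4:3, L in bit 7, and pmpaddr holding the address shifted right by 2). The helper names are hypothetical, and the NAPOT variant is shown only as an alternative if entry 1 in your layout does not end at the shadow stack base. Note that denying U-mode requires R=W=X=0 in the entry; M-mode keeps access because L=0.

```c
#include <stdint.h>

/* Field positions inside one 8-bit pmpNcfg byte. */
#define PMP_R        (1u << 0)
#define PMP_W        (1u << 1)
#define PMP_X        (1u << 2)
#define PMP_A_TOR    (1u << 3)   /* A field (bits 4:3) = 01 */
#define PMP_A_NAPOT  (3u << 3)   /* A field (bits 4:3) = 11 */
#define PMP_L        (1u << 7)

/* pmpaddr registers hold the byte address shifted right by 2. */
static inline uint32_t pmp_tor_addr(uint32_t byte_addr)
{
    return byte_addr >> 2;
}

/* NAPOT encoding for a naturally aligned power-of-two region (>= 8 bytes). */
static inline uint32_t pmp_napot_addr(uint32_t base, uint32_t size)
{
    return (base >> 2) | ((size >> 3) - 1u);
}

/* Place one cfg byte at its position in pmpcfg0 (entries 0..3 on RV32). */
static inline uint32_t pmpcfg0_field(unsigned entry, uint8_t cfg)
{
    return (uint32_t)cfg << (8u * entry);
}
```

For the region in B1, pmp_tor_addr(0x90001000) gives the value for pmpaddr2, and pmpcfg0_field(2, PMP_A_TOR) gives the bits to set in pmpcfg0 after clearing pmpcfg0_field(2, 0xFF).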
B2: Shadow stack push and pop (5 pts)
One option is to use mscratch as a stand-in for a shadow stack pointer CSR: it holds the kernel stack pointer during trap entry and the shadow stack pointer at all other times, and the shadow push itself runs in M-mode via a dedicated syscall.
Simpler approach for the lab: implement the shadow stack as a kernel-managed array. The user program calls SYS_SHADOW_PUSH (a7=200) with the return address in a0; the kernel writes it to the shadow stack. At function return, the user program calls SYS_SHADOW_POP (a7=201); the kernel pops the top of the shadow stack and verifies that it matches the user's current ra, as saved on trap entry. On a mismatch, kill the process with "CFI violation detected."
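Modify your compiler to emit shadow-push at every function call and shadow-pop at every function return. If your calling convention keeps live arguments or return values in a0 or a7, the emitted sequence must save and restore them around the ecall.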
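The kernel side of the simpler approach might be structured as below. This is a host-compilable sketch under stated assumptions: the names shadow_push, shadow_pop_check and the status enum are hypothetical, the array stands in for the PMP-protected region, and the syscall dispatcher that maps a7=200 and a7=201 onto these functions is not shown.

```c
#include <stdint.h>
#include <stddef.h>

#define SHADOW_DEPTH 1024u  /* 4 KiB region / 4-byte entries */

enum shadow_status {
    SHADOW_OK = 0,
    SHADOW_OVERFLOW,
    SHADOW_UNDERFLOW,
    SHADOW_MISMATCH
};

/* In the kernel this array would be placed in the PMP-protected region;
 * a plain static array stands in for it here. */
static uint32_t shadow_stack[SHADOW_DEPTH];
static size_t shadow_count;  /* number of entries currently stored */

/* Handler body for SYS_SHADOW_PUSH: record the return address. */
enum shadow_status shadow_push(uint32_t return_addr)
{
    if (shadow_count >= SHADOW_DEPTH)
        return SHADOW_OVERFLOW;
    shadow_stack[shadow_count++] = return_addr;
    return SHADOW_OK;
}

/* Handler body for SYS_SHADOW_POP: pop the newest entry and compare it
 * with the caller's current ra. If expected is non-NULL it receives the
 * recorded address, which is useful for the diagnostic message. */
enum shadow_status shadow_pop_check(uint32_t current_ra, uint32_t *expected)
{
    if (shadow_count == 0)
        return SHADOW_UNDERFLOW;
    uint32_t recorded = shadow_stack[--shadow_count];
    if (expected)
        *expected = recorded;
    return recorded == current_ra ? SHADOW_OK : SHADOW_MISMATCH;
}
```

The syscall handler would kill the process on any status other than SHADOW_OK; treating overflow and underflow as fatal is a design choice of this sketch, not a lab requirement.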
B3: Demonstrate shadow stack catches ROP (2 pts)
Write a test program that simulates a ROP gadget redirect:
- The program calls a function normally; the shadow stack records the correct return address.
- Before the function returns, it overwrites ra with a different address (the address of a gadget elsewhere in the code).
- The function attempts to return to the gadget address.
- The shadow-pop syscall, emitted before the return instruction, fires first and finds ra != shadow_top. The process is killed before control reaches the gadget.
Demonstrate on DE10-Nano with the OLED showing "CFI violation: expected 0xXXXX got 0xYYYY".
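The diagnostic line can be built with a small formatting helper before it is handed to your OLED driver. The sketch below assumes a hypothetical function name, format_cfi_violation, and prints full 32-bit addresses as eight hex digits; shorten the format if your display cannot fit the line.

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>
#include <inttypes.h>

/* Hypothetical helper: build the B3 diagnostic line into out.
 * Returns what snprintf returns (the untruncated length). */
int format_cfi_violation(char *out, size_t cap,
                         uint32_t expected, uint32_t got)
{
    return snprintf(out, cap,
                    "CFI violation: expected 0x%08" PRIX32
                    " got 0x%08" PRIX32,
                    expected, got);
}
```

Here expected is the address popped from the shadow stack and got is the ra the function tried to return with.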
Part C: Integration test (5 pts)
C1: Both defenses active simultaneously (3 pts)
Write a test that exercises both defenses in sequence:
- Overflow a buffer to corrupt the canary AND the return address.
- The canary check fires first (before the corrupted ret executes).
- If the canary is somehow bypassed (the overflow happens to write the correct canary value), the shadow stack catches the forged return address.
Because the canary is fixed (0xDEADC0DE), you can construct a second overflow that rewrites the correct canary value together with a different return address; that run exercises the shadow stack path.
C2: Measurement (2 pts)
Measure the overhead of both defenses on a hot loop that calls a small function 10,000 times:
- Without canary or shadow stack: baseline cycle count.
- With canary prologue/epilogue only: cycle count.
- With canary + shadow stack: cycle count.
Calculate the per-call overhead in cycles for each defense.
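The per-call figure is the difference between two total cycle counts divided by the number of calls. The sketch below shows that arithmetic together with a cycle-counter read. It assumes the cycle counter is readable from the privilege level running the benchmark, and both function names are hypothetical. The floating-point division is meant for host-side post-processing of counts you record on the board.

```c
#include <stdint.h>

/* Read the low 32 bits of the cycle counter. On non-RISC-V hosts this
 * stub returns 0 so the arithmetic below still compiles. Taking
 * end - start in uint32_t arithmetic tolerates one counter wraparound. */
static inline uint32_t read_cycles(void)
{
#if defined(__riscv)
    uint32_t c;
    __asm__ volatile ("rdcycle %0" : "=r"(c));
    return c;
#else
    return 0;
#endif
}

/* Per-call overhead in cycles: (protected - baseline) / calls. */
double per_call_overhead(uint32_t baseline_cycles,
                         uint32_t protected_cycles,
                         uint32_t calls)
{
    if (calls == 0)
        return 0.0;
    return ((double)protected_cycles - (double)baseline_cycles)
           / (double)calls;
}
```

With the three totals from the list above, the canary cost is per_call_overhead(baseline, canary_only, 10000) and the additional shadow stack cost is per_call_overhead(canary_only, canary_plus_shadow, 10000).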
Grading
| Part | Criteria | Points |
|---|---|---|
| A1 | __stack_chk_guard initialized with non-zero, non-trivial value | 2 |
| A2 | Canary prologue/epilogue emitted for functions with local buffers | 6 |
| A3 | Canary corruption triggers __stack_chk_fail | 2 |
| B1 | Shadow stack region PMP-protected; U-mode access faults | 3 |
| B2 | Shadow push/pop via M-mode syscall; compiler emits calls | 5 |
| B3 | ROP redirect caught by shadow stack; OLED shows CFI violation | 2 |
| C1 | Combined test: canary catches first; shadow stack is backup | 3 |
| C2 | Per-call overhead measured for both defenses | 2 |
| Total | | 25 |