Classroom Glossary Public page

Module 1: Re-Grounding -- Substrate vs Language Vulnerabilities

1,922 words

Duration: 3 hr lecture + 3 hr lab + 6 hr independent
Lab: Mapping table exercise (no separate lab file; completed in-module)
MITRE ATLAS coverage: All 16 tactics (structured review at Belt-5 depth)
Christian weave: The Alignment Problem, Prophecy (Ch 1-4) -- assigned before this module
Prerequisite check: Module 1 is the convergence point. Read the AI-301 3-layer architecture handout before lecture.


1.1 Two Literacies, One Mental Model

You arrive at AI-301 with two distinct literacies. The first is substrate literacy -- the ability to reason about a computing system from NAND gates up through machine language, memory layout, privilege levels, and the mitigations that constrain what code can do at runtime. You built this in CSA-101 and CSA-201. The second is language literacy -- the ability to reason about LLM systems as attack surfaces, to reproduce CVEs, to map findings to MITRE ATLAS tactics, to write coordinated-disclosure-quality reports. You built this in AI-101 and AI-201.

AI-301 is not a third literacy. It is the recognition that the two literacies are describing the same class of phenomena at different abstraction layers. This module builds the explicit mapping. The mapping is not metaphorical convenience -- it is structural. The same architectural decisions that produced exploitable vulnerabilities at the substrate layer have structural cousins at the language layer, and understanding one makes the other more visible.

Christian opens The Alignment Problem with the question: "When we build systems that learn, what do they actually learn?" The answer, across Prophecy's four chapters, is that systems learn proxies -- specifications that approximate the intended goal but diverge from it in high-stakes situations. At the substrate level, the intended goal is "execute code," and the proxy is "execute whatever is at the instruction pointer." At the language level, the intended goal is "follow the system-prompt operator's intent," and the proxy is "produce tokens that maximize reward signal." Both proxies are exploitable by anyone who can control the input.


1.2 The Four Substrate Memory-Safety Properties

CSA-201 introduced four memory-safety properties as toggleable mitigations. Review them here before mapping.

W^X (Write XOR Execute): A memory page cannot be simultaneously writable and executable. Code pages are marked executable but not writable; data pages are marked writable but not executable. The invariant prevents an attacker who writes shellcode into a buffer from executing it directly. Bypassed by ROP (return-oriented programming), which chains existing executable code rather than injecting new code.

ASLR (Address Space Layout Randomization): The base addresses of stack, heap, and loaded libraries are randomized at load time. An attacker who needs to jump to a specific address cannot predict it without an information leak. Bypassed by information-leak vulnerabilities that reveal the actual base address, or by brute force in 32-bit address spaces.

Stack canaries: A random value placed between local variables and the saved return address on the stack frame. On function return, the runtime checks that the canary value is unmodified. A buffer overflow that overwrites the return address also overwrites the canary, triggering a crash before the return executes. Bypassed by overwrite strategies that avoid or leak the canary position.

Control-Flow Integrity (CFI): At each indirect branch (call through a function pointer, virtual dispatch, return), the runtime checks that the target is a valid target according to a pre-computed control-flow graph. An attacker who can overwrite a function pointer cannot redirect execution to an arbitrary address -- only to addresses the CFI policy permits. Bypassed by policies that are too coarse (allowing any function with a matching signature) or by data-only attacks that modify data structures rather than control-flow pointers.


1.3 The Four Language-Level Analogues

Each substrate memory-safety property has a structural analogue at the LLM layer.

Prompt isolation (analogue of W^X):

At the substrate, W^X prevents user-writable memory from being executed as code. The intended invariant: data cannot become code. At the language layer, the analogous invariant is: user-supplied input cannot become system-prompt instruction. An LLM that treats user content as instruction-level context is in a state analogous to W^X violation -- the "execute" bit is set on user-writable memory.

Prompt injection (OWASP LLM01; ATLAS AML.T0051) is the exploit class. The attacker writes instructions into user-supplied data that the model executes as if they came from the operator. The 2023-2025 production CVEs that AI-201 covered (CVE-2025-65106, indirect prompt injection via malicious document) are all W^X-violation-class vulnerabilities.

Context-window isolation (analogue of ASLR):

At the substrate, ASLR randomizes memory layout so an attacker cannot predict addresses without a leak. The intended invariant: an attacker with knowledge of the memory map in one context cannot directly exploit it in another. At the language layer, the analogous invariant is: the structure of the system prompt and the injected context should not be predictable by a remote attacker.

System-prompt extraction (OWASP LLM07; ATLAS AML.T0055) is the corresponding exploit class. An attacker who leaks the system prompt layout can construct exploits that target specific phrases or trust relationships defined there. The ASLR analogy is imperfect (language systems have no randomizable addresses) but the defensive principle is the same: reduce the information available to an attacker about system structure.

Output guards (analogue of stack canaries):

At the substrate, canaries place a sentinel between attacker-influenced data and security-critical state (the return address). The intended invariant: any overflow that reaches the return address is detected. At the language layer, the analogous invariant is: any model output that reaches a security-critical execution path (tool call, shell command, database write) passes through a validation gate.

Output validation failures (OWASP LLM06 Excessive Agency; ATLAS AML.T0056) occur when model outputs are passed directly to execution layers without validation. The lab in Module 8 measures what it costs to add output guards at each tier (input filtering, output filtering, tool-call validation, human-in-the-loop).

Tool-calling constraints (analogue of CFI):

At the substrate, CFI permits only valid targets at each indirect branch. The intended invariant: control-flow cannot be redirected arbitrarily; only pre-approved targets are reachable. At the language layer, the analogous invariant is: tool calls can reach only approved targets; the model cannot invoke tool X from a context where only tool Y should be reachable.

Tool-chain hijacking (ATLAS AML.T0056.002; Module 5) bypasses tool-calling constraints by chaining tool calls in a sequence the operator did not anticipate, building a capability from individually-permitted operations. The structural analogy to ROP (chaining gadgets from individually-permitted code) is the subject of Module 5.


1.4 The Substrate-Language Mapping Table

Complete this table as Module 1's lab exercise. Use your CSA-201 knowledge for the substrate column and your AI-201 knowledge for the language column. The exploit class and ATLAS tactic columns anchor each row to the knowledge bases you already have.

# Substrate vulnerability Language-level analogue Exploit class ATLAS tactic
1 W^X violation (data-as-code) Prompt injection (user-as-system) Stack overflow → shellcode AML.T0051
2 ASLR bypass via info leak System-prompt extraction Format-string / info-leak AML.T0055
3 Canary bypass / stack smash Output guard bypass → tool call without validation Stack buffer overflow AML.T0056
4 CFI bypass via gadget chain Tool-chain hijack ROP / JOP AML.T0056.002
5 Use-after-free Context-window confusion (stale context) Heap UAF AML.T0051 variant
6 Type confusion (void* miscast) Untyped output consumed as instruction C++ virtual dispatch exploit AML.T0056 variant
7 Cache timing side channel Latency-fingerprint side channel Flush+Reload AML.T0057
8 Firmware supply chain Fine-tuning supply chain Malicious firmware update AML.T0018
9 Memory corruption (write to arbitrary address) Activation steering (write to arbitrary SAE feature) Arbitrary write primitive AML.T0054
10 Privilege escalation (user → kernel) Prompt escalation (user → system) Ring-0 elevation AML.T0053

Lab exercise: For rows 5, 6, 9, and 10, write a one-paragraph expansion that explains the structural analogy in more detail. Use specific CSA-201 code examples (the Virtus OS privilege-level implementation) and specific AI-201 CVE examples (CVE-2025-68664 LangGrinch deserialization).


1.5 What Changes at Belt-5

AI-101 oriented you to the OWASP LLM Top 10 as a checklist. AI-201 oriented you to MITRE ATLAS as a practitioner vocabulary for reporting findings. AI-301 uses both as context, but the analytical posture changes.

At Belt-5, the question is not "which item on the list is this?" The question is "what is the structural class of this vulnerability, and what does that structure tell me about where defenses will and won't work?"

Belt-5 means:

  • You can write a coordinated-disclosure report that maps a novel finding to its structural class, not just its surface behavior
  • You can predict which defenses will bypass a given attack class based on the structural analogy, without waiting for the attack to bypass them experimentally
  • You can read a primary research paper and extract the structural claim, distinct from the experimental setup and the specific model it was run on
  • You can design an evaluation that tests for a structural vulnerability class, not just a specific known attack

Christian's Agency section (assigned for Modules 5-7) makes this analytical shift explicit in the alignment domain: the problem is not "this specific reward hack" but "the structural tendency of RL agents to find unexpected reward-maximizing behaviors that the designers did not intend." The structural class has the same mitigation -- better specification, not just patching the specific hack -- regardless of which specific hack an agent found.


1.6 ATLAS at Belt-5: Beyond Reconnaissance and Initial Access

AI-201 used the full 16 ATLAS tactics as a reference vocabulary. At Belt-5, you should be working from the ATLAS case studies rather than the tactic list. The 42 case studies in ATLAS v5.1.0 are real-world incidents analyzed at the tactic-technique level. Belt-5 work means:

  • Given a new incident, mapping it to the case study corpus and identifying which techniques generalize to new targets
  • Identifying gaps: which case studies are missing from the corpus that the AI-301 labs would add
  • Reading the ATLAS case studies not as examples of specific attacks but as evidence for general claims about the attack surface

Assignment: Before Module 2, read ATLAS Case Study AML.CS0001 (Tay AI) and AML.CS0016 (GPT-2 generation of malicious outputs). For each:

  1. List every technique used (e.g., T0048 - Prompt Injection)
  2. Identify the structural class (which row in the mapping table does this belong to?)
  3. Write one sentence on why the attack was possible: what structural property of the system enabled it?

1.7 The Course Arc

Modules 1-4 are the Prophecy arc: re-grounding, primer exploits (stack-smash and prompt injection separately), and the formal naming of the thesis. Christian's Prophecy arc covers the history of AI systems that learned proxies -- aligned-in-the-lab, misaligned-in-deployment systems -- which motivates the security analyst's question: what invariant did the designer assume that the adversary violated?

Modules 5-7 are the Agency arc: ROP, tool-chain hijack, type confusion, side channels. These are the advanced exploit techniques. Christian's Agency arc covers autonomous agents that optimize specified objectives in unexpected ways -- the structural archetype of tool-chain hijacking and type confusion.

Modules 7.5-8 are the Supply chain and defense arc: fine-tuning attacks (supply chain at the weights layer) and layered defenses. Christian's Normativity arc covers how to specify what you actually want and enforce it -- the structural question behind every mitigation.

Modules 9-12 are the Capstone arc: full cross-substrate engagement, adversarial ML, frontier safety landscape, and the three-track capstone.

You are at the beginning of Module 1. The thesis has been stated. The mapping table is your first artifact. Build it carefully -- you will return to it in Module 4 to write the formal essay, and in Module 4.5 to add the SAE-feature row that turns the metaphor literal.